
Tuesday, July 26, 2016

Scraping Facebook Groups almost in real time (FGIR)

Today I want to introduce a project I developed some years ago which I called FGIR (Facebook Groups Information Retriever).

FGIR is a project intended to scrape and organize data from Facebook groups. The data is retrieved from Facebook's email notifications, which arrive incredibly fast (that is why I say almost in real time).

Note: Facebook is very strict about automated data collection (https://www.facebook.com/apps/site_scraping_tos_terms.php), so be careful.

So, why scrape Facebook?
In my area, people started creating groups to buy and sell things. They got really popular; I remember being a member of a group with around 200k people (which I consider a lot for a small city). There were really good opportunities to buy cheap things, but they were hard to find: the groups were messy, and many people were adding posts and comments every second.

I decided to store that valuable information so I could organize it and find the products I wanted to buy.

I turned on email notifications on Facebook and wrote a client that listens for new emails. It handles two kinds of emails:
1.- Added to a new group.
2.- There is a new post in a group.
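Here is a minimal Python sketch of such a client; it is not the FGIR code. It polls an IMAP mailbox with the standard imaplib module rather than truly listening (IMAP IDLE would be closer to push notifications), and it sorts each unread message into one of the two kinds above by looking at its subject. The subject patterns, the function names and the host/credentials are hypothetical assumptions: Facebook's real notification wording depends on the account language and has changed over time, so the patterns must be adapted to your own inbox.

```python
import email
import imaplib
import re
from email.header import decode_header, make_header
from email.message import Message

# Hypothetical subject patterns; adapt them to the notifications you actually receive.
ADDED_TO_GROUP = re.compile(r"added you to the group", re.IGNORECASE)
NEW_POST = re.compile(r"posted in", re.IGNORECASE)


def decoded_subject(msg: Message) -> str:
    """Return the Subject header with RFC 2047 encoded words decoded."""
    return str(make_header(decode_header(msg.get("Subject", ""))))


def classify(msg: Message) -> str:
    """Map a notification email to one of the two handled event kinds."""
    subject = decoded_subject(msg)
    if ADDED_TO_GROUP.search(subject):
        return "added_to_group"
    if NEW_POST.search(subject):
        return "new_post"
    return "ignored"


def fetch_unseen(conn) -> list:
    """Fetch unread messages from the currently selected IMAP mailbox."""
    typ, data = conn.search(None, "UNSEEN")
    messages = []
    if typ != "OK":
        return messages
    for num in data[0].split():
        # Fetching RFC822 (same as BODY[]) sets the \Seen flag on a read-write
        # mailbox, so the next poll will not return this message again.
        typ, parts = conn.fetch(num, "(RFC822)")
        if typ == "OK":
            messages.append(email.message_from_bytes(parts[0][1]))
    return messages


def poll_once(host: str, user: str, password: str) -> list:
    """Connect, classify every unread notification, and disconnect."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    conn.select("INBOX")
    try:
        return [(classify(m), m) for m in fetch_unseen(conn)]
    finally:
        conn.logout()
```

In practice `poll_once` would run in a loop with a short sleep between calls; the classified messages are then handed to the parsing and storage steps.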

Each email was parsed to detect which kind of event it reported (the first kind was easier to parse than the second), and the retrieved information was stored in a database model.
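To illustrate the storage step, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (`fb_group` and `post`), the column names and the helper functions are hypothetical; FGIR's actual database model may look quite different. The table is called `fb_group` because `GROUP` is a reserved word in SQL.

```python
import sqlite3

# Hypothetical schema: one row per group, one row per post notification.
SCHEMA = """
CREATE TABLE IF NOT EXISTS fb_group (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS post (
    id       INTEGER PRIMARY KEY,
    group_id INTEGER NOT NULL REFERENCES fb_group(id),
    author   TEXT,
    body     TEXT NOT NULL,
    received TEXT NOT NULL  -- ISO 8601 timestamp of the notification
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upsert_group(conn: sqlite3.Connection, name: str) -> int:
    """Insert the group if it is new and return its id ("added to a group" event)."""
    conn.execute("INSERT OR IGNORE INTO fb_group (name) VALUES (?)", (name,))
    conn.commit()
    row = conn.execute("SELECT id FROM fb_group WHERE name = ?", (name,)).fetchone()
    return row[0]


def store_post(conn: sqlite3.Connection, group_name: str, author: str,
               body: str, received: str) -> int:
    """Store a post taken from a "new post" notification and return its id."""
    group_id = upsert_group(conn, group_name)
    cur = conn.execute(
        "INSERT INTO post (group_id, author, body, received) VALUES (?, ?, ?, ?)",
        (group_id, author, body, received),
    )
    conn.commit()
    return cur.lastrowid
```

An "added to a group" email would only need `upsert_group`, while a "new post" email would pass the parsed group name, author and text to `store_post`.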

I also wrote a web application that queried the database model so I could find what I wanted; unfortunately, that application is lost, so I can't share it.
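Since the web application is gone, the following is only a rough idea of the kind of query it could run: a keyword search over stored posts, newest first, using sqlite3. The minimal schema and the `search_posts` function are assumptions made for illustration, not FGIR's design.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical minimal schema: groups and the posts seen in them."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS fb_group (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE IF NOT EXISTS post (
            id INTEGER PRIMARY KEY, group_id INTEGER,
            author TEXT, body TEXT, received TEXT
        );
        """
    )


def search_posts(conn: sqlite3.Connection, keyword: str, limit: int = 20) -> list:
    """Return (group, author, body) rows whose body contains keyword, newest first."""
    # Escape LIKE wildcards so a keyword such as "50%" is matched literally.
    escaped = keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return conn.execute(
        """
        SELECT g.name, p.author, p.body
        FROM post AS p JOIN fb_group AS g ON g.id = p.group_id
        WHERE p.body LIKE ? ESCAPE '\\'
        ORDER BY p.received DESC
        LIMIT ?
        """,
        (f"%{escaped}%", limit),
    ).fetchall()
```

SQLite's `LIKE` is case-insensitive for ASCII letters by default, so "bike" also matches "Bike". For a large archive, an SQLite FTS5 full-text index would scale better than scanning with `LIKE '%keyword%'`.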

I uploaded the FGIR code to GitHub, and I'm surprised that it still works.

I hope it is as useful to you as it was to me when I wrote it.

Thanks for reading.