Search in this blog

Tuesday, July 26, 2016

Scrapping Facebook Groups almost in real time (FGIR)

Today I want to introduce a project I developed some years ago which I called FGIR (Facebook Groups Information Retriever).

FGIR is a project intended to scrape and organize data from Facebook groups, the data is retrieved using email notifications from Facebook, they are received incredibly fast (this is the reason why I said almost in real time).

Note: Facebook is very picky with automated data collection(https://www.facebook.com/apps/site_scraping_tos_terms.php), so be careful.

So, why to scrape Facebook?
In my location, people started to create groups intended to buy and sell things. It got really popular, I remember to be joined in a group having around 200k people (which I consider to be a lot for a small city). There were really good opportunities to buy cheap things but it was really difficult to find them, groups were messy and so many people were adding posts and comments every second.

I decided to store that valuable information in order to organize it and get the products I wanted to buy.

I turned on email notifications on Facebook and wrote a client to listen for new emails, there are two kind of emails handled:
1.- Added to a new group.
2.- There is a new post in a group.

Each email was parsed to detect which kind of event happened, the first one was easier than the second one, anyway, I stored the retrieved information in a database model.

Also, I wrote a web application to query the database model to be able to find what I wanted (which is lost and I can't share).

I uploaded the project code (FGIR) to github, I'm surprised that it still works.

I hope that it can be useful for you as it was for me when I wrote it.

Thanks for reading.

Wednesday, May 4, 2016

Tuenti Challenge 6 Solutions

Tuenti Challenge 6 has finished, I liked it more than the previous one (which I liked it a lot).

I didn't solved all questions but half of them, here is are brief details for them.

Here is the repository containing all the codes: https://github.com/AlexITC/tuenti-challenge-6

I will update this post if I solve another task, I hope you enjoyed the contest as much as I did.


Challenge 1 - Team Lunch:
You are given an integer N representing the number of dinners, you have to output the minimum number of tables required where all dinners can sit together.

One table can have up to 4 dinners, adding one table can have up to 6, so, each time we add a table, we can sit 2 extra dinners, as N is small, we can try every possible value using linear search or binary search the answer, there is a direct formula too.

Notice that N can be 0, so we need to handle this corner case.

Binary search code: https://github.com/AlexITC/tuenti-challenge-6/blob/master/01/TeamLunch.java


Challenge 2 - The Voynich Manuscript:
You are given a list of words and need to answer many queries, each query contains two integers L and R, and you have to tell what are the three most common words in the interval [L, R] (from the list of words).

The list of words if fixed and the number of distinct words is not so high, we can store the indices where the word occurs (for each distinct word), a query can be answered using binary search on each work having O(M * logN) time complexity (M is the number of distinct words and N the total number of times a words occurs).

Implementation: https://github.com/AlexITC/tuenti-challenge-6/blob/master/02/TheVoynichManuscript.java


Challenge 3 - YATM Microservice:
You are given something that seems a code in a unusual programming language, the idea is to understand how it works and print the required output.

There are two ways to solve this problem:
1. Program an parser and a state machine to process all the code:
This would leads to a time consuming and a little bit boring task. Also is easy to made mistakes misunderstanding something in the logic.

2. Notice that the input is a well know format called YAML, the input represents a hash table, there are a lot of parsers for this format, having this, is really easy to code a simple state machine.

I didn't notice the format and coded the first idea, here is my long java code: https://github.com/AlexITC/tuenti-challenge-6/blob/master/03/Microservice.java


Challenge 4 - Hadouken!:
You have a fixed list with 5 strings and you are given a string S, you need to count how many times a string would match as a sub string in S if the last character is missing, in other words, count the number of partial matching requiring to change the last character to get a full matching, with the condition that you will try to match the longer string first.

The statement is confusing but you can throw the two longer strings from the fixed list and count the other 3 only, the strings are small, so brute force will work.

Implementation: https://github.com/AlexITC/tuenti-challenge-6/blob/master/04/Hadouken.java


Challenge 5 - Hangman
This was an unusual but interesting problem, indeed it was one of my favorites, you are given an IP address, a list of words and a Hangman picture.

The given address uses Telnet protocol, connecting to it gives you an interactive hangman game, typing a letter updates the interface setting the positions where the letter occurs in the word or setting an error, you are required to complete the level before the time expires, completing a level move you to the next one having less time and a longer word, the idea is to write a program to play the game fast enough to complete all levels. One important thing is that the word to guess is in the given dictionary.

My strategy to guess a word is as follows:
1. Keep in a list all the words in the dictionary having the required length of the word to guess.
2. Find the letter occurring the most times in our list.
3. If the letter was correct, find the positions where the letter occurs and filter the list with the words containing that letter in these positions only (the list is reduced), continue in 2.
4. If the letter is incorrect, filter the list with the words not containing this letter, continue in 2.

I avoid to count letters already found to not repeat the same letter twice (in step 2).

My program is semi-automatic, it require me to press enter each time a level is completed, I used this to have time to copy the keys manually.

Here is my ugly implementation: https://github.com/AlexITC/tuenti-challenge-6/blob/master/05/Hangman.java


Challenge 6 - Through the Looking-Glass
This was an interesting but frustrating task, you are given something that seems a QR image with a lot of noise.

I cleaned the image (with so many effort): https://github.com/AlexITC/tuenti-challenge-6/blob/master/06/result.png

The QR contains this URL: https://en.wikipedia.org/wiki/Piet_Mondrian

The interesting thing is that there is an Esoteric Programming Language called Piet, the noisy image seems to use the colors required by Piet, there are some interpreters online which didn't worked for me, you can try it yourself, or you can download and compile an interpreter.

Note than the image should be mirrored before giving it to the interpreter.

Also, for more information on Piet Mondrian, check out his page on Artsy: https://www.artsy.net/artist/piet-mondrian


Challenge 7 - Tiles
You are given a matrix of integers representing a tile in an infinite board, you are required to find a sub rectangle in this infinite board having the maximum possible sum or say that sum if infinite.

The sum is infinite if the sum in a row or the sum in a column in the matrix is greater than zero, this is true because the row or column can be used indefinitely and the sum will increase every time.

Assuming the sum is not infinite means that the rectangle is contained in the matrix or is a cyclic rectangle (a suffix + a prefix), we can copy the matrix three times to form a bigger matrix and find the maximum sub rectangle sum (which is a well known problem), this can be done with O(N^3) time complexity extending Kadane's algorithm.

Implementation: https://github.com/AlexITC/tuenti-challenge-6/blob/master/07/Tiles.java


Challenge 8 - Labyrinth
This was another unusual but interesting problem which I enjoyed solving it, you are given an IP address allowing socket connections, it shows you a partial maze with 7x7 tiles, and your current position, you can traverse the maze using the letters 'u', 'd', 'l', 'r', each time you move, the partial maze is scrolled revealing new tiles.

In this challenge you need to get the key which is in the borders of the maze, so you have to explore the whole maze, to make the things more interesting, once a connection is started, it will be expired after 13 minutes.

I used two algorithms, one going always to the left and the other going to the right, after each move I save the new tiles, each algorithm discovered more than half of the maze, after storing the results in a file I merged them into a single one to recover the key.

Implementation: https://github.com/AlexITC/tuenti-challenge-6/blob/master/08/Labyrinth.java


Challenge 9 - 0-1 Immiscible numbers
You are given an integer N, and you have to find the smallest multiple of N composed by ones and zeros only, the ones should occur before zeros.

There is an interesting property I found which I can't proof: any integer X not divisible by 2 or 5 have a multiple composed of 1's only, this multiple is the smallest for our problem.

Another property: Any integer of the form 2^N or 5^N or have a multiple containing exactly N 0's.

The second property tend to be really useful, for any integer X the number of zeros is max(twos, fives), where X is divisible by 2^twos and 5^fives.

We can count the number of zeros required and reduce the number removing prime factors 2 and 5, counting ones can be done with a linear time algorithm, this is quite slow for bit cases but is enough to get the output required.

Implementation: https://github.com/AlexITC/tuenti-challenge-6/blob/master/09/ImmiscibleNumbers.java


Challenge 12 - Pika Virus
You are given two trees and have to check they have the same topological structure, in order words, check is they are isomorphic, there are not so hard algorithms to check this in polynomial time, the constraints are small so we can try a simpler and slower approach.

If the trees are isomorphic we are required to match each node from a tree with the one lexicographically smallest node from the other tree, and the ancestors doesn't need to match. This crazy condition made me to loss several hours trying to figure out why my output was not the expected one, and as I saw in twitter, my case was not the only one.

Implementation: https://github.com/AlexITC/tuenti-challenge-6/blob/master/12/PikaVirus.java


Challenge 14 - SpeedRunner
You are given a matrix representing a board game, there are some rules defined about the movements. In brief, you are standing in a tile and want to exit, before exit you need to grab all the keys, print the minimum amount of frames to exit breaking times with keys pressed, breaking ties with the lexicographically smallest string of keys pressed.

The maximum number of keys is 16, so we can use dynamic programming with bit masks to find the optimal order to grab the keys (see Traveling Salesman Problem), we need a function to compute the optimal path from a tile S to a tile T, removing the lexicographically constraint, a simple Dijkstra's algorithm will work.

We can compute the optimal route from any tile to the tile T (reversing the graph leads to a single Dijkstra), after this we start at S reconstructing the optimal path to reach T, in each step we choose a tile being in one of the optimal paths and keep the lexicographically smallest.

You need to take care implementing this because the code is long and can be tricky.

Implementation: https://github.com/AlexITC/tuenti-challenge-6/blob/master/14/SpeedRunner.java