Site prototype (Twitter API + Solr)

Problem / Goal
Our client was building a prototype for her new startup: a website presenting tweets and RSS messages about sport. The architecture was quite complex (Solr, PostgreSQL and several Python services for downloading and processing data), and combined with large amounts of data it quickly led to serious performance problems. The client also wanted each message marked as positive or negative. Another requirement was a semi-automatically generated base of sport-related articles for SEO purposes.
First, our software architect analyzed the application to find weaknesses, areas to simplify and bottlenecks. He then introduced mechanisms for better data management: archiving stale messages and aggregating up-to-date ones.
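
To illustrate the data-management idea, here is a minimal Python sketch of separating stale messages from fresh ones and aggregating the fresh ones by category. It is not the project's actual code: the `Message` class, the `split_stale` and `aggregate_by_category` functions, and the 30-day threshold are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    """Hypothetical message record; the real schema is not described in the source."""
    text: str
    category: str
    created_at: datetime


def split_stale(messages, now, max_age=timedelta(days=30)):
    """Return (fresh, stale); a message is stale when older than max_age.

    The 30-day default is an assumed value, not the project's setting.
    """
    cutoff = now - max_age
    fresh = [m for m in messages if m.created_at >= cutoff]
    stale = [m for m in messages if m.created_at < cutoff]
    return fresh, stale


def aggregate_by_category(messages):
    """Count messages per category, so pages can read totals instead of raw rows."""
    return Counter(m.category for m in messages)
```

In a setup like this, the stale list would be moved to an archive store and only the aggregated counts and fresh messages would stay in the hot path, which keeps the working data set small.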

We used WordPress for the base of articles. It is easy to use for non-technical people, so anyone could edit the content. We created a plugin to meet the customer’s specification: it collects and regularly updates data from other sites (e.g. pictures of sportspeople and metadata of their Twitter accounts).

Solr is well suited to full-text search over large quantities of data. In this case it was used to analyze hundreds of millions of tweets and assign each one a category based on the sports discipline, participating teams and score. Additionally, a custom module was designed to extract emotions from the tweets and determine whether they are positive or negative. In other words, we created a polling mechanism which automatically gathers and analyzes data from Twitter.
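
As a rough illustration of marking a tweet as positive or negative, the sketch below uses a tiny word-list approach in Python. The source does not say how the custom module works, so this technique, the `POSITIVE` and `NEGATIVE` word lists, the `classify_sentiment` function and the extra "neutral" result are all assumptions for this example.

```python
import re

# Hypothetical word lists; a real module would need a far larger lexicon or a trained model.
POSITIVE = frozenset({"win", "great", "amazing", "love", "victory"})
NEGATIVE = frozenset({"lose", "loss", "terrible", "awful", "hate"})


def classify_sentiment(text):
    """Label text by counting positive and negative words.

    Returns "positive", "negative", or "neutral" when the counts are equal.
    """
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A word-list scorer like this is cheap enough to run on every incoming message, but it ignores negation and sarcasm, so it should be read only as a starting point.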

The prototype’s performance improved enough to give a satisfactory user experience. The application also gained information about the emotions of the message authors. Finally, the customer got a handy tool for developing her SEO pages.


Robert Pelczarski
Co-founder and Senior Developer
Tomasz Marcinek
Co-founder and Software Architect
Kamila Marcinek
Database Developer