Football Players’ Popularity Analysis through Twitter Streaming API

Inside18yard brings its readers a real-time analysis of football players’ popularity, built on the Twitter Streaming API using PySpark, Python, NLTK and Plotly.

Rivalries in sports inspire intense emotions. Whether it’s Federer vs Nadal, Sachin vs Lara or Messi vs Ronaldo, everyone has an opinion. Among these rivalries, Messi vs Ronaldo is a special one because both players are still at the peak of their abilities and have already shattered a long list of records. They have completely dominated football’s annual Ballon d’Or best player of the year award for the last 7 years. Football is the most popular sport in the world, played in over 190 countries, each with its own national team and a few with multiple divisions of domestic leagues for men as well as women. Considering this, it is an amazing feat that these two players have remained at the top for so long and still show no signs of decline.

TWEET COUNT

Lionel Messi & Cristiano Ronaldo, already legends, inspire fans around the world, so it is interesting to see who has the stronger fan base. The following is a dynamic bar plot of the number of tweets that mention either Messi or Ronaldo. I have ignored tweets that mention both because they do not count towards the difference in popularity.

Messi emerges as the winner, as can be observed from the bar chart of tweet counts from 5 minutes of the Twitter stream. I regenerated this plot several times on a day when neither of them was playing, and the results were more or less the same: Messi came out ahead of Ronaldo, though not by a huge margin.
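The counting rule used here (a tweet counts for a player only when it mentions that player and not the other) can be sketched in a few lines of plain Python. This is an illustration rather than the code behind the plot: the keyword lists in `PLAYERS` and the function names are assumptions, since the post does not show how tweets were matched to players.

```python
from collections import Counter

# Hypothetical keyword lists; the matching rules behind the plot are not shown in the post.
PLAYERS = {
    "Messi": ("messi",),
    "Ronaldo": ("ronaldo", "cr7"),
}


def mentioned_players(text):
    """Return the set of players whose keywords appear in the tweet text."""
    lowered = text.lower()
    return {
        name
        for name, keywords in PLAYERS.items()
        if any(keyword in lowered for keyword in keywords)
    }


def count_exclusive_mentions(tweets):
    """Count tweets per player, skipping tweets that mention both or neither."""
    counts = Counter({name: 0 for name in PLAYERS})
    for text in tweets:
        found = mentioned_players(text)
        if len(found) == 1:  # exactly one player mentioned
            counts[found.pop()] += 1
    return counts
```

Calling `count_exclusive_mentions` on each 5-minute batch of tweet texts would give the two bar heights for that batch.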

SENTIMENT ANALYSIS

But the above graph does not show the complete picture, because the tweets could be critical of the players rather than positive and thus, as a matter of fact, negatively affect their popularity. To address this, I extracted the sentiment from the tweets and plotted a dynamic time series of the cumulative sentiment for the two players. The graph covers 5 minutes of the Twitter stream, and again I have ignored tweets that mention both players.
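To make the idea of a cumulative sentiment series concrete, here is a minimal sketch in plain Python. The tiny word lists and the `score_tweet` function are stand-ins invented for illustration; the post says NLTK was used but not which analyzer, so a real run would replace `score_tweet` with an NLTK-based scorer (for example the compound score from NLTK's VADER `SentimentIntensityAnalyzer`, which is one option and not necessarily the one used here).

```python
import re
from itertools import accumulate

# Toy lexicon, an assumption for illustration only (not the NLTK model used in the post).
POSITIVE = {"great", "love", "brilliant", "best"}
NEGATIVE = {"bad", "hate", "overrated", "worst"}


def score_tweet(text):
    """Return (# positive words) - (# negative words) for one tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((word in POSITIVE) - (word in NEGATIVE) for word in words)


def cumulative_sentiment(tweets, player, rival):
    """Running sentiment total for tweets that mention `player` but not `rival`.

    `tweets` is a list of tweet texts in arrival order.
    """
    scores = [
        score_tweet(text)
        for text in tweets
        if player in text.lower() and rival not in text.lower()
    ]
    return list(accumulate(scores))
```

Each value in the returned list is one point on that player's time series, so the last value is the cumulative sentiment at the end of the window.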

Messi’s cumulative sentiment comes out higher than Ronaldo’s at the end of the 5 minutes. For Real Madrid fans like me, it will be hard to digest, but the data speaks the truth. Messi is indeed more popular than Ronaldo.

Nevertheless, Ronaldo, if it helps, I love you both equally!! 😀

OVERALL BUZZ ON TWITTER

Finally, I collected the Twitter stream for 2 hours to get an idea of whether these two are really the most talked-about footballers. Since I am using a PySpark Accumulator for each player, I had to explicitly list the player names I wanted to track. This gives a general direction in which we can hope to find our popularity winners.
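The reason the names have to be listed up front is that each accumulator is created on the driver before any tweet is processed. A minimal sketch of that pattern follows. The roster in `TRACKED_PLAYERS` is hypothetical (the post names only Messi, Neymar and Ronaldo), the function names are illustrative, and counting a tweet once for every tracked player it mentions is an assumption about this part of the analysis. The function expects an existing `SparkContext` and an RDD of tweet texts.

```python
# Hypothetical roster; the full list of tracked players is not given in the post.
TRACKED_PLAYERS = ("messi", "neymar", "ronaldo")


def players_in(text):
    """Return the tracked players whose name appears in the tweet text."""
    lowered = text.lower()
    return [player for player in TRACKED_PLAYERS if player in lowered]


def count_mentions(sc, tweets_rdd):
    """Count mentions per player using one accumulator per player.

    `sc` is an existing pyspark.SparkContext and `tweets_rdd` an RDD of strings.
    """
    # Accumulators are created on the driver, so the roster must be known in advance.
    accumulators = {player: sc.accumulator(0) for player in TRACKED_PLAYERS}

    def record(text):
        # Runs on the executors; each match adds 1 to that player's accumulator.
        for player in players_in(text):
            accumulators[player].add(1)

    tweets_rdd.foreach(record)
    # Accumulator values are read back on the driver.
    return {player: acc.value for player, acc in accumulators.items()}
```

With Spark Streaming, the same `record` function could be applied to every micro-batch through `DStream.foreachRDD`, so the accumulators keep growing over the 2-hour window.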

As it turns out, Messi still has the most tweets, while second place goes to Neymar, ahead of Ronaldo. It can always be argued, though, that Portugal has a much smaller population than Argentina and Brazil, which certainly has an effect.

[Figure: player-popularity]

Hope you found this article interesting. Inside18yard will be back with more posts. Keep following us and please Like / Share if it was fun!

3 comments

  1. Abhinav Pathak · January 6, 2017

    I am a Messi fan and this analysis is really cool. What do you think should be the strategy to find the overall popularity (since Twitter data only reflects popularity on Twitter)? What data from other platforms (like fb) can be analyzed?

  2. hi5sahil · January 10, 2017

    Thanks Abhinav! 🙂 The players’ popularity on Twitter might differ from their overall popularity if there is a bias introduced by differences between Twitter users and the general population. I will also say that for Big Data use cases, we are not really looking for exact results; an overall direction at this scale will suffice. But you can always explore the Facebook REST/Live API, scrape data from Quora or even use YouTube comments data.

  3. upadhyayulahk · January 10, 2017

    Reblogged this on knowledgemeanspower.
