Football News Aggregator through Web Scraping

I like to follow goal.com, givemesport.com and sportskeeda.com for football news. All of European football interests me, but Spanish football is where my allegiance rests. Things are great as they stand, but I wished there were a way to gather the links to news articles about La Liga from these websites and order them by recency without actually visiting the sites. That could save me some effort and, frankly, I have some time to kill these holidays.

But before we go further, I want to make it clear that the intention is not to republish articles from these websites. All I am trying to do is collect the links to articles about Spanish football from these sites for easy navigation. To acknowledge that these articles originally belong to the respective sites, I have made sure to append the site name to the article title if it was missing.

I have tried to keep the script as simple as possible. So, here is how it goes:

  • Import the necessary libraries
  • Provide the URL of the home page for Spanish soccer news
  • Hit the URL and fetch the response
  • Extract the HTML and identify the tags that hold the links to La Liga news articles
  • Collect all the links through BeautifulSoup
  • Crawl each of the collected links
  • Extract title and published date through BeautifulSoup after inspecting HTML tags
  • As long as the designs for these websites don’t change, our news aggregator will work just fine

[Image: scraping code snippet]

The image above shows only a snippet of the code. The procedure to crawl goal.com and sportskeeda.com is more or less the same, just with different tags.
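The link-collection steps listed earlier can be sketched roughly as follows. This is a minimal sketch and not the script from the image: Python's standard-library HTMLParser is swapped in for BeautifulSoup so the example runs without extra installs, and the sample HTML, the "/news/" filter and the names LinkCollector and collect_article_links are all hypothetical. On a real site, the tags to look for come from inspecting the page's HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag found in the HTML fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def collect_article_links(html, base_url, must_contain):
    """Return absolute, de-duplicated links whose href contains `must_contain`."""
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        if must_contain in href:
            absolute = urljoin(base_url, href)
            if absolute not in links:
                links.append(absolute)
    return links


# Hypothetical listing page. A real run would fetch the page over HTTP first,
# for example with urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<ul>
  <li><a href="/news/la-liga-story-one">Story one</a></li>
  <li><a href="/news/la-liga-story-two">Story two</a></li>
  <li><a href="/about">About us</a></li>
  <li><a href="/news/la-liga-story-one">Story one (repeated)</a></li>
</ul>
"""

links = collect_article_links(SAMPLE_HTML, "http://www.example.com/", "/news/")
print(links)
```

Each collected link would then be fetched in turn and parsed the same way to pull out the title and published date.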

Once I had appended the details of all the articles to a dataframe and sorted it by recency, I got the following output:

[Image: scraping output]
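A minimal sketch of the sorting step, assuming each article has been reduced to a title, a published date in YYYY-MM-DD form and a URL. The post uses a dataframe; plain dictionaries and the built-in sorted are swapped in here to keep the example dependency-free, and the records are made up for illustration.

```python
from datetime import date

# Hypothetical records standing in for the scraped article details.
articles = [
    {"title": "Story A", "published": "2017-01-14", "url": "http://www.example.com/a"},
    {"title": "Story B", "published": "2017-01-16", "url": "http://www.example.com/b"},
    {"title": "Story C", "published": "2017-01-15", "url": "http://www.example.com/c"},
]


def sort_by_recency(records):
    """Return the records ordered from newest to oldest published date."""
    return sorted(
        records,
        key=lambda record: date.fromisoformat(record["published"]),
        reverse=True,
    )


for record in sort_by_recency(articles):
    print(record["published"], record["title"])
```

Parsing the date before sorting avoids relying on string order, which only happens to work when every site reports dates in the same zero-padded format.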

I have also written the output to a text file. It serves as a mini magazine with links to the latest updates on Spanish football from my three favorite websites. Sample results are shown at the end of this post.
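Writing the text file could look something like the sketch below. The layout mirrors the sample results at the end of the post (a numbered header, then title, date and link), but the helper names, the " | " separator used when the site name is missing and the sample record are assumptions rather than the original script.

```python
def with_site_name(title, site_name):
    """Append ' | site_name' unless the title already mentions the site."""
    if site_name.lower() in title.lower():
        return title
    return f"{title} | {site_name}"


def format_magazine(records):
    """Render records as numbered entries: header, title, date, link."""
    blocks = []
    for number, record in enumerate(records, start=1):
        title = with_site_name(record["title"], record["site"])
        blocks.append(
            f"##Article {number}##\n\n{title}\n\n{record['published']}\n\n{record['url']}\n"
        )
    return "\n".join(blocks)


def write_magazine(records, path):
    """Write the formatted entries to a UTF-8 text file at `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(format_magazine(records))


# Hypothetical record, used only to show the layout.
sample = [
    {
        "title": "Story A",
        "site": "Example Site",
        "published": "2017-01-16",
        "url": "http://www.example.com/a",
    }
]
print(format_magazine(sample))
```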

This was a short post on web scraping. Inside18yard will be back with more new and interesting stuff!

———————————————————————————————————————–

##Article 1##

The latest Antoine Griezmann to Manchester United update  | GiveMeSport

2017-01-16

http://www.givemesport.com//964054-the-latest-antoine-griezmann-to-manchester-united-update

 

##Article 2##

Heaviest defeats faced by 5 legendary managers in club football | Sportskeeda.com

2017-01-16

http://www.sportskeeda.com/football/heaviest-defeats-faced-5-legendary-managers-club-football/

 

##Article 3##

RUMOURS: Man City ready to pay £100m for Messi – Goal.com

2017-01-16

http://www.goal.com/en-us/news/88/spain/2017/01/16/31618542/rumours-man-city-ready-to-pay-100m-for-messi

 

 

 

Football Players’ Popularity Analysis through Twitter Streaming API

Inside18yard brings its readers a real-time popularity analysis of football players through the Twitter Streaming API, using PySpark, Python, NLTK and Plotly.

Rivalries in sports inspire intense emotions. Whether it’s Federer vs Nadal, Sachin vs Lara or Messi vs Ronaldo, everyone has an opinion. Among these rivalries, Messi vs Ronaldo is a special one because both are still at the peak of their abilities and have already shattered all records. They have completely dominated football’s annual Ballon d’Or best player of the year award for the last 7 years. Football is the most popular sport in the world, played in over 190 countries, each with its own national team and a few with multiple divisions of domestic leagues for men as well as women. Considering this, it is undeniably an amazing feat that these two players have remained at the top for so long and still show no signs of decline.

TWEET COUNT

Lionel Messi & Cristiano Ronaldo, already legends, inspire fans around the world, so it will be interesting to see who has the stronger fan base. The following is a dynamic bar plot of the number of tweets which mention either Messi or Ronaldo. I have ignored the tweets which mention both because those do not count towards the difference in popularity.

Messi emerges as the clear winner, as can be observed from the bar chart of tweet counts from 5 minutes of the Twitter stream. I regenerated this plot several times on a day when neither of them was playing, and the results were more or less the same: Messi came out ahead of Ronaldo, though not by a huge margin.
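The counting rule described above (count a tweet only if it mentions exactly one of the two players) can be sketched in plain Python. This is an illustration under assumptions, not the PySpark streaming job: the tweets are made up, and matching on the lower-cased surnames is a simplification.

```python
import re
from collections import Counter

PLAYERS = ("messi", "ronaldo")


def mentioned_players(text):
    """Return the set of tracked surnames that appear as words in the tweet."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {player for player in PLAYERS if player in words}


def count_exclusive_mentions(tweets):
    """Count tweets per player, skipping tweets that mention both or neither."""
    counts = Counter({player: 0 for player in PLAYERS})
    for tweet in tweets:
        mentioned = mentioned_players(tweet)
        if len(mentioned) == 1:
            counts[next(iter(mentioned))] += 1
    return counts


# Hypothetical tweets standing in for a slice of the stream.
tweets = [
    "Messi is unreal tonight",
    "Ronaldo with another header",
    "Messi or Ronaldo, who is better?",
    "What a pass by #Messi",
    "Great game of tennis today",
]
counts = count_exclusive_mentions(tweets)
print(dict(counts))
```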

SENTIMENT ANALYSIS

But the above graph does not depict the complete picture, as the tweets could be critical of the players rather than positive and thus, as a matter of fact, count against their popularity. To solve this problem, I have extracted the sentiment from the tweets and plotted a dynamic time series of the cumulative sentiment for the two players. The graph is for 5 minutes of the Twitter stream, and again I have ignored the tweets which mention both players.

Messi’s cumulative sentiment comes out higher than Ronaldo’s at the end of 5 minutes. For Real Madrid fans like me it will be hard to digest, but the data speaks the truth. Messi is indeed more popular than Ronaldo.

Nevertheless, Ronaldo, if it helps, I love you both equally!! 😀
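To make "cumulative sentiment" concrete, here is a minimal sketch of a running total of per-tweet scores. The scoring is a deliberately tiny word-list stand-in and not the NLTK-based sentiment extraction used in the analysis; the word lists and tweets are hypothetical.

```python
import re
from itertools import accumulate

# Hypothetical mini-lexicon, for illustration only.
POSITIVE = {"great", "brilliant", "love", "best"}
NEGATIVE = {"poor", "awful", "hate", "worst"}


def tweet_score(text):
    """Score a tweet as (positive word count) minus (negative word count)."""
    words = re.findall(r"[a-z]+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def cumulative_sentiment(tweets):
    """Return the running total of tweet scores, in arrival order."""
    return list(accumulate(tweet_score(tweet) for tweet in tweets))


messi_tweets = [
    "Messi is the best",
    "Poor game from Messi",
    "I love watching Messi, brilliant",
]
series = cumulative_sentiment(messi_tweets)
print(series)  # [1, 0, 2]
```

Plotting one such series per player against arrival time gives the kind of cumulative curve described here.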

OVERALL BUZZ ON TWITTER

Finally, I collected the Twitter stream for 2 hours to get an idea of whether these two are really the most talked-about footballers. Since I am using PySpark Accumulators for each player, I had to explicitly list the player names I wanted to track. This gives a general direction in which we hope to find our popularity winners.

As it turns out, Messi still has the most tweets, while second place goes to Neymar, ahead of Ronaldo. It can always be argued, though, that Portugal has a much smaller population than Argentina and Brazil, which certainly has an effect.

[Image: player popularity]

Hope you found this article interesting. Inside18yard will be back with more posts. Keep following us and please Like / Share if it was fun!

How do top European football leagues compare against each other?

Inside18yard is back and this fall we are “going data!” in our articles. This blog in particular focuses on a comparison of top European leagues using Descriptive Statistics.

The 2015-16 season was an exciting one for European football, and even more so for the English Premier League. ‘He just did a Leicester’ has become the catchphrase after we saw what is certainly the greatest underdog story of all time in a team sport, and perhaps in all of sporting history. In the previous season Leicester City were at the bottom of the table as late as April 2015, with just 9 matches to go and facing a very real threat of relegation, yet somehow managed to keep their place in the Premier League. It would be unfair to call what happened afterwards anything less than a fairy tale.

The next season Leicester City FC, against the 5000:1 odds given to them by bookmakers, went on to win the title ahead of the likes of Spurs, Arsenal and the mighty Manchester clubs. To wrap your head around the achievement, consider that the last time any club outside the Big 5 won the Premier League was in 1994-95, while there was still some economic parity between the teams.

Even more impressive is their away goals record. Let us look at how the numbers tally against the winners of La Liga, the Bundesliga and Serie A.

[Image: goals]

The bar plots above show that Leicester were pretty consistent. Irrespective of whether they were playing at home at the King Power Stadium or away, they averaged around 1.75 goals per game.
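The home and away averages behind such a bar plot can be computed along these lines. This is a sketch under assumptions: the column names HomeTeam, AwayTeam, FTHG and FTAG follow the column key published with the football-data.co.uk files, while the team names and scores below are invented placeholders and not real results.

```python
import csv
import io

# Made-up rows in the assumed football-data.co.uk column layout.
SAMPLE_CSV = """HomeTeam,AwayTeam,FTHG,FTAG
Alpha FC,Beta FC,4,2
Gamma FC,Alpha FC,1,2
Alpha FC,Delta FC,1,1
Beta FC,Alpha FC,1,1
"""


def goals_per_game(rows, team):
    """Return the team's average goals scored per game, split by venue."""
    home = [int(row["FTHG"]) for row in rows if row["HomeTeam"] == team]
    away = [int(row["FTAG"]) for row in rows if row["AwayTeam"] == team]
    return {
        "home": sum(home) / len(home) if home else 0.0,
        "away": sum(away) / len(away) if away else 0.0,
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
averages = goals_per_game(rows, "Alpha FC")
print(averages)
```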

[Image: fouls]

Also, they have fared well in the number of fouls committed. But this is only half the picture, as the playing style varies between leagues and the referees tend to give more fouls in Spain.

Now we move away from focusing on just the winners and investigate how the four leagues compare to each other overall. More interestingly, we add a temporal dimension to the analysis by looking at the trend of key metrics like Fouls, Yellow/Red Cards, Shots on Target and Goals over the last decade for the top 4 EU leagues.

[Image: fouls trend]

There is a clearly visible difference in the fouls committed between the four leagues. Teams in Serie A probably tend to commit more fouls because of the inherently defensive playing style associated with Italian sides. Another interesting observation is that the number of fouls has been declining steadily over the years. Perhaps teams in the mid-2000s had a more violent playing style, or referees gave out more fouls back then. Who knows?

We cannot comment on the reason from this data. Nevertheless, it is an interesting observation.

The trends of total Yellow Cards and Red Cards given by referees also reveal some insights.

[Image: cards]

We see that despite a greater number of fouls being committed in Serie A than in La Liga, the referees in Italy have given out far fewer Yellow Cards and Red Cards than their counterparts in Spain. A similar pattern can be observed for the EPL and the Bundesliga.

Secondly, the more obvious observation is that there is a clearly visible difference in the number of Red Cards and Yellow Cards given out by referees between these countries. This does tell us something about the playing style in these countries. While more physical tussling is tolerated in England and Germany, Spanish and Italian referees tend to be stricter.
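One way to put fouls and cards on the same footing is to compute cards shown per 100 fouls for each league. The sketch below assumes the football-data.co.uk column key (Div for the league code, HF/AF for fouls, HY/AY for yellow cards, HR/AR for red cards); the rows are invented for illustration and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up match rows in the assumed column layout (SP1 = La Liga, I1 = Serie A).
SAMPLE_CSV = """Div,HF,AF,HY,AY,HR,AR
SP1,14,16,3,4,0,1
SP1,12,13,2,3,0,0
I1,18,17,2,2,0,0
I1,16,19,1,3,1,0
"""


def cards_per_100_fouls(rows):
    """Return, per league code, total cards shown for every 100 fouls given."""
    totals = defaultdict(lambda: {"fouls": 0, "cards": 0})
    for row in rows:
        league = totals[row["Div"]]
        league["fouls"] += int(row["HF"]) + int(row["AF"])
        league["cards"] += sum(int(row[col]) for col in ("HY", "AY", "HR", "AR"))
    return {
        div: round(100 * league["cards"] / league["fouls"], 1)
        for div, league in totals.items()
        if league["fouls"]
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates = cards_per_100_fouls(rows)
print(rates)
```

A league with many fouls but a low rate would match the pattern of fouls being whistled often but carded rarely.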

Another way to assess the playing style in these countries is to look at the trend of total shots on target over the last decade.

[Image: shots trend]

Quite clearly, the La Liga and Serie A teams are much more clinical in front of goal than EPL teams. The English game is faster and more intense, but players tend to misfire their shots.

Or perhaps, the second possibility is that EPL teams have better goalkeepers??

We hope we were able to present some interesting insights from the data on how the top EU football leagues compare against each other. We will be back with more fun reads for the loyal readers of Inside18yard. Stay tuned!!

Data Source – http://www.football-data.co.uk/spainm.php

Crisis at Real: where did they take a wrong turn?

Last weekend the world witnessed the Catalan club Barcelona lift their 4th Champions League trophy in the last decade. They also became the only team to complete a treble of league title, domestic cup & Champions League twice.

After a trophyless 2013-14 season with the club and a loss to Germany in the World Cup final, followed by CR7 stealing away the Ballon d’Or, Lionel Messi finally decided he was done with the mourning period of eating pizzas, and to hell with Neymar’s selfies. He found his elder wand at the advent of the new year and managed to do far more than rescue his club’s season, which had been in a serious crisis in January.

The Champions League final result was a disappointment to the Italians, but even more so to Real Madrid. The fans were losing their minds, the players their sleep and the management their hair as the club was knocked out in the semi-final by an underdog Juventus side, 3-2 on aggregate over two legs. Moreover, two of those three goals for Juventus came from Alvaro Morata, a player who was discarded by Real last season and sent off to Italy. It got worse as they then saw their bitter rivals win their favourite tournament while they had to be content with Ronaldo’s Pichichi.

But Florentino Perez, the club’s president, a rational man guided by pure logic, decided that the solution to the club’s problems was to sack the manager, Carlo Ancelotti, whom as a matter of fact he had tried to sign thrice and who had given them their much-awaited La Decima only the previous season. An opinion poll of the club’s fans showed that the majority of them wanted Ancelotti to stay, and even the players demonstrated their support for him on Twitter. But Perez was able to successfully justify the club leadership’s decision for those of us who were less than convinced when he said in a press conference, “What did Ancelotti do wrong? I don’t know”. The reason is probably the club’s policy of trying to give the club a new impulse, a fresh energy, every few years. A successful philosophy which gave them one La Liga in 7 years.

It is difficult to stop making fun of Florentino Perez. During his two presidencies Perez has made some very controversial decisions. The most controversial, and the one which hurt the club most, was the sacking of Vicente Del Bosque. His 4-year managerial stint at Real is considered the most successful period in the club’s modern history: he guided them to 7 trophies, including two domestic league titles in 2001 & 2003 and two UEFA Champions League titles in 2000 & 2002, and to a place in the last four of the Champions League in each of the four years he was in charge. Del Bosque, a calming presence in the dressing room who knew how to handle a team full of personalities and deliver results, as is also evident from his success with the Spanish national team, would have guided Real to many more titles if not for the whims and fancies of Perez.

Another of his decisions which stirred up controversy was his refusal to improve the contract of Claude Makelele, one of the best defensive midfielders of that era, who was paid far less than his Galacticos teammates. It was not unreasonable that he wanted a hike, but Perez let him leave the club and thus jeopardized the team’s defense. His inability to acknowledge the importance of defenders, together with his Galacticos policy, saw Real turn into an unbalanced side crowded with high-profile attackers who had limited defensive ability. Also, his policy of bringing in marketable players to boost t-shirt sales and discarding the less marketable ones (or, to put it crudely, the not so good-looking ones) because they do not fit the stardom and glamour associated with Real Madrid would inspire ridicule towards him in any true supporter of the great sport.

You can achieve short-lived success by assembling in-form players in their prime from across the world, but to achieve continuous success and be regarded as one of the greats you need to inculcate the right practices in the club itself. Real have, broadly speaking, always been an attacking side, but they need a stronger philosophy. We need someone like what Johan Cruyff was for Barcelona: someone who could turn the club into a school of football with a unique philosophy and playing style. Real, like Barcelona, need a crop of footballers from their own academy who have played alongside each other for years and have thus developed a far better understanding between them. But Real’s neglect of their youth team has caused it to be relegated from the Segunda to the third division. It cannot be denied that there is obvious mismanagement at the club’s top level.

The appointment of Rafa Benitez as the new manager does not excite me as a Real fan. Nevertheless, I am hopeful that Bale will find his form again this season, that Ronaldo will score a lot of goals as he always does and be more accepting of, and a mentor to, Gareth Bale, that Modric will be fit to play when the new season kicks off in August, that Rodriguez will build upon his already impressive first season, that Casillas will hit form and have a memorable season (which could very well be his last), and that the team will win much silverware.