Football News Aggregator through Web Scraping

I like to follow goal.com, givemesport.com and sportskeeda.com for football news. All of European football interests me, but Spanish football is where my allegiance rests. Things are fine as they stand, but it would be nice to have a way to gather the links to La Liga news articles from these websites and order them by recency without actually visiting each site. That could save me some effort and, frankly, I have some time to kill these holidays.

But before we go further, I want to make it clear that the intention is not to republish articles from these websites. All I am trying to do is collect the links to articles about Spanish football from these sites for easy navigation. To acknowledge that these articles belong to their respective sites, I have made sure to append the site name to the article title wherever it was missing.
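
For the curious, appending the site name could look roughly like the sketch below. The function name `with_site_name` and the " | " separator are my own assumptions for illustration, not necessarily what the actual script uses.

```python
def with_site_name(title, site_name):
    """Return the title with the site name appended, unless it is already there.

    Hypothetical helper: the name and the " | " separator are assumptions.
    """
    title = title.strip()
    # Case-insensitive check so "goal.com" and "Goal.com" count as the same site.
    if site_name.lower() in title.lower():
        return title
    return f"{title} | {site_name}"
```

A title that already carries the site name is returned unchanged, so the credit is never added twice.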

I have tried to keep the script as simple as possible. So, here is how it goes:

  • Import the necessary libraries
  • Provide the URL of each site’s home page for Spanish football news
  • Hit the URL and fetch the response
  • Extract the HTML and identify the tags that hold the links to La Liga news articles
  • Collect all the links through BeautifulSoup
  • Crawl each of the collected links
  • Extract the title and published date through BeautifulSoup after inspecting the HTML tags

As long as the designs of these websites don’t change, our news aggregator will work just fine.
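
The steps above can be sketched roughly as follows. This is a minimal outline and not the exact script: the function names are hypothetical, the CSS selectors are passed in as parameters because the real tags differ per site and have to be found by inspecting each page, and I am assuming the standard library's `urllib` for fetching.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Hit the URL and return the response body as text."""
    # Some sites reject requests without a browser-like User-Agent (assumption).
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_links(listing_html, base_url, link_selector):
    """Collect absolute article URLs from a listing page, without duplicates."""
    soup = BeautifulSoup(listing_html, "html.parser")
    links = []
    for anchor in soup.select(link_selector):
        href = anchor.get("href")
        if not href:
            continue
        url = urljoin(base_url, href)  # turn relative links into absolute ones
        if url not in links:
            links.append(url)
    return links


def extract_article(article_html, date_selector):
    """Return (title, published) for a single article page."""
    soup = BeautifulSoup(article_html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    published = ""
    date_tag = soup.select_one(date_selector)
    if date_tag is not None:
        # The date may sit in an attribute or in the tag text, depending on the site.
        published = (
            date_tag.get("datetime")
            or date_tag.get("content")
            or date_tag.get_text(strip=True)
        )
    return title, published


def scrape_site(listing_url, link_selector, date_selector):
    """Crawl one site: listing page first, then every article it links to."""
    articles = []
    for url in collect_links(fetch_html(listing_url), listing_url, link_selector):
        title, published = extract_article(fetch_html(url), date_selector)
        articles.append({"title": title, "published": published, "url": url})
    return articles
```

Calling `scrape_site` once per website, each time with that site's own selectors, gives one list of articles per site. Since the selectors are tied to the page design, they are the part that breaks when a site is redesigned.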

[Image: scraping-code-snippet]

The image above shows only a snippet of the code. The procedure for crawling goal.com and sportskeeda.com is more or less the same, just with different tags.

Once I had appended the details of all the articles to a dataframe and sorted it by recency, I got the following output:

[Image: scraping-output]
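
The sorting step can be sketched like this, assuming the dataframe is a pandas `DataFrame` and that each article is a dict with `title`, `published` and `url` keys. Those column names and the function name `build_feed` are my own choices for illustration.

```python
import pandas as pd


def build_feed(articles):
    """Put the articles from all sites into one dataframe, newest first."""
    frame = pd.DataFrame(articles, columns=["title", "published", "url"])
    # Parse the date strings; anything unparseable becomes NaT instead of raising.
    frame["published"] = pd.to_datetime(frame["published"], errors="coerce")
    # mergesort is stable, so articles with the same date keep their original order.
    frame = frame.sort_values(
        "published", ascending=False, kind="mergesort", na_position="last"
    )
    return frame.reset_index(drop=True)
```

Articles whose date could not be parsed end up at the bottom rather than being dropped, which makes it easier to spot a site whose date tag has changed.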

I have also written the output to a text file. It serves as a mini magazine with links to the latest updates on Spanish football from my three favorite websites. Sample results are shown at the end of this post.
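
Writing the text file could look something like the sketch below. It mirrors the layout of the sample results (a numbered header, then title, date and link), but the function names and the `(title, published, url)` tuple shape are assumptions made for this example.

```python
def format_feed(articles):
    """Render (title, published, url) tuples in the mini-magazine layout."""
    blocks = []
    for number, (title, published, url) in enumerate(articles, start=1):
        blocks.append(f"##Article {number}##\n\n{title}\n\n{published}\n\n{url}\n")
    # A blank line separates one article block from the next.
    return "\n".join(blocks)


def write_feed(articles, path):
    """Write the rendered feed to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(format_feed(articles))
```

Keeping the formatting separate from the file writing means the same text can also be printed to the console for a quick check.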

This was a short post on web scraping. Inside18yard will be back with more new and interesting stuff!

———————————————————————————————————————–

##Article 1##

The latest Antoine Griezmann to Manchester United update  | GiveMeSport

2017-01-16

http://www.givemesport.com//964054-the-latest-antoine-griezmann-to-manchester-united-update

 

##Article 2##

Heaviest defeats faced by 5 legendary managers in club football | Sportskeeda.com

2017-01-16

http://www.sportskeeda.com/football/heaviest-defeats-faced-5-legendary-managers-club-football/

 

##Article 3##

RUMOURS: Man City ready to pay £100m for Messi – Goal.com

2017-01-16

http://www.goal.com/en-us/news/88/spain/2017/01/16/31618542/rumours-man-city-ready-to-pay-100m-for-messi

 

 

 
