Scrape Google News with Python

Google News is a news listing service offered by Google. News sites that register with Google News are accepted into the system after being reviewed by Google. The News section, which is easy to reach on today's mobile devices, lists prominent stories from registered sites according to the reader's interests.

Google News was launched in September 2002. It is continuously updated and draws on thousands of different news sources.

Google News groups stories into categories so that users can easily find the information that matches their interests. Google's crawler indexes this content by visiting registered news sites at regular intervals.

Google News is currently one of the most popular services offering news data in a single place. For a single story it can return hundreds of related results in a fraction of a second. The volume of news and sources is large enough to supply the data sets needed in various projects. To use that data directly for these and similar needs, let's scrape the Google News results for a target keyword with Python.

Scraping Google News with Python

First, create a Python file, then install the required libraries by running the following commands from the command line.

pip install requests 
pip install lxml 
pip install beautifulsoup4

Then paste the code below into the Python file we created.

# lxml only needs to be installed; BeautifulSoup loads it by name as the parser
import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
    "q": "api design guideline",
    "hl": "en",
    "tbm": "nws",
}

# Request the News tab of Google Search and parse the returned HTML
response = requests.get("https://www.google.com/search", headers=headers, params=params)
soup = BeautifulSoup(response.text, 'lxml')

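# Each '.dbsr' element is one news result; these class names come from
# Google's markup at the time of writing and may change
for news in soup.select('.dbsr'):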
    news_title = news.select_one('.nDgy9d').text
    news_link = news.a['href']
    news_source = news.select_one('.WF4CUc').text
    news_snippet = news.select_one('.Y3v8qd').text
    news_date_published = news.select_one('.WG9SHc span').text
    print(f'{news_title}\n{news_link}\n{news_snippet}\n{news_date_published}\n{news_source}\n')

Running this code prints output like the following to the console.

APIs need style, because they're worth it
https://www.idgconnect.com/article/3657769/apis-need-style-because-they-re-worth-it.html
The world of cloud-native systems depends upon many protocols, standards 
and structures... many of which are interconnected via Application...
2 weeks ago
IDG Connect
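
To see which URL the script actually requests, the query parameters can be encoded by hand. The sketch below uses only the standard library and mirrors the params dictionary from the script. The helper name build_news_url is hypothetical, and it assumes Google still accepts tbm=nws to select the News tab.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.google.com/search"

def build_news_url(query, language="en"):
    """Build a Google Search URL equivalent to the params used in the script."""
    params = {
        "q": query,       # search keyword(s)
        "hl": language,   # interface language
        "tbm": "nws",     # restrict results to the News tab
    }
    # urlencode escapes spaces as "+" and joins the pairs with "&"
    return f"{SEARCH_URL}?{urlencode(params)}"

url = build_news_url("api design guideline")
print(url)
```

Pasting the printed URL into a browser is a quick way to compare the page you see with the HTML the script receives.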

Conclusion

The number of machine learning algorithms trained on news data is increasing. Many artificial intelligence companies today build sentiment analysis and category prediction applications on news. With web scraping, you can easily collect the news data that such projects need.
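
As a small follow-on for the data set use case, the scraped fields can be written to CSV instead of printed. This is a minimal sketch using only the standard library. The function name write_news_csv and the column names are assumptions chosen for illustration, and the sample record reuses the output shown earlier with the snippet left out.

```python
import csv
import io

# Hypothetical column names, matching the fields printed by the scraper
FIELDS = ["title", "link", "snippet", "date_published", "source"]

def write_news_csv(records, fileobj):
    """Write an iterable of dicts to fileobj as CSV with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        # Missing fields become empty cells instead of raising an error
        writer.writerow({name: record.get(name, "") for name in FIELDS})

sample = [{
    "title": "APIs need style, because they're worth it",
    "link": "https://www.idgconnect.com/article/3657769/apis-need-style-because-they-re-worth-it.html",
    "date_published": "2 weeks ago",
    "source": "IDG Connect",
}]

# An in-memory buffer keeps the example self-contained; for a real file,
# use open("news.csv", "w", newline="", encoding="utf-8") instead
buffer = io.StringIO()
write_news_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In the scraping loop, each result would be appended to a list as a dictionary with these keys and passed to the function once the loop finishes.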
