Google News is a news listing service offered by Google. News sites that register with Google News are accepted into the system after Google reviews them. The News section, which is easy to reach on today's smart mobile devices, lists prominent stories from registered sites according to the user's interests.
Google News launched in September 2002. It is updated continuously and draws on thousands of different news sources.
Google News classifies stories by category so that users can easily find the information that matches their interests. Google's crawler visits registered news sites at regular intervals and indexes their content.
Google News is currently one of the most popular services for gathering news data in a single place. It returns hundreds of articles about a single story to its visitors in milliseconds. This volume of news and sources is enough to supply the data sets needed in many projects. To use the data directly for these and similar needs, let's scrape Google News results for a target keyword with the Python programming language.
Scraping Google News with the Python programming language
First, let's create a Python file and install the necessary libraries by running the following commands from the command line.
pip install requests
pip install lxml
pip install beautifulsoup4
Then paste the code below into the Python file we created.
import requests
from bs4 import BeautifulSoup  # parses with the lxml package installed above

# Browser-like User-Agent so the request resembles one sent by a regular browser
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
params = {
    "q": "api design guideline",  # search query
    "hl": "en",                   # interface language
    "tbm": "nws",                 # restrict the search to news results
}

response = requests.get("https://www.google.com/search", headers=headers, params=params)
soup = BeautifulSoup(response.text, 'lxml')

# The class names below depend on Google's current markup and may change over time
for news in soup.select('.dbsr'):
    news_title = news.select_one('.nDgy9d').text
    news_link = news.a['href']
    news_source = news.select_one('.WF4CUc').text
    news_snippet = news.select_one('.Y3v8qd').text
    news_date_published = news.select_one('.WG9SHc span').text
    print(f'{news_title}\n{news_link}\n{news_snippet}\n{news_date_published}\n{news_source}\n')
The following output is printed to the application console when this code is run.
APIs need style, because they're worth it
https://www.idgconnect.com/article/3657769/apis-need-style-because-they-re-worth-it.html
The world of cloud-native systems depends upon many protocols, standards and structures... many of which are interconnected via Application...
2 weeks ago
IDG Connect
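The script above assumes that every result card contains all of the expected elements. BeautifulSoup's `select_one` returns `None` when a selector matches nothing, so a single missing element would stop the loop with an `AttributeError`. A minimal sketch of a more defensive variant is shown below. The names `SELECTORS`, `text_or_none`, and `parse_card` are hypothetical helpers introduced here for illustration, and the class names are simply the ones used in the script, which Google may change at any time.

```python
from typing import Optional

# Class names copied from the script above. They are assumptions about
# Google's markup and will likely need updating when the page changes.
SELECTORS = {
    "title": ".nDgy9d",
    "source": ".WF4CUc",
    "snippet": ".Y3v8qd",
    "date_published": ".WG9SHc span",
}


def text_or_none(card, selector: str) -> Optional[str]:
    """Return the stripped text of the first match, or None if nothing matches."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


def parse_card(card) -> dict:
    """Hypothetical helper: turn one result card into a dict.

    Missing elements become None instead of raising AttributeError.
    """
    record = {field: text_or_none(card, css) for field, css in SELECTORS.items()}
    anchor = card.select_one("a")
    record["link"] = anchor.get("href") if anchor is not None else None
    return record
```

With these helpers, the loop in the script could be replaced by something like `records = [parse_card(card) for card in soup.select('.dbsr')]`, and cards with missing fields can then be filtered or logged rather than crashing the run.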
Conclusion
The number of machine learning algorithms trained on news data is increasing. Many artificial intelligence companies now build sentiment analysis and category prediction applications on top of news. With web scraping, you can easily collect the news data that these kinds of projects require.
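To reuse scraped news as a data set, the results need to be stored somewhere. The sketch below is one possible way to do that with Python's standard `csv` module. The function name `save_news_csv`, the `FIELDS` column list, and the dictionary layout are assumptions made for illustration; the sample row is the result printed by the script earlier in this article.

```python
import csv
from pathlib import Path

# Column names are an assumption; they mirror the values printed by the script.
FIELDS = ["title", "link", "snippet", "date_published", "source"]

# Sample row taken from the console output shown earlier.
SAMPLE_ROW = {
    "title": "APIs need style, because they're worth it",
    "link": "https://www.idgconnect.com/article/3657769/apis-need-style-because-they-re-worth-it.html",
    "snippet": "The world of cloud-native systems depends upon many protocols, "
               "standards and structures... many of which are interconnected via Application...",
    "date_published": "2 weeks ago",
    "source": "IDG Connect",
}


def save_news_csv(rows, path) -> int:
    """Write news dicts to a CSV file and return the number of rows written.

    Keys missing from a row are written as empty cells; extra keys are ignored.
    """
    count = 0
    with Path(path).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for row in rows:
            writer.writerow({field: row.get(field, "") for field in FIELDS})
            count += 1
    return count
```

For example, `save_news_csv([SAMPLE_ROW], "news.csv")` would create a file with a header line and one data row, which can later be loaded into a sentiment analysis or category prediction pipeline.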