As a webmaster or SEO specialist, performing backlink audits is an essential part of maintaining a healthy website. To keep your site's ranking high and avoid potential penalties, you need to identify and disavow toxic backlinks. Manually exporting and correlating all of that backlink data from Google Search Console, however, can be a daunting task.
For larger websites, the sheer volume of data makes clicking and exporting from GSC time-consuming and impractical. Fortunately, there is a solution: web scraping. With a web scraper you can efficiently extract and analyze backlink data from GSC, letting you quickly identify and disavow harmful links.
One effective way to scrape GSC backlinks is to use Python, a popular programming language for web development and data analysis. With Python, you can use the BeautifulSoup library to parse HTML and extract relevant information from web pages.
Google Search Console – Links Section
To begin, install the third-party packages the script depends on: beautifulsoup4 (imported as bs4), requests, and pandas. The re and csv modules ship with Python's standard library, so they don't need to be installed separately. Together these provide the web-scraping and data-manipulation functionality we need.
pip install beautifulsoup4 requests pandas
1. Emulate a user session
To start getting information about backlinks from Google Search Console, you need to simulate a normal user session. This can be done by opening the Links section in GSC through a web browser and selecting the Top linking sites section. When you get to this section, you'll need to right-click and choose "Inspect" to look at the page's source code.
Next, navigate to the Network tab within the browser's dev tools and select the first URL that appears with the document type. This should be a request to a URL of the following format: https://search.google.com/search-console/links?resource_id=sc-domain%3A{YourDomainName}
Click on the URL and look in the Headers tab for the Request Headers section, as shown in the image below:
To successfully emulate a normal user session, we'll need to include this request-header information in our Python requests.
Note that the request header also contains cookie information. We'll store that in a dictionary named cookies and the remaining header fields in a dictionary named headers, both of which Python's requests library accepts directly.
Essentially, we're taking the information from the header and creating two dictionaries, as shown in the code below. Remember to replace [your-info] with your actual data.
The request header information displayed may vary depending on your individual case. Don't be concerned if there are differences, so long as you can generate the two necessary dictionaries.
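As a rough sketch, the two dictionaries might look like the following. The specific cookie and header names are placeholders taken from a typical Google session and will differ in your case; copy the actual names and values from the Request Headers section of your own DevTools session, replacing every [your-info] entry.

```python
# Built from the DevTools "Request Headers" section of the GSC links page.
# Every [your-info] value is a placeholder: paste the real values from your
# own authenticated browser session. Cookie/header names vary per session.
cookies = {
    "SID": "[your-info]",
    "HSID": "[your-info]",
    "SSID": "[your-info]",
}

headers = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "accept-language": "en-US,en;q=0.9",
    "referer": "https://search.google.com/search-console/links",
    "user-agent": "[your-info]",
}
```

Both dictionaries are later passed to requests.get via its cookies= and headers= parameters.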
After completing this step, execute the cell containing the headers and cookies dictionaries; it's time to move on to the first part of the actual script: collecting the list of referring domains that link back to your website.
2. Collect the referring domains
Remember to replace [your-domain] with your actual domain.
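A minimal sketch of this step, assuming the headers and cookies dictionaries from the previous section are already defined: it fetches the links page and collects every div carrying the "OOHai" class (the class GSC currently uses for the rows of the Top linking sites table) into a DataFrame. The column name "domain" and the helper function name are illustrative choices, not part of GSC.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

def extract_domains(html: str) -> pd.DataFrame:
    """Scan the HTML for all div elements with the "OOHai" class and
    return a DataFrame of the external domain names found in them."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("div", {"class": "OOHai"})
    # Start from an empty DataFrame; populate it only if rows were found.
    dfdomains = pd.DataFrame(columns=["domain"])
    if rows:
        dfdomains = pd.DataFrame(
            {"domain": [row.get_text(strip=True) for row in rows]}
        )
    return dfdomains

if __name__ == "__main__":
    # Replace [your-domain] with your actual domain, and reuse the
    # `headers` and `cookies` dictionaries built in the previous step.
    url = (
        "https://search.google.com/search-console/links"
        "?resource_id=sc-domain%3A[your-domain]"
    )
    response = requests.get(url, headers=headers, cookies=cookies)
    dfdomains = extract_domains(response.text)
    print(dfdomains)
```

Keeping the parsing in its own function makes it easy to test against saved HTML without hitting Google on every run.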
The code provided above initializes an empty Pandas DataFrame, which will later be populated with external domains. The script works by scanning the entire HTML code and identifying all div elements with the "OOHai" class. If any such elements are found, the dfdomains DataFrame will be populated with the names of the external domains.
3. Extract backlink information for each domain
Moving forward, we'll extract the backlink information for each domain: the top pages it links to on your site, and the top linking pages (the third level of data in GSC, from which we take only the first value).
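The loop below is a sketch of this step under stated assumptions: the exact drill-down URL is an assumption here (capture the real one from the Network tab while clicking into a domain in the Top linking sites report, the same way as before), and it assumes GSC reuses the "OOHai" class for the drill-down table rows. The function names are illustrative.

```python
import requests
from bs4 import BeautifulSoup

# ASSUMPTION: placeholder URL template; copy the real drill-down request
# from the Network tab of your own session and substitute it here.
DRILLDOWN_URL = (
    "https://search.google.com/search-console/links/drilldown"
    "?resource_id=sc-domain%3A[your-domain]&type=EXTERNAL&domain={domain}"
)

def first_oohai_value(html: str):
    """Return the text of the first "OOHai" div, or None if absent.
    Per the text, we keep only the first value of the third-level data."""
    soup = BeautifulSoup(html, "html.parser")
    row = soup.find("div", {"class": "OOHai"})
    return row.get_text(strip=True) if row else None

def collect_backlinks(domains, headers, cookies):
    """For each referring domain, fetch its drill-down page and keep the
    top linking page (first value only). Reuses the session dictionaries."""
    results = {}
    for domain in domains:
        resp = requests.get(
            DRILLDOWN_URL.format(domain=domain),
            headers=headers,
            cookies=cookies,
        )
        results[domain] = first_oohai_value(resp.text)
    return results
```

From here, the results dictionary can be loaded into a DataFrame and written out with pandas or the csv module for the actual audit.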
Web scraping is a powerful technique that can be utilized to extract valuable data from Google Search Console. By leveraging Python and libraries such as BeautifulSoup and Pandas, you can efficiently collect backlink information for your website. While the process may seem complex at first, by following the steps outlined above you can streamline the backlink auditing process and ensure the health of your website's SEO. By utilizing this technique, you can easily identify toxic backlinks and take the necessary steps to disavow them, thereby improving your website's overall ranking and performance.