How to Check a Web Page for SEO Using Python

Website SEO, a.k.a. Search Engine Optimization, is important for every modern website. However, manually validating SEO criteria is quite cumbersome. It is possible to automate most SEO checks using Python.

In this article, I’m going to demonstrate some ideas using basic Python code snippets. These scripts are a good starting point for writing tools that provide insights into website SEO. Many Python libraries available through pip could help here, but we will use requests_html for now.

Installation

To install requests_html, open up your terminal and type:

pip install requests_html

And that’s it!

It will download all the dependencies and install requests_html. If you run into any issues, search for the error online first; if you still can’t fix it, feel free to comment below and I will try my best to help you.

1. Check Whether gzip/brotli Compression Is Enabled

Compression is necessary for good performance. Enabling compression for resources such as JS and CSS files improves website performance.
So, we will begin by checking whether the website compresses its content. The script for this is quite simple; you can even run it in an interactive interpreter.

import requests_html
session = requests_html.HTMLSession()
r = session.get("https://wasi0013.com/")
print(r.headers.get("Content-Encoding"))

In the above script, we read the “Content-Encoding” value from the response headers to find out whether compression is enabled. If gzip compression is enabled, it prints gzip; if brotli compression is enabled, it prints br; if the response is not compressed, it prints None.
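The header check above can be wrapped in a small helper so that the result is easier to reuse in a larger tool. The following is a minimal sketch, not part of requests_html: the function names `detect_compression` and `is_compressed` are hypothetical, and it assumes you pass in a mapping of response headers such as `r.headers`.

```python
def detect_compression(headers):
    """Return the encodings named in a Content-Encoding header.

    `headers` is any mapping of response headers (hypothetical helper).
    An empty list means the server reported no compression.
    """
    # Header names are case-insensitive, so normalise the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    value = lowered.get("content-encoding", "")
    # Several encodings may be listed, separated by commas.
    return [part.strip().lower() for part in value.split(",") if part.strip()]


def is_compressed(headers):
    """True when a common compression scheme is reported."""
    known = {"gzip", "br", "deflate", "zstd"}
    return any(encoding in known for encoding in detect_compression(headers))
```

With the earlier snippet, `is_compressed(r.headers)` would tell you whether that response was compressed. Keep in mind that a server normally picks an encoding from those the client advertises in its Accept-Encoding request header, so the result describes this particular request rather than every possible client.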

2. Find Broken Links

The next script checks for broken links on our website. We need to make sure there are none, because Google and many other search engine bots dislike broken links and usually penalize websites that have them. Many Python tools, such as scrapy or requests, can find broken links. However, since we already have requests_html installed, why not write the script with it? Let’s do it!

	import requests
	import requests_html


	def crawl(base_url):
	    """Crawl the site starting at base_url and return the links that fail to load."""
	    broken_links = []
	    session = requests_html.HTMLSession()
	    try:
	        r = session.get(base_url)
	    except requests.RequestException:
	        return [base_url]
	    if r.status_code != 200:
	        return [base_url]

	    # Strip the scheme so that http:// and https:// links both count as internal.
	    domain = base_url.replace("http://", "").replace("https://", "")
	    visit = set(r.html.absolute_links)
	    visited = {base_url}
	    while visit:
	        url = visit.pop()
	        # Mark the URL as visited up front so it is never queued twice.
	        visited.add(url)
	        try:
	            r = session.get(url)
	        except requests.RequestException:
	            broken_links.append(url)
	            continue
	        if r.status_code != 200:
	            broken_links.append(url)
	        # Only follow links found on pages that belong to the site itself.
	        if domain in url:
	            for link in r.html.absolute_links:
	                if link not in visited:
	                    visit.add(link)
	        print(url, r.status_code)
	    return broken_links


	print(crawl("https://wasi0013.com"))
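The crawler above decides whether a link is internal with a simple substring test, which can misfire on URLs that merely mention the domain, such as share links. A stricter comparison of host names is sketched below using only the standard library. The helper names `normalize` and `is_internal` are hypothetical, and treating `www.` as equivalent to the bare domain is an assumption that may not suit every site.

```python
from urllib.parse import urldefrag, urlparse


def normalize(url):
    """Drop the #fragment and trailing slash so equivalent URLs compare equal."""
    clean, _fragment = urldefrag(url)
    return clean.rstrip("/")


def is_internal(link, base_url):
    """True when link points at the same host as base_url (ignoring www.)."""
    def host(url):
        netloc = urlparse(url).netloc.lower()
        return netloc[4:] if netloc.startswith("www.") else netloc

    return host(link) == host(base_url)
```

In the crawl loop, `is_internal(url, base_url)` could replace the substring test, and storing `normalize(link)` in the visited collection would avoid fetching the same page once per anchor.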

3. Find Images That Need Optimization

Slow page loads hurt SEO, and Google ranks faster websites higher. One of the most common causes of slow loading is unoptimized images, i.e. images uploaded as is (at original size, without compression) instead of being optimized for the web. The following script makes it easy to track down such images on any page.

	from urllib.parse import urljoin

	import requests
	import requests_html


	def unoptimized_images(url):
	    """
	    Find unoptimized images in a webpage.
	    :param url: webpage URL
	    :return: tuple of (number of images checked, list of dicts for images of 1 MB or more)
	    """
	    session = requests_html.HTMLSession()
	    response = session.get(url)
	    images = []
	    image_count = 0
	    for element in response.html.find("img"):
	        src = element.attrs.get("src")
	        if not src:
	            continue
	        # Resolve relative paths such as /img/logo.png against the page URL.
	        image_url = urljoin(response.url, src)
	        try:
	            i = session.get(image_url)
	        except requests.RequestException:
	            continue
	        if i.status_code != 200:
	            continue
	        image_count += 1
	        try:
	            # Content-Length is reported in bytes; convert it to KB.
	            image_size = int(i.headers.get("Content-Length")) / 1024
	        except (TypeError, ValueError):
	            print("Error fetching image size for:", image_url)
	            continue

	        # Flag images of 1 MB (1024 KB) or more.
	        if image_size >= 1024:
	            images.append({
	                'url': image_url,
	                'size(KB)': image_size,
	            })
	    return image_count, images
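The size check in the script above boils down to reading a Content-Length header and comparing it with a limit. The sketch below isolates those two steps so that they can be reused and tested without touching the network. The names `content_length_kb` and `heaviest_first` are hypothetical, and the 1024 KB default simply mirrors the 1 MB limit used above.

```python
def content_length_kb(headers):
    """Return the Content-Length in KB, or None if it is missing or malformed."""
    # Header names are case-insensitive, so normalise the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        return int(lowered["content-length"]) / 1024
    except (KeyError, TypeError, ValueError):
        return None


def heaviest_first(images, limit_kb=1024):
    """Keep images at or above limit_kb, sorted from largest to smallest."""
    heavy = [image for image in images if image["size(KB)"] >= limit_kb]
    return sorted(heavy, key=lambda image: image["size(KB)"], reverse=True)
```

Because only the headers are needed, a HEAD request (`session.head(image_url)`) might be enough to get the size without downloading each image. This assumes the server answers HEAD requests and includes Content-Length, which not every server does.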

4. Check Whether HTTP/2 Is Supported

To check whether a website supports HTTP/2, I usually use the KeyCDN website. However, visiting that website and submitting a URL every time is quite time-consuming. So, we will automate this task with a simple scraper like the one below:

	import requests_html


	def http2_enabled(url):
	    """
	    Scrape https://tools.keycdn.com/http2-test to find out whether the
	    given domain supports HTTP/2.
	    """
	    # The test form expects a bare domain, so drop the scheme.
	    url = url.replace("https://", "").replace("http://", "")
	    session = requests_html.HTMLSession()
	    response = session.get("https://tools.keycdn.com/http2-test")
	    # JavaScript that runs inside the rendered page: fill in the form and post it.
	    script = """
	    () => {
	        var value = ""
	        if(jQuery.isReady) {
	            $("#public").prop('checked', false);
	            $("#url").val("%s")

	            value = $.post( "http2-query.php", $('#http2Form').serialize()).done(function( data ) {value = data})
	        }
	        return value;
	    }
	    """ % url
	    result = response.html.render(script=script)

	    # check if the returned source code contains the alert-success class of the bootstrap modal
	    return "alert-success" in result

Note that the script relies on the layout of the KeyCDN website, so it may need an update or fix if a change on their side breaks it.
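If depending on a third-party page is a concern, a different technique is to ask the server directly. During the TLS handshake, a client can offer protocols through ALPN, and a server that speaks HTTP/2 selects `h2`. The sketch below uses only the standard library. The names `host_from_url` and `supports_http2` are hypothetical, and it assumes the site is served over HTTPS on port 443.

```python
import socket
import ssl
from urllib.parse import urlparse


def host_from_url(url):
    """Extract the host name from a URL or a bare domain."""
    if "://" not in url:
        url = "https://" + url
    return urlparse(url).hostname


def supports_http2(url, port=443, timeout=5.0):
    """True if the server selects 'h2' during the TLS handshake (ALPN)."""
    host = host_from_url(url)
    context = ssl.create_default_context()
    # Offer HTTP/2 first and HTTP/1.1 as the fallback.
    context.set_alpn_protocols(["h2", "http/1.1"])
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol() == "h2"
```

Calling `supports_http2("https://wasi0013.com")` performs a single handshake and sends no HTTP request. It only reports what the server negotiates with this client, so treat it as a quick check, not a full compliance test.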

We have written only four scripts so far, but many more are possible. For example: a script that measures TTFB (Time To First Byte); a script that checks whether the website uses render-blocking JavaScript, i.e. detects whether the content is generated by JavaScript; or some keyword analysis using a library like nltk, combined with a Python library for Google Trends, to gain useful insights into website SEO.
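As a taste of the keyword idea, here is a minimal sketch that counts the most frequent words in a page's text. It deliberately uses `collections.Counter` from the standard library instead of nltk, and the tiny `STOPWORDS` set and the `top_keywords` name are illustrative assumptions, not a real stop-word list.

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stop-word list (an assumption).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "for", "on", "with"}


def top_keywords(text, n=5):
    """Return the n most common words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    # Ignore stop words and very short tokens.
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(n)
```

With requests_html, the input could be the visible text of a page, for example `top_keywords(r.html.text)`. A real analysis would want proper tokenization and a full stop-word list, which is where a library like nltk comes in.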