
Python lets you run repeatable, large-scale SEO checks that off-the-shelf crawlers either skip or charge for. With a handful of free libraries you can verify status codes and redirects, parse titles and canonicals, pull data from the Search Console and Analytics APIs, and analyze log files. Start small: a single script that checks a list of URLs, then grow it as your needs do.
Why Python for SEO
Most SEO work involves the same checks repeated across hundreds or thousands of URLs. Python is well suited to that for three reasons. First, scale: a script handles 10,000 URLs the same way it handles ten, so audits that would take hours by hand finish in minutes. Second, repeatability: once a check is written, it produces the same result every time and can be scheduled to run weekly or after every deploy. Third, custom logic: you can encode checks that no general-purpose tool offers, such as flagging pages where the canonical points to a redirected URL, or comparing crawl data against your own product database.
Python is also approachable. The libraries used for SEO are mature, well documented, and free, and you do not need to be a software engineer to get value from a 30-line script.
What you can automate
The practical list is long, and most of it relies on a small set of tools:
- Status-code and redirect checks: confirm that URLs return 200, catch unexpected 404s, and trace redirect chains. See our guide to 301 vs 302 redirects for what to look for.
- Crawling: the requests library fetches single pages, while Scrapy handles full-site crawls with queuing, throttling, and concurrency.
- HTML parsing: BeautifulSoup extracts titles, meta descriptions, headings, canonicals, and structured data from raw HTML.
- Search Console and Analytics data: the Google Search Console API and the Google Analytics Data API let you pull clicks, impressions, and traffic into a spreadsheet or dashboard on a schedule.
- Log-file analysis: pandas reads server logs so you can see which URLs Googlebot actually crawls and how often.
- Sitemap validation: parse XML sitemaps to find URLs that are missing, redirected, or non-indexable.
- Bulk meta and canonical audits: check thousands of pages for missing titles, duplicate descriptions, or mismatched canonicals in one pass.
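As one example of the Search Console item in the list above, here is a sketch that pulls clicks and impressions per page. It assumes the google-api-python-client and google-auth packages are installed and that a service account (the key-file path is a placeholder) has been added as a user on the Search Console property. The function names are hypothetical, and the row-parsing helper is kept separate so it can be tested without credentials.

```python
def rows_to_records(response, dimensions):
    """Flatten Search Analytics rows (a 'keys' list plus metrics) into plain dicts."""
    records = []
    for row in response.get("rows", []):
        # 'keys' holds one value per requested dimension, in the same order
        record = dict(zip(dimensions, row["keys"]))
        record.update(
            clicks=row.get("clicks", 0),
            impressions=row.get("impressions", 0),
            ctr=row.get("ctr", 0.0),
            position=row.get("position"),
        )
        records.append(record)
    return records


def search_console_pages(site_url, start_date, end_date, key_file, row_limit=1000):
    """Query clicks and impressions per page for a date range (dates as 'YYYY-MM-DD')."""
    # Imported here so rows_to_records works without the Google client libraries
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        key_file, scopes=["https://www.googleapis.com/auth/webmasters.readonly"]
    )
    service = build("searchconsole", "v1", credentials=creds)
    dimensions = ["page"]
    body = {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": dimensions,
        "rowLimit": row_limit,
    }
    response = service.searchanalytics().query(siteUrl=site_url, body=body).execute()
    return rows_to_records(response, dimensions)


# Example usage (placeholder property and key file):
# for rec in search_console_pages("sc-domain:example.com", "2024-01-01",
#                                 "2024-01-31", "service-account.json"):
#     print(rec)
```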
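For log-file analysis, a minimal pandas sketch might look like the following. It assumes access logs in the Apache/Nginx "combined" format; the regex and function names are illustrative, and other log formats need a different pattern. It trusts the user-agent string, which can be spoofed, so confirming genuine Googlebot traffic still calls for a reverse DNS check.

```python
import re

import pandas as pd

# Assumed "combined" format, e.g.:
# 66.249.66.1 - - [10/Mar/2024:13:55:36 +0000] "GET /about/ HTTP/1.1" 200 5120 "-" "...Googlebot/2.1..."
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def parse_log_lines(lines):
    """Parse combined-format log lines into a DataFrame, skipping lines that don't match."""
    rows = [m.groupdict() for line in lines if (m := LOG_PATTERN.match(line))]
    return pd.DataFrame(rows, columns=["ip", "time", "method", "path", "status", "agent"])


def googlebot_hits(df):
    """Count requests per path and status from user agents claiming to be Googlebot."""
    bots = df[df["agent"].str.contains("Googlebot", na=False)]
    return (
        bots.groupby(["path", "status"])
        .size()
        .reset_index(name="hits")
        .sort_values("hits", ascending=False)
    )


# Example usage (hypothetical file name):
# with open("access.log", encoding="utf-8", errors="replace") as f:
#     print(googlebot_hits(parse_log_lines(f)).head(20))
```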
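Sitemap validation can start with the standard library's XML parser. This sketch, with hypothetical function names, reads the loc entries from a regular urlset sitemap (not a sitemap index) and flags any URL that does not return a plain 200. It requests each URL with allow_redirects=False so a 301 shows up as a 301 rather than as the final page's 200; checking for noindex is left out.

```python
import xml.etree.ElementTree as ET

import requests

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
HEADERS = {"User-Agent": "SEO-Audit-Bot/1.0"}


def sitemap_urls(xml_text):
    """Return the <loc> URLs from a urlset sitemap (sitemap index files are not handled)."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def fetch_status(url):
    """Status of the URL itself; allow_redirects=False keeps a 301/302 visible."""
    return requests.get(url, headers=HEADERS, timeout=10, allow_redirects=False).status_code


def sitemap_problems(urls, get_status=fetch_status):
    """Return (url, status) pairs for sitemap URLs that do not return 200."""
    problems = []
    for url in urls:
        status = get_status(url)
        if status != 200:
            problems.append((url, status))
    return problems


# Example usage (makes live requests):
# xml = requests.get("https://example.com/sitemap.xml", headers=HEADERS, timeout=10).content
# for url, status in sitemap_problems(sitemap_urls(xml)):
#     print(status, url)
```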
A simple example
Here is a realistic starting script. It reads a list of URLs, requests each one, and reports the status code, page title, and canonical tag. It uses only requests and BeautifulSoup.
```python
import requests
from bs4 import BeautifulSoup

urls = [
    "https://example.com/",
    "https://example.com/about/",
]
# Identify your bot so site owners can recognize it in their logs
headers = {"User-Agent": "SEO-Audit-Bot/1.0"}

for url in urls:
    try:
        r = requests.get(url, headers=headers, timeout=10)
    except requests.RequestException as e:
        print(url, "ERROR", e)
        continue

    soup = BeautifulSoup(r.text, "html.parser")

    title_tag = soup.find("title")
    title = title_tag.get_text(strip=True) if title_tag else "(missing)"

    # Matches <link> tags whose rel attribute includes "canonical"
    canonical_tag = soup.find("link", rel="canonical")
    canonical = canonical_tag.get("href", "(none)") if canonical_tag else "(none)"

    print(r.status_code, "|", title, "|", canonical, "|", url)
```
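Run it and you get one line per URL showing the status, title, and canonical. Keep in mind that requests follows redirects by default, so a URL that redirects to a working page is reported with the final page's status (usually 200); the intermediate responses are in r.history, and passing allow_redirects=False shows the redirect itself. From there you can save the output to a CSV, add a check for missing meta descriptions, or report each URL's redirect chain and final destination. Each addition is a few more lines, not a rewrite.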
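One way to add those extensions is sketched below: a meta-description field, the redirect chain and final URL, and CSV output with the standard csv module. The function names and CSV columns are illustrative choices, not a fixed schema.

```python
import csv

import requests
from bs4 import BeautifulSoup

FIELDS = ["url", "status", "redirect_chain", "final_url",
          "title", "meta_description", "canonical", "error"]


def extract_fields(html):
    """Pull the title, meta description, and canonical href out of raw HTML."""
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.find("title")
    # name= is find()'s tag-name argument, so the meta name goes in attrs
    desc_tag = soup.find("meta", attrs={"name": "description"})
    canonical_tag = soup.find("link", rel="canonical")
    return {
        "title": title_tag.get_text(strip=True) if title_tag else "",
        "meta_description": desc_tag.get("content", "").strip() if desc_tag else "",
        "canonical": canonical_tag.get("href", "") if canonical_tag else "",
    }


def audit(urls, headers):
    """Fetch each URL and collect status, redirect chain, final URL, and on-page fields."""
    rows = []
    for url in urls:
        try:
            r = requests.get(url, headers=headers, timeout=10)
        except requests.RequestException as e:
            rows.append({"url": url, "status": "ERROR", "error": str(e)})
            continue
        row = {
            "url": url,
            "status": r.status_code,
            # r.history holds one response per redirect hop that requests followed
            "redirect_chain": " -> ".join(str(h.status_code) for h in r.history),
            "final_url": r.url,
        }
        row.update(extract_fields(r.text))
        rows.append(row)
    return rows


def write_csv(rows, path):
    """Write audit rows to CSV; keys missing from a row are left blank."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, restval="")
        writer.writeheader()
        writer.writerows(rows)


# Example usage (makes live requests):
# rows = audit(["https://example.com/", "https://example.com/about/"],
#              {"User-Agent": "SEO-Audit-Bot/1.0"})
# write_csv(rows, "audit.csv")
```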
Python vs off-the-shelf crawlers
Python is not a replacement for tools like Screaming Frog or Sitebulb, and choosing one over the other depends on the job. A packaged crawler is the right choice when you want a complete site crawl with a visual interface, built-in reports, and no setup. It gets you a broad picture faster, and it handles JavaScript rendering and edge cases that a basic script does not.
Python earns its place when you need something the tool does not do: a custom rule, a check that joins SEO data with another data source, a scheduled job that runs without anyone clicking a button, or a one-off analysis across a list that does not fit a crawler's model. Many teams use both, exporting from a crawler and then processing the export with pandas. If your check involves the robots file, our robots.txt reference covers the rules your crawler should respect.
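To illustrate the export-then-pandas workflow, here is a sketch of two custom rules run against a crawl export: canonicals that point at a crawled URL that did not return 200 (for example, a redirect), and duplicate titles among 200 pages. It assumes the export has been loaded into a DataFrame with columns named url, status, title, and canonical; real exports use their own header names, so rename to match. The function names are hypothetical.

```python
import pandas as pd


def canonical_to_non_200(crawl):
    """Pages whose canonical target appears in the crawl with a non-200 status."""
    status_by_url = crawl.drop_duplicates("url").set_index("url")["status"]
    canonical_status = crawl["canonical"].map(status_by_url)
    # Canonicals pointing outside the crawl map to NaN and are not flagged
    return crawl[canonical_status.notna() & (canonical_status != 200)]


def duplicate_titles(crawl):
    """200-status pages that share a non-blank title with another 200-status page."""
    has_title = crawl["title"].fillna("").str.strip() != ""
    ok = crawl[(crawl["status"] == 200) & has_title]
    return ok[ok["title"].duplicated(keep=False)].sort_values("title")


# Example usage (hypothetical file; rename columns to url, status, title, canonical first):
# crawl = pd.read_csv("crawl_export.csv")
# print(canonical_to_non_200(crawl)[["url", "canonical"]])
# print(duplicate_titles(crawl)[["url", "title"]])
```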
How to start
Install Python 3, then add the core libraries with pip install requests beautifulsoup4 pandas lxml. For full-site crawling, add scrapy. A sensible first project is the script above: feed it a list of your most important URLs and confirm they all return 200 with the titles and canonicals you expect. Once that works, save the results to a CSV with pandas, then schedule it to run on a regular basis. Keep each script focused on one job, and respect the sites you crawl by setting a timeout, identifying your bot in the User-Agent, and limiting request rates.
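Those politeness rules can be combined with a robots.txt check using the standard library's urllib.robotparser. This is a sketch with hypothetical function names and a simple fixed pause between requests; it uses the site's Crawl-delay value if one is set. The standard-library parser may not interpret every robots.txt pattern exactly as Google does, so treat it as a guard rather than a definitive answer.

```python
import time
import urllib.robotparser

import requests

USER_AGENT = "SEO-Audit-Bot/1.0"


def robots_from_lines(lines):
    """Build a robots.txt parser from lines you already have (useful for testing)."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(lines)
    return parser


def fetch_politely(urls, robots, delay_seconds=1.0):
    """Fetch URLs one at a time, skipping robots.txt-disallowed ones and pausing between requests."""
    # Use the site's Crawl-delay if it sets one, otherwise the default pause
    delay = robots.crawl_delay(USER_AGENT) or delay_seconds
    session = requests.Session()
    session.headers["User-Agent"] = USER_AGENT
    results = {}
    for url in urls:
        if not robots.can_fetch(USER_AGENT, url):
            results[url] = "blocked by robots.txt"
            continue
        try:
            results[url] = session.get(url, timeout=10).status_code
        except requests.RequestException as e:
            results[url] = f"error: {e}"
        time.sleep(delay)
    return results


# Example usage (makes live requests):
# robots = urllib.robotparser.RobotFileParser("https://example.com/robots.txt")
# robots.read()
# print(fetch_politely(["https://example.com/", "https://example.com/about/"], robots))
```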
FAQ
Do I need to know how to code to use Python for SEO?
No. Basic checks like the example above can be copied, adjusted, and run with no prior experience. You will learn faster by modifying a working script than by studying theory first.
Can Python check pages that rely on JavaScript?
Plain requests only sees the raw HTML, so content injected by JavaScript will be missing. For rendered pages you need a headless browser tool such as Playwright or Selenium, which run a real browser and return the final DOM.
Is it OK to crawl a site with Python?
Yes, if you are considerate. Use timeouts, limit how fast you send requests, and avoid hammering the server during peak traffic. For large crawls, Scrapy's built-in throttling makes this easier to manage.
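For JavaScript-heavy pages, a minimal Playwright sketch that returns the rendered DOM is below. It assumes Playwright and a browser are installed (pip install playwright, then playwright install chromium), and the helper names are hypothetical. Comparing the first h1 in the raw and rendered HTML is one quick way to spot content that only exists after scripts run.

```python
from bs4 import BeautifulSoup


def rendered_html(url, timeout_ms=30000):
    """Load a page in headless Chromium and return the DOM after scripts have run."""
    # Imported here so first_h1 works without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page(user_agent="SEO-Audit-Bot/1.0")
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()


def first_h1(html):
    """Text of the first <h1>, or None if the HTML has none."""
    tag = BeautifulSoup(html, "html.parser").find("h1")
    return tag.get_text(strip=True) if tag else None


# Example usage (makes live requests):
# import requests
# url = "https://example.com/"
# raw = requests.get(url, headers={"User-Agent": "SEO-Audit-Bot/1.0"}, timeout=10).text
# print("raw:", first_h1(raw), "| rendered:", first_h1(rendered_html(url)))
```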
Claude Vincent is a technical SEO consultant focused on crawlability, rendering, and AI-search visibility. He writes the field guides and case studies at SEO ProCheck, with a bias toward the durable, unglamorous work that decides whether search engines and AI answer engines can actually read and cite a site.








