TL;DR
Crawl budget is the number of URLs Google can and wants to crawl on your site. It's a combination of crawl rate (how fast Google can crawl without overloading your server) and crawl demand (how much Google wants to crawl). Most sites under 10K pages don't need to worry about it. For larger sites, the biggest budget killers are faceted navigation, duplicate content, redirect chains, and slow server response times. Log file analysis is the best way to see what Google is actually doing on your site.
Google crawls billions of pages every day. But that doesn't mean it crawls yours. According to Botify's research, Google only crawls about 51% of pages on the average enterprise website. Nearly half of your content could be invisible to search engines, and crawl budget is usually the reason.
If you publish a new page and it gets indexed within a couple of days, you probably don't have a crawl budget problem. But if new pages take weeks to appear, or large sections of your site never show up in search results, keep reading.
What Is Crawl Budget?
Crawl budget is the number of URLs Google can and wants to crawl on your site. That's Google's own definition, from a 2017 blog post by Gary Illyes on the Search Central team.
It's a combination of two things:
- Crawl rate limit: How fast Google can crawl without overloading your server. If your server slows down or throws 5xx errors, Googlebot backs off automatically.
- Crawl demand: How much Google wants to crawl. Popular pages get crawled more often. Pages that haven't been updated in months get crawled less. Big site changes (like a migration) spike demand temporarily.
Put simply: crawl budget is the intersection of Google's ability to crawl your site and its motivation to do so. You can have a fast server (high crawl rate), but if Google doesn't think your content is worth revisiting (low demand), it won't crawl much.
When Does Crawl Budget Actually Matter?
Here's something Google has said repeatedly: most sites don't need to worry about crawl budget. John Mueller, Google's Search Advocate, has stated that "for most normal websites, crawl budget is not something you need to focus on at all."
Crawl budget becomes a real issue when:
- Your site has 50,000+ pages. E-commerce catalogs, classified sites, large publishers. The more URLs you have, the more selective Google becomes.
- Faceted navigation creates millions of URL variations. A product category with 10 filters and 5 options each can generate millions of crawlable URL combinations (even allowing just one selected value per filter, that's 6^10, over 60 million permutations), most showing near-identical content.
- You publish content rapidly. News sites pushing hundreds of articles per day need Google to discover and index content quickly, or it's stale before it ranks.
- Your site relies heavily on JavaScript. Google's rendering service processes JS pages in a separate queue. Martin Splitt from Google has confirmed this can add "a few hours to even weeks" of delay.
- You recently migrated domains or restructured URLs. Migrations spike crawl demand as Google processes redirects and discovers new URL patterns.
Publish a new page, add it to your sitemap, and check Google Search Console. If it's indexed within 1-3 days, crawl budget is not your problem. If it takes weeks or never appears, you likely have a crawl budget issue (or a quality issue).
7 Things That Waste Crawl Budget
Every time Googlebot crawls a low-value URL, that's a slot taken from something that actually matters. Here are the most common budget killers:
1. Faceted Navigation
The single biggest crawl budget problem for e-commerce sites. Color, size, brand, price range, rating, sort order: each filter combination creates a new URL. A site with 5,000 products can easily have 2 million crawlable URLs because of filter permutations.
Most of these pages show nearly identical content. Googlebot doesn't know that until it crawls them.
2. Duplicate Content
HTTP vs. HTTPS. With www vs. without. Trailing slash vs. no trailing slash. URL parameters like ?ref=, ?utm_source=, ?sessionid=. Each variation looks like a separate page to Googlebot.
If your site serves the same page at four different URLs, Google might crawl all four before figuring out they're duplicates. That's 4x the budget for 1x the content.
3. Redirect Chains
URL A redirects to B, which redirects to C, which redirects to D. Google follows up to 10 hops, but each hop burns crawl resources. Sites that flatten redirect chains from 3+ hops to a single hop typically see 10-25% improvements in crawl efficiency.
4. Soft 404s
Pages that return HTTP 200 but display "no results found" or empty content. Google has to crawl and render these before realizing they're useless. Return a proper 404 or 410 status code instead.
5. Orphan Pages
Pages with no internal links pointing to them. They might be in your sitemap, but without internal link signals, they get low crawl priority. The reverse problem hurts too: pages Google discovers that you never intended to make public still burn budget when crawled.
6. Thin Content Pages
Tag pages, author archives with one post, empty category pages. They consume crawl resources but add minimal indexing value. Google's 2024 updates increasingly deprioritize thin content in crawl scheduling.
7. Crawl Traps
Infinite calendars with endless future dates. Relative URLs that create infinite path depth. Faceted URLs that generate endless filter combinations. These can trap Googlebot in loops, burning through budget on pages that don't exist in any meaningful sense.
11 Ways to Optimize Crawl Budget
1. Clean Up robots.txt
Block Googlebot from crawling URL patterns that waste budget:
- Disallow filter/faceted URLs: Disallow: /*?sort=, Disallow: /*?filter=
- Block internal search result pages: Disallow: /search
- Block admin, staging, and test directories
Don't block CSS or JavaScript files. Google needs these to render pages properly. Blocking them can cause indexing issues.
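If you want to sanity-check your rules before they go live, Python's standard-library robotparser works for simple prefix rules. Here's a quick sketch; the rules and URLs are placeholders, and note that this parser doesn't understand the * wildcard syntax Googlebot supports, so wildcard patterns like the ones above need to be tested against Google's own matching rules:

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules. Caveat: this parser does plain prefix matching and does
# NOT understand the "*" wildcard syntax (e.g. Disallow: /*?sort=), so test
# wildcard rules separately.
rules = """
User-agent: *
Disallow: /search
Disallow: /admin/
Disallow: /staging/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

checks = {
    "https://example.com/category/shoes": True,   # expect: allowed
    "https://example.com/search?q=boots": False,  # expect: blocked
    "https://example.com/admin/login": False,     # expect: blocked
}

for url, expected in checks.items():
    allowed = parser.can_fetch("Googlebot", url)
    flag = "OK" if allowed == expected else "!!"
    print(f"{flag} {'allowed' if allowed else 'blocked':7} {url}")
```

Any line starting with "!!" means a rule is blocking something you meant to keep crawlable (or vice versa).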
2. Fix Your XML Sitemaps
Your sitemap should be a curated list of your best pages, not a dump of every URL on your site (there's a quick spot-check script after this list):
- Only include indexable, canonical, 200-status URLs
- Remove noindexed, redirected, or canonicalized URLs
- Keep each sitemap under 50,000 URLs (Google's limit)
- Use accurate lastmod dates (don't set them all to today; Google will learn to ignore them)
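Here's a minimal sketch of that spot-check in Python, assuming a single sitemap at a placeholder URL and using the requests library. It flags any listed URL that isn't a clean, directly reachable 200:

```python
import xml.etree.ElementTree as ET
import requests

SITEMAP_URL = "https://example.com/sitemap.xml"   # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Pull every <loc> entry out of the sitemap.
root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
urls = [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]
print(f"{len(urls)} URLs listed")

# Anything that redirects, errors, or carries a noindex header doesn't belong here.
for url in urls:
    resp = requests.head(url, allow_redirects=False, timeout=10)
    robots_header = resp.headers.get("X-Robots-Tag", "").lower()
    if resp.status_code != 200 or "noindex" in robots_header:
        print(f"{resp.status_code}  {url}  {robots_header}")
```

If you use a sitemap index file, run the same check against each child sitemap. Note this only catches header-level noindex; meta robots tags in the HTML need a full crawl to detect.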
3. Flatten Your Site Architecture
Important pages should be within 3 clicks of the homepage. Research from OnCrawl shows that pages buried 7+ clicks deep get crawled significantly less often than pages within 3 clicks.
Use breadcrumbs, hub pages, and strategic internal linking to keep your most valuable content close to the surface.
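Click depth is easy to measure once you have an internal-link graph, which any crawler export will give you. Here's a sketch using a made-up graph; in practice you'd load the page-to-page link data from your crawl:

```python
from collections import deque

# Hypothetical internal-link graph: page -> pages it links to.
# In practice, export this from your crawler.
links = {
    "/": ["/category/shoes", "/about"],
    "/category/shoes": ["/product/boot-a", "/category/shoes?page=2"],
    "/category/shoes?page=2": ["/product/boot-b"],
    "/product/boot-a": [],
    "/product/boot-b": [],
    "/about": [],
}

def click_depth(graph, start="/"):
    """Return the minimum number of clicks from the homepage to each page."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in depth:          # first visit in BFS = shortest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

for page, d in sorted(click_depth(links).items(), key=lambda kv: kv[1]):
    print(f"{d} clicks  {page}")
```

Pages that never show up in the result are orphans; pages sitting at 7+ clicks are the ones to pull closer with hub pages and internal links.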
4. Handle Faceted Navigation Properly
This is where most e-commerce sites bleed crawl budget. Options:
- Block in robots.txt: Fastest fix. Disallow filter parameter patterns.
- Canonical tags: Point filtered pages to the main category page (see the sketch after this list). Google may still crawl them but won't waste indexing resources.
- AJAX-based filtering: Use JavaScript to filter products without changing the URL. No new URLs means no crawl budget waste.
- noindex, follow: Tells Google to crawl the links on the page but not to index the page itself. Useful when you want link equity to flow but don't need the page indexed.
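For the canonical-tag route, the key is that every filter permutation resolves to the same canonical target. Here's a small sketch; the filter parameter names are hypothetical, so swap in whatever your faceted navigation actually emits:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical filter parameters used by the faceted navigation.
FILTER_PARAMS = {"sort", "filter", "color", "size", "brand", "price", "rating"}

def canonical_url(url: str) -> str:
    """Collapse a faceted URL back to the base category URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in FILTER_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(canonical_url("https://example.com/category/shoes?color=red&sort=price"))
# -> https://example.com/category/shoes
```

The returned value is what goes into each filtered page's rel=canonical tag, and it pairs well with robots.txt patterns that block the same parameters.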
5. Consolidate Duplicate URLs
Pick one canonical version of each page and enforce it (a normalization sketch follows this list):
- 301 redirect HTTP to HTTPS
- 301 redirect www to non-www (or vice versa)
- Enforce trailing slash consistency
- Set canonical tags on parameter variations
- Strip unnecessary URL parameters server-side
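However you enforce it (server config, CDN rules, or application code), every variant should 301 in a single hop to one preferred form. Here's a small sketch of the normalization logic, assuming https, non-www, and trailing slashes are your preferred conventions; the tracking parameter list is illustrative:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical tracking parameters to strip before redirecting.
TRACKING_PARAMS = {"ref", "utm_source", "utm_medium", "utm_campaign", "sessionid"}

def preferred_url(url: str) -> str:
    """Build the single preferred form: https, non-www, trailing slash, no tracking params."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    query = urlencode([(k, v) for k, v in parse_qsl(parts.query)
                       if k not in TRACKING_PARAMS])
    return urlunsplit(("https", host, path, query, ""))

incoming = "http://www.example.com/category/shoes?utm_source=newsletter"
target = preferred_url(incoming)
if target != incoming:
    print(f"301 {incoming} -> {target}")   # issue a single-hop 301 to the preferred form
```

Whatever rules you pick, apply the same ones to your internal links so Googlebot rarely hits a redirect in the first place.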
6. Flatten Redirect Chains
Audit your redirects and make sure every redirect goes directly to the final destination in one hop. Update internal links to point to final URLs instead of relying on redirect chains.
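A quick way to audit this is to follow redirects one hop at a time and flag anything that takes more than one. Here's a sketch using the requests library; the starting URL is a placeholder:

```python
import requests
from urllib.parse import urljoin

def redirect_chain(url: str, max_hops: int = 10) -> list[str]:
    """Follow redirects hop by hop and return every URL in the chain."""
    chain = [url]
    for _ in range(max_hops):
        resp = requests.get(url, allow_redirects=False, timeout=10)
        if resp.status_code not in (301, 302, 303, 307, 308):
            break
        url = urljoin(url, resp.headers["Location"])
        chain.append(url)
    return chain

chain = redirect_chain("http://www.example.com/old-page")   # placeholder
if len(chain) > 2:
    # More than one hop: repoint the first URL (and your internal links) at the last.
    print(" -> ".join(chain))
```

Run it over the redirecting URLs from your crawl export and fix anything that prints.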
7. Speed Up Your Server
Google has confirmed that server speed directly impacts crawl rate. Faster servers get crawled more aggressively. Sites that improved response time from 800ms to 200ms have reported Googlebot increasing its crawl rate by 2-4x.
Target server response times under 200ms. Use a CDN for static assets. If your server regularly returns 5xx errors, fix that first, because Googlebot will back off entirely.
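You can get a rough read on response times without any special tooling. The requests library's elapsed attribute measures the time from sending the request to parsing the response headers, which is a reasonable proxy for the server response time Googlebot sees. The URL list below is a placeholder sample of representative pages:

```python
import requests

URLS = [
    "https://example.com/",
    "https://example.com/category/shoes",
]  # placeholder sample of representative pages

for url in URLS:
    resp = requests.get(url, timeout=10)
    ms = resp.elapsed.total_seconds() * 1000   # time until response headers arrived
    flag = "OK" if ms < 200 else ("warn" if ms < 500 else "SLOW")
    print(f"{flag:4} {ms:6.0f} ms  {url}")
```

Run it from a server, not your laptop on Wi-Fi, so network latency doesn't swamp the measurement.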
8. Use HTTP Status Codes Correctly
- Return proper 404 for deleted pages (not soft 404s)
- Use 410 (Gone) for permanently removed content. Google drops 410 pages from the index faster than 404s
- Return 503 for temporary maintenance (tells Google to come back later)
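What this looks like in application code depends on your stack. Here's an illustrative sketch using Flask, with a made-up product catalog standing in for a real lookup:

```python
from flask import Flask, Response, abort

app = Flask(__name__)

PRODUCTS = {"new-widget": "New Widget"}   # hypothetical catalog
RETIRED = {"old-widget"}                  # hypothetical list of permanently removed SKUs

@app.route("/product/<sku>")
def product(sku):
    if sku in RETIRED:
        abort(410)        # Gone: drops out of the index faster than a 404
    if sku not in PRODUCTS:
        abort(404)        # a real 404, never a 200 "no results found" page
    return PRODUCTS[sku]

@app.route("/maintenance-demo")
def maintenance():
    # Planned downtime: 503 plus Retry-After tells crawlers to come back later.
    return Response("Down for maintenance", status=503,
                    headers={"Retry-After": "3600"})
```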
9. Prioritize New Content Discovery
Help Googlebot find your new pages faster:
- Link to new content from high-authority existing pages
- Add new URLs to your sitemap immediately
- Use the URL Inspection tool in GSC to request indexing (limited to 10-20 per day)
- Submit to Bing via IndexNow for faster indexing on non-Google engines
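IndexNow submissions are a simple POST. Here's a sketch following the published IndexNow payload format; the key, key file location, and URLs are placeholders you'd replace with your own:

```python
import requests

# Placeholders. Your IndexNow key must also be hosted at the keyLocation URL
# so the receiving engine can verify you own the domain.
payload = {
    "host": "example.com",
    "key": "your-indexnow-key",
    "keyLocation": "https://example.com/your-indexnow-key.txt",
    "urlList": [
        "https://example.com/new-article",
        "https://example.com/updated-category",
    ],
}

resp = requests.post(
    "https://api.indexnow.org/indexnow",
    json=payload,
    headers={"Content-Type": "application/json; charset=utf-8"},
    timeout=10,
)
print(resp.status_code)   # 200 or 202 means the submission was accepted
```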
10. Audit and Remove Index Bloat
Sites that reduced their indexed URL count by 40-60% (by removing thin, duplicate, and low-value pages from Google's index) consistently report improvements in crawl efficiency and organic traffic to remaining pages. Google reallocates crawl budget to the pages that matter.
11. Do Log File Analysis
This is the unlock that separates guessing from knowing. Log file analysis shows you exactly which URLs Googlebot is crawling, how often, and what status codes it's getting back.
Without log data, you're optimizing crawl budget based on assumptions. With it, you can see:
- Which pages Googlebot visits most (and whether those are the right pages)
- How often each section of your site gets crawled
- Pages in your sitemap that Googlebot never visits
- Crawl budget wasted on redirect chains, 404s, and parameter URLs
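You don't need an enterprise platform to get a first look. Here's a minimal sketch that parses a standard combined-format access log (nginx/Apache), keeps Googlebot hits, and shows where the budget is going. The log path is a placeholder, and keep in mind that user-agent strings can be spoofed; verify suspicious IPs with a reverse DNS lookup before drawing conclusions:

```python
import re
from collections import Counter

LOG_PATH = "access.log"   # placeholder path to your server log

# Combined log format:
# ip - - [time] "METHOD /path HTTP/1.1" status size "referer" "user-agent"
LINE = re.compile(r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
                  r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"')

status_counts = Counter()
section_counts = Counter()

with open(LOG_PATH, encoding="utf-8", errors="replace") as fh:
    for line in fh:
        m = LINE.match(line)
        if not m or "Googlebot" not in m["ua"]:
            continue
        status_counts[m["status"]] += 1
        # Group by first path segment to see which site sections get crawled.
        section = "/" + m["path"].lstrip("/").split("/", 1)[0].split("?", 1)[0]
        section_counts[section] += 1

print("Googlebot hits by status code:", dict(status_counts))
print("Top crawled sections:", section_counts.most_common(10))
```

Cross-reference the output with your sitemap: sections Googlebot hammers that you don't care about, and sitemap URLs that never appear in the log, are both crawl budget problems.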
Tools for Crawl Budget Optimization
You don't need an enterprise tool to start. Here's what works at each budget level:
Free: Google Search Console
The Crawl Stats report (Settings > Crawl Stats) shows total crawl requests per day, average response time, download size, and status code distribution. The URL Inspection tool shows when Google last crawled a specific page. Start here.
$259/year: Screaming Frog
Screaming Frog is the industry-standard desktop crawler. It identifies duplicate content, redirect chains, orphan pages, thin content, and broken links. The companion Log File Analyser (sold as a separate product by the same developer) lets you import server logs and see exactly what Googlebot is doing. At $259/year, the SEO Spider is the best value tool for crawl budget work. The free version crawls up to 500 URLs.
Enterprise: Botify
Botify is one of the few enterprise SEO platforms with native log file analysis built into the core product. It combines crawl data, log data, and search analytics in a single view, so you can see exactly which pages Google crawls, which pages get traffic, and which pages are being ignored. The SpeedWorkers feature solves JavaScript rendering issues at the CDN level. Pricing starts around $75,000/year, so it's strictly for enterprise sites with 500K+ pages.
See my Botify pricing breakdown for details.
Mid-Range Options
- Semrush ($139.95/mo): Site Audit tool catches most technical issues. No log file analysis, but good for crawl-related diagnostics alongside keyword research and backlinks.
- Ahrefs ($29/mo): Site Audit covers duplicate content, redirects, and orphan pages. Lightweight but solid for sites under 100K pages.
- Sitebulb ($13.50/mo): Visual site auditor with priority scoring that makes technical issues easy to understand. Desktop and cloud versions available.
- OnCrawl ($69/mo): Cloud-based crawler with log file analysis at a fraction of Botify's price. Worth evaluating if you need log data without enterprise pricing.
What Google's Crawl Stats Report Tells You
The free Crawl Stats report in GSC is your starting point. Here's what healthy numbers look like:
- Response time: Under 500ms average. Under 200ms is ideal. If you're above 1 second, that's directly throttling your crawl rate.
- Status codes: 95%+ should return 200. A high percentage of 301/302/404/500 responses signals wasted crawl budget.
- Crawl requests: Trending upward generally means Google is finding your site valuable. A sudden drop usually points to server problems or weakening quality signals.
For more on tracking SEO metrics without expensive tools, see my guide on how to track SEO progress.
The Bottom Line
Crawl budget optimization follows the 80/20 rule. For most sites, fixing the basics (duplicate content, redirect chains, clean sitemaps, fast server) solves 80% of crawl budget problems. The remaining 20% is log file analysis, faceted navigation management, and architectural improvements that matter most for sites with 500K+ pages.
Start with Google Search Console (free). Graduate to Screaming Frog ($259/year) when you need deeper crawl data. And if you're running an enterprise site where organic search drives serious revenue, Botify or Lumar are where the real crawl budget intelligence lives.
For a full comparison of SEO tools at every price point, check my best SEO software roundup.