Robots.txt becomes an SEO problem when it blocks URLs or site resources that search engines need to crawl. The site may still work perfectly for visitors while crawlers are being told not to access important sections of it.
That is risky when the blocked areas include service pages, ecommerce categories, product pages, location pages, pricing pages, or the files needed to render those pages properly.
A robots.txt issue is easy to miss because it does not usually break the visible website. A user can still open the page. The menu can still work. The design can still look fine. But search engines may be receiving crawl instructions that prevent them from checking, refreshing, or understanding parts of the site that support leads and sales.
Before blaming content quality, backlinks, rankings, or Google updates, it is worth checking whether your own robots.txt file is quietly blocking something important.

What robots.txt blocking actually means
Robots.txt is a plain text file that gives crawlers instructions about which parts of a website they are allowed to crawl.
It usually sits at a URL like:
https://seostrategist.co.za/robots.txt
A basic robots.txt rule may look like this:
User-agent: *
Disallow: /admin/
This tells all crawlers that follow robots.txt instructions not to crawl URLs inside the /admin/ folder.
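If you want to verify how a rule like this behaves before trusting it, Python's standard-library `urllib.robotparser` can parse the rules and answer per-URL questions. This is a minimal sketch: the domain and URLs are placeholders, and the rules are fed in directly rather than fetched from a live site.

```python
from urllib import robotparser

# Parse the example rules directly; on a real site you would point
# set_url() at the live /robots.txt and call read() instead.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
])

# URLs inside /admin/ are off-limits for compliant crawlers...
print(rp.can_fetch("Googlebot", "https://example.com/admin/login"))
# ...while everything outside that folder stays crawlable.
print(rp.can_fetch("Googlebot", "https://example.com/services/seo/"))
```

The same two-line check works for any rule discussed below: parse the file, then ask `can_fetch` about real URLs from the site.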
That kind of rule can be useful. Many websites use robots.txt to keep crawlers away from admin areas, internal search results, duplicate parameter URLs, staging sections, or low-value crawl paths.
The problem starts when a rule is too broad, outdated, copied from a staging site, or written without checking which URLs sit inside the blocked path.
Robots.txt does not remove a page from the website for users. It controls crawler access. That is why a business owner, marketer, or developer can open a page in the browser and assume everything is fine while search engines may be blocked from crawling it.
There is another important nuance: robots.txt is not a privacy or security tool. A blocked URL can still be found in other ways, and in some cases a search engine may know the URL exists without being able to crawl the page content properly. That is why robots.txt should be used for crawl control, not for hiding sensitive content.
Why this becomes a revenue problem
Not every blocked URL is bad. Some pages should stay out of crawl paths.
The risk appears when robots.txt blocks URLs or resources that help the business generate visibility, enquiries, sales, bookings, or trust.
This can affect:
- core service pages
- ecommerce category pages
- product pages
- city or location landing pages
- pricing, package, or comparison pages
- lead-generation pages
- pages listed in the XML sitemap
- JavaScript, CSS, image, or template resources needed for rendering
- pages that need to be crawled after updates
For example, if an ecommerce store accidentally blocks /category/, search engines may struggle to crawl a major part of the store structure. If a service business blocks /services/, its main commercial pages may become harder to access. If a WordPress site blocks key theme or script files, search engines may have a weaker view of how the page renders.
The commercial question is not simply “is anything blocked?”
The better question is: does the blocked pattern overlap with pages that search engines need to crawl for the business to compete?
For a deeper technical review of how robots.txt and sitemaps should work together, see our guide to robots.txt and XML sitemap setup.
Examples of risky robots.txt rules
The fastest way to understand robots.txt risk is to look at the rule and then ask what URL patterns it affects.
These examples are simplified, but they show how small rules can create large crawl problems.
1. A staging block accidentally pushed live
User-agent: *
Disallow: /
This tells crawlers not to crawl any URL on the site.
It is often used on staging or development sites before launch. It becomes a serious issue if it is copied to the live website by mistake during a redesign, rebuild, or migration.
What to check:
- Was the site recently launched or migrated?
- Is the live domain using a staging robots.txt file?
- Are all important URLs blocked in crawler tests?
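The fastest way to confirm the worst case is to test the homepage itself against the rules. With a full-site block, even that fails. A sketch, using placeholder URLs:

```python
from urllib import robotparser

# Simulate a staging robots.txt accidentally deployed to the live domain.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# With "Disallow: /", every URL on the site is blocked,
# including the homepage.
print(rp.can_fetch("Googlebot", "https://example.com/"))
print(rp.can_fetch("Googlebot", "https://example.com/services/"))
```

If the homepage comes back blocked, treat it as an urgent launch-day fix rather than a routine SEO task.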
2. A service-folder block
User-agent: *
Disallow: /services/
This blocks URLs inside the /services/ folder.
For a service business, that could include the pages most closely tied to enquiries. If the business has URLs such as /services/technical-seo/, /services/local-seo/, or /services/seo-audit/, this rule needs urgent review.
What to check:
- Are core service pages inside the blocked folder?
- Are those URLs in the XML sitemap?
- Did the rule come from an old development or duplicate-content decision?
3. An ecommerce category block
User-agent: *
Disallow: /category/
This can be dangerous for ecommerce sites if category pages are important organic landing pages.
A store may intend to block thin filtered URLs or duplicate paths, but a broad category block can affect the pages that organise products and target non-branded searches.
What to check:
- Are high-value product categories inside this path?
- Do these category pages target search demand?
- Are filters or parameters being confused with the main category structure?
4. A parameter rule that catches useful pages
User-agent: *
Disallow: /*?filter=
Parameter rules can be useful on ecommerce sites, but they need careful handling.
This type of rule may help control crawl waste from filtered combinations. But if filtered URLs are part of the site’s SEO strategy, or if the rule pattern catches more than intended, it can hide useful landing-page variants from crawlers.
Pattern matching can vary by crawler. Major search engines may support wildcard-style matching, but not every bot interprets robots.txt patterns in exactly the same way. Test the rule using the search engines and crawl tools that matter for your site.
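To make the variation concrete, here is a rough sketch of the wildcard semantics major search engines document, where `*` matches any run of characters and `$` anchors the end of the URL. This is an illustration for testing your own patterns, not any search engine's actual implementation, and the example URLs are hypothetical:

```python
import re

def wildcard_match(pattern: str, path: str) -> bool:
    """Sketch of documented wildcard matching: '*' matches any sequence
    of characters and '$' anchors the end of the path. Simpler robots.txt
    parsers may instead treat these characters literally."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "$":
            parts.append("$")
        else:
            parts.append(re.escape(ch))
    return re.match("".join(parts), path) is not None

# Under wildcard matching, the rule catches any path containing ?filter=
print(wildcard_match("/*?filter=", "/shoes?filter=red"))
print(wildcard_match("/*?filter=", "/shoes?sort=price"))
```

A parser that only does literal prefix matching would treat `/*?filter=` as the characters `/*?filter=` and block almost nothing, which is exactly why the same rule can behave differently across crawlers.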
What to check:
- Are filtered pages intentionally indexable or purely low-value duplicates?
- Do any filtered pages receive internal links from category pages?
- Are product or collection URLs using the same parameter pattern?
5. Blocking WordPress content resources
User-agent: *
Disallow: /wp-content/
This does not usually block the main page URL itself, but it can block files that help search engines render and understand the page.
On a WordPress site, /wp-content/ can include themes, plugins, images, CSS, and JavaScript. Blocking everything in that folder may create rendering or resource-access issues.
What to check:
- Are important CSS or JavaScript files blocked?
- Are images needed for content or products blocked?
- Does Search Console show blocked resources or rendering differences?
6. Blocking a location-page folder
User-agent: *
Disallow: /locations/
If a business uses city or location landing pages, this type of rule can affect local or regional search visibility.
For example, a company targeting Johannesburg, Cape Town, Durban, or Pretoria may rely on location pages to explain service availability. If the whole folder is blocked, crawlers may not be able to access those pages properly.
What to check:
- Are location pages part of the search strategy?
- Are they unique and useful, or thin duplicates?
- Are they linked from service pages, contact pages, or local SEO content?
Common causes behind accidental robots.txt blocks
Most robots.txt problems come from routine website work rather than one obvious mistake. After you have reviewed the risky rule examples above, look for the source of the change.
Common causes include:
- a staging or development robots.txt file being pushed to the live domain
- an old rule staying in place after the URL structure changed
- an SEO plugin, CMS setting, security tool, or ecommerce platform generating rules automatically
- a developer blocking a folder during testing and not removing the rule before launch
- an ecommerce crawl-control rule being applied too broadly to categories, filters, products, or collections
- resource folders being blocked without checking whether CSS, JavaScript, images, or template files are needed for rendering
The cause matters because it affects the fix. A copied staging block may need urgent removal. A filter rule may only need narrowing. A legacy rule may need to be checked against the current sitemap and internal navigation before anything changes.
Signs your robots.txt file may be blocking the wrong pages
Robots.txt problems are not always obvious from the front end. Look for signals across crawl tools, sitemap data, and Google Search Console.
Common warning signs include:
- important URLs missing from crawl reports
- XML sitemap URLs showing crawl access issues
- newly launched pages not being crawled
- Search Console showing blocked-by-robots.txt messages
- sudden crawl changes after a redesign or migration
- service, category, product, or location sections missing from crawl data
- important resources blocked during rendering checks
- crawl results that do not match the site navigation or sitemap
The next step is to separate harmless blocks from risky blocks.
Blocking /wp-admin/ is usually expected. Blocking /services/, /category/, /products/, or /locations/ needs closer inspection.
What to check before editing robots.txt
Do not treat robots.txt changes as a quick copy-and-paste fix. A useful review should identify what the rule does, which URL patterns it affects, whether those URLs should be crawlable, and what happens if the rule is changed.
Use this workflow.
1. Open the robots.txt file
Review the file directly at the root of the domain.
Look for rules using Disallow, especially broad rules affecting folders, parameters, product sections, service sections, location sections, or recently changed URL paths.
2. Interpret each important rule
For each rule, ask what it actually blocks.
A rule like this:
Disallow: /products/
is very different from this:
Disallow: /products?sort=
The first may block a whole product folder. The second may block a sorting parameter pattern. One may affect core product discovery. The other may be a reasonable crawl-control rule.
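The difference between the two rules is easy to demonstrate. A sketch with `urllib.robotparser` and hypothetical product URLs:

```python
from urllib import robotparser

# Rule 1: block the whole /products/ folder.
folder = robotparser.RobotFileParser()
folder.parse(["User-agent: *", "Disallow: /products/"])

# Rule 2: block only the sort-parameter pattern.
param = robotparser.RobotFileParser()
param.parse(["User-agent: *", "Disallow: /products?sort="])

# The folder rule blocks every product page...
print(folder.can_fetch("Googlebot", "https://example.com/products/red-shoes"))
# ...while the parameter rule leaves product pages crawlable...
print(param.can_fetch("Googlebot", "https://example.com/products/red-shoes"))
# ...and only catches the sorted variants.
print(param.can_fetch("Googlebot", "https://example.com/products?sort=price"))
```

Running this kind of side-by-side test makes it obvious whether a rule is crawl control or a blocked revenue template.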
3. Test affected URL examples
Do not only inspect the rule. Test real URLs that match the pattern.
For example, if the rule is:
Disallow: /category/
check whether URLs like these are affected:
/category/running-shoes/
/category/office-furniture/
/category/industrial-supplies/
The goal is to understand which templates and page types sit inside the blocked path.
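Testing real URLs in bulk can be scripted. A sketch that runs the example category URLs above through the rule, with a placeholder domain:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /category/"])

# Hypothetical category URLs that match the blocked pattern.
for path in ("/category/running-shoes/",
             "/category/office-furniture/",
             "/category/industrial-supplies/"):
    allowed = rp.can_fetch("Googlebot", "https://example.com" + path)
    print(path, "->", "crawlable" if allowed else "blocked")
```

Feeding in a URL list exported from your crawler or sitemap turns this into a quick audit of exactly which templates sit inside the blocked path.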
4. Compare robots.txt with the XML sitemap
Your XML sitemap should contain URLs you want search engines to discover and crawl.
If the sitemap includes URLs blocked by robots.txt, the site is sending mixed signals.
A sitemap is effectively saying, “these URLs are important”. The robots.txt file may be saying, “do not crawl them”. That conflict should be reviewed before any broader SEO conclusions are made.
You can read more about this relationship on our robots.txt and XML sitemap setup page.
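Checking for this conflict can also be automated: parse the sitemap, then test every listed URL against the robots rules. A sketch with a hypothetical inline sitemap (in practice you would fetch the live `/sitemap.xml`):

```python
import xml.etree.ElementTree as ET
from urllib import robotparser

# Hypothetical sitemap content for illustration.
SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/services/technical-seo/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>"""

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /services/"])

# Flag every sitemap URL that robots.txt blocks from crawling.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
for loc in ET.fromstring(SITEMAP).findall(".//sm:loc", ns):
    url = loc.text.strip()
    if not rp.can_fetch("Googlebot", url):
        print("Conflict: sitemap lists a blocked URL:", url)
```

Any URL this prints is a mixed signal worth resolving before drawing broader SEO conclusions.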
5. Inspect examples in Google Search Console
Use URL inspection and indexing reports to check affected URLs.
Look for blocked-by-robots.txt messages, crawl access issues, and examples of URLs Google has discovered but cannot crawl properly.
Search Console should not be the only diagnostic source, but it helps confirm how Google is seeing the issue.
6. Run two technical crawls
Run one crawl that respects robots.txt and another controlled crawl that ignores robots.txt.
Compare the difference.
If the second crawl discovers important pages missing from the first crawl, robots.txt is probably hiding useful crawl paths.
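The comparison itself is a simple set difference once both crawls have exported their URL lists. A sketch with hypothetical crawl results:

```python
# Hypothetical URL sets exported from the two crawls of the same site.
respecting_robots = {"/", "/about/", "/contact/"}
ignoring_robots = {"/", "/about/", "/contact/",
                   "/services/seo-audit/", "/services/local-seo/"}

# URLs only the unrestricted crawl could reach are hidden by robots.txt.
hidden = ignoring_robots - respecting_robots
for url in sorted(hidden):
    print("Hidden by robots.txt:", url)
```

Group the hidden URLs by template before deciding anything: two hidden service pages matter far more than two hundred hidden parameter variants.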
7. Prioritise the affected templates
Once you know what is blocked, group the affected URLs by template or page type.
A blocked admin path is low concern. A blocked service-folder template, category template, product template, or location-page template is a higher priority.
8. Decide whether to keep, refine, or remove the rule
The decision should not be “blocked equals bad”. The decision should be based on URL value.
Some rules should stay. Some should be narrowed. Some should be removed. Some should be replaced with a better noindex, canonical, parameter-handling, or internal-linking decision.
Decision table: what to do after finding a blocked URL
Use this as a practical triage guide.
| Blocked URL type | Is it usually a problem? | What to check first | Priority level |
|---|---|---|---|
| Admin or login areas | Usually no | Confirm they are not in the XML sitemap and not linked as public pages | Low |
| Staging or test folders | Usually no, if not live content | Confirm the rule is not blocking the live site or public launch URLs | Medium to high after launch |
| Core service pages | Usually yes | Check whether service URLs are in the blocked path and sitemap | High |
| Ecommerce category pages | Often yes | Check search demand, internal links, sitemap inclusion, and whether filters are involved | High |
| Product pages | Depends | Check product value, uniqueness, availability, and whether products rely on blocked resources | Medium to high |
| Location or city pages | Often yes | Check whether these pages support local or regional search intent | High if used for acquisition |
| Internal search results | Usually no | Confirm they are not being used as landing pages | Low to medium |
| Filter or sort parameters | Depends | Check whether filtered URLs are crawl waste or intentional landing pages | Medium |
| CSS, JavaScript, images, theme files | Depends | Check rendering, mobile usability, and whether blocked resources affect page understanding | Medium to high |
| URLs listed in XML sitemap | Often yes | Confirm why a URL is being submitted but blocked from crawling | High |
This table is not a substitute for a technical audit, but it helps avoid two common mistakes: unblocking everything without thinking, or leaving important templates blocked because the site looks fine in a browser.
Robots.txt vs noindex vs canonical tags
Robots.txt is often confused with other SEO controls. That confusion can lead to the wrong fix.
Robots.txt controls crawling
Robots.txt tells crawlers whether they are allowed to access certain URLs or resources.
If a page is blocked by robots.txt, search engines may not be able to crawl the page content properly.
Noindex controls indexing
A noindex tag tells search engines not to include a page in search results.
But for a search engine to see a noindex tag, it usually needs to crawl the page. If the page is blocked by robots.txt, the crawler may not be able to access the noindex instruction.
Canonical tags suggest the preferred version
A canonical tag helps indicate the preferred version of similar or duplicate pages.
It is not the same as blocking a page from crawling. It is also not a guaranteed instruction. It works best when search engines can crawl and compare the relevant pages.
The simple rule is this:
- use robots.txt to control crawl access
- use noindex to keep crawlable pages out of search results
- use canonical tags to consolidate similar or duplicate pages where a preferred version exists
Before changing any of them, confirm what problem you are solving.
How to prioritise fixes
After the decision table, prioritisation should be simple: fix blocked patterns that affect acquisition, conversion, product discovery, or rendering first.
Highest-priority checks usually include:
- service URLs that explain what the business offers
- ecommerce category templates that target non-branded searches
- product paths with search demand, unique content, or strong sales value
- city or location URLs used for regional visibility
- pricing, package, comparison, or enquiry-supporting pages
- CSS, JavaScript, images, or template files needed to render key pages correctly
- sitemap URLs that are being submitted but blocked from crawling
Lower-priority blocks usually include admin areas, login paths, staging folders, internal search results, and duplicate parameter combinations, provided they are not being used as public landing pages.
The aim is not to make every URL crawlable. The aim is to stop robots.txt from hiding URLs and resources that search engines need in order to understand the parts of the site that support business growth.
When to ask for a technical SEO review
A robots.txt issue can be simple, but it can also be part of a larger crawl and indexation problem.
It is worth asking for a technical SEO audit when:
- your site has recently been redesigned or migrated
- important pages are not being crawled or discovered
- sitemap URLs are blocked by robots.txt
- Search Console shows blocked-by-robots.txt issues
- an ecommerce store has complex filters or URL parameters
- different tools show different crawl results
- developers, plugins, or CMS settings may have changed crawl controls
- you are unsure which blocked URLs should stay blocked
A proper robots.txt and crawl-access review should not simply delete rules. It should identify which rules are intentional, which URL patterns are risky, which sitemap URLs conflict with robots.txt, and which templates should be prioritised.
That is the difference between a quick technical tweak and useful SEO decision-making.
For broader support with crawlability, indexation, rendering, and site structure, see our technical SEO services page.
Practical takeaway
Robots.txt problems are dangerous because they can be invisible to the people using the website every day.
The page loads. The design works. The navigation looks normal. But crawlers may be blocked from the URLs or resources that help search engines understand the site.
A strong robots.txt review should answer five questions:
- What rule is causing the block?
- Which real URLs match that rule?
- Are those URLs in the sitemap or internal navigation?
- Do those URLs support visibility, enquiries, sales, or product discovery?
- Should the rule be kept, narrowed, removed, or replaced with a better SEO control?
The mistake is not having a robots.txt file. The mistake is letting old, broad, or untested rules decide which parts of the business search engines can crawl.
Need a robots.txt and crawl-access check?
Not sure whether your robots.txt file is blocking service URLs, category templates, product paths, location pages, sitemap URLs, or key resources?
SEO Strategist can review your robots.txt rules, XML sitemap conflicts, crawl access, rendering signals, and affected page types so you can see what should stay blocked, what should be narrowed, and what needs fixing first.
Request a robots.txt and technical SEO audit
FAQs
Can robots.txt stop a page from ranking?
Robots.txt can prevent or limit crawling, which can affect how search engines discover, understand, or refresh a page. It does not work in exactly the same way as a noindex tag, but it can still create SEO problems when important pages are blocked from crawler access.
Is robots.txt the same as noindex?
No. Robots.txt controls crawling. Noindex controls indexing. A noindex tag tells search engines not to include a page in search results, but search engines usually need to crawl the page to see that instruction. Blocking a page in robots.txt can prevent crawlers from accessing the page content and its directives.
Should all blocked pages be unblocked?
No. Some blocked pages should stay blocked. Admin areas, staging sections, duplicate URL patterns, and low-value crawl paths may need restrictions. The goal is to check whether important public URLs or resources have been blocked by mistake.
How do I know if a page is blocked by robots.txt?
Check the robots.txt file, test the URL pattern, inspect the URL in Google Search Console, compare affected URLs against your XML sitemap, and run a technical crawl. Look for rules that block folders, parameters, templates, or resources connected to important pages.
What is an example of a risky robots.txt rule?
A rule like Disallow: /services/ can be risky for a service business because it may block the main pages that explain what the business offers. A rule like Disallow: /category/ can be risky for an ecommerce store if category pages are important organic landing pages.
When should I get help with robots.txt?
Get help when the site is large, ecommerce-led, recently redesigned, recently migrated, or showing unexplained crawl and indexation issues. It is also worth getting a review when sitemap URLs appear to be blocked or when you are unsure whether a robots.txt rule is still needed.
