Robots.txt becomes an SEO problem when it blocks URLs or site resources that search engines need to crawl. The site may still work perfectly for visitors while crawlers are being told not to access important sections of it.
That is risky when the blocked areas include service pages, ecommerce categories, product pages, location pages, pricing pages, or the files needed to render those pages properly.
A robots.txt issue is easy to miss because it does not usually break the visible website. A user can still open the page. The menu can still work. The design can still look fine. But search engines may be receiving crawl instructions that prevent them from checking, refreshing, or understanding parts of the site that support leads and sales.
Before blaming content quality, backlinks, rankings, or Google updates, it is worth checking whether your own robots.txt file is quietly blocking something important.

What robots.txt blocking actually means
Robots.txt is a plain text file that gives crawlers instructions about which parts of a website they are allowed to crawl.
It usually sits at a URL like:
https://seostrategist.co.za/robots.txt
A basic robots.txt rule may look like this:
User-agent: *
Disallow: /admin/
This tells all crawlers that follow robots.txt instructions not to crawl URLs inside the /admin/ folder.
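If you want to verify how a rule like this behaves before trusting it, Python's standard-library `urllib.robotparser` can parse the rules and answer per-URL questions. This is a minimal sketch: the domain and URLs are placeholders, and the rules are fed in directly rather than fetched from a live site.

```python
from urllib import robotparser

# Parse the example rules directly; on a real site you would point
# set_url() at the live /robots.txt and call read() instead.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
])

# URLs inside /admin/ are off-limits for compliant crawlers...
print(rp.can_fetch("Googlebot", "https://example.com/admin/login"))
# ...while everything outside that folder stays crawlable.
print(rp.can_fetch("Googlebot", "https://example.com/services/seo/"))
```

The same two-line check works for any rule discussed below: parse the file, then ask `can_fetch` about real URLs from the site.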
That kind of rule can be useful. Many websites use robots.txt to keep crawlers away from admin areas, internal search results, duplicate parameter URLs, staging sections, or low-value crawl paths.
The problem starts when a rule is too broad, outdated, copied from a staging site, or written without checking which URLs sit inside the blocked path.
Robots.txt does not remove a page from the website for users. It controls crawler access. That is why a business owner, marketer, or developer can open a page in the browser and assume everything is fine while search engines may be blocked from crawling it.
There is another important nuance: robots.txt is not a privacy or security tool. A blocked URL can still be found in other ways, and in some cases a search engine may know the URL exists without being able to crawl the page content properly. That is why robots.txt should be used for crawl control, not for hiding sensitive content.
Why this becomes a revenue problem
Not every blocked URL is bad. Some pages should stay out of crawl paths.
The risk appears when robots.txt blocks URLs or resources that help the business generate visibility, enquiries, sales, bookings, or trust.
This can affect:
- core service pages
- ecommerce category pages
- product pages
- city or location landing pages
- pricing, package, or comparison pages
- lead-generation pages
- pages listed in the XML sitemap
- JavaScript, CSS, image, or template resources needed for rendering
- pages that need to be crawled after updates
For example, if an ecommerce store accidentally blocks /category/, search engines may struggle to crawl a major part of the store structure. If a service business blocks /services/, its main commercial pages may become harder to access. If a WordPress site blocks key theme or script files, search engines may have a weaker view of how the page renders.
The commercial question is not simply “is anything blocked?”
The better question is: does the blocked pattern overlap with pages that search engines need to crawl for the business to compete?
For a deeper technical review of how robots.txt and sitemaps should work together, see our guide to robots.txt and XML sitemap setup.
Examples of risky robots.txt rules
The fastest way to understand robots.txt risk is to look at the rule and then ask what URL patterns it affects.
These examples are simplified, but they show how small rules can create large crawl problems.
1. A staging block accidentally pushed live
User-agent: *
Disallow: /
This tells crawlers not to crawl any URL on the site.
It is often used on staging or development sites before launch. It becomes a serious issue if it is copied to the live website by mistake during a redesign, rebuild, or migration.
What to check:
- Was the site recently launched or migrated?
- Is the live domain using a staging robots.txt file?
- Are all important URLs blocked in crawler tests?
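The fastest way to confirm the worst case is to test the homepage itself against the rules. With a full-site block, even that fails. A sketch, using placeholder URLs:

```python
from urllib import robotparser

# Simulate a staging robots.txt accidentally deployed to the live domain.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# With "Disallow: /", every URL on the site is blocked,
# including the homepage.
print(rp.can_fetch("Googlebot", "https://example.com/"))
print(rp.can_fetch("Googlebot", "https://example.com/services/"))
```

If the homepage comes back blocked, treat it as an urgent launch-day fix rather than a routine SEO task.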
2. A service-folder block
User-agent: *
Disallow: /services/
This blocks URLs inside the /services/ folder.
For a service business, that could include the pages most closely tied to enquiries. If the business has URLs such as /services/technical-seo/, /services/local-seo/, or /services/seo-audit/, this rule needs urgent review.
What to check:
- Are core service pages inside the blocked folder?
- Are those URLs in the XML sitemap?
- Did the rule come from an old development or duplicate-content decision?
3. An ecommerce category block
User-agent: *
Disallow: /category/
This can be dangerous for ecommerce sites if category pages are important organic landing pages.
A store may intend to block thin filtered URLs or duplicate paths, but a broad category block can affect the pages that organise products and target non-branded searches.
What to check:
- Are high-value product categories inside this path?
- Do these category pages target search demand?
- Are filters or parameters being confused with the main category structure?
4. A parameter rule that catches useful pages
User-agent: *
Disallow: /*?filter=
Parameter rules can be useful on ecommerce sites, but they need careful handling.
This type of rule may help control crawl waste from filtered combinations. But if filtered URLs are part of the site’s SEO strategy, or if the rule pattern catches more than intended, it can hide useful landing-page variants from crawlers.
Pattern matching can vary by crawler. Major search engines may support wildcard-style matching, but not every bot interprets robots.txt patterns in exactly the same way. Test the rule using the search engines and crawl tools that matter for your site.
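To make the variation concrete, here is a rough sketch of the wildcard semantics major search engines document, where `*` matches any run of characters and `$` anchors the end of the URL. This is an illustration for testing your own patterns, not any search engine's actual implementation, and the example URLs are hypothetical:

```python
import re

def wildcard_match(pattern: str, path: str) -> bool:
    """Sketch of documented wildcard matching: '*' matches any sequence
    of characters and '$' anchors the end of the path. Simpler robots.txt
    parsers may instead treat these characters literally."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "$":
            parts.append("$")
        else:
            parts.append(re.escape(ch))
    return re.match("".join(parts), path) is not None

# Under wildcard matching, the rule catches any path containing ?filter=
print(wildcard_match("/*?filter=", "/shoes?filter=red"))
print(wildcard_match("/*?filter=", "/shoes?sort=price"))
```

A parser that only does literal prefix matching would treat `/*?filter=` as the characters `/*?filter=` and block almost nothing, which is exactly why the same rule can behave differently across crawlers.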
What to check:
- Are filtered pages intentionally indexable or purely low-value duplicates?
- Do any filtered pages receive internal links from category pages?
- Are product or collection URLs using the same parameter pattern?
5. Blocking WordPress content resources
User-agent: *
Disallow: /wp-content/
This does not usually block the main page URL itself, but it can block files that help search engines render and understand the page.
On a WordPress site, /wp-content/ can include themes, plugins, images, CSS, and JavaScript. Blocking everything in that folder may create rendering or resource-access issues.
What to check:
- Are important CSS or JavaScript files blocked?
- Are images needed for content or products blocked?
- Does Search Console show blocked resources or rendering differences?
6. Blocking a location-page folder
User-agent: *
Disallow: /locations/
If a business uses city or location landing pages, this type of rule can affect local or regional search visibility.
For example, a company targeting Johannesburg, Cape Town, Durban, or Pretoria may rely on location pages to explain service availability. If the whole folder is blocked, crawlers may not be able to access those pages properly.
What to check:
- Are location pages part of the search strategy?
- Are they unique and useful, or thin duplicates?
- Are they linked from service pages, contact pages, or local SEO content?
Common causes behind accidental robots.txt blocks
Most robots.txt problems come from routine website work rather than one obvious mistake. After you have reviewed the risky rule examples above, look for the source of the change.
Common causes include:
- a staging or development robots.txt file being pushed to the live domain
- an old rule staying in place after the URL structure changed
- an SEO plugin, CMS setting, security tool, or ecommerce platform generating rules automatically
- a developer blocking a folder during testing and not removing the rule before launch
- an ecommerce crawl-control rule being applied too broadly to categories, filters, products, or collections
- resource folders being blocked without checking whether CSS, JavaScript, images, or template files are needed for rendering
The cause matters because it affects the fix. A copied staging block may need urgent removal. A filter rule may only need narrowing. A legacy rule may need to be checked against the current sitemap and internal navigation before anything changes.
Signs your robots.txt file may be blocking the wrong pages
Robots.txt problems are not always obvious from the front end. Look for signals across crawl tools, sitemap data, and Google Search Console.
Common warning signs include:
- important URLs missing from crawl reports
- XML sitemap URLs showing crawl access issues
- newly launched pages not being crawled
- Search Console showing blocked-by-robots.txt messages
- sudden crawl changes after a redesign or migration
- service, category, product, or location sections missing from crawl data
- important resources blocked during rendering checks
- crawl results that do not match the site navigation or sitemap
The next step is to separate harmless blocks from risky blocks.
Blocking /wp-admin/ is usually expected. Blocking /services/, /category/, /products/, or /locations/ needs closer inspection.
What to check before editing robots.txt
Do not treat robots.txt changes as a quick copy-and-paste fix. A useful review should identify what the rule does, which URL patterns it affects, whether those URLs should be crawlable, and what happens if the rule is changed.
Use this workflow.
1. Open the robots.txt file
Review the file directly at the root of the domain.
Look for rules using Disallow, especially broad rules affecting folders, parameters, product sections, service sections, location sections, or recently changed URL paths.
2. Interpret each important rule
For each rule, ask what it actually blocks.
A rule like this:
Disallow: /products/
is very different from this:
Disallow: /products?sort=
The first may block a whole product folder. The second may block a sorting parameter pattern. One may affect core product discovery. The other may be a reasonable crawl-control rule.
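The difference between the two rules is easy to demonstrate. A sketch with `urllib.robotparser` and hypothetical product URLs:

```python
from urllib import robotparser

# Rule 1: block the whole /products/ folder.
folder = robotparser.RobotFileParser()
folder.parse(["User-agent: *", "Disallow: /products/"])

# Rule 2: block only the sort-parameter pattern.
param = robotparser.RobotFileParser()
param.parse(["User-agent: *", "Disallow: /products?sort="])

# The folder rule blocks every product page...
print(folder.can_fetch("Googlebot", "https://example.com/products/red-shoes"))
# ...while the parameter rule leaves product pages crawlable...
print(param.can_fetch("Googlebot", "https://example.com/products/red-shoes"))
# ...and only catches the sorted variants.
print(param.can_fetch("Googlebot", "https://example.com/products?sort=price"))
```

Running this kind of side-by-side test makes it obvious whether a rule is crawl control or a blocked revenue template.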
3. Test affected URL examples
Do not only inspect the rule. Test real URLs that match the pattern.
For example, if the rule is:
Disallow: /category/
check whether URLs like these are affected:
/category/running-shoes/
/category/office-furniture/
/category/industrial-supplies/
The goal is to understand which templates and page types sit inside the blocked path.
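Testing real URLs in bulk can be scripted. A sketch that runs the example category URLs above through the rule, with a placeholder domain:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /category/"])

# Hypothetical category URLs that match the blocked pattern.
for path in ("/category/running-shoes/",
             "/category/office-furniture/",
             "/category/industrial-supplies/"):
    allowed = rp.can_fetch("Googlebot", "https://example.com" + path)
    print(path, "->", "crawlable" if allowed else "blocked")
```

Feeding in a URL list exported from your crawler or sitemap turns this into a quick audit of exactly which templates sit inside the blocked path.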
4. Compare robots.txt with the XML sitemap
Your XML sitemap should contain URLs you want search engines to discover and crawl.
If the sitemap includes URLs blocked by robots.txt, the site is sending mixed signals.
A sitemap is effectively saying, “these URLs are important”. The robots.txt file may be saying, “do not crawl them”. That conflict should be reviewed before any broader SEO conclusions are made.
You can read more about this relationship on our robots.txt and XML sitemap setup page.
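Checking for this conflict can also be automated: parse the sitemap, then test every listed URL against the robots rules. A sketch with a hypothetical inline sitemap (in practice you would fetch the live `/sitemap.xml`):

```python
import xml.etree.ElementTree as ET
from urllib import robotparser

# Hypothetical sitemap content for illustration.
SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/services/technical-seo/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>"""

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /services/"])

# Flag every sitemap URL that robots.txt blocks from crawling.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
for loc in ET.fromstring(SITEMAP).findall(".//sm:loc", ns):
    url = loc.text.strip()
    if not rp.can_fetch("Googlebot", url):
        print("Conflict: sitemap lists a blocked URL:", url)
```

Any URL this prints is a mixed signal worth resolving before drawing broader SEO conclusions.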
5. Inspect examples in Google Search Console
Use URL inspection and indexing reports to check affected URLs.
Look for blocked-by-robots.txt messages, crawl access issues, and examples of URLs Google has discovered but cannot crawl properly.
Search Console should not be the only diagnostic source, but it helps confirm how Google is seeing the issue.
6. Run two technical crawls
Run one crawl that respects robots.txt and another controlled crawl that ignores robots.txt.
Compare the difference.
If the second crawl discovers important pages missing from the first crawl, robots.txt is probably hiding useful crawl paths.
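The comparison itself is a simple set difference once both crawls have exported their URL lists. A sketch with hypothetical crawl results:

```python
# Hypothetical URL sets exported from the two crawls of the same site.
respecting_robots = {"/", "/about/", "/contact/"}
ignoring_robots = {"/", "/about/", "/contact/",
                   "/services/seo-audit/", "/services/local-seo/"}

# URLs only the unrestricted crawl could reach are hidden by robots.txt.
hidden = ignoring_robots - respecting_robots
for url in sorted(hidden):
    print("Hidden by robots.txt:", url)
```

Group the hidden URLs by template before deciding anything: two hidden service pages matter far more than two hundred hidden parameter variants.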
7. Prioritise the affected templates
Once you know what is blocked, group the affected URLs by template or page type.
A blocked admin path is low concern. A blocked service-folder template, category template, product template, or location-page template is a higher priority.
8. Decide whether to keep, refine, or remove the rule
The decision should not be “blocked equals bad”. The decision should be based on URL value.
Some rules should stay. Some should be narrowed. Some should be removed. Some should be replaced with a better noindex, canonical, parameter-handling, or internal-linking decision.
Decision table: what to do after finding a blocked URL
Use this as a practical triage guide.
| Blocked URL type | Is it usually a problem? | What to check first | Priority level |
|---|---|---|---|
| Admin or login areas | Usually no | Confirm they are not in the XML sitemap and not linked as public pages | Low |
| Staging or test folders | Usually no, if not live content | Confirm the rule is not blocking the live site or public launch URLs | Medium to high after launch |
| Core service pages | Usually yes | Check whether service URLs are in the blocked path and sitemap | High |
| Ecommerce category pages | Often yes | Check search demand, internal links, sitemap inclusion, and whether filters are involved | High |
| Product pages | Depends | Check product value, uniqueness, availability, and whether products rely on blocked resources | Medium to high |
| Location or city pages | Often yes | Check whether these pages support local or regional search intent | High if used for acquisition |
| Internal search results | Usually no | Confirm they are not being used as landing pages | Low to medium |
| Filter or sort parameters | Depends | Check whether filtered URLs are crawl waste or intentional landing pages | Medium |
| CSS, JavaScript, images, theme files | Depends | Check rendering, mobile usability, and whether blocked resources affect page understanding | Medium to high |
| URLs listed in XML sitemap | Often yes | Confirm why a URL is being submitted but blocked from crawling | High |
This table is not a substitute for a technical audit, but it helps avoid two common mistakes: unblocking everything without thinking, or leaving important templates blocked because the site looks fine in a browser.
Robots.txt vs noindex vs canonical tags
Robots.txt is often confused with other SEO controls. That confusion can lead to the wrong fix.
Robots.txt controls crawling
Robots.txt tells crawlers whether they are allowed to access certain URLs or resources.
If a page is blocked by robots.txt, search engines may not be able to crawl the page content properly.
Noindex controls indexing
A noindex tag tells search engines not to include a page in search results.
But for a search engine to see a noindex tag, it usually needs to crawl the page. If the page is blocked by robots.txt, the crawler may not be able to access the noindex instruction.
Canonical tags suggest the preferred version
A canonical tag helps indicate the preferred version of similar or duplicate pages.
It is not the same as blocking a page from crawling. It is also not a guaranteed instruction. It works best when search engines can crawl and compare the relevant pages.
The simple rule is this:
- use robots.txt to control crawl access
- use noindex to keep crawlable pages out of search results
- use canonical tags to consolidate similar or duplicate pages where a preferred version exists
Before changing any of them, confirm what problem you are solving.
How to prioritise fixes
After the decision table, prioritisation should be simple: fix blocked patterns that affect acquisition, conversion, product discovery, or rendering first.
Highest-priority checks usually include:
- service URLs that explain what the business offers
- ecommerce category templates that target non-branded searches
- product paths with search demand, unique content, or strong sales value
- city or location URLs used for regional visibility
- pricing, package, comparison, or enquiry-supporting pages
- CSS, JavaScript, images, or template files needed to render key pages correctly
- sitemap URLs that are being submitted but blocked from crawling
Lower-priority blocks usually include admin areas, login paths, staging folders, internal search results, and duplicate parameter combinations, provided they are not being used as public landing pages.
The aim is not to make every URL crawlable. The aim is to stop robots.txt from hiding URLs and resources that search engines need in order to understand the parts of the site that support business growth.
When to ask for a technical SEO review
A robots.txt issue can be simple, but it can also be part of a larger crawl and indexation problem.
It is worth asking for a technical SEO audit when:
- your site has recently been redesigned or migrated
- important pages are not being crawled or discovered
- sitemap URLs are blocked by robots.txt
- Search Console shows blocked-by-robots.txt issues
- an ecommerce store has complex filters or URL parameters
- different tools show different crawl results
- developers, plugins, or CMS settings may have changed crawl controls
- you are unsure which blocked URLs should stay blocked
A proper robots.txt and crawl-access review should not simply delete rules. It should identify which rules are intentional, which URL patterns are risky, which sitemap URLs conflict with robots.txt, and which templates should be prioritised.
That is the difference between a quick technical tweak and useful SEO decision-making.
For broader support with crawlability, indexation, rendering, and site structure, see our technical SEO services page.
Practical takeaway
Robots.txt problems are dangerous because they can be invisible to the people using the website every day.
The page loads. The design works. The navigation looks normal. But crawlers may be blocked from the URLs or resources that help search engines understand the site.
A strong robots.txt review should answer five questions:
- What rule is causing the block?
- Which real URLs match that rule?
- Are those URLs in the sitemap or internal navigation?
- Do those URLs support visibility, enquiries, sales, or product discovery?
- Should the rule be kept, narrowed, removed, or replaced with a better SEO control?
The mistake is not having a robots.txt file. The mistake is letting old, broad, or untested rules decide which parts of the business search engines can crawl.
Need a robots.txt and crawl-access check?
Not sure whether your robots.txt file is blocking service URLs, category templates, product paths, location pages, sitemap URLs, or key resources?
SEO Strategist can review your robots.txt rules, XML sitemap conflicts, crawl access, rendering signals, and affected page types so you can see what should stay blocked, what should be narrowed, and what needs fixing first.
Request a robots.txt and technical SEO audit
FAQs
Can robots.txt stop a page from ranking?
Robots.txt can prevent or limit crawling, which can affect how search engines discover, understand, or refresh a page. It does not work in exactly the same way as a noindex tag, but it can still create SEO problems when important pages are blocked from crawler access.
Is robots.txt the same as noindex?
No. Robots.txt controls crawling. Noindex controls indexing. A noindex tag tells search engines not to include a page in search results, but search engines usually need to crawl the page to see that instruction. Blocking a page in robots.txt can prevent crawlers from accessing the page content and its directives.
Should all blocked pages be unblocked?
No. Some blocked pages should stay blocked. Admin areas, staging sections, duplicate URL patterns, and low-value crawl paths may need restrictions. The goal is to check whether important public URLs or resources have been blocked by mistake.
How do I know if a page is blocked by robots.txt?
Check the robots.txt file, test the URL pattern, inspect the URL in Google Search Console, compare affected URLs against your XML sitemap, and run a technical crawl. Look for rules that block folders, parameters, templates, or resources connected to important pages.
What is an example of a risky robots.txt rule?
A rule like Disallow: /services/ can be risky for a service business because it may block the main pages that explain what the business offers. A rule like Disallow: /category/ can be risky for an ecommerce store if category pages are important organic landing pages.
When should I get help with robots.txt?
Get help when the site is large, ecommerce-led, recently redesigned, recently migrated, or showing unexplained crawl and indexation issues. It is also worth getting a review when sitemap URLs appear to be blocked or when you are unsure whether a robots.txt rule is still needed.
