
Robots.txt and Sitemap Error

by @akhtar (157), 6 years ago

Google Search Console is indicating that URLs in my sitemap are blocked by the robots.txt file. When I test them with the robots.txt testing tool, though, it says ALLOWED.

I'm wondering why Google reports that the sitemap contains URLs blocked by robots.txt. How can I resolve this issue?

by @ms (3855), 5 years ago (best answer)

Your robots.txt file looks good to me. Allow isn't defined in the original robots.txt standard, but major crawlers should understand this directive.

Quoting http://tools.seochat.com/tools/robots-txt-validator:

The official standard does not include the Allow directive, even though major crawlers (Google and Bing) support it. If both Disallow and Allow clauses apply to a URL, the most specific rule (the longest rule) applies. To be on the safe side and stay compatible with all robots, if one wants to allow single files inside an otherwise disallowed directory, it is necessary to place the Allow directive(s) first, followed by the Disallow. It is still nonstandard.
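To make the two precedence behaviours concrete, here is a minimal Python sketch (not any crawler's actual code) comparing longest-match resolution, as the quote describes it, with the first-match behaviour of Python's standard urllib.robotparser. The /folder/ paths and the helper names are hypothetical, and the longest-match helper deliberately ignores user-agent groups and the * and $ wildcards; ties go to Allow, the least restrictive rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt files: a disallowed directory with one file re-allowed,
# written once with Disallow first and once with Allow first.
DISALLOW_FIRST = """\
User-agent: *
Disallow: /folder/
Allow: /folder/page.html
"""

ALLOW_FIRST = """\
User-agent: *
Allow: /folder/page.html
Disallow: /folder/
"""


def parse_rules(robots_txt: str) -> list[tuple[bool, str]]:
    """Hypothetical helper: collect (is_allow, path) pairs.
    Simplification: treats the file as one group, ignores wildcards."""
    rules = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() in ("allow", "disallow") and value:
            rules.append((field.lower() == "allow", value))
    return rules


def longest_match_allowed(robots_txt: str, path: str) -> bool:
    """Longest matching rule wins; on a tie Allow wins; no match means allowed."""
    best_len, allowed = -1, True
    for is_allow, rule_path in parse_rules(robots_txt):
        if path.startswith(rule_path):
            length = len(rule_path)
            if length > best_len or (length == best_len and is_allow):
                best_len, allowed = length, is_allow
    return allowed


def first_match_allowed(robots_txt: str, path: str) -> bool:
    """urllib.robotparser applies the first matching rule in file order."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", path)


for name, txt in (("Disallow first", DISALLOW_FIRST), ("Allow first", ALLOW_FIRST)):
    print(name,
          "| longest match:", longest_match_allowed(txt, "/folder/page.html"),
          "| first match:", first_match_allowed(txt, "/folder/page.html"))
```

With Disallow first, the longest-match check allows /folder/page.html while the first-match parser blocks it; with Allow first, both allow it. That is why putting the Allow lines first is the safe ordering.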

I guess an empty robots.txt would work just fine for you, just like having no robots.txt at all.
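For what it's worth, Python's standard urllib.robotparser behaves the same way. A minimal sketch, assuming an arbitrary example URL, the Googlebot user agent and a hypothetical helper name, showing that an empty file and an explicit empty Disallow both allow everything, while "Disallow: /" blocks everything:

```python
from urllib.robotparser import RobotFileParser


def allowed_under(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Hypothetical helper: parse robots.txt text and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


url = "https://example.com/any/page.html"  # placeholder URL
print(allowed_under("", url))                             # empty robots.txt
print(allowed_under("User-agent: *\nDisallow:\n", url))   # explicit allow-all
print(allowed_under("User-agent: *\nDisallow: /\n", url)) # block everything
```

When it fetches a live file with read(), RobotFileParser also treats a 4xx response other than 401/403 (for example a missing robots.txt) as "allow all".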

by @ms (3855), 5 years ago

Post your robots.txt and the exact URLs Search Console reports errors for. I can't really tell without seeing your robots.txt file.

by @akhtar (157), 5 years ago

Thanks, Martin, for the help and information.

by @ms (3855), 5 years ago

No problem, glad to help, @akhtar. You can accept the answer if it answered your question.

by @lishmaliny (175), 4 years ago

Blocked sitemap URLs are typically caused by web developers misconfiguring their robots.txt file. Check for any Disallow rules in your robots.txt that match the URLs listed in your sitemap. The robots.txt file must be located in your site's root directory, for example: https://example.com/robots.txt
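One way to do that check is to run every sitemap URL against the robots.txt rules yourself. Below is a minimal Python sketch using only the standard library; the example.com URLs, the /private/ rule, the Googlebot user agent and the helper names are hypothetical placeholders. Caveats: urllib.robotparser applies the first matching rule and does not understand Google's * and $ wildcards, so its verdict can differ from Google's (treat flagged URLs as candidates to re-test in Search Console), and the sketch only reads a plain <urlset> sitemap, not a sitemap index.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def robots_url_for(page_url: str) -> str:
    """Hypothetical helper: robots.txt lives at the host root, scheme://host/robots.txt."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def blocked_sitemap_urls(sitemap_xml: str, robots_txt: str,
                         agent: str = "Googlebot") -> list[str]:
    """Hypothetical helper: return the <loc> URLs that robots_txt blocks for agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip()
            for el in root.findall("sm:url/sm:loc", SITEMAP_NS) if el.text]
    return [loc for loc in locs if not parser.can_fetch(agent, loc)]


# Placeholder inputs; in practice download your real robots.txt and sitemap.
ROBOTS = """\
User-agent: *
Disallow: /private/
"""

SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/private/report.html</loc></url>
</urlset>
"""

print(robots_url_for("https://example.com/blog/post.html"))
print(blocked_sitemap_urls(SITEMAP, ROBOTS))
```

Any URL this reports either shouldn't be in the sitemap or needs its Disallow rule loosened.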
