The Google representatives we spoke to said that one of the worst problems people have with robots.txt is accidentally blocking their homepage, which can seriously hurt a site's chances of getting spidered. Also, pages that appear both in your robots.txt file and in your sitemap are excluded from Google's index; the robots.txt is always obeyed. The robots analysis tool from Google Sitemaps makes it pretty obvious if you have something set terribly wrong. The header tells you right away whether your robots.txt is blocking access to your homepage and when Googlebot last downloaded the file.
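To make the sitemap conflict concrete, here is a minimal Python sketch that lists the sitemap URLs a given robots.txt would block. It relies on the standard library's urllib.robotparser, which is not Google's parser and may read some rules differently. The www.example.com URLs, the sample robots.txt, the sitemap namespace, and the blocked_sitemap_urls helper are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace of the sitemaps.org 0.9 schema (an assumption; a sitemap may use another).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample data, not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/private/report.html</loc></url>
</urlset>
"""


def blocked_sitemap_urls(robots_txt, sitemap_xml, user_agent="Googlebot"):
    """Return the sitemap URLs that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


conflicts = blocked_sitemap_urls(ROBOTS_TXT, SITEMAP_XML)
for url in conflicts:
    print("In sitemap but blocked by robots.txt:", url)
```

Any URL this reports is one you should either drop from the sitemap or unblock in robots.txt, depending on which of the two you actually meant.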
There's a lot more to the tool than just this. You can also see Google's view of your robots.txt and change it to see what effect your changes would have. This is Google's editable view of your robots.txt:
You can edit the textbox above, which feeds into the robots tool below. Basically, Google allows you to test-drive its various spiders: Googlebot, Googlebot-Mobile, Googlebot-Image, Googlebot-MediaPartners. First, you edit the robots.txt the spider sees above, then enter the URLs to query (to see whether they are blocked or allowed to be indexed), and choose the crawlers you want to test. The results end up looking something like the ones below.
As you can see in the results at the bottom of the shot, SEO Chat's homepage is allowed to be indexed, while my author bio page is blocked. In case the screenshot above is too small, here are the details Google gives me about the blocked page:
Blocked by line 2 : Disallow: /cp/bio Detected as a directory; specific files may have different restrictions
Sure enough, if you scroll up, line 2 of the robots file is: Disallow: /cp/bio.
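If you want a rough offline version of the same check, Python's standard urllib.robotparser can evaluate URLs against a robots.txt string. This is only a sketch under assumptions: the "User-agent: *" line is a guess at what line 1 of the file contains (only the Disallow line is quoted above), the www.example.com URLs stand in for the real ones, and check_urls is a hypothetical helper. Python's parser is not Googlebot, so its verdict is a sanity check rather than Google's own answer.

```python
from urllib.robotparser import RobotFileParser

# Assumed file: only "Disallow: /cp/bio" (line 2) comes from the article;
# the User-agent line is a guess at line 1.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cp/bio
"""

# Placeholder URLs standing in for the homepage and the author bio page.
HOME_URL = "http://www.example.com/"
BIO_URL = "http://www.example.com/cp/bio/"


def check_urls(robots_txt, user_agent, urls):
    """Map each URL to True (allowed) or False (blocked) for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


results = check_urls(ROBOTS_TXT, "Googlebot", [HOME_URL, BIO_URL])
for url, allowed in results.items():
    print(url, "allowed" if allowed else "blocked")
```

Note that this parser matches Disallow paths as plain prefixes, so /cp/bio also blocks everything underneath it.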
Being able to change the user-agent lets you test whether the right crawlers are hitting the right pages, in case your robots.txt makes different demands of different spiders. This is great assurance that you aren't blocking access to the wrong ones by accident, especially since Google is basically validating the instructions itself instead of using a third-party script. SEO Chat doesn't give different instructions to different crawlers, so all of them turn out the same for us.
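For a site that does give crawlers different instructions, the per-crawler comparison might be sketched in Python like this. Everything here is hypothetical: the robots.txt content, the /photos/ path, the www.example.com URLs, and the crawl_matrix helper. It uses the standard urllib.robotparser, which picks the first group whose name is a substring of the user-agent, so it will not always agree with how Google chooses a group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a separate group for the image crawler.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /photos/

User-agent: *
Disallow: /cp/bio
"""

CRAWLERS = ["Googlebot", "Googlebot-Mobile", "Googlebot-Image"]
PHOTO_URL = "http://www.example.com/photos/cat.jpg"
BIO_URL = "http://www.example.com/cp/bio/"


def crawl_matrix(robots_txt, crawlers, urls):
    """Return {crawler: {url: allowed}} for every crawler and URL pair."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        agent: {url: parser.can_fetch(agent, url) for url in urls}
        for agent in crawlers
    }


matrix = crawl_matrix(ROBOTS_TXT, CRAWLERS, [PHOTO_URL, BIO_URL])
for agent, verdicts in matrix.items():
    for url, allowed in verdicts.items():
        print(f"{agent:17} {'allowed' if allowed else 'blocked':8} {url}")
```

In this sample the image crawler's own group replaces the catch-all group rather than adding to it, so the bio page ends up open to Googlebot-Image even though it is blocked for everyone else. That is exactly the kind of accident a per-crawler test is meant to catch.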
While experienced SEOs might be comfortable with robots.txt already, it's definitely worthwhile to visit this tool anyway. Everyone makes typos, and not checking could create real indexing problems. For those a little unsure about whether your robots.txt file is working exactly like you want it to, this tool has everything you should need.





















