Robots Running Amok!
by John Greer on 08/05/2009 at 3:29 pm in SEO, SEO Best Practices
Robots (a.k.a. crawlers or spiders) are automated programs that discover the pages and content on a website. Search engines rely on them, which makes robots integral to natural search traffic; they also play a role in paid search traffic.
To make your way through this future of shiny metal and emotionless automatons, get to know an important but under-appreciated file on your website: “robots.txt”. It serves as a guide for robots on how to behave on a site. Other robots besides search engines, including some nefarious ones (kind of like Decepticons), can sometimes be managed by this file as well.
Some tips for ensuring your site is prepared for search engine robots include:
- List all of your XML sitemaps. An XML sitemap is an easy source for robots to find all of your pages, videos, and other files. By listing the sitemap URLs in robots.txt (one “Sitemap:” line per sitemap), you make it easier for search engines to discover, and send traffic to, any of your pages.
- Tell the robots which areas are off-limits. This doesn’t involve Asimov’s Three Laws, but you can prohibit access to files and directories that shouldn’t be found in search engines. Additionally, you can keep robots focused on crawling only important pages and keep them from spending all their time on unimportant content.
- Always serve robots the same content you serve users. Otherwise you can be hit with a penalty for misleading the search engines, a practice known as cloaking.
- Avoid crawl delay for search engines. In rare cases, a robot can take up a lot of your server’s resources. You can add code to slow the robot down in these cases, but slowing a major search engine down can cause a traffic loss.
- Test your file! Once you’re all done, use the free tools from Google Webmaster Tools and Bing Webmaster Center to make sure everything is working properly.
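For a quick local check before reaching for the webmaster tools, the tips above can be sketched with Python's standard `urllib.robotparser` module. Everything in the sample file is an assumption made up for illustration: the `/admin/` and `/search` paths, the sitemap URL, and the choice of Googlebot as the robot being tested. Treat it as a rough sanity check under those assumptions, not a substitute for the search engines' own testers, which may interpret rules differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: the blocked paths and the sitemap URL
# are illustrative assumptions, not taken from a real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search

Sitemap: http://www.YourGreatSite.com/sitemap.xml
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
site = "http://www.YourGreatSite.com"

# An ordinary page should be crawlable; the off-limits area should not be.
print(rules.can_fetch("Googlebot", site + "/products/widget.html"))
print(rules.can_fetch("Googlebot", site + "/admin/login"))

# The sitemaps listed in the file (site_maps() needs Python 3.8 or later).
print(rules.site_maps())

# No Crawl-delay line was added, so no delay is reported for this robot.
print(rules.crawl_delay("Googlebot"))
```

The sample file deliberately contains no crawl delay, in line with the advice above about not slowing down major search engines.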
This page can always be found at www.YourGreatSite.com/robots.txt. Don’t be surprised if it hasn’t been created – even some major sites like Yahoo haven’t made one.
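Because the file always sits at the root of the host, its address can be worked out from any page URL on the site. Here is a minimal sketch using Python's standard `urllib.parse`; the helper name `robots_url` and the example page URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL, used only to show the result.
print(robots_url("http://www.YourGreatSite.com/products/widget.html?color=red"))
```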