- The Adventures of SEO Boy® - http://www.seoboy.com -
Take Control of Your SEO Destiny with a Robots.txt File
Posted By John On November 20, 2008 @ 3:16 pm In Basic SEO Tips, Nuts & Bolts of Optimization
One of the worst things a self-respecting SEO can do is feel out of control. “It doesn’t matter what I do, Google will decide how my site ranks in the end.” That’s dangerous talk, for sure. There are plenty of straightforward tasks that every site owner can perform to take control of their SEO destiny (or rankings, if you prefer). Of those tasks, the creation and correct implementation of a robots.txt file is among the most important.
What is a robots.txt file? In its simplest form, the robots.txt file is a text file that tells search engine spiders which directory paths and pages should and shouldn’t be crawled. With this one text document, you can communicate with all web crawlers at once, or with each individual crawler as needed, to pass along any necessary instructions.
Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.
Why is it important to use robots.txt? Put simply, not every page of your website needs to be crawled and potentially ranked in the SERPs. These could be utility pages (like a Terms of Service) or a directory that contains information visitors would have no use for. The robots.txt file lets you allow or disallow the crawling of that content. To think of it another way, robots.txt can be used to “shape” how the search engines see your site – restricting non-essential content so that the robots’ energy is spent crawling the pages you actually want indexed and ranked in the SERPs. And if all of that weren’t enough, Google’s official Webmaster Guidelines explicitly recommend its use.
Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled.
Basic Creation of a Robots.txt File
The syntax of a very basic robots.txt file uses two rules: User-agent and Disallow. The first designates which robot you’re addressing (e.g. Googlebot, Slurp, etc.), or all of them at once (with an asterisk, *). The second lists the pages you want blocked.
These two lines are considered a single entry in the file. You can include as many entries as you want, and a single entry can contain multiple Disallow lines and multiple user-agents.
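For illustration, a minimal entry that blocks two hypothetical directories for every crawler might look like this:

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
```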
Using the Disallow statement, you can block your entire site (Disallow: /), a directory (Disallow: /directory/), a single page (Disallow: /page.html), images, and even specific file types (Disallow: /*.gif$ – a wildcard pattern supported by the major engines, though not part of the original protocol).
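Combining those forms, a hypothetical file that applies several of these rules at once might read (the # comments are ignored by crawlers):

```
User-agent: *
Disallow: /private/    # block an entire directory
Disallow: /page.html   # block a single page
Disallow: /*.gif$      # block all GIF files (wildcard extension; not every crawler supports it)
```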
Things to Watch Out For
As with most things SEO, there are several pitfalls and oddities to avoid when using a robots.txt file. Here are some of the most important:
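One practical safeguard against such mistakes is to test your rules before uploading the file. The sketch below uses Python’s standard urllib.robotparser module against a hypothetical set of rules (note that the standard-library parser follows the original exclusion protocol and does not understand wildcard patterns like /*.gif$):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules we intend to publish as robots.txt.
rules = """User-agent: *
Disallow: /private/
Disallow: /page.html
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Confirm the homepage stays crawlable while the blocked paths do not.
print(parser.can_fetch("*", "http://www.example.com/"))                # True
print(parser.can_fetch("*", "http://www.example.com/private/a.html"))  # False
print(parser.can_fetch("*", "http://www.example.com/page.html"))       # False
```

Running a check like this against every path you care about takes seconds and can save you from accidentally blocking your whole site.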
So, as you can see, the robots.txt file is an important weapon in your SEO arsenal. You should never feel that the way search engines crawl and view your website is out of your control. All you need to do is create a basic text file, add a few statements and – voilà! – you’ve taken a step towards being a smart, savvy SEO.
URL to article: http://www.seoboy.com/take-control-of-your-seo-destiny-with-a-robotstxt-file/
URLs in this post:
 robots.txt file: http://www.robotstxt.org/robotstxt.html
 Webmaster Guidelines: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=35769
 robots.txt use: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40360
 here: http://www.searchtools.com/robots/robots-txt.html
 robots.txt: http://tools.seobook.com/robots-txt/
 Teamwork: The Marketing, IT and Finance Struggle for SEO Control: http://www.seoboy.com/teamwork-the-marketing-it-and-finance-struggle-for-seo-control/
Copyright © 2008 The Adventures of SEO Boy. All rights reserved.