Just Grin and Bear It, Your Website Needs a Robots.txt File
Websites are dynamic creations, with loads of content lurking around every dark corner. Those dark corners may be portions of your site that weren’t created with search engines in mind. This poses a unique challenge: how can you instruct search engines to ignore that content and not include it in their index of your site? Create a Robots.txt file. To alleviate your fears, it’s important to remember that a Robots.txt file is just a simple text file, placed at the root of your site, that tells search bots which content they should and shouldn’t crawl.
The basic structure of the Robots.txt file contains two parts: the “User Agent” and “Disallow” statements. Together, these statements tell search engine robots precisely which pages to skip in their crawl of your website.
There are two paths you can follow when implementing the User Agent portion of your Robots.txt file. First, you can choose to have ALL search engines follow the same rules. This is represented by entering an asterisk (*), which acts as a “wild card” entry.
- Example: User-Agent: *
The second path you can follow involves separating the search bots out individually, providing different instructions for each search engine.
- Example: User-Agent: Googlebot (Google) or User-Agent: Slurp (Yahoo!)
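To see how the two paths play out, here is a minimal sketch that uses Python's standard urllib.robotparser module to read a small Robots.txt file. The directory names (/drafts/ and /internal/) and the bot name SomeOtherBot are made up for illustration. Notice that a bot with its own User Agent group follows only that group and ignores the wild card rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical Robots.txt: one group just for Googlebot,
# and a wild card group for every other bot.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /drafts/
Disallow: /internal/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot matches its own group, so only /drafts/ is off limits for it.
googlebot_internal = parser.can_fetch("Googlebot", "/internal/report.html")  # True
googlebot_drafts = parser.can_fetch("Googlebot", "/drafts/post.html")        # False

# Any other bot falls through to the wild card (*) group.
other_internal = parser.can_fetch("SomeOtherBot", "/internal/report.html")   # False

print(googlebot_internal, googlebot_drafts, other_internal)
```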
This is where the fun begins. The Disallow statement allows you to insert commands to block directories, pages/files, images, and even your entire website, if need be. If nothing is listed after Disallow, all URLs are fair game to be crawled. Here are examples of Disallow statements at work:
- Block entire website: Disallow: /
- Block a directory: Disallow: /directoryname/
- Block a page: Disallow: /page.html
- Block specific file types: Disallow: /*.gif$ (exchange .gif for whatever file type you need blocked)
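The Disallow examples above can be tested with a few lines of code. This is only a rough sketch of how a robot might compare a Disallow pattern to a URL path, assuming the common convention that a pattern matches from the start of the path, * stands for any run of characters, and a trailing $ marks the end of the URL. The function name rule_matches is hypothetical, and real crawlers may differ in the details. (As far as I know, Python's built-in urllib.robotparser does plain prefix matching only, which is why the wild cards are handled by hand here.)

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a Disallow pattern covers the given URL path."""
    if not pattern:
        # An empty Disallow blocks nothing.
        return False
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text and turn each * into "any run of characters".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, giving prefix matching by default.
    return re.match(regex, path) is not None

print(rule_matches("/", "/anything.html"))                     # True
print(rule_matches("/directoryname/", "/directoryname/a.html"))  # True
print(rule_matches("/page.html", "/other.html"))               # False
print(rule_matches("/*.gif$", "/images/logo.gif"))             # True
print(rule_matches("/*.gif$", "/images/logo.gif?size=2"))      # False
```

The last line shows why the $ matters: without it, the pattern would also cover URLs that merely contain .gif somewhere in the middle.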
MYTH: Your site needs a Robots.txt file in order to be indexed.
FACT: No, your site will be indexed whether you create a Robots.txt file or not. A Robots.txt file will not draw robots to your site any faster than normal.
MYTH: Your site needs a Robots.txt file in order to rank higher.
FACT: No, your Robots.txt file will only tell the robots which pages and links can or cannot be crawled. However, having a Robots.txt file can have a secondary effect on your site’s rankings: if you improve your site’s crawlability, robots spend their time on the pages that matter, which may help your rankings.
MYTH: You can block pages completely by using “Disallow” statements.
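FACT: No, though the Disallow statement is powerful, you cannot guarantee a 100% invisible page. Just because a page or directory is listed in your Robots.txt file doesn’t mean it will stay out of the search results. That’s an important distinction to remember. Robots.txt files ask well-behaved robots not to crawl a URL, but they do nothing to stop that URL from being indexed if other pages link to it. If you want to keep a page out of the index, you should consider the use of the Meta Robots tag employing “noindex/nofollow,” for example <meta name="robots" content="noindex, nofollow">. Keep in mind that a robot has to crawl the page to see that tag, so the page must not also be blocked in your Robots.txt file.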
MYTH: The more bots accessing your site the better.
FACT: No, some bots are simply out there to scour your site for e-mail addresses for spamming purposes. Knowing how to block them will aid in the ongoing spam war. Lists of robots that go beyond normal search and indexing are published online. If you are aware of malicious bots crawling your website, add them individually (with separate User Agent statements) to your Robots.txt file, and use Disallow: / to block each bot from your entire site. Keep in mind that Robots.txt is a voluntary convention, so a truly malicious bot may simply ignore it.
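If you keep a list of unwanted bots, the matching Robots.txt entries can be generated rather than typed by hand. Here is a small sketch of that idea. The function name block_bots and the two bot names are made up for illustration and do not refer to real crawlers.

```python
def block_bots(bot_names):
    """Build one User-agent group per bot, each disallowing the whole site."""
    groups = [f"User-agent: {name}\nDisallow: /" for name in bot_names]
    # Groups are separated by a blank line, as in a hand-written file.
    return "\n\n".join(groups) + "\n"

# Hypothetical bot names, for illustration only.
rules = block_bots(["BadBotExample", "EmailHarvesterExample"])
print(rules)
```

The output can be pasted into your existing Robots.txt file alongside the rules for the search engines you do want.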