To all of you out there who suffer from arachnophobia: this post may offend you. Today I’m going to talk all about search engine spiders. These creepy-crawlers (well, maybe not so creepy) are the backbone of the World Wide Web. Whether you’re discussing Googlebot or Slurp (Yahoo!’s bot), spiders are the tools the search engines use to discover new websites and new content, and they play a huge role in the health of your SEO campaign. But more to the point, when you’re talking about the spiders, crawlers and bots of the world, it’s important to know the lingo! Get ready to learn what crawl, index and cache mean for you and your SEO campaign.
Search engines are built around the ability of their spiders to discover new websites and to save those pages quickly (and accurately) for future reference. In a nutshell, this process is known as “crawling” or the “web crawl.” Many SEOs talk about their website’s crawl rate: the rate at which the search engine spiders return to pull fresh copies of the content. If you launch a brand-new website, being crawled is the first sign that you’re on your way to being indexed by the search engines (more on that in the next section!).
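If you’re curious what “crawling” looks like mechanically, here is a toy sketch in Python. It assumes a made-up, in-memory “web” (the `WEB` dictionary and the `.example` URLs are hypothetical) instead of real HTTP requests, and it skips everything a real spider also handles, such as robots.txt, politeness delays and revisit scheduling. The core idea is simply a breadth-first walk: fetch a page, queue every link you haven’t seen yet, repeat.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs it links to.
# A real spider would download each page over HTTP and parse out its links.
WEB = {
    "https://established.example/": ["https://established.example/about",
                                     "https://newsite.example/"],
    "https://established.example/about": [],
    "https://newsite.example/": ["https://newsite.example/blog"],
    "https://newsite.example/blog": ["https://newsite.example/"],
}

def crawl(seeds, web, max_pages=100):
    """Breadth-first crawl: visit each discovered URL once, in discovery order."""
    frontier = deque(seeds)   # URLs waiting to be "fetched"
    seen = set(seeds)         # URLs already discovered, so nothing is queued twice
    visited = []              # order in which pages were fetched
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

print(crawl(["https://established.example/"], WEB))
```

Starting from the established site, this toy spider only reaches `newsite.example` because the established homepage links to it, which is the same reason links from established sites help a new site get discovered.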
So, this raises the question: “How do I get Google to come crawl my website?” There are a few ways you can do this.
- Create accounts at Google Webmaster Tools and Yahoo! SiteExplorer, verify your site, and create and submit a sitemap. This is like raising a big red flag that says “I’m over here!”
- But if you really want to get a jump on having your website crawled, start building links from established sites. You can start with strong directories or use any number of link building strategies – the important point is getting links!
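For the sitemap step above, here is a minimal sketch of generating a basic XML sitemap in Python. It follows the standard Sitemaps protocol format (a `urlset` of `url` entries, each with a `loc` and an optional `lastmod`); the page list and dates are hypothetical placeholders, and `build_sitemap` is just an illustrative helper name, not part of any tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build sitemap XML from (url, lastmod) pairs; lastmod is 'YYYY-MM-DD' or None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        if lastmod:  # <lastmod> is optional, so only emit it when we know it
            ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical pages; swap in your own URLs and modification dates.
print(build_sitemap([
    ("http://www.mydomain.com/", "2008-06-01"),
    ("http://www.mydomain.com/blog/", None),
]))
```

You would save the output as something like `sitemap.xml` at the root of your site and submit it through your webmaster tools accounts.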
After your website is established, you may want to increase or decrease your crawl rate. For Google, the first place to look is your Webmaster Tools account. Under “Statistics” you can view Googlebot’s activity on your website for the past 90 days. If you see a problem and need to speed things up or slow them down, go to the “Settings” section, where you can take control and set a custom crawl rate. Google recommends setting a custom crawl rate only if you are having “traffic problems on your server.”
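Webmaster Tools charts Googlebot’s activity for you, but you can also get a rough picture of your crawl rate from your own server logs. This sketch assumes Apache/Nginx “combined” log lines (the sample lines below are made up) and simply counts requests per day whose user-agent mentions Googlebot. Keep in mind that a user-agent string can be faked, so treat this as an estimate rather than proof that Google itself made every request.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: "combined" log format, whose timestamp looks like
# [10/Jun/2008:13:55:36 -0700]. This captures the "10/Jun/2008" part.
DATE_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}):")

def googlebot_hits_per_day(log_lines):
    """Count requests per day whose user-agent mentions Googlebot."""
    counts = Counter()
    for line in log_lines:
        # User-agent check only: anyone can claim to be Googlebot.
        if "Googlebot" not in line:
            continue
        match = DATE_RE.search(line)
        if match:
            day = datetime.strptime(match.group(1), "%d/%b/%Y").date()
            counts[day.isoformat()] += 1
    return dict(sorted(counts.items()))

BOT_UA = '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
# Hypothetical sample log lines, for illustration only.
sample_log = [
    '66.249.66.1 - - [10/Jun/2008:13:55:36 -0700] "GET / HTTP/1.1" 200 5120 "-" ' + BOT_UA,
    '66.249.66.1 - - [10/Jun/2008:14:02:10 -0700] "GET /blog/ HTTP/1.1" 200 8431 "-" ' + BOT_UA,
    '66.249.66.1 - - [11/Jun/2008:09:12:44 -0700] "GET / HTTP/1.1" 200 5120 "-" ' + BOT_UA,
    '192.0.2.7 - - [10/Jun/2008:15:00:00 -0700] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
print(googlebot_hits_per_day(sample_log))  # {'2008-06-10': 2, '2008-06-11': 1}
```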
While that’s all good for Google, what about all the other search engines? Take a gander at this great post from Search Engine Journal that details 10 ways to increase your crawl rate. I won’t repeat them all here, but my favorite? “Update your content often and regularly.” Good advice!
…the most efficient way to get frequent and deep crawls is to develop a website that search engines see as important and valuable.
To say that the search engines “index” your website’s content is a fancy way of saying that they have your stuff saved on their servers. After one of the search engine spiders has crawled a page on your website, that page’s textual content and other important data are handed off to the “indexer,” which stores those pages in a database. You can check how many pages from your website have been indexed in a few different ways:
- Visit Google, Yahoo! or MSN and enter the query site:mydomain.com. This search will show you the pages from your root domain that are contained in each search engine’s index. You can also check sub-domains by entering site:www.mydomain.com, site:blog.mydomain.com, etc. (There are issues with the primary vs. supplemental index, but that’s a blog post for another day!)
- Google Webmaster Tools and Yahoo! SiteExplorer will also show you how many of your pages are in each engine’s index.
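The search engines don’t publish exactly how their indexers store pages, but the textbook idea behind “storing text in a database” is an inverted index: a lookup table from each word to the pages that contain it. Here is a simplified Python sketch of that idea, plus a rough imitation of the site: filter described above (host matches the domain or one of its sub-domains). The page texts, URLs and helper names are all hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

def tokenize(text):
    """Lowercase text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def matches_site(url, site):
    """Rough imitation of site:, true if the host equals `site` or is a sub-domain of it."""
    host = urlparse(url).hostname or ""
    return host == site or host.endswith("." + site)

def search(index, query, site=None):
    """Return URLs containing every query word, optionally limited to one site."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    if site:
        results = {url for url in results if matches_site(url, site)}
    return results

# Hypothetical pages and text, for illustration only.
pages = {
    "http://www.mydomain.com/": "Welcome to My Domain, home of fresh SEO tips",
    "http://blog.mydomain.com/crawl-rate": "Crawl rate tips for SEO",
    "http://www.otherdomain.com/": "Other SEO tips",
}
index = build_index(pages)
print(sorted(search(index, "seo tips", site="mydomain.com")))
```

Notice that the index only knows about words and URLs; it never keeps the page’s layout or markup.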
So, your crawl rate affects how quickly pages from your site are included in the index, and how many of them are. Crawl rate also determines how fast changes you’ve made to a particular page show up in each search engine’s index. And just because you’re indexed doesn’t mean you’ll rank in the SERPs. You’ve got to do your homework and perform the SEO basics, too.
I like cash; cash is good. Wait, I don’t think we’re talking about the same cash. Oh, you mean CACHE, as in the archived copy of a webpage as saved by a search engine! If you enter the search query cache:mydomain.com, the search engine will show you the most recent version of your webpage that it downloaded. The cached version of your webpage is a literal copy of the page, saved on Google’s or Yahoo!’s servers. So there is a major difference between the index and the cache: the index holds text and data, while the cache is a literal copy. Remember that.
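To make the index vs. cache distinction concrete, here is a simplified Python sketch. It assumes two hypothetical dictionaries, `cache` and `index_text`: the cache keeps the page’s HTML exactly as downloaded, while the index side keeps only the extracted visible text (skipping markup, scripts and styles). Real search engines store far more than this, so read it as an illustration of the difference, not a description of how Google or Yahoo! actually do it.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def store_page(url, html, cache, index_text):
    """Save a literal copy for the cache and extracted text for the index."""
    cache[url] = html  # cache: the page exactly as downloaded
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    index_text[url] = " ".join(parser.parts)  # index: text only, no markup

cache, index_text = {}, {}
html = ('<html><head><title>My Page</title><style>body{color:red}</style></head>'
        '<body><h1>Hello</h1><p>Fresh content</p></body></html>')
store_page("http://www.mydomain.com/", html, cache, index_text)
print(index_text["http://www.mydomain.com/"])  # My Page Hello Fresh content
```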
Hopefully, I’ve schooled you today on the hip SEO lingo for the crawl, index and cache of your website. These are important terms to remember, and they will help you navigate search results and troubleshoot issues with your SEO campaign.
Do you have anything to add or have a question? Leave me a comment!