Just a few weeks ago I attended the SEOmoz conference in Seattle. I learned a few things about sitemap and robots.txt files that I wanted to bring to everyone’s attention.
Robots.txt and PageRank
According to Rand, do not add pages to your robots.txt file that have accumulated PageRank, or else you’re blocking those pages from passing any of their juice along. A page listed in the robots.txt file can still be indexed. However, if that page has accumulated any PageRank, it’s like the search engines have arrived but can’t pass any PR on to any other page, so you’re effectively trapping PR. What you should do instead of adding a page to the robots.txt is add a meta robots noindex, follow tag, so that the page can still pass on PageRank without being indexed. If you don’t want a page to pass any PageRank or be indexed, then you need to add a meta robots noindex, nofollow tag.
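To make the difference concrete, here is a sketch of the two approaches for a hypothetical page (the URL /old-page.html is just an example). The first is the robots.txt entry Rand advises against for pages with PageRank; the second is the meta robots tag he recommends, placed in the page’s head section:

```
# robots.txt — blocks crawling, so accumulated PR is trapped on this page
User-agent: *
Disallow: /old-page.html
```

```html
<!-- In the <head> of /old-page.html instead: -->
<!-- Keeps the page out of the index but lets its links pass PageRank -->
<meta name="robots" content="noindex, follow">

<!-- Or, to keep it out of the index AND stop it passing PageRank -->
<meta name="robots" content="noindex, nofollow">
```

The key point is that the meta tag requires the search engines to crawl the page to see it, which is exactly what lets them follow its outgoing links.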
This may explain why you might see an error in your Google Webmaster Tools account saying a certain file is being blocked by the robots.txt file. It’s not a good idea to tell the search engines not to crawl a page that has accumulated PageRank.
Now for the sitemap.xml file. I mentioned in a previous post that sitemaps contain priority and frequency settings you can use to guide the search engines toward the more important pages of your site. While that information is true, Rand did mention that the search engines at this time don’t pay any attention to the priority or frequency settings in your XML sitemap. Why those two items exist in an XML sitemap, I’m not really sure.
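For reference, these are the two settings in question as they appear in a standard XML sitemap entry (example.com and the URLs are placeholders). The changefreq and priority elements are the ones Rand says the engines currently ignore:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <changefreq>daily</changefreq>   <!-- hint for how often the page changes -->
    <priority>1.0</priority>         <!-- relative importance, 0.0 to 1.0 -->
  </url>
  <url>
    <loc>http://www.example.com/about.html</loc>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```

Both elements are optional in the sitemap protocol, so leaving them out won’t break anything.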