
Unleash Your Website’s Hidden SEO Potential by Optimizing Your PDF Files

October 30th, 2008 | Crawlability, Nuts & Bolts of Optimization

PDFs are a widely used file format both online and offline. The format is convenient and multipurpose: it allows for restricted editing, compresses documents that contain images and, most importantly, preserves the look and feel of the source document. PDFs have been posted on websites since the early days of the web for many of those same reasons. However, in the early days of SEO, it was widely believed that PDF files could not be properly crawled by the search engines. That’s simply not true, but there’s more to the story than just crawlability. Unleash your website’s hidden potential by optimizing your PDF files!

PDF File Creation

The first and most important step when optimizing PDF files for your website is to consider how the file was created. If you create your PDF with an image editor such as Photoshop, the text can end up stored as an image rather than as real text. To ensure that your text remains text, create the PDF from a text-based source such as MS Word, Adobe PageMaker, etc. The key here is maintaining text from source file to PDF file.
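One rough way to confirm that a finished PDF still contains real text is to try extracting text from it. The sketch below assumes the third-party pypdf library is installed (it is not mentioned in this post), and the function names and the 20-character threshold are illustrative choices, not established rules.

```python
def extract_page_texts(pdf_path):
    """Return the extractable text of each page in the PDF."""
    # Assumption: the third-party pypdf package is installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # Pages that hold only images typically yield little or no text.
    return [page.extract_text() or "" for page in reader.pages]


def looks_text_based(page_texts, min_chars_per_page=20):
    """Heuristic: True if pages average enough non-space characters."""
    if not page_texts:
        return False
    # Count characters with all whitespace removed.
    total = sum(len("".join(text.split())) for text in page_texts)
    return total / len(page_texts) >= min_chars_per_page
```

For a hypothetical file, the check would be called as looks_text_based(extract_page_texts("whitepaper.pdf")). A result of False suggests the PDF was probably saved as images and may need to be recreated from the source document.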

On Page Optimization

Optimizing a PDF is no different from optimizing any ole HTML page on your site! Here’s a quick checklist of what to include in your PDF:

  • When creating your document in Word, use the built-in Heading 1 and Heading 2 styles (the equivalent of H1 and H2) to assign proper hierarchical importance to your headings.
  • Insert keywords into your headlines, body content, file name, etc. (Duh!)
  • Link out from your PDF to other pages on your site to pass PageRank and to improve the user experience.

Technical Optimization

Now here’s the kicker for optimizing your PDFs. Every PDF has special properties encoded into it, including metadata! These are found in the “Document Properties”, where you can edit the Title (the equivalent of a page title), Subject (meta description), Keywords (meta keywords) and Author. The most important fields here are the Title and Subject. If you don’t fill these out, you leave it up to the search engines to pull text from your document to create your PDF’s search engine listing. Another “duh!” task is to link to the PDF from a prominent page so that the search engines can find it.
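If you have a lot of PDFs, checking the Document Properties by hand gets tedious. Here is a minimal sketch of an automated check, again assuming the third-party pypdf library is installed. The function names are hypothetical, and treating Title and Subject as the required fields simply follows the advice above.

```python
# Document Properties named in this post, mapped to their keys
# in a PDF's Info dictionary.
PROPERTY_KEYS = {
    "Title": "/Title",
    "Subject": "/Subject",
    "Keywords": "/Keywords",
    "Author": "/Author",
}


def read_document_properties(pdf_path):
    """Read the Document Properties of a PDF into a plain dict of strings."""
    # Assumption: the third-party pypdf package is installed.
    from pypdf import PdfReader

    info = PdfReader(pdf_path).metadata or {}
    return {
        name: str(info[key]) if key in info else ""
        for name, key in PROPERTY_KEYS.items()
    }


def missing_properties(properties, required=("Title", "Subject")):
    """List the required properties that are absent or blank."""
    return [name for name in required if not properties.get(name, "").strip()]
```

Calling missing_properties(read_document_properties("whitepaper.pdf")) on a hypothetical file returns the names of the fields that still need to be filled in, or an empty list if both are set.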

If you can complete these simple tasks, you will open up even more rankable content to the search engines. Have any of you had experience in optimizing and ranking your site’s PDFs? If so, I’d love to hear about it!

  • http://geekimo.com i-CONICA

    Are there any guidelines on file size for PDF files?
    I’m working on a very large site for a customer and they have a 15MB PDF file on a page.

    I’d appreciate some insight into file size limits. Will large files be indexed?
    Does the file have to be 100% loaded before it can be indexed? Presumably Googlebot won’t download a whole 15MB PDF in order to index it?


  • http://www.seoboy.com John

    @ i-CONICA,

    Assuming that your PDF file is text-based, there shouldn’t be any problem with Google (or the other search engines) indexing it.

    Have you checked to see if Google has indexed the PDF yet? You can use the site: command and the URL of your PDF to check.
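As a small illustration of that check, the sketch below builds the site: query and a Google search link for a given PDF URL. The function names and the example.com address are placeholders, not anything from this thread.

```python
from urllib.parse import urlencode, urlsplit


def site_query(pdf_url):
    """Build a site: query from the host and path of the PDF's URL."""
    parts = urlsplit(pdf_url)
    return "site:" + parts.netloc + parts.path


def google_search_url(pdf_url):
    """Build a Google search link that runs the site: query."""
    return "https://www.google.com/search?" + urlencode({"q": site_query(pdf_url)})


print(site_query("http://www.example.com/docs/guide.pdf"))
print(google_search_url("http://www.example.com/docs/guide.pdf"))
```

If the search returns the PDF, it has been indexed. If it returns nothing, the file may simply not have been crawled yet.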

    Big or small, the search engines should index your PDF. That being said, it should be noted that they may not crawl the entire document (if it is hundreds of pages long, for instance). In this case, you should make an effort to optimize the “Document Properties” of the PDF and the first few pages with your targeted keywords.

  • http://GeekIMO.com i-CONICA

    Hi, John.
    Thanks for the reply.
    That clears things up, so PDFs are crawled very similarly to the way webpages are crawled.
    Presuming the file would be indexed incrementally, it’s excellent news.
    There are other smaller PDF files indexed in the site, but at the time of writing this, the large file hasn’t been indexed.
    I’ll keep checking every few days. The site gets a relatively high volume of traffic and the domain is over 10 years old so it’s crawled regularly.

    Thank you for the advice.