
Hiding pages from Google search results - Matt Thommes


Matt Thommes

Personal weblog. Written and developed since 2001.

Hiding pages from search results

September 11, 2007 / Filed under: Google, Tips, Web Development

Sometimes it's wise to hide certain pages from search engine crawlers. A good example is having your resume posted on your web site. On one hand, it's helpful to have a direct link to your resume, where anyone can view it upon request. On the other hand, a resume usually contains personal information, as well as company-specific job duties that probably shouldn't be showing up in a random Google or Yahoo search.

Thankfully, Google supports two simple ways to keep private pages out of its search results:

robots.txt file
Meta tags

robots.txt

By creating a robots.txt file and placing it in the root directory of your web site, you provide instructions to "Googlebot" (Google's site crawler) on which pages or directories you'd like hidden from search results.

If your resume is located at /resume.html on your domain, you can stop Googlebot from indexing that page by including this text in the robots.txt file:

User-agent: Googlebot
Disallow: /resume.html

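That's it! Include as many rules as you'd like, each on a separate line. Googlebot will not crawl these pages or directories, which keeps their content out of search results. Keep in mind that the rule above names only Googlebot; to address every crawler that honors robots.txt, use User-agent: * instead.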
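If you'd like to sanity-check your rules before uploading them, here's a minimal sketch using Python's standard urllib.robotparser module. The is_allowed helper, the example.com domain, and the "Slurp" user agent string are my own placeholders for illustration, and Python's parser only approximates how Googlebot itself reads the file.

```python
from urllib.robotparser import RobotFileParser

# The same rules shown above, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /resume.html
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url.

    Hypothetical helper: it parses the rules from a string, so no
    network request is made.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com stands in for your own domain.
    for agent in ("Googlebot", "Slurp"):
        for path in ("/resume.html", "/index.html"):
            url = "http://example.com" + path
            print(agent, path, is_allowed(ROBOTS_TXT, agent, url))
```

Running it should show that only Googlebot is kept away from /resume.html. A crawler that isn't named in the file (the second user agent here) is still allowed in, which is worth remembering if you care about more than one search engine.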

Meta tags

Although using the robots.txt file to block pages is quick and easy, there's another way that gives you finer, page-by-page control.

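By using meta tags, you provide more specific, page-level instructions. Simply include this <meta> tag in the <head> of your HTML document. Googlebot has to be able to crawl the page in order to see the tag, so don't also block the same page in robots.txt: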

<html>
<head>
<meta name="googlebot" content="noindex">
...
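To double-check that a page really carries the tag, here's a rough sketch built on Python's standard html.parser module. The RobotsMetaFinder class and has_noindex function are names I made up for this example. I'm also assuming you want to count the generic name="robots" meta tag, which isn't covered above but addresses all crawlers rather than just Googlebot.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="googlebot"> and <meta name="robots">."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # meta name -> list of lowercase directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("googlebot", "robots"):
            content = attr.get("content") or ""
            self.directives[name] = [
                part.strip().lower() for part in content.split(",") if part.strip()
            ]


def has_noindex(html: str, crawler: str = "googlebot") -> bool:
    """Return True if the page asks the crawler (or all robots) not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any(
        "noindex" in finder.directives.get(name, [])
        for name in (crawler, "robots")
    )


if __name__ == "__main__":
    page = '<html>\n<head>\n<meta name="googlebot" content="noindex">\n</head>\n</html>'
    print(has_noindex(page))
```

This only tells you the tag is present in the markup. Whether Google has actually picked it up is a separate question.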

Is it working properly?

To test whether Google is properly acknowledging your robots.txt instructions, you can log in to Google Webmaster Tools, choose your domain, and analyze the robots.txt file.

[Screenshot of Google Webmaster Tools]

Other resources

Official Google Blog: Controlling how search engines access and index your website.
Official Google Blog: The Robots Exclusion Protocol.
Official Google Blog: Robots Exclusion Protocol: now with even more flexibility.
