Matt Thommes
Personal weblog. Written and developed since 2001.
Hiding pages from search results
September 11, 2007 / Filed under: Google, Tips, Web Development
Sometimes it's wise to hide certain pages from search engine crawlers. A good example is having your resume posted on your web site. On one hand, it's helpful to have a direct link to your resume, where anyone can view it upon request. On the other hand, a resume usually contains personal information, as well as company-specific job duties that probably shouldn't be showing up in a random Google or Yahoo search.
Thankfully, Google provides two simple ways to ensure private pages remain hidden from search engines:
- robots.txt file
- Meta tags
robots.txt
By creating a robots.txt file and placing it in the root directory of your web site, you are providing instructions to "Googlebot" (Google's site crawler) on which pages or directories you'd like hidden from search results.
If your resume is located at /resume.html on your domain, you can stop Googlebot from crawling that page by including this text in the robots.txt file:
    User-agent: Googlebot
    Disallow: /resume.html
That's it! Include as many rules as you'd like, each on its own line. Googlebot will skip the listed pages or directories, so their contents won't show up in its search results.
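If you'd like to sanity-check your rules before uploading them, Python's standard-library urllib.robotparser can evaluate a robots.txt file locally. The sketch below feeds it the two rules shown above. The helper name is_allowed and the domain example.com are placeholders. This parser follows the original Robots Exclusion Protocol, so it may not match Googlebot's handling of Google-specific extensions such as wildcards. Treat it as a rough check, not a replacement for Google's own tools.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as the robots.txt example above.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /resume.html
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent crawl url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain, not part of the original post.
    for path in ("/resume.html", "/index.html"):
        url = "https://example.com" + path
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "Googlebot", url) else "blocked"
        print(path, verdict)
```

Running it should report /resume.html as blocked and /index.html as allowed for Googlebot. Because the rules name only Googlebot, other crawlers (for example, is_allowed(ROBOTS_TXT, "Bingbot", url)) are still allowed. A "User-agent: *" line would apply the rule to every crawler that honors robots.txt.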
Meta tags
Although using the robots.txt file to block pages is quick and easy, there's another way that provides an added level of security.
By using meta tags, you provide more specific, page-level instructions.
Simply include this <meta> tag in your HTML document:
    <html>
    <head>
    <meta name="googlebot" content="noindex">
    ...
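A crawler has to download the page and read its head to see this tag. As a rough illustration of that step (not Google's actual implementation), here's a sketch using Python's standard html.parser module. The names NoindexFinder and has_noindex are hypothetical. Treating name="robots" as applying to every crawler is an assumption based on the common convention for robots meta tags.

```python
from html.parser import HTMLParser


class NoindexFinder(HTMLParser):
    """Hypothetical helper: looks for a noindex directive in
    <meta name="robots"> or <meta name="<crawler>"> tags."""

    def __init__(self, crawler: str = "googlebot") -> None:
        super().__init__()
        self.crawler = crawler.lower()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        # Attributes without a value arrive as None; treat them as empty.
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() not in ("robots", self.crawler):
            return
        # content may hold several comma-separated directives, e.g. "noindex, nofollow".
        directives = {d.strip().lower() for d in values.get("content", "").split(",")}
        if "noindex" in directives:
            self.noindex = True


def has_noindex(html: str, crawler: str = "googlebot") -> bool:
    """Return True if the page asks the given crawler not to index it."""
    finder = NoindexFinder(crawler)
    finder.feed(html)
    finder.close()
    return finder.noindex


if __name__ == "__main__":
    page = '<html><head><meta name="googlebot" content="noindex"></head><body>Resume</body></html>'
    print(has_noindex(page))             # True
    print(has_noindex(page, "bingbot"))  # False
```

Because html.parser routes self-closing tags through handle_starttag as well, the check works the same for <meta ...> and <meta ... />.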
Is it working properly?
To test whether Google is properly acknowledging your instructions, you can log into Google Webmaster Tools, choose your domain, and analyze the robots.txt file.
Other resources
- Official Google Blog: Controlling how search engines access and index your website.
- Official Google Blog: The Robots Exclusion Protocol.
- Official Google Blog: Robots Exclusion Protocol: now with even more flexibility.