Sometimes you have pages that you just don’t want the search engines to read. For instance, an online coupon page might only be good for a limited time, or you may need duplicate content on interior pages. If you don’t want to update the robots.txt file to exclude individual pages on your site, you can use the instructions below:
To block or remove pages using meta tags:
Rather than use a robots.txt file to block crawler access to pages, you can add a meta tag to an HTML page to tell robots not to index the page. This standard is described here.
To prevent all robots from indexing a page on your site, you’d place the following meta tag into the head section of your page:
meta name="ROBOTS" content="NOINDEX, NOFOLLOW"
To allow other robots to index the page on your site, preventing only Google’s robots from indexing the page, you’d use the following tag:
meta name="GOOGLEBOT" content="NOINDEX, NOFOLLOW"
To allow robots to index the page on your site but instruct them not to follow outgoing links, you’d use the following tag:
meta name="ROBOTS" content="NOFOLLOW"
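(Note that in your page's source code each of these tags must be enclosed in less-than (<) and greater-than (>) symbols, for example <meta name="ROBOTS" content="NOINDEX, NOFOLLOW">. They are shown without the symbols above because the blog program otherwise treats them as real tags and they vanish into the source code.)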
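As a minimal sketch, here is how the first tag might sit in a page's head section, written out in full with its angle brackets. The title and body text are placeholders invented for illustration, not taken from any real page.

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Placeholder title, assumed for this example -->
  <title>Limited-Time Coupon</title>
  <!-- Ask all robots not to index this page or follow its links -->
  <meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
</head>
<body>
  <!-- Placeholder content, assumed for this example -->
  <p>Coupon details go here.</p>
</body>
</html>
```

To use one of the other variants, swap in its name and content values on the meta line and leave the rest of the page unchanged.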
If you’re wondering why Google isn’t indexing a page on your site, you can also check the page's source code to see if anyone put a tag like one of these on it in the past. This is a fairly common problem among our customers who do not have search engine rankings, and it is one of the easiest to fix. The tag is also sometimes added during development to prevent premature indexing, and then forgotten when the site goes live.