Avoid Indexing Tag Pages In Your Websites

SEO
Here are the reasons you should not let search engines crawl your tag and search pages, along with a way to fix the problem.

Let's assume you run a website built on one of the popular content management systems or blogging platforms in use today. Most of these have facilities for creating a taxonomy of your site (aka tagging). While this is helpful for the user, it can be a nightmare for search engines.

Take a look at the website graph below. It shows two websites, dogsite.com (marked in blue) and doglover.com (marked in pink):

Circular references by tagging

  • A, B, and C are pages and posts from your website. The dog tag was used to describe those pages.
  • D stands for all of your pages that have a tag cloud in the sidebar. Dog is in the cloud.
  • E is your home page that has a tag cloud on it.

What's going on here is quite ugly. Look at all the circular references going in and out of the Dog tag page. Spiders will go crazy spinning around inside your tagged pages. That is a lot of work, and most of it is wasted processing and storage.

If you think about it, tag pages are really search result pages. They don't offer much in the way of quality content. They are just a list of links.

Duplicate Content

Now here is where you may get duplicate content issues.

Suppose your dog tag page lists the three most recent pages/posts tagged with the word dog (A, B, and C). Google comes in on Monday, spiders this page, and caches it.

On Tuesday, a visitor arrives at your site. She owns doglover.com and loves the information you have. She likes it so much that she adds two links on her website: one pointing at your dog tag page and one at your search results page for 'dog'.

On Wednesday, Google comes along and spiders the front page of doglover.com, which has the two links on it. Uh oh! Do you see the problem?

Structurally, the search page and the tag page differ slightly. One may have a different div structure (sidebar on the left rather than on the right), different container names, and so on. But the content, the meat of the page, is just links, and the two pages look pretty darn similar, don't they? Google may not like this and could hit you with a duplicate content penalty.

How To Solve The Problem

To solve this problem, keep search engines away from your tag and search pages. Open your robots.txt file and add rules that tell crawlers not to fetch them. For example:

User-agent: *
Disallow: /search
Disallow: /tag
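
A quick way to sanity-check rules like these before deploying them is to feed them to a robots.txt parser. The sketch below uses Python's standard urllib.robotparser; the helper names and every URL path other than /tag and /search are made up for illustration, and real crawlers may interpret edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The rules from the article, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /tag
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, agent: str = "*") -> bool:
    """Return True if a compliant crawler identifying as `agent` may fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in (
        "http://dogsite.com/tag/dog",        # tag page
        "http://dogsite.com/search?q=dog",   # search page (hypothetical URL shape)
        "http://dogsite.com/posts/a",        # ordinary post (hypothetical path)
        "http://dogsite.com/tagline-ideas",  # also begins with /tag (hypothetical path)
    ):
        print(url, "->", "allowed" if is_allowed(parser, url) else "blocked")
```

Note that Disallow rules match by path prefix, so /tag also covers a path such as /tagline-ideas. If that is not what you want, writing the rule as /tag/ narrows it to the tag directory.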

For crawlers that ignore robots.txt, you can add a robots meta tag to the tag and search pages. Most CMSs use themes and templates. View the source of those pages as rendered, then edit the tag and search templates to add this line inside the head element:

<meta name="robots" content="noindex, nofollow" />
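
After editing the templates, it is worth confirming that the tag actually shows up in the rendered page. Here is a minimal sketch using Python's standard html.parser; the class and function names are my own, and you would pass in whatever HTML your CMS produces for a tag or search page.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in the given HTML text."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow" /></head></html>'
    found = robots_directives(page)
    print("noindex present:", "noindex" in found)
    print("nofollow present:", "nofollow" in found)
```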

Also, don't put these pages in your sitemap. If you use a plugin or module to generate the sitemap, check that it excludes tag and search pages.
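
If you build the sitemap yourself rather than through a plugin, the exclusion can be a simple filter on the URL path. This is only a sketch: the function name and the example URLs are hypothetical, and the prefixes simply mirror the /search and /tag rules used in robots.txt.

```python
from urllib.parse import urlsplit

# Path prefixes to keep out of the sitemap; these mirror the robots.txt rules.
EXCLUDED_PREFIXES = ("/search", "/tag")


def sitemap_urls(urls):
    """Return only the URLs whose path does not start with an excluded prefix."""
    return [u for u in urls if not urlsplit(u).path.startswith(EXCLUDED_PREFIXES)]


if __name__ == "__main__":
    candidates = [
        "http://dogsite.com/",
        "http://dogsite.com/posts/a",
        "http://dogsite.com/tag/dog",
        "http://dogsite.com/search?q=dog",
    ]
    for url in sitemap_urls(candidates):
        print(url)
```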

One additional thing I'd like to point out concerns the dog tag page and the search page with dog results.

If you have a website with a small number of pages and do not create content frequently, those two pages are going to be mostly the same: just a page of teaser text and links. Even if your search page lists the links in a different order, it's still the same 3 links (A, B, C).
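
To get a rough feel for how much two such pages overlap, you can compare the sets of links they contain. The sketch below is my own illustration, not a description of how any search engine detects duplicates: it collects every href on each page (sidebar links included) and computes the Jaccard similarity of the two sets, so ordering makes no difference.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in the given HTML text, in page order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def link_overlap(html_a, html_b):
    """Jaccard similarity of the two pages' link sets (1.0 means identical sets)."""
    a, b = set(extract_links(html_a)), set(extract_links(html_b))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical markup for the tag page and the search page.
    tag_page = '<ul><li><a href="/a">A</a></li><li><a href="/b">B</a></li><li><a href="/c">C</a></li></ul>'
    search_page = '<div><a href="/c">C</a> <a href="/a">A</a> <a href="/b">B</a></div>'
    print("link overlap:", link_overlap(tag_page, search_page))
```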

I would recommend tweaking your search results page to look different from your teaser lists in this case. That way the two pages contain different content. Also, it is probably best to avoid repeating the teaser text in the full article. One idea is to pull out the meta description and use that as your teaser.
