cannibalistic URLs

Search Engine Journal - When you build a web site, you create paths to certain pages. Most web developers put those pages in specific folders, such as www.example.com/press-releases/ or www.example.com/store/, to give the site a logical structure. Unfortunately, depending on a slew of technical stuff like servers, file extensions, redirects and internal site links, that pretty path can end up looking like several unique paths to both users and search engines, even though they’re really the same:

www.example.com/press-releases/
example.com/press-releases/
www.example.com/press-releases/default.aspx
example.com/press-releases/default.aspx
www.example.com/press-releases/?id=1
example.com/press-releases/?id=1
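To make “really the same” concrete, here is a minimal Python sketch that collapses the six spellings above into one URL. It rests on assumptions you would need to check against your own server: the “www.” version is the one kept, the scheme is plain http, default.aspx (plus a hypothetical list of other default file names) can be dropped, paths are case-insensitive, and, as the list above implies, ?id=1 does not change the page, so the query string is discarded.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch, not universal rules:
CANONICAL_PREFIX = "www."  # the "www." version is the one we keep
DEFAULT_DOCUMENTS = {"default.aspx", "index.htm", "index.html"}  # hypothetical list


def normalize_url(url):
    """Collapse the duplicate spellings of a page into one canonical URL."""
    # Addresses typed without a scheme need one so urlsplit can find the host.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)

    # .hostname is already lowercased; add "www." if it is missing.
    host = parts.hostname or ""
    if not host.startswith(CANONICAL_PREFIX):
        host = CANONICAL_PREFIX + host

    # Lowercasing the path is only safe on a case-insensitive server.
    path = (parts.path or "/").lower()
    directory, _, filename = path.rpartition("/")
    if filename in DEFAULT_DOCUMENTS:
        path = directory + "/"

    # The query string is dropped entirely: this assumes "?id=1" does not change the page.
    return urlunsplit(("http", host, path, "", ""))


variants = [
    "www.example.com/press-releases/",
    "example.com/press-releases/",
    "www.example.com/press-releases/default.aspx",
    "example.com/press-releases/default.aspx",
    "www.example.com/press-releases/?id=1",
    "example.com/press-releases/?id=1",
]
print({normalize_url(u) for u in variants})
# {'http://www.example.com/press-releases/'}
```

Every variant lands on the same string, which is exactly the single address you want both visitors and search engines to see.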

Why is this a problem?

Isn’t this just ugly code to some standards-compliant web freaks?

It’s a problem because when links to that page (either from your site or another site) use different paths (through a mistake or a technical error), search engines see each path as a unique page. And when Google determines PageRank from the links to your page, if it finds multiple pages, you could be splitting your best possible ranking.

Let’s use Play-Doh to demonstrate the principle.

If you have three page paths:

example.com/default.aspx
example.com/default.aspx?id=1
WWW.Example.com/Default.aspx

The first one might have just one or two links to it and a PR of 1. The second might have a few more links and a PR of 2. And let’s pretend the third has even more links with a PR of 3.

Unfortunately, PageRank isn’t simple arithmetic, but for the sake of this discussion, if you could make all of those links go to the “same” page, you would be channeling greater link equity to one central location. This could potentially result in a PR of a big, beautiful 6, which should mean increased rankings.
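Because PageRank isn’t simple arithmetic, the 1 + 2 + 3 = 6 picture is only an analogy. A toy power-iteration PageRank (the textbook formula with a 0.85 damping factor, not Google’s actual ranking system) can show the same effect under stated assumptions: six hypothetical linking pages split their links 1/2/3 across the three paths above, and in the second run the first two paths 301-redirect to the third, modeled here as if each redirect were a single link.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank over {page: [pages it links to]}."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over every page.
        dangling = sum(rank[p] for p in nodes if not links.get(p))
        new_rank = dict.fromkeys(nodes, (1 - damping) / n + damping * dangling / n)
        # Every other page splits its damped rank evenly among the pages it links to.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


A = "example.com/default.aspx"
B = "example.com/default.aspx?id=1"
C = "WWW.Example.com/Default.aspx"

# Hypothetical linking pages: one links to A, two to B, three to C.
split = {
    "site1": [A],
    "site2": [B], "site3": [B],
    "site4": [C], "site5": [C], "site6": [C],
}
# Same inbound links, but A and B now 301 to C (each redirect treated as one link).
redirected = {**split, A: [C], B: [C]}

before = pagerank(split)
after = pagerank(redirected)
print(f"C before: {before[C]:.3f}")  # prints "C before: 0.252"
print(f"C after:  {after[C]:.3f}")   # prints "C after:  0.413"
```

In this toy model the consolidated page does not reach the full 1 + 2 + 3 total, since each extra hop through a redirect is damped like any other link, but it clearly ends up with more than any single variant had on its own.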

How do you find out if you’re cannibalizing your PageRank?

First things first… open your website in a browser and type in the domain as www.example.com. Then type in example.com without the “www.” If each version stays exactly as you typed it (neither one redirects to the other), you need to designate one version over the other. For detailed instructions on how to do this, read Chris Hooley’s article, Canonicalize with .htaccess. Your goal is to make one version redirect to the other.
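Chris Hooley’s article covers the Apache .htaccess route. If your site is instead served by a Python application, the same “one version redirects to the other” rule could look roughly like the WSGI middleware below. It is a sketch that assumes www.example.com is the version you keep and that the site runs on plain http; the function names are hypothetical, and the request at the end is simulated rather than sent to a real server.

```python
from urllib.parse import quote

CANONICAL_HOST = "www.example.com"  # assumption: the "www." version is the keeper


def canonical_host_middleware(app):
    """Wrap a WSGI app so any request for another host gets a 301 to CANONICAL_HOST."""
    def wrapper(environ, start_response):
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host and host != CANONICAL_HOST:
            # Rebuild the requested path, re-quoted because WSGI passes it unquoted.
            path = quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")) or "/"
            query = environ.get("QUERY_STRING", "")
            location = f"http://{CANONICAL_HOST}{path}" + (f"?{query}" if query else "")
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Type", "text/plain")])
            return [b"Moved Permanently\n"]
        return app(environ, start_response)
    return wrapper


def site(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from www.example.com\n"]


app = canonical_host_middleware(site)

# Simulate a request for the bare domain without starting a server.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


app({"HTTP_HOST": "example.com", "PATH_INFO": "/press-releases/", "QUERY_STRING": ""},
    fake_start_response)
print(captured["status"], captured["headers"]["Location"])
# 301 Moved Permanently http://www.example.com/press-releases/
```

The 301 status matters: it tells search engines the move is permanent, so the bare domain and the www domain are treated as one site rather than two.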

Now that the biggie is out of the way, you need to choose how you want your pages to look. If you are using folders, I recommend dropping the file name entirely (e.g. www.example.com/example/ rather than www.example.com/example/index.htm), which makes maintenance easier. That said, I am also partial to keeping pages in the root directory as much as possible, which means you have to show an extension, as in www.example.com/example.htm.

Whatever you choose to do, make sure you stay consistent in how you code your internal links. What I mean is that if you create a path on your site to www.example.com/example/, do not link to www.example.com/example in another area of your site. This is one of the few times in life when it’s okay to play favorites!
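A rough way to catch inconsistent internal links is to pull every href out of a page and group the ones that differ only by letter case, a trailing slash, or a default file name. This minimal Python sketch uses the standard library’s html.parser; the list of default file names is hypothetical, lowercasing assumes a case-insensitive server, and it compares hrefs as written, so an absolute link and a relative link to the same page will not be grouped together.

```python
from collections import defaultdict
from html.parser import HTMLParser

DEFAULT_DOCUMENTS = ("/index.htm", "/index.html", "/default.aspx")  # hypothetical list


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def inconsistent_links(html):
    """Return {path: [spellings]} for paths linked in more than one way."""
    collector = LinkCollector()
    collector.feed(html)
    groups = defaultdict(set)
    for href in collector.hrefs:
        # Ignore case and trailing slashes, then strip a default file name.
        key = href.lower().rstrip("/")
        for document in DEFAULT_DOCUMENTS:
            if key.endswith(document):
                key = key[: -len(document)]
        groups[key].add(href)
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}


page = """
<a href="/example/">Example</a>
<a href="/example">Example again</a>
<a href="/Example/index.htm">And again</a>
<a href="/store/">Store</a>
"""
print(inconsistent_links(page))
# {'/example': ['/Example/index.htm', '/example', '/example/']}
```

Any group in the output is a page you are linking to under more than one path; pick your favorite spelling and update the others.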

You should also control the number of variables appended to your paths. Often, for tracking or programming reasons, variables are appended to URLs, and they can make the same path appear different. Try to limit them and, again, be consistent.
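One way to keep appended variables in check is to strip the ones that exist only for tracking before a URL is used in a link. The sketch below is Python using only urllib.parse. The TRACKING_PARAMS set is a hypothetical example (the utm_ parameters are the ones used for Google Analytics campaign tagging, while sessionid is made up), and sorting the remaining parameters assumes your application does not care about their order.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that only exist for tracking and never change the page.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def strip_tracking(url):
    """Drop tracking-only query variables and put the rest in a fixed order."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name.lower() not in TRACKING_PARAMS]
    kept.sort()  # ?b=2&a=1 and ?a=1&b=2 become the same URL
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


print(strip_tracking("http://www.example.com/store/?utm_source=newsletter&id=1"))
# http://www.example.com/store/?id=1
print(strip_tracking("http://www.example.com/store/?utm_campaign=spring"))
# http://www.example.com/store/
```

When no variables survive, urlunsplit leaves off the “?” entirely, so the link falls back to the clean path.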

To test for variables and path mistakes, create a Google Webmaster Central account and navigate to Webmaster Tools. Then go to the Links tab (after you have verified the site) and scroll through both the internal and external links. You should be able to spot serious issues easily. I also like to use Xenu’s Link Sleuth, which detects broken links but also displays a list of the paths on your site.

And that’s about it, though there’s probably a lot I did not cover, either from my own misunderstanding or from tiredness. Either way, I got to play with Play-Doh and talk about PageRank. It doesn’t get much better than that!
