SEO Chapter 2: Duplication and Canonicalization

After analyzing a website's domain name and general design, my colleagues and I check for one of the most common SEO mistakes on the Internet: canonicalization problems. For SEOs, canonicalization issues arise when an individual webpage can be loaded from multiple URLs.

NOTE In this discussion, "canonicalization" simply refers to the concept of picking an authoritative version of a URL and propagating its usage, as opposed to using other variants of that URL. On the other hand, the book discusses the specific canonical link element in several places, including in Chapter 5.

Remember that in Chapter 1 I discussed popularity? (Come on, it hasn't been that long.) What do you think happens when links that are intended to go to the same page get split up among multiple URLs? You guessed it: the popularity of the pages gets split up. Unfortunately for web developers, this happens far too often because the default settings for web servers create this problem. The following lists show the negative SEO effects of using the default settings on the two most common web servers:

Apache web server:

http://www.example.com/

http://www.example.com/index.html

http://example.com/

http://example.com/index.html

Microsoft Internet Information Services (IIS):

http://www.example.com/

http://www.example.com/default.asp (or .aspx depending on the version)

http://example.com/

http://example.com/default.asp (or .aspx)

Or any combination with different capitalization.
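
To make the duplication concrete, here is a minimal Python sketch that maps each homepage variant listed above onto a single canonical URL. The function name, the choice of the www host as the preferred version, and the set of default-document names are assumptions made for illustration; they are not something the servers do for you.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the default-document names to collapse; adjust per server.
DEFAULT_DOCUMENTS = {"index.html", "default.asp", "default.aspx"}


def canonical_homepage_url(url, preferred_host="www.example.com"):
    """Map a homepage URL variant onto one canonical form (illustrative only)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()  # host names are case-insensitive
    # Treat the bare domain and the www domain as the same site.
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host
    path = parts.path or "/"  # an empty path means the root
    # Drop a trailing default document such as /index.html, ignoring case.
    head, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


variants = [
    "http://www.example.com/",
    "http://www.example.com/index.html",
    "http://example.com/",
    "http://example.com/index.html",
    "http://EXAMPLE.com/Default.ASPX",
]
# Every variant collapses to the same URL.
canonical = {canonical_homepage_url(v) for v in variants}
```

A function like this is mainly useful for auditing: run it over the inbound link targets from a crawl and count how many distinct raw URLs collapse onto each canonical one.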

Each of these URLs spreads out the value of inbound links to the homepage. This means that if 100 links point to the homepage through these various URLs, the major search engines credit each URL separately rather than combining the total.

NOTE Don't think it can happen to you? Go to http://www.mattcutts.com and wait for the page to load. Now, go to http://mattcutts.com and notice what happens. Look at that, canonicalization issues. What's the significance of this example? Matt Cutts is the head of Google's web spam team and helped write many of the algorithms we SEOs study. If he is making this mistake, odds are your less informed clients are as well.

Luckily for SEOs, web developers developed methods for redirection so that URLs can be changed and combined. Two primary types of server redirects exist—301 redirects and 302 redirects:

• A 301 indicates an HTTP status code of "Moved Permanently."

• A 302 indicates a status code of "Found" (labeled "Moved Temporarily" in HTTP/1.0), which signals a temporary move.
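
As a rough illustration of what a permanent redirect looks like on the wire, the following sketch is a tiny WSGI application in Python that answers any request for a non-preferred host with a 301 and a Location header. The app name and the preferred host are hypothetical, and the sketch ignores query strings and HTTPS for brevity; real sites would normally configure this in the web server itself.

```python
CANONICAL_HOST = "www.example.com"  # assumption: the preferred host name


def canonical_redirect_app(environ, start_response):
    """Tiny WSGI app: 301 to the canonical host, otherwise serve a placeholder."""
    host = environ.get("HTTP_HOST", "").lower()
    path = environ.get("PATH_INFO", "") or "/"
    if host != CANONICAL_HOST:
        location = "http://" + CANONICAL_HOST + path
        # 301 tells browsers and crawlers that the move is permanent.
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"homepage"]
```

To try it locally, the standard library's wsgiref.simple_server.make_server can serve the app. Swapping the status line for "302 Found" would produce the temporary variant.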

Other redirect methods exist, such as the meta refresh and various JavaScript relocation commands. Avoid these methods. Not only do they not pass any authority from origin to destination, but engines are unreliable about following the redirect path.

TIP You can read all of the HTTP status codes at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html.

Though the difference between 301 and 302 redirects appears to be merely semantics, the actual results are dramatic. Google decided a long time ago to not pass link juice (ranking power) equally between normal links and server redirects. At SEOmoz, I did a considerable amount of testing around this subject and have concluded that 301 redirects pass between 90 percent and 99 percent of their value, whereas 302 redirects pass almost no value at all. Because of this, my co-workers and I always look to see how non-canonicalized pages are being redirected.
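
The check described above can be sketched as a small Python helper that flags any 302 among a set of observed redirects. The function name and the shape of the input (pairs of URL and status code, gathered with whatever crawler or header checker you already use) are assumptions for illustration.

```python
from http import HTTPStatus


def audit_redirects(observations):
    """Return (url, note) pairs for every redirect answered with a 302.

    `observations` is a hypothetical list of (url, status_code) pairs.
    """
    findings = []
    for url, status in observations:
        if status == HTTPStatus.FOUND:  # 302
            phrase = HTTPStatus(status).phrase
            findings.append((url, f"{status} {phrase}: should probably be a 301"))
    return findings


# Illustrative data only, not real crawl results.
sample = [
    ("http://example.com/", 302),
    ("http://example.com/index.html", 301),
]
report = audit_redirects(sample)
```

Only the 302 entry ends up in the report; the 301 is already the permanent form and is left alone.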

It's not just semantics. How a page is redirected (whether by a 301 or a 302 redirect) matters.

WARNING Older versions of IIS use 302 redirects by default. D'oh! Be sure to look out for this. You can see worthless redirects all around popular IIS-powered websites like microsoft.com and myspace.com. The value of these redirects is being completely negated by a single value difference!


Canonicalization is not limited to the inclusion of letters. It also dictates forward slashes in URLs. Try going to http://www.google.com and notice that you will automatically get redirected to http://www.google.com/ (notice the trailing forward slash). This is happening because technically this is the correct format for the URL. Although this is a problem that is largely solved by the search engines already (they know that www.google.com is intended to mean the same as www.google.com/), it is still worth noting because many servers will automatically 301 redirect from the version without the trailing slash to the correct version. By doing this, a link pointing to the wrong version of the URL loses between 1 percent and 10 percent of its worth due to the 301 redirect. The takeaway here is that whenever possible, it is better to link to the version with the forward slash. There is no reason to lose sleep over this (because the engines have mostly solved the problem), but it is still a point to consider.

CROSSREF The right and wrong usage of 301 and 302 redirects is discussed in Chapter 3. The correct syntax and usage of the canonical link element is discussed in Chapter 5.
