URL Normalization (or URL Canonicalization)

In this tutorial we will look at a few tricks to avoid URL canonicalization issues, along with some canonicalization best practices.
URL Normalization or URL Canonicalization

URL normalization (or URL canonicalization) is the process of picking the best URL from the available choices. The goal is to have one standard URL for a page rather than many URLs that all lead to it.

URL normalization is a major step in search engine optimization. Search engines normalize URLs to assign importance to web pages and to avoid indexing the same page more than once, which would create duplicate content. Web browsers may also normalize URLs to determine whether a link has been visited or whether a page has been cached.
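As a rough illustration of what normalization means at the URL level, here is a minimal Python sketch using the standard library's urllib.parse. It applies only a few common, meaning-preserving rules (lowercase scheme and host, drop the default port, treat an empty path as "/", drop the fragment). The function name normalize_url is hypothetical, and real crawlers apply more rules (percent-encoding, dot segments) and handle IPv6 hosts, which this sketch does not.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports that can be dropped without changing which resource is addressed.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Apply a few common normalizations to an absolute http(s) URL (sketch only)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()   # the scheme is case-insensitive
    host = parts.hostname or ""     # .hostname is already lowercased (and omits user:password@)
    port = parts.port
    # Keep the port only when it differs from the scheme's default.
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"        # an empty path is equivalent to "/" for http(s)
    # The fragment (#...) is never sent to the server, so it is dropped.
    return urlunsplit((scheme, netloc, path, parts.query, ""))
```

For example, normalize_url("HTTP://Example.COM:80") returns "http://example.com/".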

Canonical URL

Strictly speaking, "canonical URL" is not a correct term here. Canonicalization, as mentioned above, is the process of picking the best URL from the available ones.

Multiple URLs pointing to the same page are often called canonical URLs, but as mentioned above, that is not a valid use of the term. A few examples of such URLs:

  1. example.com/
  2. example.com
  3. example.com/index.html (if html)
  4. example.com/index.php (if php)
  5. example.com/home.asp (if IIS)

On most websites the URLs above display the same content, but technically they are all different URLs, and a web server could return completely different content for each of them. A search engine will treat only one of them as the canonical form. So choose a preferred URL and 301 redirect the other versions to it, to prevent duplicate content and protect your search ranking.
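To show what "301 redirect the other versions" looks like at the HTTP level, here is a minimal Python WSGI sketch. It assumes http://example.com/ is the preferred home-page URL and treats the alias paths from the list above as duplicates; the names app, PREFERRED_HOME and HOME_ALIASES are hypothetical. On a real site you would normally configure this in the web server (for example with .htaccess) rather than in application code.

```python
from typing import Callable, Iterable

# Hypothetical preferred URL for the home page; adjust to your own domain.
PREFERRED_HOME = "http://example.com/"

# Paths that, on many servers, serve the same page as "/" (from the list above).
HOME_ALIASES = {"/index.html", "/index.php", "/home.asp"}


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI app: 301-redirect duplicate home-page URLs to the preferred one."""
    path = environ.get("PATH_INFO", "/")
    if path in HOME_ALIASES:
        # 301 tells browsers and crawlers that the move is permanent.
        start_response("301 Moved Permanently", [("Location", PREFERRED_HOME)])
        return [b""]
    if path == "/":
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Home page"]
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Not found"]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() will serve it; requesting /index.php should then answer with a 301 and a Location header pointing at the preferred URL.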

This article also covers a few other areas where canonicalization issues arise, along with solutions for them. We recommend that you inspect your website for these issues and fix them as early as possible to prevent further trouble.

Note: Do not use the URL removal tool to try to fix these domain issues. Doing so may result in your entire domain, rather than a single page, being removed from the index for 6 months.

WWW and Non-WWW Issues

A website can live at www.example.com or example.com. For your site’s visibility, it is best to live at just one URL, or web address. Neither version has a special advantage, so the choice is yours.

Create a 301 redirect from the other URL to the one you choose. For details, see our article on WWW to Non WWW redirect with .htaccess.
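The decision behind a www/non-www redirect is simple enough to sketch in a few lines of Python. This assumes the bare domain example.com is the preferred host (pass "www.example.com" to choose the opposite) and that path already includes any query string; host_redirect and PREFERRED_HOST are hypothetical names, and IPv6 Host headers are not handled.

```python
from typing import Optional

# Hypothetical choice: the bare domain is preferred.
PREFERRED_HOST = "example.com"


def _strip_www(name: str) -> str:
    """Drop a leading "www." so both spellings of the domain compare equal."""
    return name[4:] if name.startswith("www.") else name


def host_redirect(host_header: str, path: str,
                  preferred_host: str = PREFERRED_HOST,
                  scheme: str = "http") -> Optional[str]:
    """Return the 301 target if the request used the non-preferred host, else None."""
    host = host_header.lower().split(":", 1)[0]  # ignore a ":port" suffix (IPv4/names only)
    if host != preferred_host and _strip_www(host) == _strip_www(preferred_host):
        return f"{scheme}://{preferred_host}{path}"
    return None  # already on the preferred host, or an unrelated host
```

Because the comparison strips "www." from both sides, the same function works whichever version you pick as preferred.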

Google also lets webmasters set the preferred version (www or non-www) to be displayed in its search results, using Google Webmaster Tools.

Forward Slash Issue with Directories

Suppose your home page http://example.com/ or a directory such as http://example.com/folder/ is requested without the trailing forward slash, i.e. as http://example.com or http://example.com/folder.

Set up a 301 redirect to the version with the trailing slash. Check our forum post on Insert forward slash for directory with htaccess.
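The trailing-slash rule amounts to "if the path names a directory but has no slash, redirect to path + /". Here is a minimal Python sketch of that check, assuming the site is served from a hypothetical document root on disk and that the path is already URL-decoded and free of ".." segments; slash_redirect is a hypothetical name, and in practice the web server usually performs this check for you.

```python
import os
from typing import Optional


def slash_redirect(path: str, document_root: str) -> Optional[str]:
    """Return the slashed path to 301-redirect to if `path` is a directory
    under `document_root` requested without a trailing slash, else None.

    Assumes `path` is already URL-decoded and contains no ".." segments.
    """
    if path.endswith("/"):
        return None  # already in the preferred, slashed form
    candidate = os.path.join(document_root, path.lstrip("/"))
    if os.path.isdir(candidate):
        return path + "/"
    return None  # a file (or nothing): leave it alone
```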

Another issue with directories and the default URL is whether to include the default document name after the domain or directory, e.g. example.com/ versus example.com/index.html, and example.com/folder/ versus example.com/folder/index.html (or whatever your default document is). Choose one version and 301 redirect the other to it, as with the earlier issues.
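If you choose the bare directory form as preferred, the redirect logic just strips a default document name from the end of the path. This Python sketch assumes the default document names shown in the list earlier in this article; replace them with whatever your server is configured to use. DEFAULT_DOCUMENTS and index_redirect are hypothetical names, and the path is assumed to exclude the query string.

```python
from typing import Optional

# Hypothetical default document names; use whichever your server is configured with.
DEFAULT_DOCUMENTS = ("index.html", "index.php", "home.asp")


def index_redirect(path: str) -> Optional[str]:
    """If `path` ends in a default document name, return the directory URL to 301 to."""
    directory, _, filename = path.rpartition("/")
    if filename in DEFAULT_DOCUMENTS:
        return directory + "/"  # "/folder/index.html" -> "/folder/", "/index.html" -> "/"
    return None
```

This works for the root and for any directory, so one rule covers both example.com/index.html and example.com/folder/index.html.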

Check our article on 301 Directory Index to Root 301 Redirect for more details.

Duplicate content can also arise when your site has a permanent IP address and serves the same content when accessed by IP, instead of 301 redirecting to the domain name. This is not common, since people rarely visit a website by its IP address, but some SEO and site-analysis tools may sometimes refer to your site by its IP.
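Detecting an IP-based request comes down to checking whether the Host header is an IP address. This Python sketch uses the standard library's ipaddress module and assumes example.com is the main domain to redirect to; PRIMARY_DOMAIN and ip_redirect are hypothetical names, and path is assumed to include any query string.

```python
import ipaddress
from typing import Optional

# Hypothetical main domain that IP-based requests should be redirected to.
PRIMARY_DOMAIN = "example.com"


def ip_redirect(host_header: str, path: str, scheme: str = "http") -> Optional[str]:
    """Return a 301 target on the main domain if the Host header is a bare IP address."""
    host = host_header
    if host.startswith("["):          # IPv6 literal, e.g. "[2001:db8::1]:8080"
        host = host[1:host.find("]")]
    elif host.count(":") == 1:        # IPv4 address or name with a port, e.g. "203.0.113.7:80"
        host = host.split(":", 1)[0]
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return None                   # a host name, not an IP: nothing to do
    return f"{scheme}://{PRIMARY_DOMAIN}{path}"
```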

Use rel=canonical

rel=canonical is a tag introduced by Google in 2009 and now also supported by many other search engines. It lets you tell search engines the correct URL for a page. More precisely, it asks visiting search bots to index just that version and to direct all link authority to the URL given in the rel=canonical tag.
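The tag itself is a single link element in the page's head. This Python sketch generates it from a preferred URL, escaping characters such as & so the attribute stays valid HTML. Requiring an absolute http(s) URL is a design choice of this sketch to avoid any ambiguity about scheme or host; canonical_link_tag is a hypothetical name.

```python
import html
from urllib.parse import urlsplit


def canonical_link_tag(preferred_url: str) -> str:
    """Build the <link rel="canonical"> element to place in a page's <head>."""
    parts = urlsplit(preferred_url)
    # This sketch insists on an absolute URL so scheme and host are unambiguous.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {preferred_url!r}")
    # Escape &, <, >, and quotes so the attribute value stays valid HTML.
    return f'<link rel="canonical" href="{html.escape(preferred_url, quote=True)}">'
```

For example, canonical_link_tag("http://example.com/page?a=1&b=2") returns <link rel="canonical" href="http://example.com/page?a=1&amp;b=2">.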

When it was introduced, the tag had a few limitations, partly because spammers could abuse it: it could only point to a URL within the same domain. A cross-domain canonical was not supported and could be regarded as spam, which risked getting your site blocked. Using it between subdomains was allowed, as Google engineer Matt Cutts explained in one of his video tutorials. (Google has since added support for cross-domain rel=canonical.)

The tag does not guarantee that a bot will index only the version you specified, and not all search engines may support it, so it is still recommended to use the other techniques described here to stay on the safe side.

How Can I Avoid Canonicalization Issues? Canonicalization Best Practices

Before we wrap up, I recommend keeping a checklist to prevent canonicalization issues, whether you are building a new website or analysing your current one.

  1. Maintain Consistent Linking Conventions.
  2. 301 Redirect Non-www to www, or Vice Versa.
  3. Handle http and https Versions Consistently If You Use https.
  4. Don’t Link to Multiple Versions of the Same Page.
  5. Use 301s, Not 302s, on Internal Affiliate Redirects.
  6. Specify Your Preferred URLs in Google Webmaster Tools.
  7. 301 Redirect Your Permanent IP Address to the Main Domain (if you have one).

These steps cannot guarantee that your site will be free of canonicalization issues, but they will solve most of them. To catch the rest, you need to analyse your site regularly.
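Part of that regular analysis can be automated. This Python sketch takes a list of internal links (for example, collected by a crawler) and reports pages that are linked under more than one spelling, which relates to checklist items 2 to 4. The grouping rules are assumptions of this sketch: scheme is ignored, "www." is stripped, and a hypothetical set of default document names is treated as equal to its directory. Trailing-slash differences are not detected. page_key and inconsistent_links are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import urlsplit, urlunsplit

# Hypothetical default document names treated as equal to their directory.
DEFAULT_DOCUMENTS = ("index.html", "index.php", "home.asp")


def page_key(url: str) -> str:
    """Reduce a link to a rough 'same page' key: no scheme, no www., no index file."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    path = parts.path or "/"
    directory, _, filename = path.rpartition("/")
    if filename in DEFAULT_DOCUMENTS:
        path = directory + "/"
    return urlunsplit(("", host, path, parts.query, ""))


def inconsistent_links(links: Iterable[str]) -> Dict[str, List[str]]:
    """Group links by page_key and return only the pages spelled more than one way."""
    groups = defaultdict(set)
    for link in links:
        groups[page_key(link)].add(link)
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}
```

Each reported group is a page whose internal links should be changed to a single preferred URL.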