What is the rel="canonical" attribute: how to use it

Table of contents

/01 What is the "canonical" attribute?
/02 The impact of rel="canonical" on SEO
/03 What are the signs of a canonical URL?
- A canonical URL that links to itself
- A canonical URL that links to another page
/04 How do I set the canonical URL of a page using the rel="canonical" attribute?
- 1. Select the page to use rel="canonical"
- 2. Add the HTML attribute rel="canonical"
/05 What mistakes do you make when normalizing URLs?
/06 Conclusions


The rel="canonical" attribute is a tool for solving the problem of duplicate content. WEDEX explains how to use the attribute and which common mistakes to avoid.

What is the “canonical” attribute?

The rel="canonical" attribute, also known as the “canonical link”, “canonical URL”, or “canonical URL pointer”, is a link element in a page’s HTML markup. It tells search engines which URL is the main (canonical) version of the page, which helps avoid duplicate content. The attribute is an important part of your website’s SEO, so you should know how to use it.

The idea behind the attribute is simple. If several copies of the same page exist, you select one canonical version, and the attribute on each copy points search engines to it. This tells the search engine that the selected page is the one that should appear in search results.

Why is it necessary to specify a canonical URL?

There are several reasons to mark a page with the rel="canonical" attribute:

Avoiding duplicate content. This is the main reason to use rel="canonical": it helps search crawlers identify the main version of a page among similar or identical variants. This is especially important if URLs carry various query parameters (for example, for sorting, filtering, or tracking) or if the site has a lot of duplicate content.
Optimizing crawling and indexing. rel="canonical" helps search crawlers, including Google’s, understand the content and structure of your site and spend less time crawling duplicate pages.
Consolidating signals from similar and identical pages. With rel="canonical", search engines can combine signals (such as links) from several similar pages on the same site and attribute them to a single URL.
Simplifying content statistics. If the same content is available at several URLs, it is harder to get overall statistics on how that content performs. Consolidating the copies under one canonical URL makes this easier.

What does the canonical URL look like and where can I find it?

You can specify a canonical URL in several places. Here are the main options.

Canonical URL in the HTML code

The canonical URL for a page on a website is always placed in the <head> section of the page’s source code. In HTML markup, it looks like this:

<head>
<link rel="canonical" href="https://example.com">
</head>
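The following sketch shows how a tool might read this tag. It is written in Python because the article contains no programming-language code, and it uses only the standard library’s html.parser. It assumes the page has an explicit <head> element. Canonical links outside <head> are ignored, which mirrors the rule that the tag must sit there. The names CanonicalFinder and find_canonical are hypothetical.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attributes = dict(attrs)
            # rel may hold several space-separated values; a bare attribute has value None
            rels = (attributes.get("rel") or "").lower().split()
            href = attributes.get("href")
            if "canonical" in rels and href:
                self.canonicals.append(href)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonical(document: str) -> list[str]:
    """Return all canonical hrefs declared in the document's <head>."""
    finder = CanonicalFinder()
    finder.feed(document)
    finder.close()
    return finder.canonicals


page = """<html><head>
<link rel="canonical" href="https://example.com">
</head><body>
<link rel="canonical" href="https://example.com/ignored">
</body></html>"""
print(find_canonical(page))  # ['https://example.com']
```

An empty list or more than one entry are both worth investigating on a real page.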

Advantages of using rel="canonical" in HTML markup:

Allows you to mark any number of pages.
The easiest and most common way to use the attribute.

Disadvantages:

On a large website (for example, an online store with tens of thousands of product pages), implementing the attribute can be laborious, especially if it was not planned from the start of development.
This method works only for HTML pages, not for other files such as PDFs.

HTTP header

A website serves not only HTML pages but also other types of files. For example, your server may host documents that are available online, often in PDF format. You can set a canonical URL for them as well, so that these files are indexed correctly. This is done with the Link HTTP response header and looks like this:

HTTP/1.1 200 OK
Link: <https://example.com/original.pdf>; rel="canonical"
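Here is a minimal sketch of a server attaching this header, written as a plain Python WSGI application. The PDF bytes are a placeholder, and the names canonical_link_header and pdf_app are hypothetical. In practice this header is usually set in the web server configuration instead.

```python
def canonical_link_header(url: str) -> tuple[str, str]:
    """Build the (name, value) pair of a Link header that declares `url` as canonical."""
    return ("Link", f'<{url}>; rel="canonical"')


def pdf_app(environ, start_response):
    """Minimal WSGI app that serves a PDF together with its canonical Link header."""
    body = b"%PDF-1.4\n"  # placeholder bytes standing in for a real PDF file
    start_response("200 OK", [
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(len(body))),
        canonical_link_header("https://example.com/original.pdf"),
    ])
    return [body]


# Call the app directly with a stand-in start_response to inspect the headers
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


pdf_app({}, fake_start_response)
print(captured["headers"]["Link"])  # <https://example.com/original.pdf>; rel="canonical"
```

To try it locally, you could serve the app with wsgiref.simple_server.make_server("", 8000, pdf_app).serve_forever() and inspect the response headers.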

Please note: at the time of writing, only Google’s search engine supports detecting and indexing canonical URLs declared with rel="canonical" in the HTTP header. For images, no search engine supports rel="canonical" in the HTTP header, so canonical links for images should be specified in the HTML code of the page on which they appear.

Advantages of rel="canonical" in the HTTP header:

Does not increase the size of the page markup.
The only way to specify a canonical URL for non-HTML files, such as PDFs, in Google Search.

Disadvantages:

Marking up a large number of files requires significant effort.

Site map (Sitemap)

Another option is to list canonical URLs in the sitemap. In simple terms, a sitemap is an XML file that lists the pages of a website that should be indexed by search engines. If it includes only canonical URLs, it tells search engines which versions you prefer.
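The sketch below generates such a file so that it contains only canonical URLs, using Python’s xml.etree.ElementTree. The namespace is the standard sitemaps.org one. The function name build_sitemap and the sample URLs are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(canonical_urls: list[str]) -> str:
    """Return sitemap XML listing each canonical URL once, in the given order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in dict.fromkeys(canonical_urls):  # drop repeats, keep order
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes & and < in text
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


# Only the chosen canonical version of each page goes into the sitemap
print(build_sitemap([
    "https://example.com/guitars/no-name-model-black",
    "https://example.com/original.pdf",
]))
```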

Advantages of listing canonical URLs in a sitemap:

Gives search engines an additional signal and helps them crawl the site more thoroughly.

Disadvantages:

Listing a URL in a sitemap is a weaker signal than rel="canonical" in HTML markup or an HTTP header, and it does not guarantee indexing.

Other ways

There are two more ways to canonicalize URLs.

The first is a 301 redirect configured on the server: every duplicate URL is permanently redirected to the canonical one. The advantage is that this method indicates the required URL unambiguously. The disadvantages are that visitors lose access to the non-canonical URLs entirely, and you need direct access to the server.
An example using nginx:
rewrite ^/old-page/$ https://example.com/new-page/ permanent;
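The next sketch shows what this rule does at the HTTP level, as a rough Python WSGI equivalent. It is not a replacement for the nginx configuration. The REDIRECTS table is hypothetical. Like nginx’s rewrite without a trailing “?”, it passes the original query string on to the target.

```python
# Hypothetical table of duplicate paths and their canonical URLs
REDIRECTS = {
    "/old-page/": "https://example.com/new-page/",
}


def redirect_app(environ, start_response):
    """Answer duplicate paths with a 301 that points to the canonical URL."""
    target = REDIRECTS.get(environ.get("PATH_INFO", ""))
    if target is None:
        body = b"Not found"
        start_response("404 Not Found", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    query = environ.get("QUERY_STRING", "")
    if query:  # keep the original arguments, as nginx does by default
        target = f"{target}?{query}"
    start_response("301 Moved Permanently", [("Location", target), ("Content-Length", "0")])
    return [b""]
```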
The second method relied on Google Search Console, whose preferred domain setting let you choose one version of your domain (for example, with or without www) as the main version of your site. Note that Google retired this setting in 2019. Its drawbacks were that it was not a direct alternative to rel="canonical" and worked only for Google.

What is the best way to use rel="canonical"?

Given what each method does, we recommend the following combination:

Basic canonicalization of page URLs: rel="canonical" in the <head>.
In addition: 301 redirects for duplicate pages.
For files such as PDFs: the HTTP header.
For overall indexing: a sitemap.xml that lists canonical URLs.

For a basic scenario, the attribute in the <head> is enough. However, combining several consistent methods can increase the chances that search engines index the canonical page you chose.

When should you use canonical addresses?

The previous section listed “disadvantages” of each way of using rel="canonical", but they matter only when you compare the methods with each other. Using the attribute does help improve website indexing and, consequently, SEO.

Not everyone knows that the Big Three search engines (Google, Bing, and Yahoo) rely heavily on canonical URLs when deciding what to index. rel="canonical" helps search engine crawlers understand which pages should appear in search results and which should be left out. Specifying a canonical URL with the attribute is a good idea in the vast majority of cases.

The impact of rel=”canonical” on SEO

The rel="canonical" attribute is a technical solution for managing duplicate content on a website. Search engines, Google in particular, are known to disfavor duplicate content on the web, and duplicate information within a single website can be an even bigger problem. Let us explain with an example.

The hypothetical website example.com has two sections for selling guitars. Because the content manager is being replaced, the owner is temporarily adding content himself. Distracted by a customer call, he duplicated the page of one guitar and published the copy, with minimal changes, in a different section. Now the URLs look like this:

https://example.com/guitars/black/no-name-model
https://example.com/guitars/no-name-model-black

Search engine crawlers are now confused. They have crawled both pages and cannot tell which one should appear in search results. As a result, both pages may rank lower, and the site’s SEO suffers.

To avoid these SEO problems, the owner should have marked one of these addresses as canonical. This would have protected the main page and helped the search engine understand which page best answers the user’s query and should rank higher.

Another use of rel="canonical" is to point to an original article or blog post. For example, suppose you wrote a guest article for a friend’s website. To republish it on your site, add rel="canonical" pointing to the original article. This protects your site’s SEO, because search engines will not treat the copied material as a plain duplicate.

What are the signs of a canonical URL?

URL canonicalization is the process of selecting one of several existing URLs as the canonical one. If your website has two almost identical pages for the same product, you have to choose one of the URLs as the main one. How do you choose?

Look at the URL structure. Some URLs are more logical and convenient and are therefore better suited as the canonical URL: they are shorter, use relevant keywords, and are easy for users to understand. But the choice is not always obvious. Let’s go back to the guitar store example:

https://example.com/guitars/black/no-name-model
https://example.com/guitars/no-name-model-black
https://example.com/products/no-name-model-01

The owner has duplicated the page again, this time into a new section! Which of these URLs should you choose? Any of them. If the URLs are equally “low-quality” and none is clearly better, pick any one.

Leaving a duplicate page without a canonical URL is worse than choosing a less-than-ideal URL as canonical.

However, if the situation allows you to choose a more concise and human-readable canonical URL, then choose it.
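If you need to make this choice for many products, one possible rule of thumb fits in a few lines of Python. The scoring (fewest path segments first, then the shortest URL) is a hypothetical heuristic, not a search-engine rule. It ignores keyword relevance, which still needs human judgment. Ties keep the first URL in the list, in line with the advice that any choice beats none.

```python
from urllib.parse import urlsplit


def pick_canonical(candidates: list[str]) -> str:
    """Hypothetical heuristic: fewest path segments wins, then the shorter URL.

    min() returns the first of several equal candidates, so list order breaks ties."""
    def score(url: str) -> tuple[int, int]:
        segments = [part for part in urlsplit(url).path.split("/") if part]
        return (len(segments), len(url))

    return min(candidates, key=score)


print(pick_canonical([
    "https://example.com/guitars/black/no-name-model",
    "https://example.com/guitars/no-name-model-black",
    "https://example.com/products/no-name-model-01",
]))  # https://example.com/products/no-name-model-01
```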

A canonical URL that links to itself

If there is only one version of a page, make sure its rel="canonical" points to the page itself (a self-referencing canonical). This matters because it sends search engines a clear signal: “This page is the only one of its kind; it should be indexed and treated as canonical!”
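You can check this on your own pages by comparing the page’s address with the href of its canonical link. The Python sketch below is hypothetical and applies only a few normalizations:

- it resolves relative hrefs against the page URL;
- it ignores letter case in the scheme and host;
- it treats an empty path as “/”;
- it drops fragments.

Search engines may normalize URLs differently.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lower-case scheme and host, turn an empty path into "/", drop the fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, ""))


def is_self_referencing(page_url: str, canonical_href: str) -> bool:
    """True if the canonical link found on page_url points back to page_url itself."""
    return normalize(page_url) == normalize(urljoin(page_url, canonical_href))


print(is_self_referencing("https://example.com/guitars/no-name-model-black",
                          "https://example.com/guitars/no-name-model-black"))  # True
print(is_self_referencing("https://example.com/guitars/black/no-name-model",
                          "https://example.com/guitars/no-name-model-black"))  # False
```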

A canonical URL that links to another page

If the current page is a duplicate of another page, make sure its rel="canonical" points to the original. Here are some cases in which a canonical link can solve a page indexing problem:

when the duplicate differs only in URL query parameters (for example, sorting or tracking parameters);
when the pages are complete or near-complete duplicates of each other;
when very similar versions of the same page are created on purpose (for example, for different target audience groups).
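For the query-parameter case, you can often derive the canonical URL by removing parameters that do not change the content. The Python sketch below assumes a hypothetical list of such parameters (tracking and sorting). Which parameters are safe to drop depends entirely on your site. The sketch also escapes the URL before placing it in the href attribute.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that change tracking or sorting but not the content
NON_CANONICAL_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sort", "sessionid"}


def canonical_for(url: str) -> str:
    """Drop non-canonical query parameters and the fragment, keep everything else."""
    parts = urlsplit(url)
    kept = [(key, value) for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in NON_CANONICAL_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url: str) -> str:
    """Build the <link> element to place in the <head> of the page at `url`."""
    return f'<link rel="canonical" href="{html.escape(canonical_for(url))}">'


print(canonical_tag("https://example.com/guitars/no-name-model-black?utm_source=news&sort=price"))
# <link rel="canonical" href="https://example.com/guitars/no-name-model-black">
```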

Another scenario for rel="canonical" is two pages with the same content built for different devices. For example, example.com is designed for desktop computers and m.example.com for mobile devices. In this case, put rel="canonical" on the mobile page, pointing to the desktop version. Put rel="alternate" on the desktop page, pointing to the mobile version. Together they tell the search engine how the two versions are connected and how they differ. As of now, only Google supports this setup.
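A small Python sketch can generate both tags for such a pair of URLs. The media query follows the example in Google’s documentation on separate mobile URLs. The page paths and the function name separate_url_tags are hypothetical.

```python
import html

# Media query modeled on Google's separate-mobile-URLs example
MOBILE_MEDIA = "only screen and (max-width: 640px)"


def separate_url_tags(desktop_url: str, mobile_url: str) -> dict[str, str]:
    """Return the <link> element to place on each version of the page.

    Desktop page: rel="alternate" pointing to the mobile URL.
    Mobile page: rel="canonical" pointing to the desktop URL."""
    return {
        "desktop": (f'<link rel="alternate" media="{MOBILE_MEDIA}" '
                    f'href="{html.escape(mobile_url)}">'),
        "mobile": f'<link rel="canonical" href="{html.escape(desktop_url)}">',
    }


tags = separate_url_tags("https://example.com/guitars/no-name-model-black",
                         "https://m.example.com/guitars/no-name-model-black")
print(tags["desktop"])
print(tags["mobile"])  # <link rel="canonical" href="https://example.com/guitars/no-name-model-black">
```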

The rel="canonical" attribute can also be used for a less common case: cross-domain canonicalization, as in the guest-article example above. When the same content is published on several domains, mark the copies with rel="canonical" pointing to the original. This tells search engines which version should be indexed and treated as canonical.

How do I set the canonical URL of a page using the rel="canonical" attribute?

Suppose two pages on your website have identical content but sit in different sections. Both have some inbound links from other sites, so the content itself is valuable. Which version should be canonical?

1. Select the page to use rel="canonical"

Recall the recommendation: the better URL is concise and clear. If neither URL stands out, look at page metrics: which page has more inbound links, more orders, more visits, and so on. If the pages are equal in all respects, or the differences are within the margin of error, choose whichever version you prefer.
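The same decision order can be sketched in Python. The PageStats fields and sample numbers are hypothetical. The priority (inbound links, then orders, then visits) is one reading of the list above. Exact ties keep the first page in the list, so put the version you prefer first. This sketch does not model a margin of error.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    url: str
    inbound_links: int
    orders: int
    visits: int


def choose_by_metrics(pages: list[PageStats]) -> PageStats:
    """Pick the page with the most inbound links, then orders, then visits.

    max() returns the first of several equal candidates, so list order breaks exact ties."""
    return max(pages, key=lambda page: (page.inbound_links, page.orders, page.visits))


pages = [
    PageStats("https://example.com/guitars/black/no-name-model", inbound_links=12, orders=4, visits=300),
    PageStats("https://example.com/guitars/no-name-model-black", inbound_links=12, orders=4, visits=450),
]
print(choose_by_metrics(pages).url)  # https://example.com/guitars/no-name-model-black
```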

2. Add the HTML attribute rel="canonical"

The next step is to add the rel="canonical" attribute. You can do this manually by editing files over FTP, or with CMS plugins such as WP File Manager for WordPress. What matters is that you can edit the page files.

The canonical link must be placed in the <head> section of the page; otherwise search engines will not follow it.
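If you edit pages as files (for example, after downloading them over FTP), a script can add the tag for you. The Python sketch below is hypothetical and deliberately simple:

- it uses regular expressions rather than a full HTML parser;
- it inserts the link right after the opening <head> tag;
- it leaves the file unchanged if it already declares a canonical link, so a page never ends up with two.

```python
import html
import re

# <head> or <head ...>, but not <header>
HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)
# Any existing <link ... rel="canonical" ...>, quoted or unquoted
EXISTING_CANONICAL = re.compile(r"<link\b[^>]*\brel\s*=\s*[\"']?canonical[\"']?[^>]*>", re.IGNORECASE)


def add_canonical(document: str, canonical_url: str) -> str:
    """Insert <link rel="canonical"> right after the opening <head> tag."""
    if EXISTING_CANONICAL.search(document):
        return document  # already declared; never add a second, conflicting one
    match = HEAD_OPEN.search(document)
    if match is None:
        raise ValueError("document has no <head> tag")
    tag = f'\n<link rel="canonical" href="{html.escape(canonical_url)}">'
    return document[:match.end()] + tag + document[match.end():]


page = "<html><head><title>Guitar</title></head><body></body></html>"
print(add_canonical(page, "https://example.com/guitars/no-name-model-black"))
```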

That’s essentially it; the rest is up to search engines. However, this doesn’t mean you can set up the attribute once and forget about it. To make sure everything works as intended, regularly check how your pages are indexed and look for errors, for example with the URL Inspection tool in Google Search Console.

What mistakes do you make when normalizing URLs?

Here are some common mistakes people make when using rel="canonical" and canonicalizing pages:

Using the robots.txt file for canonicalization instead of a sitemap or the other methods described above.
Specifying different canonical URLs for the same page. For example, it’s a bad idea to specify one canonical URL in a sitemap and a different one in the attribute.
Using only part of the URL. rel="canonical" should contain the full canonical URL, including the https:// scheme.
Using noindex to prevent a page from being selected as canonical for other pages. This blocks the page from Search completely.
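Some of these mistakes can be caught automatically. The Python sketch below is a hypothetical checker for a single page. It flags:

- more than one canonical URL on the page;
- a canonical that is not a full URL;
- a canonical that differs from the URL your sitemap lists for the same content (passed in as sitemap_url);
- a robots noindex meta tag.

It cannot detect robots.txt misuse and does not replace checking indexing in Google Search Console.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlsplit


class PageSignals(HTMLParser):
    """Collects canonical hrefs and a robots noindex flag from one page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attributes = {name: value or "" for name, value in attrs}
        if tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.canonicals.append(attributes.get("href", ""))
        elif (tag == "meta" and attributes.get("name", "").lower() == "robots"
              and "noindex" in attributes.get("content", "").lower()):
            self.noindex = True


def lint_canonical(document: str, sitemap_url: Optional[str] = None) -> list[str]:
    """Return warnings for some of the common canonicalization mistakes."""
    signals = PageSignals()
    signals.feed(document)
    signals.close()
    warnings = []
    if len(set(signals.canonicals)) > 1:
        warnings.append("page declares more than one canonical URL")
    for href in signals.canonicals:
        parts = urlsplit(href)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            warnings.append(f"canonical is not a full URL: {href!r}")
        elif sitemap_url is not None and href != sitemap_url:
            warnings.append(f"canonical {href!r} differs from sitemap URL {sitemap_url!r}")
    if signals.noindex:
        warnings.append("page has robots noindex: it will be removed from Search entirely")
    return warnings


bad_page = ('<html><head><link rel="canonical" href="/guitars/no-name-model">'
            '<meta name="robots" content="noindex"></head></html>')
for warning in lint_canonical(bad_page):
    print(warning)
```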

You can learn more about the recommendations for use in the official Google documentation.

Conclusions

The rel="canonical" attribute is a powerful SEO tool for any website. It is especially useful for large websites and online stores with many duplicate pages. However, misusing the attribute or canonicalizing URLs incorrectly can cause problems with indexing and with the visibility of pages in search results.