I’ve never worked with an ecommerce platform that was entirely free of duplicate content. Some platforms are better at containing the sprawl. But one change in the settings or the code could accidentally produce duplicates — i.e., different pages with different URLs for the same piece of content.
When bots discover and index those duplicates, the result is wasted crawl equity, slower discovery of new content, and split link authority.
Depending on the platform and implementation, any of the following seven issues could produce duplicate content.
Keyword-based URLs
Keyword URLs mask the unfriendly versions that an ecommerce platform would otherwise create. The system still generates the unfriendly URL for its own purposes and then maps your keyword URL to it. For example:
- Native, unfriendly URL: /shop/en/US845US845/69i57j0l72750j1j1.htm
- Keyword URL: /house/widget.htm
The good news is that unfriendly URLs have no consumer value. To fix, 301 redirect unfriendly URLs to their friendly variant.
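As a minimal sketch of that redirect, assuming you can export a mapping of native URLs to keyword URLs from your platform, the lookup might resemble the following. The `FRIENDLY_URLS` table and the `redirect_for` function are hypothetical names for illustration, not part of any particular platform.

```python
# Hypothetical lookup table: native platform URL -> keyword URL.
# In practice this would be exported from the platform's URL mapping.
FRIENDLY_URLS = {
    "/shop/en/US845US845/69i57j0l72750j1j1.htm": "/house/widget.htm",
}


def redirect_for(path):
    """Return (status, location) if the path should redirect, else None."""
    target = FRIENDLY_URLS.get(path)
    if target is None:
        return None  # Not a known unfriendly URL; serve the page normally.
    return (301, target)
```

A request for the keyword URL itself returns `None`, so the friendly page is served without a redirect.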
Category-based URLs
Products that reside in multiple categories or subcategories could have different URLs. For example, a widget in the “House” and “Apartment” categories could have identical pages with separate URLs that reflect the category, as in:
- /house/widget.htm
- /apartment/widget.htm
Shoppers need to see all of the product page variants as they navigate, so we can’t 301 redirect one to the other. There are two ways to fix this.
You could create unique product pages for each category. This is suitable for products that have multiple use cases. Conversely, for products with a single use, assign one category’s product URL as primary and apply a canonical tag to each secondary URL. (Canonical tags are snippets of HTML code that designate a different URL as the primary, or “canonical,” version for that page.)
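For the canonical-tag option, a rough sketch of generating the tag could look like this. It assumes the "House" URL is the primary one and uses a placeholder origin of `https://www.example.com`; the `PRIMARY_URL` table and `canonical_tag` function are illustrative names only.

```python
from html import escape

# Hypothetical mapping: secondary category URL -> primary category URL.
PRIMARY_URL = {
    "/apartment/widget.htm": "/house/widget.htm",
}


def canonical_tag(path, origin="https://www.example.com"):
    """Build the canonical <link> element for the <head> of a product page."""
    # Pages without an entry are treated as their own canonical version.
    primary = PRIMARY_URL.get(path, path)
    href = escape(origin + primary, quote=True)
    return '<link rel="canonical" href="%s">' % href
```

Both the primary and secondary pages end up carrying a tag that points to the same primary URL.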
Click-path URLs
If your faceted navigation is indexable, visitors may be able to take different click paths to arrive at the same content. Displaying that click path in the URL provides rich keyword signals for both visitors and search engines, but it also produces duplicate content, such as:
- /house/blue/large/widget.htm
- /house/large/blue/widget.htm
To fix, insert a canonical tag on every click-path variant that points to a single version. As long as all of the facets are represented in that canonical URL, the order you choose does not matter.
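One way to pick a consistent canonical URL is to sort the facet segments. The sketch below assumes the first path segment is the category, the last is the page, and everything in between is a facet; alphabetical order is an arbitrary choice, and `canonical_click_path` is a hypothetical helper.

```python
def canonical_click_path(path):
    """Return the path with its facet segments in alphabetical order."""
    segments = path.strip("/").split("/")
    if len(segments) < 3:
        # No facets between the category and the page; nothing to reorder.
        return "/" + "/".join(segments)
    category, *facets, page = segments
    return "/" + "/".join([category, *sorted(facets), page])
```

Both example URLs above resolve to /house/blue/large/widget.htm, which would then go into the canonical tag on each variant.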
Capitalization
Uppercase and lowercase letters are different characters, so case variants count as different URLs for the same content, and search engines could index each one separately. For example:
- /page.htm
- /PAGE.htm
- /pAge.htm
- /pagE.htm
Some of these would be indexed only if someone linked to them accidentally, such as from typos in URLs.
Visitors don’t need to see different case variants. You can therefore apply canonical tags or 301 redirects to one URL, typically the lowercase version.
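If you choose the redirect route, the rule can be very small. This sketch lowercases only the path, on the assumption that query-string values may be case-sensitive on some platforms; `lowercase_redirect` is a hypothetical name.

```python
def lowercase_redirect(path):
    """Return (301, lowercase path) for mixed-case paths, else None."""
    lowered = path.lower()
    if lowered == path:
        return None  # Already lowercase; serve the page normally.
    return (301, lowered)
```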
Protocols and Subdomains
The protocol (http vs. https) and subdomain (with and without www) variations of URLs also introduce duplicate content. For example, the following URLs could generate four pages of identical content that search engines could index:
- http://domain.com/page.htm
- https://domain.com/page.htm
- http://www.domain.com/page.htm
- https://www.domain.com/page.htm
To fix, 301 redirect to the canonical version.
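As an illustration, the redirect logic might be sketched as below. Treating https with www as the canonical version is my assumption for the example; substitute whichever combination your site uses. The constants and `canonical_origin_redirect` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed canonical protocol and host for this example.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.domain.com"


def canonical_origin_redirect(url):
    """Return (301, canonical URL) if the protocol or host differs, else None."""
    parts = urlsplit(url)
    if parts.scheme == CANONICAL_SCHEME and parts.netloc.lower() == CANONICAL_HOST:
        return None  # Already on the canonical protocol and host.
    target = urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
    )
    return (301, target)
```

Three of the four example URLs redirect in a single hop to https://www.domain.com/page.htm, and the fourth is served directly.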
Sorting Parameters
The sorting function that enables visitors to arrange products by price, popularity, ratings, or other criteria is essential for usability. But it generates many low search-value pages. For example, only the first one below should be indexed.
- /house/widget.htm
- /house/widget.htm?num=24
- /house/widget.htm?price=low
- /house/widget.htm?pop=high
To fix, use a canonical tag to tell search engines which URL is the canonical version. Alternatively, code the sorting function using a form of AJAX that search engines can’t crawl, thereby removing the possibility of duplicate content.
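To compute the URL for that canonical tag, you could strip the sorting parameters and keep everything else. The sketch below assumes `num`, `price`, and `pop` (taken from the examples above) are the only sorting parameters; `SORT_PARAMS` and `canonical_without_sorting` are hypothetical names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed sorting and display parameters, based on the example URLs.
SORT_PARAMS = {"num", "price", "pop"}


def canonical_without_sorting(url):
    """Return the URL with sorting parameters removed from the query string."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SORT_PARAMS
    ]
    # Rebuild the URL without a fragment; non-sorting parameters are preserved.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Every sorted variant of /house/widget.htm then points back to the unsorted page.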
Past Versions of a Site
Changing your site in a way that impacts how products are categorized or labeled could alter URLs for those pages. Many platforms just abandon the old URLs and move forward using new ones. For example, say your site had a “Home and Garden” category with the URL /home-garden/. Splitting into separate categories would produce two new URLs: /home/ and /garden/.
Later, if you renamed the “Home” category to “House,” the URL could change accordingly, to /house/. The result could be two orphaned URLs:
- /home-garden/
- /home/
To fix, 301 redirect the two orphaned pages or, alternatively, return a 404 error. A 301 is better, however, as it will cause the search engines to deindex the URL eventually and, importantly, attribute the orphaned URL’s accumulated link authority to the destination page.
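A redirect map for retired URLs can be kept as a simple table. In this sketch, sending /home-garden/ to /home/ is my assumption (it could just as reasonably point to /garden/), and the helper follows the table to the newest URL so that each old address redirects in one hop. `LEGACY_REDIRECTS` and `final_destination` are hypothetical names.

```python
# Hypothetical record of past URL changes: old URL -> its replacement.
LEGACY_REDIRECTS = {
    "/home-garden/": "/home/",  # Assumed target after the category split.
    "/home/": "/house/",        # Category renamed from "Home" to "House".
}


def final_destination(path):
    """Follow the redirect table until reaching a URL that is still live."""
    seen = set()
    # The seen set guards against an accidental loop in the table.
    while path in LEGACY_REDIRECTS and path not in seen:
        seen.add(path)
        path = LEGACY_REDIRECTS[path]
    return path
```

Each orphaned URL would then 301 redirect straight to the value this function returns.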