Implementing a search engine optimization strategy typically requires tools that can be dangerous when mishandled. Knowing those tools — when to use them and how — can make all the difference.
This is the 12th installment in my “SEO How-to” series.
Technical SEO focuses on a few critical areas: crawling, indexation, and defining content types.
Regulating Search Engine Crawlers
The most fundamental organic search requirement is allowing search engine crawlers (bots) to access your site. Without the crawl, search engines can’t index and rank your pages.
Crawl tools allow you to open or close the door to search bots on a page-by-page basis. Use them to block friendly bots from content that you don’t want in Google’s index, such as shopping cart and account pages.
The robots.txt file, located in the root directory of your domain, tells bots which pages they may crawl. For example, Practical Ecommerce’s robots.txt file is at Practicalecommerce.com/robots.txt.
Access to the full site is the default; you don’t need to enable it. Disallow directives block reputable search bots from crawling one or more pages. Nuisance bots, such as scrapers that copy your content to repost on spam sites, won’t obey robots.txt files. For SEO purposes, however, the robots.txt file works well.
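As a minimal sketch, a robots.txt file that blocks crawling of hypothetical cart and account directories might look like this (the paths are placeholders, not Practical Ecommerce’s actual rules):

# Applies to all compliant crawlers
User-agent: *
# Placeholder paths; substitute your own cart and account URLs
Disallow: /cart/
Disallow: /account/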
See my post in April for more on robots.txt.
Meta robots noindex tag. The noindex attribute of the robots meta tag, usually just called a noindex tag, prevents bots from indexing the page it appears on. It sits in the head of your page’s HTML code with your title and meta description tags.
The meta noindex tag can be powerful but also dangerous. When used in a page template, it cuts off indexation for every page that uses that template.
Other attributes, such as nofollow, nocache, and nosnippet, are available with the robots meta tag to, respectively, restrict the flow of link authority, prevent page caching, and request that no snippet of the page’s content show in search results.
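As a sketch, the tag sits in the head section like this; the added nofollow is illustrative, so include only the directives you actually need:

<head>
  <title>Example Page</title>
  <!-- Ask search engines not to index this page or follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>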
See my April post for tips on managing noindex tags.
Enabling Indexing
Indexing tools guide search engines to the content you want to appear in organic search results.
XML sitemap. Unlike an HTML sitemap, which many sites link to in the footer, an XML sitemap is a stark list of URLs and their attributes. Bots use XML sitemaps to augment the list of pages they discover when crawling your site. XML sitemaps invite bots to crawl the pages but do not guarantee indexing.
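A bare-bones XML sitemap, with placeholder values, looks roughly like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- One url element per page you want crawled -->
    <loc>https://www.example.com/widgets/blue-widget</loc>
    <lastmod>2021-01-15</lastmod>
  </url>
</urlset>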
I addressed the structure and limitations of XML sitemaps last year.
Google Search Console and Bing Webmaster Tools. Once you have an XML sitemap, submit it to both Google Search Console and Bing Webmaster Tools. That, and referencing the XML sitemap URL in your robots.txt file, ensures that the bots can find it.
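The robots.txt reference is a single line pointing to the sitemap’s full URL; for example, with a placeholder domain:

Sitemap: https://www.example.com/sitemap.xml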
Sitemap submission is not the only reason to sign up for Google’s and Bing’s webmaster toolsets, though. They also serve as performance dashboards for each search engine. Moreover, Google’s Search Console includes a URL Inspection tool to request indexing of any URL on your domain.
Removing Indexed URLs
Be sure that you want content crawled and indexed before it goes live. It’s a lot easier to prevent indexing than to undo it afterward. However, if you need to remove pages from a search engine index, such as for duplicate content or personally identifiable information, consider these methods.
404 file not found. The fastest way to remove a page from a search index is to remove it from your web server so that it returns a 404 file-not-found error.
However, 404 errors are dead ends. All the authority that the live page had earned over time (from other sites that linked to it) dies. Whenever possible, use another method to deindex content.
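To confirm what a removed page actually returns, you can check the response headers; for example, with curl and a placeholder URL:

curl -I https://www.example.com/discontinued-product
# The first line of the output should show a 404 (or 410 Gone) status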
See my post on 404 errors.
301 redirects are responses from the web server, returned before a page loads, signaling that the requested page has permanently moved to a new URL. A 301 is powerful because it also tells search engines to transfer all the authority the old page had earned to the receiving URL, strengthening it. Use 301 redirects whenever possible to remove content, preserve link authority, and move the user to a new page.
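How you set one up depends on your server. As a sketch, assuming an Apache server with mod_alias enabled and placeholder URLs, a single redirect could be:

# Permanently redirect the old URL to its replacement
Redirect 301 /old-page https://www.example.com/new-page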
See my post on 301 redirects.
Canonical tags. Another form of metadata found in the head of a page’s code, the canonical tag tells search engine crawlers which URL is the canonical (i.e., authoritative) version of the page. Canonical tags can deindex duplicate pages and aggregate their link authority to the canonical version.
Canonical tags are handy for managing duplicate pages — a common occurrence with ecommerce product catalogs.
Canonical tags are a request, not a command like 301 redirects. Still, they are effective when you need humans to access a page, but you don’t want search engines to index it.
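The tag itself is one line in the head of the duplicate page, pointing at the preferred URL (placeholder shown):

<head>
  <!-- Point search engines to the authoritative version of this page -->
  <link rel="canonical" href="https://www.example.com/widgets/blue-widget">
</head>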
See my post on canonical tags.
Google Removals tool. Another feature in Google Search Console, the Removals tool can temporarily remove pages from Google’s index. Be careful, however, as I’ve seen entire sites accidentally removed with a single click.
The Removals tool is a good choice when you need to delete outdated or sensitive information from search results quickly. If you want the removal to be permanent, however, you’ll need to remove the page from your site (to return a 404 error) or place a noindex tag on it. Otherwise, Google will recrawl and reindex the page within six months.
For more, see Google’s “Removals Tool” explanation.
Defining Content
Lastly, structured data defines content types to help search engines understand your content. Structured data can also trigger the placement of rich snippets and knowledge panels in Google’s organic search results.
Usually coded using JSON-LD or the microdata standard, structured data places bits of metadata into your existing page templates. The code marks up existing data elements, such as price, ratings, and availability.
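As a simplified sketch with placeholder values, JSON-LD for a product page might look like this (see schema.org’s Product type for the full list of properties):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Blue Widget",
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.5",
    "reviewCount": "27"
  },
  "offers": {
    "@type": "Offer",
    "price": "19.99",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
</script>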
I addressed structured data for ecommerce product pages last year.