When we enter a query into a search engine, we get result pages listing many websites that answer our question. How do search engines like Google decide the order in which these sites appear? A page shows up in the results only after it has been crawled and indexed. Crawling, indexing, and ranking are the three steps Google follows to list sites on its result pages in priority order.
What Is Crawling?
Crawling is the process by which a search engine such as Google tries to visit all the pages of your website using its bots, known as Googlebot. This is called the Google crawling process. You can expect Google to crawl your site within anywhere from 4 to 30 days, depending on your website’s activity. Be patient, and monitor the progress using the URL inspection tool or the index status report.
Google’s crawlers discover and scan your website by automatically visiting publicly accessible web pages and following the links on them. The two types of Googlebot used in the crawling process are Googlebot Desktop and Googlebot Smartphone.
What Is A Crawl Error?
Only after crawling is complete does your website get indexed and ranked in search engine result pages (SERPs). Googlebot crawls the pages first. Then it indexes the content of each page and adds all the links on it to the list of pages that are yet to be crawled. You must ensure the search engine bot can reach every page on the site without any obstacles. Any barrier that causes this crawling process to fail is called a crawl error. Google divides crawl errors into two groups: 1. Site Errors and 2. URL Errors.
- Site Errors
Site errors are crawl errors that can block Googlebot from entering your site and crawling it. Google shows these errors in the crawl errors dashboard for the last 90 days. These are crucial errors that can affect your whole website; hence they can’t be ignored. The most common site errors are Server Errors, DNS Errors, and Robots Failure.
How To Fix DNS Errors?
If Googlebot encounters DNS (Domain Name System) issues, it can’t connect with your domain, either because the DNS lookup failed or because it timed out. Since Googlebot can’t reach the site at all, these are among the most serious errors. To fix them:
- First, use Google’s URL inspection tool to get a side-by-side comparison of how Google views your site and how a user sees it. The slower fetch-and-render option is the helpful one here.
- If Google fails to fetch and render your site properly, take further action by checking with your DNS provider to find out where the issue is.
- Ensure your server returns a 404 or 500 error code instead of a failed connection. These codes are more precise than a DNS error.
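Before contacting your DNS provider, it can help to confirm whether the domain resolves at all from your own machine. The following is a minimal sketch using only Python’s standard library; the function name `check_dns` and its `resolver` parameter are illustrative assumptions, and a successful lookup from your network does not guarantee that Googlebot sees the same result.

```python
import socket


def check_dns(hostname, resolver=socket.getaddrinfo):
    """Return (ok, detail) for a DNS lookup of `hostname`.

    `resolver` defaults to socket.getaddrinfo; it is a parameter only so
    the lookup can be swapped out (hypothetical helper, not a Google tool).
    """
    try:
        results = resolver(hostname, 443)
    except socket.gaierror as exc:
        # The name could not be resolved (lookup failure).
        return False, f"DNS lookup failed: {exc}"
    except OSError as exc:
        # Any other network-level problem, including timeouts.
        return False, f"DNS lookup error: {exc}"
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address.
    addresses = sorted({entry[4][0] for entry in results})
    return True, ", ".join(addresses)
```

Calling `check_dns("example.com")` returns `True` and the resolved addresses when the lookup works, or `False` and the error text when it does not.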
How To Fix Server Errors?
A server error means your server is taking too long to respond, so the request times out. Googlebot will give up trying after a certain amount of time if your site takes too long to load. Server errors can happen if a site gets more traffic than the server can handle. Though increased traffic is good for your website, make sure your server is ready to handle that much traffic.
There are many types of server errors: timeout, connection reset, connection timeout, truncated headers, connection refused, connection failed, no response, and so on. So find out specifically which type of server error you have to fix. Google Search Console Help can assist you in diagnosing specific errors.
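To narrow down which type of server error you are facing, you can request the page yourself and look at how the request fails. The sketch below uses Python’s standard library; the function names and the mapping from Python exceptions to the labels above are assumptions made for illustration, not Google’s own classification.

```python
import socket
import urllib.error
import urllib.request


def classify_fetch_error(exc):
    """Map a Python networking exception to a rough error label."""
    if isinstance(exc, socket.gaierror):
        return "dns lookup failed"
    if isinstance(exc, ConnectionRefusedError):
        return "connection refused"
    if isinstance(exc, ConnectionResetError):
        return "connection reset"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    if isinstance(exc, ConnectionError):
        return "connection failed"
    return "unknown"


def probe(url, timeout=10.0):
    """Fetch `url` once and describe the outcome as a short string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 500.
        return f"HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # URLError wraps the underlying exception in `reason`.
        reason = exc.reason
        return classify_fetch_error(reason if isinstance(reason, BaseException) else exc)
    except OSError as exc:
        # Timeouts while reading can surface without a URLError wrapper.
        return classify_fetch_error(exc)
```

Running `probe` against a slow page with a short `timeout` is a quick way to see whether the problem is a timeout or a refused connection.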
How To Fix Robots Failure?
When Google fails to retrieve your robots.txt file, it is reported as a robots failure. Surprisingly, a robots.txt file is only necessary if you want to keep Google from crawling certain pages.
Make sure that your robots.txt file is configured correctly. As all other pages will get crawled by default, double-check which pages you instruct Googlebot not to crawl. Triple-check for the all-powerful line “Disallow: /” and make sure it does not exist unless, for some reason, you don’t want your site to be shown in Google search result pages. If your file appears to be in order and you are still receiving errors, you can use a server header checker tool to find out whether your file is returning a 200 or a 404 status code.
A robots failure can have disastrous consequences for your website. Make sure you check for it often.
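One way to double-check which pages a robots.txt file blocks is to run a list of important URLs against it. This is a minimal sketch built on Python’s standard `urllib.robotparser` module; the function name `blocked_urls` and the sample rules are assumptions for illustration. Note that this parser does simple prefix matching, so it may not reproduce every detail of how Googlebot interprets wildcard rules.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs from `urls` that `robots_txt` disallows for `user_agent`."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # request is made here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt content used only as an example.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""
```

Passing a file that contains “Disallow: /” makes every URL in the list come back as blocked, which is exactly the situation to watch out for.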
- URL Errors
Unlike site errors, URL errors won’t affect your website as a whole. They only affect certain pages of the website. Google Search Console can show you the top URL errors per category: desktop, feature phone, and smartphone. Always remember that Google ranks the most significant errors first, and some of them may already be solved. If you see many errors, mark them as fixed and check back on them in a few days. Doing so clears the errors from the dashboard temporarily, but Google will bring back any that it still encounters when it crawls your site again. Errors that were indeed fixed will not reappear. If an error does reappear, you know that it still affects your website. The most common URL errors are 404, Access Denied, and Not Followed.
How To Fix 404 Errors?
If Googlebot tries to crawl a page that does not exist on your website, the result is a 404 error. Google finds 404 pages when other pages or sites link to the non-existent page. 404 errors are urgent, and need to be fixed immediately, if essential pages on your site are showing up as 404s.
- Ensure your page is not deleted or in draft mode. Be sure that the page is published from your content management system.
- Ensure the 404 error URL is the correct page, not another variation. Check if these errors show up on the www vs. non-www versions of the site and the http vs. https versions of your site.
- If you would rather redirect the page than restore it, make sure you 301 redirect it to the most appropriate related page.
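When checking the www vs. non-www and http vs. https versions of a URL, it is easy to miss one combination. The short sketch below generates all four variants of a given URL so each can be checked in turn; the function name `url_variants` is an assumption made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def url_variants(url):
    """Return the http/https and www/non-www variants of `url`."""
    parts = urlsplit(url)
    host = parts.netloc
    # Strip a leading "www." so both host forms can be rebuilt.
    bare = host[4:] if host.startswith("www.") else host
    variants = []
    for scheme in ("http", "https"):
        for netloc in (bare, "www." + bare):
            # Keep path and query; drop any fragment.
            variants.append(urlunsplit((scheme, netloc, parts.path, parts.query, "")))
    return variants
```

For a URL on `www.example.com`, this returns the same path on `http://example.com`, `http://www.example.com`, `https://example.com`, and `https://www.example.com`.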
As Google suggests, just ignore the error if the 404 URL is meant to be long gone. But to prevent such errors from appearing again, you will have to work on a few more things. Go to your Crawl Errors > URL Errors section to find the pages that link to your 404 page. Then click on the URL you need to fix and search the linking page for the link. Viewing the page source is often the fastest way to find the link in question.
Although it is a meticulous process, you must remove the links to that page from every page linking to it, including other websites, if you want to get rid of the old 404s.
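Searching a page’s source for a link by hand can be slow, especially when the link is written as a relative path. The following is a minimal sketch, using only Python’s standard library, that lists the `href` values in a page’s HTML that resolve to a given 404 URL. The class and function names are assumptions for illustration, and the sketch only looks at `<a>` tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (raw href, absolute URL) pairs from <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append((value, urljoin(self.base_url, value)))


def links_to(html, base_url, target_url):
    """Return the raw href values in `html` that point at `target_url`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [raw for raw, absolute in collector.links if absolute == target_url]
```

Feeding it the source of each linking page shows exactly which anchors need to be removed or updated.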
How To Fix Access Denied Errors?
An access denied error means that Googlebot cannot crawl your page, so you must take immediate action unless you want that page to stay out of the crawl and the index. To fix access denied errors, remove the elements that block Googlebot’s access.
- Check the robots.txt file to confirm the pages listed there are meant to be blocked from crawling and indexing. Use the robots.txt tester to test individual URLs against your file and to see warnings on your robots.txt file.
- Remove the login from pages you want Google to crawl, whether it is a popup login prompt or an in-page login form.
- Scan your site with Screaming Frog, which will prompt you to log in to pages if the page requires it.
If the wrong pages are blocked, access denied errors will hurt your website’s ranking. Hence fixing the urgent ones is critical for keeping your site on track.
How To Fix Not Followed Errors?
Not followed errors are often confused with the “nofollow” link directive. A not followed error means that Google is not able to follow a particular URL. These errors matter most when you encounter them on a high-priority URL; they are less important for old or no longer active URLs.
You can find a lot more information regarding the Not Followed section in the Search Console API.
The following are some other tools that you can use for dealing with not followed errors:
- Screaming Frog SEO Spider
- Raven Tools Site Auditor
- Moz Pro Site Crawl