Recently Google changed the way they view sites. In the past, search engines tended to see a website much as a user with a text-only browser would. This has since changed: search engines now look at sites the way a human using a modern web browser does, rich with imagery, video content and a variety of other media.
This can cause problems if a site's robots.txt accidentally blocks JavaScript, CSS and images from being crawled. It can happen when a CMS (content management system) ships with a default robots.txt that hides the key files required to display a page from Google's crawler's gaze. On this point, Google have gone on record as saying:
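As an illustration, a CMS default along these lines (the directory names here are hypothetical, modelled on a typical CMS layout) would hide a page's assets from the crawler:

```
User-agent: *
# These two lines block the very files Google needs to render the page
Disallow: /assets/css/
Disallow: /assets/js/
# Blocking an admin area, by contrast, is usually harmless
Disallow: /admin/
```

Removing the first two Disallow lines, or adding explicit Allow rules for the CSS and JavaScript paths, lets Google render the page as a visitor would see it.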
“Disallowing crawling of JavaScript or CSS files in your site’s robots.txt directly harms how well our algorithms render and index your content and can result in suboptimal rankings.”
This is a major change and indicates that how a site looks to a human user now factors into rankings. After all, Google want to provide the end user with the best possible sites, and in this day and age a text-based site simply won't do. We have come to expect imagery, video and a host of other points of interaction. A robots.txt file is a powerful tool, but the downside is that even a minor error in it can cause major disruption. We have seen sites block so much of their JavaScript, images and CSS that they appear to Google as text-only websites. In the past we have even seen sites accidentally block themselves entirely, which had a massive impact on their traffic.
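One way to catch this kind of over-blocking before it does damage is to test the rules programmatically. Below is a minimal sketch using Python's standard-library `urllib.robotparser`; the rules and URLs are invented for illustration, not taken from any real site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, modelled on a CMS default that blocks asset folders
rules = """\
User-agent: *
Disallow: /assets/css/
Disallow: /assets/js/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The script a page needs in order to render is refused to Googlebot...
print(parser.can_fetch("Googlebot", "https://example.com/assets/js/app.js"))  # False

# ...while ordinary pages remain crawlable
print(parser.can_fetch("Googlebot", "https://example.com/index.html"))  # True
```

Running a handful of `can_fetch` checks like this against your real robots.txt, one per asset type, makes an easy pre-launch smoke test.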
Google now provide tools as part of Google Webmaster Tools that let you see the results of a Google crawl and whether any elements on a page are blocked. You or your developers should never use a default robots.txt for a particular CMS without checking each line to ensure that it is required, or whether additional lines need to be added.

Make sure that every site has a robots.txt, even if it is empty. When Google's crawler bots can't find a robots.txt file, they can sometimes assume that the entire site no longer exists, and you could take a substantial rankings hit and ultimately lose traffic.

Another problem often encountered with robots.txt files comes during the launch of a new site. Developers put a robots.txt in place to block all crawlers during the development phase, so that a half-built site cannot rank. Forgetting to remove it at launch is a problem I see more often than I would like to.

If you would like any further clarification on this issue, do not hesitate to get in contact with a member of the team. We'd be happy to help.
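For reference, the development-phase block-everything file mentioned above is only two lines long, which is exactly why it is so easy to overlook at launch:

```
# Refuses every crawler access to the whole site.
# Fine on a staging server; left in production, it can wipe a site from the index.
User-agent: *
Disallow: /
```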