A robots.txt file is a plain text file that webmasters create to tell well-behaved search engine crawlers, such as Google's, which directories of a website should be skipped or disallowed. Robots.txt is part of the Robots Exclusion Standard (also called the Robots Exclusion Protocol), a group of web conventions that regulates how robots crawl the web and access and index its content.
The robots.txt protocol is as important as site structure, site content, search engine friendliness and meta descriptions. If it is implemented incorrectly, it can easily trip up a website. Small errors in the robots.txt file can prevent your website from being found by search engines. They can also change the way search engines index your website, which can have an adverse effect on your SEO strategy.
General Robots.txt format
The robots.txt file has to be placed in the root of your domain.
For example: domain.com/robots.txt.
The general format used to exclude all robots from indexing certain parts of a website is given below:
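A common form of this block, sketched with the same placeholder directory names discussed in this article (/cgi-bin/, /temp/ and /junk/), looks like this:

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
Disallow: /junk/
```

The `User-agent: *` line addresses all robots, and each `Disallow` line names one path prefix the robots should not crawl.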
When you use the above-mentioned syntax, search engine robots are instructed to avoid indexing the /cgi-bin/, /temp/ and /junk/ directories of the website.
Some Examples of Robots.txt
Example 1: Allow indexing of everything
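A minimal sketch of this case: a wildcard user agent with an empty Disallow value blocks nothing, so everything may be indexed.

```
User-agent: *
Disallow:
```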
Example 2: Disallow indexing of everything
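Disallowing the root path `/` blocks every URL on the site for all robots:

```
User-agent: *
Disallow: /
```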
Example 3: Disallow indexing of a specific folder
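A sketch of this case, using /folder1/ as a placeholder directory name:

```
User-agent: *
Disallow: /folder1/
```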
Example 4: Disallow Googlebot from indexing a folder, except for one file in that folder.
User-agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html
(Here /folder1/ and myfile.html are placeholder names.)
Example 5: Allow only one specific Robot access
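One common pattern for this, using Googlebot as the example robot: give it an empty Disallow (full access) in its own block, then disallow everything for all other user agents. Robots obey the most specific `User-agent` record that applies to them.

```
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
```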
Example 6: To exclude a single robot
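A sketch, with BadBot as a hypothetical robot name: the named robot is blocked from the whole site, while all other robots remain unaffected.

```
User-agent: BadBot
Disallow: /
```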
Why it is beneficial to use Robots.txt?
- The robots.txt file can disallow directories that you would not want search engine robots to index, for example directories like /cgi-bin/, /scripts/, /cart/, /wp-admin/ and others that may contain sensitive data.
- Some directories of your website may contain duplicate content, such as print versions of articles or web pages. You can use robots.txt to block the duplicate versions so that search engine robots index only one copy of the content.
- You can ensure that the search engine bots index the main content of your website.
- You can prevent search engines from indexing certain files in a directory that contain scripts, personal data or other kinds of sensitive data.
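For instance, the sensitive directories mentioned above could all be excluded in a single block:

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /scripts/
Disallow: /cart/
Disallow: /wp-admin/
```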
What to avoid in Robots.txt?
Avoid using comments in the robots.txt file.
The original Robots Exclusion Standard does not define an Allow directive; major crawlers such as Googlebot support it as an extension, but other robots may not. Avoid relying on commands like '/allow/' that crawlers may not recognise.
Do not list every file individually, as that tells others exactly which files you want to hide. Instead, put such files in a single directory and disallow that directory.
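For example, instead of disallowing each hidden file by name, a single rule over one directory (here /private/ is a placeholder name) hides everything inside it without revealing any file names:

```
User-agent: *
Disallow: /private/
```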
To know more about on-page optimization, check out the on-page optimization checklist 2018.