A robots.txt is a text file webmasters create to instruct well-behaved search engine robots, such as Google's crawler, how to treat their site. The file tells search spiders which directories on your website should be skipped or disallowed. Robots.txt is part of the Robots Exclusion Protocol (also known as the Robots Exclusion Standard), a group of web standards that regulates how robots crawl the web and access and index its content.
The robots.txt protocol is as important as site structure, site content, search engine friendliness, and meta descriptions. If it is implemented incorrectly, it can easily trip up a website. Small errors in the robots.txt file can prevent your website from being crawled by search engines at all, or change the way search engines index your website, and this can have an adverse effect on your SEO strategy.
The robots.txt file has to be placed in the root of your domain.
For example: domain.com/robots.txt.
The general format used to exclude all robots from indexing certain parts of a website is given below:
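User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
Disallow: /junk/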
When you use the above syntax, search engine robots are instructed not to index the /cgi-bin, /temp and /junk directories on the website.
Example 1: Allow indexing of everything
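Here an empty Disallow field tells every robot that no part of the site is off limits:
User-agent: *
Disallow: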
Example 2: Disallow indexing of everything
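Here a single slash in the Disallow field blocks the entire site for all robots:
User-agent: *
Disallow: /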
Example 3: Disallow indexing of a specific folder
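For example, to keep every robot out of a single directory (/folder/ is a placeholder name):
User-agent: *
Disallow: /folder/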
Example 4: Disallow Googlebot from indexing a folder, but allow the indexing of one file in that folder (here /folder1/ and myfile.html are placeholder names):
User-agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html
Example 5: Allow only one specific Robot access
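A common way to do this is to allow the chosen robot (Googlebot is used here as an example) and then disallow everyone else; the blank line separates the two records:
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /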
Example 6: To exclude a single robot
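For instance, to keep one particular robot out of the whole site (BadBot is a placeholder user-agent name):
User-agent: BadBot
Disallow: /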
Avoid the use of comments in the ‘robots.txt’ file.
The original robots.txt standard does not define an Allow directive, and not every robot understands it (Google's crawler, as in Example 4 above, is a notable exception). Avoid relying on Allow for robots that may not support it.
Do not list every file individually, as this reveals to others exactly which files you want to hide. Instead, put such files in a single directory and disallow that directory.
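For instance, rather than disallowing each file by name, you might gather them under one directory (/private/ is a placeholder name) and block it:
User-agent: *
Disallow: /private/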
To know more about on-page optimization, check out the on-page optimization checklist 2018.