What is Robots.txt?

Introduction
General Robots.txt format
Some Examples of Robots.txt
Why is it beneficial to use Robots.txt?
What to avoid in Robots.txt?

Introduction

A robots.txt file is a text file that webmasters create to instruct well-behaved search engine crawlers, such as those used by Google and other search engines. This file tells the search spiders which directories on your website are to be skipped or disallowed. Robots.txt is part of the Robots Exclusion Standard (or simply the Robots Exclusion Protocol), a group of web standards that regulates how robots crawl the web and access and index content.

The robots.txt file is as important as site structure, site content, search engine friendliness and meta descriptions. If it is implemented incorrectly, it can easily trip up a website. Small errors in the robots.txt file can prevent your website from being crawled by search engines. They can also change the way search engines index your website, which can have an adverse effect on your SEO strategy.

General Robots.txt format

The robots.txt file has to be placed in the root of your domain.

For example: domain.com/robots.txt
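Because the file always lives at the root, its location can be derived from any page URL on the same site. The following is a minimal sketch of that idea; the function name `robots_txt_url` and the sample page URL are illustrative assumptions, not part of any standard API.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt location for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL used only for illustration.
print(robots_txt_url("https://domain.com/blog/post.html?page=2"))
# prints https://domain.com/robots.txt
```

Note that each host has its own file, so a subdomain such as shop.domain.com would need a separate robots.txt at its own root.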

The general format used to exclude all robots from indexing certain parts of a website is given below:

User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
Disallow: /junk

This syntax tells search engine robots not to crawl the /cgi-bin/, /temp/ and /junk directories on the website. Rules are matched as path prefixes, so Disallow: /junk (without a trailing slash) also blocks any other path that starts with /junk.
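To see how a crawler might interpret these rules, they can be fed to the robots.txt parser in Python's standard library. This is only a sketch: the helper `is_allowed`, the user agent name ExampleBot and the sample paths are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The general format from this section, held in a string instead of a file.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
Disallow: /junk
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a hypothetical crawler name; the "*" group applies to it.
for path in ("/index.html", "/temp/draft.html", "/junk-mail.html"):
    print(path, is_allowed(RULES, "ExampleBot", "https://domain.com" + path))
# prints:
# /index.html True
# /temp/draft.html False
# /junk-mail.html False
```

The last result shows that a rule without a trailing slash is matched as a plain prefix, so it covers more than the directory of that name.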

Some Examples of Robots.txt

Example 1: Allow indexing of everything

User-agent: *
Disallow:

Example 2: Disallow indexing of everything

User-agent: *
Disallow: /

Example 3: Disallow indexing of a specific folder

User-agent: *
Disallow: /folder/

Example 4: Disallow Googlebot from indexing a folder, except for one file in that folder

User-agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html
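When a Disallow rule and an Allow rule both match a URL, crawlers that follow RFC 9309 (Google among them) use the most specific rule, meaning the longest matching path, and prefer Allow on a tie. The sketch below illustrates that precedence for a folder rule with a one-file exception. The function `is_path_allowed` and the tuple layout are assumptions for illustration, and it ignores wildcards; some older parsers instead apply the first matching rule in file order, so results can differ between robots.

```python
def is_path_allowed(rules, path):
    """Apply longest-match precedence to (directive, prefix) rules.

    rules: list of tuples such as ("disallow", "/folder1/").
    Returns True when the path may be crawled.
    """
    best_len = -1
    allowed = True  # a path matched by no rule is allowed
    for directive, prefix in rules:
        # An empty value matches nothing; other values match as prefixes.
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # Longer prefix wins; on equal length, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


RULES = [("disallow", "/folder1/"), ("allow", "/folder1/myfile.html")]

print(is_path_allowed(RULES, "/folder1/myfile.html"))  # True
print(is_path_allowed(RULES, "/folder1/other.html"))   # False
print(is_path_allowed(RULES, "/about.html"))           # True
```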

Example 5: Allow only one specific robot access

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

Example 6: To exclude a single robot

User-agent: BadBot
Disallow: /
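Rules apply per User-agent group, so a robot that is not named in any group (and has no `*` group to fall back on) is unrestricted. The sketch below checks the single-robot exclusion from Example 6 with Python's standard library parser. The helper `can_crawl`, the page URL and the second crawler name are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example 6 as robots.txt text: only BadBot is excluded.
RULES = """\
User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def can_crawl(user_agent: str, url: str = "https://domain.com/page.html") -> bool:
    """Report whether the named robot may fetch the (hypothetical) page."""
    return parser.can_fetch(user_agent, url)


print(can_crawl("BadBot"))     # False
print(can_crawl("Googlebot"))  # True, no group applies to it
```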

Why is it beneficial to use Robots.txt?

A robots.txt file lets you disallow directories that you do not want search engine robots to index, for example /cgi-bin/, /scripts/, /cart/, /wp-admin/ and other directories that may contain sensitive data.
Some directories in your website may contain duplicate content, such as print versions of articles or web pages. You can use robots.txt so that search engine robots index only one version of the duplicate content.
You can ensure that search engine bots focus on the main content of your website.
You can prevent search engines from indexing certain files in a directory that may contain scripts, personal data or other kinds of sensitive data.

What to avoid in Robots.txt?

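Avoid placing comments on the same line as a directive in the robots.txt file. Comments that start with # are part of the standard, but keeping each one on its own line reduces the risk of confusing less capable parsers.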

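The original robots.txt standard did not define an Allow directive, so some older robots ignore it. Major search engines such as Google support Allow, and it is included in RFC 9309, but avoid relying on it as your only way of controlling access.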

Do not list every file individually, as this tells others exactly which files you want to hide. Instead, put those files in a single directory and disallow that directory.
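Small slips in the file, such as a space inside a path or a path that does not start with a slash, can be caught with a simple check before the file is published. The following is a rough sketch of such a check, not a full validator: the function `lint_robots_txt`, its warning messages and the sample input are all assumptions for illustration, and it inspects only Allow and Disallow lines.

```python
def lint_robots_txt(text):
    """Return a list of (line_number, message) warnings for robots.txt text."""
    warnings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        if ":" not in line:
            warnings.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() not in ("allow", "disallow"):
            continue  # User-agent and other fields are not checked here
        if value and not value.startswith("/"):
            warnings.append((number, "path should start with '/'"))
        if " " in value:
            warnings.append((number, "path contains a space"))
    return warnings


# Hypothetical file with two deliberate mistakes.
SAMPLE = """\
User-agent: *
Disallow: /folder 1/
Disallow: temp/
Allow: /folder1/myfile.html
"""

for number, message in lint_robots_txt(SAMPLE):
    print(f"line {number}: {message}")
# prints:
# line 2: path contains a space
# line 3: path should start with '/'
```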

To know more about on-page optimization check out the on-page optimization checklist 2018.

Written by

Aiswariya K

Posted On

October 20, 2023
