How to Block Bots using Robots.txt File?


The robots.txt file is a simple text file placed on your web server that tells web crawlers whether or not they may access a file. It controls how search engine spiders see and interact with your webpages. An improperly configured robots.txt file can prevent search engines from indexing a website, and the same file can be used deliberately to keep search engines away from a site. Compliance is voluntary: well-behaved crawlers honor robots.txt, but abusive bots may ignore it.

In some cases, bots hit your website so often that they consume a lot of bandwidth and slow the site down. Heavy bot traffic can also cause problems such as high server load and an unstable server, so it is important to block such bots. Installing ModSecurity can help prevent most of these issues.

 

Correcting a Robots.txt File That Blocks All Web Crawlers

The robots.txt file is typically found in the document root of the website. You can edit it using your favorite text editor. In this article, we explain the robots.txt file and how to find and edit it. The following is a common example of a robots.txt file:

User-agent: *

Disallow: /

The * (asterisk) after User-agent means that the rules that follow apply to all crawlers. The Disallow directive restricts a search bot or spider from crawling a page or folder. The “/” after Disallow means that no pages may be visited by a search engine crawler, so this example blocks every compliant crawler from the whole site.
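
To see how a compliant crawler would read these two lines, the rules can be checked with Python's standard `urllib.robotparser` module. This is a minimal sketch, not part of the original article: the helper name `can_crawl`, the `Googlebot` user agent, and the example.com URL are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The example from the article: every crawler, whole site disallowed.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# Same record with an empty Disallow value: nothing is disallowed.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

url = "https://example.com/page.html"  # hypothetical URL
print(can_crawl(BLOCK_ALL, "Googlebot", url))  # False
print(can_crawl(ALLOW_ALL, "Googlebot", url))  # True
```

The check only models crawlers that choose to obey robots.txt; it says nothing about bots that ignore the file.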

To get the website indexed on Google or other search engines again, remove the “/” from the Disallow line and leave its value empty. Keep the “*” after User-agent so that the record still applies to every crawler; an empty Disallow allows search engines to scan the whole website. The following are the steps to edit the robots.txt file:

 

1) Log in to your cPanel interface.

2) Navigate to the “File Manager” and go to your website root directory.


 

3) The robots.txt file should be in the same location as the index file of your website. To block all compliant crawlers from the entire site, edit the robots.txt file, add the code below, and save the file.

User-agent: *

Disallow: /
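
If only certain bots should be kept out rather than every crawler, robots.txt also accepts one record per bot name. The sketch below is an illustrative variation, not the article's own method: the helper names `build_robots_txt` and `is_allowed` are hypothetical, and the bot names are just examples. It generates such a file and checks it with `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_bots):
    """Build robots.txt text that blocks the named bots and allows the rest."""
    lines = []
    for bot in blocked_bots:
        # One record per bot; a blank line ends each record.
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    # Catch-all record: an empty Disallow allows everything.
    lines += ["User-agent: *", "Disallow:"]
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, user_agent, path="/"):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


robots_txt = build_robots_txt(["MJ12bot", "AhrefsBot"])
print(robots_txt)
print(is_allowed(robots_txt, "MJ12bot"))    # False
print(is_allowed(robots_txt, "Googlebot"))  # True
```

The generated text can be pasted into the robots.txt file in step 3 in place of the block-all rule.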

 

You can also block a single bad User-Agent in the .htaccess file by adding the code below. Unlike robots.txt, this is enforced by the server, so it also works against bots that ignore robots.txt.

RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]

RewriteRule .* - [F,L]
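
The matching logic of this rule can be approximated in Python for quick testing before you edit a live .htaccess file. This is a rough sketch under stated assumptions: Python's `re` module stands in for Apache's regex engine, `re.IGNORECASE` plays the role of the [NC] flag, and the function name `status_for` and the sample User-Agent strings are illustrative.

```python
import re

# Mirrors: RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
# The pattern is unanchored, so it matches anywhere in the header value.
BLOCKED = re.compile(r"Baiduspider", re.IGNORECASE)


def status_for(user_agent: str) -> int:
    """Return 403 (the [F] flag) on a match, otherwise 200."""
    return 403 if BLOCKED.search(user_agent) else 200


bot_ua = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
browser_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0"

print(status_for(bot_ua))      # 403
print(status_for(browser_ua))  # 200
```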

 

If you want to block multiple User-Agent strings at once, you can do it like this:

RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} ^.*(Baiduspider|HTTrack|Yandex).*$ [NC]

RewriteRule .* - [F,L]
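
The alternation pattern can be tried out the same way. The sketch below again assumes Python's `re` as a stand-in for Apache's regex engine; the names `is_blocked`, `anchored`, and `plain` and the sample User-Agent strings are illustrative. It also suggests that the leading `^.*` and trailing `.*$` are optional, since an unanchored pattern already matches anywhere in the string.

```python
import re

NAMES = ["Baiduspider", "HTTrack", "Yandex"]

# Pattern as written in the rule above, with [NC] modelled by IGNORECASE.
anchored = re.compile(r"^.*(" + "|".join(NAMES) + r").*$", re.IGNORECASE)
# Shorter form without the surrounding ^.* and .*$
plain = re.compile("|".join(NAMES), re.IGNORECASE)


def is_blocked(user_agent: str, pattern=anchored) -> bool:
    """Return True if the User-Agent contains any blocked name."""
    return pattern.search(user_agent) is not None


samples = [
    "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
    "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)",
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
]

for ua in samples:
    # Both forms should give the same verdict for each sample.
    print(is_blocked(ua, anchored), is_blocked(ua, plain))
```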

 

You can also block specific bots globally. To do this, please log in to your WHM.

Then navigate to Apache Configuration >> Include Editor >> go to “Pre Main Include” >> select your Apache version (or all versions) >> then insert the code below, click Update, and restart Apache.

<Directory "/home">

SetEnvIfNoCase User-Agent "MJ12bot" bad_bots

SetEnvIfNoCase User-Agent "AhrefsBot" bad_bots

SetEnvIfNoCase User-Agent "SemrushBot" bad_bots

SetEnvIfNoCase User-Agent "Baiduspider" bad_bots

<RequireAll>

Require all granted

Require not env bad_bots

</RequireAll>

</Directory>
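
If the list of bots grows, the block above can be generated from a list instead of being typed by hand. This is a minimal sketch: the function name `render_bot_block` and its parameters are hypothetical, and the output simply follows the same layout as the configuration shown above.

```python
def render_bot_block(bots, directory="/home", env_var="bad_bots"):
    """Render an Apache <Directory> block that denies the listed bots."""
    lines = [f'<Directory "{directory}">']
    for bot in bots:
        # One case-insensitive User-Agent match per bot name.
        lines.append(f'SetEnvIfNoCase User-Agent "{bot}" {env_var}')
    lines += [
        "<RequireAll>",
        "Require all granted",
        f"Require not env {env_var}",
        "</RequireAll>",
        "</Directory>",
    ]
    return "\n".join(lines)


block = render_bot_block(["MJ12bot", "AhrefsBot", "SemrushBot", "Baiduspider"])
print(block)
```

The printed text can then be pasted into the Pre Main Include editor. It is worth reviewing the output before restarting Apache, since the sketch does not validate the bot names.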

 

This should reduce the server load and help improve your website's performance and speed.

 

If you need any further help, please reach out to our support department.

 
