[How To] - Block Bad Bots from accessing your Website

What is a Bad Bot?
A bad bot is any automated crawler (bot or spider) that does more harm than good to your website.

One example is an email harvester, which scans the code of your web pages for email addresses that can then be sent spam. Another is an unwanted bot that consumes too much bandwidth or drives up the load on your server, slowing it down or, at worst, knocking it offline through overload.

The worst "Bad Bots" ignore your robots.txt directives completely. Other bots do not set out to cause harm, but you may still not want them on your site. Bots that ignore your robots.txt file have to be blocked by other means, such as user-agent directives in your .htaccess file, but that topic is beyond the scope of this simple guide.

Bots that are not malicious but can still cause problems can be handled in your robots.txt file. For example, if your site is based in the USA, you may not want bots from non-English-speaking countries coming in and eating up your bandwidth or other resources. Well-behaved bots follow the rules in your robots.txt file, and this simple guide shows you how to use it to control which bots access your site.

How to block Bad Bots
Follow these steps to block unwanted bots and spiders that obey robots.txt from accessing your website.

Step 1:
Open your favorite text editor and create a file called robots.txt.

Step 2:
Place the following code in this file.
CODE:
# Deny all robots that we do not specifically want to allow
User-agent: *
Disallow: /

# Allow these robots only
User-agent: googlebot
Allow: / 


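The rules above tell every bot except Google's crawler (googlebot) to stay away from your entire website. A bot that has no User-agent group of its own falls under the "User-agent: *" group and is disallowed everywhere, while googlebot matches its own group and is allowed everywhere.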
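If you have Python 3 installed, you can sanity-check these rules before uploading them. The sketch below feeds the Step 2 file to Python's standard urllib.robotparser module and asks whether a given user-agent may fetch a page. The helper name is_allowed and the sample agent names (including "SomeScraper") are illustrative assumptions, and the module uses its own simplified matching, so treat the result as a rough check rather than a guarantee of how every search engine reads the file.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from Step 2, pasted in as a string for testing.
ROBOTS_TXT = """\
# Deny all robots that we do not specifically want to allow
User-agent: *
Disallow: /

# Allow these robots only
User-agent: googlebot
Allow: /
"""


def is_allowed(robots_txt: str, agent: str, path: str = "/") -> bool:
    """Hypothetical helper: return True if `agent` may fetch `path`
    according to the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Googlebot has its own group; the other two fall back to "User-agent: *".
    for agent in ("Googlebot", "bingbot", "SomeScraper"):
        print(f"{agent:12} allowed: {is_allowed(ROBOTS_TXT, agent)}")
```

Run as a script, it should report that Googlebot is allowed while bingbot and SomeScraper are not.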

Note: See the end of this article for more search engine robots that are generally safe to allow in your robots.txt file.

Step 3:
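Save the file and upload it to your public_html directory (your site's root folder), so that it can be reached at http://yourdomain.com/robots.txt. Bots only look for the file at that root location. You can upload it via FTP or through the cPanel File Manager.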
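Because bots request the file only from the root of your site, a robots.txt placed in a subfolder is never read. The short Python sketch below shows that mapping (any page URL resolves to /robots.txt on the same host) and includes an optional live check that downloads your deployed file with urllib.robotparser. The function names robots_url and live_check, and the example.com URLs, are placeholders; substitute your own domain, and note that live_check needs network access.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Map any page URL to the robots.txt URL at the root of the same
    scheme and host, e.g. .../blog/post.html -> .../robots.txt."""
    return urljoin(page_url, "/robots.txt")


def live_check(page_url: str, agent: str) -> bool:
    """Download the deployed robots.txt and ask whether `agent` may
    fetch `page_url`. Requires network access to the site."""
    parser = RobotFileParser(robots_url(page_url))
    parser.read()  # fetches and parses the live file
    return parser.can_fetch(agent, page_url)


if __name__ == "__main__":
    print(robots_url("https://example.com/blog/post.html"))
    # Uncomment to test your own site after uploading (needs network):
    # print(live_check("https://example.com/", "Googlebot"))
```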

More Good Bots to allow
The example above allows only Googlebot. Here are a few other bots that you may want to allow in your robots.txt file:


Googlebot-News - Google News
Googlebot-Image - Google Images
Googlebot-Mobile - Google Mobile
MSNBot - Microsoft MSN
Teoma - Teoma Search
bingbot - Bing Search
Slurp - Yahoo! Search
Scooter - AltaVista Search
Scrubby - Scrub the Web


You can add them to your robots.txt file in the following format:
CODE:
User-agent: BOTNAME
Allow: /


where BOTNAME is one of the bot names listed above.
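If you allow more than a few bots, generating the file can save typing and avoid typos. The following Python sketch uses a hypothetical helper, build_robots_txt, that writes the deny-all group followed by one "User-agent: BOTNAME / Allow: /" group per bot name you pass in. It is just one convenient way to produce the format above, not a required tool.

```python
from typing import Iterable


def build_robots_txt(allowed_bots: Iterable[str]) -> str:
    """Hypothetical helper: deny every robot by default, then add one
    User-agent/Allow group for each name in `allowed_bots`."""
    lines = [
        "# Deny all robots that we do not specifically want to allow",
        "User-agent: *",
        "Disallow: /",
        "",
        "# Allow these robots only",
    ]
    for bot in allowed_bots:
        # One group per bot, separated by a blank line.
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    # Drop trailing blank lines and end the file with a single newline.
    return "\n".join(lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    print(build_robots_txt(["slurp", "bingbot", "googlebot"]), end="")
```

Calling build_robots_txt(["slurp", "bingbot", "googlebot"]) produces the same rules as the Yahoo!, Bing, and Google example in this article; save the output as robots.txt and upload it as described in Step 3.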

For example, a robots.txt file that disallows all robots except those of Yahoo!, Bing, and Google might look like this:

CODE:
# Deny all robots that we do not specifically want to allow
User-agent: *
Disallow: /

# Allow these robots only
User-agent: slurp
Allow: / 

User-agent: bingbot
Allow: /

User-agent: googlebot
Allow: /



If you have any further questions, please feel free to register and post a reply in this thread.
