Apache: Bots, Crawlers & htaccess

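Grabbed the user agents out of the Apache (combined format) access log. With the default combined LogFormat, whitespace field $14 is the token right after "(compatible;", which is where most crawlers put their name: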

awk '{print $14}' my_logfile.log | cut -d'/' -f1 | sort | uniq -dc | sort -nr
 xxxx SemrushBot
 xxxx MJ12bot
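
To check which field holds what, here is a minimal sketch that runs one made-up combined-format line (the IP, timestamp and path are hypothetical) through awk and numbers every whitespace-separated field. It assumes the request line has exactly three tokens; a malformed request shifts every later field. bot_name is a name picked for this example.

```shell
#!/bin/sh
# Hypothetical combined-format line; IP, timestamp and path are made up.
line='203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"'

# Number every whitespace-separated field: $12 is "Mozilla/5.0 (with the
# leading quote), $13 is (compatible; and $14 is the bot's name/version token.
printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) printf "%d\t%s\n", i, $i }'

# Same extraction as the pipeline above, without the counting.
bot_name() { awk '{print $14}' | cut -d'/' -f1; }
printf '%s\n' "$line" | bot_name    # prints SemrushBot
```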

Then the same log, this time taking the first token of the user agent (field $12, with the leading quote cut off):

awk '{print $12}' my_logfile.log | cut -d'"' -f2 | cut -d'/' -f1 | sort | uniq -dc | sort -nr

xxxx DomainCrawler
xxxx Mozilla
xxxx Baiduspider
xxxx Yandex
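
Counting whole user-agent strings instead of single tokens avoids guessing field numbers. A minimal sketch, assuming the default combined LogFormat (the user agent is the last quoted field) and no escaped quotes (\") inside the request line or referer; count_user_agents is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count full user-agent strings in a combined-format log.
# Splitting on '"' makes $2 the request, $4 the referer and $6 the user agent.
count_user_agents() {
    awk -F'"' '{print $6}' "$1" | sort | uniq -c | sort -nr
}

# Usage: top 20 user agents
# count_user_agents my_logfile.log | head -n 20
```

Unlike the pipelines above, this drops the -d flag, so user agents seen only once are listed too.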

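Then dropped them into an .htaccess file. BrowserMatchNoCase sets the bad_bot environment variable when the User-Agent matches the pattern anywhere in the string, ignoring case, and the access rules deny any request that has it set: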

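BrowserMatchNoCase AhrefsBot bad_bot
BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase DomainCrawler bad_bot
BrowserMatchNoCase DotBot bad_bot
BrowserMatchNoCase DnyzBot bad_bot
BrowserMatchNoCase MJ12bot bad_bot
BrowserMatchNoCase SemrushBot bad_bot
BrowserMatchNoCase SeznamBot bad_bot
BrowserMatchNoCase Yandex bad_bot
# Apache 2.4 (mod_authz_core)
<IfModule mod_authz_core.c>
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>
</IfModule>
# Apache 2.2 syntax (on 2.4 this needs mod_access_compat)
<IfModule !mod_authz_core.c>
    Order Deny,Allow
    Deny from env=bad_bot
</IfModule>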
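
Before deploying, it can help to see how many logged requests the list would catch. This sketch joins the same names into one extended regex; grep -iE only approximates BrowserMatchNoCase (both do an unanchored, case-insensitive match on the user agent, but Apache uses PCRE). BAD_BOTS and count_bad_bot_hits are names made up for this example, and it assumes the combined log format.

```shell
#!/bin/sh
# Same names as the BrowserMatchNoCase lines, joined into one regex.
BAD_BOTS='AhrefsBot|Baiduspider|DomainCrawler|DotBot|DnyzBot|MJ12bot|SemrushBot|SeznamBot|Yandex'

# Print how many log lines have a user agent (quote-delimited field 6)
# that would get the bad_bot variable.
count_bad_bot_hits() {
    awk -F'"' '{print $6}' "$1" | grep -ciE "$BAD_BOTS"
}

# Usage: count_bad_bot_hits my_logfile.log
```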

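The old way used mod_rewrite. RewriteCond lines are ANDed unless flagged [OR], and most of these bots' user agents start with "Mozilla/5.0 (compatible; ...", so every condition except the last needs [OR], and none of them should be anchored with ^: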

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} DomainCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SemrushBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yandex [NC]
# [F] returns 403 Forbidden and implies [L]
RewriteRule .* - [F]
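
Because the [OR] flag is easy to forget as the list grows, a small generator can write the conditions. This is a hypothetical helper (make_rewrite_block is not an Apache tool): it puts [NC,OR] on every condition except the last, which gets [NC], and ends with a [F] rule.

```shell
#!/bin/sh
# Hypothetical helper: print a mod_rewrite block that blocks the given names.
make_rewrite_block() {
    echo 'RewriteEngine on'
    n=$#
    i=0
    for bot in "$@"; do
        i=$((i + 1))
        # Every condition but the last is ORed with the next one.
        if [ "$i" -lt "$n" ]; then
            flags='[NC,OR]'
        else
            flags='[NC]'
        fi
        echo "RewriteCond %{HTTP_USER_AGENT} $bot $flags"
    done
    echo 'RewriteRule .* - [F]'
}

# Usage: make_rewrite_block DomainCrawler SemrushBot MJ12bot Baiduspider Yandex
```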