Just grabbed the user agents out of an Apache (combined) log file:
cat my_logfile.log | awk '{print $14}' | cut -d'/' -f1 | sort | uniq -dc | sort -nr
xxxx SemrushBot
xxxx MJ12bot
Same Apache (combined) log file, this time taking the first token of the user agent:
cat my_logfile.log | awk '{print $12}' | cut -d'"' -f2 | cut -d'/' -f1 | sort | uniq -dc | sort -nr
xxxx DomainCrawler
xxxx Mozilla
xxxx Baiduspider
xxxx Yandex
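The field numbers above ($12, $14) only line up when the bot name sits at a fixed position in the user agent. A rough sketch of an alternative, assuming the standard combined log format: split each line on double quotes, which makes the whole user agent field 6. The function name top_user_agents is made up for illustration, and a request line containing escaped quotes would throw the field count off.

```shell
#!/bin/sh
# top_user_agents FILE
# Count full user-agent strings in a combined-format access log.
# Assumption: standard combined LogFormat, so splitting on '"' gives
#   field 2 = request line, field 4 = referer, field 6 = user agent.
# Note: uniq -c also keeps agents seen only once; the commands above
# use -dc to drop those.
top_user_agents() {
    awk -F'"' '{ print $6 }' "$1" | sort | uniq -c | sort -nr
}
```

Usage would look like `top_user_agents my_logfile.log | head -n 20`, which lists the twenty most frequent agents with their counts.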
Dropped them in an .htaccess (Apache) file:
BrowserMatchNoCase AhrefsBot bad_bot
BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase DomainCrawler bad_bot
BrowserMatchNoCase DotBot bad_bot
BrowserMatchNoCase DnyzBot bad_bot
BrowserMatchNoCase MJ12bot bad_bot
BrowserMatchNoCase SemrushBot bad_bot
BrowserMatchNoCase SeznamBot bad_bot
BrowserMatchNoCase Yandex bad_bot
Order Deny,Allow
Deny from env=bad_bot
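Since the block list is just one BrowserMatchNoCase line per name, it can be generated from the names pulled out of the log. A minimal sketch, with a hypothetical helper name make_bad_bot_rules; it assumes the names contain no regex metacharacters (BrowserMatchNoCase treats its argument as a regular expression) and it reproduces the Order/Deny directives used above, which are Apache 2.2 style and need mod_access_compat on 2.4.

```shell
#!/bin/sh
# make_bad_bot_rules NAME...
# Print one BrowserMatchNoCase line per bot name, followed by the
# deny rules from the .htaccess snippet above.
# Assumption: names are plain words with no regex special characters.
make_bad_bot_rules() {
    for bot in "$@"; do
        printf 'BrowserMatchNoCase %s bad_bot\n' "$bot"
    done
    printf 'Order Deny,Allow\n'
    printf 'Deny from env=bad_bot\n'
}
```

For example, `make_bad_bot_rules SemrushBot MJ12bot >> .htaccess` would append two match lines plus the two access lines.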
Old way was this:
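RewriteEngine on
# Consecutive RewriteCond lines are ANDed unless flagged [OR].
# No ^ anchor: names such as SemrushBot usually sit mid-string,
# e.g. "Mozilla/5.0 (compatible; SemrushBot/...)".
RewriteCond %{HTTP_USER_AGENT} DomainCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SemrushBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yandex [NC]
# [F] answers with 403 Forbidden.
RewriteRule .* - [F]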