Over the past few months I have repeatedly received warnings about excessive server resource load, and almost every time I logged in to check the logs I found that one of the sites had been crawled from top to bottom by some strange malicious crawler. For some reason MJ12bot in particular kept requesting links that redirect endlessly, wasting server resources for nothing.


A few of the crawlers with the highest resource consumption include (a sketch for tallying their requests from the access log follows the list):

  • dotbot
  • SemrushBot
  • MJ12bot
    • I especially recommend blocking MJ12bot, since some copyright holders use this vendor's crawler to scan sites in bulk for files that infringe their copyrights.
  • SMTBot
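
To see which crawlers are actually responsible for the load, the access log can be tallied per user agent, as sketched below. This is a minimal sketch under two assumptions that may not match your setup: an nginx-style access log at /var/log/nginx/access.log, and plain case-insensitive substring matching on the bot names.

from collections import Counter

# Bot names taken from the list above; extend as needed.
BOT_NAMES = ["dotbot", "SemrushBot", "MJ12bot", "SMTBot"]

def count_bot_requests(log_path="/var/log/nginx/access.log"):
    """Count how many access-log lines mention each bot's user-agent string."""
    counts = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            lowered = line.lower()
            for bot in BOT_NAMES:
                # Rough per-bot request count via substring match on the line.
                if bot.lower() in lowered:
                    counts[bot] += 1
    return counts

if __name__ == "__main__":
    for bot, hits in count_bot_requests().most_common():
        print(f"{bot}: {hits}")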

For these crawlers, I recommend blocking them directly in robots.txt. Reviewing the logs since then shows that, at least so far, they do respect the robots.txt rules:

User-agent: dotbot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: SMTBot
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: CheckMarkNetwork
Disallow: /

User-agent: DigiCert DCV Bot
Disallow: /
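
After deploying the file, it is worth confirming that the rules really match the intended user agents, since a stray typo in a User-agent line is easy to miss. Python's standard urllib.robotparser can parse the rules and answer per-agent queries; the short sketch below uses a trimmed excerpt of the file above and a placeholder URL, both of which are illustrative only.

from urllib.robotparser import RobotFileParser

# A trimmed excerpt of the robots.txt above, just for the check.
ROBOTS_TXT = """\
User-agent: dotbot
Disallow: /

User-agent: MJ12bot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Listed bots should be denied; an unlisted agent should still be allowed.
print(parser.can_fetch("MJ12bot", "https://example.com/some/page"))      # False
print(parser.can_fetch("Mozilla/5.0", "https://example.com/some/page"))  # True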
