Blocking spider crawling on dedicated-IP hosts running Apache, IIS6, and IIS7 (applies to VPS and cloud servers)


On Linux: rule file .htaccess (manually create the .htaccess file in the site root directory)

<IfModule mod_rewrite.c>
RewriteEngine On
# Block spiders: case-insensitive ([NC]) match on the User-Agent; ^$ also matches an empty User-Agent
RewriteCond %{HTTP_USER_AGENT}   "Bytespider|Amazonbot|YisouSpider|ClaudeBot|GPTBot|meta-externalagent|SemrushBot|DotBot|BLEXBot|SMTBot|PetalBot|Apache-HttpClient|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail\.RU|curl|perl|Python|Wget|Xenu|ZmEu|^$"   [NC]
# Return 403 Forbidden for every request except robots.txt
RewriteRule !(^robots\.txt$) - [F]
</IfModule>
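
To make the logic of the rule pair above easier to follow, here is a minimal Python sketch of the decision it encodes: an unanchored, case-insensitive search of the User-Agent, with robots.txt always exempt. The function name `is_blocked` and the shortened pattern are illustrative assumptions, not part of any server configuration, and the sketch only imitates the matching, not mod_rewrite itself.

```python
import re

# Shortened, illustrative subset of the blocklist (an assumption for this sketch).
# "^$" matches an empty User-Agent, as in the rule above.
BLOCKED_UA = re.compile(r"Bytespider|GPTBot|AhrefsBot|MJ12bot|curl|Wget|^$", re.IGNORECASE)


def is_blocked(path: str, user_agent: str) -> bool:
    """Return True if the request would be refused by the rule pair."""
    # RewriteRule !(^robots\.txt$): robots.txt is never refused.
    if path == "robots.txt":
        return False
    # RewriteCond ... [NC]: substring search, ignoring case.
    return BLOCKED_UA.search(user_agent) is not None


if __name__ == "__main__":
    print(is_blocked("index.html", "Mozilla/5.0 (compatible; AhrefsBot/7.0)"))
    print(is_blocked("robots.txt", "curl/8.0"))
```

Because the search is unanchored, a short token such as curl or perl blocks any User-Agent that contains it anywhere, which is worth keeping in mind when adding new names.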

On Windows Server 2003 (IIS6): rule file httpd.conf

# Block spiders
RewriteCond %{HTTP_USER_AGENT}   (Bytespider|Amazonbot|YisouSpider|ClaudeBot|GPTBot|meta-externalagent|SemrushBot|DotBot|BLEXBot|SMTBot|PetalBot|Apache-HttpClient|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail\.RU|curl|perl|Python|Wget|Xenu|ZmEu|^$)   [NC]
RewriteRule !(^/robots\.txt$) - [F]

On Windows Server 2008 (IIS7): web.config

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="Block spider">
          <match url="^robots\.txt$" ignoreCase="false" negate="true" />
          <conditions>
            <add input="{HTTP_USER_AGENT}" pattern="Bytespider|Amazonbot|YisouSpider|ClaudeBot|GPTBot|meta-externalagent|SemrushBot|DotBot|BLEXBot|SMTBot|PetalBot|Apache-HttpClient|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|curl|perl|Python|Wget|Xenu|ZmEu|^$" ignoreCase="true" />
          </conditions>
          <action type="AbortRequest" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>

Equivalent blocking rule for Nginx

Add the code inside the server block of the corresponding site's configuration file.

# Case-insensitive (~*) match on the User-Agent; ^$ also matches an empty User-Agent
if ($http_user_agent ~* "Bytespider|Amazonbot|YisouSpider|ClaudeBot|GPTBot|meta-externalagent|SemrushBot|DotBot|BLEXBot|SMTBot|PetalBot|Apache-HttpClient|Java|PhantomJS|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail\.RU|perl|Python|Wget|Xenu|ZmEu|^$")
{
  # 444 is Nginx's non-standard code: close the connection without sending a response
  return 444;
}

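Note: by default the rules block a number of unidentified spiders. To block other spiders, add their names to the pattern in the same way.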
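
When the list of names grows, it can be easier to generate the alternation than to edit it by hand. The following is a minimal sketch, assuming a hypothetical helper `build_pattern` that is not part of any of the servers above; it drops case-insensitive duplicates, escapes regex metacharacters such as the dot in mail.RU, and optionally appends `^$` to match an empty User-Agent.

```python
import re


def build_pattern(names, block_empty=True):
    """Join spider names into one regex alternation for the rules above."""
    seen = set()
    parts = []
    for name in names:
        key = name.lower()
        # The rules match case-insensitively, so "GPTBot" and "gptbot" are duplicates.
        if key in seen:
            continue
        seen.add(key)
        # Escape metacharacters so that "." matches only a literal dot.
        parts.append(re.escape(name))
    if block_empty:
        # "^$" matches a request that sends an empty User-Agent.
        parts.append("^$")
    return "|".join(parts)


if __name__ == "__main__":
    print(build_pattern(["GPTBot", "mail.RU", "gptbot"]))
```

The resulting string can be pasted into the pattern of whichever rule file is in use. Note that `re.escape` also puts a backslash before hyphens, which is harmless in these patterns.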

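Appendix: names of the major spiders:

Google spider: googlebot
Baidu spider: baiduspider
Baidu mobile spider: baiduboxapp
Yahoo spider: slurp
Alexa spider: ia_archiver
MSN spider: msnbot
Bing spider: bingbot
AltaVista spider: scooter
Lycos spider: lycos_spider_(t-rex)
AllTheWeb spider: fast-webcrawler
Inktomi spider: slurp
Youdao spider: YodaoBot and OutfoxBot
Retu (热土) spider: Adminrtspider
Sogou spider: sogou spider
SOSO spider: sosospider
360 Search spider: 360spider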
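
Before deploying an edited pattern, it may be worth checking that it does not also catch the search-engine spiders listed above. The following is a minimal sketch, assuming a hypothetical helper `wrongly_blocked` and a hand-picked subset of the names. It tests only the bare tokens, so a pattern that matches other parts of a real User-Agent string (for example Mozilla) would not be detected.

```python
import re

# A few of the spider names from the list above (subset chosen for illustration).
SEARCH_ENGINE_TOKENS = ["googlebot", "baiduspider", "bingbot", "sogou spider", "360spider"]


def wrongly_blocked(pattern, tokens=SEARCH_ENGINE_TOKENS):
    """Return the search-engine tokens that the blocklist pattern would match."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [token for token in tokens if rx.search(token)]


if __name__ == "__main__":
    # An overly broad token such as "spider" catches several search engines.
    print(wrongly_blocked("spider|AhrefsBot"))
```

An empty result means none of the tested names is caught; any name that is returned points to a token in the pattern that is too broad.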

 

