Block image bots / crawlers

You want search engine spiders to crawl your site content, but you don’t want your images crawled.

If all your images are under one directory, the simplest way is to add this to your robots.txt file:

User-Agent: *
Disallow: /images/

To block some well-known image bots from the whole site:

User-Agent: Googlebot-Image
Disallow: /

User-Agent: Yahoo-MMCrawler
Disallow: /

User-Agent: psbot
Disallow: /
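
Before uploading rules like these, it can help to check them locally. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_allowed and the sample paths (/about.html, /images/cat.png) are made up for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from this post, combined into one robots.txt body.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /images/

User-Agent: Googlebot-Image
Disallow: /

User-Agent: Yahoo-MMCrawler
Disallow: /

User-Agent: psbot
Disallow: /
"""


def is_allowed(robots_txt, user_agent, path):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical paths, used only to exercise the rules.
    for agent in ("Googlebot", "Googlebot-Image", "psbot"):
        for path in ("/about.html", "/images/cat.png"):
            print(agent, path, is_allowed(ROBOTS_TXT, agent, path))
```

A regular crawler such as Googlebot falls under the `*` group, so it is kept out of /images/ only, while the three named image bots are kept out of everything.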

According to Google’s support documentation, images will eventually be removed once this is in place in robots.txt. (Update: one month after I added this, my images are still there! I wonder how long ‘eventually’ takes?)

Note

  • There are many image bots / crawlers; these are just the common ones. If you know of any others, please add them in a comment and I’ll update this list.
  • Not all bots respect robots.txt. You will have to block bad bots by configuring your web server or firewall.
  • This will not work if you’re using a free hosted service such as Blogger, LiveJournal, WordPress.com, etc., since you usually can’t edit robots.txt there.
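
For bots that ignore robots.txt, the blocking has to happen on the server side. The sketch below is not a web server or firewall configuration; it shows the same idea at the application level, as a small Python WSGI middleware that returns 403 when the User-Agent header contains a blocked name. The names BLOCKED_AGENTS, is_blocked and block_image_bots are made up for this example, and a bot that fakes its User-Agent will still get through.

```python
# Substrings to look for in the User-Agent header (lowercase).
# This list mirrors the bots named in this post; extend it as needed.
BLOCKED_AGENTS = ("googlebot-image", "yahoo-mmcrawler", "psbot")


def is_blocked(user_agent):
    """Return True if the User-Agent string contains a blocked bot name."""
    ua = (user_agent or "").lower()
    return any(name in ua for name in BLOCKED_AGENTS)


def block_image_bots(app):
    """Wrap a WSGI app so that blocked bots get a 403 response."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware
```

To use it, wrap your existing WSGI application object, for example `application = block_image_bots(application)`. Matching on a substring keeps versioned agent strings such as "psbot/0.1" covered without listing every variant.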
