The Facebook Crawler scrapes the HTML of a website that was shared on Facebook, either by copying and pasting the link or through a Facebook social plugin on the website. The crawler gathers, caches, and displays information about the website, such as its title, description, and thumbnail image.
Range header of the crawler request or it should ignore the
The Facebook crawler user agent strings:
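As a rough illustration of how a server-side script might recognize the crawler, the sketch below checks a User-Agent string for the facebookexternalhit token that appears in the curl example in this document. The function name is_facebook_crawler is hypothetical, and matching on this single token is an assumption; the crawler may send other user agent strings that this check would miss.

```shell
# Return success (0) if the given User-Agent string looks like the Facebook crawler.
# Assumption: we match only on the "facebookexternalhit" token taken from the
# curl example in this document; other crawler user agents may exist.
is_facebook_crawler() {
  case "$1" in
    *facebookexternalhit*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage with a user agent string read from a log or request:
ua='facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)'
if is_facebook_crawler "$ua"; then
  echo "request appears to come from the Facebook crawler"
fi
```

A User-Agent header is easy to spoof, so a check like this is best treated as a hint rather than proof of origin.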
To get a current list of IP addresses the crawler uses, run the following command.
whois -h whois.radb.net -- '-i origin AS32934' | grep ^route
These IP addresses change often.
...
route: 184.108.40.206/21
route: 220.127.116.11/21
route: 18.104.22.168/20
route: 22.214.171.124/20
route6: 2620:0:1c00::/40
route6: 2a03:2880::/32
route6: 2a03:2880:fffe::/48
route6: 2a03:2880:ffff::/48
route6: 2620:0:1cff::/48
...
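Because the list changes often, it may be convenient to turn the whois output into a plain list of prefixes that a script can refresh periodically. The following is a minimal sketch, assuming each relevant line has the form "route: <prefix>" or "route6: <prefix>" as in the sample output; extract_routes is a hypothetical helper name, not part of any Facebook tooling.

```shell
# Read whois/RADb output on stdin and print one CIDR prefix per line.
# Only lines whose first field is "route:" or "route6:" are kept;
# everything else (descriptions, origins, elisions) is ignored.
extract_routes() {
  awk '$1 == "route:" || $1 == "route6:" { print $2 }'
}

# Example usage (requires network access and a whois client):
#   whois -h whois.radb.net -- '-i origin AS32934' | extract_routes > crawler_ranges.txt
```

Matching on the first field rather than grepping the whole line avoids picking up unrelated lines that merely contain the word "route".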
You can simulate a crawler request with the following code if you need to troubleshoot your website:
curl -v --compressed -H "Range: bytes=0-524288" -H "Connection: close" -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" "$URL"
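When troubleshooting, it can help to see which Open Graph properties the simulated request actually returns. The sketch below is a quick filter, not an HTML parser: it assumes the page writes its tags with double-quoted property="og:..." attributes, and list_og_properties is a hypothetical helper name.

```shell
# Read HTML on stdin and print the name of each Open Graph property found.
# Assumption: attributes are written as property="og:..." with double quotes.
list_og_properties() {
  grep -o 'property="og:[^"]*"' | sed -e 's/^property="//' -e 's/"$//'
}

# Example usage (requires network access; -s replaces -v to keep the output clean):
#   curl -s --compressed -H "Range: bytes=0-524288" -H "Connection: close" \
#     -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" "$URL" \
#     | list_og_properties
```

If a property you expect is missing from the output, compare it against the page source to check whether it falls outside the requested byte range or uses a different attribute style.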
You can label pages and objects to change how long Facebook's crawler will wait to check them for new content. Use the og:ttl object property to limit crawler access if our crawler is being too aggressive.
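As an illustration, the sketch below prints an og:ttl meta tag for a given number of days. It assumes that og:ttl is expressed in seconds and is written as a standard Open Graph meta tag in the page head; og_ttl_tag is a hypothetical helper name, and any minimum or maximum values the crawler enforces are not handled here.

```shell
# Print an og:ttl meta tag for a re-scrape interval given in whole days.
# Assumption: og:ttl takes a value in seconds (days * 86400).
og_ttl_tag() {
  printf '<meta property="og:ttl" content="%s" />\n' "$(( $1 * 86400 ))"
}

# Example usage: ask the crawler to wait about 7 days between checks.
og_ttl_tag 7
# prints: <meta property="og:ttl" content="604800" />
```

The resulting tag would go in the head of the page alongside its other Open Graph properties.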