The Facebook Crawler scrapes the HTML of a website that was shared on Facebook, either by copying and pasting the link or through a Facebook social plugin on the website. The crawler gathers, caches, and displays information about the website, such as its title, description, and thumbnail image.
Your server should either honor the Range header of the crawler request or ignore the Range header altogether.
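The Facebook crawler identifies itself with a user agent string such as:
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)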
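As a rough illustration of how a server-side script might recognize the crawler, the sketch below checks whether a User-Agent value starts with the facebookexternalhit token. The function name is_facebook_crawler is hypothetical, and prefix matching is an assumption; because user agent strings can be spoofed, a check like this is best combined with the IP ranges described below.

```shell
# Hypothetical helper (not part of any Facebook tooling): succeeds (exit 0)
# when the given User-Agent string begins with the crawler token, fails
# (exit 1) otherwise. Prefix matching is an assumption of this sketch.
is_facebook_crawler() {
  case "$1" in
    facebookexternalhit/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage with the user agent string from this page.
if is_facebook_crawler "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"; then
  echo "request appears to come from the Facebook crawler"
fi
```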
To get a current list of IP addresses the crawler uses, run the following command.
whois -h whois.radb.net -- '-i origin AS32934' | grep ^route
These IP addresses change often.
...
route: 18.104.22.168/21
route: 22.214.171.124/21
route: 126.96.36.199/20
route: 188.8.131.52/20
route6: 2620:0:1c00::/40
route6: 2a03:2880::/32
route6: 2a03:2880:fffe::/48
route6: 2a03:2880:ffff::/48
route6: 2620:0:1cff::/48
...
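Since the ranges change often, it can help to reduce the whois output to a plain list of CIDR blocks that a firewall or allowlist script can consume. The following is a minimal sketch; the function name extract_routes and the output file name facebook_routes.txt are hypothetical, and the sketch assumes each relevant line has the form "route: <cidr>" or "route6: <cidr>".

```shell
# Hypothetical helper: reads whois output on stdin and prints each CIDR
# block from "route:" and "route6:" lines once, in first-seen order.
extract_routes() {
  awk '($1 == "route:" || $1 == "route6:") && !seen[$2]++ { print $2 }'
}

# Example with inline sample lines; other fields such as "descr:" are skipped
# and the duplicate block is printed only once.
printf 'route6: 2620:0:1c00::/40\ndescr: example\nroute6: 2a03:2880::/32\nroute6: 2620:0:1c00::/40\n' | extract_routes
```

To refresh a list from live data, the whois command above could be piped into the helper, for example: whois -h whois.radb.net -- '-i origin AS32934' | extract_routes > facebook_routes.txt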
You can simulate a crawler request with the following code if you need to troubleshoot your website:
curl -v --compressed -H "Range: bytes=0-524288" -H "Connection: close" -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" "$URL"