A brief twitter experiment

So I've recently posted a few links on Twitter, and I see followers clicking them. But I also see hits that don't look like they come from people at all.

Tonight I posted a link to http://transient.email/, a domain I use for "anonymous" emailing, specifically to see which bots hit the URL.

Within two minutes I had 15 visitors, the first few of which were:

IP	User-Agent	Request
199.16.156.124	Twitterbot/1.0;	GET /robots.txt
199.16.156.126	Twitterbot/1.0;	GET /robots.txt
54.246.137.243	python-requests/1.2.3 CPython/2.7.2+ Linux/3.0.0-16-virtual	HEAD /
74.112.131.243	Mozilla/5.0 ();	GET /
50.18.102.132	Google-HTTP-Java-Client/1.17.0-rc (gzip)	HEAD /
50.18.102.132	Google-HTTP-Java-Client/1.17.0-rc (gzip)	HEAD /
199.16.156.125	Twitterbot/1.0;	GET /robots.txt
185.20.4.143	Mozilla/5.0 (compatible; TweetmemeBot/3.0; +http://tweetmeme.com/)	GET /
23.227.176.34	MetaURI API/2.0 +metauri.com	GET /
74.6.254.127	Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp);	GET /robots.txt

So what jumps out? The Twitterbot requests /robots.txt several times but never fetches the page itself. That is interesting, because the supplied /robots.txt file does indeed contain a prohibition, so the bot appears to honour it.
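To see why a well-behaved bot would stop after reading /robots.txt, here is a minimal sketch using Python's standard `urllib.robotparser`. The post only says the file contains "a prohibition", so the `ROBOTS_TXT` contents below are an assumption (the simplest blanket ban), and `allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: the real robots.txt is not shown in the post; we assume
# the simplest prohibition, which disallows everything for every agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Twitterbot/1.0", "Yahoo! Slurp"):
        print(agent, allowed(agent, "http://transient.email/"))
```

Under that assumed file both agents get `False`, which matches a bot that reads /robots.txt and then never requests `/`.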

A surprise was that both Google and Yahoo seem to follow Twitter links in almost real-time. The Yahoo spider fetched and honoured /robots.txt, whereas the Google client (identifying itself as Google-HTTP-Java-Client) only made HEAD requests and never actually asked for the content or the robots file.
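The observations above can be pulled out of the log mechanically. A small sketch: group the ten rows from the table by user-agent, then list which clients made only HEAD requests and which asked for /robots.txt. The rows are copied from the table; the function names `summarise`, `head_only` and `fetched_robots` are hypothetical.

```python
from collections import defaultdict

# The rows from the table in the post, as (ip, user_agent, request).
LOG = [
    ("199.16.156.124", "Twitterbot/1.0;", "GET /robots.txt"),
    ("199.16.156.126", "Twitterbot/1.0;", "GET /robots.txt"),
    ("54.246.137.243", "python-requests/1.2.3 CPython/2.7.2+ Linux/3.0.0-16-virtual", "HEAD /"),
    ("74.112.131.243", "Mozilla/5.0 ();", "GET /"),
    ("50.18.102.132", "Google-HTTP-Java-Client/1.17.0-rc (gzip)", "HEAD /"),
    ("50.18.102.132", "Google-HTTP-Java-Client/1.17.0-rc (gzip)", "HEAD /"),
    ("199.16.156.125", "Twitterbot/1.0;", "GET /robots.txt"),
    ("185.20.4.143", "Mozilla/5.0 (compatible; TweetmemeBot/3.0; +http://tweetmeme.com/)", "GET /"),
    ("23.227.176.34", "MetaURI API/2.0 +metauri.com", "GET /"),
    ("74.6.254.127", "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp);", "GET /robots.txt"),
]


def summarise(rows):
    """Map each user-agent to the set of distinct requests it made."""
    seen = defaultdict(set)
    for _ip, agent, request in rows:
        seen[agent].add(request)
    return dict(seen)


def head_only(summary):
    """User-agents whose every request was a HEAD (they never fetched a body)."""
    return sorted(a for a, reqs in summary.items() if all(r.startswith("HEAD ") for r in reqs))


def fetched_robots(summary):
    """User-agents that asked for /robots.txt at least once."""
    return sorted(a for a, reqs in summary.items() if "GET /robots.txt" in reqs)


if __name__ == "__main__":
    s = summarise(LOG)
    print("HEAD only:", head_only(s))
    print("Fetched robots.txt:", fetched_robots(s))
```

Grouping by user-agent rather than IP keeps the three Twitterbot addresses together; grouping by IP instead would be the better choice if user-agents were being spoofed.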

In addition, a bunch of hosts from the Amazon EC2 address space made requests, which was perhaps not a surprise: some automated processing and classification, no doubt.

Anyway, beer. It's been a rough weekend.

Tags: twitter
