Statistics
Some time ago I wrote a script that used the Tumblr API to analyze outbound links from a tumblelog and see if I could spot a pattern; it was one of those Japanese tumblelogs that update a billion times a day, and I was wondering if it was spam. The data were inconclusive (bummer!), but it later dawned on me that the concept could be extended to provide more general statistics on a tumblelog’s linking, as well as on its content in general. So now I have some fun stats for Daily Meh. If you want, you can get stats, too. But before that, here’s what mine look like:
1078 total posts. The distribution between content types looks like this:
- 66 regular text posts (~6%)
- 641 link posts (~59%)
- 254 photo posts (~23%)
- 114 quote posts (~10%)
- 3 chat posts (~0%)
- 0 audio posts (~0%)
- 0 video posts (~0%)
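For the curious, here is a minimal sketch of how percentages like these can be derived from the raw counts. It is not taken from tumblr.rb; the hash and variable names are made up for illustration. The figures above look truncated rather than rounded (254 of 1078 is about 23.56%, shown as ~23%), so the sketch assumes integer division.

```ruby
# Post counts copied from the list above. The names here are illustrative
# assumptions, not the ones used in tumblr.rb.
counts = {
  "regular text" => 66,
  "link"         => 641,
  "photo"        => 254,
  "quote"        => 114,
  "chat"         => 3,
  "audio"        => 0,
  "video"        => 0
}

total = counts.values.sum

# Integer division truncates toward zero, which matches the figures shown
# (an assumption about how the original script does it).
percentages = counts.transform_values { |n| n * 100 / total }

puts "#{total} total posts."
counts.each do |type, n|
  puts "- #{n} #{type} posts (~#{percentages[type]}%)"
end
```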
Here are the 30 most linked-to websites:
- en.wikipedia.org, linked to 139 times
- www.xkcd.com, linked to 37 times
- flickr.com, linked to 33 times
- news.bbc.co.uk, linked to 25 times
- anarchaia.org, linked to 24 times
- numblr.nostrich.net, linked to 23 times
- dailymeh.tumblr.com, linked to 21 times
- cameron.io, linked to 18 times
- www.flickr.com, linked to 16 times
- www.nytimes.com, linked to 16 times
- www.amazon.com, linked to 14 times
- tumblelog.marco.org, linked to 13 times
- cubicle17.com, linked to 13 times
- www.imdb.com, linked to 13 times
- reddit.com, linked to 12 times
- 3quarksdaily.com, linked to 12 times
- www.wired.com, linked to 11 times
- strangemaps.wordpress.com, linked to 11 times
- szymon.tumblr.com, linked to 10 times
- scienceblogs.com, linked to 10 times
- www.linesandcolors.com, linked to 9 times
- obsoleteskills.com, linked to 9 times
- www.kottke.org, linked to 9 times
- toldorknown.tumblr.com, linked to 9 times
- blog.simoncrowley.net, linked to 9 times
- 3quarksdaily.blogs.com, linked to 9 times
- www.telegraph.co.uk, linked to 8 times
- programming.reddit.com, linked to 8 times
- www.economist.com, linked to 8 times
- revista.tumblr.com, linked to 8 times
If you want, you can have the script (written in Ruby, using the awesome Hpricot XML library written by the ever-wonderful _why) that generated the above stats: tumblr.rb. It requires Ruby as well as the Hpricot gem (gem install hpricot if you don’t have it). Then run it like this:
ruby tumblr.rb [url of your tumblelog] [optional filename to write stats to, defaults to ./stats.html] [optional parameter to decide how many of the top outbound links to show, by default 30]
The script will then use the Tumblr API, Hpricot, a regular expression and some Ruby magic to output the statistics, like above. Note: the links are literally every link, not just the ones in link posts, but also those in regular text posts, in descriptions of other posts, in quotes, in quote sources, and so on. The script isn’t exactly extensively tested, but it was able to generate the above statistics on my machine, so I know it works under optimal conditions. The worst that can happen is that it terminates with an error message.
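To give a rough idea of the link-counting part, here is a small sketch of the general technique: pull every href out of a bunch of HTML fragments with a regular expression and tally the hosts. This is not the code from tumblr.rb. The top_hosts helper, its regular expression and the sample fragments are all hypothetical, and it skips the Tumblr API and Hpricot entirely by assuming the post bodies are already available as strings.

```ruby
require "uri"

# Hypothetical helper (not from tumblr.rb): count how often each host is
# linked to across a list of HTML fragments, and return the top `limit`.
def top_hosts(fragments, limit = 30)
  tally = Hash.new(0)

  fragments.each do |html|
    # Grab the value of every href attribute, quoted with " or '.
    html.scan(/href\s*=\s*["']([^"']+)["']/i).flatten.each do |url|
      host = begin
        URI.parse(url).host
      rescue URI::InvalidURIError
        nil
      end
      # Relative links and mailto: links have no host and are skipped.
      tally[host] += 1 if host
    end
  end

  # Most linked-to first; ties are broken alphabetically.
  tally.sort_by { |host, count| [-count, host] }.first(limit)
end

# Made-up sample fragments, standing in for post bodies and descriptions.
sample = [
  '<a href="http://en.wikipedia.org/wiki/A">a</a> <a href="http://en.wikipedia.org/wiki/B">b</a>',
  "<p><a href='http://www.xkcd.com/1/'>x</a> <a href=\"/relative\">r</a></p>"
]

top_hosts(sample, 2).each do |host, count|
  puts "- #{host}, linked to #{count} times"
end
```

Hosts are compared as plain strings here, so flickr.com and www.flickr.com would be counted separately, just like in the list above.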
By the way, the xkcd stats are heavily skewed by this post.