Feb 21, 2008

Statistics

Some time ago I wrote a script that used the Tumblr API to analyze the outbound links from a tumblelog, to see if I could spot a pattern; it was one of those Japanese tumblelogs that update a billion times a day, and I was wondering if it was spam. The data were inconclusive (bummer!), but it later dawned on me that the concept could be extended to provide more general statistics on a tumblelog’s linking, as well as on its content in general. So now I have some fun stats for Daily Meh. If you want, you can get stats, too. But before that, here’s what mine look like:

1078 total posts. The distribution between content types looks like this:

  • 66 regular text posts (~6%)
  • 641 link posts (~59%)
  • 254 photo posts (~23%)
  • 114 quote posts (~10%)
  • 3 chat posts (~0%)
  • 0 audio posts (~0%)
  • 0 video posts (~0%)
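The percentages above look truncated rather than rounded (254 of 1078 is about 23.6%, listed as ~23%). As a guess at how that could come about, and not a description of what tumblr.rb actually does, plain integer division reproduces the figures. The method name `type_percentages` is made up for this sketch; the counts are copied from the list above.

```ruby
# Hypothetical helper: share of each post type as a whole-number percentage.
# Integer division truncates, so 23.56% comes out as 23.
def type_percentages(counts)
  total = counts.values.sum
  counts.each_with_object({}) do |(type, n), out|
    out[type] = total.zero? ? 0 : n * 100 / total
  end
end

# Counts copied from the list above.
counts = {
  "regular" => 66, "link" => 641, "photo" => 254, "quote" => 114,
  "chat" => 3, "audio" => 0, "video" => 0
}

type_percentages(counts).each do |type, pct|
  puts "#{counts[type]} #{type} posts (~#{pct}%)"
end
```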

Here are the 30 most linked-to websites:

  1. en.wikipedia.org, linked to 139 times
  2. www.xkcd.com, linked to 37 times
  3. flickr.com, linked to 33 times
  4. news.bbc.co.uk, linked to 25 times
  5. anarchaia.org, linked to 24 times
  6. numblr.nostrich.net, linked to 23 times
  7. dailymeh.tumblr.com, linked to 21 times
  8. cameron.io, linked to 18 times
  9. www.flickr.com, linked to 16 times
  10. www.nytimes.com, linked to 16 times
  11. www.amazon.com, linked to 14 times
  12. tumblelog.marco.org, linked to 13 times
  13. cubicle17.com, linked to 13 times
  14. www.imdb.com, linked to 13 times
  15. reddit.com, linked to 12 times
  16. 3quarksdaily.com, linked to 12 times
  17. www.wired.com, linked to 11 times
  18. strangemaps.wordpress.com, linked to 11 times
  19. szymon.tumblr.com, linked to 10 times
  20. scienceblogs.com, linked to 10 times
  21. www.linesandcolors.com, linked to 9 times
  22. obsoleteskills.com, linked to 9 times
  23. www.kottke.org, linked to 9 times
  24. toldorknown.tumblr.com, linked to 9 times
  25. blog.simoncrowley.net, linked to 9 times
  26. 3quarksdaily.blogs.com, linked to 9 times
  27. www.telegraph.co.uk, linked to 8 times
  28. programming.reddit.com, linked to 8 times
  29. www.economist.com, linked to 8 times
  30. revista.tumblr.com, linked to 8 times

If you want, you can have the script (written in Ruby, using the awesome Hpricot XML library written by the ever-wonderful _why) that generated the above stats: tumblr.rb. It requires Ruby as well as the Hpricot gem (gem install hpricot if you don’t have it). Then run it like this:

ruby tumblr.rb <url of your tumblelog> [optional filename to write stats to, defaults to ./stats.html] [optional number of top outbound links to show, defaults to 30]

The script then uses the Tumblr API, Hpricot, a regular expression and some Ruby magic to output statistics like those above. Note: the links counted are literally every link, not just those in link posts but also those in regular text posts, descriptions of other posts, quotes, quote sources, and so on. The script isn’t exactly extensively tested, but it was able to generate the above statistics on my machine, so I know it works under optimal conditions. The worst that can happen is that it terminates with an error message.
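For the curious, the link-counting idea can be sketched with nothing but the standard library. This is an illustration, not the code in tumblr.rb: the method name `top_linked_hosts` and the URL regular expression are assumptions of mine, and it skips Hpricot and the Tumblr API entirely, taking chunks of post text as plain strings.

```ruby
require 'uri'

# Hypothetical helper: find every absolute http(s) URL in the given chunks of
# HTML or text, tally the hostnames, and return the top_n [host, count] pairs.
def top_linked_hosts(chunks, top_n = 30)
  counts = Hash.new(0)
  chunks.each do |chunk|
    # Assumed pattern: a URL runs until whitespace, a quote or an angle bracket.
    chunk.scan(%r{https?://[^\s"'<>]+}) do |url|
      begin
        host = URI.parse(url).host
      rescue URI::InvalidURIError
        next # skip anything URI cannot parse
      end
      counts[host.downcase] += 1 if host
    end
  end
  # Most linked first; ties are broken alphabetically by host.
  counts.sort_by { |host, count| [-count, host] }.first(top_n)
end

sample = [
  '<a href="http://en.wikipedia.org/wiki/Tumblelog">a</a> <a href="http://www.xkcd.com/386/">b</a>',
  'See http://en.wikipedia.org/wiki/Ruby and <a href="http://flickr.com/photos/">c</a>'
]

top_linked_hosts(sample, 2).each do |host, count|
  puts "#{host}, linked to #{count} times"
end
```

Like the list above, this treats flickr.com and www.flickr.com as two different sites, since it compares hostnames as they appear.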

By the way, the xkcd stats are heavily skewed by this post.

About
Daily Meh is written and edited by Simen (contact me). It is, basically, about whatever interests me. Some things that have held my interest over time: philosophy, photography, logic, the internet, pop culture, not-at-all-popular culture, computer science, linguistics and speculative fiction. Among other things. You might also like to know that I live and go to school in a small town in Norway. You can subscribe via RSS.