A simple perl script to interrogate the Technorati API
Sometimes (for instance when I’m doing the research for the blogger typology) you need to get a whole load of Technorati data for a whole load of blogs.
This research can (of course) be done by hand. And (of course) for a long list of blogs this would take a great deal of time. Handily, Technorati provides developers with an API that lets you automate those queries. An API (for those of you who don’t know) is an Application Programming Interface – a toolkit provided by a service or application (in this case by Technorati) that lets other computer applications ask it questions and use the answers for their own purposes. It may be helpful to think of APIs as being like the knobs on top of a Lego brick that let you stick other Lego on to it without in any way changing the nature of the brick itself. On the other hand it may not be so helpful after all.
After much struggling with a Yahoo! Pipe to query the Technorati API for a list of blogs, I was forced to abandon my attempt. I would have liked to have shared that Pipe with the world (if you’re good with Yahoo! Pipes, do please take a look at it and see if you can help me!) [Tuesday January 6, 2009: Thanks to help and encouragement from Bob Briski, this now looks like it's on its way to working!]
Instead, I’ve written a perl script to do this. Perl isn’t as easy for people to use for themselves as Pipes, but if you are comfortable with a command prompt, then you’re half way there.
What this script does is take a list of blog URLs (one per line in a text file) and, for each item in the list, query Technorati for the following information:
- Blog title
- Inbound blogs (the number of unique external blogs linking to the blog over the past six months; this is also known as “Technorati Authority”)
- Inbound links (the total number of links into the site)
- Technorati Rank (a sort of overall score)
The script
[code lang="perl"]#!/usr/bin/perl
use strict;
use warnings;
# use modules
use LWP::Simple;
use XML::Simple;
# set up variables
open(INFILE, '<', $ARGV[0]) or die "Can't open list of blogs to read: $!";
my $apikey = 'enter your Technorati API key here';
# create object
my $xml = new XML::Simple;
# read each line, and make the Technorati API call
while (<INFILE>) {
chomp;
&callTechnoratiAPI;
}
close(INFILE);
# queries Technorati for the blog url currently held in $_
sub callTechnoratiAPI {
my $url = 'http://api.technorati.com/bloginfo?format=xml&key='.$apikey.'&url='.$_;
# get XML file from Technorati
my $content = get $url;
die "Can't get $url" unless defined $content;
# read XML file
my $data = $xml->XMLin($content);
# access XML data and print TSV to screen
# (you can fiddle with this as much
# or as little as you like)
print "\"$data->{document}->{result}->{weblog}->{name}\"\t";
print "$data->{document}->{result}->{url}\t";
print "$data->{document}->{result}->{weblog}->{inboundblogs}\t";
print "$data->{document}->{result}->{weblog}->{inboundlinks}\t";
print "$data->{document}->{result}->{weblog}->{rank}\n";
}[/code]
How to use it
I can’t give you any real advice on how to run perl on your system. If you want to play around with it, Macs come with perl already installed; Windows users should download and install the free ActivePerl. But you’ll need to install the perl module XML::Simple, and I don’t know where to begin telling you how to do that if you don’t already know how perl and CPAN work. You see why I wanted to use Yahoo! Pipes?
If all of that doesn’t bother you, you’ll also need to sign up for a Technorati account (if you’re into this sort of thing, you should already have an account), and get your free API key. This key will let you make 500 queries in a 24-hour period, so you’ll need to plan how you use it.
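One way to plan around the 500-query limit is to split a long list into batches and run one batch a day. The following is only a rough sketch: the `split_into_batches` helper and the example.com URLs are made up for illustration, and it assumes that each blog in the list costs exactly one query.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a list of blog URLs into batches no larger
# than the daily query allowance (500, per the limit described above).
sub split_into_batches {
    my ($limit, @urls) = @_;
    my @batches;
    while (@urls) {
        # splice removes up to $limit items from the front of the list
        push @batches, [ splice(@urls, 0, $limit) ];
    }
    return @batches;
}

# Example with a made-up list of 1,200 URLs
my @urls    = map { "http://example.com/blog$_" } 1 .. 1200;
my @batches = split_into_batches(500, @urls);

printf "%d blogs need %d daily runs\n", scalar @urls, scalar @batches;
```

Each batch could then be written to its own text file and fed to the script on a different day.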
The script as it’s listed above outputs tab-separated values to screen like this:
matm% ./parse_technorati.pl bloglist.txt
"Chris Gilmour's Diary Vol. 14" http://www.illandancient.blogspot.com 6 10 861604
"The Red Rocket: Technology, PR and social media marketing" www.theredrocket.co.uk 15 29 397843
"Going Underground's Blog" http://london-underground.blogspot.com 254 467 13332
The blog’s title and url are followed in order by the inbound blogs (authority) count, the inbound links count, and the Technorati rank.
I use tab-separated values because that makes it simple to cut-and-paste directly into Excel or Google Spreadsheets for further analysis.
Known bugs
Right now, the script occasionally throws out something like this:
matm% ./parse_technorati.pl bloglist.txt
"Lytham Villa" http://lythamvilla.blogspot.com/ HASH(0x8ff7a0) HASH(0x8ff7f4) 4978471
"KickTime || A Driftless Regional Webspace" http://kicktime.org HASH(0x908e0c) HASH(0x908db8) 1951828
I’ll work on this, but if anyone can point me in the right direction, I’ll be most grateful.
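A likely cause (this is an assumption, not something checked against the live API) is that XML::Simple turns an empty element, such as an inbound blogs count with nothing in it, into a reference to an empty hash, and printing a hash reference produces `HASH(0x...)`. One option is to pass `SuppressEmpty => ''` to `XMLin`, which the XML::Simple documentation describes as a way to get an empty string for empty elements. Another is to check each value before printing it, as in this minimal sketch. The `printable` helper and the sample data are made up for illustration, and treating an empty count as zero is a guess.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a value that is safe to print.
# References (such as the empty hash XML::Simple builds for an empty
# element) and undefined values are replaced by a fallback string.
sub printable {
    my ($value, $fallback) = @_;
    $fallback = '' unless defined $fallback;
    return $fallback if !defined $value || ref $value;
    return $value;
}

# Made-up data shaped like the parsed response for one weblog
my $weblog = {
    name         => 'Lytham Villa',
    inboundblogs => {},          # what an empty element looks like
    inboundlinks => {},
    rank         => 4978471,
};

# Assumption: an empty count is treated as zero
print join("\t",
    '"' . printable($weblog->{name}) . '"',
    printable($weblog->{inboundblogs}, 0),
    printable($weblog->{inboundlinks}, 0),
    printable($weblog->{rank}),
), "\n";
```

In the script above, the same check would wrap each `$data->{document}->{result}->{weblog}->{...}` value before it is printed.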




I’ve created a small pipe that outputs what you want in the form of an RSS feed. It’s published here: http://pipes.yahoo.com/pipes/pipe.info?_id=OgntVi3a3RG6XTW_1L3fcQ. I haven’t quite figured out how to do line breaks in pipes yet so I’ve separated the data with a double dash. I’ve been playing with pipes a lot lately. I explain a few others at semdevel.com.
Mat, did you hand-code this stuff yourself, or did your lackeys do it? Good stuff, man.
@Bob B. Thank you so much – I love what you’ve done with the Pipe. The string builder is a v. smart way around the problem. It looks to me like you can add “</br>” or “</p>” in place of the “--”, although I’d consider commas as well. In an ideal world I’ll take the data into Google Docs at the end of the process.
Interestingly I’d already come across (and bookmarked) your Top Keywords – Yahoo Pipes post; it helped me solve one of the early problems I’d faced (back when my Pipe was actually working.)
One of the problems I stumbled into (and which I still don’t understand) was the sudden and unexpected failure of Pipes to retrieve my CSV file. Every version I’d used up till then had worked, then – out of nowhere – it stopped working. Cloning the pipe didn’t “reboot”. When I started again from a fresh canvas it fetched the pipes, but couldn’t build the URLs.
I know pipes is a bit beta-y sometimes, but this was just frustrating. Stuff that had worked stopped working. Gah!
Will take a clone of your pipe, if that’s OK, and give you full recognition and links on the new post. Am so pleased. Pipes is a far better tool for sharing with others than the perl script.
@themetz – the perl and Excel stuff I make myself in the evenings and weekends. Lackeys do the complicated stuff like Java.
Actually – my team at Porter Novelli are getting really good at doing this stuff themselves. Part of why I’m doing more of this stuff is so that I can keep just far enough ahead of them that they don’t overtake me!
The web really helps, of course. In the old days, someone with limited coding abilities like myself would have to give up when faced by complicated problems (or buy a reference book, which is often the same thing, I find.) These days, Google and the forums help you solve most things; it’s rare that we find that we’re the first person to have discovered a problem; most times we find that someone has trodden the path before and a solution (or several) has been provided. Part of the purpose of this blog is to share stuff back into the community that I have patched together from other people’s work!
Mat,
Using a <br> worked. Excellent! Thanks for the tip.
i just wanted to say that I love this site