The #interestingOPMLexperiment (stage 1)


A couple of weeks ago, I asked a bunch of people to send me their OPML files. (For those of you who aren’t aware, an OPML file is what tells your RSS reader which feeds you’ve subscribed to; it can also act as a way of moving your subscriptions between readers.) Some of the more trusting among them agreed, and that gave me the raw material for the first bit of my experiment.
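To make that concrete, here is a rough sketch in Python of what an OPML file looks like and how the subscribed feeds might be pulled out of one. The sample file, the feed names and the feed_urls helper are all made up for illustration; real exports vary from reader to reader, and this isn't necessarily how the files in this experiment were handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a tiny OPML export with one folder and three feeds.
SAMPLE_OPML = """<opml version="2.0">
  <head><title>Example subscriptions</title></head>
  <body>
    <outline text="Marketing">
      <outline type="rss" text="Example Blog A" xmlUrl="http://example.com/a/feed"/>
      <outline type="rss" text="Example Blog B" xmlUrl="http://example.org/b/rss"/>
    </outline>
    <outline type="rss" text="Example Blog C" xmlUrl="http://example.net/c.xml"/>
  </body>
</opml>"""


def feed_urls(opml_text):
    """Return the set of feed URLs (xmlUrl attributes) in an OPML document."""
    root = ET.fromstring(opml_text)
    # Folders are <outline> elements without an xmlUrl; feeds carry one.
    return {o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")}


print(sorted(feed_urls(SAMPLE_OPML)))
```

Using a set means a feed that appears twice in the same file (say, in two folders) is only counted once for that person.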

Some red herrings

Along the way I uncovered a couple of things that were interesting but not (entirely) relevant to the experiment.

  1. Some people are cagey about sharing their list of feeds: whether they consider it intellectual property, or whether they think that it may be too revealing, I don’t know.
  2. Lots of people said things like “oh — my RSS reader? Haven’t looked at that in a while. I get all my news off Twitter these days.”

So what’s the experiment about?

You’ll probably know that I’m interested in networks of people and how information flows through those networks. I’m also interested in things like influence and whether and how we can identify and track its effects.

I’d heard about a word of mouth marketing campaign which was set up along the following lines: a sample set of pupils at a given school were asked “who’s the coolest kid in the school?” Clearly some names came up more often than others, whereupon the researchers went to those kids and asked them the same thing. By the end of the process, they had a good idea of who might be the key influencers.

This seemed like a good sort of thing to be doing. It was a simple idea, and apparently easy to execute. Everything else I was working on (the citation analysis and the network analysis in particular) seemed to be complementary.

A little history

So last year we did a version of this experiment for a client. We emailed and phoned a whole load of journalists and analysts whose beats covered our client’s interests and asked them:

Who do you read on a regular basis?

Who (other than you) should we be talking to?

If you were looking for information, where would you start?

Then we approached the people that they’d recommended and asked them the same questions. The purpose (as I’m sure you’ll have guessed) was to create a network map of who was whose go-to guys and girls. We’d take the data that we collected, push it through the usual processes and hey presto! we’d know who was really influential.

But what we actually found was that the journalists and analysts we asked seemed not to have any specialist sources. The people whom they cited were (in no particular order) their colleagues, the companies they covered (and their public relations representatives and agencies). Oh, and Google. Google was cited by everyone.

There were two obvious explanations for this:

  1. It’s the truth: they really didn’t have any better sources. We’d all read about the collapse of journalistic standards; perhaps we were encountering it at first hand?
  2. The privacy and intellectual property argument: maybe the journalists and analysts were constitutionally loath to reveal their sources?

All my training and experience, however, point to the following reason:

  1. Poor questionnaire design: put on the spot, respondents simply couldn’t think of the names and sources that they believed were their influencers.

This is less unlikely than it may seem. If I were to ask you who your big influencers were, would you be able to answer?

All in all, it was a deeply unsatisfying exercise. We had envisioned what wild success would look like and this wasn’t even close.

Where we are today

I thought that it might be easier to ask people for their OPML files than it had been to ask them who their influences were. This is the equivalent of, say, asking to see a musician’s CD collection instead of asking them about their musical influences. It’s not necessarily more accurate, but it might help expose a different picture.

Seven of my friends, colleagues and acquaintances sent me their OPML files, and that was enough to get started.

Early results

Between them, the first seven people subscribed to just over 1.5K RSS feeds, which gave me a lot of data to process. Here’s a picture of the network before I started processing the data.

First pass from the OPML experiment

You can see (I think) pretty clearly that there are several blogs (or RSS feeds — I’m using them interchangeably here) in the middle of the map that are linked to by several of the respondents. And there are some (around the edges) that are linked to by a couple of respondents. And there are lots that are linked to by only one respondent.

At this stage we’re interested in “indegree” — or “the number of OPML files in which we find RSS feed X.” Clearly the great majority (just over 1.3K) of the RSS feeds are single hits — that is, they appear in only one respondent’s OPML file. But that still leaves 180 RSS feeds with multiple hits:

Interesting OPML experiment
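The indegree count described above can be sketched in a few lines of Python. This is an illustration under assumptions rather than the actual processing behind the numbers here: the respondent names and feed identifiers are hypothetical, and each respondent's OPML file is assumed to have already been reduced to a set of feeds.

```python
from collections import Counter

# Hypothetical data: respondent -> set of feeds found in their OPML file.
subscriptions = {
    "respondent_1": {"feed_a", "feed_b", "feed_c"},
    "respondent_2": {"feed_a", "feed_b"},
    "respondent_3": {"feed_a", "feed_d"},
}


def indegree(subs):
    """Count, for each feed, the number of OPML files it appears in."""
    counts = Counter()
    for feeds in subs.values():
        # Each value is a set, so a respondent counts a feed at most once.
        counts.update(feeds)
    return counts


counts = indegree(subscriptions)

# Feeds that appear in only one respondent's file ("single hits").
single_hits = sorted(f for f, n in counts.items() if n == 1)

# Feeds that appear in more than one file ("multiple hits").
multiple_hits = {f: n for f, n in counts.items() if n > 1}

# Distribution: indegree value -> number of feeds with that indegree.
distribution = Counter(counts.values())

print(single_hits, multiple_hits, dict(distribution))
```

The last Counter is the kind of tally a distribution chart of indegree would be drawn from.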

Those 180 RSS feeds with multiple hits left us with a nice (and fairly predictable) distribution that looks like this:

Clearly the first-generation results aren’t very meaningful or accurate. Most often cited were author Seth Godin, ex-blogger-turned-lifestreamer Steve Rubel, and PR blogger Stephen Davies, who were each read by five of the seven respondents. But more than half the respondents read my blog. Despite what I’d like to think, this is clearly an artefact thrown up by the sampling frame (my friends, colleagues and acquaintances).

Where next?

I’m hoping that we can extend this out a generation — and then keep iterating. I’m going to approach everyone with an indegree of 3 or above, and see how many of them will send me their OPML files.
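As a sketch of that selection step, assuming the indegree counts are already in hand: the function name, the sample counts and the idea of excluding people already asked are my own illustrative assumptions, with only the threshold of 3 taken from the plan above.

```python
def next_generation(indegree_counts, already_asked, threshold=3):
    """Pick feeds whose indegree meets the threshold and whose authors
    have not been approached yet (hypothetical helper)."""
    return sorted(
        feed
        for feed, n in indegree_counts.items()
        if n >= threshold and feed not in already_asked
    )


# Hypothetical indegree counts from the first generation of OPML files.
example_counts = {"feed_a": 5, "feed_b": 3, "feed_c": 2, "feed_d": 4}

# feed_d stands in for someone who has already sent in their OPML file.
to_approach = next_generation(example_counts, already_asked={"feed_d"})
print(to_approach)
```

Each new batch of OPML files would be folded into the counts, and the same selection repeated for the following generation.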

I don’t hold out much hope of getting OPML files from people like Messrs Godin and Rubel, but if I don’t ask, I’ll never know, will I?