Tag Archive for 'research'

The #interestingOPMLexperiment (stage 1)

Interesting OPML experiment

A couple of weeks ago, I asked a bunch of people to send me their OPML files (for those of you who aren’t aware, an OPML file is what tells your RSS reader what feeds you’ve subscribed to — it can act as a way of moving your subscriptions between readers.) Some of the more trusting among them agreed, and that gave me the raw material for the first bit of my experiment.
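If you'd like to peek inside an OPML file yourself, here is a minimal sketch using Python's standard library. It assumes the common convention that each subscription is an `outline` element carrying an `xmlUrl` attribute; the sample document and the `feed_urls` name are made up for illustration, and real exports vary from reader to reader.

```python
import xml.etree.ElementTree as ET

# A made-up OPML document for illustration only.
SAMPLE_OPML = """<opml version="1.1">
  <head><title>Example subscriptions</title></head>
  <body>
    <outline text="News">
      <outline text="Example Blog" type="rss"
               xmlUrl="http://example.com/feed.xml"
               htmlUrl="http://example.com/"/>
    </outline>
    <outline text="Another Blog" type="rss"
             xmlUrl="http://example.org/rss"/>
  </body>
</opml>"""


def feed_urls(opml_text):
    """Return the xmlUrl of every outline element that has one.

    Folders are outline elements without an xmlUrl, so they are skipped.
    """
    root = ET.fromstring(opml_text)
    return [node.get("xmlUrl") for node in root.iter("outline")
            if node.get("xmlUrl")]


for url in feed_urls(SAMPLE_OPML):
    print(url)
```

Because `iter` walks the whole tree, feeds nested inside folders are picked up along with the top-level ones.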

Some red herrings

Along the way I uncovered a couple of things that were interesting but not (entirely) relevant to the experiment.

  1. Some people are cagey about sharing their list of feeds: whether they consider it intellectual property, or whether they think that it may be too revealing, I don’t know.
  2. Lots of people said things like “oh — my RSS reader? Haven’t looked at that in a while. I get all my news off Twitter these days.”

Continue reading ‘The #interestingOPMLexperiment (stage 1)’

Can we calculate party affiliation? (the US Congress Edition)

Using nothing more than their public twitter relationships, is it possible to predict whether a US Congressperson is a Republican or a Democrat? The answer seems to be a guarded “yes” — our tools predict correctly 40/46 times (or around 87% of the cases.)

Calculated Party Affinity US Congress

This post follows on from a post earlier today in which I asked, “can we calculate party affiliation?” The data set in the earlier post was gathered from the 16 members of the UK parliament who are on Twitter and the relationships between them.

Tweetcongress maintains a list of US congresspeople on Twitter. Today (February 13, 2009) there are 76 congresspeople on the service, but when I collected my data set of “who follows whom” on February 3, 2009 there were only 65. Of these 65, fully 19 (29%) lived a life of noble isolation with regard to the network — none of their peers linked to them, and they in turn linked to none of their peers. Removing these Miss Havishams from the data set leaves me with 46 twittering congresspeople who form a network.
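The pruning step can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool I used: the account names are invented, and `drop_isolates` is a hypothetical helper that keeps anyone who follows, or is followed by, at least one peer.

```python
def drop_isolates(members, follows):
    """Return the members with at least one tie to another member.

    members: iterable of account names
    follows: iterable of (follower, followed) pairs
    """
    members = set(members)
    connected = set()
    for follower, followed in follows:
        # Only ties where both ends are in the peer group count.
        if follower in members and followed in members and follower != followed:
            connected.update((follower, followed))
    return sorted(connected)


# Invented data: sen_d only follows someone outside the peer group.
members = ["rep_a", "rep_b", "sen_c", "sen_d"]
follows = [("rep_a", "rep_b"), ("sen_c", "rep_a"), ("sen_d", "outsider")]
network = drop_isolates(members, follows)
print(network)

# The arithmetic from the post: 65 collected, 19 isolated.
print(65 - 19, round(100 * 19 / 65))
```

Ties to accounts outside the list are ignored on purpose, since the question is only whether congresspeople link to each other.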

Now as both social network analysis and Aesop would have it, “a man is known by the company he keeps.” What I mean by this is that given the partisan nature of politics, we should expect that Democrats will link to other Democrat twitterers more often than they link to Republican twitterers and vice versa. So that’s what NetDraw[1], the software I’m using for most of this stuff, looks for, or more accurately:

To identify factions, NetDraw software iteratively searches for a distribution of nodes among a selected number of factions to minimise the number of connections between factions and to maximize the number of connections within factions.

Whatever. So I let NetDraw loose on the data, and here’s what it did.

Calculated Party Affinity US Congress
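To give a feel for what "searching for factions" means, here is a toy sketch in Python. It is emphatically not NetDraw's routine: where NetDraw searches iteratively, this tries every possible two-way split, which only works for a handful of nodes. It also assumes ties are undirected and scores a split by counting "errors" (ties that cross factions plus missing ties inside a faction). The function names and the six-node example are invented.

```python
from itertools import product


def faction_errors(nodes, ties, assignment):
    """Count ties between factions plus absent ties within factions."""
    errors = 0
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            tied = frozenset((a, b)) in ties
            same_faction = assignment[a] == assignment[b]
            # An ideal split has ties exactly where factions match.
            if tied != same_faction:
                errors += 1
    return errors


def best_two_factions(nodes, edges):
    """Exhaustively search two-faction splits for the fewest errors."""
    ties = {frozenset(edge) for edge in edges}
    best = None
    # Pin the first node to faction 0 so mirror-image splits are skipped.
    for rest in product((0, 1), repeat=len(nodes) - 1):
        assignment = dict(zip(nodes, (0,) + rest))
        score = faction_errors(nodes, ties, assignment)
        if best is None or score < best[0]:
            best = (score, assignment)
    return best


# Two tight triangles joined by a single bridging tie (c-d).
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"),
         ("c", "d")]
score, assignment = best_two_factions(nodes, edges)
print(score, assignment)
```

In this example the only error left in the best split is the bridging tie itself, which is roughly the position a cross-party follower ends up in.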

I coloured the nodes red for Republican and blue for Democrats[2], labeled the nodes by party (for the sake of clarity, and for the hard-of-thinking, that’s “R” for Republican and “D” for Democrat) then counted all the nodes where the label said one thing but the colour another. There were six of these nodes, so NetDraw got the answer right 40 times out of 46 (just about 87%.) This is less than the astonishing 93.75% accuracy we got with the Westminster twittering members of parliament in the previous post. Nevertheless I think we can safely say that it’s not a particularly integrated (or bipartisan) network if we can predict party affiliation with quite such success.
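For anyone who wants to check the sums, here they are as a tiny Python sketch. The `accuracy` helper is just for illustration; the Westminster figure of 15 correct is my reading of 93.75% of 16 MPs.

```python
def accuracy(correct, total):
    """Percentage of nodes whose calculated faction matches their party."""
    return 100.0 * correct / total


congress = accuracy(46 - 6, 46)   # six mismatched nodes out of 46
westminster = accuracy(15, 16)    # 93.75% of 16 MPs is 15 correct

print(round(congress, 1), westminster)
```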

Here’s exactly the same map with the errant sheep re-labeled with their proper names so it’ll be easier to refer to them (if it helps, you can click on the image to view or download a larger version.)

congress guesswork incorrect labels

You’ll see, I hope, that NetDraw has made a pretty good fist of the job. Where it has gone wrong on the whole is where the data clearly suggests something else. So Rep. Jared Polis for instance follows (and is followed by) no Democrat peers. Rep. Nancy Pelosi (D) and Sen. Richard Durbin (D) follow each other, but since Pelosi is followed by several Republicans and none of her other Democrat peers you can see why the algorithm has made the incorrect guess that the two of them are Republicans. Long-serving member Neil Abercrombie, as discussed in a previous post on US Congress Twitter folk, forms a bit of a bridge between the two parties, so despite his membership of the Congressional Progressive Caucus and liberal voting record, from the Twitter network point of view, his affiliation is somewhat ambiguous.

Sen. McCain follows none of his peers, and appears to inherit his incorrect attribution from Sen. Susan Collins. For the life of me, I can’t work out what makes it think that Sen. Susan Collins is a Democrat. She really isn’t, you know.

Note 1: NetDraw is a free program written by Steve Borgatti from the University of Kentucky. If you’re interested in playing around with this stuff, you’ll need to get yourself a copy.

Note 2: Actually, that’s not true. Despite a friend sharing the simple mnemonic that “‘Republicans’ and ‘red’ begin with the same letter,” I just can’t get it out of my English head that the Republicans should be blue and the Democrats red. As a result I waste precious minutes re-colouring these maps in Illustrator. It is worth pointing out that I also have problems with “left” and “right” on occasion — preferring instead the binary opposition “left” and “No! no! The other left, for God’s sake!”

Can we calculate party affiliation? (The Westminster edition)

This is a follow-up post to Why doesn’t the Tory MP have Twitter friends? — a report on some early research into the interrelationships between the few Westminster MPs who are on Twitter.

According to Tweetminster, the number of UK MPs on Twitter has doubled since this time last month. Where there were eight Twittering MPs, there are now sixteen. Here’s the map that shows who follows whom (the labels may be too small to read — if you want to see a larger image, click on the map.)

Actual factions among Westminster MPs on Twitter

I’ve coloured each node to show party affiliation; for those of you who are unfamiliar with British politics, Labour (our left-of-centre party) shows up in red, Conservatives (our right-of-centre party) in blue, and Liberal Democrats (what it says on the tin) in yellow.

The size of each node represents the individual’s “betweenness centrality” — a network analysis term that helps us place a value on individuals within a network. To give you a sense of what it means, the higher the betweenness centrality of an individual, the greater the impact when you take them out of the network. For those of you who work in large companies, it may be worth noting that senior management’s personal assistants generally have very high betweenness — something that is mostly remarked upon when they go on holiday (simultaneous translation: “take a vacation”.)
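If you want to see where a number like that comes from, here is a minimal pure-Python sketch of one standard way to calculate it (Brandes' algorithm) for an unweighted, undirected network. It is not what NetDraw does internally, as far as I know, and the little office network is invented to echo the personal assistant example: everyone reaches everyone else through the PA.

```python
from collections import deque


def betweenness(adjacency):
    """Betweenness centrality for an unweighted, undirected graph.

    adjacency: dict mapping each node to a set of its neighbours.
    Returns, per node, how many shortest paths between other pairs
    pass through it (fractional when several shortest paths tie).
    """
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        order = []                              # nodes in order of distance
        preds = {v: [] for v in adjacency}      # predecessors on shortest paths
        paths = {v: 0 for v in adjacency}       # number of shortest paths
        dist = {v: -1 for v in adjacency}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                            # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):               # accumulate from the far end
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted from both ends.
    return {v: s / 2 for v, s in score.items()}


# Invented office network: the PA sits between everybody.
office = {
    "pa": {"boss", "staff_1", "staff_2", "staff_3"},
    "boss": {"pa"},
    "staff_1": {"pa"},
    "staff_2": {"pa"},
    "staff_3": {"pa"},
}
result = betweenness(office)
print(result)
```

The PA scores 6 (one for each of the six pairs of colleagues who can only reach each other through the PA) and everyone else scores 0, which is the "goes on holiday and everything stops" effect in numbers.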

So far so good. By now, regular readers will probably be kissing their teeth and thinking “so what?” I’ve done a lot of these Twitter maps in the past and the novelty must be wearing off on you by now.

So here’s the thing. There are a few network analysis techniques that let one identify cliques and factions. What we’ve got here is a small set where we already know what people’s affiliations should be. How interesting, I thought, would it be to see how well the calculated result fits the real world data? Here’s what I found:
Continue reading ‘Can we calculate party affiliation? (The Westminster edition)’

Creating blog seed lists for research

Colleagues and regular readers will know that we’ve been working on an “online influencer mapping” tool called Rufus. Those of you who’ve had a chance to use Rufus will know that it requires a seed list of URLs to get started. Creating this seed list can be automated in one or two ways, but one of the fastest, most effective, and most sensible ways to build a seed list is still to do it by hand.

We’ve got one or two other processes that also require us to build a seed list. No doubt other people do too — lots of web research is quite data hungry. So — because I’ve found myself telling a few of my Porter Novelli colleagues how we go about the process, I thought I’d share it here, in the interests of:

  1. having somewhere to point people in future,
  2. general good-heartedness: I’ve learned a lot from people in the past, and I like to give stuff back, and
  3. getting feedback and tips from people about how they might go about the same process.

Oh – and while these methods should work in any language, please bear in mind that I tend to think and work in English. I’d appreciate feedback on how best to localize these methods.

Building a seed list: 5 easy methods

With all these methods, there’s no substitute for checking out the blog. I don’t ask people to read the blog (that comes at a different stage of the process altogether) but you should at least click through and see what you’re dealing with. In fact, method 3 rather relies on you visiting the blogs you’re researching.

1. Look for someone who has already done your research for you

Start by being optimistic. Generally you’ll find that someone else has created a list of the “top ten” (or however many) blogs in the niche that interests you. Take a look at Brendan’s regularly updated PR Friendly Index for example. If you’re searching for English language blogs then you could do worse than start by looking at Guy Kawasaki’s Alltop. But simply Googling for lists of blogs or blog charts should get you a long way.

This is generally a source of fairly high-quality data. Things to watch out for, though, are search engine spamming link farms and shady “Make Money Online” (MMO) directories. You’ll learn to recognize these soon enough, but as long as you’re visiting all the blogs you’re putting on your seed list you should be alright.

2. Do a tag search on delicious

I picked up this technique from Anthony Mayfield, who showed me that by searching on the delicious social bookmarking site for the tags “xxx” and “cool” and/or “inspiration” you could find sites about “xxx” that people thought were cool. Knowing what your digital trendsetters think is cool is one hell of an insight.

For our purposes though, we’re looking for cool blogs. So (1) click the “Explore Tags” tab on the home page, and then (2) type your keyword and the word “blog” into the search box. Couldn’t be simpler?

Use the 'Explore' tab in delicious to find blogs for your seed list

Well — actually it could be simpler. You can query the delicious database when you type the URL into the address bar of your browser like this:

http://delicious.com/tag/blog+keyword

Where “keyword” is the word you’re looking for.
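If you have a long list of keywords, building those URLs by hand gets tedious. Here is a small Python sketch that follows the URL pattern above. The `delicious_tag_url` helper is hypothetical, and percent-encoding keywords that contain spaces is my assumption; I haven't checked how delicious treats multi-word tags.

```python
from urllib.parse import quote


def delicious_tag_url(keyword, extra_tag="blog"):
    """Build a tag URL of the form http://delicious.com/tag/blog+keyword."""
    tags = [extra_tag, keyword.strip()]
    # Percent-encode each tag, then join with "+" as in the pattern above.
    return "http://delicious.com/tag/" + "+".join(quote(tag, safe="") for tag in tags)


for keyword in ["widgets", "cycling"]:
    print(delicious_tag_url(keyword))
```

Swapping `extra_tag` for "cool" or "inspiration" gives you the variation described at the start of this method.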

When you get the results, check the ones that (a) have the right kind of title (if you’re looking for French blogs, look for French titles for example), (b) have the right kind of tags and description and (c) have been bookmarked most often.

If there’s a better local language social bookmarking site, I’d use that whenever possible. For example, Mister Wong is a good one for German language sites.

A quick note: social news sites like Digg and Reddit, and “serendipity browsers” like StumbleUpon tend not to work so well in my experience.

This method also owes a lot to Marshall Kirkpatrick. You might like to try out the Yahoo! Pipe that I built based on the process that Marshall documents.

3. Look for blog rolls

On every blog you visit during the research process, look for the blog roll — and check the likely-looking links. See if they’re useful or useless. Quite often you’ll find that someone who has an interest in widgets will also read and link to blogs that cover widgets. That, after all, is the principle on which Rufus works, writ small. So we reckon it’s a pretty good approach.

4. Ask your Twitter followers

Seriously — this works. Well — it worked for me and my team from around 100 followers onwards. I’d be interested in others’ experience.

5. Call someone

Get hold of someone who knows about the subject and phone them up or get them on IM. Category experts are an excellent source of low-volume but high-quality information. It’s time consuming, but can work well if you have the right contacts. Journalist friends might be a great source of blog lists.

I’ve purposefully left this one till last; I think it’s a good rule of thumb to do your desk research before picking up the phone. That way you can ask intelligent questions instead of damn fool ones.

Using a text editor

I try to keep two lists running all the time that I’m working: a scratchpad list of blogs I have yet to visit, and the seed list itself. Because I’m on a Mac, I use the excellent BBEdit (there’s a free version called TextWrangler which will be just as good for most people.) If — as is more probable — you’re on a Windows machine, you might like to try the very powerful but slightly less pretty Notepad++. If you just want to use Excel, though, that’s fine, too.

Automating Marshall Kirkpatrick’s “Social Media Cheatsheet” process with Yahoo! Pipes

Yahoo! Pipe for automating Marshall Kirkpatrick's Social Media CheatSheet process

Marshall Kirkpatrick has published an excellent process for getting up to speed with what the big issues are in your market sector. Is there, he asks:

any way to ramp up your knowledge of these fields, fast, other than the “Google and wander” method?

He then outlines an almost perfect example of how to use social media to do this.

You should read his article before reading any further. It’s short and punchy and won’t take much time.

Read it? Good. Now you may have noticed in the comments section that the first commenter doubts that you can:

find one baker or candlestick maker that will go through all of that.

So I thought I’d see if I can automate the process. The short answer is that I can and I can’t. I can’t yet automate one or two really important bits and pieces, notably:

  1. ranking delicious bookmarks by popularity, not recency
  2. human editorial selection of bookmarks

Perhaps someone could help me with this.

But otherwise, I’ve published this Yahoo! Pipe, Automating Marshall Kirkpatrick’s Social Media Cheatsheet Process, which automates 90% of the process and may make it easier for the bakers and candlestick makers.

All comments and — more importantly — suggestions and improvements gratefully received.

Monday, 12 Jan 2009 00:27: I’ve just added a bit to the pipe to list posts in descending order according to PostRank. Don’t know if this is useful.

Blogger typology: using IBM’s Many Eyes to build matrix charts

Thanks to IBM’s Many Eyes service it’s relatively simple to create complicated visualizations that my current version of Excel can’t handle. For example, this “matrix chart” that I built using Excel’s bubble chart function is clearly unacceptable. I can’t easily link statements or values to the X and Y axes, and there’s lots of overlapping that seems (after many attempts) to be impossible to fix.

Matrix chart built using Excel - not very satisfactory!


Continue reading ‘Blogger typology: using IBM’s Many Eyes to build matrix charts’

A simple perl script to interrogate the Technorati API

Technorati API perl query in action

Sometimes (for instance when I’m doing the research for the blogger typology) you need to get a whole load of Technorati data for a whole load of blogs.

This research can (of course) be done by hand. And (of course) for a long list of blogs this would take a great deal of time. Handily, Technorati provides developers with an API that lets you automate those queries. An API (for those of you who don’t know) is an Application Programming Interface – a toolkit provided by a service or application (in this case by Technorati) that lets other computer applications ask it questions and use the answers for their own purposes. It may be helpful to think of APIs as being like the knobs on top of a Lego brick that let you stick other Lego on to it without in any way changing the nature of the brick itself. On the other hand it may not be so helpful after all.
Continue reading ‘A simple perl script to interrogate the Technorati API’

Your help needed to develop “blogger typology”

(NB: If you have both a blog and a short attention span, please skip the article, and go straight to this short survey. Many thanks!)

What is a blogger? Everyone seems to think they know, and yet the longer I work in this area, the more I realize I know nothing. And the less I know, the more suspicious I become of marketers who use vague terms like “conversation” (which has – after all – become little more than a Latinization of the ghastly “dialogue”.) I can just about understand what Technorati means when they talk about

The ecosystem of interconnected communities of bloggers and readers at the convergence of journalism and conversation.

(State of the Blogosphere 2008)

…but there are an awful lot of long words that could turn out to hide an awful lot. And that’s the carefully thought-out distillation of a bunch of experts. Most of us, most of the time fall back on lazy or confusing language. We talk about “social media” and never stop to think that — depending on who’s doing the talking (and what they have to sell) — what is meant by that apparently innocuous phrase shifts wildly from speaker to speaker.
Continue reading ‘Your help needed to develop “blogger typology”’

Some Twitter Social Network Analysis

On November 10th, Stephen Davies collected together a list of “UK PR people on Twitter”. According to PostRank, this and his earlier post, “UK Journalists on Twitter”, are the most popular posts on his blog.

Then a couple of days later, Stephen Waddington pushed that list through TwitterGrader to come up with his list of “Top 50 UK PR people by Twitter influence”.

A couple of weeks ago, I was looking for a seed list with which I could test our “whitelist” and “canonify exception” rules on Rufus (the network analysis tool that Porter Novelli has been working on for the past six months.) This isn’t the right place to go into it, but to put it simply, the whitelist restricts the search to domains that are on the list (like a guest list), and the canonify exception list stops Rufus from chopping the subdomains or directories off the list (without this, a site like sethgodin.typepad.com would just show up as typepad.com, or en.wikipedia.org/wiki/Social_network_analysis would show up as wikipedia.org.) Rufus, by the way, is named after the George Carlin character in Bill & Ted’s Excellent Adventure.
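To make those two rules a little more concrete, here is a rough Python sketch of the idea. It is not Rufus's code: `canonify` and `allowed` are hypothetical names, and the "keep the last two labels of the host" rule is a naive stand-in that would get domains like example.co.uk wrong.

```python
from urllib.parse import urlparse


def canonify(url, exceptions=()):
    """Reduce a URL to a bare domain unless an exception says otherwise."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.netloc.lower()
    full = host + parsed.path.rstrip("/")
    # Try the longest exception first so the most specific one wins.
    for prefix in sorted(exceptions, key=len, reverse=True):
        if full == prefix or full.startswith(prefix + "/"):
            return prefix
    # Naive canonical form: the last two labels of the host name.
    return ".".join(host.split(".")[-2:])


def allowed(url, whitelist, exceptions=()):
    """The guest list: only canonical forms on the whitelist get in."""
    return canonify(url, exceptions) in whitelist


exceptions = ["sethgodin.typepad.com",
              "en.wikipedia.org/wiki/Social_network_analysis"]

print(canonify("http://sethgodin.typepad.com/seths_blog/"))
print(canonify("http://sethgodin.typepad.com/seths_blog/", exceptions))
print(canonify("en.wikipedia.org/wiki/Social_network_analysis", exceptions))
print(allowed("http://other.typepad.com/", {"sethgodin.typepad.com"}, exceptions))
```

Without the exception list, every Typepad blog collapses into typepad.com, which is exactly the problem the exception rule is there to prevent.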

My colleague Tim Hoang used to work with Stephen W., so he sent him the image. Wadds then posted the map on his blog. My flickr page has never had so much activity.

Here’s the original graph:

High network density in twitter UK PR community

Lots of people started drawing conclusions about the nature of PR, or the nature of Twitter from the graphs. There was lots of interesting speculation. Some people thought that this demonstrated how introverted the twitter crowd is. Others thought that it showed how introverted the PR/Social media crowd is. Others seemed to think that it didn’t matter.
Continue reading ‘Some Twitter Social Network Analysis’

Mapping the social graph of weight loss groups

These are the graphs from some research on weightloss groups on Facebook. I’ve processed the data so that:

1) the size of dot is related to "total number of friends" – this only works where a user’s friends are publicly visible – quite often they aren’t, and I haven’t checked to see what the incidence of this privacy setting is generally and specifically

2) all isolates (i.e. those users who have no (public) personal relationships within the group) have been removed.

personal weightloss support group

This is the network graph of relationships on a personal weight loss support group. A college student set this up to support her own goals. She told me: "For my group, I just started it out by inviting all of my friends and then some people joined the group who found it in a search, I think. I am amazed by the amount of support I receive from random people who encourage me to keep on going. There are some spammers on the group who are just there trying to sell stuff and that gets annoying, but I know I can’t avoid them."

unofficial weightwatchers support group

This is the network graph of relationships on an unofficial weightwatchers group on Facebook. You can see that there are hardly any member-get-member relationships here. My friend Valery (who has a professorship in this sort of thing at Wharton) says:
"It’s very common that organizations and interest groups become foci for personal networks. In fact, I believe that joint activities are the prevalent mechanism of tie formation."

But it doesn’t look like it here. Looks to me that – while people may form relationships around special interests – they don’t mirror these on Facebook. Say I suffer from Meniere’s Disease (apparently true) and I participate in a Meniere’s support forum (not true at present), I don’t necessarily make those people my Facebook friends…

blog-related support group

Another example of the "not many personal relationships" graph for a weightloss support group on Facebook.

How do people get information on weight loss? After a few interviews, I think the answer is like this:

1) Influencers are "pull", rather than "push" resources (I’m thinking of going on a particular product, so I mention it casually to several friends to gauge consensus/temperature. One or more of them tell me "oh yes, I’ve heard of that", and one tells me "yes, my friend tried that, and lost 20lbs"). This is not an active market. Most people won’t be evangelizing, and evangelizing behaviour may even appear suspicious.

2) That said, people trust strangers to an extraordinary degree. Friend-of-friend endorsement is readily accepted, as is the anonymous commentary on boards & groups. Bloggers are slightly less trustworthy, it seems – because most of them have an axe to grind.

OK — so this really isn’t v. scientific. But compare this to the map of green issue member-get-member activity and you’ll see a huge difference.