A sneak peek at our online influence mapping tool
Porter Novelli has been working on its own “online influencer mapping” tool for about six months now. Recently, I’ve started posting screen grabs on our Flickr page to see what people think about it. I thought it was probably time to share some of the images here.
Version 3.5.4 (Always in Beta)
The project is named Rufus after the character George Carlin played in “Bill & Ted’s Excellent Adventure”.
For those of you who know how network analysis works and what it’s used for, this is revolutionary only in that it’s fast and accurate enough to use as an exploratory tool.
For those of you who have no idea what network analysis is or how it’s used in many, many situations, 2009 would be a really good year to start finding out.
For this graph (which took around 5 mins to generate), we took as a seed list the first 50 back links as generated by Yahoo Site Explorer (http://siteexplorer.search.yahoo.com/). We’ve tested this up to 100 seeds, but there’s plenty of room to go further.
Taking the data from the previous crawl, I’ve used AnalyticTech’s NetDraw to remove the pendant data and one or two obvious data blips (web services like Feedburner that have a high background presence in most maps).
I’ve also set NetDraw to calculate centrality measures, then size nodes by indegree (citation frequency) and colour them by betweenness (I needed to do a few adjustments to make it look pretty, too…)
We use indegree as a measurement of “authority” within any given data sample. What we can say is that more sites in the sample link to the big circles than link to the small circles. Since they’re all talking about the same subject, it’s generally safe to say that the big circles are more authoritative. But in practice, we have to manually remove a few “noisy” sites that have a high probability of being linked to no matter what the subject matter: generally advertising networks like DoubleClick, measurement sites like Google Analytics or Quantcast, and bookmarking sites like Digg, Delicious, and Reddit.
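For anyone who wants to see the indegree idea in miniature, here is a rough Python sketch. It is not Rufus code: the edge list, the NOISE set and the indegree function are all made up for illustration, and the domain names in the noise set are assumptions about how those services might appear in a crawl.

```python
from collections import Counter

# Hypothetical noise list: sites that get linked to whatever the topic is.
NOISE = {"doubleclick.net", "google-analytics.com", "quantcast.com"}

def indegree(edges, noise=NOISE):
    """Count how many distinct sites link to each target.

    edges is a list of (source, target) pairs. Links to or from noisy
    sites, self-links and repeated links are ignored.
    """
    seen = set()
    counts = Counter()
    for source, target in edges:
        if source in noise or target in noise or source == target:
            continue
        if (source, target) not in seen:
            seen.add((source, target))
            counts[target] += 1
    return counts

# Made-up sample crawl: three sites cite hub.example, two hit an ad network.
edges = [
    ("a.example", "hub.example"),
    ("b.example", "hub.example"),
    ("c.example", "hub.example"),
    ("a.example", "b.example"),
    ("a.example", "doubleclick.net"),
    ("b.example", "doubleclick.net"),
]
scores = indegree(edges)
```

In this toy sample hub.example ends up with the biggest circle, and the ad network never gets a score at all.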
Betweenness is a useful proxy for “influence” — you can see that (in this particular data set) the big media aggregators like Engadget, the BBC, ZDNet etc. show up much brighter than the other nodes.
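NetDraw does the betweenness sums for us, but the idea can be sketched in a few lines of Python. The version below uses Brandes' algorithm on an unweighted, directed graph and returns raw (unnormalised) scores. Those choices are assumptions on my part, as are the node names, and NetDraw's own figures may be scaled differently.

```python
from collections import deque

def betweenness(adj):
    """Raw betweenness for an unweighted directed graph (Brandes' algorithm).

    adj maps each node to the nodes it links to.
    """
    nodes = set(adj)
    for targets in adj.values():
        nodes.update(targets)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in nodes}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Made-up example: two blogs reach two sources only via an aggregator.
adj = {
    "blog_a": ["aggregator"],
    "blog_b": ["aggregator"],
    "aggregator": ["source_x", "source_y"],
}
between = betweenness(adj)
```

Here the aggregator sits on all four shortest paths between the blogs and the sources, so it is the only node that lights up.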
The GNU foundation is startlingly bright, but the SourceForge datum will probably change when we get the canonification process working in v3.5.5
Version 3.5.5 (Still in Beta)
The canonify function seems to work now, but I was startled to see that the clump of sites that I thought were SourceForge were, in fact, the output of what is probably a grey-hat search engine site. Oops. Well, we’ll take that faction out of the calculation…
Rufus has a pretty good blacklist and whitelist function: the blacklist excludes sites that we don’t want to crawl, and the whitelist is like the guestlist: it restricts the crawler to a certain set of sites.
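As a rough illustration of how the two lists interact, here is a small Python sketch. The should_crawl function and the sample hostnames are hypothetical, and letting the blacklist win over the whitelist is my assumption rather than a description of how Rufus actually resolves a clash.

```python
from urllib.parse import urlparse

def should_crawl(url, blacklist=frozenset(), whitelist=None):
    """Decide whether the crawler may visit a URL.

    blacklist: hostnames that are never crawled.
    whitelist: if given, the guestlist; only these hostnames are crawled.
    """
    host = (urlparse(url).hostname or "").lower()
    if host in blacklist:
        return False          # assumption: the blacklist always wins
    if whitelist is not None:
        return host in whitelist
    return True

# Hypothetical lists for a small crawl.
blacklist = {"feeds.feedburner.com"}
whitelist = {"greenblog.example", "ecolife.example"}

open_crawl = should_crawl("http://anything.example/post", blacklist)
blocked = should_crawl("http://feeds.feedburner.com/somefeed", blacklist)
on_guestlist = should_crawl("http://greenblog.example/about", blacklist, whitelist)
off_guestlist = should_crawl("http://anything.example/post", blacklist, whitelist)
```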
When we process the data to remove pendants, sites that are eye-catching here (like the spam site) shrink into nothingness. It’s not the outbound links, but the inbound links that count. Since the grey-hatter has no inbound links, he’ll vanish in a puff of smoke.
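Here is a toy Python version of that pendant clean-up, to show why the spam site disappears. I'm assuming a pendant is any node connected to exactly one other node and that a single pass is enough. The remove_pendants function and the site names are made up, and NetDraw's own routine may differ in detail.

```python
def remove_pendants(edges):
    """Drop every node that is connected to exactly one other node (one pass).

    edges is a list of (source, target) pairs; the surviving edges are returned.
    """
    neighbours = {}
    for source, target in edges:
        if source == target:
            continue
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)
    pendants = {node for node, linked in neighbours.items() if len(linked) == 1}
    return [(s, t) for s, t in edges if s not in pendants and t not in pendants]

# Made-up crawl: a spam site spraying outbound links at pages nobody else cites.
edges = [
    ("spam.example", "page1.example"),
    ("spam.example", "page2.example"),
    ("spam.example", "page3.example"),
    ("blog_a.example", "hub.example"),
    ("blog_b.example", "hub.example"),
    ("blog_a.example", "blog_b.example"),
]
kept = remove_pendants(edges)
remaining_nodes = {node for edge in kept for node in edge}
```

Once the three pendant pages go, the spam site has no links left in the sample, so it drops out of the picture along with them.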
We just ran a crawl on a bunch of green-issue bloggers. Because this bunch are serious cross-linkers, it highlighted an interesting new bug.
The canonify-exception list function works for the nodes, so we can trap information about subdomains when we choose to (*.wordpress.com, *.typepad.com and *.blogspot.com, for example), but it doesn’t appear to store edge information (“edges” are the links between two nodes).
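To make the canonify idea concrete, here is a hedged Python sketch of what the function is meant to do once it handles edges as well as nodes. None of this is Rufus code: canonify, canonify_edges and the EXCEPTIONS list are illustrative names, and collapsing a hostname to its last two labels is a deliberately naive assumption (it would get addresses like something.co.uk wrong).

```python
from fnmatch import fnmatch

# Hypothetical exception list: hosts whose subdomains are separate sites.
EXCEPTIONS = ["*.wordpress.com", "*.typepad.com", "*.blogspot.com"]

def canonify(host, exceptions=EXCEPTIONS):
    """Collapse a hostname to one canonical node name."""
    host = host.lower()
    if host.startswith("www."):
        host = host[4:]
    if any(fnmatch(host, pattern) for pattern in exceptions):
        return host                        # keep the blog's own subdomain
    return ".".join(host.split(".")[-2:])  # naive: keep the last two labels

def canonify_edges(edges):
    """Apply canonify to both ends of every link, so edges survive too."""
    seen, result = set(), []
    for source, target in edges:
        edge = (canonify(source), canonify(target))
        # Skip links that collapse into self-links, and duplicates.
        if edge[0] != edge[1] and edge not in seen:
            seen.add(edge)
            result.append(edge)
    return result

# Made-up links between hosted blogs and a big multi-subdomain site.
edges = [
    ("greenblog.wordpress.com", "projects.sourceforge.net"),
    ("ecolife.blogspot.com", "greenblog.wordpress.com"),
    ("www.sourceforge.net", "downloads.sourceforge.net"),
]
clean = canonify_edges(edges)
```

The point of the second function is that the same mapping gets applied to both ends of each link, so a blog kept as its own node keeps its links as well.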
The result? We’re seeing a load of isolated nodes (isolates).
Normally the only way you’d see an isolate is when a site on our seed list doesn’t link out to any external pages. This happens more than you might think when we’re looking at corporate and product marketing sites, but it puts a ceiling on things: a healthy crawl can never contain more than N isolates, where N is the number of sites on the seed list. Any isolate that isn’t a seed is a sign that something has gone wrong.
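That rule makes for a handy sanity check, sketched here in Python. The check_isolates function and the sample data are hypothetical, and the sketch assumes the crawl hands back a node list and an edge list separately.

```python
def check_isolates(nodes, edges, seeds):
    """Return (isolates, unexpected).

    isolates: nodes with no links at all.
    unexpected: isolates that are not on the seed list, which should
    never happen in a healthy crawl.
    """
    linked = {n for s, t in edges if s != t for n in (s, t)}
    seed_set = set(seeds)
    isolates = [n for n in nodes if n not in linked]
    unexpected = [n for n in isolates if n not in seed_set]
    return isolates, unexpected

# Made-up crawl: one seed links out, one seed doesn't, and one blog has
# lost its edges somewhere along the way.
seeds = ["corp.example", "blog.example"]
nodes = seeds + ["hub.example", "greenblog.wordpress.com"]
edges = [("blog.example", "hub.example")]

isolates, unexpected = check_isolates(nodes, edges, seeds)
looks_buggy = bool(unexpected) or len(isolates) > len(seeds)
```

In this sample corp.example is a legitimate isolate (a seed with no outbound links), while greenblog.wordpress.com is the kind of stray node that points to a bug.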
So we’re back to the drawing board on this particular function… Roll on version 3.5.6
Version 3.5.6 (The Eternal Beta)
Hooray! The problem with the isolates has been fixed in this version, and we’re cooking on gas once more.
Or perhaps – because this is a map of green bloggers and their influence landscape – we’re cooking with a low carbon-emission renewable fuel.
(Lots of credit and thanks to: Darrell for invaluable advice, Nick and Stuart for tireless development, to Kerry & Andy B for beta testing, and to Jean and Gary for funding and vision)








