Today is Silke’s turn to shine again, this time with an intelligent bit on community detection! I have just one thing to say:
Good moaning guv!
DataNinjas are back, bringing the best of SNA to you all the way from Fog City aka London!
While Yanne withers away back home in Leuven, I have been trying very hard to drink my sorrow away in the many pubs with my new (and I must admit: v. amusing – just in case they’re reading this *wink wink*) colleagues.
Cheers, sweetie darling!
For the few hours that I’ve been sober though, I’ve been focusing on another aspect of SNA: community detection. Not meaning that I’ve been walking around in the city with my binoculars, spying on the artistic, academic, queer communities – although I have, obviously.
We’re – of course – talking about community detection in a network, or by that we mean: taking a look at the structure of the network by focusing on its underlying sub-units, which are made up of highly interconnected nodes.
In Gephi, you can automatically get the software to divide your network in different communities. It also provides you with a value for modularity. This value (between -1 & 1) measures the density of the edges inside the communities in comparison to the density of the edges between the different communities. So it’s actually a measure for how good (or bad) the chosen community detection method is.
These features are really nice, colouring the nodes according to which sub-unit Gephi assigns them to. Yes, you get a lovely visual result –
*cue the overly eager data-newbie: “OMG visual result!?!” bangs head against wall… *
– it is nevertheless essential that you understand how and why that’s the result you get.
Some really intelligent datageeks (the Dataninja’s are mere enthusiastic datageeks, not the intelligent variant, I’m afraid to say. We deeply apologize if we have disappointed you, sir/madam/Lord/Lady/Reverend/HRH/HM. Please don’t press charges, we are very poor. We eat jellybeans and disgusting key lime pie cookies to sustain ourselves. Seriously, no money to be found here.)
Anyway anyway… as I was saying, some really intelligent datageeks have come up with various algorithms for community detection in networks. There are many around, but if you’re using Gephi, then it’s the Louvain-algorithm ( Blondel, Fast Unfolding of Communities in Large Networks, J. Stat. Mech. (2008))
you’ll be using (yay Belgium!).
You might as well check out the post itself, it’s pretty neat!
Although it may sound a tad scary, the Louvain algorithm is actually quite a simple idea – to understand that is, not to come up with yourself, duuuh. So firstly, each node is given its own community. I would be my own community and Yanne would be her own community. Barack Obama would be his own community *casually adding a similar person, tu-tu-tum* This means that if we’ve got 3 nodes in our network, we start out with 3 communities (or n nodes= n communities). The next step is to move a node to one of its neighbours’ communities and to check whether that improves the modularity value. If it does, then it gets assigned to that community. The algorithm does that for each node and then starts over again and again and again… until, eventually, there’s nothing more to improve. In the second phase the newly obtained communities become the nodes and a similar process takes place. That’s the main idea behind the algorithm. Go check out their article if you’re hungry for more!
If you’re baffled by all this technical gibberish, then we’ve got good news for you. We’ve made some new Harry Potter-examples. Why? Cause it’s fun! Cause it’s simple! Cause it’s AWE-SOME!
Here we go (you can click on the images to get a better look)!
In Fig. 1 people received a colour depending on which Hogwarts’ House they were sorted in. The same network is pictured in the next one (Fig. 2), but here the nodes are coloured according to their loyalty to a particular group (Dumbledore’s Army is like a junior version of the Order of the Phoenix, which is the group of the goodies; Death Eaters=baddies). We have all this knowledge about these people, but what if we didn’t? Would community detection be able to help us (not only with fictional networks) understand the structure of this network without all that previous knowledge?
The above pic (Fig. 3) shows us the result Gephi (and as such the Louvain-algorithm) came up with based only on ‘friendship’-links between people. This result would enable us to predict who would possibly be the baddies and the goodies and gives us an okay view on the different generations and even the Weasley-family (in red) pops out.
Right, so now you know how it works, let’s all get out there and community detect the hell out of things! Results, examples and questions are always welcome at @DataNinjasKUL (twitter)
The London DataNinja is leaving the house with a final message for Leuven DataNinja: