What’s that? You want us to tell you all about nodes, edges and SNA metrics? A veritable ‘Social Network Analysis for Dummies’?
You got it!
But before we take you on our little ‘intellectual’ journey, let’s get caffeinated and make ourselves comfortable. Yes, that’s an order! So put your feet up! (if you’re at work: colleagues make great footrests – you’re welcome)
If at this point you’ve made yourself a fort of cushions, you’re my hero.
Anyway, anyway, time to get down to business!
Social Network Analysis for Dummies: Part 1
First of all, what is Social Network Analysis (SNA) all about? As with everything, there are many different definitions floating about but we’ve tried to keep it simple and clear for you.
SNA is a quantitative and qualitative analysis of a social network.
So you start with a social network, which is a structure made up of actors/entities (such as people, companies, whatever you want!) and their relationships. What happens to an actor is dictated by his position and the structure of his connections: this will determine which information or resources will (or will not) reach him and will therefore influence his behaviour or beliefs.
These entities are called the nodes or vertices of a network. The relationships are what we call the ties or edges of a network. Matrices and graphs can be used to visualize this structure.
A matrix sounds like a pretty scary thing, but not to worry! It’s actually quite a straightforward arrangement of our data.
The matrix that is commonly used in SNA is the adjacency matrix. This is a simple square table with the same number of rows and columns as there are nodes. The information in the cells tells us something about the ties between each pair of these nodes. Let’s take a look at the matrix of our first Harry Potter-example (fig. 1)!
Adjacency Matrix    Harry Potter    Hermione Granger    Ron Weasley
Harry Potter        0               1                   1
Hermione Granger    1               0                   1
Ron Weasley         1               1                   0
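If you'd like to see that table as data, here is a minimal Python sketch of how such an adjacency matrix could be built from a list of ties. The function name adjacency_matrix and the variable names are our own picks for illustration, not part of any SNA tool.

```python
# Nodes and undirected ties taken from the Harry Potter table.
names = ["Harry Potter", "Hermione Granger", "Ron Weasley"]
ties = [
    ("Harry Potter", "Hermione Granger"),
    ("Harry Potter", "Ron Weasley"),
    ("Hermione Granger", "Ron Weasley"),
]

def adjacency_matrix(names, ties):
    """Build a square 0/1 matrix for an undirected network."""
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for a, b in ties:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1  # undirected: mirror the tie
    return matrix

matrix = adjacency_matrix(names, ties)
for name, row in zip(names, matrix):
    print(f"{name:<17}", row)
```

The diagonal stays 0 because nobody is tied to themselves here, and the matrix is symmetric because every tie counts in both directions.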
Easy, right, this Social Network Analysis for Dummies? Of course there’s more to it, but for now that’s all you need to know.
To be honest, I’m just bored with matrices and I want to skip to the more exciting part about visualizing our data: graphs!
Graphs are an important part of analyzing a social network, as they present the data in a very different way than rather old-fashioned tables do.
The graph on the right (fig. 3) represents the table on the left (fig. 2), which is actually six pages long! I think it’s fair to say that the graph shows our data in a much clearer way than the table does. To interpret this kind of graph, you need to know a thing or two about some basic SNA concepts.
Social Network Analysis for Dummies: Part 2
So, we’ve already had the pleasure of being introduced to nodes and edges but now it’s time for a more intimate rendez-vous. Edges, which represent the relationships between nodes, can be directed or undirected. The who-knows-who network below is an example of an undirected network (fig. 4) and the who-loves-who has directed edges (fig. 5).
Undirected networks can be used to represent family ties, taste-related beers, drug cartels, anything you like! Directed networks are used to display the flow of information (or anything else) from one actor to another, like letters written or insults hurled from one person to another, or even how STDs are passed on.
Besides being directed or undirected, these edges can also have a weight attributed to them. If not, it’s assumed their weight is 1 (as in fig. 4).
In the example of our “Who loves who”-network (fig. 5) though, we have differently weighted edges. The couples Harry/Ginny and Hermione/Ron are both in reciprocated relationships and so their weight is 2.
Harry loves Ginny =1
+ Ginny loves Harry =1
On the other hand we all know that Bellatrix drools all over Lord Voldemort but her affections are not returned in the same way. So that makes 1.
Bellatrix loves Voldemort =1
but Voldemort does not love Bellatrix =0
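The little sum above can be sketched in Python: count how many directed "loves" ties exist between each pair. This assumes, as the examples suggest, that a pair's weight is simply the number of directed ties between the two; the function name pair_weights is hypothetical.

```python
from collections import Counter

# Directed "loves" ties mentioned in the text, as (who, loves whom).
loves = [
    ("Harry", "Ginny"), ("Ginny", "Harry"),
    ("Hermione", "Ron"), ("Ron", "Hermione"),
    ("Bellatrix", "Voldemort"),
]

def pair_weights(directed_ties):
    """Count how many directed ties exist between each unordered pair."""
    weights = Counter()
    for source, target in directed_ties:
        weights[frozenset((source, target))] += 1
    return weights

weights = pair_weights(loves)
for pair, weight in weights.items():
    print(" / ".join(sorted(pair)), "=", weight)
```

Reciprocated couples end up with 2, poor Bellatrix with 1.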
Strongly connected components are groups of nodes in which every node can reach every other node by following the direction of the edges. There are also weakly connected components, where the direction of the edges is not taken into consideration, so each node only has to be reachable through any kind of edge.
For undirected networks, there’s no such thing as a strongly or weakly connected component as there’s no direction to their edges. Instead, they’re just called connected components.
The largest component in a network is called the giant component of that network. Be nice to it though, ’cause it’s feeling pretty vulnerable being called a giant all the time.
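To make components a bit more tangible, here is a rough Python sketch. Big assumption: the edge list below only contains the "loves" ties that are named in this post (the Harry, Cho and Cedric chain is inferred from the Ginny-to-Cedric path), so it is a partial stand-in for fig. 5, not the real thing. The helper names are made up for this example, and the approach (check who can reach whom, and back) is chosen for readability, not speed.

```python
from collections import deque

# Partial, assumed reconstruction of the who-loves-who network.
loves = [
    ("Harry", "Ginny"), ("Ginny", "Harry"),
    ("Hermione", "Ron"), ("Ron", "Hermione"),
    ("Bellatrix", "Voldemort"),
    ("Harry", "Cho"), ("Cho", "Cedric"),
]

def successors(edges, directed=True):
    """Map each node to the set of nodes it has an edge to."""
    out = {}
    for a, b in edges:
        out.setdefault(a, set()).add(b)
        out.setdefault(b, set())
        if not directed:
            out[b].add(a)  # ignore direction: add the reverse edge too
    return out

def reachable(graph, start):
    """All nodes reachable from start (start included), found by BFS."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def components(edges, strong):
    """Strongly (strong=True) or weakly (strong=False) connected components."""
    graph = successors(edges, directed=strong)
    reach = {node: reachable(graph, node) for node in graph}
    found = set()
    for node in graph:
        # Keep only the nodes that can also get back to `node`.
        found.add(frozenset(v for v in reach[node] if node in reach[v]))
    return found

weak = components(loves, strong=False)
strong = components(loves, strong=True)
giant = max(weak, key=len)  # largest weakly connected component
print(sorted(giant))
```

In this toy version, Harry and Ginny form a strongly connected component because each can reach the other, while Cho and Cedric only join them in the weakly connected one.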
The geodesic distance is simply the length of the shortest path between two nodes. In the undirected who-knows-who network (fig. 4), the geodesic distance between any two nodes is 1, since everyone is connected to everyone else. In the directed who-loves-who network (fig. 5), the geodesic distance between Ginny and Cedric is 3, since she can only reach him through Harry and Cho.
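For the curious, a shortest path in an unweighted network can be found with a breadth-first search. The sketch below assumes a partial edge list containing only the ties named in this post (the full fig. 5 may hold more), and the function name geodesic_distance is our own.

```python
from collections import deque

# Partial, assumed reconstruction of the who-loves-who network.
loves = [
    ("Harry", "Ginny"), ("Ginny", "Harry"),
    ("Hermione", "Ron"), ("Ron", "Hermione"),
    ("Bellatrix", "Voldemort"),
    ("Harry", "Cho"), ("Cho", "Cedric"),
]

def geodesic_distance(edges, source, target):
    """Length of the shortest directed path, or None if there is none."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distance[node]
        for nxt in graph.get(node, []):
            if nxt not in distance:  # first visit is the shortest route
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return None

print(geodesic_distance(loves, "Ginny", "Cedric"))  # 3
print(geodesic_distance(loves, "Cedric", "Ginny"))  # None
```

Note that direction matters: Ginny can reach Cedric in three hops, but in this edge list there is no way back from Cedric to Ginny.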
DEGREE & DEGREE CENTRALITY
The degree of a node is the number of edges that start from or point to a node. So in the who-knows-who network (fig. 4), all actors have a degree of 5, because they are all connected to each other. When the network is directed, there’s a difference between in-degree and out-degree. In the last example (fig. 5), Bellatrix has an out-degree of 1 and no in-degree. Voldemort has exactly the opposite: 1 for in-degree and no out-degree.
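Counting degrees is pleasantly boring in code. The sketch below assumes six placeholder actors (A to F) who all know each other, standing in for fig. 4, plus the single Bellatrix tie from fig. 5; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def degrees(undirected_ties):
    """Number of ties attached to each node."""
    degree = Counter()
    for a, b in undirected_ties:
        degree[a] += 1
        degree[b] += 1
    return degree

def in_out_degrees(directed_ties):
    """Incoming and outgoing tie counts for each node."""
    in_degree, out_degree = Counter(), Counter()
    for source, target in directed_ties:
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree

# Six placeholder actors who all know each other.
actors = ["A", "B", "C", "D", "E", "F"]
knows = list(combinations(actors, 2))
degree = degrees(knows)
print(degree["A"])  # 5

loves = [("Bellatrix", "Voldemort")]
in_degree, out_degree = in_out_degrees(loves)
print(out_degree["Bellatrix"], in_degree["Bellatrix"])  # 1 0
print(out_degree["Voldemort"], in_degree["Voldemort"])  # 0 1
```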
The degree of a node can point to a high centrality of that node in the overall network. Degree centrality is the most commonly used centrality measure in SNA, but it’s worth looking at other centrality measures as they may reveal other entities that have a more hidden centrality.
One of those hidden centralities is the betweenness centrality, where we take a look at how often a node lies on the shortest path between two other nodes. The more often a node appears on one of those shortest paths, the higher its betweenness centrality. A node with high betweenness is also called a broker as it fulfills a brokerage position in the network, which means that information needs to pass through that entity to be shared by the other nodes. This also means that these nodes are often the vulnerable points of a network: by cutting them out, chances are the network will fall apart into unconnected components.
(Figures 6 and 7 are actually the same network, but we changed the size of the nodes according to the type of centrality we wanted to emphasize: in fig. 6, the larger the nodes, the higher their degree vs betweenness centrality in fig. 7)
If we take up a Harry Potter example again, then we can see that Severus Snape is the ultimate broker (fig. 7). The “goodies” (= the members of the Order of the Phoenix) share information with each other but they don’t know what the baddies (= the Death Eaters, boo!) are up to. That’s where Snape comes in handy, since he has links with both the goodies and the baddies. So although Dumbledore has the highest degree in this information network (fig. 6), Severus Snape has the highest betweenness and therefore, in this situation, is more centrally placed to convey or obtain information.
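Here is a small sketch of the degree-versus-betweenness contrast. Loud warning: the eight-node network below is invented for illustration and is not the network from fig. 6 and 7. The betweenness function follows Brandes' algorithm for unweighted, undirected networks and returns raw (not normalized) scores.

```python
from collections import deque

# Invented toy network, NOT the one in the figures.
ties = [
    ("Dumbledore", "McGonagall"), ("Dumbledore", "Lupin"),
    ("Dumbledore", "Sirius"), ("McGonagall", "Lupin"), ("Lupin", "Sirius"),
    ("Snape", "Dumbledore"), ("Snape", "McGonagall"), ("Snape", "Voldemort"),
    ("Voldemort", "Bellatrix"), ("Voldemort", "Lucius"), ("Bellatrix", "Lucius"),
]

graph = {}
for a, b in ties:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def betweenness(graph):
    """Raw betweenness per node (Brandes' algorithm, unweighted, undirected)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack, queue = [], deque([s])
        preds = {v: [] for v in graph}   # predecessors on shortest paths
        sigma = {v: 0 for v in graph}    # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                     # walk back from the farthest nodes
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every pair was counted once from each end, so halve the totals.
    return {v: c / 2 for v, c in score.items()}

degree = {v: len(neighbours) for v, neighbours in graph.items()}
between = betweenness(graph)
print(max(degree, key=degree.get))    # Dumbledore
print(max(between, key=between.get))  # Snape
```

In this made-up network Dumbledore has the most ties, yet every shortest path between the two camps runs through Snape, which is exactly the broker effect described above.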
But once Snape dies…
Now before we move on to the last part of this post featuring measures for the entire network, I think it’s time for a philosophical intermezzo, don’t you?
Density measures whether there is much cohesion in the network. It is calculated by dividing the number of attested edges by the number of all the possible edges, resulting in a number between 0 and 1. A density of 0 means that all nodes are drifting around helplessly on their own, while in networks with a density of 1 everyone is connected to everyone else, like in the who-knows-who network (fig. 4) we showed before. The density of the Order of the Phoenix vs Death Eaters network (fig. 6 & 7) is 0.484, so this means that 48.4% of all possible ties are present here.
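As a quick sketch, density boils down to one division. The function name density is our own, and the example assumes a who-knows-who network of six actors who all know each other (which gives 15 ties).

```python
def density(n_nodes, n_edges, directed=False):
    """Observed edges divided by the maximum possible number of edges."""
    possible = n_nodes * (n_nodes - 1)  # every ordered pair of distinct nodes
    if not directed:
        possible //= 2                  # an undirected tie covers both orders
    return n_edges / possible if possible else 0.0

print(density(6, 15))  # 1.0: everyone is connected to everyone else
print(density(6, 0))   # 0.0: all nodes drifting around on their own
```

For a directed network the number of possible edges doubles, since A to B and B to A count as two separate ties.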
Back to degrees now! We’ve seen degree, in-degree and out-degree, but there’s also the degree of the overall network, the average degree. The name says it all: it’s the average of all the node degrees. Like density, it gives us an estimate of how connected the entire network is. So the average degree of the Order of the Phoenix vs Death Eaters network (fig. 6 & 7) is 6.29 (in human language: on average, each actor has about 6 connections).
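Average degree and density are close cousins, which a short sketch can show. Assumptions: the network is undirected, and the node count of 14 used at the end is our own inference (it is the count for which an average degree of 6.29 lines up with a density of 0.484), not a number stated in this post. Function names are hypothetical.

```python
def average_degree(n_nodes, ties):
    """Mean number of ties per node in an undirected network."""
    # Each tie adds 1 to the degree of both of its end points.
    return 2 * len(ties) / n_nodes

def density_from_average_degree(avg_degree, n_nodes):
    """Undirected density: average degree over the maximum degree (n - 1)."""
    return avg_degree / (n_nodes - 1)

trio = [("Harry", "Hermione"), ("Harry", "Ron"), ("Hermione", "Ron")]
print(average_degree(3, trio))  # 2.0

# Assumed node count of 14, see the note above.
print(round(density_from_average_degree(6.29, 14), 3))  # 0.484
```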
*high-pitched voice* Exciting, isn’t it?!
The clustering coefficient gives you a measure of the tendency of nodes to cluster together in a network: it tells you how often the connections of a node are also connected to each other. In real-world networks, and in social networks in particular, the clustering coefficient is quite high. Combine that with short paths between nodes and you get a so-called small world. That’s where the famous “six degrees of separation” comes in: the idea that most people are only six (or fewer) hops away from one another. That means that we are probably connected to Queen Elizabeth II/Daniel Craig/Emma Thompson/… in about six (or fewer!) hops.
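One common way to put a number on clustering is the local clustering coefficient: for each node, the share of pairs of its neighbours that are themselves connected, averaged over all nodes. The sketch below uses that definition (there are other variants) on a tiny invented network: the Harry, Hermione and Ron triangle plus a hypothetical Neville who only knows Harry.

```python
from itertools import combinations

# Invented toy network for illustration.
ties = [
    ("Harry", "Hermione"), ("Harry", "Ron"), ("Hermione", "Ron"),
    ("Harry", "Neville"),
]

graph = {}
for a, b in ties:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def local_clustering(graph, node):
    """Share of a node's neighbour pairs that are connected to each other."""
    neighbours = graph[node]
    if len(neighbours) < 2:
        return 0.0  # fewer than two neighbours: no pairs to check
    pairs = list(combinations(neighbours, 2))
    linked = sum(1 for a, b in pairs if b in graph[a])
    return linked / len(pairs)

def average_clustering(graph):
    """Mean of the local clustering coefficients of all nodes."""
    return sum(local_clustering(graph, node) for node in graph) / len(graph)

for node in graph:
    print(node, round(local_clustering(graph, node), 2))
print(round(average_clustering(graph), 2))
```

Hermione and Ron score 1 because all their friends know each other, while Harry scores lower: only one of his three pairs of friends (Hermione and Ron) is connected.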
(if you’re a fan of Kevin Bacon, check out the Oracle of Bacon, which lets you calculate the number of hops between any random actor attested in the IMDb and Kevin Bacon)
*DataNinjas hopping and bopping off to success, hoping you’ve enjoyed this Social Network Analysis for Dummies*