Saturday, April 30, 2011

The math of social networks

A social network is constituted by a number of units (nodes) that are connected to each other by a defined relationship -- for example, "x cites y", "x sends 5 email messages a week to y", "x and y belong to an organization in common." There are a few wrinkles -- the units may be persons, organizations, cities, journal articles, or other types of entities; the relationships may be uni-directional or bi-directional; and the linking relationships may represent categorical relationships or intensity relationships. "x and y are friends" is a bi-directional relationship; "x and y are close friends" is a bi-directional relationship recording intensity.

Some of the basic questions about a social network are easy to formulate but difficult to assess. Basically, we would like to know what groups of individuals are unusually closely interconnected with each other, relative to the average for the population as a whole.  Here are a few basic questions that we may have about networks of people.
  • Who is connected to whom? 
  • Are there a subset of persons who are unusually well connected? 
  • Are there sub-groupings of individuals who are more closely connected to each other than they are to others in the network? 
This last point may be put into the language of "communities": are there communities of individuals that can be identified on the basis of mathematical features of their positions within the graph of relationships defined by the data recording pairwise connections?

This question is especially important for sociologists because it goes to the heart of the reason why network maps are of sociological interest in the first place: we think that the social relationships among individuals explain important features of social action -- readiness to mobilize for a political cause, for example; this intuition derives from the idea that individuals influence each other through the exchange of information and the observation of each other's behavior; and so subgroups of persons with especially dense social connections with each other may have distinctive social characteristics as a group. So identifying the "communities" within a social network is an important sociological discovery.

This is where the mathematics of network graphs comes in. We need to have justifiable procedures for partitioning a network into sub-networks. These procedures need to make sense in terms of the intuition that there are often subgroups of nodes more closely related to each other. The procedures need to be non-arbitrary. They should be robust with respect to where we begin -- it shouldn't matter whether we begin analysis with this node or that node. And they need to be consistent with the fact that all the nodes are related to everyone else at some degree of separation. Neighborhoods that are entirely detached from the rest of the population are a trivial case; normally network ties extend transitively throughout a whole society.

Greek mathematician and social scientist Moses Boudourides is focused on this problem in his current work. (Follow him on Twitter at link.)  Boudourides is deeply sensitive to the sociological importance of the questions, so his work does a great job of bridging the two fields of thought. Some of his current work is available online, and it is very useful for people who want to understand more about the mathematics of social networks. It falls into the field of graph theory in mathematics, and it serves as a good tutorial to current thinking about the mathematics of social networks.

Worth reading first is "An Introduction to Community Detection in Graphs" (link). Here Boudourides offers a clear exposition of the mathematical problem of identifying a set of neighborhoods within a complex graph and lays out three approaches that have been taken.
Our aim here is to present an introductory and brief discussion of the formal concept of community in the context of the theory of complex networks (and social network analysis) and to describe (mostly by examples) a few of the many computational techniques which are commonly used for the detection of communities in a graph-theoretic background. (1)
Here is his definition of a community in the context of a network graph:
By a community structure of such a graph, we mean a partition of the set of nodes into a number of groups, called communities, such that all nodes belonging to any one of these groups satisfy a certain property of relative cohesiveness. Note that one may consider partitions, which are not necessarily strict, i.e., one may allow the case of overlapping communities, when there exist graph nodes belonging to more than one groups (communities) of the partition. (1)
The three iterative techniques he describes for analyzing a complex network into sub-communities are --
  1. Betweenness -- Centrality-based community detection
  2. k-Clique percolation
  3. Modularity maximization
In each case the analysis proceeds by working through the graph iteratively, identifying notes and links with certain characteristics, and arriving at a series of stages of community definition.  This process can proceed from above (divisive) or from below (agglomerative).

The "betweenness" approach derives from an application of the idea of "betweenness centrality" of edges: "the number of shortest paths between pairs of nodes that run through that edge" (3).  On this approach, edges are ranked by their betweenness measure; the highest ranked edge is removed; and the process is repeated for the reduced graph.  This is a divisive method.

The k-clique model is an agglomerative approach, or what Boudourides refers to as a "local community-finding approach" (6).  And modularity maximization approach begins with the graph as a whole and looks for regions that are locally higher in density than the graph as a whole.  Boudourides' explanation of each of these methods is technical and clear.  He indicates that the MM approach is most widely used; but that it falls in a class of particularly intractable optimization problems like the traveling salesman problem (NP-complete).  Consequently it is necessary to design heuristic algorithms on the basis of which to arrive at approximate solutions.  (As I understand the point, however, there is no guarantee that the approximate solution will be close to the ideal solution.)

With these tools at hand, he offers a detailed example: analysis of a data set of individuals who participated in peace demonstrations against the war in Iraq and the organizations and issues with which they were associated. Data on these activists are included in the International Peace Protest Survey (IPPS).  And the resulting neighborhood maps are fascinating.  These results are described in detail in a detailed research report on "Communities in the IPPS Survey Data" [link] and a theoretical paper on "Why and How Culture Matters in Community Interorganizational Structure" [link]. These presentations show the real power of mathematical network theory, in that they bring out social relationships among individuals within this population of activists that couldn't be discovered otherwise.  Here are a pair of network graphs for 972 activists in Italy presented in "Culture Matters":



Boudourides' particular goal here is to demonstrate the difference it makes to incorporate "cultural" affiliations into the structural analysis of the first figure.  Incorporating attitudes permits simplification of the community structure of this network, from nine communities in figure 5 to four communities in figure 6.  But more generally, the analysis demonstrates the analytical gain that is possible through this analysis, allowing us to discover important patterns of affiliation among these 900+ activists.  And this, in turn, appears very relevant when we come to trying to understand their behavior within a complex process of collective action. It allows us to give some rigorous detail to the idea that a social movement has a refined micro-structure underlying its macro-level actions and demands.

What is especially useful about these papers is the help they offer us non-specialists in understanding the mathematical techniques on the basis of which we can extract sociologically meaningful information from a network graph. This is a bit analogous to the gain we get from using statistical techniques to analyze and summarize a large data set. The statistical techniques allow us to winnow the data into a few statistical measures. And the techniques of graph theory that Moses Boudourides demonstrates allow a similar analytical power for the task of making sociological sense of a large network of connected individuals. In both cases it is necessary for us to understand the basics of the mathematical techniques if we are to use the tool appropriately.

No comments:

Post a Comment