7 Network Topography in igraph
7.1 Setup
Find and open the RStudio Project associated with this class. Begin by opening a new script. It’s generally a good idea to place a header at the top of each script that tells you what the script does, its name, etc.
#################################################
# What: Network Topography in R
# Created: 02.28.14
# Revised: 01.18.22
#################################################
If you have not set up your RStudio Project to clear the workspace on exit, your environment will contain the objects and functions from your prior session. To clear these before beginning, use the following command.
rm(list = ls())
Next, place the data required for this lab (Anabaptist Leaders.csv and Anabaptist Attributes.csv) inside your R Project folder. We have placed them in a subfolder titled data for organizational purposes; however, this is not necessary.
For this exercise, we’ll use the Anabaptist Leadership network and its related attribute data, both of which can be found in the file we shared with you. The data set includes 67 actors: 55 sixteenth-century Anabaptist leaders and 12 prominent Protestant Reformation leaders who had contact with and influenced some of the Anabaptist leaders in the data set. These network data build upon a smaller data set (Matthews et al. 2013) that did not include some prominent Anabaptist leaders, such as Menno Simons, who is generally seen as the “founder” of the Amish and Mennonites.
We will add a measure here not implemented in the statnet version of this lab, namely the clustering coefficient. A few versions exist but it is conceptually similar to other measures of interconnectedness.
7.2 Load Libraries
Load the igraph library.
library(igraph)
It is not currently possible to calculate the E-I index in statnet or igraph, but a package, isnar, has been developed to do just that. Its functionality is demonstrated at the end of this lab. We’ve included the scripts in both the statnet and igraph versions of this lab, but you need to do this section only once.
In addition to igraph, we will be introducing and using isnar. Since this may be the first time you are using this tool, please ensure you install it prior to loading it. You will need to install remotes in order to use the function install_github() to download and set up isnar, as it is not published on CRAN.
install.packages("remotes")
Now install isnar.
remotes::install_github("mbojan/isnar")
Before moving forward, let’s load the isnar package:
library(isnar)
Note: igraph imports the %>% operator on load (library(igraph)). This series of exercises leverages the operator because we find it very useful for chaining functions. We occasionally show how to carry out a series of commands with and without piping.
7.3 Import Data
Let’s import the data using the read.csv() function. Remember, igraph’s graph_from_adjacency_matrix() function (formerly graph.adjacency()) requires a matrix.
anabaptist_matrix <- read.csv("data/Anabaptist Leaders.csv",
                              header = TRUE,
                              row.names = 1,
                              check.names = FALSE) %>%
  as.matrix()
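To see what row.names = 1 and check.names = FALSE do, here is a minimal sketch that reads a tiny adjacency matrix from an inline string rather than from a file. The three actor names and their ties are hypothetical toy data, not taken from the Anabaptist file, and the sketch uses base R only.

```r
# Hypothetical 3-actor adjacency matrix, supplied inline via `text =`
toy_csv <- ",Ann Lee,Bo Ray,Cy Poe
Ann Lee,0,1,0
Bo Ray,1,0,1
Cy Poe,0,1,0"

toy_matrix <- as.matrix(
  read.csv(text = toy_csv,
           header = TRUE,
           row.names = 1,        # first column holds the row labels
           check.names = FALSE)  # keep the spaces in the column labels
)

# Row and column labels should match, and the matrix of an
# undirected network should be symmetric
identical(rownames(toy_matrix), colnames(toy_matrix))
isSymmetric(toy_matrix)
```

Running the same two checks on anabaptist_matrix is a quick way to confirm the file was read as intended before building the graph.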
Now transform the matrix to an igraph object.
# graph_from_adjacency_matrix() is the current name for graph.adjacency()
anabaptist_ig <- graph_from_adjacency_matrix(anabaptist_matrix,
                                             mode = "undirected")
anabaptist_ig
IGRAPH b43b453 UN-- 67 183 --
+ attr: name (v/c)
+ edges from b43b453 (vertex names):
[1] Martin Luther --Ulrich Zwingli Martin Luther --Thomas Muntzer
[3] Martin Luther --Andreas Carlstadt Martin Luther --Caspar Schwenckfeld
[5] Martin Luther --Melchior Hofmann Martin Luther --Philipp Melanchthon
[7] Martin Luther --Martin Bucer John Calvin --Wolfgang Capito
[9] John Calvin --Martin Bucer Ulrich Zwingli--Joachim Vadian
[11] Ulrich Zwingli--Conrad Grebel Ulrich Zwingli--Felix Manz
[13] Ulrich Zwingli--George Blaurock Ulrich Zwingli--Wilhelm Reublin
[15] Ulrich Zwingli--Johannes Brotli Ulrich Zwingli--Louis Haetzer
+ ... omitted several edges
To correctly calculate a number of topographical metrics in igraph (e.g., centralization), we need to make sure that the network is a “simple” graph/network, that is, a network without multiple lines or loops (diagonal). We can check whether the Anabaptist network is a simple graph with the following command:
is_simple(anabaptist_ig)
[1] TRUE
We can see that it is already a simple graph, so we don’t have to simplify it. However, if we needed to, we would issue the following command:
simplify(anabaptist_ig,
         remove.multiple = TRUE,
         remove.loops = TRUE)
IGRAPH b44928c UN-- 67 183 --
+ attr: name (v/c)
+ edges from b44928c (vertex names):
[1] Martin Luther --Ulrich Zwingli Martin Luther --Thomas Muntzer
[3] Martin Luther --Andreas Carlstadt Martin Luther --Caspar Schwenckfeld
[5] Martin Luther --Melchior Hofmann Martin Luther --Philipp Melanchthon
[7] Martin Luther --Martin Bucer John Calvin --Wolfgang Capito
[9] John Calvin --Martin Bucer Ulrich Zwingli--Joachim Vadian
[11] Ulrich Zwingli--Conrad Grebel Ulrich Zwingli--Felix Manz
[13] Ulrich Zwingli--George Blaurock Ulrich Zwingli--Wilhelm Reublin
[15] Ulrich Zwingli--Johannes Brotli Ulrich Zwingli--Louis Haetzer
+ ... omitted several edges
Note that the defaults for the remove.multiple and remove.loops options are TRUE, so we didn’t really need to include them in the previous command.
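At the level of the adjacency matrix, the idea of a “simple” network can be sketched in base R. The example below uses a hypothetical 3-actor matrix and assumes that cell values are read as tie counts and the diagonal as self-ties; it is an illustration of the concept, not a description of how simplify() works internally.

```r
# Hypothetical toy adjacency matrix with one loop and one doubled tie
toy <- matrix(c(1, 2, 0,
                2, 0, 1,
                0, 1, 0),
              nrow = 3, byrow = TRUE)

has_loops    <- any(diag(toy) != 0)  # non-zero diagonal = self-tie
has_multiple <- any(toy > 1)         # cell count above 1 = repeated tie
has_loops
has_multiple

# Base R analogue of simplifying: zero the diagonal, cap counts at 1
toy_simple <- toy
diag(toy_simple) <- 0
toy_simple[toy_simple > 1] <- 1
toy_simple
```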
7.4 Network Size and Interconnectedness
7.4.1 Network Size
Network size is a basic descriptive statistic that is important to know because many of the subsequent measures are sensitive to it. Network size is easy to get with the vcount() function. As you may have noticed, you also get the network size when you call the igraph object anabaptist_ig, which is what we did in the previous step.
vcount(anabaptist_ig)
[1] 67
7.4.2 Density and Average Degree
Network density equals the number of actual ties divided by the number of possible ties. However, density tends to decrease as social networks get larger because the number of possible ties grows quadratically with the number of actors, whereas the number of ties that each actor can maintain tends to be limited. Consequently, we can only use it to compare networks of the same size. An alternative to network density is average degree centrality, which is not sensitive to network size and thus can be used to compare networks of different sizes.
First, calculate density using the edge_density() function.
edge_density(anabaptist_ig)
[1] 0.08276798
To calculate the average degree centrality, first calculate the degree of each vertex and then take the average of this vector of scores.
degree(anabaptist_ig) %>%
mean()
[1] 5.462687
You can also do this without pipes:
mean(degree(anabaptist_ig))
[1] 5.462687
Keep in mind that you may continue refining the output by rounding the value:
edge_density(anabaptist_ig) %>%
round(digits = 3)
[1] 0.083
degree(anabaptist_ig) %>%
mean() %>%
round(digits = 3)
[1] 5.463
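Both measures can be cross-checked by hand from the vertex and edge counts alone. The sketch below uses only the two counts reported earlier (67 vertices, 183 edges) and assumes an undirected network without loops; the object names are illustrative.

```r
n <- 67   # vertices, as reported by vcount()
m <- 183  # edges, as shown in the igraph summary line

possible_ties      <- n * (n - 1) / 2  # undirected, no loops
density_by_hand    <- m / possible_ties
avg_degree_by_hand <- 2 * m / n        # each edge adds 1 to two degrees

round(density_by_hand, 3)
round(avg_degree_by_hand, 3)

# The two measures are linked: density = average degree / (n - 1)
all.equal(density_by_hand, avg_degree_by_hand / (n - 1))
```

The last line makes the size sensitivity of density explicit: for a fixed average degree, density shrinks as n grows.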
7.4.3 Clustering Coefficient (Global and Local)
To calculate this measure, use the transitivity() function. This measure can be calculated for each vertex (type = "local") or as the ratio of triangles to connected triples in the graph (type = "global").
# Traditional transitive measure:
transitivity(anabaptist_ig,
type = "global")
[1] 0.3557214
# Local transitive scores (for each ego):
transitivity(anabaptist_ig,
type = "local")
[1] 0.2857143 1.0000000 0.1923077 0.3333333 0.3818182 0.6666667 0.2380952
[8] 0.2527473 0.6666667 0.4000000 1.0000000 1.0000000 1.0000000 1.0000000
[15] 0.2222222 0.3333333 0.5000000 0.2545455 0.2166667 0.5000000 0.2142857
[22] 0.7000000 0.3888889 0.3333333 0.2500000 0.3333333 1.0000000 0.3214286
[29] 0.4285714 0.4285714 0.3928571 0.5000000 0.7619048 0.8000000 0.8666667
[36] 0.3571429 NaN 0.0000000 0.0000000 NaN 0.8333333 0.5000000
[43] NaN 0.4000000 1.0000000 0.0000000 0.1666667 0.4444444 0.4444444
[50] 1.0000000 NaN 0.4722222 0.4444444 0.5000000 0.2380952 0.5000000
[57] 1.0000000 0.3333333 0.5000000 0.0000000 0.3000000 0.0000000 0.3333333
[64] 0.3111111 0.3111111 NaN 0.0000000
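Notice the NaN values (not a number). These occur for vertices with fewer than two neighbors, for which the local coefficient is undefined. We can take the average of the local clustering coefficients and ignore these missing values by combining this function with the mean() function.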
mean(
transitivity(anabaptist_ig,
type = "local"),
na.rm = TRUE
)
[1] 0.4605426
Alternatively, rather than removing the NaN values, we could zero them out and include them in the calculation of an average clustering coefficient. This is how ORA calculates the measure.
trans <- transitivity(anabaptist_ig,
                      type = "local")
# Calculate the mean:
mean(
  # Recode the trans vector: if NaN assign 0, otherwise return the value
  sapply(trans, function(s) ifelse(is.nan(s), 0, s))
)
[1] 0.4261737
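To see where the NaN values come from and how the two averages differ, here is a minimal base R sketch that computes local clustering by hand. The 4-actor network (a triangle plus one pendant actor) is hypothetical, and the calculation follows the usual definition of the local coefficient: ties among a vertex’s neighbors divided by the number of possible ties among them.

```r
# Hypothetical network: triangle 1-2-3 plus pendant vertex 4 tied to 3
A <- matrix(0, 4, 4)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))
A[edges] <- 1
A[edges[, 2:1]] <- 1  # mirror the ties for an undirected network

local_cc <- sapply(seq_len(nrow(A)), function(v) {
  nb <- which(A[v, ] == 1)                       # neighbors of v
  k  <- length(nb)                               # degree of v
  ties_among_nb <- sum(A[nb, nb, drop = FALSE]) / 2
  ties_among_nb / (k * (k - 1) / 2)              # 0 / 0 gives NaN when k < 2
})
local_cc

# Average ignoring undefined scores
mean(local_cc, na.rm = TRUE)
# Average with undefined scores recoded to zero
mean(ifelse(is.nan(local_cc), 0, local_cc))
```

Vertex 4 has a single neighbor, so its score is undefined; recoding it to zero pulls the average down, which is the same pattern seen in the two Anabaptist averages above.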
7.4.4 Cohesion and Fragmentation
Now we turn to some additional measures related to the concept of interconnectedness.
cohesion(anabaptist_ig)
[1] 1
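Note that igraph’s cohesion() function returns the vertex connectivity of the graph, that is, the minimum number of vertices whose removal would disconnect it; a value of 1 tells us the network is connected but can be split by removing a single actor. Because the network is connected, cohesion in the sense of the proportion of pairs that can reach each other is 1.00 and fragmentation is 0.00, as the calculations below confirm. With a little manipulation, we can also compute distance-weighted cohesion and fragmentation, often called compactness and breadth.
First, calculate the lengths of all shortest paths from or to the vertices in the graph.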
anabaptist_dist <- distance_table(anabaptist_ig,
                                  directed = FALSE)
The distance_table() function returns a named list with two objects. The first, res, is a numeric vector containing a histogram of distances: element i counts the pairs of vertices whose shortest path has length i. The second, unconnected, is the number of unconnected pairs. The sum of the counts in res and unconnected is always n(n-1) for directed graphs and n(n-1)/2 for undirected graphs, which is the number of potential pairs in a network.
Cohesion can be calculated by dividing the number of connected pairs by the total number of possible pairs in the network.
# Calculate cohesion
sum(anabaptist_dist$res) /
  (sum(anabaptist_dist$res) + anabaptist_dist$unconnected)
[1] 1
Calculating fragmentation is as simple as subtracting the cohesion score from 1.
# Calculate fragmentation
1 - sum(anabaptist_dist$res) /
  (sum(anabaptist_dist$res) + anabaptist_dist$unconnected)
[1] 0
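Because the Anabaptist network is connected, both scores sit at their limits. The base R sketch below applies the same ratio of connected pairs to possible pairs to a hypothetical 5-actor network with two components, so the values fall between 0 and 1. Reachability is worked out here by repeated matrix multiplication, which is an illustrative substitute for distance_table(), not what igraph does internally.

```r
# Hypothetical network: component {1, 2, 3} and component {4, 5}
A <- matrix(0, 5, 5)
edges <- rbind(c(1, 2), c(2, 3), c(4, 5))
A[edges] <- 1
A[edges[, 2:1]] <- 1  # mirror the ties for an undirected network

# Reachability: R[i, j] is 1 when a path of any length joins i and j
n <- nrow(A)
R <- ((A + diag(n)) > 0) * 1
for (step in seq_len(n)) {
  R <- ((R %*% R) > 0) * 1
}

connected_pairs <- sum(R[upper.tri(R)])  # count each pair once
total_pairs     <- n * (n - 1) / 2

toy_cohesion      <- connected_pairs / total_pairs
toy_fragmentation <- 1 - toy_cohesion
toy_cohesion
toy_fragmentation
```

Four of the ten possible pairs can reach each other (three within the first component, one within the second), so most pairs in this toy network are disconnected.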
7.4.5 Compactness and Breadth
igraph has no direct way to calculate compactness. However, here is how to compute compactness and breadth using the available tools from igraph.
First, calculate the length of all the shortest paths for all vertices in the network.
distance <- distances(anabaptist_ig)
Take a look at the matrix of distances, here only the first four rows and columns:
distance[1:4, 1:4]
Martin Luther John Calvin Ulrich Zwingli Joachim Vadian
Martin Luther 0 2 1 2
John Calvin 2 0 2 3
Ulrich Zwingli 1 2 0 1
Joachim Vadian 2 3 1 0
We can read these distances as steps between nodes. So Martin Luther is two steps away from John Calvin.
Calculating compactness requires calculating the reciprocal distances by taking the inverse of the distances in the matrix, removing the diagonal containing self-distance scores, and replacing infinite values (disconnected nodes are listed as Inf in the distance matrix) with zero. Then take the mean of all reciprocal distances in the matrix.
# Calculate reciprocal distances
reciprocal_distances <- 1/distance
# Modify the reciprocal_distances matrix
diag(reciprocal_distances) <- NA
reciprocal_distances[reciprocal_distances == Inf] <- 0
# Calculate compactness
compactness <- mean(reciprocal_distances, na.rm = TRUE)
compactness
[1] 0.3800372
For breadth, we simply subtract compactness from 1.
breadth <- 1 - compactness
breadth
[1] 0.6199628
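The effect of disconnected pairs on compactness is easier to see on a small case. The sketch below uses a hypothetical distance matrix, written out by hand, for a 3-actor path plus one isolate. In R, 1/Inf evaluates to 0, so unreachable pairs contribute zero to the average, while the diagonal (1/0, which is Inf) is set to NA and dropped.

```r
# Hypothetical distance matrix: path 1 - 2 - 3 plus isolate 4
toy_distance <- matrix(c(0,   1,   2,   Inf,
                         1,   0,   1,   Inf,
                         2,   1,   0,   Inf,
                         Inf, Inf, Inf, 0),
                       nrow = 4, byrow = TRUE)

toy_recip <- 1 / toy_distance  # unreachable pairs become 0
diag(toy_recip) <- NA          # drop self-distances
toy_compactness <- mean(toy_recip, na.rm = TRUE)
toy_breadth     <- 1 - toy_compactness
toy_compactness
toy_breadth
```

Adjacent pairs contribute 1, the pair two steps apart contributes 0.5, and the three pairs involving the isolate contribute 0, so compactness is well below 1 even though the path itself is short.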
7.6 Calculating the E-I Index with isnar
This section is in both statnet and igraph versions of this lab. You only need to do this section one time.
The E-I index compares the ties group members have to nongroup members (external ties) with the ties they have to one another (internal ties). The index equals 1.0 for groups that have only external ties and -1.0 for groups that have only internal ties. If the numbers of internal and external ties are equal, the index equals 0.0.
Few R packages implement the E-I index, and it is not as simple to program as one might expect. However, the isnar package does calculate it (Bojanowski 2021). It is written and maintained by Michal Bojanowski (m.bojanowski@icm.edu.pl) as a supplement to igraph. The one caveat is that isnar is available only through GitHub, a hosting service for open-source software, including R packages in development.
To estimate the E-I index, we require an attribute vector. Here, we’ll use the Melchiorite attribute included in the attribute file.
attributes <- read.csv("data/Anabaptist Attributes.csv",
                       header = TRUE)
Take a look at the vector names.
names(attributes)
[1] "ï..Names" "Believers.Baptism" "Violence"
[4] "Munster.Rebellion" "Apocalyptic" "Anabaptist"
[7] "Melchiorite" "Swiss.Brethren" "Denck"
[10] "Hut" "Hutterite" "Other.Anabaptist"
[13] "Lutheran" "Reformed" "Other.Protestant"
[16] "Tradition" "Origin.." "Operate.."
The Melchiorite vector can be accessed using the [[ accessor. Now, use the ei() function to get the E-I index.
ei(anabaptist_ig, attributes[["Melchiorite"]],
loops = FALSE, directed = FALSE)
[1] -0.9344262
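For intuition about what the index measures, here is a minimal base R sketch on a hypothetical 4-actor network with two groups. It uses the common (E - I) / (E + I) form of the index, which matches the boundary values described above; whether ei() treats edge cases (such as loops or missing attributes) in the same way is not checked here.

```r
# Hypothetical network: ties 1-2 and 3-4 are within groups, 2-3 is between
A <- matrix(0, 4, 4)
edges <- rbind(c(1, 2), c(3, 4), c(2, 3))
A[edges] <- 1
A[edges[, 2:1]] <- 1  # mirror the ties for an undirected network
group <- c("a", "a", "b", "b")

same_group <- outer(group, group, "==")  # TRUE where both actors share a group
upper      <- upper.tri(A)               # count each undirected tie once

internal <- sum(A[upper & same_group])
external <- sum(A[upper & !same_group])

ei_by_hand <- (external - internal) / (external + internal)
ei_by_hand
```

With two internal ties and one external tie, the index is negative, indicating that ties fall mostly within groups. The strongly negative Anabaptist score above can be read the same way.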
7.7 Network Level Measures Table
You may want to export these measures as a table for your report. Luckily, we can use a data.frame to capture the data in a tabular format and then export it as a CSV.
# First, create a data.frame of outputs
# (mean_distance() and centr_degree() are the current names for
# average.path.length() and centralization.degree())
net_topography <- data.frame(
  `size` = vcount(anabaptist_ig),
  `average distance` = mean_distance(anabaptist_ig),
  `diameter` = diameter(anabaptist_ig),
  `degree centralization` = centr_degree(anabaptist_ig)$centralization,
  `standard deviation` = sd(degree(anabaptist_ig)),
  `density` = edge_density(anabaptist_ig),
  `average degree` = mean(degree(anabaptist_ig)),
  `global clustering coefficient` = transitivity(anabaptist_ig, type = "global")
)
Take a look at the output:
str(net_topography)
'data.frame': 1 obs. of 8 variables:
$ size : int 67
$ average.distance : num 3.35
$ diameter : num 9
$ degree.centralization : num 0.16
$ standard.deviation : num 3.43
$ density : num 0.0828
$ average.degree : num 5.46
$ global.clustering.coefficient: num 0.356
Export it.
write.csv(net_topography, file = "network_topography.csv", row.names = FALSE)
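Notice in the str() output that the spaces in the column names were replaced with periods. That is the default behavior of data.frame(). If you would rather keep the spaces in the exported header, one option is check.names = FALSE, sketched below on a two-column toy table. The values are illustrative, and the file is written to a temporary location so the sketch does not touch your project folder.

```r
# Default: data.frame() makes names syntactically valid (spaces become dots)
toy_default <- data.frame(`average distance` = 3.35, `diameter` = 9)

# check.names = FALSE keeps the names exactly as typed
toy_verbatim <- data.frame(`average distance` = 3.35, `diameter` = 9,
                           check.names = FALSE)
names(toy_default)
names(toy_verbatim)

# Round trip through a temporary CSV to confirm the header survives
out_file <- tempfile(fileext = ".csv")
write.csv(toy_verbatim, file = out_file, row.names = FALSE)
roundtrip_names <- names(read.csv(out_file, check.names = FALSE))
roundtrip_names
unlink(out_file)
```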
That’s all for igraph now.