Title: | Probability Models and Statistical Analysis of Random Multigraphs |
---|---|
Description: | Methods and models for analysing multigraphs as introduced by Shafie (2015) <doi:10.21307/joss-2019-011>, including methods to study local and global properties <doi:10.1080/0022250X.2016.1219732> and goodness of fit tests. |
Authors: | Termeh Shafie [aut, cre],
David Schoch [ctb] |
Maintainer: | Termeh Shafie <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.0 |
Built: | 2025-02-28 04:16:17 UTC |
Source: | https://github.com/termehs/multigraphr |
Finds the degree sequence of the adjacency matrix of an observed graph or multigraph
get_degree_seq(adj, type = "multigraph")
get_degree_seq(adj, type = "multigraph")
adj |
matrix of integers representing graph adjacency matrix. |
type |
equals |
Gives the degree sequence of the adjacency matrix of an observed graph or multigraph. Note that the matrix diagonal should be even if the matrix represents a multigraph.
Vector of integers representing the degree sequence of a (multi)graph.
Termeh Shafie
Shafie, T. (2015). A Multigraph Approach to Social Network Analysis. Journal of Social Structure, 16.
Shafie, T. (2016). Analyzing Local and Global Properties of Multigraphs. The Journal of Mathematical Sociology, 40(4), 239-264.
#' Shafie, T., Schoch, D. (2021). Multiplexity analysis of networks using multigraph representations. Statistical Methods & Applications 30, 1425–1444.
# Adjacency matrix for undirected network with 3 nodes A <- matrix(c(0, 1, 2, 1, 2, 1, 2, 1, 2), nrow=3, ncol=3) # If A represents a graph get_degree_seq(adj = A, type = 'graph') # If A represents a multigraph get_degree_seq(adj = A, type = 'multigraph')
# Adjacency matrix for undirected network with 3 nodes A <- matrix(c(0, 1, 2, 1, 2, 1, 2, 1, 2), nrow=3, ncol=3) # If A represents a graph get_degree_seq(adj = A, type = 'graph') # If A represents a multigraph get_degree_seq(adj = A, type = 'multigraph')
Calculates the edge assignment probabilities
given specified degree sequence under the two ways in which the RSM
model can be approximated by the IEA model:
- the IEAS (independent edge assignment of stubs) model,
- the ISA (independent stub assignment) model.
get_edge_assignment_probs(m, deg.seq, model)
get_edge_assignment_probs(m, deg.seq, model)
m |
integer giving number of edges in multigraph. |
deg.seq |
vector of integers with the sum equal to 2 |
model |
character string, either |
The IEAS and ISA edge assignment probabilities to
possible vertex pairs are calculated given a fixed degree sequence deq.seq
under the IEAS model, and deg.seq
/2m
under the ISA model.
Number of possible vertex pair sites (and thus the length of the edge assignment sequence) is given by
where n is number of vertices.
A numeric vector representing the edge assignment probabilities
to all possible vertex pair sites. The number of vertex pair sites is given by .
Termeh Shafie
Shafie, T. (2015). A Multigraph Approach to Social Network Analysis. Journal of Social Structure, 16.
Shafie, T. (2016). Analyzing Local and Global Properties of Multigraphs. The Journal of Mathematical Sociology, 40(4), 239-264.
#' Shafie, T., Schoch, D. (2021). Multiplexity analysis of networks using multigraph representations. Statistical Methods & Applications 30, 1425–1444.
# Under the IEAS model with 10 possible vertex pair sites (4 vertices) get_edge_assignment_probs(m = 8, deg.seq = c(4, 4, 4, 4), model = "IEAS") # Under the ISA model with 21 possible vertex pair sites (6 vertices) get_edge_assignment_probs(m = 10, deg.seq = c(8, 4, 2, 2, 2, 2), model = "ISA")
# Under the IEAS model with 10 possible vertex pair sites (4 vertices) get_edge_assignment_probs(m = 8, deg.seq = c(4, 4, 4, 4), model = "IEAS") # Under the ISA model with 21 possible vertex pair sites (6 vertices) get_edge_assignment_probs(m = 10, deg.seq = c(8, 4, 2, 2, 2, 2), model = "ISA")
Given a degree sequence, this function finds all unique multigraphs represented by their edge multiplicity sequences.
get_edge_multip_seq(deg.seq)
get_edge_multip_seq(deg.seq)
deg.seq |
vector of integers with the sum equal to 2 |
Multigraphs are represented by their edge multiplicity sequence M with elements M(i,j),
denoting edge multiplicity at possible vertex pair sites (i,j) ordered according to
(1,1) < (1,2) <···< (1,n) < (2,2) < (2,3) <···< (n,n),
where n is number of nodes.
Only practical for small multigraphs.
All unique edge multiplicity sequences as rows in a data frame. Each row in the data frame represents a unique multigraph given the degree sequence.
Termeh Shafie
Shafie, T. (2015). A Multigraph Approach to Social Network Analysis. Journal of Social Structure, 16.
Shafie, T. (2016). Analyzing Local and Global Properties of Multigraphs. The Journal of Mathematical Sociology, 40(4), 239-264.
#' Shafie, T., Schoch, D. (2021). Multiplexity analysis of networks using multigraph representations.
Statistical Methods & Applications 30, 1425–1444.
# Adjacency matrix for undirected network with 3 nodes A <- matrix(c( 0, 1, 2, 1, 2, 1, 2, 1, 2 ), nrow = 3, ncol = 3) deg <- get_degree_seq(A, "multigraph") get_edge_multip_seq(deg)
# Adjacency matrix for undirected network with 3 nodes A <- matrix(c( 0, 1, 2, 1, 2, 1, 2, 1, 2 ), nrow = 3, ncol = 3) deg <- get_degree_seq(A, "multigraph") get_edge_multip_seq(deg)
Goodness of fits test simulations of specified multigraph models using Pearson (S) and information divergence (A) test statistics under the random stub matching (RSM) and the independent edge assignments (IEA) model, where the latter is either independent edge assignments of stubs (IEAS) or independent stub assignment (ISA).
These can be used to check the reliability of the tests by examining the exact probability distributions of the test statistics and their fit to their asymptotic chi^2-distributions. Only practical for small multigraphs as exact distributions are calculated.
gof_sim(m, model, deg.mod, hyp, deg.hyp)
gof_sim(m, model, deg.mod, hyp, deg.hyp)
m |
integer giving number of edges in multigraph. |
model |
character string representing assumed model, either |
deg.mod |
vector of integers with the sum equal to 2 |
hyp |
character string representing the hypothesized null model, either |
deg.hyp |
vector of integers with the sum equal to to 2 |
The tests are performed using goodness-of-fit measures between simulated edge multiplicity sequence of a multigraph according to an RSM or IEA model, and the expected multiplicity sequence according to a simple or composite IEA hypothesis.
Test statistics of Pearson type (S) and of information divergence (A) type are used and summary of tests given these two statistics are given as output. The adjusted statistics and chi^2-distributions are useful for better power calculations.
Output is generated from function gof_stats
:
test.summary |
Expected value and variances of test statistics ( |
degrees.of.freedom |
Degrees of freedom for tests performed. |
probS |
Probability distributions of Pearson statistic |
probA |
Probability distributions of information divergence statistic |
adjusted.stats |
Expected values and variances for adjusted test statistics, preferred adjusted statistics. |
adjusted.chi2 |
Degrees of freedom for adjusted chi^2-distribution. |
power.apx |
Power approximations according to adjusted statistics. |
Termeh Shafie
Shafie, T. (2015). A Multigraph Approach to Social Network Analysis. Journal of Social Structure, 16.
Shafie, T. (2016). Analyzing Local and Global Properties of Multigraphs. The Journal of Mathematical Sociology, 40(4), 239-264.
gof_stats
,get_edge_assignment_probs
,
nsumk
,rsm_model
# Testing a simple IEAS hypothesis with degree sequence [6,6,6] against # an IEAS model with degree sequence [14,2,2] on a multigraph with n = 3 nodes and m = 9 edges deg.mod <- c(14,2,2) deg.hyp <- c(6,6,6) test1 <- gof_sim(9, 'IEAS', deg.mod, 'IEAS', deg.hyp) # Non-null distributions (pdf's and cdf's) of test statistics S and A are given by test1$probS test1$probA # Testing a simple ISA hypothesis with degree sequence [6,6,6] against # an IEAS model with degree sequence [12,3,3] on a multigraph with n = 3 nodes and m = 9 edges deg.mod <- c(12,3,3) deg.hyp <- c(6,6,6) test2 <- gof_sim(9, 'IEAS', deg.mod, 'ISA', deg.hyp) # Testing a composite IEAS hypothesis against # an RSM model with degree sequence [6,6,6,6] on a multigraph with n = 4 nodes and m = 20 edges. deg.mod <- c(6,6,6,6) test3 <- gof_sim(12, 'RSM', deg.mod, 'IEAS', deg.hyp = 0)
# Testing a simple IEAS hypothesis with degree sequence [6,6,6] against # an IEAS model with degree sequence [14,2,2] on a multigraph with n = 3 nodes and m = 9 edges deg.mod <- c(14,2,2) deg.hyp <- c(6,6,6) test1 <- gof_sim(9, 'IEAS', deg.mod, 'IEAS', deg.hyp) # Non-null distributions (pdf's and cdf's) of test statistics S and A are given by test1$probS test1$probA # Testing a simple ISA hypothesis with degree sequence [6,6,6] against # an IEAS model with degree sequence [12,3,3] on a multigraph with n = 3 nodes and m = 9 edges deg.mod <- c(12,3,3) deg.hyp <- c(6,6,6) test2 <- gof_sim(9, 'IEAS', deg.mod, 'ISA', deg.hyp) # Testing a composite IEAS hypothesis against # an RSM model with degree sequence [6,6,6,6] on a multigraph with n = 4 nodes and m = 20 edges. deg.mod <- c(6,6,6,6) test3 <- gof_sim(12, 'RSM', deg.mod, 'IEAS', deg.hyp = 0)
Goodness of fit between two specified edge multiplicity sequences (e.g. observed vs. expected). Pearson (S) and information divergence (A) tests statistics are used and the exact distribution of these statistics, their asymptotic chi^2-distributions, and their first two central moments are calculated using this function. Only practical for small multigraphs.
gof_stats(m, dof, m.seq, prob.mg, Q.seq)
gof_stats(m, dof, m.seq, prob.mg, Q.seq)
m |
integer giving number of edges in multigraph. |
dof |
integer giving degrees of freedom of test performed. |
m.seq |
matrix of integers, each row representing the edge multiplicity sequence of a multigraph (which correspond to observed values). |
prob.mg |
numerical vector representing a given probability distribution of
multigraphs/edge multiplicity sequences in |
Q.seq |
a numeric vector representing the hypothetical edge assignment probabilities to all possible vertex pair sites (from which expected values are calculate). |
The tests are performed using goodness-of-fit measures between two edge multiplicity sequences (e.g. observed vs. expected).
Test statistics of Pearson type (S) and of information divergence (A) type are used and summary of tests given these two statistics are given as output. The adjusted statistics and chi^2-distributions are useful for better power calculations.
test.summary |
Expected value and variances of test statistics ( |
degrees.of.freedom |
Degrees of freedom for tests performed. |
probS |
Probability distributions of Pearson statistic |
probA |
Probability distributions of information divergence statistic |
adjusted.stats |
Expected values and variances for adjusted test statistics, preferred adjusted statistics. |
adjusted.chi2 |
Degrees of freedom for adjusted chi^2-distribution. |
power.apx |
Power approximations according to adjusted statistics. |
Termeh Shafie
Shafie, T. (2015). A Multigraph Approach to Social Network Analysis. Journal of Social Structure, 16.
Shafie, T. (2016). Analyzing Local and Global Properties of Multigraphs. The Journal of Mathematical Sociology, 40(4), 239-264.
gof_sim
,get_edge_assignment_probs
,
nsumk
# Generate a set of edge multiplicity sequences (random multigraphs) and # its probability distribution using rsm_model() with degree sequence [4,4,6,6] rsm <- rsm_model(deg.seq = c(4,4,6,6)) mg <- as.matrix(rsm$m.seq) mg.p <- rsm$prob.dists[, 1] # Generate edge assignment probabilities from which the second set of # edge multiplicity sequences is generated from using the iea_model() deg.f <- (4*5)/2 - 1 eap <- get_edge_assignment_probs(m = 10, deg.seq = c(4,4,6,6), model = 'IEAS') # Perform the test test <- gof_stats(m = 10, dof = deg.f, m.seq = mg, prob.mg = mg.p, eap)
# Generate a set of edge multiplicity sequences (random multigraphs) and # its probability distribution using rsm_model() with degree sequence [4,4,6,6] rsm <- rsm_model(deg.seq = c(4,4,6,6)) mg <- as.matrix(rsm$m.seq) mg.p <- rsm$prob.dists[, 1] # Generate edge assignment probabilities from which the second set of # edge multiplicity sequences is generated from using the iea_model() deg.f <- (4*5)/2 - 1 eap <- get_edge_assignment_probs(m = 10, deg.seq = c(4,4,6,6), model = 'IEAS') # Perform the test test <- gof_stats(m = 10, dof = deg.f, m.seq = mg, prob.mg = mg.p, eap)
Goodness of fit tests between an observed edge multiplicity sequence and an expected edge multiplicity sequence according to specified RSM or IEA hypotheses using Pearson (S) and information divergence (A) tests statistics.
gof_test(adj, type, hyp, deg.hyp, dof)
gof_test(adj, type, hyp, deg.hyp, dof)
adj |
matrix of integer representing graph adjacency matrix. |
type |
equals |
hyp |
character string representing the null model, either |
deg.hyp |
vector of integers with the sum equal to to 2 |
dof |
integer giving degrees of freedom of test, r-1 for simple hypotheses and r-n for composite hypotheses where r = n(n+1)/2 |
This function can be used to test whether there is a significant difference between observed multigraph and the expected multiplicity sequence according to a simple or composite IEA hypothesis.
Test statistics of Pearson (S) and of information divergence (A) type are used and test summary based on these two statistics are given as output.
p-values indicate whether we have sufficient evidence to reject the null that there is no significant difference between the observed and expected edge multiplicity sequence.
summary |
Data frame including observed values of test statistics |
Termeh Shafie
Shafie, T. (2015). A Multigraph Approach to Social Network Analysis. Journal of Social Structure, 16.
Shafie, T. (2016). Analyzing Local and Global Properties of Multigraphs. The Journal of Mathematical Sociology, 40(4), 239-264.
#' Shafie, T., Schoch, D. (2021). Multiplexity analysis of networks using multigraph representations. Statistical Methods & Applications 30, 1425–1444.
get_degree_seq
,get_edge_assignment_probs
,
gof_sim
to check the reliability of your test
# Adjacency matrix of observed network (multigraph), n = 4 nodes , m = 15 edges A <- t(matrix(c( 0, 1, 0, 3, 0, 0, 1, 7, 0, 1, 0, 3, 3, 6, 3, 2), nrow= 4, ncol=4)) deg <- get_degree_seq(adj = A, type = 'multigraph') # Testing a simple IEAS hypothesis with above degree sequence gof_test(adj = A, type = 'multigraph', hyp = 'IEAS', deg.hyp = deg, dof = 9) # Testing a composite IEAS hypothesis gof_test(adj = A, type = 'multigraph', hyp = 'IEAS', deg.hyp = 0, dof = 6)
# Adjacency matrix of observed network (multigraph), n = 4 nodes , m = 15 edges A <- t(matrix(c( 0, 1, 0, 3, 0, 0, 1, 7, 0, 1, 0, 3, 3, 6, 3, 2), nrow= 4, ncol=4)) deg <- get_degree_seq(adj = A, type = 'multigraph') # Testing a simple IEAS hypothesis with above degree sequence gof_test(adj = A, type = 'multigraph', hyp = 'IEAS', deg.hyp = deg, dof = 9) # Testing a composite IEAS hypothesis gof_test(adj = A, type = 'multigraph', hyp = 'IEAS', deg.hyp = 0, dof = 6)
Summary of estimated statistics for analyzing global structure of random multigraphs under the independent edge assignment model given observed adjacency matrix.
Two versions of the IEA model are implemented,
both of which can be used to approximate the RSM model:
1. independent edge assignment of stubs (IEAS) given an edge probability sequence
2. independent stub assignment (ISA) given a stub probability sequence
iea_model( adj, type = "multigraph", model = "IEAS", K = 0, apx = FALSE, p.seq = NULL )
iea_model( adj, type = "multigraph", model = "IEAS", K = 0, apx = FALSE, p.seq = NULL )
adj |
matrix of integers representing graph adjacency matrix. |
type |
equals |
model |
character string representing which IEA model: either |
K |
upper limit for the complexity statistics R(k),
k=(0,1,..., |
apx |
logical (default = |
p.seq |
if |
When using the IEAS model:
If the IEAS model is used
as an approximation to the RSM model, then the edge assignment probabilities are estimated
by using the observed degree sequence. Otherwise, the edge assignment probabilities are
estimated by using the observed edge multiplicities (maximum likelihood estimates).
When using the ISA model:
If the ISA model is used
as an approximation to the RSM model, then the stub assignment probabilities are estimated by using
the observed degree sequence over 2m. Otherwise, a sequence containing the stub assignment
probabilities (for example based on prior belief) should be given as argument p.seq
.
nr.multigraphs |
Number of unique multigraphs possible. |
M |
Summary and interval estimates for number of loops ( |
R |
Summary and interval estimates for frequencies of edge multiplicities
|
Termeh Shafie
Shafie, T. (2015). A Multigraph Approach to Social Network Analysis. Journal of Social Structure, 16.
Shafie, T. (2016). Analyzing Local and Global Properties of Multigraphs.
The Journal of Mathematical Sociology, 40(4), 239-264.
Shafie, T., Schoch, D. (2021). Multiplexity analysis of networks using multigraph representations. Statistical Methods & Applications 30, 1425–1444.
get_degree_seq
, get_edge_multip_seq
, iea_model
# Adjacency matrix of a small graph on 3 nodes A <- matrix(c(1, 1, 0, 1, 2, 2, 0, 2, 0), nrow = 3, ncol = 3) # When the IEAS model is used iea_model(adj = A , type = 'graph', model = 'IEAS', K = 0, apx = FALSE) # When the IEAS model is used to approximate the RSM model iea_model(adj = A , type = 'graph', model = 'IEAS', K = 0, apx = TRUE) # When the ISA model is used to approximate the RSM model iea_model(adj = A , type = 'graph', model = 'ISA', K = 0, apx = TRUE) # When the ISA model is used with a pre-specified stub assignment probabilities iea_model(adj = A , type = 'graph', model = 'ISA', K = 0, apx = FALSE, p.seq = c(1/3, 1/3, 1/3))
# Adjacency matrix of a small graph on 3 nodes A <- matrix(c(1, 1, 0, 1, 2, 2, 0, 2, 0), nrow = 3, ncol = 3) # When the IEAS model is used iea_model(adj = A , type = 'graph', model = 'IEAS', K = 0, apx = FALSE) # When the IEAS model is used to approximate the RSM model iea_model(adj = A , type = 'graph', model = 'IEAS', K = 0, apx = TRUE) # When the ISA model is used to approximate the RSM model iea_model(adj = A , type = 'graph', model = 'ISA', K = 0, apx = TRUE) # When the ISA model is used with a pre-specified stub assignment probabilities iea_model(adj = A , type = 'graph', model = 'ISA', K = 0, apx = FALSE, p.seq = c(1/3, 1/3, 1/3))
Finds ordered n-tuples of non-integers summing to k. Only practical for n < 15.
nsumk(n, k)
nsumk(n, k)
n |
a positive integer. |
k |
a positive integer. |
Useful for finding all possible degree sequences for a network with n
nodes
and k
/2 number of edges, or for finding all possible
edge multiplicity sequence n
that sum up to k number of edges.
The number of vertex pair sites (or length of edge multiplicity sequence) for
a multigraph with n nodes is given by .
A matrix with choose(k+n-1,n-1)
rows and n
columns.
Each row comprises non-negative integers summing to k
.
Termeh Shafie
# All possible degree sequences for # a network with 4 nodes and 5 edges D <- nsumk(4, 10) # Remove degree sequences with isolated nodes D <- D[-which(rowSums(D == 0) > 0), ] # All edge multiplicity sequences/multigraph with 2 nodes and 4 edges r <- (2*3)/2 # vertex pair sites (or length of edge multiplicity sequences) mg <- nsumk(r,4) # number of rows give number of possible multigraphs
# All possible degree sequences for # a network with 4 nodes and 5 edges D <- nsumk(4, 10) # Remove degree sequences with isolated nodes D <- D[-which(rowSums(D == 0) > 0), ] # All edge multiplicity sequences/multigraph with 2 nodes and 4 edges r <- (2*3)/2 # vertex pair sites (or length of edge multiplicity sequences) mg <- nsumk(r,4) # number of rows give number of possible multigraphs
Given a specified degree sequence, this function finds all unique multigraphs represented by their edge multiplicity sequences. Different complexity statistics together with their probability distributions and moments are calculated.
rsm_model(deg.seq)
rsm_model(deg.seq)
deg.seq |
vector of integers representing the degree sequence of a multigraph. |
The probability distributions of all unique multigraphs given fixed degree sequence, together with the first two central moments and interval estimates of the statistics M1 = number of loops and M2 = number of multiple edges, under the RSM model are calculated.
For other structural statistics and for large multigraphs,
use the IEA approximation of the RSM model via the function iea_model
m.seq |
possible multigraphs represented by edge multiplicity sequences |
prob.dists |
probability distribution of the multigraphs/edge multiplicity sequences, and the probability distributions of the statistics number of loops, number of multiple edges, and simple graph (logical) for each multigraph |
M |
summary of moments and interval estimates for
number of loops and number of multiple edges ( |
Termeh Shafie
Shafie, T. (2015). A Multigraph Approach to Social Network Analysis. Journal of Social Structure, 16.
Shafie, T. (2016). Analyzing Local and Global Properties of Multigraphs. The Journal of Mathematical Sociology, 40(4), 239-264.
get_degree_seq
, get_edge_multip_seq
, iea_model
# Given a specified degree sequence D <- c(2, 2, 3, 3) # degree sequence mod1 <- rsm_model(D) mod1$m.seq mod1$prob.dists mod1$M # Given an observed graph A <- matrix(c( 0, 1, 2, 1, 2, 1, 2, 1, 2 ), nrow = 3, ncol = 3) D <- get_degree_seq(adj = A, type = "graph") mod2 <- rsm_model(D)
# Given a specified degree sequence D <- c(2, 2, 3, 3) # degree sequence mod1 <- rsm_model(D) mod1$m.seq mod1$prob.dists mod1$M # Given an observed graph A <- matrix(c( 0, 1, 2, 1, 2, 1, 2, 1, 2 ), nrow = 3, ncol = 3) D <- get_degree_seq(adj = A, type = "graph") mod2 <- rsm_model(D)