Title: | Hierarchical Graph Clustering for a Collection of Networks |
---|---|
Description: | Graph clustering using an agglomerative algorithm to maximize the integrated classification likelihood criterion and a mixture of stochastic block models. The method is described in the article "Model-based clustering of multiple networks with a hierarchical algorithm" by T. Rebafka (2022) <arXiv:2211.02314>. |
Authors: | Tabea Rebafka [aut, cre] |
Maintainer: | Tabea Rebafka <[email protected]> |
License: | GPL-2 |
Version: | 1.3 |
Built: | 2025-02-19 03:43:40 UTC |
Source: | https://github.com/cran/graphclust |
ARI to compare two clusterings or to compare two entire lists of clusterings
ARI(x, y)
ARI(x, y)
x |
vector with clustering, matrix with hot-one-encoding of the clustering, or a list of clusterings (in vector or matrix form) |
y |
as x |
ARI (scalar of vector)
x <- c(1,1,2,2,3,3) y <- c(1,1,1,2,2,2) ARI(x,y) x <- matrix(0, 3, 6) x[1,1] <- x[1,2] <- x[2,3] <- x[2,4] <- x[3,5] <- x[3,6] <- 1 y <- matrix(0, 2, 6) y[1,1] <- y[1,2] <- y[1,3] <- y[2,4] <- y[2,5] <- y[2,6] <- 1 ARI(x,y) X <- list(c(1,1,2,2,3,3), rep(1,10)) Y <- list(c(1,1,1,2,2,2), rep(1:2,each=5)) ARI(X,Y)
x <- c(1,1,2,2,3,3) y <- c(1,1,1,2,2,2) ARI(x,y) x <- matrix(0, 3, 6) x[1,1] <- x[1,2] <- x[2,3] <- x[2,4] <- x[3,5] <- x[3,6] <- 1 y <- matrix(0, 2, 6) y[1,1] <- y[1,2] <- y[1,3] <- y[2,4] <- y[2,5] <- y[2,6] <- 1 ARI(x,y) X <- list(c(1,1,2,2,3,3), rep(1,10)) Y <- list(c(1,1,1,2,2,2), rep(1:2,each=5)) ARI(X,Y)
Sort stochastic block model parameter in a unique way using its graphon
degreeSort(thetaInit, outTheta = TRUE, outPerm = FALSE)
degreeSort(thetaInit, outTheta = TRUE, outPerm = FALSE)
thetaInit |
stochastic block model parameter to be sorted |
outTheta |
if TRUE returns the sorted stochastic block model parameter |
outPerm |
if TRUE returns the permutation of the blocks of the stochastic block model to provide the sorted stochastic block model parameter |
according to the values of outTheta and outPerm the function returns the sorted stochastic block model parameter or the associated permutation of the blocks of the stochastic block model or a list with both of them
theta1 <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) degreeSort(theta1) theta2 <- list(pi=c(.5,.5), gamma=matrix(4:1/8,2,2)) degreeSort(theta2)
theta1 <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) degreeSort(theta1) theta2 <- list(pi=c(.5,.5), gamma=matrix(4:1/8,2,2)) degreeSort(theta2)
fitSBMcollection() is a subversion of graphClustering() where no stopping criterion is applied. So all networks are ultimately merged to a single cluster and considered as i.i.d realisations of a single stochastic block model.
fitSBMcollection( allAdj, hyperParam = list(alpha = 0.5, eta = 0.5, zeta = 0.5, lambda = 0.5), nbCores = 1 )
fitSBMcollection( allAdj, hyperParam = list(alpha = 0.5, eta = 0.5, zeta = 0.5, lambda = 0.5), nbCores = 1 )
allAdj |
list of adjacency matrices |
hyperParam |
hyperparameters of prior distributions |
nbCores |
number of cores for parallelization |
list with the following fields: $nodeClusterings is a list with the node labels for each networks, $theta contains the estimated SBM parameter, $ICL is the value of the ICL criterion of the final clustering
theta <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) obs <- rCollectSBM(rep(10,4), theta)$listGraphs res <- fitSBMcollection(obs, nbCores=1)
theta <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) obs <- rCollectSBM(rep(10,4), theta)$listGraphs res <- fitSBMcollection(obs, nbCores=1)
Applies the variational EM-algorithm implemented in the package blockmodels to every network.
fitSimpleSBM( allAdj, directed = TRUE, nbSBMBlocks = Inf, nbCores = 1, outCountStat = TRUE )
fitSimpleSBM( allAdj, directed = TRUE, nbSBMBlocks = Inf, nbCores = 1, outCountStat = TRUE )
allAdj |
list of adjacency matrices |
directed |
Networks are directed (TRUE by default) or undirected (FALSE). |
nbSBMBlocks |
upper bound for the number of blocks in the SBMs of the mixture components. Default is Inf |
nbCores |
number of cores for parallelization. |
outCountStat |
If TRUE (default), the output is a list of count statistics for every network. If FALSE, the output is a list of parameters of the stochastic block models fitted to every network. |
list of count statistics for every network or list of parameters of the stochastic block models fitted to every network.
theta <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) obs <- rCollectSBM(rep(10,4), theta)$listGraphs res <- fitSimpleSBM(obs, outCountStat=FALSE, nbCores=2)
theta <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) obs <- rCollectSBM(rep(10,4), theta)$listGraphs res <- fitSimpleSBM(obs, outCountStat=FALSE, nbCores=2)
Applies the hierarchical graph clustering algorithm to a collection of networks and fits a finite mixture model of stochastic block models to the data
graphClustering( allAdj, hyperParam = list(alpha = 0.5, eta = 0.5, zeta = 0.5, lambda = 0.5), returnInitial = FALSE, nbClust = NULL, nbSBMBlocks = Inf, initCountStat = NULL, initDeltaICL = NULL, nbCores = 1 )
graphClustering( allAdj, hyperParam = list(alpha = 0.5, eta = 0.5, zeta = 0.5, lambda = 0.5), returnInitial = FALSE, nbClust = NULL, nbSBMBlocks = Inf, initCountStat = NULL, initDeltaICL = NULL, nbCores = 1 )
allAdj |
list of adjacency matrices |
hyperParam |
hyperparameters of prior distributions |
returnInitial |
Boolean. Return SBM parameters from initialization or not. Default is FALSE. |
nbClust |
desired number of clusters. Default NULL, which means that the number of clusters is chosen automatically via the ICL criterion |
nbSBMBlocks |
upper bound for the number of blocks in the SBMs of the mixture components. Default is Inf |
initCountStat |
initial count statistics may be provided to the method. Default is NULL. |
initDeltaICL |
initial deltaICL-matrix may be provided to the method. Default is NULL. |
nbCores |
number of cores for parallelization |
list with the following fields: $graphGroups is the graph clustering, $nodeClusterings is a list with the node labels for each networks, $thetaMixSBM contains the estimated parameter of the mixture of SBMs, $ICL is the value of the ICL criterion of the final clustering, $histGraphGroups traces the history of the cluster aggregations, $histDeltaICL traces the evolution of the deltaICL value, $histFusedClusters traces the history of the aggregated cluster numbers
theta <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) obs <- rCollectSBM(rep(10,4), theta)$listGraphs res <- graphClustering(obs, nbCores=1)
theta <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) obs <- rCollectSBM(rep(10,4), theta)$listGraphs res <- graphClustering(obs, nbCores=1)
Graph clustering method based on graph moments by Mukherjee et al. (2017)
graphMomentsClustering(Networks, nbMoments = 3, nbClusters)
graphMomentsClustering(Networks, nbMoments = 3, nbClusters)
Networks |
list of adjacency matrices |
nbMoments |
order of the largest graph moments to be considered |
nbClusters |
desired number of clusters |
vector with the clustering of the networks
param <- vector('list', 3) param[[1]] <- list(prop = 1/3, # component 1 : alpha > beta alpha = .04, beta = .02, deltaIn = 100, deltaOut = 100, R = 500 ) param[[2]] <- list(prop = 1/3, # component 2 : just permute alpha and beta ; alpha = .01, beta = .02, deltaIn = 100, deltaOut = .1, R = 1000 ) param[[3]] <- list(prop = 1/3, # component 3 : alpha=beta alpha = .015, beta = .015, deltaIn = .1, deltaOut = .1, R = 1000 ) obs <- sampleDPAMixture(M=20, param) res <- graphMomentsClustering(obs$listAdj, 3, 3) table(res, obs$graphGroups)
param <- vector('list', 3) param[[1]] <- list(prop = 1/3, # component 1 : alpha > beta alpha = .04, beta = .02, deltaIn = 100, deltaOut = 100, R = 500 ) param[[2]] <- list(prop = 1/3, # component 2 : just permute alpha and beta ; alpha = .01, beta = .02, deltaIn = 100, deltaOut = .1, R = 1000 ) param[[3]] <- list(prop = 1/3, # component 3 : alpha=beta alpha = .015, beta = .015, deltaIn = .1, deltaOut = .1, R = 1000 ) obs <- sampleDPAMixture(M=20, param) res <- graphMomentsClustering(obs$listAdj, 3, 3) table(res, obs$graphGroups)
(squared) L2-norm of the graphons associated with two stochastic block model parameters
graphonL2norm(theta1, theta2)
graphonL2norm(theta1, theta2)
theta1 |
a stochastic block model parameter |
theta2 |
a stochastic block model parameter |
(squared) L2-norm of the graphons associated with two stochastic block model parameters
theta1 <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) theta2 <- list(pi=c(.5,.5), gamma=matrix(4:1/8,2,2)) graphonL2norm(theta1, theta2)
theta1 <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) theta2 <- list(pi=c(.5,.5), gamma=matrix(4:1/8,2,2)) graphonL2norm(theta1, theta2)
Graph clustering using the pairwise graphon distances and spectral clustering
graphonSpectralClustering(allAdj, nbClusters, sig = 0.1, nbCores = 1)
graphonSpectralClustering(allAdj, nbClusters, sig = 0.1, nbCores = 1)
allAdj |
list of adjacency matrices |
nbClusters |
number of clusters to be found |
sig |
parameter for Gaussian kernel used for the similarity matrix |
nbCores |
number of cores for parallelization. |
list with the obtained graph clusteirng ($clust) and the matrix with the pairwise graphon distances between all pairs of networks
theta <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) obs <- rCollectSBM(rep(10,4), theta)$listGraphs res <- graphonSpectralClustering(obs, 2, nbCores=1)
theta <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) obs <- rCollectSBM(rep(10,4), theta)$listGraphs res <- graphonSpectralClustering(obs, 2, nbCores=1)
Plot the metagraph of the parameter of the stochastic block model associated with one of the estimated graph clusters
metagraph(nb, res, title = NULL, edge.width.cst = 10)
metagraph(nb, res, title = NULL, edge.width.cst = 10)
nb |
number of the cluster we are interested in |
res |
output of graphClustering() |
title |
title of the figure |
edge.width.cst |
width of edges in the metagraph |
none
theta <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) obs <- rCollectSBM(rep(10,4), theta)$listGraphs res <- graphClustering(obs, nbCores=2) metagraph(1, res)
theta <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) obs <- rCollectSBM(rep(10,4), theta)$listGraphs res <- graphClustering(obs, nbCores=2) metagraph(1, res)
Computation of graph moments of a network
moments(A, k = 3)
moments(A, k = 3)
A |
adjacency matrix |
k |
order of the largest graph moments to be considered |
vector with the first k (normalized) graph moments of the network A
param <- list(R = 500, alpha = .04, beta = .02, deltaIn = 100, deltaOut = 100) A <- sampleDPA(param) moments(A)
param <- list(R = 500, alpha = .04, beta = .02, deltaIn = 100, deltaOut = 100) A <- sampleDPA(param) moments(A)
Permute block labels of a stochastic block model parameter
permutParam(theta, permut)
permutParam(theta, permut)
theta |
a SBM parameter with say K blocks |
permut |
a permutation of the block labels 1,2,...,K |
stochastic block model parameter with permuted block labels
theta1 <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) theta2 <- list(pi=c(.5,.5), gamma=matrix(4:1/8,2,2)) permutParam(theta1, 2:1) permutParam(theta2, 2:1)
theta1 <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) theta2 <- list(pi=c(.5,.5), gamma=matrix(4:1/8,2,2)) permutParam(theta1, 2:1) permutParam(theta2, 2:1)
Plot dendrogram to visualize the clustering obtained by the hierarchical clustering algorithm
plotDendrogram(res, labels = NULL, labcex = 0.5)
plotDendrogram(res, labels = NULL, labcex = 0.5)
res |
output of graphClustering() |
labels |
network labels, default (NULL) network number. |
labcex |
size of labels in the figure |
dendrogram
theta <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) obs <- rCollectSBM(rep(10,4), theta)$listGraphs res <- graphClustering(obs, nbCores=2) plotDendrogram(res)
theta <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) obs <- rCollectSBM(rep(10,4), theta)$listGraphs res <- graphClustering(obs, nbCores=2) plotDendrogram(res)
Simulate a sample of networks of a stochastic block model
rCollectSBM(vec_n, theta, directed = TRUE)
rCollectSBM(vec_n, theta, directed = TRUE)
vec_n |
vector with number of vertices |
theta |
stochastic block model parameter with latent group probabilities $pi and connectivy parameters $gamma |
directed |
directed networks (TRUE by default) or undirected (FALSE) |
list with a list of adjacency matrices ($listGraphs) and a list of node labels ($listLatentZ)
theta1 <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) rCollectSBM(2:4, theta1)
theta1 <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) rCollectSBM(2:4, theta1)
Simulate a collection of networks of a mixture of stochastic block models
rMixSBM(vec_n, thetaMixSBM, directed = TRUE)
rMixSBM(vec_n, thetaMixSBM, directed = TRUE)
vec_n |
vector with number of vertices |
thetaMixSBM |
K-list for a mixture with K components. Each field is a list with the stochastic block model parameter ($pi and $gamma) and a cluster proportion ($prop) |
directed |
directed networks (TRUE by default) or undirected (FALSE) |
list with a list of adjacency matrices ($listGraphs), a list of node labels ($listLatentZ) and a vector with the graph clustering ($label)
theta1 <- list(prop=.2, pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) theta2 <- list(prop=.8, pi=c(.5,.5), gamma=matrix(4:1/8,2,2)) thetaMixSBM <- list(NULL) thetaMixSBM[[1]] <- theta1 thetaMixSBM[[2]] <- theta2 obs <- rMixSBM(vec_n=rep(10,3), thetaMixSBM)
theta1 <- list(prop=.2, pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) theta2 <- list(prop=.8, pi=c(.5,.5), gamma=matrix(4:1/8,2,2)) thetaMixSBM <- list(NULL) thetaMixSBM[[1]] <- theta1 thetaMixSBM[[2]] <- theta2 obs <- rMixSBM(vec_n=rep(10,3), thetaMixSBM)
Simulate a network of a stochastic block model
rsbm(n, theta, directed = TRUE)
rsbm(n, theta, directed = TRUE)
n |
number of vertices |
theta |
stochastic block model parameter with latent group probabilities $pi and connectivy parameters $gamma |
directed |
directed network (TRUE by default) or undirected (FALSE) |
list with simulated adjacency matrix ($adj) and node labels ($Z)
theta1 <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) rsbm(10, theta1)
theta1 <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) rsbm(10, theta1)
generation of a network of the directed preferential attachment (DPA) model
sampleDPA(param)
sampleDPA(param)
param |
list with the following elements: $R (= number of iterations), $alpha, $beta , $deltaIn, $deltaOut (parameters of the DPA model) |
adjacency matrix of generated network
param <- list(R = 500, alpha = .04, beta = .02, deltaIn = 100, deltaOut = 100) A <- sampleDPA(param) A
param <- list(R = 500, alpha = .04, beta = .02, deltaIn = 100, deltaOut = 100) A <- sampleDPA(param) A
Generation of a mixture of directed preferential attachment (DPA) models
sampleDPAMixture(M, param)
sampleDPAMixture(M, param)
M |
number of desired networks |
param |
list of list of parameters of the DPA models. Each element of param is a list with the following elements: $prop (weight of the mixture component), $R (= number of iterations), $alpha, $beta , $deltaIn, $deltaOut (parameters of the DPA model) |
list of 2 lists : the first ($listAdj) is a list of M adjacency matrices, the second a list ($graphGroups) contains the true cluster labels
param <- vector('list', 3) param[[1]] <- list(prop = 1/3, # component 1 : alpha > beta alpha = .04, beta = .02, deltaIn = 100, deltaOut = 100, R = 500 ) param[[2]] <- list(prop = 1/3, # component 2 : just permute alpha and beta ; alpha = .01, beta = .02, deltaIn = 100, deltaOut = .1, R = 1000 ) param[[3]] <- list(prop = 1/3, # component 3 : alpha=beta alpha = .015, beta = .015, deltaIn = .1, deltaOut = .1, R = 1000 ) obs <- sampleDPAMixture(M=20, param)
param <- vector('list', 3) param[[1]] <- list(prop = 1/3, # component 1 : alpha > beta alpha = .04, beta = .02, deltaIn = 100, deltaOut = 100, R = 500 ) param[[2]] <- list(prop = 1/3, # component 2 : just permute alpha and beta ; alpha = .01, beta = .02, deltaIn = 100, deltaOut = .1, R = 1000 ) param[[3]] <- list(prop = 1/3, # component 3 : alpha=beta alpha = .015, beta = .015, deltaIn = .1, deltaOut = .1, R = 1000 ) obs <- sampleDPAMixture(M=20, param)
the norm is the minimal graphon distance between two stochastic block model parameters obtained with the best permutations of the parameters
sbmNorm(theta1, theta2)
sbmNorm(theta1, theta2)
theta1 |
a stochastic block model parameter |
theta2 |
a stochastic block model parameter |
(squared) norm between two stochastic block models
theta1 <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) theta2 <- list(pi=c(.5,.5), gamma=matrix(4:1/8,2,2)) theta3 <- list(pi=c(.5,.5), gamma=matrix(1:4/4,2,2)) sbmNorm(theta1, theta2) sbmNorm(theta1, theta3) sbmNorm(theta2, theta3)
theta1 <- list(pi=c(.5,.5), gamma=matrix((1:4)/8,2,2)) theta2 <- list(pi=c(.5,.5), gamma=matrix(4:1/8,2,2)) theta3 <- list(pi=c(.5,.5), gamma=matrix(1:4/4,2,2)) sbmNorm(theta1, theta2) sbmNorm(theta1, theta3) sbmNorm(theta2, theta3)