*What are communities?*A property common to many types of graphs, including protein-protein interaction graphs, is

*community structure*. A

*community*is often defined as a subset of the vertices in the graph such that connections btween the vertices are denser than connections with the rest of the graph. For example, the graph may consist of one connected component. However, within that connected component, we may have three densely connected subgraphs; these could be said to be three different

*communities*within the graph.

In terms of protein-protein interaction networks, if there are several communities within a connected component (for example, three communities), these could represent three different groups of proteins, where the proteins within one community interact much more with each other than with proteins in the other communities.

By detecting communities within a protein-protein interaction graph, we can detect putative

*protein complexes*, that is, groups of associated proteins that are probably fairly stable over time. In other words, protein complexes can be detected by looking for groups of proteins among which there are many interactions, and where the members of each group have few interactions with other proteins that do not belong to that group.

There are lots of different methods available for detecting communities in a graph, and each method will give slightly different results. That is, the particular method used for detecting communities will decide how you split a connected component into one or more communities.

**Input data file**I have an input file 'testset' like this, that lists the node pairs with edges between them:

D915_00007 D915_00016

D915_00016 D915_00006

D915_00006 D915_00015

D915_00015 D915_00005

D915_00005 D915_00019

D915_00019 D915_00034

D915_00034 D915_00050

D915_00050 D915_00049

D915_00049 D915_00111

D915_00111 D915_00048

*First install the RBGL library*Then in R I first need to install RBGL:

> source("http://bioconductor.org/biocLite.R") # needed for finding connected components

> biocLite("RBGL")

Then I can type:

> source("Rfunctions.R") # this file is included below

> library("RBGL")

> mygraph <- makeproteingraph("testset") # takes in my input file

> findcommunities(mygraph, 1) # find communities of size at least 1

This gives me output:

[1] "Community 0 : D915_00049 D915_00050 D915_00111 D915_00048"

[1] "Community 1 : D915_00006 D915_00007 D915_00015 D915_00016"

[1] "There were 2 communities in the input graph"

Note that if you run

`findcommunities()`again and again on the same input graph, it might find slightly different sets of communities each time. This is because it uses a random number generator in the method that it uses for identifying communities, and the random number used will be different each time you run the

`findcommunities()`function, which means that you will get slightly different answers each time. The answers should be very similar, however, but you might see a small difference, for example, a large community found the first time you run it might be split into two smaller communities the second time you run it.

*My file Rfunctions.R with the functions used above*Below I give code for my file Rfunctions.R that contains one function

`findcommunities()`that identifies communities within a graph (or subgraph of a graph). The function

`findcommunities()`uses the function

`spinglass.community()`from the "igraph" library to identify communities in a graph or subgraph. As its arguments (inputs), the

`findcommunities()`function takes the graph/subgraph that we want to find communities in, and the minimum number of vertices that a community must have to be reported.

# Function to make a graph based on protein-protein interaction data in an input file

makeproteingraph <- function(myfile)

{

library("graph")

mytable <- read.table(file(myfile)) # Store the data in a data frame

proteins1 <- mytable$V1

proteins2 <- mytable$V2

protnames <- c(levels(proteins1),levels(proteins2))

# Find out how many pairs of proteins there are

numpairs <- length(proteins1)

# Find the unique protein names:

uniquenames <- unique(protnames)

# Make a graph for these proteins with no edges:

mygraph <- new("graphNEL", nodes = uniquenames)

# Add edges to the graph:

# See http://rss.acs.unt.edu/Rdoc/library/graph/doc/graph.pdf for more examples

weights <- rep(1,numpairs)

mygraph2 <- addEdge(as.vector(proteins1),as.vector(proteins2),mygraph,weights)

return(mygraph2)

}

# Function to find network communities in a graph

findcommunities <- function(mygraph,minsize)

{

# Load up the igraph library:

library("igraph")

# Set the counter for the number of communities:

cnt <- 0

# First find the connected components in the graph:

myconnectedcomponents <- connectedComp(mygraph)

# For each connected component, find the communities within that connected component:

numconnectedcomponents <- length(myconnectedcomponents)

for (i in 1:numconnectedcomponents)

{

component <- myconnectedcomponents[[i]]

# Find the number of nodes in this connected component:

numnodes <- length(component)

if (numnodes > 1) # We can only find communities if there is more than one node

{

mysubgraph <- subGraph(component, mygraph)

# Find the communities within this connected component:

# print(component)

myvector <- vector()

mylist <- findcommunities2(mysubgraph,cnt,"FALSE",myvector,minsize)

cnt <- mylist[[1]]

myvector <- mylist[[2]]

}

}

print(paste("There were",cnt,"communities in the input graph"))

}

# Function to find network communities in a connected component of a graph

findcommunities2 <- function(mygraph,cnt,plot,myvector,minsize)

{

# Find the number of nodes in the input graph

nodes <- nodes(mygraph)

numnodes <- length(nodes)

# Record the vertex number for each vertex name

myvector <- vector()

for (i in 1:numnodes)

{

node <- nodes[i] # "node" is the vertex name, i is the vertex number

myvector[`node`] <- i # Add named element to myvector

}

# Create a graph in the "igraph" library format, with numnodes nodes:

newgraph <- graph_from_graphnel(mygraph)

# Now find communities in the graph:

communities <- spinglass.community(newgraph)

# Find how many communities there are:

sizecommunities <- communities$csize

numcommunities <- length(sizecommunities)

# Find which vertices belong to which communities:

membership <- communities$membership

# Get the names of vertices in the graph "newgraph":

vertexnames <- get.vertex.attribute(newgraph, "name")

# Print out the vertices belonging to each community:

for (i in 1:numcommunities)

{

if (plot == TRUE) { cnt <- cnt + 1 }

nummembers <- 0

printout <- paste("Community",cnt,":")

for (j in 1:length(membership))

{

community <- membership[j]

community <- community + 1

if (community == i) # If vertex j belongs to the ith community

{

vertexname <- vertexnames[j]

if (plot == FALSE)

{

nummembers <- nummembers + 1

# Print out the vertices belonging to the community

printout <- paste(printout,vertexname)

}

else

{

# Colour in the vertices belonging to the community

myvector[`vertexname`] <- cnt

}

}

}

if (plot == FALSE && nummembers >= minsize)

{

cnt <- cnt + 1

print(printout)

}

}

return(list(cnt,myvector))

}

## No comments:

Post a Comment