Before, I wrote some notes on finding communities (clusters) in a graph using R. I've now been finding out how to do this in Python.

Here's some options I explored

- I found that the networkx Python module for graphs has a function to find k-clique communities in a graph. However, these communities seem to be very small ones, it split up my network too much!

- I found a nice discussion of community detection in Python on stackoverflow.

- I read that another option would be igraph but it turns out that installing it isn't very easy, so I decided to try something else first..

- I then found that there is a Python module called community that contains some nice functions for finding communities. It can find the best partition of a network into communities, here is my code for this (below). By the way, I found that the 'community' module seems to expect that the input to its 'best_partition' function will be a connected component (ie. each node can be reached from every other node), so I needed to first split my original graph into connected components, and then find communities in each connected component:

import networkx as nx # following example at https://networkx.github.io/examples.html

sys.path.append("/nfs/users/nfs_a/alc/Documents/git/Python/taynaud-python-louvain-6a3696fdce57/") # this is where I downloaded the 'community' module

import community

# first read the graph into networkx, it is in a file with pairs of chemical compounds that I want to be the nodes, and I want an edge between each pair:

G = nx.Graph() # create a graph in networkxnodeset = set() # set of nodes in our graph already

# read in the input file:

fileObj = open(input_pairs_file, "r")

for line in fileObj:

line = line.rstrip()

temp = line.split("\t")

compound1 = int(temp[0])

compound2 = int(temp[1])

similarity = float(temp[2]))

if compound1 not in nodeset:

G.add_node(compound1)

if compound2 not in nodeset:

G.add_node(compound2)

G.add_edge(compound1,compound2, weight=similarity)

fileObj.close()

# first split the graph into connected components:

subgraphs = list(nx.connected_component_subgraphs(G))

for subgraph in subgraphs:

# find the best partition of the data:

partition = community.best_partition(subgraph) # we can now print out this partition if we want

## No comments:

Post a Comment