Monday, 13 July 2015

Python list comprehensions

List comprehensions are a really nice feature of Python. For example, say you have a list 'leaves' of objects, and each object in the list has an attribute 'name' (a string). To print out a comma-separated string of all the names of all the objects in the list, we just type:

# first get all the names of all the 'leaf' objects in the list 'leaves":
leaf_names = [leaf.name for leaf in leaves]
# now print out a comma-separated string of all the names of all the leaves:
print "The names are %s" % (','.join(leaf_names))


Other examples:

You have a dictionary 'mydict' of lists, get those lists and flatten them into one big list:
The values (not keys) of 'mydict' are lists.

biglist = [x for sublist in mydict.values() for x in sublist]
# mydict.values() gives a list of lists, this flattens it into one big list

You have a dictionary 'fasta_seq' whose keys are protein names, and values are protein sequences, and want the lengths of the proteins:
This gives a list with the lengths of the proteins:

lens = [len(fasta_seq[x]) for x in list(fasta_seq.keys())]

You have a list of words, called 'temp', and want to find the words in that list that do not contain 'status:', 'UniProt:', 'protein_id:', or 'locus:'
matching = [x for x in temp if 'status:' not in x and 'UniProt:' not in x and 'protein_id:' not in x and 'locus:' not in x] 

You have a list 'temp' of words, and only want those words that also appear in a second list 'temp2':
seqmatch_words = [x for x in temp if x in temp2]
You have a list 'stack' of tuples, and want the first member of each tuple:
Your list is called 'stack'. This puts the first members of the tuples in a list of their own:
stack_members = [x[0] for x in stack]

You want to create a list containing 9 lists initialised to zero:
Matrix = [[0 for x in xrange(9)] for x in xrange(9)]

You want to create a dictionary with the length of each item in a list of strings:
genes = ['gene1', 'gene2', 'gene32']
mydict = {gene: len(gene) for gene in genes}

You want to create a dictionary, taking the keys from one list and the values from a second list:
genes = ['gene1', 'gene2', 'gene32']
values = [32, 22, 66]
mydict = {genes[i]: values[i] for i in range(len(genes))}

You want to create a set, taking the items in a list of strings that start with 's':
mylist = ['string1', 'hello', 'sausage']
myset = {x for x in mylist if x.startswith('s')}

You have a list of sets, and a dictionary with values for the items in the sets, and want to create a list of sublists, where each sublist have the dictionary values for the objects in a set:
mylist = [{'seq1', 'seq2'}, {'seq3', 'seq4', 'seq5'}, {'seq6'}]
mydict = {'seq1': 5, 'seq2': '4', 'seq3': '4', 'seq4': '3', 'seq5': 2, 'seq6': 2}
mybiglist = [[mydict[x] for x in sublist] for sublist in mylist]

You have two lists of equal lengths and want to merge them into a single list of tuples:
list1 = ['A', 'B', 'C', 'D']
list2 = ['a', 'b', 'c', 'd']
[(list1[i], list2[i]) for i in range(len(list1))]
[('A', 'a'), ('B', 'b'), ('C', 'c'), ('D', 'd')]


1 comment:

Ben Fulton said...

You could even leave leaf_names as a comprehension by surrounding it with parentheses (). (Using square brackets [] tells it to immediately convert to a list.) It will save a little bit of processing and memory by not actually doing the iteration until it's actually required by the join.