Thursday 3 December 2015

Selecting distinct random colours in Python

I wanted to make a plot of Pfam domains in proteins, and colour them using different colours. The problem is that I needed the domains to have distinct colours, but didn't know in advance how many domains a particular gene family would have. I found a nice Python script by adewes on github for this (thank you!)
[Note: there's a small typo in this script, 'colors.append(generate_new_color(colors),pastel_factor = 0.9)' should be 'colors.append(generate_new_color(colors,pastel_factor = 0.9))']

An example script to choose 20 distinct random colours and plot them
I made a small example script that chooses 20 distinct random colours, and then makes a plot showing the colours and their hex values. The script on github (mentioned above) gives RGB codes for the colours, and I convert these to hex values for my script. (The script also worked fine when I just plotted the RGB values directly, but I wanted to use the hex values as labels for the colours in the picture produced.)

Visualization of random colors.

Simple plot example with the 20 random colors and its visual representation.
# Based on and

from __future__ import (absolute_import, division, print_function,

import numpy as np
import matplotlib.pyplot as plt
import random
import sys
from matplotlib import colors


# function to create a random colour, copied from

def get_random_color(pastel_factor = 0.5):
    return [(x+pastel_factor)/(1.0+pastel_factor) for x in [random.uniform(0,1.0) for i in [1,2,3]]]


# function to create a random colour, copied from

def color_distance(c1,c2):
    return sum([abs(x[0]-x[1]) for x in zip(c1,c2)])


# function to create a random colour, copied from

def generate_new_color(existing_colors,pastel_factor = 0.5):
    max_distance = None
    best_color = None
    for i in range(0,100):
        color = get_random_color(pastel_factor = pastel_factor)
        if not existing_colors:
            return color
        best_distance = min([color_distance(color,c) for c in existing_colors])
        if not max_distance or best_distance > max_distance:
            max_distance = best_distance
            best_color = color
    return best_color


# choose 20 random colours:
hex_ = []
mycolours = []
names = []
for i in range(1,21):
    mycolour = generate_new_color(mycolours,pastel_factor = 0.9)
    myhex = colors.rgb2hex(mycolour)

n = len(hex_)
ncols = 4
nrows = int(np.ceil(1. * n / ncols))

fig, ax = plt.subplots()

X, Y = fig.get_dpi() * fig.get_size_inches()

# row height
h = Y / (nrows + 1)
# col width
w = X / ncols

for (i, color) in enumerate(hex_):
    name = names[i]
    col = i % ncols
    row = int(i / ncols)
    y = Y - (row * h) - h

    xi_line = w * (col + 0.05)
    xf_line = w * (col + 0.25)
    xi_text = w * (col + 0.3)

    ax.text(xi_text, y, name, fontsize=(h * 0.075),

    # Add extra black line a little bit thicker to make
    # clear colors more visible.
    ax.hlines(y, xi_line, xf_line, color='black', linewidth=(h * 0.7))
    ax.hlines(y + h * 0.1, xi_line, xf_line, color=color, linewidth=(h * 0.6))

ax.set_xlim(0, X)
ax.set_ylim(0, Y)

fig.subplots_adjust(left=0, right=1,
                    top=1, bottom=0,
                    hspace=0, wspace=0)

Here is the picture it makes, called 'random_colour_key.png':

No comments: