Friday 22 December 2017

Finding all occurrences of a subsequence in a sequence

I wanted to find all the occurrences of a subsequence in a genome assembly. I first tried BLAT, but it didn't find them for me (I'm not sure why).

So I instead wrote a little Python function to print out all the positions of a subsequence in a sequence:

#====================================================================#

# find the positions of a subsequence in a sequence:

def find_positions_of_subsequence(seq, subsequence, seqname):
    """Print the 1-based start position of every occurrence
    (including overlapping ones) of subsequence in seq."""
    start = 0
    while True:
        # no end argument: str.find's end is exclusive, so the old
        # end = len(seq) - 1 missed a match ending at the last base
        position = seq.find(subsequence, start)
        if position == -1:
            break
        # report 1-based positions, as is usual for genome coordinates
        print("Found at %d in %s" % (position + 1, seqname))
        # step one base past this hit so overlapping matches are found
        start = position + 1

#====================================================================#
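If you want to reuse the positions rather than just print them, and run the search over every sequence in an assembly, here is a rough sketch along the same lines. It assumes the assembly is in FASTA format and that soft-masked (lower-case) bases should match too, so everything is upper-cased; `iter_positions`, `read_fasta` and `find_in_assembly` are hypothetical helper names, not part of any library.

```python
from typing import Dict, Iterable, Iterator, List


def iter_positions(seq: str, subsequence: str) -> Iterator[int]:
    """Yield the 1-based start position of every (possibly overlapping)
    occurrence of subsequence in seq."""
    if not subsequence:
        return  # an empty pattern would "match" everywhere
    start = 0
    while True:
        position = seq.find(subsequence, start)
        if position == -1:
            return
        yield position + 1
        start = position + 1  # step one base on so overlaps are found


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Minimal FASTA reader (hypothetical helper): maps the first word of
    each '>' header to its concatenated, upper-cased sequence."""
    sequences: Dict[str, str] = {}
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line.upper())  # assumption: ignore soft-masking
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def find_in_assembly(lines: Iterable[str], subsequence: str) -> Dict[str, List[int]]:
    """Return {sequence name: [1-based positions]} for every sequence
    that contains subsequence."""
    hits: Dict[str, List[int]] = {}
    for name, seq in read_fasta(lines).items():
        positions = list(iter_positions(seq, subsequence.upper()))
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    assembly = [">scaffold_1 some description", "ACGTAAAC", "GTAAA",
                ">scaffold_2", "tttt"]
    print(find_in_assembly(assembly, "AAA"))  # {'scaffold_1': [5, 11]}
```

Because `find_in_assembly` takes any iterable of lines, an open file handle works too, e.g. `with open("assembly.fa") as handle: find_in_assembly(handle, "GATC")` (the file name here is just a placeholder). This only searches the strand as given; a match on the reverse strand would need a separate search for the reverse complement.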


Python saves the day!
