I wanted to find all the occurrences of a subsequence in a genome assembly. I first tried BLAT, but it didn't find them (I'm not sure why).
So I instead wrote a little Python function to print out all the positions of a subsequence in a sequence:
# find the positions of a subsequence in a sequence:
def find_positions_of_subsequence(seq, subsequence, seqname):
    still_searching = True
    start = 0
    end = len(seq)  # the end argument of find() is exclusive
    while still_searching:
        position = seq.find(subsequence, start, end)
        if position == -1:
            # no more matches
            still_searching = False
        else:
            actual_position = position + 1  # report 1-based coordinates
            print("Found at %d in %s" % (actual_position, seqname))
            start = position + 1  # step one base on, so overlapping matches are found
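If you want to do something with the positions rather than just print them, one option is to collect them in a list. Here is a minimal sketch of that idea. The function name find_all_positions and the toy sequence are made up for illustration, and it relies only on Python's built-in str.find():

```python
def find_all_positions(seq, subsequence):
    """Return the 1-based start positions of every match, including overlapping ones."""
    positions = []
    start = 0
    while True:
        # str.find() returns the 0-based index of the next match, or -1 if there is none
        position = seq.find(subsequence, start)
        if position == -1:
            break
        positions.append(position + 1)  # convert to 1-based coordinates
        start = position + 1  # step one base on, so overlapping matches are found
    return positions


# toy sequence (hypothetical), with the last match ending at the final base
toy_seq = "ACGTACGTTACG"
print(find_all_positions(toy_seq, "ACG"))  # prints [1, 5, 10]
```

Note that this only looks for exact matches on the strand you give it; hits on the reverse strand would need a second search with the reverse complement.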
Python saves the day!