In the SFF files output by 454 sequencers, each mate pair appears as a single sequence, with the two mated reads separated by a known 'linker' sequence.
Two different linker sequences are in use, 'FLX' and 'Titanium':
(i) FLX: GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC, a palindrome, equal to its own reverse complement,
(ii) Titanium: TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG
and the reverse-complement CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA.
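The reverse-complement relationships stated above can be checked with a few lines of plain Python. This is just a small sanity check, not part of the script described in this post; the function name reverse_complement and the constant names are my own choices for illustration.

```python
# Translation table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Linker sequences copied from the list above.
FLX_LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"
TI_LINKER = "TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG"
TI_LINKER_RC = "CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA"

# The FLX linker should equal its own reverse complement.
print(reverse_complement(FLX_LINKER) == FLX_LINKER)   # True
# The two Titanium sequences should be reverse complements of each other.
print(reverse_complement(TI_LINKER) == TI_LINKER_RC)  # True
```

Because the FLX linker is its own reverse complement, only one FLX string needs to be searched for, whereas both Titanium strings have to be checked.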
If you want to analyse the 454 reads (for example, using sffToCA to convert the 454 reads to FRG format), you might need to know the linker sequence. But what if you don't know whether the FLX or Titanium linker was used?
I've written a little Python script (using the Biopython SFF parser) to find out, parse_sff_for_454_linkers.py. It takes the first 100,000 reads from the SFF file, checks whether each read contains the FLX or the Titanium linker, and prints out how many reads have each type. You should see that almost all the linker-containing reads have the same type of linker, which tells you which type was used, for example:
% python3 ~alc/Documents/git/Python/parse_sff_for_454_linkers.py GW3JGXI01.sff
read_cnt: 100000 , flx_cnt: 0 , ti_cnt: 22173
Here it looks like my SFF file has Titanium linkers in the reads.
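For anyone who wants to try something similar, here is a minimal sketch of the counting step. It is not the actual parse_sff_for_454_linkers.py, just an illustration under some assumptions of mine: linkers are found by exact substring match (so linkers containing sequencing errors are missed), reads are upper-cased before matching, and the function names count_linkers and sff_sequences are hypothetical. The only Biopython call used is SeqIO.parse with the "sff" format.

```python
from itertools import islice

FLX_LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"
TI_LINKER = "TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG"
TI_LINKER_RC = "CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA"


def count_linkers(seqs, max_reads=100000):
    """Count reads containing each linker among the first max_reads reads.

    Assumption: exact substring matching, so linkers with sequencing
    errors are not counted.
    """
    read_cnt = flx_cnt = ti_cnt = 0
    for seq in islice(seqs, max_reads):
        seq = str(seq).upper()  # ignore upper/lower case in the read
        read_cnt += 1
        if FLX_LINKER in seq:   # FLX is its own reverse complement
            flx_cnt += 1
        if TI_LINKER in seq or TI_LINKER_RC in seq:
            ti_cnt += 1
    return read_cnt, flx_cnt, ti_cnt


def sff_sequences(sff_path):
    """Yield read sequences from an SFF file (requires Biopython)."""
    from Bio import SeqIO  # imported here so count_linkers works without it
    with open(sff_path, "rb") as handle:  # SFF is a binary format
        for record in SeqIO.parse(handle, "sff"):
            yield record.seq


# Toy reads, made up for illustration, so the sketch runs without an SFF file.
toy_reads = [
    "ACGT" + TI_LINKER + "TTGA",
    "acgt" + FLX_LINKER.lower() + "gg",
    "ACGTACGT",
    "GG" + TI_LINKER_RC + "CC",
]
print(count_linkers(toy_reads))  # (4, 1, 2)
```

On a real file you would call count_linkers(sff_sequences("GW3JGXI01.sff")) and compare the second and third numbers of the result.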