I have only just got my head around BAM flags, using this useful page.
What we have is:
Hexadecimal  Decimal  Meaning
0x0001       1        Read paired
0x0002       2        Read mapped in proper pair
0x0003       3        Read paired, mapped in proper pair
0x0004       4        Read unmapped
0x0005       5        Read paired, unmapped
0x0006       6        Read mapped in proper pair, unmapped
0x0007       7        Read paired, in proper pair, unmapped
0x0008       8        Mate unmapped
0x0009       9        Read paired, mate unmapped
0x000A       10       Mate unmapped, mapped in proper pair
0x000B       11       Read paired, mate unmapped, proper pair
0x000C       12       Read unmapped, mate unmapped
0x000D       13       Read paired, read & mate unmapped
0x000E       14       Proper pair, read & mate unmapped
0x000F       15       Proper pair, read & mate unmapped, paired
0x0010       16       Read reverse strand
0x0011       17       Read paired, read reverse strand
...
0x0040       64       First in pair
0x0045       69       Read paired, read unmapped, first in pair
...
0x0080       128      Second in pair
...
0x0085       133      Read paired, read unmapped, second in pair
Note that not all of these combinations make sense, e.g. 'read unmapped' and 'read mapped in proper pair' cannot both be true at once, so we shouldn't see flag 6.
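Each flag is just a sum of the single-bit values, so any flag can be decoded by testing each bit in turn. Here is a minimal Python sketch of that idea. The function names decode_flag and is_contradictory are my own (hypothetical), and the bit values above 0x0010 that are not shown in the table are taken from the SAM specification.

```python
# Single-bit values of the SAM/BAM FLAG field and their meanings.
# Bits 0x20 and 0x100 to 0x800 are not listed in the table above; they are
# taken from the SAM specification.
FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag):
    """Return the meanings of all bits that are set in the flag."""
    return [name for bit, name in FLAG_BITS if flag & bit]


def is_contradictory(flag):
    """True if the flag says 'proper pair' for an unmapped read (e.g. flag 6)."""
    return bool(flag & 0x2) and bool(flag & 0x4)


if __name__ == "__main__":
    for flag in (6, 69, 99, 147):
        print(flag, hex(flag), decode_flag(flag), is_contradictory(flag))
```

For example, decode_flag(69) gives 'read paired', 'read unmapped' and 'first in pair', matching the table row for 0x0045.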
There are some useful flags listed here.
Flags to find read-pairs mapped with certain orientation
The flags 99, 147, 83, and 163 mean 'mapped in correct orientation and within insert size':
(Note: 'first in pair' does not refer to the order of the two reads on the scaffold. The insert size is positive for the leftmost read of a pair and negative for the rightmost read, whichever of the two is the 'first read'.)
[Note: 1-Oct-2013: this presentation by Pierre Lindenbaum says that 'first read' means that the read came from the 1.fastq file, and 'second read' means it came from the 2.fastq file, given to the mapping algorithm.]
83 Read paired, proper pair, first in pair, read reverse strand [<--- --->] : has insert size x1, "outties"
83 Read paired, proper pair, first in pair, read reverse strand [---> <---] : has insert size -x1, "innies"
163 Read paired, proper pair, second in pair, mate reverse strand [---> <---] : has insert size x2, "innies"
163 Read paired, proper pair, second in pair, mate reverse strand [<--- --->] : has insert size -x2, "outties"
99 Read paired, proper pair, first in pair, mate reverse strand [---> <---] : has insert size x3, "innies"
99 Read paired, proper pair, first in pair, mate reverse strand [<--- --->] : has insert size -x3, "outties"
147 Read paired, proper pair, second in pair, read reverse strand [<--- --->] : has insert size x4, "outties"
147 Read paired, proper pair, second in pair, read reverse strand [---> <---] : has insert size -x4, "innies"
That is, to find "innies", we could look for reads with flags 99 or 163 (read on the plus strand) and positive insert size, or for reads with flags 83 or 147 (read on the reverse strand) and negative insert size.
Likewise, to find "outties", we could look for reads with flags 83 or 147 and positive insert size, or for reads with flags 99 or 163 and negative insert size.
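The orientation of a pair can also be worked out directly from the strand bits and the sign of the insert size, rather than from a list of flag values. Below is a minimal Python sketch, assuming the SAM convention that the leftmost read of a pair has a positive insert size (TLEN) and the rightmost read a negative one. The function name pair_orientation is hypothetical, and pairs with insert size 0 are simply left unclassified.

```python
def pair_orientation(flag, tlen):
    """Classify a properly paired read as 'innie', 'outtie' or 'same strand'.

    Returns None if the read is not in a proper pair, or if the insert size
    is 0 (so we cannot tell which read is leftmost).
    """
    paired_and_proper = bool(flag & 0x1) and bool(flag & 0x2)
    if not paired_and_proper or tlen == 0:
        return None

    read_reverse = bool(flag & 0x10)
    mate_reverse = bool(flag & 0x20)
    if read_reverse == mate_reverse:
        return "same strand"  # [---> --->] or [<--- <---]

    # Assumption: positive insert size means this read is the leftmost one,
    # negative insert size means its mate is the leftmost one.
    leftmost_is_reverse = read_reverse if tlen > 0 else mate_reverse

    # Leftmost read on the reverse strand: [<--- --->], otherwise [---> <---].
    return "outtie" if leftmost_is_reverse else "innie"


if __name__ == "__main__":
    for flag, tlen in [(99, 250), (147, -250), (83, 250), (163, -250)]:
        print(flag, tlen, pair_orientation(flag, tlen))
```

Both reads of a pair get the same answer this way: a read with flag 99 and insert size 250 and its mate with flag 147 and insert size -250 are both classified as "innie".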
From this link, we find that the flags 67, 131, 115 and 179 mean 'mapped within insert size but wrong orientation':
67 Read paired, first in pair, proper pair, both read and mate on plus strand [---> --->]
131 Read paired, second in pair, proper pair, both read and mate on plus strand [---> --->]
115 Read paired, proper pair, read reverse strand, mate reverse strand, first in pair [<--- <---]
179 Read paired, proper pair, read reverse strand, mate reverse strand, second in pair [<--- <---]
Selecting reads with certain flags using Samtools
You can use Samtools to select reads with certain flags from a BAM file. For example, to identify reads that are part of properly paired read-pairs mapped on opposite strands (the candidates for "innies" [---> <---]), we need to select reads with flags 99 or 147 or 83 or 163 (see above), and then check the sign of the insert size.
We can do this by typing:
% samtools view -f99 in.bam > out.sam
% samtools view -f147 in.bam >> out.sam
% samtools view -f83 in.bam >> out.sam
% samtools view -f163 in.bam >> out.sam
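[Note: you have to run samtools separately to get each of the flags; you can't run 'samtools view -f99 -f147 -f83 -f163'. Also, -f keeps every read that has all of the given bits set, so -f99 also returns reads that have extra bits set, such as flag 115 (99 + 16). To keep only reads whose flag is exactly 99, you can also exclude all the other bits with -F, e.g. 'samtools view -f99 -F3996' (3996 = 4095 - 99).]
You can alternatively use the hexadecimal values of 99, 147, 83 and 163 (0x63, 0x93, 0x53 and 0xA3):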
% samtools view -f0x63 in.bam > out.sam
% samtools view -f0x93 in.bam >> out.sam
% samtools view -f0x53 in.bam >> out.sam
% samtools view -f0xA3 in.bam >> out.sam
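An alternative to running samtools four times is to print the alignments once as text (with 'samtools view in.bam') and filter on the flag column, which is the second tab-separated field of each SAM line. Here is a minimal Python sketch of such a filter. The names filter_sam and has_all_bits are hypothetical, the example lines are made up for illustration, and the filter keeps only reads whose flag is exactly one of the wanted values.

```python
WANTED_FLAGS = {99, 147, 83, 163}


def has_all_bits(flag, required):
    """The test behind 'samtools view -f': every bit of `required` is set."""
    return (flag & required) == required


def filter_sam(lines, wanted=WANTED_FLAGS):
    """Yield SAM alignment lines whose FLAG (column 2) is exactly in `wanted`."""
    for line in lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.split("\t")
        if len(fields) > 1 and fields[1].isdigit() and int(fields[1]) in wanted:
            yield line


# Made-up SAM lines, for illustration only.
EXAMPLE_LINES = [
    "@HD\tVN:1.6",
    "r1\t99\tscaff1\t100\t60\t5M\t=\t300\t205\tACGTA\tIIIII",
    "r1\t147\tscaff1\t300\t60\t5M\t=\t100\t-205\tACGTA\tIIIII",
    "r2\t115\tscaff1\t500\t60\t5M\t=\t700\t205\tACGTA\tIIIII",
    "r2\t179\tscaff1\t700\t60\t5M\t=\t500\t-205\tACGTA\tIIIII",
]

if __name__ == "__main__":
    for line in filter_sam(EXAMPLE_LINES):
        print(line)
    # Flag 115 has all the bits of 99 set, so '-f99' alone would keep it.
    print(has_all_bits(115, 99))
```

To use this on real data, filter_sam could be given sys.stdin instead of EXAMPLE_LINES, with the output of 'samtools view in.bam' piped into the script.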
Further reading
Here is a nice presentation by Pierre Lindenbaum about the SAM, BAM and VCF formats.
Some other pages on filtering using flags:
Biostars: how to filter mapped reads using samtools
Biostars: how to extract read-pairs mapped concordantly exactly one time
Deeply understanding sam tags
1 comment:
Thank you for this blog post, which helped me to work out SAM flags.
However, after investigating further to find outties and innies, I must warn you that you should correct your post as follows:
innies are found with f99 and insertSize +, f147 and insertSize +, f83 and insertSize -, f163 and insertSize -,
while outties are found with f99 and insertSize -, f147 and insertSize +, f83 and insertSize +, f163 and insertSize -