Wednesday 15 May 2013

BAM and SAM flags

I have only just got my head around BAM flags, using this useful page.

What we have is:
Hexadecimal   Decimal   Meaning
0x0001        1         Read paired
0x0002        2         Read mapped in proper pair
0x0003        3         Read paired, mapped in proper pair
0x0004        4         Read unmapped
0x0005        5         Read paired, unmapped
0x0006        6         Read mapped in proper pair, unmapped
0x0007        7         Read paired, in proper pair, unmapped
0x0008        8         Mate unmapped
0x0009        9         Read paired, mate unmapped
0x000A        10        Mate unmapped, mapped in proper pair
0x000B        11        Read paired, mate unmapped, proper pair
0x000C        12        Read unmapped, mate unmapped
0x000D        13        Read paired, read & mate unmapped
0x000E        14        Proper pair, read & mate unmapped
0x000F        15        Proper pair, read & mate unmapped, paired
0x0010        16        Read reverse strand
0x0011        17        Read paired, read reverse strand
...
0x0040        64        First in pair
0x0045        69        Read paired, read unmapped, first in pair
...
0x0080        128       Second in pair
...
0x0085        133       Read paired, read unmapped, second in pair

Note that not all of these combinations make sense. For example, 'read unmapped' and 'read mapped in proper pair' cannot both be true at once, so we shouldn't see flag 6.
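
Since each flag value is just the sum of the single-bit values in the table, a flag can be decoded by testing one bit at a time. Here is a minimal shell sketch of that idea. The function names decode_flag and flag_is_contradictory are made up for this example, and only the eight bits discussed in this post are covered (the SAM format defines further bits that are ignored here).

```shell
# decode_flag FLAG: print the meanings of the bits set in FLAG.
# FLAG may be decimal (99) or hexadecimal (0x63).
# The body runs in a subshell ( ... ) so its variables stay local.
decode_flag() (
    flag=$(( $1 ))
    bit=1
    out=""
    for name in "read paired" "proper pair" "read unmapped" "mate unmapped" \
                "read reverse strand" "mate reverse strand" \
                "first in pair" "second in pair"; do
        # Test the current bit, then move on to the next power of two.
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out, }$name"
        fi
        bit=$(( bit * 2 ))
    done
    echo "$out"
)

# flag_is_contradictory FLAG: succeed if FLAG claims both
# 'proper pair' (0x2) and 'read unmapped' (0x4), as flag 6 does.
flag_is_contradictory() {
    [ $(( $1 & 0x6 )) -eq 6 ]
}

# Example calls:
#   decode_flag 99
#   flag_is_contradictory 6 && echo "flag 6 should not occur"
```

For example, 'decode_flag 99' lists the four bits 1, 2, 32 and 64 by name.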

There are some useful flags listed here.

Flags to find read-pairs mapped with certain orientation
The flags 99, 147, 83 and 163 mean 'mapped in correct orientation and within insert size':
(Note: 'first in pair' seems to mean the first read mapped of a pair, and does not refer to the order of the two reads on the scaffold. Also, the insert size is signed per read: it is positive for the read of a pair that is mapped furthest to the left, and negative for the read that is mapped furthest to the right.)
[Note: 1-Oct-2013: actually, this presentation by Pierre Lindenbaum says that 'first read' means that the read came from the 1.fastq file, and 'second read' means it came from the 2.fastq file, given to the mapping algorithm.]
83           Read paired, proper pair, first in pair, read reverse strand [<--- --->] : has insert size x1, "outties"
83           Read paired, proper pair, first in pair, read reverse strand [---> <---] : has insert size -x1, "innies"
163         Read paired, proper pair, second in pair, mate reverse strand [---> <---] : has insert size x2, "innies"
163         Read paired, proper pair, second in pair, mate reverse strand [<--- --->] : has insert size -x2, "outties"
99           Read paired, proper pair, first in pair, mate reverse strand [---> <---] : has insert size x3, "innies"
99           Read paired, proper pair, first in pair, mate reverse strand [<--- --->] : has insert size -x3, "outties"
147         Read paired, proper pair, second in pair, read reverse strand [<--- --->] : has insert size x4, "outties"
147         Read paired, proper pair, second in pair, read reverse strand [---> <---] : has insert size -x4, "innies"

That is, to find "innies", we could look for reads with flags 83 or 147 and negative insert size, or for reads with flags 99 or 163 and positive insert size.

Likewise, to find "outties", we could look for reads with flags 83 or 147 and positive insert size, or for reads with flags 99 or 163 and negative insert size.
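
Rather than memorising a flag-by-flag lookup, the orientation can be worked out from the two strand bits (16 and 32) and the sign of the insert size. The sketch below assumes the SAM convention that the insert size (column 9, TLEN) is positive for the leftmost read of a pair and negative for the rightmost one, and that both reads map to the same scaffold. The function name classify_orientation and the labels it prints are made up for this example.

```shell
# classify_orientation: read SAM records on stdin and print
# read name, flag, insert size and an orientation label.
classify_orientation() {
    awk -F'\t' '
        /^@/ { next }                        # skip SAM header lines
        {
            flag = $2 + 0
            tlen = $9 + 0
            read_rev = int(flag / 16) % 2    # bit 16: read on reverse strand
            mate_rev = int(flag / 32) % 2    # bit 32: mate on reverse strand
            if (read_rev == mate_rev)
                label = "same-strand"        # [---> --->] or [<--- <---]
            else if (tlen == 0)
                label = "unknown"            # no insert size reported
            else if ((tlen > 0) == (read_rev == 0))
                label = "innie"              # forward read is the leftmost one
            else
                label = "outtie"             # reverse read is the leftmost one
            print $1 "\t" flag "\t" tlen "\t" label
        }'
}

# Example call (assumes samtools is installed and in.bam exists):
#   samtools view in.bam | classify_orientation
```

Plain awk has no portable bitwise operators, which is why the bits are tested with integer division and modulo.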

From this link, we find that the flags 67, 131, 115 and 179 mean 'mapped within insert size but wrong orientation':
67           Read paired, first in pair, proper pair, both read and mate on plus strand [---> --->]
131         Read paired, second in pair, proper pair, both read and mate on plus strand [---> --->]
115         Read paired, proper pair, read reverse strand, mate reverse strand, first in pair [<--- <---]
179         Read paired, proper pair, read reverse strand, mate reverse strand, second in pair [<--- <---]
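
These four values can be checked by combining the single-bit values from the table at the top of the post. The short sketch below does that with shell arithmetic; the variable names are made up for this example.

```shell
# Single-bit values from the flag table.
PAIRED=0x1
PROPER_PAIR=0x2
READ_REVERSE=0x10
MATE_REVERSE=0x20
FIRST=0x40
SECOND=0x80

# Both reads on the plus strand: neither reverse-strand bit is set.
both_forward_first=$((  $PAIRED | $PROPER_PAIR | $FIRST ))
both_forward_second=$(( $PAIRED | $PROPER_PAIR | $SECOND ))

# Both reads on the minus strand: both reverse-strand bits are set.
both_reverse_first=$((  $PAIRED | $PROPER_PAIR | $READ_REVERSE | $MATE_REVERSE | $FIRST ))
both_reverse_second=$(( $PAIRED | $PROPER_PAIR | $READ_REVERSE | $MATE_REVERSE | $SECOND ))

echo "$both_forward_first $both_forward_second $both_reverse_first $both_reverse_second"
# prints: 67 131 115 179
```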

Selecting reads with certain flags using Samtools
You can use Samtools to select reads with certain flags from a BAM file. For example, to identify all reads that are part of read-pairs mapped as "innies" [---> <---], we first need to select reads with flags 99 or 147 or 83 or 163 (see above); the "innies" are then the subset of those reads with the appropriate sign of insert size.

We can do this by typing:
% samtools view -f99 in.bam > out.sam 
% samtools view -f147 in.bam >> out.sam
% samtools view -f83 in.bam >> out.sam
% samtools view -f163 in.bam >> out.sam

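[Note: you have to run samtools separately to get each of the flags; you can't run 'samtools view -f99 -f147 -f83 -f163'. Also note that -f keeps every read that has all of the given bits set, so -f99 also returns reads that have extra bits set, such as flag 115.]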

You can alternatively use the hexadecimal of 99 and 147 and 83 and 163 (0x63 and 0x93 and 0x53 and 0xA3):
% samtools view -f0x63 in.bam > out.sam
% samtools view -f0x93 in.bam >> out.sam
% samtools view -f0x53 in.bam >> out.sam
% samtools view -f0xA3 in.bam >> out.sam
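
The four passes above can also be collapsed into one by filtering on the flag column (column 2) of the SAM text directly. This is only a sketch: select_flags is a made-up helper, it expects decimal flag values, and it matches the flag exactly, which is stricter than -f (a read with flag 115 has all the bits of 99 set and so passes -f99, but it is dropped here).

```shell
# select_flags FLAG...: read SAM on stdin, keep header lines and
# records whose flag (column 2) equals one of the given decimal values.
select_flags() {
    awk -F'\t' -v wanted="$*" '
        BEGIN {
            # Turn the space-separated list into a lookup table.
            n = split(wanted, w, " ")
            for (i = 1; i <= n; i++) keep[w[i]] = 1
        }
        /^@/ { print; next }      # pass SAM header lines through
        ($2 in keep)              # print records with a wanted flag
    '
}

# Example call (assumes samtools is installed and in.bam exists):
#   samtools view -h in.bam | select_flags 99 147 83 163 > out.sam
```

Because the header is passed through, out.sam keeps its @ lines as long as the input was produced with 'samtools view -h'.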

Further reading
Here is a nice presentation by Pierre Lindenbaum about sam, bam and vcf format.
Some other pages on filtering using flags:
Biostars: how to filter mapped reads using samtools
Biostars: how to extract read-pairs mapped concordantly exactly one time
Deeply understanding sam tags

1 comment:

Quentin et Thibault said...

Thank you for this blog which helped me to resolved sam flag.
However, after investigating further to find outties and innies, I must warn you that you should correct your post as so:
innies are found with f99 and insertSize +, f147 and insertSize+, f83 and insertSize -, f263 and insertSize -.
while outties are found with f99 and insertSize -, f147 and insertSize +, f83 and insertSize+, f163 and insertSize -