Instead of trying to squeeze this explanation into 140 characters, here is a short blog about the issue. I’ve posted the question to the samtools list serve, but no response there.
I am generating consensus sequences from a series of BAM files using the standard samtools command:
samtools mpileup -AIuf my.fasta my.bam | bcftools view -cgI – | vcfutils.pl vcf2fq > my.fq
What happens with the genotype is 50/50 is that instead of calling one of the bases, it instead uses an ambiguity code- R, Y, M, etc.. This is problematic for me, as those polymorphism are interesting, and the downstream work (aligning, testing for selection) cannot handle them.
So:
- Is there a way to force samtools to call an nucleotide rather than an ambiguous base?
- Is there a better way to be generating these consensus sequences from BAM files?