DT40 project
SNP detection method, line specific mutations

What are we doing?

Looking for line-specific mutations by comparing the average reference nucleotide frequencies between cell lines.

Cell line specific heterozygous mutations have a reference base frequency of around 50% in samples of that cell line, and around 100% in all other samples. The average reference base frequency is therefore around 50% among the cell line samples and around 100% among the other samples. Because of the large number of reads across all samples, the distributions of these averages are more localized than those of a single sample.

To see these mutations I will plot almost all positions on an (average refbase freq in other samples, average refbase freq in cell line samples) plane. Cell line specific mutations are expected to be in the middle right, around (1, 0.5), and are expected to be clearly separated from other positions.
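The two plane coordinates can be sketched for a single position as follows. This is only an illustration, not the code used in the notebooks: the function names, the sample names, and the cutoff values (`min_other`, `line_range`) are assumptions chosen for the example.

```python
from statistics import mean


def plane_coordinates(ref_freqs, line_samples):
    """Return (x, y) for one genomic position.

    ref_freqs    : dict mapping sample name -> reference base frequency (0..1)
    line_samples : set of sample names belonging to the cell line of interest
    x is the average over the other samples, y the average over the cell line.
    """
    in_line = [f for s, f in ref_freqs.items() if s in line_samples]
    others = [f for s, f in ref_freqs.items() if s not in line_samples]
    return mean(others), mean(in_line)


def is_candidate(x, y, min_other=0.95, line_range=(0.3, 0.7)):
    """True if the point lies near (1, 0.5).

    The cutoffs are illustrative assumptions, not values from the project.
    """
    low, high = line_range
    return x >= min_other and low <= y <= high


# Hypothetical sample names and frequencies for one position.
freqs = {"line_a": 0.50, "line_b": 0.48, "other_a": 1.00, "other_b": 0.99}
x, y = plane_coordinates(freqs, line_samples={"line_a", "line_b"})
print(x, y, is_candidate(x, y))
```

In practice the separation is judged by eye on the plot, so any fixed cutoff like the one above would need to be checked against the observed distribution.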

These mutations are identified through a very simple and robust method, so they are good candidates for testing the sensitivity of mutation calling methods.

Finding line specific SNVs

Looking at the 'phylogeny' tree of samples, we can see that the PCNA, BRCA2 homozygous, and BRCA1 homozygous cell lines are separated from each other by large distances. These cell lines are therefore the most useful for finding line-specific mutations.

PCNA cell line

doc notebook

BRCA2 homozygous cell line

doc notebook

BRCA1 homozygous cell line

doc notebook

WT cell line

doc notebook

Statistics on line specific SNPs

These statistics try to assess the sensitivity of our method on the test set of SNPs.

statistics on testset, sensitivity of the method
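
As a rough sketch of what such a sensitivity estimate looks like, the fraction of test set positions recovered by a caller can be computed as below. The function name and the `(chromosome, position)` representation are assumptions for illustration; the linked notebook may compute this differently.

```python
def sensitivity(test_set, called):
    """Fraction of test set positions that also appear in the called set.

    test_set : iterable of (chromosome, position) tuples, the line specific SNPs
    called   : iterable of (chromosome, position) tuples reported by a caller
    """
    test_set = set(test_set)
    if not test_set:
        raise ValueError("test set is empty")
    found = test_set & set(called)
    return len(found) / len(test_set)


# Hypothetical positions, for illustration only.
test_positions = [("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr3", 7)]
called_positions = [("chr1", 100), ("chr2", 50), ("chr5", 9)]
print(sensitivity(test_positions, called_positions))
```

Positions that are called but absent from the test set do not affect this number; they only matter for specificity, which is not estimated here.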