Cell-line-specific heterozygous mutations have a reference base frequency of around 50% in the samples from that cell line, and around 100% in all other samples. So the average reference base frequency is around 50% among the cell line samples and around 100% among the other samples. Because every sample has a large number of reads, these averages are tightly concentrated around those values, so the distributions are narrow.
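A minimal sketch of how the two averages could be computed, assuming per-position reference read counts and total depths are available as (positions × samples) arrays. The function name `mean_ref_freq` and the input layout are hypothetical, not taken from the notebook. It averages per-sample frequencies, which matches "average reference base frequency among the samples"; pooling reads (sum of reference counts divided by sum of depths) would be a reasonable alternative.

```python
import numpy as np


def mean_ref_freq(ref_counts, depths, in_line):
    """Average reference-base frequency per position, split into two sample groups.

    ref_counts, depths: arrays of shape (n_positions, n_samples) holding the
        number of reads with the reference base and the total read depth.
    in_line: boolean array of shape (n_samples,), True for samples of the cell line.
    Returns (other_mean, line_mean), each of shape (n_positions,).
    """
    ref_counts = np.asarray(ref_counts, dtype=float)
    depths = np.asarray(depths, dtype=float)
    in_line = np.asarray(in_line, dtype=bool)
    # Per-sample reference frequency; zero-depth entries become NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        freq = np.where(depths > 0, ref_counts / depths, np.nan)
    # Average over samples, ignoring NaN (uncovered) entries.
    other_mean = np.nanmean(freq[:, ~in_line], axis=1)
    line_mean = np.nanmean(freq[:, in_line], axis=1)
    return other_mean, line_mean


# Toy input: 2 positions x 4 samples; the first two samples are the cell line.
ref = [[50, 48, 100, 99],
       [100, 100, 100, 100]]
dep = [[100, 100, 100, 100],
       [100, 100, 100, 100]]
other, line = mean_ref_freq(ref, dep, [True, True, False, False])
print(other, line)
```

On the narrowness of the distributions: under a simple binomial model (an assumption that ignores mapping and allelic bias), the per-sample frequency at a heterozygous site with depth d has a standard deviation of about sqrt(0.25/d), and averaging over k such samples shrinks it to about sqrt(0.25/(k·d)). For example, d = 100 and k = 4 give 0.025.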
To see these mutations, I will plot almost all positions on the (average reference base frequency in the other samples, average reference base frequency in the cell line samples) plane. Cell-line-specific mutations are expected to lie in the middle of the right edge, near (1, 0.5), clearly separated from all other positions.
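A sketch of how the positions near (1, 0.5) could be picked out once the averages are computed. The rectangular window and its thresholds (`other_min`, `line_lo`, `line_hi`) are hypothetical illustrative values, not cut-offs used in this analysis; in practice they would be read off the plot, which could itself be drawn with matplotlib's `plt.scatter(other_mean, line_mean)`.

```python
import numpy as np


def line_specific_candidates(other_mean, line_mean,
                             other_min=0.95, line_lo=0.35, line_hi=0.65):
    """Indices of positions inside a box around (1, 0.5) on the
    (other-samples mean, cell-line mean) plane.

    The default thresholds are illustrative, not tuned values.
    """
    other_mean = np.asarray(other_mean, dtype=float)
    line_mean = np.asarray(line_mean, dtype=float)
    # Comparisons with NaN are False, so uncovered positions are never selected.
    mask = ((other_mean >= other_min)
            & (line_mean >= line_lo)
            & (line_mean <= line_hi))
    return np.flatnonzero(mask)


other = [0.995, 1.0, 0.50, 0.99]
line = [0.49, 1.0, 0.50, 0.02]
print(line_specific_candidates(other, line))
```

A box is the simplest choice because the expected cluster is well separated; if it were not, a distance-based criterion around (1, 0.5) would be a natural alternative.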
Because these mutations are identified with a very simple and robust method, they are good candidates for testing the sensitivity of mutation calling methods.
Looking at the 'phylogeny' tree of samples, we can see that the PCNA, BRCA2 homozygous and BRCA1 homozygous cell lines are separated from each other by large distances. These cell lines will therefore be the most useful for finding line-specific mutations.
These statistics aim to assess the sensitivity of our method on the test set of SNPs.
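As a sketch of the statistic itself: sensitivity is the fraction of test-set SNPs that the method reports, TP / (TP + FN). This assumes SNPs are matched on a hashable key such as a (chromosome, position) tuple; matching on the alternative base as well would be a stricter variant. The function name `sensitivity` and the example positions are hypothetical.

```python
def sensitivity(test_snps, called_snps):
    """Fraction of test-set SNPs that the caller reports: TP / (TP + FN).

    SNPs are compared as hashable keys, e.g. (chromosome, position) tuples.
    """
    test = set(test_snps)
    if not test:
        raise ValueError("the test set of SNPs is empty")
    detected = test & set(called_snps)
    return len(detected) / len(test)


test_set = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr3", 7)}
calls = {("chr1", 100), ("chr2", 50), ("chr5", 1)}
print(sensitivity(test_set, calls))  # prints 0.5
```

Calls outside the test set (here `("chr5", 1)`) do not affect sensitivity; they would matter only for a precision-type statistic.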