A CNV detection algorithm based on statistical analysis of the aligned reads 


Vol. 16, No. 5, pp. 661-672, Oct. 2009
10.3745/KIPSTD.2009.16.5.661


  Abstract

Recent studies have found that various genetic structural variations, such as copy number variations (CNVs), exist in the human genome, and that these variations are closely related to disease susceptibility, response to treatment, and genetic characteristics. In this paper we propose a new CNV detection algorithm that uses millions of short DNA sequences (reads) generated by giga-sequencing technology. Our method maps the reads onto the reference sequence and obtains the occurrence frequency of each read in the reference sequence. It then analyzes the distribution of the occurrence frequency and reports statistically significant regions longer than 1 kbp as candidate CNV regions. To select a suitable read alignment method, we applied several alignment methods within our algorithm and compared their performance. To validate our approach, we performed extensive experiments. The results of simulation experiments (using build 35 of the NCBI reference sequence) showed that our approach successfully finds all the CNV regions, regardless of their shape or length (small, intermediate, or large).
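The detection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the reads have already been aligned and summarized as a per-base depth list, and the window size, the z-score threshold, and the use of the median and MAD as the baseline are all illustrative assumptions. The function name `candidate_cnv_regions` and its parameters are hypothetical; only the 1 kbp minimum length comes from the abstract.

```python
from statistics import median


def candidate_cnv_regions(depth, window=100, z_threshold=3.0, min_length=1000):
    """Return (start, end, kind) for runs of windows with unusual read depth.

    depth: per-base read counts along the reference (assumed precomputed).
    window, z_threshold: illustrative choices, not values from the paper.
    min_length: minimum region length in bases (1 kbp, as in the abstract).
    """
    # Mean depth of each full, non-overlapping window.
    n_windows = len(depth) // window
    means = [sum(depth[i * window:(i + 1) * window]) / window
             for i in range(n_windows)]
    if not means:
        return []

    # Robust baseline (median and scaled MAD), so that the CNV windows
    # themselves do not inflate the estimate of normal depth.
    centre = median(means)
    mad = median(abs(m - centre) for m in means)
    scale = 1.4826 * mad or 1.0  # fall back to 1.0 when the profile is flat

    def label(mean_depth):
        z = (mean_depth - centre) / scale
        if z > z_threshold:
            return "gain"
        if z < -z_threshold:
            return "loss"
        return None

    # Merge consecutive windows with the same label and keep long runs only.
    regions = []
    run_start, run_kind = 0, None
    for i, m in enumerate(means + [centre]):  # sentinel closes the last run
        kind = label(m)
        if kind != run_kind:
            if run_kind is not None and (i - run_start) * window >= min_length:
                regions.append((run_start * window, i * window, run_kind))
            run_start, run_kind = i, kind
    return regions
```

Coordinates are 0-based and half-open. A real pipeline would replace the simple z-score with whatever significance test suits the observed depth distribution, and would likely correct for mappability and GC content before thresholding.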

  Cite this article

[IEEE Style]

S. K. Hong, D. W. Hong, J. H. Yoon, B. S. Kim, S. H. Park, "A CNV detection algorithm based on statistical analysis of the aligned reads," The KIPS Transactions:PartD, vol. 16, no. 5, pp. 661-672, 2009. DOI: 10.3745/KIPSTD.2009.16.5.661.

[ACM Style]

Sang Kyoon Hong, Dong Wan Hong, Jee Hee Yoon, Baek Sop Kim, and Sang Hyun Park. 2009. A CNV detection algorithm based on statistical analysis of the aligned reads. The KIPS Transactions:PartD, 16, 5, (2009), 661-672. DOI: 10.3745/KIPSTD.2009.16.5.661.