Text
A Tour of Discrete Probability Guided by a Problem in Genomics
The classic binomial, geometric, negative binomial, and hypergeometric distributions differ in their mathematical form and in the nature of the underlying random experiments. In this article, we discuss a unifying framework for these distributions that comes from an unlikely source: computational genomics. One important problem in genomics is to find all protein-coding genes. A mathematical/computational solution to this problem begins with identifying open reading frames (ORFs) and testing, for each one, the hypothesis that it belongs to the non-coding region of the genome, modeled as a randomly and independently assembled segment of DNA. Rejection of this hypothesis with a high degree of certainty increases the likelihood that the ORF in question is an actual gene. To test the above hypothesis, one has to compute the distribution of the number and length of ORFs in a long sequence of non-coding DNA. This computation leads naturally to the above-mentioned discrete probability distributions or their analogs and reveals various relationships between them through conditioning and compounding. The computation of the ORF length distribution is based on the important, yet rarely visited, negative hypergeometric distribution. Although conceptually related to the binomial and negative binomial distributions, it is structurally similar to the hypergeometric distribution.
Barcode | Collection Type | Call Number | Location | Status | |
---|---|---|---|---|---|
art136236 | null | Article | Gdg9-Lt3 | Available but not for loan - No Loan |
No other versions available