Operational Risks: Modeling Analytics
This book serves as an introduction to the statistical analysis of biological sequences using Markovian methods. It is the English version of the text by Robin, Rodolphe, and Schbath (2003). Although the book centers on DNA sequences, most of its lessons apply to more general sequences. The authors claim that little mathematics beyond basic probability theory is needed, but this seems optimistic. Even so, given the material covered, the authors have done a fine job of presenting it in a way that does not require knowledge of measure theory.
Chapters 1 and 2 briefly cover fundamental material on biology and simple sequence models. Chapter 3 introduces Markov chains of order 1 and of order m, and discusses parameter estimation by maximum likelihood. Chapter 4 presents several types of heterogeneous Markov chains, including hidden Markov models, which have become quite popular in computational biology. Chapter 5, the heart of the book, focuses on the statistical properties of the occurrences of a general “word” of finite length in a random sequence. Biologists commonly call these words motifs, and the sequence consists of letters from the four-letter DNA “alphabet.” The chapter presents models for studying both the frequency of a motif along a sequence and the locations of its occurrences. Chapters 6 and 7 apply the material from Chapter 5 to real genomes such as E. coli and H. influenzae. These chapters also give practical advice on competing models and on the choice among approximation methods.
For statisticians with a little background in biology, this book delivers a very readable presentation of how to analyze DNA sequences to determine whether a motif is statistically significant because it is overabundant (or underabundant), in terms of either frequency or location. The book is concise but sufficiently detailed.
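To make the Chapter 3 and Chapter 5 ideas concrete, here is a minimal Python sketch (not code from the book) that fits an order-1 Markov chain by maximum likelihood and compares a motif's observed count with a simple plug-in expected count. The function names, the lowercase `acgt` alphabet, and the use of the empirical letter frequency for the first letter of the word are illustrative assumptions. The book's own estimators and approximations are more refined.

```python
from collections import Counter
from itertools import product

ALPHABET = "acgt"  # assumption: lowercase DNA letters only


def fit_markov1(seq):
    """Maximum-likelihood order-1 transition probabilities:
    pi(a, b) = N(ab) / N(a+), where N(a+) counts a at positions 1..n-1."""
    pair_counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    start_counts = Counter(seq[:-1])
    return {
        (a, b): pair_counts[a + b] / start_counts[a] if start_counts[a] else 0.0
        for a, b in product(ALPHABET, repeat=2)
    }


def count_word(seq, word):
    """Number of (possibly overlapping) occurrences of word in seq."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def expected_count_m1(seq, word, pi):
    """Plug-in expected count of word under the fitted order-1 chain.
    Assumption: the first letter's probability is its empirical frequency."""
    n, k = len(seq), len(word)
    prob = seq.count(word[0]) / n
    for a, b in zip(word, word[1:]):
        prob *= pi[(a, b)]
    return (n - k + 1) * prob


if __name__ == "__main__":
    seq = "acgtacgtgatcgatcgggatc"  # toy sequence, not real genome data
    pi = fit_markov1(seq)
    observed = count_word(seq, "gatc")
    expected = expected_count_m1(seq, "gatc", pi)
    print(observed, round(expected, 3))
```

A ratio of observed to expected count well above 1 only hints at overabundance; judging statistical significance requires the variance calculations and approximation methods that the book develops.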
Biologists without a background in mathematical statistics may find the learning curve a little steep but manageable. The authors’ continual use of practical examples will be greatly appreciated by biologists and statisticians alike. This book is one of a kind, and I recommend it to any statistician interested in learning about DNA sequences and motifs.