On the Algorithmic Tractability of Single Nucleotide Polymorphism (SNP) Analysis and Related Problems
- Art: Diplomarbeit
- Autor: Sebastian Wernicke
- Abgabedatum: September 2003
- Umfang: 134 Seiten
- Dateigröße: 2,3 MB
- Note: 1,0
- Institution / Hochschule: Eberhard Karls Universität Tübingen Deutschland
- ISBN (eBook): 978-3-8324-7481-2
-
ISBN (Paperback) :
978-3-8324-7481-2 P - ISBN (CD) :978-3-8324-7481-2 CD
- Sprache: Englisch
- Prämierung:
- Arbeit zitieren: Wernicke, Sebastian September 2003: On the Algorithmic Tractability of Single Nucleotide Polymorphism (SNP) Analysis and Related Problems, Hamburg: Diplomica Verlag
- Schlagworte: Human Genom, Phylogen, FPT, Graph, Haplotyp
In den Warenkorb
74,00 €
Diplomarbeit von Sebastian Wernicke
Abstract:
This work brings together two areas of science—biology and informatics—that have only recently been connected in the emerging (and vastly growing) research field of Bioinformatics. In order to achieve a common basis for Parts 2 and 3 of this work, Part 1 intends to introduce the computer scientist to the relevant biological background and terminology (Chapter 2), and to familiarize the biologist with the relevant topics from theoretical computer science (Chapter 3).
Chapter 2 first introduces some terminology from the field of genetics, thereby defining SNPs. We then motivate the analysis of SNPs by two applications, i.e. the analysis of evolutionary development and the field of pharmacogenetics. Especially the field of pharmacogenetics is capable of having an enormous impact on medicine and the pharmaceutical industry in the near future by using SNP data to predict the efficacy of medication.
Chapter 3 gives a brief introduction to the field of computational complexity. We will see and motivate how algorithms are analyzed in theoretical computer science. This will lead to the definition of „complexity classes”, introducing the class NP which includes computationally hard problems. Some of the hard problems in the class NP can be solved efficiently using the tool of fixed-parameter tractability, introduced at the end of this chapter.
An important application of SNP data is in the analysis of the evolutionary history of species development (phylogenetic analysis – part two – chapters 4 and 5). As will be made plausible in Chapter 5 using SNP data is—in many ways—superior to previous approaches of phylogenetic analysis. In order to analyze the development of species using SNP data, an underlying model of evolution must be specified. A popular model is the so-called perfect phylogeny, but the construction of this phylogeny is a computationally hard problem when there are inconsistencies (such as read-errors or an imperfect .t to the model of perfect phylogeny) in the underlying data.
Chapter 4 analyzes the problem of „forbidden submatrix removal” which is closely connected to constructing perfect phylogenies—we will see in Chapter 5 that its computational complexity is directly related to that of constructing a perfect phylogeny from data which is partially erroneous. In this chapter, we analyze the algorithmic tractability of „forbidden submatrix removal”, characterizing cases where this problem is NP-complete (being fixed-parameter tractable in general).
Chapter 5 introduces the concept, motivation, and some known results for phylogenetic analysis. We then apply the results from Chapter 4 to perfect phylogeny problems, i.e., the problem of dealing with data-inconsistencies with respect to the underlying evolutionary model of perfect phylogeny. It will be shown that these problems are all fixed-parameter tractable and can be efficiently solved using existing algorithms.
Basically, obtaining SNP data (Part 3 – chapters 6 and 7) requires sequencing two DNA strands and comparing them to each other. The problems lie in the details: Firstly, current techniques only allow sequences of at most 500 base pairs in length to be sequenced as a whole, and secondly, it is—in terms of cost and labor—often only possible to detect the presence of SNP sites rather than being able to tell which of the two DNAs contained which base. Part 3 of this work analyzes the computational complexity of these two problems by relating them to a graph-theoretic problem4 called Graph Bipartization.
Chapter 6 introduces the computationally hard problem of Graph Bipartization, stating some known results and showing the relative hardness of the two Graph Bipartization-problem variants Edge Bipartization and Vertex Bipartization (the latter one of which is proven to be at least as hard as the former one). Following this introduction, we develop and test practical algorithms for Graph Bipartization. These algorithms—although they require a long time even for medium-sized general graphs—prove to be efficient for the Graph Bipartization problems that arise during the acquisition of SNPs, even if these graphs contain a few hundred vertices.
Chapter 7 introduces a formal definition of the computational problems of SNP analysis and proves their close relationship to Graph Bipartization. The last section of this chapter shows that the algorithms developed in Chapter 6 can be used to efficiently solve the presented problems by solving their corresponding Graph Bipartization problem. The work is concluded by Chapter 8, presenting a summary of results and suggestions for future research related to this work.
Table of Contents:
| 1. | Introduction | 1 |
| 1.1 | The Human Genome and SNPs | 1 |
| 1.2 | Overview of this Work | 2 |
| 2. | Biological Background and Motivation | 5 |
| 2.1 | Basic Genetic Terminology | 5 |
| 2.2 | An Introduction to SNPs | 7 |
| 2.3 | Importance and Prospects of SNPMapping | 9 |
| 2.3.1 | SNPs in the Study of Population History | 9 |
| 2.3.2 | SNPs and Pharmacogenetics | 11 |
| 3. | Computer Science Preliminaries and Notation | 15 |
| 3.1 | Notation for Matrices and Graphs | 15 |
| 3.2 | Crash Course in Computational Complexity Theory | 16 |
| 3.2.1 | Machine-Independent Analysis | 16 |
| 3.2.2 | Running Time—Keeping Score | 18 |
| 3.2.3 | Complexity Classes | 22 |
| 3.3 | Fixed-Parameter Tractability (FPT) | 24 |
| 3.3.1 | An Efficient Algorithm for Vertex Cover | 24 |
| 3.3.2 | Formal Definition and Aspects of FPT | 27 |
| 4. | Submatrix Removal Problems | 31 |
| 4.1 | Definitions and Terminology | 31 |
| 4.2 | A Reduction to d-Hitting Set | 35 |
| 4.2.1 | Finding Forbidden Submatrices | 35 |
| 4.2.2 | Approximability and Fixed-Parameter Tractability Results | 37 |
| 4.3 | Hardness Results | 39 |
| 4.3.1 | Overview of Results—Four Theorems | 39 |
| 4.3.2 | Proofs for Theorems 4.13 and 4.14 | 40 |
| 4.3.3 | Proof of Theorem 4.11 | 47 |
| 4.3.4 | Proof of Theorem 4.12 | 49 |
| 4.4 | Discussion and Future Extensions | 53 |
| 5. | Perfect Phylogeny Problems | 55 |
| 5.1 | Phylogenetic Trees | 55 |
| 5.1.1 | Introduction and Motivation | 55 |
| 5.1.2 | Formal Definition | 56 |
| 5.2 | Perfect Phylogeny Problems | 58 |
| 5.3 | Relation to Forbidden Submatrix Problems | 60 |
| 5.4 | Minimum Species Removal | 62 |
| 5.5 | Minimum Character Removal | 66 |
| 6. | Graph Bipartization | 69 |
| 6.1 | Introduction and Known Results | 69 |
| 6.2 | Reducing Edge Bipartization to Vertex Bipartization | 72 |
| 6.3 | A Branch&Bound Approach | 75 |
| 6.3.1 | Initial Heuristics | 76 |
| 6.3.2 | Data Reduction Rules | 79 |
| 6.4 | Implementation and Comparison of the Algorithms | 89 |
| 6.4.1 | Using the Program | 89 |
| 6.4.2 | Some Implementation Details | 90 |
| 6.4.3 | Tests and Test Results | 92 |
| 7. | Using Graph Bipartization in SNP Analysis | 99 |
| 7.1 | Introduction and Overview of Results | 99 |
| 7.2 | SNP Haplotype Assembly | 100 |
| 7.3 | Inferring Haplotypes from Genotypes | 103 |
| 7.3.1 | Minimum Genotype Removal | 109 |
| 7.3.2 | Minimum Site Removal | 112 |
| 7.4 | Testing Branch&Bound on SNP Data | 115 |
| 8. | Conclusion | 119 |
| 8.1 | Summary of Results and Future Extensions | 119 |
| 8.2 | Acknowledegments | 121 |
| List of Figures | 124 | |
| Bibliography | 125 |
In den Warenkorb
74,00 €
Link zur Arbeit:
http://www.diplom.de/ean/9783832474812
Arbeit zitieren:
Wernicke, Sebastian September 2003: On the Algorithmic Tractability of Single Nucleotide Polymorphism (SNP) Analysis and Related Problems, Hamburg: Diplomica Verlag
Schlagworte:
Human Genom, Phylogen, FPT, Graph, Haplotyp



