Bachelor + Master Publishing
811 Bachelorarbeiten, 533 Masterarbeiten, 10.103 Diplomarbeiten

On the Algorithmic Tractability of Single Nucleotide Polymorphism (SNP) Analysis and Related Problems

On the Algorithmic Tractability of Single Nucleotide Polymorphism (SNP) Analysis and Related Problems
Über dieses Buch
  • Art: Diplomarbeit
  • Autor: Sebastian Wernicke
  • Abgabedatum: September 2003
  • Umfang: 134 Seiten
  • Dateigröße: 2,3 MB
  • Note: 1,0
  • Institution / Hochschule: Eberhard Karls Universität Tübingen Deutschland
  • ISBN (eBook): 978-3-8324-7481-2
  • ISBN (Paperback) :
    978-3-8324-7481-2 P
  • ISBN (CD) :978-3-8324-7481-2 CD
  • Sprache: Englisch
  • Prämierung:
  • Arbeit zitieren: Wernicke, Sebastian September 2003: On the Algorithmic Tractability of Single Nucleotide Polymorphism (SNP) Analysis and Related Problems, Hamburg: Diplomica Verlag
  • Schlagworte: Human Genom, Phylogen, FPT, Graph, Haplotyp

Diplomarbeit von Sebastian Wernicke

Abstract:

This work brings together two areas of science—biology and informatics—that have only recently been connected in the emerging (and vastly growing) research field of Bioinformatics. In order to achieve a common basis for Parts 2 and 3 of this work, Part 1 intends to introduce the computer scientist to the relevant biological background and terminology (Chapter 2), and to familiarize the biologist with the relevant topics from theoretical computer science (Chapter 3).

Chapter 2 first introduces some terminology from the field of genetics, thereby defining SNPs. We then motivate the analysis of SNPs by two applications, i.e. the analysis of evolutionary development and the field of pharmacogenetics. Especially the field of pharmacogenetics is capable of having an enormous impact on medicine and the pharmaceutical industry in the near future by using SNP data to predict the efficacy of medication.

Chapter 3 gives a brief introduction to the field of computational complexity. We will see and motivate how algorithms are analyzed in theoretical computer science. This will lead to the definition of „complexity classes”, introducing the class NP which includes computationally hard problems. Some of the hard problems in the class NP can be solved efficiently using the tool of fixed-parameter tractability, introduced at the end of this chapter.

An important application of SNP data is in the analysis of the evolutionary history of species development (phylogenetic analysis – part two – chapters 4 and 5). As will be made plausible in Chapter 5 using SNP data is—in many ways—superior to previous approaches of phylogenetic analysis. In order to analyze the development of species using SNP data, an underlying model of evolution must be specified. A popular model is the so-called perfect phylogeny, but the construction of this phylogeny is a computationally hard problem when there are inconsistencies (such as read-errors or an imperfect .t to the model of perfect phylogeny) in the underlying data.

Chapter 4 analyzes the problem of „forbidden submatrix removal” which is closely connected to constructing perfect phylogenies—we will see in Chapter 5 that its computational complexity is directly related to that of constructing a perfect phylogeny from data which is partially erroneous. In this chapter, we analyze the algorithmic tractability of „forbidden submatrix removal”, characterizing cases where this problem is NP-complete (being fixed-parameter tractable in general).

Chapter 5 introduces the concept, motivation, and some known results for phylogenetic analysis. We then apply the results from Chapter 4 to perfect phylogeny problems, i.e., the problem of dealing with data-inconsistencies with respect to the underlying evolutionary model of perfect phylogeny. It will be shown that these problems are all fixed-parameter tractable and can be efficiently solved using existing algorithms.

Basically, obtaining SNP data (Part 3 – chapters 6 and 7) requires sequencing two DNA strands and comparing them to each other. The problems lie in the details: Firstly, current techniques only allow sequences of at most 500 base pairs in length to be sequenced as a whole, and secondly, it is—in terms of cost and labor—often only possible to detect the presence of SNP sites rather than being able to tell which of the two DNAs contained which base. Part 3 of this work analyzes the computational complexity of these two problems by relating them to a graph-theoretic problem4 called Graph Bipartization.

Chapter 6 introduces the computationally hard problem of Graph Bipartization, stating some known results and showing the relative hardness of the two Graph Bipartization-problem variants Edge Bipartization and Vertex Bipartization (the latter one of which is proven to be at least as hard as the former one). Following this introduction, we develop and test practical algorithms for Graph Bipartization. These algorithms—although they require a long time even for medium-sized general graphs—prove to be efficient for the Graph Bipartization problems that arise during the acquisition of SNPs, even if these graphs contain a few hundred vertices.

Chapter 7 introduces a formal definition of the computational problems of SNP analysis and proves their close relationship to Graph Bipartization. The last section of this chapter shows that the algorithms developed in Chapter 6 can be used to efficiently solve the presented problems by solving their corresponding Graph Bipartization problem. The work is concluded by Chapter 8, presenting a summary of results and suggestions for future research related to this work.

Table of Contents:

1. Introduction 1
1.1 The Human Genome and SNPs 1
1.2 Overview of this Work 2
2. Biological Background and Motivation 5
2.1 Basic Genetic Terminology 5
2.2 An Introduction to SNPs 7
2.3 Importance and Prospects of SNPMapping 9
2.3.1 SNPs in the Study of Population History 9
2.3.2 SNPs and Pharmacogenetics 11
3. Computer Science Preliminaries and Notation 15
3.1 Notation for Matrices and Graphs 15
3.2 Crash Course in Computational Complexity Theory 16
3.2.1 Machine-Independent Analysis 16
3.2.2 Running Time—Keeping Score 18
3.2.3 Complexity Classes 22
3.3 Fixed-Parameter Tractability (FPT) 24
3.3.1 An Efficient Algorithm for Vertex Cover 24
3.3.2 Formal Definition and Aspects of FPT 27
4. Submatrix Removal Problems 31
4.1 Definitions and Terminology 31
4.2 A Reduction to d-Hitting Set 35
4.2.1 Finding Forbidden Submatrices 35
4.2.2 Approximability and Fixed-Parameter Tractability Results 37
4.3 Hardness Results 39
4.3.1 Overview of Results—Four Theorems 39
4.3.2 Proofs for Theorems 4.13 and 4.14 40
4.3.3 Proof of Theorem 4.11 47
4.3.4 Proof of Theorem 4.12 49
4.4 Discussion and Future Extensions 53
5. Perfect Phylogeny Problems 55
5.1 Phylogenetic Trees 55
5.1.1 Introduction and Motivation 55
5.1.2 Formal Definition 56
5.2 Perfect Phylogeny Problems 58
5.3 Relation to Forbidden Submatrix Problems 60
5.4 Minimum Species Removal 62
5.5 Minimum Character Removal 66
6. Graph Bipartization 69
6.1 Introduction and Known Results 69
6.2 Reducing Edge Bipartization to Vertex Bipartization 72
6.3 A Branch&Bound Approach 75
6.3.1 Initial Heuristics 76
6.3.2 Data Reduction Rules 79
6.4 Implementation and Comparison of the Algorithms 89
6.4.1 Using the Program 89
6.4.2 Some Implementation Details 90
6.4.3 Tests and Test Results 92
7. Using Graph Bipartization in SNP Analysis 99
7.1 Introduction and Overview of Results 99
7.2 SNP Haplotype Assembly 100
7.3 Inferring Haplotypes from Genotypes 103
7.3.1 Minimum Genotype Removal 109
7.3.2 Minimum Site Removal 112
7.4 Testing Branch&Bound on SNP Data 115
8. Conclusion 119
8.1 Summary of Results and Future Extensions 119
8.2 Acknowledegments 121
List of Figures 124
Bibliography 125

Arbeit zitieren:
Wernicke, Sebastian September 2003: On the Algorithmic Tractability of Single Nucleotide Polymorphism (SNP) Analysis and Related Problems, Hamburg: Diplomica Verlag

Schlagworte:
Human Genom, Phylogen, FPT, Graph, Haplotyp

Entdecken Sie mehr zum Thema

diplom.de
Bachelor + Master Publishing

Hermannstal 119 k
22119 Hamburg

Fon: +49 (0) 40 655992-0
Fax: +49 (0) 40 655992-22

Service-Telefon

Rufen Sie uns an:
+49 (0) 40 655992-0

Mo-Fr
09.00-16.00 Uhr

diplom.de in den Medien

Folgen Sie uns bei Twitter & werden Sie diplom.de-Fan bei Facebook!
Schreibtipps unserer Lektoren, Neuigkeiten aus dem Verlagsalltag und das Expertenwissen unserer Autoren als Tweet & Post!
Wir freuen uns auf Sie!

diplom.de BACHELOR + MASTER PUBLISHING

Bachelorarbeiten, Masterarbeiten, Diplomarbeiten, Magisterarbeiten, Dissertationen und andere Abschlussarbeiten aus allen Fachbereichen und Hochschulen können Sie bei uns als eBook sofort per Download beziehen oder sich auf CD oder als Buch zusenden lassen. Seit mehr als 15 Jahren ist diplom.de der seriöse, professionelle und erfolgreiche Partner für die Veröffentlichung wissenschaftlicher Abschlussarbeiten.

© Diplomica Verlag GmbH 1996-2011, AG Hamburg HRB 80293 - GF Björn Bedey, USt-IdNr.: DE214910002 - Verkehrsnummer: 12285 - Impressum
Index der Arbeiten - Index der Autoren