High Performance Computing for Genomics

Loading...
Thumbnail Image

Identifiers

Publication date

Reading date

2023-05-25

Authors

Perez-Wohlfeil, Esteban

Collaborators

Tutors

Editors

Journal Title

Journal ISSN

Volume Title

Publisher

UMA Editorial

Metrics

Google Scholar

Share

Research Projects

Organizational Units

Journal Issue

Department/Institute

Abstract

With the thrive of data acquisition methods, computerized research has become increasingly more common. However, in order to match the huge data-processing demands, the design of new algorithms along with their optimization on specific hardware platforms has become a necessity. This scenario is particularly true in the case of comparative genomics, where massive DNA sequences are being published daily, and their processing presents many computational bottlenecks. The comparison of DNA sequences is a central problem with direct impact on human health, and therefore its computational acceleration is of wide interest. However, due to its arbitrary nature, the parallel acceleration of sequence comparison poses computational challenges such as including heterogeneous granularity, unpredictable load, etc. In order to achieve high performance, algorithms must be tailored to the underlying hardware model, which may represent different computational approaches and often require even the redesign of the algorithms themselves. This thesis addresses a computational tour of the sequence comparison problem by making use of hardware and algorithmic optimizations in single core machines, shared memory systems and Graphic Processing Units. The first contribution features a formal framework that enables unlimited search space size in strictly linear time. The second contribution describes a parallelization using shared memory machines that achieves high sensitivity in metagenomic sequences. The third contribution describes the overcoming of the data parallelism model in GPUs for the irregular pairwise sequence comparison. Lastly, the use of Machine-Learning-aided schedulers is explored to improve resource allocation and throughput in supercomputers dedicated to sequence comparison.

Description

The results are twofold: from a computational perspective, new High Performance Computing methodologies and parallel mechanisms are proposed for different hardware architectures; and from a comparative genomics perspective, the complexity of sequence comparison has been lowered while providing exhaustive and heuristic approaches that can be run on both commodity and specialized hardware.

Bibliographic citation

Collections

Endorsement

Review

Supplemented By

Referenced by

Creative Commons license

Except where otherwised noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 Internacional