Compressed Data Structures for Scalable Genomic Search (2026-2028)
Abstract
This project aims to develop novel compressed indexes and querying algorithms for efficiently processing genomic sequence data at massive scale. The project expects to improve data representations that support membership, pattern-matching, and ranking tasks over the biological sequences used in the life sciences and medicine. Expected outcomes include novel compressed structures for representing sequences, querying algorithms that operate with reduced computational resources, and an enhanced capability for handling dynamic, evolving biological sequence data. These outcomes can benefit a range of scientific research and discovery applications by increasing analytical capacity while reducing the time and resources required.