Transformer-Based Classification of Genomic Sequences

Field

Biology

Semester

Fall 2025

Project Overview

This project addresses the challenge of determining whether a sequence of bases in the genome encodes a benign or a pathogenic protein. Genetic diseases arise when mutations cause proteins to misfold, aggregate, or malfunction, yet classifying a sequence as pathogenic or benign by inspection is nearly impossible. The team is developing a transformer-based model that tokenizes genomic sequences and applies attention mechanisms to extract meaningful features automatically. By learning which bases and subsequences matter most, the model aims to distinguish pathogenic from benign sequences with greater sensitivity and accuracy than manual inspection allows.
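To make the two ideas in the overview concrete, here is a minimal sketch of tokenizing a sequence and computing attention weights over the tokens. The overview does not say which tokenization scheme the team uses, so overlapping k-mers are an assumption here, and the names build_vocab, tokenize, and self_attention_weights are hypothetical. The attention step uses the raw embeddings as both queries and keys, with no learned projections, so it only illustrates the mechanism and is not the project's model.

```python
import math
from itertools import product

BASES = "ACGT"

def build_vocab(k):
    """Map every k-mer over A/C/G/T to an integer id (assumed scheme).

    Id 0 is reserved for k-mers containing any other character (e.g. N).
    """
    return {"".join(p): i + 1 for i, p in enumerate(product(BASES, repeat=k))}

def tokenize(seq, k, vocab):
    """Split a sequence into overlapping k-mers (stride 1) and look up ids."""
    seq = seq.upper()
    return [vocab.get(seq[i:i + k], 0) for i in range(len(seq) - k + 1)]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention_weights(embeddings):
    """Scaled dot-product attention weights, one row per token.

    Queries and keys are the raw embeddings (no learned projections),
    which is a simplification made for illustration only.
    """
    d = len(embeddings[0])
    weights = []
    for q in embeddings:
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d)
                  for key in embeddings]
        weights.append(softmax(scores))
    return weights

if __name__ == "__main__":
    vocab = build_vocab(3)
    print(tokenize("ACGTAC", 3, vocab))
    # Toy 2-dimensional embeddings, made up for the demonstration.
    toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in self_attention_weights(toy):
        print([round(w, 3) for w in row])
```

Each row of the returned matrix sums to 1 and describes how strongly one token attends to every other token. In a trained transformer, weights of this kind are computed from learned projections of the embeddings, which is how the model can come to emphasize the bases and subsequences that matter most.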

Bonsai Applied Computations Group

© 2026. All rights reserved.
