
Transformer-Based Classification of Genomic Sequences
Field
Semester
Project Overview
This project addresses the challenge of determining whether a sequence of bases in the genome encodes a benign or a pathogenic protein. Genetic diseases can arise when mutations cause proteins to misfold, aggregate, or malfunction, yet classifying a sequence as pathogenic or benign by manual inspection is nearly impossible. The team is developing a transformer-based model that tokenizes genomic sequences and applies attention mechanisms to extract meaningful features automatically. By learning which bases and subsequences matter most, the model is intended to distinguish pathogenic from benign sequences with greater sensitivity and accuracy than manual inspection allows.
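
To make the tokenize-then-attend idea concrete, here is a minimal sketch in Python with NumPy. It is not the team's implementation. Several things are assumptions made for illustration: overlapping k-mer tokenization (the overview does not say which tokenization scheme is used), the helper names kmer_tokenize, build_vocab, and self_attention, the random embedding table, and the use of identity projections in place of learned query, key, and value matrices.

```python
import numpy as np

def kmer_tokenize(sequence, k=3):
    """Split a DNA string into overlapping k-mers (stride 1).
    Assumption: k-mer tokens; the project may use another scheme."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def build_vocab(tokens):
    """Map each distinct k-mer to an integer id, in sorted order."""
    return {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}

def self_attention(x):
    """Scaled dot-product self-attention over token vectors x.
    Simplification: queries, keys, and values are all x itself
    (no learned projection matrices)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights

rng = np.random.default_rng(0)
tokens = kmer_tokenize("ATGCGTAC", k=3)
vocab = build_vocab(tokens)
ids = np.array([vocab[t] for t in tokens])

# Hypothetical embedding table; a trained model would learn these values.
embeddings = rng.normal(size=(len(vocab), 8))
x = embeddings[ids]

context, weights = self_attention(x)
print(tokens)
print(weights.shape, context.shape)
```

Each row of weights shows how strongly one k-mer attends to every other k-mer in the sequence. In a trained model, rows like these are what would indicate which subsequences influence the pathogenic-versus-benign decision; with random embeddings, as here, the values carry no biological meaning.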