Engineer at scale → Model DNA output → Make new chromosomes
We are a new team at the Generative Biology Institute (GBI), EIT, aiming to make chromosome-scale DNA designs predictable and programmable.
We gather data that are missing from current models of DNA function, use them to develop new frameworks that predict synthetic DNA behaviour inside cells, and apply these engineering tools and computational models to create mammalian chromosomes with defined functional properties.
We aim to make sequences with applications in medicine, biotechnology, and basic science.
Our research
Engineer DNA at scale
Most of the sequence in human chromosomes is non-coding, and it plays critical roles in gene regulation, chromatin architecture, and the safeguarding of genetic information. In natural genomes, these functions are spread sparsely across gigabases, so we do not yet know how much non-coding DNA, and which sequences, must be written to produce a chromosome that performs to a specification. Decoding how function emerges from non-coding sequence demands experimentation at the scale of billions of base pairs, a scale that has until now been out of reach. To close this gap, we have developed a versatile toolbox that harnesses CRISPR prime editing and recombinases to generate deletions, inversions, translocations, and duplications across the genome at scale. We use these tools to systematically create and phenotype both defined and stochastic structural variants. These variants are a plentiful source of diverse sequence configurations not present in nature, and they enable us to assign function to individual genes, non-coding sequences, and their combinations, to build predictive models, and to probe the limits, rules, and biases that govern chromosome design.
Model DNA output
How do we ensure that the chromosomes we design will actually function as intended inside a cell? Our tests of predictive models on sequences that depart substantially from the natural human genome have shown that their performance degrades severely, exposing a generalisation gap that must be closed before we can design long sequences that include non-coding DNA. State-of-the-art sequence-based models such as Enformer and AlphaGenome, trained on rich compendia of functional genomic data from initiatives like ENCODE and GTEx, can predict DNA methylation, gene expression, chromatin accessibility, transcription factor binding, and chromatin conformation with impressive accuracy across the human genome and those of selected model organisms. Yet they remain anchored to a fundamentally narrow slice of sequence and context space: all current models share the same canonical chromosomes as their training foundation. To realise the full potential of large-scale DNA writing, we collaborate with the AI and Robotics Institute to develop computational methods that generalise robustly to novel sequences and contexts.
Make new chromosomes
We now have the engineering tools to probe the functional boundaries of genomes in ways that were previously unimaginable. We have already demonstrated what is achievable: by iteratively installing recombinase recognition sites into repetitive sequences in human HEK293T and HAP1 cells, we recently created the most extensively engineered human genomes to date, accumulating over 1,600 targeted sequence insertions in a single cell line over the course of one year. Building on this foundation, we are using automation and the infrastructure of EIT to accelerate chromosome engineering and radically remodel the genomes of human cell lines. For example, iteratively deleting non-essential regions can expose the minimal requirements for chromosome segregation and replication, and directly test whether the vast non-coding expanses of human DNA are truly dispensable or required in some configuration; saturating genomes with disease-associated risk alleles allows us to test hypotheses about mutational load in common disorders; and systematically eliminating xenoantigens from animal genomes can deliver safer transplant donors for humans.
Selected publications
Generating long deletions across the genome with pooled paired prime editing screens. J Weller et al. (2025). bioRxiv.
Randomizing the human genome by engineering recombination between repeat elements. J Koeppel, R Ferreira, et al. (2025). Science.
Enhancer scrambling: systematic randomization of mammalian regulatory landscapes via CRISPR prime editing and recombinases. P Murat, J Koeppel, et al. (2025). bioRxiv.
Engineering structural variants to interrogate genome function. J Koeppel et al. (2024). Nature Genetics.
The interplay of DNA repair context with target sequence predictably biases Cas9-generated mutations. A Pallaseni, EM Peets, G Girling, et al. (2024). Nature Communications.
Pooled Genome-Scale CRISPR Screens in Single Cells. D Schraivogel, LM Steinmetz, L Parts (2023). Annual Review of Genetics.
Prediction of prime editing insertion efficiencies using sequence features and DNA repair determinants. J Koeppel et al. (2023). Nature Biotechnology.
Predicting Mutations Generated by Cas9, Base Editing, and Prime Editing in Mammalian Cells. J Weller et al. (2023). The CRISPR Journal.
Our approach
1) We work on important problems. We pick projects that bring about change or advance our understanding. We know the context, examples, literature, and gaps. Our projects reflect what society, the field, GBI, the team, and each of us personally consider important.
2) We get things done. We start projects with a defined scope and a clear vision of success, and we finish them. Every project has an accountable leader. We plan ahead and execute with urgency along the critical path, without frustration.
3) We succeed as a team. We have a diverse mix of backgrounds and skillsets, complementing each other with our strengths. Everyone has a chance to grow.
4) We are excited about science. We read broadly, discuss the latest developments, know the ancients, and keep up to date with both the depth of our field and the entire breadth of engineering and modelling biology.
Our Team
Our Sanger team website remains active until August 2026.
Team leader
Zeinab Sheikhi
PhD student