
AlphaFold and AlphaEvolve: How AI Cracked Biology's Hardest Problem

Every protein in your body is a precise three-dimensional machine. Haemoglobin binds oxygen. Insulin signals cells to absorb glucose. Antibodies recognise pathogens with nanometre precision. What makes each of these machines work is not just its chemical composition - it is its shape. A protein that folds into the wrong shape does not merely fail to function; it can actively cause disease. Alzheimer's, Parkinson's, cystic fibrosis, and many cancers all have misfolded or mis-assembled proteins at their core. Determining the exact three-dimensional shape of a protein from its amino acid sequence alone has been called the most important unsolved problem in molecular biology - and for fifty years, it was largely out of reach.

Anfinsen's Dogma: Sequence is Destiny

In the late 1950s, biochemist Christian Anfinsen denatured the enzyme ribonuclease A with urea and a reducing agent, completely unfolding it into a disordered chain. When he removed the denaturing conditions, the protein spontaneously refolded back into its native, catalytically active form. The conclusion was profound: the three-dimensional structure of a protein is completely determined by its amino acid sequence [1]. No cellular machinery, no external template - just chemistry and physics finding the lowest-energy arrangement. This principle, now known as Anfinsen's dogma, earned him the Nobel Prize in Chemistry in 1972. It also reframed protein structure prediction as a computational problem: given a sequence, find the corresponding minimum-energy three-dimensional structure.

Levinthal's Paradox: The Impossible Search Space

Knowing that the sequence encodes the structure is very different from knowing how to find it. In 1969, Cyrus Levinthal highlighted the difficulty with a deceptively simple calculation [2]. A protein of just 100 amino acids might adopt three distinct rotational states per residue, giving roughly 3¹⁰⁰ - approximately 5 × 10⁴⁷ - possible conformations. Even at 10¹³ configurations tested per second, exhaustive search would require longer than the age of the universe. Yet proteins fold reliably in microseconds to seconds, in a single cell, without a supercomputer in sight.
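
The arithmetic is easy to check. Here is a minimal sketch in Python, using Levinthal's illustrative figures (three states per residue, 10¹³ conformations sampled per second):

```python
# Levinthal's back-of-the-envelope estimate for a 100-residue protein.
conformations = 3 ** 100        # ~5.15e47 possible chain conformations
samples_per_second = 1e13       # a generous sampling rate
seconds_needed = conformations / samples_per_second

AGE_OF_UNIVERSE_S = 4.35e17     # ~13.8 billion years, in seconds
print(f"{conformations:.2e} conformations")
print(f"exhaustive search: {seconds_needed:.2e} s, "
      f"or ~{seconds_needed / AGE_OF_UNIVERSE_S:.0e} ages of the universe")
```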

Levinthal's paradox implied that proteins do not find their shape by random sampling. Instead, the energy landscape of a protein is shaped like a funnel: almost every pathway through the high-dimensional conformational space leads downhill toward the native structure, guided by incremental energetic rewards for correct local contacts [3]. The funnel exists because evolution has selected sequences that fold reliably. But computing where the bottom of the funnel lies - given only the sequence - remained extraordinarily difficult.
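
A toy example makes the contrast concrete. The one-dimensional "landscape" below is invented for illustration (real folding landscapes are astronomically higher-dimensional), but it shows why downhill guidance turns an exponential search into a roughly linear one:

```python
import random

def energy(x):
    """Toy funnelled landscape: a single global minimum, at x = 0."""
    return x * x

random.seed(0)

# Blind sampling (Levinthal's strawman): success is pure luck.
best_random = min(energy(random.uniform(-100, 100)) for _ in range(1000))

# Funnelled descent: every local move that lowers the energy is kept,
# so steps scale with distance to the minimum, not with the search volume.
x, steps = 100.0, 0
while x > 0:
    x -= 1.0                    # each downhill step is rewarded immediately
    steps += 1

print(f"1000 random samples: best energy {best_random:.3g}")
print(f"greedy descent: exact minimum reached in {steps} steps")
```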

CASP: Twenty-Five Years of Slow Progress

In 1994, John Moult and colleagues launched CASP - the Critical Assessment of protein Structure Prediction - a biennial blind competition for structure-prediction algorithms [4]. Participants predict the structures of proteins whose experimental structures are known but not yet publicly released. CASP became the definitive yardstick for the field. For most of its history, that yardstick showed agonisingly slow progress. Methods based on comparative modelling (borrowing known structures from evolutionary relatives) worked well when a close relative existed; for genuinely novel proteins with no known homologues, accuracy dropped sharply. Physics-based simulation methods were accurate in principle but computationally intractable at realistic scales. Year after year, CASP targets that lacked homologues defeated every approach.

AlphaFold: The First Signal

At CASP13 in 2018, DeepMind's AlphaFold system arrived with a different strategy. Rather than simulating folding physics, it used deep residual networks to predict the probability distribution over inter-residue distances from co-evolutionary patterns in large sequence databases. The intuition was biological: if two positions in a protein sequence have evolved in concert across thousands of species - always mutating together - those positions are almost certainly in physical contact in the folded structure. AlphaFold extracted this signal from multiple sequence alignments (MSAs) with unprecedented fidelity. It topped CASP13 by a significant margin, and the method was formally published in 2020 [5]. But performance on the hardest targets remained inconsistent, and the gap to experimental accuracy was still large.
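
The raw signal is simple enough to demonstrate. The sketch below scores column pairs of a toy alignment by mutual information - the classic pre-deep-learning contact predictor. The six-sequence MSA is invented for illustration; real pipelines use thousands of sequences and stronger statistics such as direct coupling analysis:

```python
from collections import Counter
from itertools import combinations
from math import log2

# Toy MSA: columns 0 and 4 co-vary perfectly (A<->W, S<->F),
# the statistical fingerprint of a physical contact.
msa = ["ARNDW", "SRNDF", "ARQEW", "SRQEF", "ARNEW", "SRQDF"]

def mutual_information(col_i, col_j):
    n = len(msa)
    pi = Counter(s[col_i] for s in msa)
    pj = Counter(s[col_j] for s in msa)
    pij = Counter((s[col_i], s[col_j]) for s in msa)
    return sum((c / n) * log2((c / n) / ((pi[a] / n) * (pj[b] / n)))
               for (a, b), c in pij.items())

scores = {(i, j): mutual_information(i, j)
          for i, j in combinations(range(len(msa[0])), 2)}
for (i, j), mi in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(f"columns {i}-{j}: MI = {mi:.2f}")   # 0-4 tops the list at 1.00
```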

AlphaFold 2: A Decade of Progress in One System

CASP14 in 2020 was different. AlphaFold 2 did not merely top the competition - it achieved median backbone accuracy of around 0.96 angstroms across CASP14 targets, comparable to the precision of experimental determination by X-ray crystallography, and far ahead of every other submission [6]. The CASP14 assessors called it "a transformative advance for structural biology." The Nobel Committee agreed: in 2024, John Jumper and Demis Hassabis shared the Nobel Prize in Chemistry alongside David Baker, whose complementary work on computational protein design had opened the reverse direction of the same problem.

Three innovations drove AlphaFold 2's leap. First, the Evoformer - a novel attention-based neural network block that jointly attends over both the MSA matrix (sequences of evolutionary relatives) and a pairwise representation of residue-residue relationships in the target protein, allowing structural information to flow iteratively between the two views. Second, end-to-end differentiable structure prediction: the network directly outputs three-dimensional atomic coordinates rather than distance predictions that feed into a separate optimisation step. Third, a recycling mechanism that feeds the model's own output back as input for multiple refinement passes, letting the network progressively sharpen its own prediction.
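
Of the three, recycling is the easiest to sketch. Schematically, in Python pseudocode (`model` and its arguments are placeholders for illustration, not AlphaFold's real interface):

```python
def predict_structure(model, features, num_recycles=3):
    """Schematic recycling loop: the model re-reads its own previous
    prediction, so each pass refines the one before it."""
    prev = None  # no prior prediction on the first pass
    for _ in range(num_recycles + 1):
        # Each call sees the raw inputs plus the previous output
        # (pair representation, predicted coordinates, and so on).
        prev = model(features, previous_output=prev)
    return prev  # final refined prediction
```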

Why Was It So Hard for So Long?

Twenty-five years of near-stagnation, then sudden collapse. The reasons reveal something important about the nature of hard scientific problems.

The data problem was critical. Co-evolutionary analysis of protein sequences requires thousands of aligned homologues to extract reliable signals. In the 1990s, the sequence databases simply did not contain enough entries. The exponential growth of DNA sequencing - and in particular metagenomic sequencing, which sequences entire microbial communities without culturing individual organisms - filled those databases across the 2000s and 2010s, making the evolutionary signal finally legible.

The representation problem was equally important. Capturing the geometric and long-range relational structure of a protein required a mathematical architecture that did not exist until the transformer attention mechanism, developed for natural language processing in 2017, provided a general tool for attending over sets of interdependent elements. AlphaFold 2's Evoformer is essentially an attention mechanism adapted to operate simultaneously over sequences and pairwise distances - a recognition that protein folding, like language, is fundamentally about long-range dependencies.
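
Stripped to its core, the mechanism in question is a few lines of linear algebra: each element of a set builds a weighted summary of all the others, with the weights derived from learned pairwise similarity. A minimal NumPy sketch of generic scaled dot-product attention (not the Evoformer itself, which stacks specialised row-, column-, and triangle-aware variants of this idea):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention over a set of n elements.

    Q, K, V: (n, d) arrays. Each output row is a weighted average of V,
    weighted by how strongly that element attends to every other one -
    exactly the all-pairs, long-range coupling that residues require.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (n, n) pairwise affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                              # (n, d) contextualised output

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 32))   # e.g. 100 residues, 32 features each
print(attention(x, x, x).shape)  # (100, 32)
```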

AlphaFold 3: Beyond Single Proteins

AlphaFold 2 was remarkable but limited: it predicted the structure of a single protein chain in isolation. Most of biology happens through interactions - proteins binding to other proteins, to RNA, to DNA, to small-molecule drugs. AlphaFold 3 (Abramson et al., 2024) addressed this by adopting a diffusion-based architecture capable of modelling arbitrary biomolecular complexes: proteins, RNA, DNA, small-molecule ligands, metal ions, and post-translational modifications, all in a single unified framework [7]. For drug discovery, this is particularly consequential. The critical question when designing a new medicine is often not merely "what shape does this target protein adopt?" but "how does this candidate molecule bind to the protein's active site, and does it block the interaction you want to disrupt?" AlphaFold 3 gives computational chemists a tool to explore that question at a scale and speed that experimental methods alone cannot match.

AlphaEvolve: Evolution as a Code Generator

In May 2025, DeepMind published AlphaEvolve - a system that applies evolutionary principles not to proteins but to algorithms themselves [8]. The setup mirrors natural selection. A population of candidate algorithms, each represented as executable code, is subjected to an automated fitness evaluation. A large language model - Google's Gemini - then generates mutated and recombined variants. The fittest survive to seed the next generation.
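
In outline, it is classic evolutionary computation with an LLM as the mutation operator. A schematic sketch - `llm_propose_variant` stands in for the Gemini-driven rewriting step and `fitness` for the paper's automated evaluators; neither is a real API:

```python
import random

def evolve(seed_program, fitness, llm_propose_variant,
           population_size=20, generations=100):
    """Schematic AlphaEvolve-style loop: LLM-driven mutation plus selection."""
    population = [seed_program]
    for _ in range(generations):
        # Mutate: the LLM rewrites sampled parents into candidate children.
        parents = random.choices(population, k=population_size)
        children = [llm_propose_variant(p) for p in parents]
        # Evaluate and select: only the fittest programs survive
        # to seed the next generation.
        scored = sorted(population + children, key=fitness, reverse=True)
        population = scored[:population_size]
    return population[0]
```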

The results were striking. AlphaEvolve discovered a new method for multiplying 4 × 4 complex-valued matrices using fewer scalar multiplications than any previously known algorithm - reducing the count from 49 to 48, improving on a result that had stood for 56 years. It accelerated GPU kernels for tensor operations used in AI training, found improved scheduling heuristics now deployed live in Google's data centres, and produced new solutions to long-standing open problems in combinatorial geometry.
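
The 49-multiplication baseline is Strassen's 1969 construction: a 2 × 2 matrix product in 7 scalar multiplications instead of 8, applied recursively, so a 4 × 4 product costs 7 × 7 = 49. The recurrence takes one line to verify:

```python
def strassen_mults(n):
    """Scalar multiplications for an n x n product (n a power of 2)
    under Strassen's recursion: M(n) = 7 * M(n/2), M(1) = 1."""
    return 1 if n == 1 else 7 * strassen_mults(n // 2)

print(strassen_mults(4))  # 49 - the 56-year-old count AlphaEvolve cut to 48
```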

The connection to evolutionary biology is not merely metaphorical. AlphaEvolve's population-based search, fitness evaluation, selection pressure, and iterative mutation closely echo the mechanisms of natural selection - operating on code rather than genomes, and on timescales of minutes rather than millennia. It is perhaps the most literal incarnation yet of a bio-inspired algorithm: using the logic of evolution to evolve the very algorithms we use to do science.

What Comes Next

The AlphaFold protein structure database now contains predicted structures for virtually every protein in the human proteome - and for most known proteins across all sequenced organisms. This has compressed structural biology from years per structure to hours per structure, accelerating work on malaria vaccines, antibiotic resistance proteins, and the molecular mechanisms of CRISPR.

But structure is only the first step. Predicting protein dynamics, the effects of disease-causing mutations on folding, and how proteins behave in the crowded, heterogeneous environment of a living cell remain open and active challenges. And AlphaEvolve points toward a future where the algorithms used to solve these problems are not hand-crafted by researchers but discovered by machines applying evolutionary pressure to code - closing a loop that began when biologists first realised that natural selection was itself an optimisation algorithm.

Biology taught us the method. AI is now teaching biology back.

References

  1. Anfinsen, C. B. (1973). Principles that govern the folding of protein chains. Science, 181(4096), 223–230. doi:10.1126/science.181.4096.223
  2. Levinthal, C. (1969). How to fold graciously. In J. T. P. DeBrunner & E. Münck (Eds.), Mössbauer Spectroscopy in Biological Systems (pp. 22–24). University of Illinois Press.
  3. Dill, K. A. & Chan, H. S. (1997). From Levinthal to pathways to funnels. Nature Structural Biology, 4(1), 10–19. doi:10.1038/nsb0197-10
  4. Moult, J., Pedersen, J. T., Judson, R. & Fidelis, K. (1995). A large-scale experiment to assess protein structure prediction methods. Proteins: Structure, Function, and Bioinformatics, 23(3), ii–v.
  5. Senior, A. W. et al. (2020). Improved protein structure prediction using potentials from deep learning. Nature, 577, 706–710. doi:10.1038/s41586-019-1923-7
  6. Jumper, J. et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596, 583–589. doi:10.1038/s41586-021-03819-2
  7. Abramson, J. et al. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 630, 493–500. doi:10.1038/s41586-024-07487-w
  8. Novikov, A. et al. (2025). AlphaEvolve: A coding agent for scientific and algorithmic discovery. Google DeepMind. arXiv:2506.13131