
AI Has Accelerated the Discovery Phase, But Not the Validation Step
In the last five years, structure prediction and generative design have changed how early-stage drug candidates are generated. AlphaFold 2 and 3, RFdiffusion, ProteinMPNN, Boltz-2, Chai, OpenFold, and a growing list of co-folding and binder-design models have made structural predictions that once required months of experimental work available in hours [1–4]. Pharma has responded with notable investment: AI partnerships across the industry now span foundation model development, generative antibody design, and discovery platform collaborations.
What has not changed is the requirement for experimental confirmation. Every AI-designed construct, every virtual screening hit, every computationally optimized small molecule, every generative binder remains a hypothesis until it is confirmed in the lab. Validation in drug discovery has always been multi-layered: biochemical and cellular assays, biophysical measurements, X-ray crystallography, NMR, cryo-EM, and ultimately animal and human studies. Structural biology, including cryo-EM, is one important part of that picture. As AI accelerates the discovery phase, the demand for structural data to support, refine, and occasionally refute model predictions grows.
How Structural Biology, Including Cryo-EM, Fits in an AI-Driven Discovery Loop
A modern drug discovery program runs as an iterative loop: identify or design, synthesize, test, analyze. Computational models, including AI models, can identify or design new binders for small-molecule discovery, or novel sequences for biologics discovery. The new entities are made, tested, and analyzed, and the results are fed back into the loop for the next round of design [5].
Structural biology can be used at two distinct points in this cycle. Knowledge of the structure of a target can inform the first step (design) and structural data can also be used in the validation step, where, alongside other analytical techniques, it provides support for the initial hypothesis and generates data that can be used to further train the model.
The validation step is the rate-limiting step for most AI-driven programs. Computational candidates are cheap and fast to generate; experimental confirmation, especially structural data generation, is slow and expensive. For structural biology to remain relevant in this loop, the structural step itself has to get faster.

Where AI Predictions Still Need Experimental Support
Three failure modes are common in AI-driven structural workflows.
Side-chain and pocket placement errors, even when the backbone is right
One example comes from recent work on the human GPAT1 enzyme [6]. AlphaFold predicted the overall protein fold to within 1.5 Å RMSD of the experimental cryo-EM structure. By most metrics, this is an excellent prediction. But neither the catalytic nor the allosteric pocket was correctly resolved in the AI model, because the side-chain placement at the functional sites was wrong [6]. For a drug discovery program targeting either site, the AI model alone would have actively misled medicinal chemistry. Although AI programs are continually evolving, and the most recent versions perform better than older ones, they still rely on whatever structural information is available and will not be able to “predict” a novel conformation, one that is not represented in the training data.
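The gap between a good global fit and a poor pocket fit can be illustrated with a small calculation. The sketch below uses synthetic coordinates, not the GPAT1 data, and assumes the two models are already superposed. The atom count, the pocket indices, and the displacement sizes are all hypothetical values chosen for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length coordinate lists.

    Assumes the two structures are already superposed; no alignment is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Synthetic example (hypothetical numbers, not taken from any real structure).
N_ATOMS = 100
POCKET = range(45, 55)  # hypothetical indices of the pocket atoms

experimental = [(float(i), 0.0, 0.0) for i in range(N_ATOMS)]
predicted = [
    # Pocket atoms are misplaced by 4 A; all other atoms by 0.5 A.
    (x, y + 4.0, z) if i in POCKET else (x + 0.5, y, z)
    for i, (x, y, z) in enumerate(experimental)
]

global_rmsd = rmsd(experimental, predicted)
pocket_rmsd = rmsd(
    [experimental[i] for i in POCKET],
    [predicted[i] for i in POCKET],
)

print(f"global RMSD: {global_rmsd:.2f} A")  # 1.35
print(f"pocket RMSD: {pocket_rmsd:.2f} A")  # 4.00
```

In this toy case the global value stays below 1.5 Å even though the ten pocket atoms are each off by 4 Å, which is why a per-site comparison is worth reporting alongside the global number.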
Conformational ensembles invisible to static prediction
Most AI structure prediction returns a single static model. Many drug targets, including GPCRs, transporters, antibodies, and allosteric enzymes, exist as ensembles, and binding properties can be determined by the unbound conformational landscape as much as by the bound pose. A recent study on a SARS-CoV-2 broadly neutralizing antibody, which we discuss below, makes this point sharply: a mature antibody and its germline ancestor adopted nearly identical conformations when bound to antigen, but differed substantially in their unbound states [7]. A structure prediction trained on bound complexes alone would not have captured the conformational rearrangement that distinguishes the two.
Novel constructs and de novo designs with no training-set analog
Generative models like RFdiffusion produce protein backbones that, by design, may have no close relatives in the PDB [3]. In the absence of experimental models, confidence metrics from AI tools are uncalibrated for these outputs. Without experimental data to confirm it, there is no way to know for sure whether a designed protein folds as intended, binds at the predicted interface, or behaves as designed at all.
Structural biology is one of the methods that, in a single experiment, can address all three problems, and as such should be included in the process as early and as frequently as the project allows.
Case Study: Cryo-EM Reveals the Conformational Logic of Antibody Maturation
A recent collaboration between researchers at UC San Francisco and our team at NanoImaging Services (NIS) illustrates how structural data can address a question that AI alone cannot. The study, published as a preprint by Tharp et al. [7], examined how a human antibody lineage matured to recognize divergent SARS-CoV-2 spike variants by acquiring 13 somatic mutations. The central question of the paper was not affinity prediction but pathway accessibility: of the many possible orders in which those 13 mutations could have been acquired during affinity maturation, which were actually compatible with the biophysical constraints the antibody had to satisfy along the way?
The team built a high-throughput biophysical platform to characterize all 8,192 possible evolutionary intermediates between the unmutated germline and the mature broadly neutralizing antibody, measuring affinity, surface expression, and polyspecificity for each variant. Computational analysis of those measurements revealed that the vast majority of mutational paths between germline and mature were biophysically inaccessible. Mutations that gained affinity often came at a cost to expression or specificity, and only a small subset of paths navigated those trade-offs successfully.
Cryo-EM provided further insights into the evolutionary mechanism. We solved structures of both the germline and mature antibodies in their free states, and in complex with the BA.1 and BA.4 spike binding domains. Despite a roughly 100-fold difference in affinity, the bound conformations of the germline and mature Fabs were nearly identical. The affinity gain was not driven by new contacts between antibody and antigen. Instead, comparing the structures of the bound and unbound Fabs made it evident that the mature antibody had become preconfigured for binding through a rearrangement of two complementarity-determining region loops in its unbound state [7].
This finding directly connects to the pathway question. Several of the strongest-effect mutations in the lineage sit in the loops that change conformation. The order in which those mutations are acquired matters because some of them are only tolerated once the loop has rearranged; in the germline conformation they would create steric clashes. The accessibility of a mutational path therefore depends on whether the mutations are compatible with the conformational state of the antibody at each step. Without paired free and bound structures from cryo-EM, the conformational logic that constrains which paths are accessible would not have been visible.
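To make the scale of the pathway question concrete, the sketch below counts mutational orderings over the 2^13 = 8,192 genotypes and shows how an order-dependence rule prunes them. It is a minimal illustration, not the analysis used in the study. The viability rule (`requires_loop_first`) and the mutation indices are hypothetical stand-ins for the measured affinity, expression, and polyspecificity constraints.

```python
from math import factorial

N_MUTATIONS = 13  # somatic mutations separating germline from mature


def count_accessible_paths(n_mutations, is_viable):
    """Count single-mutation-at-a-time paths from genotype 0 (germline) to the
    fully mutated genotype in which every intermediate passes is_viable.

    Genotypes are bitmasks: bit i set means mutation i is present.
    """
    full = (1 << n_mutations) - 1
    paths = [0] * (full + 1)
    paths[0] = 1 if is_viable(0) else 0
    # Removing a set bit always gives a smaller integer, so increasing order
    # guarantees that every predecessor is counted before its successors.
    for genotype in range(1, full + 1):
        if not is_viable(genotype):
            continue
        paths[genotype] = sum(
            paths[genotype ^ (1 << i)]
            for i in range(n_mutations)
            if genotype & (1 << i)
        )
    return paths[full]


def requires_loop_first(genotype):
    """Hypothetical rule: mutation 1 clashes unless loop-rearranging
    mutation 0 is already present."""
    has_loop_mutation = bool(genotype & 0b01)
    has_clash_prone_mutation = bool(genotype & 0b10)
    return has_loop_mutation or not has_clash_prone_mutation


n_genotypes = 2 ** N_MUTATIONS
all_paths = count_accessible_paths(N_MUTATIONS, lambda genotype: True)
constrained_paths = count_accessible_paths(N_MUTATIONS, requires_loop_first)

print(n_genotypes)        # 8192
print(all_paths)          # 6227020800, i.e. 13!
print(constrained_paths)  # 3113510400, i.e. 13! / 2
```

Even one pairwise ordering constraint of this kind removes half of all orderings. With several such constraints acting together, it becomes plausible that only a small subset of paths remains accessible.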
For teams running AI-driven antibody discovery, the implication is that the training data needs to capture unbound as well as bound poses. Structures of antibodies in their unbound states are underrepresented in the PDB, and the gap matters for model quality. Each new structure of a free Fab, particularly when paired with its bound counterpart, adds signal that current models do not have.
What Useful Structural Biology Integration Looks Like in Practice
Pairing structural biology with AI-driven discovery puts specific demands on the experimental side.
Frequent integration matters more than throughput
AI workflows can iterate quickly, and no current structural method can match that pace one-for-one. Cryo-EM timelines have compressed substantially with multi-grid automation, on-the-fly processing, and AI-assisted particle picking and model building [8,9], and routine projects on well-behaved samples now move faster than they did even three years ago. Even so, cryo-EM cannot yet provide experimental support for every iteration. The goal should therefore be to include structural validation at the points in the program where it matters most: to confirm a binding mode before chemistry investment, to characterize a difficult target where prediction confidence is low, or to resolve a conformational ambiguity that affects model selection. Setting realistic expectations matters: project timelines depend heavily on sample behavior, and the up-front conversation about what is achievable is part of running a productive collaboration.
Different questions need different (re)solutions
Cryo-EM is particularly well-suited to targets that have historically been hard for crystallography, including large complexes, membrane proteins, GPCRs, transporters, antibody–antigen complexes, engineered scaffolds, and de novo binders [10]. The resolution required depends on the question. For epitope and paratope work on a Fab–antigen complex, data to a nominal 3 to 3.5 Å resolution is often sufficient to validate the interface and identify the major contact residues. For small-molecule ligand binding, where specific interactions, hydrogen-bond geometries, and pocket water positions matter, higher-resolution data, ideally better than 2 Å, are needed. Although the resolution achievable by cryo-EM has improved steadily, crystallography may still provide better answers to these questions. It is worth keeping in mind that a structure shows what is visible under the specific experimental conditions used; it does not definitively confirm or refute a predicted pose. A structure without observed ligand density does not necessarily prove the prediction is wrong; other methods are always needed alongside the structural effort to support or dispute a model.
Sample preparation is still the limiting factor
Structural studies still rely heavily on having a well-behaved sample, and protein expression and purification of native, wild-type proteins as well as engineered constructs remains a recurring burden that often requires multiple cycles of optimization. Having protein production capabilities integrated with cryo-EM under one roof, as we do at NIS through our Proteos team, facilitates this cycle and eliminates the handoff that typically adds weeks to a project.
How Validation Data Feeds the Next AI Cycle
Structural biology does not confirm or reject a candidate on its own. It supports a model. A cryo-EM structure of a predicted protein–ligand complex that shows clear ligand density at the predicted site is strong supporting evidence for the prediction; a structure that does not show the ligand under those experimental conditions does not prove the prediction wrong, but it does flag that further experimental work (biochemical, biophysical, or cellular) is needed. As with any experimental method, structures are interpreted alongside other data, not in isolation.
Every structure generated in support of an AI-driven program also generates training signals for future model iterations. The need for structural data that can be used to train and improve AI models is an acknowledged gap in the field [5]. For example, the PDB contains few germline–mature antibody pairs, almost no structures of de novo designed proteins from generative models, and a relatively small number of protein–small-molecule complexes at the resolution needed for confident pose prediction. Cryo-EM is well positioned to help fill this gap, particularly for larger and more complex targets that are difficult to crystallize. Programs that integrate experimental validation with their AI workflows are not just confirming individual candidates — they are generating more of the structural data that the next generation of models will need.
Cryo-EM Is Becoming an Important Validation Tool for AI-Driven Drug Discovery
AI has changed how quickly initial drug candidates can be identified, but it has not changed the requirement that those candidates be experimentally validated at every step. Failures of AI models are often driven by the limited amount of experimental data available to train them, and this is true across the board, including structural prediction. Structural biology tools, including cryo-EM, are needed to generate the experimental data that improves the next generation of models.
For teams running AI-driven discovery programs, the question is not whether to validate experimentally. The question is how to integrate structural biology tightly enough into the design cycle that the validation step can be used routinely within the loop. That is the partnership we build with our clients at NIS — we offer protein production and structure determination scaled to keep pace with the way modern drug discovery works.
If you are running an AI-driven discovery program and want to talk about how cryo-EM and structural biology fit into your workflow, get in touch with our team.
1. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
2. Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
3. Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).
4. Dauparas, J. et al. Robust deep learning–based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).
5. Ferreira, F. J. N. & Carneiro, A. S. AI-driven drug discovery: a comprehensive review. ACS Omega 10, 23889–23903 (2025).
6. Johnson, Z. L. et al. Structural basis of the acyl-transfer mechanism of human GPAT1. Nature Structural & Molecular Biology 30, 22–30 (2023).
7. Tharp, C. R. et al. Biophysical trade-offs in antibody evolution are resolved by conformation-mediated epistasis. bioRxiv (2026). doi:10.64898/2026.03.12.711465
8. Punjani, A. et al. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nature Methods 14, 290–296 (2017).
9. Bepler, T. et al. Topaz-Denoise: general deep denoising models for cryoEM and cryoET. Nature Communications 11, 5208 (2020).
10. Robertson, M. J. et al. Drug discovery in the era of cryo-electron microscopy. Trends in Biochemical Sciences 47, 124–135 (2022).

