The Epistemic Ceiling
The boundaries of biological knowledge
There is a moment that repeats itself across the history of science, and it doesn’t get nearly enough credit. It’s not usually the eureka moment, ironically. It’s the moment someone figured out how to look.
Before Galileo pointed a lens at Jupiter, the planets were just far-off points of light that moved in strange ways. Elaborate theories existed, though they were often internally inconsistent. But observation was limited by the instruments of seeing. Before Leeuwenhoek ground his lenses and peered into canal water in the 1670s, the microbial world didn’t exist, not because it wasn’t there, but because no one had built an instrument that could witness it. The map of reality was not wrong so much as it was radically incomplete. Invisible things were simply not part of anyone’s model of the world.
What we call scientific revolutions are, most of the time, epistemic revolutions. Not new ideas. New observations.
We are sitting at exactly that kind of teetering edge right now. And I don’t think most people in AI or biotech have fully reckoned with what that means.
§ I
The Acceleration Is Real
The cost curves are genuinely staggering. The Human Genome Project spent roughly $3 billion to produce the first human genome by 2003. Sequencing a whole genome now costs under $100.[1] RNA sequencing costs have followed a similar trajectory. The tools of molecular observation have been improving on a curve that matches, and at times outpaces, Moore’s Law.
Figure: Cost per human genome over time. Source: NHGRI Genome Sequencing Program. Y-axis is logarithmic; each gridline represents a 100× reduction. When the Human Genome Project completed in 2003, sequencing a single genome cost ~$50M. By 2025, the same sequencing costs under $100.
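As a back-of-the-envelope check, the figures above imply roughly a 500,000-fold cost reduction over 22 years, against the ~2,000-fold that a strict Moore’s-Law halving every two years would predict. The two cost figures are the ones cited above; the comparison itself is just arithmetic.

```python
# Sequencing cost decline vs. a Moore's-Law baseline (cost halving every
# two years). Cost figures are the ones cited in the text above.
years = 2025 - 2003        # 22 years
cost_2003 = 50e6           # ~$50M per genome in 2003 (NHGRI curve)
cost_2025 = 100            # under $100 today

sequencing_fold = cost_2003 / cost_2025   # actual fold reduction
moore_fold = 2 ** (years / 2)             # Moore's-Law fold reduction

print(f"Sequencing:  {sequencing_fold:,.0f}x cheaper")   # 500,000x
print(f"Moore's Law: {moore_fold:,.0f}x cheaper")        # 2,048x
```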
Meanwhile, AI has made the analytical layer essentially free. Foundation models trained on hundreds of millions of biological sequences can predict protein structures, generate novel molecules, and find patterns across multi-omic datasets at a scale no human team could ever hope to approach. The ability to build, iterate, and deploy biological AI tools has never moved faster.
But here is the asymmetry that sits at the center of everything: the tools for generating ground truth, for actually observing what living systems are doing, causally, in response to interventions, are not accelerating at the same rate. And they may be subject to fundamentally different limits.
The inference engine has been supercharged. The observation layer is still broken.
§ II
All Science Matters to Physics. All Business Matters to Finance.
Before becoming an entrepreneur, and then a biotech entrepreneur, I always dreamt (and sometimes still do) of becoming a physicist. And I find that every hard problem in any scientific field eventually reduces to something a physicist would recognize.
In physics, we accept that observation has costs. The uncertainty principle is not an engineering problem to be solved. It is a statement about the structure of reality. When you measure position, you disturb momentum. The act of looking is entangled with the thing being seen.
Biology has its own version of this, and it is more brutal in practice.
When you take a biopsy, you are sampling a single frame from a dynamic movie. When you lyse a cell to read its transcriptome, you have killed the thing you wanted to understand. When you grow cells in a dish, you have stripped away every environmental signal that shaped their behavior in the body. When you give a drug to a mouse, you are making an implicit claim that the mouse is a good enough proxy for a human — a claim that the data increasingly refuses to support.
Every observation comes with a distortion. This is not a solvable engineering problem the way compute costs are solvable. It is a structural feature of trying to measure living things.
“The first principle is that you must not fool yourself — and you are the easiest person to fool.”
— Richard Feynman, Cargo Cult Science, 1974 [11]
Feynman, who spent a sabbatical year working in biology, noticed the structural problem too.[12] He observed that, unlike physics and chemistry, biology lacks a foundation of fundamental laws developed by theory and proven by experiment. Without that foundation, there is no guiding principle from which to derive predictions. You are always, to some degree, flying blind.
In biology, the mechanisms that fool you are not personal failures. They are baked into the instruments.
§ III
The Examples Are Everywhere, and They Are Devastating
Case I: You were never looking at what you thought.
For decades, researchers around the world believed they were studying breast cancer cells, lung cancer cells, liver cells. Many were studying HeLa — the cervical cancer cells from Henrietta Lacks, which had contaminated their cultures and outcompeted everything else in the dish.
Estimates suggest that between 10% and 20% of all cell lines in scientific use are misidentified or contaminated.[15] The scientific community ignored the first whistleblower — cell biologist Stanley Gartler — for a decade.[9] John Masters, who finally forced the reckoning in 2002, argued that scientists should assume all cell lines are contaminated until proven otherwise.[10]
The epistemic implication isn’t about fraud or sloppiness. It’s about something more fundamental: the most basic fact — what you are observing — was wrong. For years. Across hundreds of labs. Nobody knew.
Case II: Your model doesn’t predict the system you care about.
Over five decades, the biomedical community ran what might be the most expensive epistemic experiment in history. Nearly 150 clinical trials were conducted to test agents intended to block the inflammatory response in critically ill patients. Every single one failed.[8]
A landmark 2013 paper in PNAS finally explained why.[7] Among genes changed significantly in human inflammatory disease, the mouse gene responses correlated with human responses at an R² between 0.0 and 0.1. In plain terms: the genomic response of mice to inflammation matches the human response with the accuracy of random chance.
The observation platform was wrong. Thousands of preclinical studies, decades of Nobel-caliber science, built on a broken instrument. This isn’t a sepsis-specific problem. It is a general statement about what happens when you observe a proxy instead of the thing you care about, and never verify the gap.
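To make the R² figure concrete, here is a small illustrative sketch with synthetic data (the gene count and noise levels are invented for illustration, not taken from the paper): an R² near zero means knowing the mouse response tells you essentially nothing about the human one, while a genuinely predictive model would sit far higher.

```python
import random

random.seed(0)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

n_genes = 5000
# Hypothetical per-gene responses: a human response, a mouse response that
# shares no signal with it, and one that actually tracks it plus noise.
human = [random.gauss(0, 1) for _ in range(n_genes)]
mouse_unrelated = [random.gauss(0, 1) for _ in range(n_genes)]
mouse_tracking = [h + random.gauss(0, 0.5) for h in human]

print(f"R² (no shared signal): {pearson_r(human, mouse_unrelated) ** 2:.3f}")  # ≈ 0
print(f"R² (tracking model):   {pearson_r(human, mouse_tracking) ** 2:.3f}")   # ≈ 0.8
```

An R² in the 0.0–0.1 band sits squarely in the first regime: the proxy carries almost none of the signal you care about.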
Case III: The biology was wrong, not the chemistry.
Nine out of ten drug candidates that enter Phase I clinical trials fail.[4] And that 90% failure rate only counts drugs that already survived preclinical testing. Include earlier attrition and the figure is far worse. For oncology specifically, 97% of drugs fail to receive FDA approval.[5]
The dominant reason is not bad chemistry. It is lack of clinical efficacy — the core biological claim, derived from preclinical models, was wrong. As NIH’s Michael Gottesman put it: “There is no cancer drug in current use that was not first tested in a cultured cell model. We need to know precisely which cell a culture represents.”[13] The problem is that too often, we didn’t.
Case IV: Science cannot replicate itself.
A 2016 Nature survey of 1,576 researchers found that more than 70% had tried and failed to reproduce another scientist’s experiments, and more than half had failed to replicate their own work.[2]
Amgen’s internal replication effort found that only 11% of landmark preclinical cancer findings could be independently confirmed.[3] An estimated $28 billion per year is spent in the US alone on preclinical research that cannot be reproduced.[6]
The replication crisis is a story about what happens when you try to generate ground truth from living systems that resist stable observation. Minor differences in experimental conditions — the strain of cells, the temperature, the same reagent from a different supplier, the time of day — can change results. Reproducibility in biology is hard not because scientists are careless. It is hard because the thing being observed is dynamic, context-dependent, and resistant to isolation.
§ IV
What Is Living? The Deeper Problem.
Which raises the harder question underneath all of this: what exactly are we trying to observe?
In finance, every variable ultimately reduces to a claim on future cash flows. The complexity is staggering, but the ontology is clean. In physics, we have a rigorous definition of what a system is and what it means to measure it — even if measurement disturbs the thing being measured.
Biology doesn’t have this. “Living” is one of the genuinely unsolved problems in science. A virus is not clearly alive. A prion is a protein that replicates by inducing misfolding in other proteins. A tumor is the patient’s own cells, reprogrammed against the patient. An organoid is what, exactly? A simulation? An approximation? A model of something we don’t fully understand, instantiated in the very substrate we don’t fully understand?
“I have approximate answers and possible beliefs and different degrees of certainty about different things, but I am not absolutely sure of anything.”
— Richard Feynman, The Meaning of It All
This matters enormously for AI. Every machine learning system requires labels. Labels require ground truth. Ground truth requires observation. Observation requires a theory of what you’re observing. In biology, that chain has unresolved gaps at nearly every link. You can build the best model in the world, but if the training signal is corrupted — if you were watching the wrong cells, the wrong species, the wrong snapshot of a dynamic process — the model learns to predict a world that doesn’t exist.
Garbage in, garbage out. In biology, the garbage is subtle, expensive to collect, and often structurally unavoidable.
§ V
Telemetry Is the Bottleneck
Here is what I have come to believe: the next decade of biological AI will not be won by the team with the best model architecture. It will be won by the team that solves telemetry.
By telemetry, I mean the ability to continuously, causally, and where possible non-destructively observe living systems as they respond to the world. Not snapshots. Not endpoints. Not frozen slices of something that was alive thirty seconds ago. The actual dynamic signal of what a living system is doing and why.
The Movie
We have extraordinary tools for taking snapshots of biology. We have almost nothing for watching the movie. That gap is the bottleneck.
The companies and researchers that figure out how to generate ground truth at scale — real causal ground truth, not correlation dressed up as causation — will be the ones that matter. Not because their AI is marginally better, but because their training signal is fundamentally richer.
As Christine Brideau of Deerfield Management put it at the NYSCF Conference: “The biggest issue we have in drug discovery is really choosing the right target for the disease. Most drugs will fail in a Phase 2b trial because of lack of efficacy, and part of it is because maybe we didn’t choose the right target, or perhaps we didn’t have the right biomarker to be able to determine whether the drug was actually working.”[14]
In any learning system, the quality of the signal sets an absolute ceiling on the quality of the model. In biology right now, that ceiling is low. And the AI is already pressing against it.
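The ceiling can be made concrete with a toy simulation (all numbers invented for illustration): if the measurement process corrupts a fraction of labels, then even a model that has learned the true underlying rule perfectly cannot appear to score above one minus the noise rate, because it is being judged against a corrupted ground truth.

```python
import random

random.seed(0)

# Toy world: the true label is 1 when the feature exceeds 0.5. The
# "instrument" flips a fraction of labels before anyone ever sees them.
def make_dataset(n, noise_rate):
    data = []
    for _ in range(n):
        x = random.random()
        true_y = 1 if x > 0.5 else 0
        observed_y = true_y if random.random() > noise_rate else 1 - true_y
        data.append((x, true_y, observed_y))
    return data

# A model that has learned the real rule exactly.
def perfect_model(x):
    return 1 if x > 0.5 else 0

for noise_rate in (0.0, 0.2, 0.4):
    data = make_dataset(100_000, noise_rate)
    # Measured accuracy: agreement with the observed (corrupted) labels.
    measured = sum(perfect_model(x) == obs for x, _, obs in data) / len(data)
    print(f"noise={noise_rate:.1f}  measured accuracy ≈ {measured:.3f}")
```

The perfect model’s measured accuracy tracks 1 − noise rate, not its true skill. No architecture change recovers the gap; only a better observation layer does.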
§ VI
The Edge We’re Standing On
Galileo didn’t just build a telescope. He understood that the telescope was an epistemic instrument — that it would change not just what could be seen, but what we were allowed to believe. Leeuwenhoek didn’t just find bacteria. He established that there was an entire invisible world that had always been shaping human fate, and that every theory conceived before its discovery rested on an incomplete picture of reality.
We are at that kind of moment again. The cost curves are bending. The compute is there. The models are ready to learn.
What remains is the hardest part: figuring out how to look.
The biological opacity that has defined medicine for all of human history — the opacity that killed people in every era because we simply could not see what was happening inside them — is finally starting to yield. Not because we have solved the epistemic problem, but because we are close enough to the edge of solving it that we can see its shape clearly for the first time.
The question is not whether AI will transform biology. It will. The question is whether the observation layer can keep pace with the inference engine. Whether the telemetry can supply the signal that the models are ready to learn from.
That is the race. And it is the most important race happening in science right now.
References
[1] NHGRI Genome Sequencing Program. Genome sequencing costs. National Human Genome Research Institute (2023).
[2] Baker M. 1,500 scientists lift the lid on reproducibility. Nature 533, 452–454 (2016).
[3] Begley C.G. & Ellis L.M. Drug development: Raise standards for preclinical cancer research. Nature 483, 531–533 (2012).
[4] Sun D. et al. Why 90% of clinical drug development fails and how to improve it. Acta Pharmaceutica Sinica B 12(7), 3049–3062 (2022).
[5] Wong C.H., Siah K.W. & Lo A.W. Estimation of clinical trial success rates and related parameters. Biostatistics 20(2), 273–286 (2019).
[6] Freedman L.P., Cockburn I.M. & Simcoe T.S. The economics of reproducibility in preclinical research. PLoS Biology 13(6), e1002165 (2015).
[7] Seok J. et al. Genomic responses in mouse models poorly mimic human inflammatory diseases. PNAS 110(9), 3507–3512 (2013).
[8] Hotchkiss R.S. & Opal S. Why have clinical trials in sepsis failed? Trends in Molecular Medicine 20(4) (2014).
[9] Gartler S.M. Genetic markers as tracers in cell culture. National Cancer Institute Monograph 26, 167–195 (1967).
[10] Masters J.R. HeLa cells 50 years on: the good, the bad and the ugly. Nature Reviews Cancer 2, 315–319 (2002).
[11] Feynman R.P. Cargo Cult Science. Caltech commencement address (1974). Reprinted in Surely You’re Joking, Mr. Feynman! (1985).
[12] Feynman R.P., Leighton R.B. & Sands M. The Feynman Lectures on Physics, Vol. 1, Chapter 3: The Relation of Physics to Other Sciences. Addison-Wesley (1963).
[13] Gottesman M.M., quoted in Couzin-Frankel J. The Dirty Little Secret of Cancer Research. Discover Magazine (2019).
[14] Brideau C. (Deerfield Management). NYSCF Conference panel on drug discovery. New York Stem Cell Foundation (2022).
[15] ICLAC Register of Misidentified Cell Lines, Version 12. International Cell Line Authentication Committee (2023).