Blog
July 8, 2025
5 mins

AI-driven target discovery: it’s a journey

AI has promised to accelerate drug discovery, particularly the challenging task of identifying new therapeutic targets. But what does that look like in practice? How far have we come, and which challenges still need cracking?

AI-driven target discovery is still a relatively new field. Until recently, scientists relied on slow, manual research to find new drug targets. They pored over papers, tested genes one by one, and followed biological hunches. But with advances in machine learning and access to vast datasets, that is changing fast.

Over the past decade, a wave of AI-first biotech companies has emerged to disrupt the status quo:

  • Insilico uses machine learning models trained on public laboratory data, clinical data, and publications to predict target success.
  • Recursion uses AI-powered image analysis to spot subtle changes in cell shape and behavior in response to drugs or genetic perturbations that can reveal new drug targets. 
  • Insitro combines machine learning with lab experiments to model how diseases work at the cellular level. 
  • Owkin K (Owkin’s AI) takes a patient-data-first approach: we use proprietary enriched data from thousands of patients and past clinical trials to prioritize targets that are more likely to work.

Each company has its own approach. But they all share the same goal: to make finding the right drug targets faster, smarter, and more reliable than ever before.

AI as a guide

At the moment, AI helps us in two ways:

Suggesting targets 

The first is by analyzing published literature and data faster than an entire team of human biomedical researchers could. To do this well, the AI has to be able to accurately predict three aspects of a new target: its efficacy, safety, and specificity.

So how does AI go from raw data to real drug targets? Let’s take Owkin’s Discovery AI as an example.

The first step is data - a lot of it. The Owkin team accesses multimodal data from our world-class network of partner institutions. They gather and clean diverse types of biomedical data: gene mutational status, tissue histology, patient outcomes, bulk gene expression, single-cell gene expression, spatially resolved gene expression, and clinical records. We also bring in existing knowledge on target druggability, gene expression across several cancers and in healthy tissues, and the phenotypic impact of gene expression in cancer cells in the laboratory (from datasets and databases like ChEMBL and DepMap, plus past clinical trial results). Finally, we generate data in key indications from our MOSAIC multiomic spatial dataset.

Next, the team specifies important features for the AI to take into account (e.g. cellular localization), and the AI extracts new features from other data modalities (e.g. H&E, genomic). In total, we extract around 700 features, with particular depth in the spatial transcriptomics and single-cell modalities. Our AI is world-leading in these modalities largely thanks to our proprietary MOSAIC database, the world’s largest spatial omics database in cancer. Access to this data source gives our AI the unique ability to train on data that captures a wide variety of biological information (e.g. the tumor microenvironment).

Unlike features specified by humans, those extracted by AI may not be easily recognizable - but they represent patterns in the data that humans could not see, and that could be predictive of target success. Our AI can also extract new features by analyzing our Knowledge Graph, a kind of map that links genes, diseases, drugs, and patient characteristics.
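To make the idea concrete, here is a minimal, purely illustrative sketch of deriving a feature from such a knowledge graph. The entities, edges, and the feature itself are invented for this example - they are not Owkin’s actual graph or features.

```python
# Toy knowledge graph linking genes, diseases, and drugs as an adjacency map.
# All entities and edges here are invented for illustration.
knowledge_graph = {
    "GENE_A": {"DISEASE_1", "DISEASE_2", "DRUG_X"},
    "GENE_B": {"DISEASE_2"},
    "DRUG_X": {"DISEASE_1"},
}

def linked_diseases(entity: str, max_hops: int = 2) -> set[str]:
    """Collect disease nodes reachable from `entity` within `max_hops` edges."""
    frontier, seen = {entity}, set()
    for _ in range(max_hops):
        frontier = {n for node in frontier
                    for n in knowledge_graph.get(node, set())} - seen
        seen |= frontier
    return {n for n in seen if n.startswith("DISEASE")}

# A simple graph-derived feature: how many diseases a gene connects to.
print(len(linked_diseases("GENE_A")))  # 2
```

In a real system the graph would hold millions of curated relationships, and the derived features would be far richer, but the principle - turning graph connectivity into numeric inputs for a model - is the same.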

These features are fed into machine learning models trained as follows: the engine is given the features extracted for a target and indication, then uses a machine learning algorithm called a classifier to identify which features are predictive of that target’s success in a clinical trial. We then validate the accuracy of the model against the outcomes of past clinical trials of known targets.
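The training-and-validation loop can be sketched in a few lines. This is a toy stand-in, not Owkin’s pipeline: the features, labels, and model choice are invented, and the ~700 real features are replaced by 12 random ones.

```python
# Hedged sketch of the classifier step: given a feature vector per
# (target, indication) pair, predict clinical-trial success and validate
# on held-out known outcomes. All data here is synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_targets, n_features = 500, 12  # toy stand-in for ~700 real features

X = rng.normal(size=(n_targets, n_features))
# Toy label: "success" loosely driven by two of the features, plus noise.
y = ((X[:, 0] + 0.5 * X[:, 1]
      + rng.normal(scale=0.5, size=n_targets)) > 0).astype(int)

# Hold out known trial outcomes to validate the model, as described above.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
auc = roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])
print(f"validation AUC: {auc:.2f}")
```

The held-out set plays the role of "known targets with known trial outcomes": a model that scores well there earns some trust on novel targets.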

In doing this, our AI answers key biological questions: is this gene likely to be an effective drug target? Could it cause toxicity in critical organs? Is it relevant only to certain subgroups of patients (e.g. for a target in breast cancer, might it be effective for only some patients, and can we identify those groups)?

At the end of the process, our Discovery AI produces a score for each target, representing its potential for success in treating a given disease, as well as predicting the toxicity of a new target. Importantly, we have designed our AI with explainability at its heart. We can go back through the model and understand the importance of each feature to each prediction. 
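As an illustration of what "a score plus explainability" can look like, here is a hedged sketch using a linear model, whose coefficients make per-feature contributions easy to read back. The feature names, data, and model are invented; a production system would use richer models and dedicated attribution methods.

```python
# Illustrative sketch: score one candidate target and explain the prediction.
# Feature names and data are invented for this example.
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["tumor_expression", "healthy_kidney_expression", "essentiality"]
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] - X[:, 1] > 0).astype(int)  # toy ground truth

model = LogisticRegression(max_iter=1000).fit(X, y)

# Score: probability of success for one candidate target.
candidate = np.array([[1.2, -0.4, 0.1]])
score = model.predict_proba(candidate)[0, 1]

# Explainability: per-feature contribution to this prediction's log-odds.
contributions = dict(
    zip(feature_names, (model.coef_[0] * candidate[0]).round(2)))
print(f"score={score:.2f}", contributions)
```

Reading the contributions back out is the essence of "going back through the model": each prediction decomposes into named, inspectable pieces rather than a single opaque number.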

In addition, it can use optimization methods to help identify subgroups of patients within a disease who will respond better to a given target. It can also find new uses for known drugs - drug repositioning.

Lastly, we use Large Language Models (LLMs) to connect unstructured insights from the scientific literature with structured data, complementing our AI’s predictions with what is known in the published literature.

Crucially, the models that comprise Owkin’s Discovery AI are continuously retrained on both successes and failures from past clinical trials, allowing them to become smarter over time.

Put simply: Owkin’s AI works as a guide by combining biological data, clinical reality, and historical outcomes into one powerful decision-making engine. This is how our AI can match appropriate targets to indications in 2 weeks instead of 6 months. And this has been a major part of our collaboration with Sanofi, in which we have already delivered multiple candidate targets.

Testing targets

Once a target has been identified, it needs to be tested in a lab setting. Advanced AI can help here too. Let’s take Owkin’s AI as an example again. Because our AI distills the important features of each target in relation to a given disease (or disease subpopulation), it can guide experimental teams on where to focus their work. Here’s how:

First, our AI can help our biologists choose the right experimental model, for example, specific cell lines or organoids that closely resemble the patient group the target came from, or that best recapitulate intracellular pathways we might want to test. This makes our early testing more relevant and increases the chances of success later on.

Choosing the best model system for lab testing is crucial

Second, our AI can recommend experimental conditions - for example, those that best mimic the tumor environment, like specific combinations of immune cells, oxygen levels, or treatment backgrounds - based on patterns it has learned from real patient data. We can then adapt our culture conditions accordingly.

In short, AI doesn’t just point to interesting targets, it can help us design smarter experiments to test those targets.

Here’s one real-life example of how Owkin’s AI made our lab validation more efficient. One of our AI-identified targets showed early signs of potential toxicity in the kidneys. Our AI had analyzed how the target was expressed across different healthy tissues and it predicted high expression in kidney cells, especially in the glomeruli, which are essential for filtering blood. Based on this insight, our AI flagged kidney toxicity as a risk with the target. So we prioritized testing in healthy kidney models early on - and we confirmed it was toxic. Confirming this risk helped us avoid investing time and resources into a potentially unsafe target.
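The logic behind this kind of flag can be sketched simply: check a target’s predicted expression across healthy tissues and raise a warning wherever a critical organ is highly exposed. The tissues, expression values, and threshold below are invented for illustration - they are not Owkin’s actual criteria.

```python
# Hedged sketch of a tissue-expression toxicity check: flag a target if its
# predicted expression in a critical healthy tissue exceeds a threshold.
# Tissue names, values, and the threshold are invented for illustration.
CRITICAL_TISSUES = {"kidney", "heart", "liver"}
EXPRESSION_THRESHOLD = 5.0  # arbitrary units for this example

def toxicity_flags(expression_by_tissue: dict[str, float]) -> list[str]:
    """Return the critical tissues where the target is highly expressed."""
    return sorted(
        tissue for tissue, level in expression_by_tissue.items()
        if tissue in CRITICAL_TISSUES and level > EXPRESSION_THRESHOLD
    )

target_expression = {"kidney": 8.2, "liver": 1.1, "skin": 6.0}
print(toxicity_flags(target_expression))  # ['kidney']
```

In practice such a flag would rest on far more than a threshold - cell-type-resolved expression (glomeruli vs. the rest of the kidney), essentiality, and known toxicity precedents - but the decision it drives is the same: test the risky tissue first.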

AI as a guru

In the future, AI focused on target discovery should not only guide us to targets with a high probability of success and suggest how we might test them, but also accurately predict the results of those tests.

For example, in the kidney case above, a future AI would predict that the target would be toxic to the kidney without us having to do the experiment at all.

We're getting closer to being able to predict a target’s efficacy and toxicity before it reaches the clinic, but we’re not quite there yet. Today, most preclinical decisions still rely heavily on animal models and early lab experiments, which often fail because the models do not reflect the complexity of human biology. As a result, some targets that look promising in lab models ultimately fail in humans. For example, Navitoclax, a BCL-2 family inhibitor, showed strong anti-cancer activity in mice. It also showed some platelet toxicity, but this was deemed acceptable enough to take the drug to trial. However, the toxicity was unexpectedly worse in humans, halting its development in solid tumors. These cases highlight the urgent need for more accurate predictive tools that are not subject to the same errors as our existing models. AI platforms can get closer to the human response by combining patient data, gene essentiality, and prior clinical outcomes.

So why can’t AI already tell us, with certainty, which targets will work and which won’t? The biggest roadblock is data. While AI has access to vast amounts of information, much of it isn’t the kind that can reliably predict how drugging a new target will affect patients. What’s missing is rich experimental and interventional data, especially from advanced preclinical models like organoids, patient-derived xenografts (PDX), and co-culture systems, that reflect the complexity of human biology. These new techniques - derived from patient tissues - when combined with multimodal patient data, will provide a much closer model of human biology.

Artist’s impression of organoids in a petri dish

An Agentic future

Equally important are the AI models themselves. Today’s systems are excellent at analyzing data and generating hypotheses, but to go beyond predictions from single models, we need to leverage more collaborative intelligence. For this, you need next-generation AI models such as agentic AI. Agentic AI can learn from previous experiments, reason across multiple types of biological data, and simulate how a specific intervention (like inhibiting a protein) is likely to behave in different experimental models. In the future, these systems will also be able to design and run experiments themselves in order to build their knowledge base autonomously.

This is what we at Owkin are doing with the launch of K Pro. We’re packaging nine years of accumulated knowledge across these techniques into one agentic AI co-pilot. This will enable users to access patient data, as well as the most cutting-edge models, through an intuitive AI interface that facilitates rapid and efficient investigation of biological questions.

Over time, with the right training data, we believe that these models will predict experimental outcomes before they're run, narrowing down which hypotheses are most promising and which are likely dead ends. In practice, that means faster, cheaper, and more targeted drug discovery: less time spent in the lab chasing false leads, and more focus on the targets that are most likely to work in patients.

And look out for our next blog, where we will deep dive into how we benchmark our target identification AI on existing clinical trial data.

Authors

Elodie Pronier
