Augmenting data scientists and biologists through the integration of LLMs in drug discovery

Drug discovery is complex and time-consuming: a drug can take between 7 and 20 years to reach the market. In particular, identifying the right therapeutic targets (the molecules a drug needs to interact with to produce the desired effect) is slow and resource-intensive. AI, with its ability to analyze vast amounts of structured data to characterize targets, has been transforming this landscape. But to fully harness its potential to identify new candidate targets, two key elements need to be built: high-quality labels for targets extracted from past clinical trials (i.e. has the target been successful in past clinical trials?) and a robust, clearly defined target evaluation framework.

Labels are annotations that can be used to train AI models. For example:

Is gene X a good target for indication Y: yes/no
Has it been to phase III: yes/no
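As a rough illustration, labels like these could be stored as one record per target-indication pair. The sketch below is only an assumption about representation, not Owkin's actual schema: the class name `TargetLabel`, its fields, and the gene and indication names are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TargetLabel:
    """One hypothetical label for a (gene, indication) pair."""
    gene: str
    indication: str
    good_target: Optional[bool]  # None when the answer is not known
    reached_phase_3: bool


def to_training_rows(labels):
    """Keep only labels with a known yes/no answer and encode them as 0/1."""
    return [
        (lab.gene, lab.indication, int(lab.good_target))
        for lab in labels
        if lab.good_target is not None
    ]


# Placeholder names, not real annotations.
labels = [
    TargetLabel("GENE_X", "indication_Y", True, True),
    TargetLabel("GENE_Z", "indication_Y", None, False),
]
rows = to_training_rows(labels)
```

Keeping "unknown" distinct from "no" matters here, because a target that has never been tested is not the same as a target that failed.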

The challenge of predicting novel targets for drug development

At Owkin we have developed TargetMATCH, an end-to-end AI engine that uses multimodal patient data as the input, and outputs the top candidate targets and paired patient subgroups that would most benefit from therapeutic intervention on these targets.

TargetMATCH uses AI in a way that allows us to scale target discovery. For this to work, we need an evaluation framework for reviewing candidate targets and ranking them by their predicted success in clinical trials, which is both extremely important and challenging. First of all, defining a good drug target is not an exact science: there is some consensus that drug/target/indication combinations that went into successful clinical trials are among the best accessible “ground truth”, but more work is needed to establish the drivers of clinical success. Moreover, labels are scarce: fewer than 1,000 of the ~20,000 protein-coding genes are currently targeted by a drug, and targets for which drug development has failed are underreported. In addition, the field is evolving very quickly: new technologies and drug modalities keep changing what can be considered a good target.

In this scenario, it is unsurprising that the expert review of drug targets is a significant bottleneck. Biologists still dedicate a considerable amount of time to reviewing shortlisted targets, a process that relies heavily on prior knowledge and lengthy literature reviews, and is susceptible to bias.

This is where Large Language Models (LLMs) come into play: their integration into our TargetMATCH target discovery engine augments our team’s expertise and accelerates time to results.

The “biologist-in-the-loop” model: augmenting expertise and enhancing productivity through LLMs

When presented with a novel biological hypothesis (such as a novel therapeutic target), human experts will perform an extensive literature review to validate its relevance. Trained on massive amounts of scientific literature, LLMs can significantly speed up and improve this task.

The application of these technologies works best when used alongside human expertise. This is a model that we refer to as the “biologist-in-the-loop”, which aims at augmenting expertise and enhancing productivity through AI.

At Owkin, we developed an LLM tool to obtain biological summaries of specific genes instantaneously.

To do this, our expert biologists design scientific questions to evaluate the quality of a therapeutic target (e.g. “Which pathway is target X involved in and what is its role?”). Using carefully crafted, question-specific queries to the PubMed API together with retrieval-augmented generation (RAG), we automatically gather literature of interest in the form of abstracts. For each selected abstract, the LLM generates a concise response to the original scientific question. Finally, the LLM summarizes the responses from the individual abstracts into a single answer that provides a complete picture.
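The steps above can be sketched as a small map-then-summarize routine. This is a minimal illustration under stated assumptions, not Owkin's implementation: the function names, prompt wording and `fake_llm` stand-in are hypothetical, and the retrieval step is reduced to building a search URL for the public NCBI E-utilities `esearch` endpoint without sending any request. In practice the returned PubMed IDs would still need to be resolved to abstract text, for example through the `efetch` endpoint.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint for PubMed.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(gene, topic_terms, max_results=20):
    """Build a question-specific PubMed search URL (no request is sent)."""
    term = f"{gene}[Title/Abstract] AND ({' OR '.join(topic_terms)})"
    params = {"db": "pubmed", "term": term, "retmax": max_results, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def answer_from_abstracts(question, abstracts, llm):
    """Answer the question once per abstract, then summarize the answers.

    `abstracts` maps a PubMed ID to abstract text; `llm` is any callable
    that takes a prompt string and returns a string (an assumption here).
    """
    per_abstract = {}
    for pmid, text in abstracts.items():
        prompt = f"Question: {question}\nAbstract: {text}\nAnswer concisely."
        per_abstract[pmid] = llm(prompt)

    # Keep the PubMed ID next to each answer so the summary stays traceable.
    combined = "\n".join(f"[PMID {pmid}] {ans}" for pmid, ans in per_abstract.items())
    summary = llm(f"Question: {question}\nSummarize these answers:\n{combined}")
    return per_abstract, summary


def fake_llm(prompt):
    """Stand-in for a real model: echoes the last line of the prompt."""
    return prompt.splitlines()[-1]


url = build_pubmed_search_url("GENE_X", ["pathway", "signaling"])
abstracts = {"111": "placeholder abstract one", "222": "placeholder abstract two"}
per_abstract, summary = answer_from_abstracts(
    "Which pathway is GENE_X involved in?", abstracts, fake_llm
)
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM provider and makes it easy to test with a stub.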

The tool is designed with interpretability and trustworthiness in mind: it highlights the specific sources and data points in the biomedical literature that support its answers and suggestions. This allows our biologists to trace the reasoning behind each target recommendation and to find new resources of interest, and it helps prevent hallucination by design.
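One simple way to make such traceability checkable is to verify that every source an answer cites was actually among the retrieved abstracts. The sketch below is an assumption for illustration, not a description of Owkin's tool: it supposes answers cite sources inline as `[PMID <digits>]`, and the function names are hypothetical.

```python
import re

CITATION_PATTERN = r"\[PMID (\d+)\]"


def unsupported_citations(answer, retrieved_pmids):
    """Return PMIDs cited in `answer` that are not among the retrieved abstracts."""
    cited = set(re.findall(CITATION_PATTERN, answer))
    return sorted(cited - set(retrieved_pmids))


def is_traceable(answer, retrieved_pmids):
    """True when the answer cites at least one source and every cited source was retrieved."""
    cited = re.findall(CITATION_PATTERN, answer)
    return bool(cited) and not unsupported_citations(answer, retrieved_pmids)


# Placeholder IDs and text, not real literature.
retrieved = {"111", "222"}
grounded = "GENE_X acts in a placeholder pathway [PMID 111]."
ungrounded = "GENE_X acts in a placeholder pathway [PMID 999]."
```

A check like this only confirms that a citation points to a retrieved abstract; whether the abstract really supports the statement still needs expert review.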

When used to augment our data scientists’ and biologists’ expertise, our LLMs allow us to generate and evaluate labels for targets faster and with less exposure to human bias, by automating the literature review process. This has the added benefit of freeing up researchers’ time to focus on tasks like designing experiments and interpreting results.



In the near future, we’ll be able to use LLMs for additional tasks in target discovery. For example, they could extract labels directly from the literature for targets based on novel technologies, such as spatial omics. This is important for identifying targets that are currently under-represented in clinical trials but may be promising for drug development.

Harnessing human expertise and artificial intelligence to predict better targets

Evaluating targets discovered through an AI-based pipeline is a complex process that relies on the availability of high-quality labels, both positive (targets more likely to succeed) and negative (targets less likely to succeed). The goal of this evaluation is to increase the pool of promising targets for drug development the pipeline can identify in successive iterations.

More specifically, we want to improve the curation of the target labels used by our predictive models as much as possible. To do this, we can use an active learning-like strategy known as “Uncertainty Sampling”. In this approach, we focus on targets that we consider false positives, i.e. targets that are ranked highly by our algorithm but haven’t been to clinical trials, and false negatives, i.e. targets that are ranked low in our pipeline but have been to clinical trials. By feeding these targets (each paired with an indication of interest) to the target summary tool described above, we can quickly gather relevant information from the literature and automatically generate labels (good/bad/risky) and flags (e.g. “early phase I launched”, “nuclear localization”) for those targets. These AI-generated labels and flags are validated by our biomedical experts, in a feedback loop that allows us to include additional scores and filters in our pipeline. Our researchers can then refine our AI algorithms to predict the most promising targets and to filter out unsuitable ones.
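The selection rule described here can be sketched in a few lines. This is an illustrative assumption rather than the actual pipeline: the function name, the `top_k` cutoff for what counts as “highly ranked”, and the target names are hypothetical. Note that it implements only the disagreement-based selection described in the text (ranking versus clinical trial history), whereas textbook uncertainty sampling usually picks the examples on which the model is least confident.

```python
def select_for_review(ranked_targets, in_clinical_trials, top_k):
    """Find disagreements between the model ranking and clinical trial history.

    ranked_targets: target names, best-ranked first.
    in_clinical_trials: set of targets that have been to clinical trials.
    top_k: how many top-ranked targets count as "highly ranked" (an assumption).
    """
    top = ranked_targets[:top_k]
    rest = ranked_targets[top_k:]
    # Highly ranked but never taken to clinical trials.
    false_positives = [t for t in top if t not in in_clinical_trials]
    # Ranked low but already taken to clinical trials.
    false_negatives = [t for t in rest if t in in_clinical_trials]
    return false_positives, false_negatives


# Placeholder target names, not real data.
ranking = ["T1", "T2", "T3", "T4", "T5", "T6"]
trialed = {"T1", "T5"}
fps, fns = select_for_review(ranking, trialed, top_k=3)
```

With these placeholder inputs, `fps` holds T2 and T3 and `fns` holds T5; these are the targets that would be sent to the summary tool and then to expert validation.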

Conclusion

The integration of LLMs into target discovery can have a significant impact on drug development. LLMs can scale the identification of promising new drug targets and shorten its timelines, translating to faster discovery and development of new treatments. LLMs can also help uncover hidden connections and lead to the discovery of targets for previously intractable diseases. The collaborative “biologist-in-the-loop” model fosters innovation by harnessing the unique strengths of AI and humans. This powerful partnership promises to accelerate the pace of discovery, ultimately leading to the development of life-saving treatments and a brighter future for healthcare.