AI in drug discovery
Artificial Intelligence (AI) technologies are currently one of the key drivers in the global pharmaceutical industry’s transition towards a data-centric approach to computational drug discovery. These technologies are increasingly being applied across multiple stages of the drug discovery pipeline, where they may help improve prioritization, inform decision-making, and address some development bottlenecks.
The focus of AI in drug discovery has also expanded beyond small molecule modalities to large molecule therapeutics or biologics such as antibodies and RNA-based therapies. As technology continues to evolve, AI technologies, used alongside experimental approaches, are becoming an important component of many early-stage in silico drug discovery workflows.
Increasingly, the value of AI in drug discovery is determined not by model complexity alone, but by how well biological context is preserved across data, computation, and experimentation. Platforms such as MindWalk are being developed with a focus on biological fidelity, traceability, and integration with experimental workflows.
In this article, we will take a closer look at some potential applications of AI in modern in silico drug discovery.
1. What is modern in silico drug discovery?
Modern in silico drug discovery is powered by a combination of conventional Computer-Aided Drug Design (CADD) techniques, the predictive capabilities of AI technologies, and knowledge derived from experimental research. This blended approach to early-stage drug discovery enables researchers to use in silico models to support candidate prioritization before traditional experimental validation.
2. Why use AI in drug discovery?
A 2023 report on the potential of AI in drug discovery presented a high-level value model to assess the comparative impact of AI technologies across key phases of the small molecule discovery value chain (target identification and validation, target to hit, hit to lead, lead optimization, and preclinical) relative to typical experimental drug discovery. The model, which assessed the potential impact in three scenarios (new molecule discovery for poorly understood targets, well-understood targets, and repurposing existing molecules for new targets), suggested the potential for time, cost, and performance improvements across multiple phases of early-stage drug discovery.
In the target identification and validation phase, AI-based approaches can support faster hypothesis generation, target prioritization, and analysis of disease-target relationships. Specifically for small molecules, during the target-to-hit stage, AI-driven in silico models can broaden the chemical space available for exploration and support the identification of novel target-molecule relationships. In the hit-to-lead and lead optimization phases, AI models can predict compound properties and support more efficient design-make-test cycles. Finally, preclinical AI applications include data-driven assessments of toxicity, pharmacokinetics, and pharmacodynamics that may support more efficient prioritization of compounds for testing.
AI-driven approaches are also helping address several pain points in early-stage antibody design and optimization. Key challenges include the identification of relevant/superior epitopes, the selection of the most appropriate candidate across experimental outputs/human repertoires, and the characterization of epitope/paratope interactions.
The main advantages of these new AI models are their ability to find patterns in their training data, to encode these patterns in an abstract mathematical representation, and to find optimal solutions through “simple” algebraic operations, in a more automated and scalable way than manual analysis alone or ab initio modelling (such as molecular dynamics) would allow.
On one hand, antibody-specific language models are becoming increasingly important to cope with the vast amounts of data generated by next-generation antibody sequencing technologies and are powering a wide range of applications, including advanced sequence generation and the prediction of properties such as binding sites, humanness, and thermostability. For example, a language model trained specifically for inferring thermostability can in some cases generate predictions at scale within relatively short runtimes, depending on the model and infrastructure used. These models are typically used at the early stage of the discovery funnel.
On the other hand, significant progress has been made by introducing deep learning models trained on extensive protein (and antibody) structure data sets for tasks including prediction of antibody CDR domains and H3 loops, prediction of epitopes (linear and conformational), prediction of paratopes, and prediction of complex conformations. For instance, deep learning models such as AlphaFold, IgFold, and ABodyBuilder have expanded the capabilities of antibody structural modelling beyond traditional homology-based approaches. In some applications, these methods have shown improved predictive performance relative to earlier computational techniques. In turn, these advances may support more reliable modelling of antibody-antigen conformations in certain settings. AI models that predict the bound conformation of an antibody-antigen complex are typically more demanding in terms of resources, so large-scale virtual screening (as performed for small molecules) is not feasible in this case. Hence, these AI models are more useful further down the discovery funnel, for instance when optimizing a lead candidate.
Deep generative architectures that integrate geometric graph neural networks (GNNs) with large-scale protein language models combine sequence and structure information to support antibody sequence–structure co-design. They enable researchers to simultaneously optimize the amino acid sequence and the 3D structure of antibodies, and to design antibodies against target properties such as binding affinity, stability, and specificity.
Finally, AI tools support more streamlined antibody development workflows and earlier assessment of affinity and developability-related attributes.
AI in drug discovery, however, is not a monolithic system but rather a confluence of technologies that empower in silico drug development. The concept of AI itself is continuously evolving based on the emergence of a range of technological concepts such as machine learning (ML), deep learning (DL), artificial neural networks (ANN), reinforcement learning (RL), deep reinforcement learning (DRL), natural language processing (NLP), large language models (LLMs), etc.
In recent years, there has been a lot of attention on AI-powered technologies like NLP to unlock the value embedded in vast volumes of unstructured textual data in the life sciences industry. The key approaches to NLP in drug discovery can further be broadly classified as rules-based, ML-based, and hybrid approaches.
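As a rough illustration of the rules-based end of that spectrum, the sketch below tags gene-like symbols and dose mentions in a sentence using regular expressions. The patterns, the `extract_entities` helper, and the example sentence are hypothetical and chosen only for illustration. ML-based and hybrid systems would replace or complement such rules with trained models and curated vocabularies.

```python
import re

# Hypothetical, deliberately simple patterns; real pipelines rely on curated
# vocabularies and trained models rather than two regular expressions.
PATTERNS = {
    "GENE": re.compile(r"\b[A-Z][A-Z0-9]{2,5}\b"),                 # e.g. DDR1, EGFR
    "DOSE": re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|nM|uM)\b"),       # e.g. 10 nM
}

def extract_entities(text):
    """Return (label, matched_text) pairs in order of appearance."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group()))
    return [(label, value) for _, label, value in sorted(hits)]

sentence = "The DDR1 inhibitor was active at 10 nM in fibrosis models."
entities = extract_entities(sentence)
print(entities)
```

Rules like these are transparent and easy to audit but brittle, which is one reason hybrid approaches pair them with ML-based components.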
Currently, everyone’s talking about LLMs and Generative AI, with the focus squarely on broadening access to these transformative technologies by integrating them into bioinformatics platforms, pipelines, and workflows.
And finally, no conversation about data-centric AI applications would be complete without mention of Knowledge Graphs (KGs). Knowledge graphs are at the epicenter of AI-driven drug discovery because they can integrate the structured and unstructured life sciences data required to build the ML models that drive decision-making, and because the context they provide can augment AI/ML approaches. The NLP-based semantic data integration and representation capabilities of these graph models, integrated with the scale of LLMs, are expected to form a foundation for next-generation drug discovery.
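To make the idea of graph-based integration more concrete, here is a minimal sketch of a knowledge graph stored as subject-predicate-object triples, with a two-hop query linking compounds to a disease through a shared target. The compound names, the predicate vocabulary, and the `compounds_for_disease` helper are all hypothetical. A production knowledge graph would be populated from curated databases and literature mining and queried through a dedicated graph store.

```python
from collections import defaultdict

# Hypothetical toy triples; compound_A and compound_B are placeholders.
TRIPLES = [
    ("compound_A", "inhibits", "DDR1"),
    ("DDR1", "implicated_in", "fibrosis"),
    ("compound_B", "inhibits", "EGFR"),
    ("EGFR", "implicated_in", "lung_cancer"),
]

def build_index(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    index = defaultdict(list)
    for subj, pred, obj in triples:
        index[subj].append((pred, obj))
    return index

def compounds_for_disease(triples, disease):
    """Two-hop query: compound -inhibits-> target -implicated_in-> disease."""
    index = build_index(triples)
    results = []
    for subj, edges in index.items():
        for pred, target in edges:
            if pred != "inhibits":
                continue
            if ("implicated_in", disease) in index.get(target, []):
                results.append((subj, target))
    return sorted(results)

candidates = compounds_for_disease(TRIPLES, "fibrosis")
print(candidates)
```

The value of the graph comes from the connecting edges: neither triple alone links the compound to the disease, but the path through the target does.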
3. How is AI being used in drug discovery and development?
Accelerated drug design
Back in 2019, an article in Nature Biotechnology detailed a new deep generative model, called generative tensorial reinforcement learning (GENTRL), that was able to discover six potent inhibitors of DDR1, a kinase target implicated in fibrosis and other diseases, in just 21 days. Of the six compounds identified, four were active in biochemical assays, two were validated in cell-based assays and one lead candidate demonstrated favorable pharmacokinetics in mice.
Earlier this year, a team of researchers at MIT unveiled DiffDock, a diffusion generative model that reframed molecular docking as a generative modeling problem rather than a regression problem. It has been described as a promising approach for docking on computationally folded structures, with reported improvements in inference speed and pose prediction in some benchmarking settings. The capabilities of generative diffusion models are also being integrated into antibody design, docking, and optimization pipelines to enhance antibody functionality and to enable antibody sequence–structure co-design with a focus on favorable developability attributes.
DL methods are now widely used across multiple areas of drug discovery, including molecule generation, molecular property prediction, retrosynthesis, and reaction prediction. Although applications have predominantly focused on ligand-based approaches, these techniques are also being applied to accelerate structure-based drug discovery by addressing key challenges such as polypharmacology by design, selectivity optimization, activity cliff prediction, and target deorphanization.
Generative AI frameworks, which include DL algorithms, have helped expand the data analysis, pattern identification, and prediction capabilities of classic AI systems to a new paradigm of leveraging a variety of inputs, including text, images, audio, video, 3D models, etc., to create brand new outputs. The multimodal capabilities of generative AI to scale across diverse data types open up several new opportunities in early-stage research and discovery. Deep generative models, with their ability to generate novel chemical and biological structures with desired properties, are expanding the horizon for drug design in small as well as large molecules. Generative AI applications in drug design include target-agnostic/target-aware molecule design, molecular conformation generation, protein and antibody representation learning, protein structure prediction, and the design of novel proteins with desired functionalities.
Despite the growing adoption of these technologies in drug discovery, several challenges still have to be addressed, including the need for large, high-quality training datasets, hallucinations, and ethical and regulatory considerations.
Prediction of drug bioactivity
AI is increasingly used to predict bioactivity, the biological effect or response of a drug on a specific target or pathway. This is an important stage in drug discovery for identifying potential candidates with therapeutic effects and understanding their interactions with biological systems.
AI techniques help address several of the time and cost-related challenges of classical approaches to predicting bioactivity and are used to support bioactivity prediction for both small and large molecules.
There are distinct approaches to AI-driven drug bioactivity prediction. At a foundational level, the ability of AI technologies to integrate diverse multi-omics data for a holistic molecular understanding of biology can enhance bioactivity prediction. AI techniques have also paved the way for integrating chemoinformatics and bioinformatics data, chemical databases, biological assays, and clinical data, for a comprehensive understanding of drug-target interactions and how changes in chemical structure may impact bioactivity.
Deep learning techniques, such as graph neural networks, can be trained on diverse datasets, including chemical structures and biological activity profiles to learn patterns and correlations to predict bioactivity for new compounds. Machine learning paradigms such as reinforcement learning (RL) and deep reinforcement learning (DRL) have also been used to explore the chemical space to identify optimal compounds in terms of pharmacokinetic properties and bioactivity.
In some property-prediction tasks, DL models have shown stronger performance than traditional ML approaches, though interpretability remains a limitation.
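The core operation behind the graph neural networks mentioned above is message passing, in which each atom updates its representation using information from its bonded neighbours. The sketch below shows a single, unweighted round of this aggregation on a toy three-atom graph, followed by a sum readout into one molecule-level vector. It is a simplified illustration under stated assumptions: the one-hot features are hypothetical, there are no learned weights or nonlinearities, and a real model would feed the resulting vector into a trained prediction head.

```python
# Toy molecular graph for the heavy atoms of ethanol: C-C-O.
# Node features are hypothetical one-hot vectors [is_carbon, is_oxygen].
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
bonds = [(0, 1), (1, 2)]

def neighbours(bonds, n_atoms):
    """Build an undirected adjacency list from a bond list."""
    adj = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj

def message_passing_step(features, adj):
    """New atom vector = own vector + sum of neighbour vectors (no weights)."""
    updated = {}
    for atom, vec in features.items():
        agg = list(vec)
        for nb in adj[atom]:
            agg = [x + y for x, y in zip(agg, features[nb])]
        updated[atom] = agg
    return updated

def readout(features):
    """Sum-pool atom vectors into a single molecule-level vector."""
    dim = len(next(iter(features.values())))
    return [sum(vec[i] for vec in features.values()) for i in range(dim)]

adj = neighbours(bonds, len(features))
hidden = message_passing_step(features, adj)
molecule_vector = readout(hidden)
print(hidden, molecule_vector)
```

After one round, the central carbon's vector already reflects that it is bonded to both a carbon and an oxygen, which is the kind of local chemical context a trained model can learn to associate with activity.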
Advancing risk assessment: immunogenicity and developability
There are three broad computational approaches to addressing developability and immunogenicity risks in early-stage antibody drug development. The classical approach relies on standard sequence and structure-based drug design tools to re-engineer antibody hits derived from traditional hit discovery frameworks. The contemporary approach combines next-generation sequencing (NGS) and machine learning (ML) to engineer antibody libraries and minimize developability liabilities. The emerging approach leverages advanced DL and AI frameworks for protein structure prediction and computational design to generate candidates for evaluation against multiple potency and developability criteria.
ML-based computational tools are increasingly being used in antibody drug discovery to predict the developability and immunogenicity of antibody candidates based on physicochemical properties, antibody sequences, or structures. Both immunogenicity and developability screening require a blend of in silico and in vitro assays that are holistically predictive, with the process starting with in silico assessments and progressing to in vitro/ex vivo assays as required.
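As a simple example of what a first-pass in silico, sequence-based assessment can look like, the sketch below scans a sequence for a few commonly cited liability motifs. This is a rule-based stand-in rather than one of the ML models described above. The motif set is simplified and non-exhaustive, and the `scan_liabilities` helper and the example fragment are hypothetical.

```python
import re

# Simplified, non-exhaustive set of commonly cited sequence liability motifs.
LIABILITY_MOTIFS = {
    "N-glycosylation": re.compile(r"N[^P][ST]"),  # N-X-S/T sequon, X is not proline
    "deamidation": re.compile(r"NG"),             # Asn-Gly hotspot
    "isomerization": re.compile(r"DG"),           # Asp-Gly hotspot
}

def scan_liabilities(sequence):
    """Return sorted (position, motif_name, residues) tuples, 1-based positions."""
    hits = []
    for name, pattern in LIABILITY_MOTIFS.items():
        # A lookahead is used so that overlapping motifs are all reported.
        for match in re.finditer(f"(?=({pattern.pattern}))", sequence):
            hits.append((match.start() + 1, name, match.group(1)))
    return sorted(hits)

# Hypothetical fragment, not a real antibody sequence.
fragment = "QVQLNGSADGK"
report = scan_liabilities(fragment)
for position, name, residues in report:
    print(position, name, residues)
```

Flags from a scan like this would only prioritize candidates for closer review. Whether a motif is a real liability depends on structural context and is ultimately confirmed in vitro.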
In recent years the availability of a critical mass of immunogenicity-related preclinical and clinical data has opened up the potential for the application of AI/ML technologies to learn from data even in the absence of hypotheses to test. AI-driven methods are also emerging as first-tier screening tools to profile small molecules and provide liability estimations.
Efficacy & toxicity prediction and optimization
An analysis of clinical trial data from 2010 to 2017 revealed that the two most common reasons for new drug failures were lack of clinical efficacy (40–50%) and unmanageable toxicity (30%). Therefore, the early and accurate prediction of toxicity and efficacy, which depend predominantly on pharmacokinetic and pharmacodynamic parameters, is important for candidate prioritization and risk assessment during drug development.
Currently, ML models are widely used to predict drug efficacy and toxicity, either as assessment frameworks, where potential efficacy or toxicity has to be predicted for a predefined therapeutic entity, or as drug design frameworks, where the models generate candidate therapeutics for further evaluation against safety and efficacy-related criteria.
AI algorithms capable of analyzing vast volumes of chemical and biological data are used to design and optimize drug candidates based on efficacy, toxicity, and pharmacokinetic predictions. Generative AI models, based on advancements in DL techniques such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), are now being used to produce new chemical structures with properties similar to the training set and generate novel compounds with desirable properties, such as low toxicity. DL-enabled natural language processing is driving the automatic identification of directional pharmacokinetic drug–drug interactions from ever-increasing volumes of biomedical literature.
Multi-drug synergy prediction
Combination therapies, which involve administering several separate drugs or several drugs combined into a single medication, are used in many therapeutic settings to address disease through complementary mechanisms of action. In complex diseases such as cancer, synergistic combinations of anti-cancer drugs have become an important part of some treatment strategies.
However, identifying novel synergistic combinations has remained a long-standing challenge due to the sheer size of the combinatorial space, which grows rapidly with the number of candidate drugs and the number of drugs in each combination.
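A quick calculation shows how fast this space grows. The sketch below counts the distinct unordered combinations of two, three, and four drugs that can be drawn from a library. The library size of 100 is an arbitrary assumption for illustration, and the `combination_counts` helper is hypothetical.

```python
from math import comb

def combination_counts(n_drugs, max_order):
    """Number of distinct unordered combinations for each order 2..max_order."""
    return {k: comb(n_drugs, k) for k in range(2, max_order + 1)}

# Assumed library of 100 candidate drugs (illustrative only).
counts = combination_counts(100, 4)
for order, n in counts.items():
    print(f"{order}-drug combinations: {n:,}")
```

For 100 drugs this gives 4,950 pairs, 161,700 triples, and 3,921,225 four-drug combinations, before dose levels and cell lines are taken into account, which is why exhaustive experimental screening quickly becomes impractical.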
In recent years, there has been growing research interest in computational approaches to in silico screening of potential drug synergies. DL techniques have shown promising performance relative to some standard machine learning approaches in synergy prediction tasks.
DeepSynergy was one of the first DL models to apply deep neural networks (DNNs) to process chemical and genomic information and model drug synergies. It was quickly followed by a series of DL frameworks that have further advanced in silico synergy prediction. These include AuDNNsynergy (integrating multi-omics and chemical data), SYNDEEP (combining physicochemical, genomic, protein–protein interaction, and protein-metabolite interaction information), DeepTraSynergy (including drug–target interaction, protein–protein interaction, and cell–target interaction), and MatchMaker (based on chemical structure information and gene expression profiles of cell lines).
The focus of research into computational drug synergy analysis is now expanding beyond predicting pairwise combinations to identifying complex higher-order combinations that are more effective for complex and precision applications. DeepMDS, a DL-based multi-drug synergy prediction model, leverages a large-scale dataset that integrates target information, drug response data, and large-scale genomic profiles of cancer cell lines from varied tissues. DeepMDS is capable of predicting and ranking the most potent synergistic combinations of three or more drugs against a specific cell line or subtype of interest.
Advancing drug design & synthesis pathway generation
Advances in AI technologies are powering a new paradigm in which entirely new molecules can be designed, or existing molecules modified, to optimize their desired properties. Generative AI models, such as variational autoencoders (VAEs) or generative adversarial networks (GANs), are capable of generating novel molecular structures that conform to multiple potency, solubility, toxicity, and synthesizability criteria.
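In practice, generated structures are typically scored by property predictors and then filtered against several criteria at once. The sketch below illustrates that filtering step only. The `Candidate` fields, the threshold values, and the example molecules are hypothetical, and the property values stand in for outputs of predictive models that are not shown.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    potency: float           # predicted pIC50, higher is better
    solubility: float        # predicted logS, higher is better
    toxicity: float          # predicted probability of toxicity, lower is better
    synthesizability: float  # score in [0, 1], higher means easier to make

# Hypothetical thresholds; real projects set these per target and modality.
THRESHOLDS = {"potency": 7.0, "solubility": -4.0,
              "toxicity": 0.3, "synthesizability": 0.5}

def passes(c):
    """True only if the candidate meets every criterion."""
    return (c.potency >= THRESHOLDS["potency"]
            and c.solubility >= THRESHOLDS["solubility"]
            and c.toxicity <= THRESHOLDS["toxicity"]
            and c.synthesizability >= THRESHOLDS["synthesizability"])

def shortlist(candidates):
    """Keep candidates meeting every criterion, most potent first."""
    return sorted((c for c in candidates if passes(c)), key=lambda c: -c.potency)

generated = [
    Candidate("mol_1", 8.1, -3.2, 0.10, 0.8),
    Candidate("mol_2", 9.0, -6.5, 0.05, 0.9),  # fails the solubility threshold
    Candidate("mol_3", 7.4, -3.9, 0.25, 0.6),
    Candidate("mol_4", 7.9, -2.0, 0.60, 0.7),  # fails the toxicity threshold
]
selected = [c.name for c in shortlist(generated)]
print(selected)
```

Note that the most potent molecule, mol_2, is dropped because of poor predicted solubility, which illustrates why multi-criteria evaluation matters more than optimizing a single property.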
Generative AI is also being explored for synthesis planning, including the proposal and ranking of potential synthetic routes. These approaches may support route generation for complex molecules in some use cases.
More importantly, these technologies address several specific challenges and bottlenecks in small molecule and antibody drug discovery.
In the case of the small molecule discovery pipeline, some of the key limitations include the lack of access to critical data sources, such as ADMET data, the inability to scale across the vast chemical space, and prolonged design-make-test cycles. The application of AI to small molecule discovery expands the scope of identifying compounds beyond the screening of existing chemical libraries to include generative design, and facilitates efficient identification of hit- or lead-like molecules that are optimized for favorable properties.
In the case of antibody drug discovery, the challenges include identifying superior epitopes, selecting the appropriate candidates from experimental outputs/human repertoires, designing and optimizing novel antibodies, and reducing the extended lead times needed to improve antibody preclinical properties. Here again, AI technologies can offer a range of benefits, including efficient screening of pre-existing libraries, the integration of design capabilities, and the identification and subsequent optimization of antibody structures and formats for desired properties.
So, AI technologies are influencing multiple stages of the drug discovery pipeline and are expected to remain an important part of how in silico discovery workflows evolve.
4. The future of AI drug discovery
As mentioned, AI in drug discovery is not so much about a monolithic system as it is about a diverse array of technologies, each with a specific and strategic role to play in end-to-end in silico drug discovery.
An AI-powered data integration and management architecture is designed to seamlessly automate the integration of large volumes of high-quality, well-governed data from disparate and distributed data sources. A metadata-driven semantic knowledge graph can support the integration and curation of large volumes of data, including incoming data, into a unified and contextualized framework that is both machine and human-readable. The integration of LLMs with these knowledge graphs may extend the usefulness of these systems across drug discovery tasks. Techniques such as Retrieval Augmented Generation (RAG) are likely to play an important role in improving reasoning, retrieval, and context management. The future of in silico drug discovery is likely to depend on closer integration of these technologies into one unified, scalable, end-to-end life sciences research platform.
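The retrieval step in RAG can be sketched in a few lines: relevant facts are selected for a question and placed into the prompt that would be sent to an LLM. The example below is a minimal illustration under stated assumptions. It ranks facts by simple word overlap as a stand-in for embedding-based search, the `retrieve` and `build_prompt` helpers are hypothetical, and the LLM call itself is omitted.

```python
import re

# Facts as they might be rendered from knowledge-graph triples (illustrative).
FACTS = [
    "DDR1 is a kinase target implicated in fibrosis.",
    "DeepSynergy models drug synergies from chemical and genomic information.",
    "DiffDock frames molecular docking as a generative modeling problem.",
]

def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, facts, top_k=1):
    """Rank facts by word overlap with the question (stand-in for embeddings)."""
    q = tokenize(question)
    ranked = sorted(facts, key=lambda f: len(q & tokenize(f)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, facts):
    """Assemble the retrieved context and the question into a single prompt."""
    context = "\n".join(f"- {fact}" for fact in retrieve(question, facts))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "Which target is implicated in fibrosis?"
prompt = build_prompt(question, FACTS)
print(prompt)
```

Grounding the prompt in retrieved, traceable facts is what lets a RAG system cite where an answer came from, rather than relying only on what the model memorized during training.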

Fully-integrated therapeutic end-to-end lead generation workflow with LensAI
5. AI drug discovery with LensAI Bio-native suite
The LensAI™ Bio-native suite supports antibody discovery workflows by combining data integration, analytics, and AI-enabled tooling within a unified environment. Platform capabilities include multi-omics data integration, high-throughput learning workflows, and flexible access models such as fee-for-service, SaaS, API-based deployment, and strategic partnerships. These capabilities are intended to support the organization, analysis, and use of complex biological data across discovery workflows.
Reach out to our team to learn more about the platform and its application in antibody discovery workflows.

LensAI Foundation AI Model for multiscale biological data integration