Machine learning can hasten the process of drug discovery. A new ML technique, DeepBAR, does so by quickly calculating the binding affinities between drug candidates and their targets.

Artificial intelligence and machine learning techniques are already proving effective in pharmaceutical procedures. Drug discovery, the process of finding new candidate medications, is a crucial procedure in the fields of medicine, biotechnology and pharmacology. According to the U.S. FDA, there are five steps in the development of a new drug: discovery and development, preclinical research, clinical research, FDA review, and FDA post-market safety monitoring. Since drug discovery requires huge amounts of data and research, many pharmaceutical companies are embracing AI and machine learning to accelerate it.
AI and ML techniques can also lower the costs of drug development. Drug discovery is a data-driven process that involves large volumes of data such as high-resolution medical images, genomic profiles, metabolites, molecular structures, and biological information. Artificial intelligence powered by machine learning and deep learning can correlate, integrate, and connect existing data more rapidly to help discover patterns in these data pools.
Drugs can only work if they stick to their target proteins in the body, so analyzing that stickiness is a key hurdle in the drug discovery and screening process. New research combining chemistry and machine learning could lower that hurdle. The new technique, called DeepBAR, can quickly calculate the binding affinities between drug candidates and their targets. DeepBAR combines traditional chemistry calculations with recent advances in machine learning. It computes binding free energy exactly, but it requires just a fraction of the calculations demanded by previous methods.
The “BAR” in DeepBAR stands for “Bennett acceptance ratio”. It is a decades-old algorithm used in exact calculations of binding free energy. According to the researchers, DeepBAR could one day quicken the pace of drug discovery and protein engineering.
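For reference, the Bennett acceptance ratio estimate between two states can be written as the self-consistent equation below. The notation is an assumption introduced here, not taken from the DeepBAR paper: states 0 and 1 have reduced potential energies u_0 and u_1 (energy divided by k_B T), n_0 and n_1 samples are drawn from each state, Δu = u_1 − u_0, and Δf = f_1 − f_0 is the reduced free energy difference being solved for.

```latex
\sum_{i=1}^{n_0}
  \frac{1}{1 + \exp\!\left[\ln\frac{n_0}{n_1} + \Delta u\!\left(x_i^{(0)}\right) - \Delta f\right]}
=
\sum_{j=1}^{n_1}
  \frac{1}{1 + \exp\!\left[\ln\frac{n_1}{n_0} - \Delta u\!\left(x_j^{(1)}\right) + \Delta f\right]}
```

The left-hand sum runs over samples drawn from state 0 and the right-hand sum over samples drawn from state 1. The estimate is only reliable when the two sets of samples overlap in configuration space.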
The research appeared in the Journal of Physical Chemistry Letters and was led by Xinqiang Ding, a postdoc in MIT’s Department of Chemistry.
According to the study, using the Bennett acceptance ratio typically requires knowledge of two “endpoint” states, for example a drug molecule bound to a protein and a drug molecule completely dissociated from a protein, plus knowledge of many intermediate states (such as varying levels of partial binding), all of which bog down calculation speed.
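To make the two-endpoint idea concrete, here is a minimal sketch of a BAR estimate on a toy problem. Everything in it is illustrative and not taken from the DeepBAR paper: the two "states" are one-dimensional harmonic wells standing in for the endpoints, energies are in units of k_B T, and the names fermi, bar_free_energy and toy_example are hypothetical. The wells are deliberately chosen to overlap, so no intermediate states are needed, and the exact answer is known analytically for comparison.

```python
import math
import random


def fermi(z):
    """Return 1 / (1 + exp(z)) without overflowing for large |z|."""
    if z > 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def bar_free_energy(du_0, du_1, tol=1e-8, max_iter=200):
    """Solve the BAR equation for the reduced free energy difference f_1 - f_0.

    du_0: values of u_1(x) - u_0(x) at samples drawn from state 0
    du_1: values of u_1(x) - u_0(x) at samples drawn from state 1
    """
    n0, n1 = len(du_0), len(du_1)
    m = math.log(n0 / n1)

    def imbalance(df):
        # Difference between the two sides of the BAR equation.
        # It increases monotonically with df, so bisection finds the root.
        lhs = sum(fermi(m + du - df) for du in du_0)
        rhs = sum(fermi(-m - du + df) for du in du_1)
        return lhs - rhs

    # Expand the bracket until it contains the root.
    lo, hi = -1.0, 1.0
    while imbalance(lo) > 0:
        lo *= 2
    while imbalance(hi) < 0:
        hi *= 2

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if imbalance(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def toy_example(n=5000, k0=1.0, k1=4.0, shift=0.5, seed=0):
    """Compare BAR with the exact answer for two overlapping harmonic wells."""
    rng = random.Random(seed)

    def delta_u(x):
        # u_1(x) - u_0(x) for u_0 = k0 x^2 / 2 and u_1 = k1 (x - shift)^2 / 2
        return 0.5 * k1 * (x - shift) ** 2 - 0.5 * k0 * x ** 2

    # Boltzmann samples of a harmonic well are Gaussian with sigma = 1/sqrt(k).
    x0 = [rng.gauss(0.0, 1.0 / math.sqrt(k0)) for _ in range(n)]
    x1 = [rng.gauss(shift, 1.0 / math.sqrt(k1)) for _ in range(n)]

    estimate = bar_free_energy([delta_u(x) for x in x0],
                               [delta_u(x) for x in x1])
    exact = 0.5 * math.log(k1 / k0)  # from Z = sqrt(2 * pi / k) for each well
    return estimate, exact


estimate, exact = toy_example()
print(f"BAR estimate: {estimate:.3f}   exact: {exact:.3f}")
```

If the two wells were moved far apart, the samples would stop overlapping and the estimate would degrade. That is the situation in which conventional calculations insert a chain of intermediate states between the endpoints.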
The new machine learning technique slashes those in-between states by implementing the Bennett acceptance ratio in machine learning frameworks called deep generative models. These models create a reference state for each endpoint, the bound state and the unbound state, according to Bin Zhang, the Pfizer-Laubach Career Development Professor in Chemistry at MIT, and a co-author of a new paper describing the technique.
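One way to read the role of the reference states is as a thermodynamic cycle. The sketch below is an interpretation under stated assumptions, not a formula quoted from the paper: A and B denote the two endpoint states, A° and B° the reference states defined by the generative models, and F a free energy.

```latex
\Delta F_{A \to B} = F_B - F_A
  = \left(F_B - F_{B^\circ}\right)
  - \left(F_A - F_{A^\circ}\right)
  + \left(F_{B^\circ} - F_{A^\circ}\right)
```

Each of the first two terms links an endpoint to a model trained to resemble it, so the two distributions should overlap well and a single BAR calculation may be enough for each. If the generative models have normalized, tractable densities (an assumption here), the last term is known without any sampling.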
In using deep generative models, the researchers were borrowing from the field of computer vision. Though adapting a computer vision approach to chemistry was DeepBAR’s key innovation, the crossover also raised some challenges. “These models were originally developed for 2D images,” says Xinqiang Ding. “But here we have proteins and molecules; it’s really a 3D structure. So, adapting those methods in our case was the biggest technical challenge we had to overcome.”
In tests using small protein-like molecules, DeepBAR calculated binding free energy nearly 50 times faster than previous methods. The researchers have since started thinking about using it for drug screening, particularly in the context of COVID. “DeepBAR has the exact same accuracy as the gold standard, but it’s much faster,” says Zhang. The researchers believe that, in addition to drug screening, DeepBAR could aid protein design and engineering, since the method could be used to model interactions between multiple proteins. They also plan to improve DeepBAR’s ability to run calculations for large proteins, a task made feasible by recent advances in computer science.