Thoughts on a Paper About AI-Based Virtual Screening Systems
The paper I want to discuss today is titled “An artificial intelligence accelerated virtual screening platform for drug discovery”, published in Nature Communications on September 5, 2024. It’s available as Open Access, so you can read the full text regardless of your subscription status.
It’s not often that papers on topics like virtual screening appear in prestigious journals like Nature Communications. Congratulations to the authors.
I asked ChatGPT to summarize the abstract of the paper, and here’s what it came up with:
Summary
Structure-based virtual screening is a crucial tool in early drug discovery, with increasing interest in screening multi-billion chemical compound libraries. This study develops a new method called RosettaVS to enhance the accuracy of docking poses and binding affinities. The method demonstrates superior performance compared to existing state-of-the-art approaches, partly due to its ability to model receptor flexibility. Using this platform, promising compounds were identified through screening against two unrelated targets.
Key Points
RosettaVS is a highly accurate structure-based virtual screening method for predicting docking poses and binding affinities.
It outperforms other state-of-the-art methods by modeling receptor flexibility.
Hit rates of 14% for KLHDC2 and 44% for NaV1.7 were achieved.
Screening was completed in less than seven days, and the predicted docking pose for the KLHDC2 ligand complex was validated by high-resolution X-ray crystallography.
To summarize this even more briefly:
RosettaVS is a new method for virtual screening that not only shows high hit rates but also generates accurate poses, as confirmed by experiments.
The superior performance of RosettaVS is likely due to its ability to consider receptor flexibility.
For those of us working directly in this field, a detailed analysis of the paper’s content is essential. For the many readers who want to use such methods rather than develop them, however, grasping the paper’s key messages matters more than an in-depth analysis. Here are some takeaways for those who want to apply this method to their own research:
Key Takeaways
Understanding Datasets is Crucial for Comparisons
As with all papers, this one uses a variety of datasets to demonstrate the superiority of its methodology. For example, Figure 1b shows results on the DUD (Directory of Useful Decoys) benchmark. Without knowing what the DUD dataset contains, you can’t accurately interpret these comparisons.
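To make concrete what such a benchmark measures: DUD-style sets mix known actives with property-matched decoys, dock everything against the same target, and report how strongly the actives are enriched near the top of the ranked list. Below is a minimal sketch of an enrichment-factor calculation; the function and all the score values are made-up placeholders, not numbers from the paper.

```python
# A minimal sketch of how DUD-style benchmarks are typically scored.
# Known actives and property-matched decoys are docked against the same
# target, ranked by score, and summarized with an enrichment factor (EF).
# All numbers below are hypothetical placeholders, not data from the paper.

def enrichment_factor(active_scores, decoy_scores, top_frac=0.01):
    """EF at top_frac: hit rate in the top slice vs. hit rate by chance."""
    # More negative docking score = better, so ascending sort is best-first.
    ranked = sorted(
        [(s, True) for s in active_scores] + [(s, False) for s in decoy_scores]
    )
    n_top = max(1, int(len(ranked) * top_frac))
    actives_in_top = sum(1 for _, is_active in ranked[:n_top] if is_active)
    return (actives_in_top / n_top) / (len(active_scores) / len(ranked))

# Hypothetical example: 5 actives, 95 decoys, EF evaluated at the top 1%.
ef = enrichment_factor(
    active_scores=[-11.2, -10.8, -9.5, -8.1, -7.9],
    decoy_scores=[-9.0 + 0.05 * i for i in range(95)],
)
print(f"EF@1% = {ef:.1f}")  # 20.0 here: the single top-1% slot holds an active
```

Comparisons like the one in Figure 1b ultimately boil down to metrics of this kind (enrichment factors, AUC, and the like), which is why knowing how the underlying dataset was built matters so much.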
Understand the Differences Between Benchmarks and Your Specific Situation
The paper’s benchmarks were run on 40 different proteins to demonstrate generality. If you’re focused on a specific protein, you’ll want to check whether similar proteins were included in these results. Figure 1f suggests that RosettaVS gives you roughly a 50% chance of finding a hit compound within the top 1% of ranked compounds, but the per-protein results are not broken out there.
In the case of KLHDC2 ubiquitin ligase (a brilliant case study), approximately 6 million compounds from the Enamine REAL database (around 5.5 billion virtual compounds) were docked with the faster protocol, and the top 50,000 were re-docked with the slower, more accurate protocol that models protein flexibility. From these, 1,000 compounds were selected and filtered down to 54 through property prediction and similarity clustering. Eventually, 29 compounds were synthesized, and one (C29) showed a hit at around 3 µM, with its derivatives displaying similar activity.
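To make the shape of that funnel concrete, here is a minimal sketch of such a hierarchical screen. The two docking callables are hypothetical placeholders for the fast and flexible protocols (this is not the authors’ actual pipeline); the filtering and clustering steps use RDKit, and the molecular-weight threshold and clustering cutoff are illustrative, not the paper’s values.

```python
# A minimal sketch of the hierarchical screening funnel described above.
# `fast_dock` and `flexible_redock` are hypothetical placeholders, not the
# actual protocols from the paper; property and diversity filters use RDKit.

from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, Descriptors
from rdkit.ML.Cluster import Butina

def screening_funnel(smiles_library, fast_dock, flexible_redock,
                     n_rescore=50_000, n_select=1_000):
    # Stage 1: fast rigid-receptor docking of the whole library
    # (lower score = better, so ascending sort puts the best first).
    ranked = sorted(smiles_library, key=fast_dock)

    # Stage 2: re-dock the top slice with receptor flexibility enabled.
    rescored = sorted(ranked[:n_rescore], key=flexible_redock)
    selected = rescored[:n_select]

    # Stage 3: simple property filter (placeholder for the property
    # prediction step; the threshold is illustrative).
    mols = [Chem.MolFromSmiles(s) for s in selected]
    mols = [m for m in mols if m is not None and Descriptors.MolWt(m) < 500]

    # Stage 4: Butina clustering on Morgan fingerprints; keep one
    # representative per cluster for chemical diversity.
    fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, 2048) for m in mols]
    dists = []
    for i in range(1, len(fps)):
        sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
        dists.extend(1.0 - s for s in sims)
    clusters = Butina.ClusterData(dists, len(fps), distThresh=0.4,
                                  isDistData=True)
    return [Chem.MolToSmiles(mols[cluster[0]]) for cluster in clusters]
```

The point of the sketch is the shape of the funnel: each stage spends more compute per compound on a smaller set, so the cheap first-pass score only needs to be good enough not to discard true binders.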
Although the benchmarks indicate a roughly 50% chance of finding a hit among the top 1% of compounds, in practice only 29 of the 1,000 selected compounds were synthesized, and one of them met the typical criteria for a hit compound. The benchmark figure and the realized outcome measure different things, and the gap shows that how you select compounds from the initial ranked pool is crucial.
There is No De facto Standard in Virtual Screening
What if the KLHDC2 case had been handled with another program instead of RosettaVS? Could AutoDock Vina have identified good hit compounds? No one can answer that without trying it.
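For anyone tempted to actually run that experiment: a single-ligand dock with AutoDock Vina’s Python bindings (the `vina` package that ships with Vina 1.2+) looks roughly like the sketch below. The file names and box geometry are placeholders for your own prepared system.

```python
# A rough sketch of docking one ligand with AutoDock Vina's Python
# bindings (Vina 1.2+). File names and box geometry are placeholders.

from vina import Vina

v = Vina(sf_name='vina')           # default Vina scoring function
v.set_receptor('receptor.pdbqt')   # receptor prepared as PDBQT
v.set_ligand_from_file('ligand.pdbqt')

# Define the search box around the binding site (coordinates hypothetical).
v.compute_vina_maps(center=[10.0, 12.0, 15.0], box_size=[20, 20, 20])

v.dock(exhaustiveness=32, n_poses=10)
v.write_poses('docked_poses.pdbqt', n_poses=5)
print(v.energies(n_poses=5))       # affinities in kcal/mol, best poses first
```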
It’s clear that virtual screening isn’t solely about the quality of the docking simulation. The KLHDC2 case is a good practical example that virtual screening isn’t just docking millions of compounds and taking the top 1%. A good docking tool improves your odds of finding hit compounds, but it isn’t the sole deciding factor.
If one particular package consistently produced the best results, the others would fall out of use. In reality, many software tools are used in many different ways to produce results. So there is no definitive answer to “Which docking simulation software is the best?” Instead, it’s about choosing the right tool for your situation and using it effectively. For example, if modeling protein flexibility is crucial for your target, knowing that RosettaVS handles it will let you reach for the right tool when you need it.