The CACHE Challenge and where current virtual screening technologies stand

The CACHE Challenge focuses on identifying small-molecule compounds that bind to target proteins. What the CASP competition does for protein structure prediction, CACHE aims to do for virtual screening.

The first challenge, sponsored by the Michael J. Fox Foundation, sought compounds that bind the WDR domain of LRRK2, a protein associated with Parkinson’s disease. Participant registration ran from December 1, 2021, to January 30, 2022, and the results were published on November 5, 2024, in the paper “CACHE Challenge #1: Targeting the WDR Domain of LRRK2, A Parkinson’s Disease Associated Protein.” The abstract provides a good overview, but I want to highlight one part:

“First-in-class, experimentally confirmed compounds were rare and weakly potent, indicating that recent advances are not sufficient to effectively address challenging targets.”

AI has rapidly transformed the established field of Computer-Aided Drug Design (CADD) and has delivered genuine innovation in areas such as protein structure prediction, as recognized by the 2024 Nobel Prize in Chemistry. In virtual screening, however, it has not yet shown comparable progress.

Protein structure prediction is a hard problem, but new data tend to occupy the same space as existing data, which makes the problem comparatively favorable for AI. In contrast, finding small molecules that bind to a specific protein poses several difficulties for AI technologies:

  1. Importance of Pocket Structure: The pocket structure is diverse and defined not only by the protein but also by its interactions with ligands. If a single protein can bind 1,000 compounds, it could be argued that there are hundreds of distinct pockets.

  2. Uncharted Chemical Space: The number of commercially available compounds is in the billions, but this represents only a tiny fraction of the theoretically possible chemical space.

  3. Understanding Intermolecular Interactions: Both pockets and ligands are flexible. Using molecular dynamics to understand pocket flexibility is already complex and costly; adding a flexible ligand increases uncertainty significantly.

  4. Lack of Sufficiently High-Quality Data: The high-quality experimental data needed to address these issues is hard to obtain. While PubChem contains a vast bioactivity dataset, assessing the reliability of individual experimental results is difficult or impossible, as is deciding what level of confidence is required.
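
Point 2 can be made concrete with a back-of-the-envelope calculation. The sketch below assumes roughly 10^10 commercially available compounds and uses the often-quoted order-of-magnitude estimate of 10^60 for drug-like chemical space. Neither figure comes from the CACHE paper; both are illustrative assumptions, and the function names are made up for this example.

```python
import math

# Illustrative assumptions, not figures from the article or the CACHE paper:
# about 10^10 commercially available compounds, and the often-quoted
# order-of-magnitude estimate of 10^60 drug-like molecules.
AVAILABLE_COMPOUNDS = 1e10
ESTIMATED_DRUGLIKE_SPACE = 1e60


def covered_fraction(available: float, total: float) -> float:
    """Fraction of the estimated chemical space that is purchasable."""
    return available / total


def orders_of_magnitude_gap(available: float, total: float) -> float:
    """How many powers of ten separate the library from the full space."""
    return math.log10(total) - math.log10(available)


fraction = covered_fraction(AVAILABLE_COMPOUNDS, ESTIMATED_DRUGLIKE_SPACE)
gap = orders_of_magnitude_gap(AVAILABLE_COMPOUNDS, ESTIMATED_DRUGLIKE_SPACE)

print(f"Fraction of estimated space covered: {fraction:.0e}")
print(f"Gap: about {gap:.0f} orders of magnitude")
```

Under these assumptions, even a library of billions of compounds leaves a gap of about fifty orders of magnitude, which is why exhaustive enumeration is not an option and better representations matter.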

Despite these challenges, virtual screening plays a crucial role in drug discovery for novel proteins and has yielded significant achievements for decades. Although CACHE requires more time and effort than CASP, I believe it will significantly advance this vital technology. I therefore encourage companies and disease foundations to actively support it.

Researchers may have varying opinions on how to address the aforementioned challenges. Personally, I believe that developing far better chemical representation methods is essential. Given the difficulty of overcoming data scarcity, leveraging existing data more intelligently should be prioritized. Improved scientific representations of pockets, molecular flexibility, and interactions will enable more effective application of advanced AI technologies. While this may be fundamental research, its potential impact could be substantial. I sincerely hope more researchers, especially in academia, will engage in this area.

You may want to read comments on my LinkedIn post discussing this article.
