Because proteins are the molecular machines responsible for most life processes, studying them gives insight into how diseases develop at the microscopic level. That insight supports many lines of work, including drug development.
Folding@Home, winner of VentureBeat’s 2021 AI for Good award, simulates protein behavior with massive distributed computing power. It uses AI to strategically map each protein that it evaluates, to allocate computing resources, and to identify structural protein anomalies that might indicate signs of brewing disease. Based at the School of Medicine at Washington University in St. Louis, Folding@Home launched in 2000 and works with other labs around the world, including ones at Memorial Sloan Kettering Cancer Center and Temple University.
The case for protein study
Linear chains of amino acids fold in specific ways to form proteins. If the mechanism goes awry, it can lead to disease. Alzheimer’s and Huntington’s are caused by such “misfolding” events.
Conventional methods, such as X-ray crystallography, have helped scientists understand protein structures, but understanding folding mechanisms or how the proteins perform their functions over time requires more sophisticated techniques. Computer simulations based on physical models help bridge the gap. There’s a problem here too: scale. “Some of the more complex simulations could easily take hundreds of years for a desktop computer to work through,” said Greg Bowman, PhD, director of Folding@Home. “We need supercomputers to run these simulations. Folding@Home solves this challenge through a distributed mechanism, using the computing power of volunteers’ machines to conduct the required simulations.”
A volunteer “citizen scientist” is assigned a simulation that matches their specific hardware. “We will send you a starting point, an initial structure of a protein, and the parameters of the model,” Bowman explained. The volunteer’s machine captures snapshots of the protein structure at regular intervals and sends them back. Another volunteer in the chain picks up right where the previous simulation ends. “In this way we have created a whole map, with the snapshots being GPS coordinates,” Bowman said.
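The hand-off Bowman describes can be pictured with a toy model. The sketch below is not Folding@Home's software: the `WorkUnit` class, the function names, and the one-dimensional random walk standing in for real protein physics are all assumptions made for illustration. It shows only the pattern of running a work unit from a starting point, recording snapshots at regular intervals, and starting the next volunteer from the last snapshot.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class WorkUnit:
    """Hypothetical work unit: what one volunteer receives."""
    start_state: float   # stand-in for an initial protein structure
    steps: int           # length of the simulation assigned to this volunteer
    snapshot_every: int  # number of steps between recorded snapshots
    seed: int            # makes the toy run reproducible


def run_work_unit(unit: WorkUnit) -> List[float]:
    """Run a toy random walk and return snapshots taken at regular intervals."""
    rng = random.Random(unit.seed)
    state = unit.start_state
    snapshots = []
    for step in range(1, unit.steps + 1):
        state += rng.gauss(0.0, 0.1)  # placeholder for one physics step
        if step % unit.snapshot_every == 0:
            snapshots.append(state)
    return snapshots


def run_chain(initial_state: float, n_volunteers: int,
              steps: int = 100, snapshot_every: int = 10) -> List[float]:
    """Chain work units so each volunteer starts where the previous one ended."""
    trajectory = []
    state = initial_state
    for volunteer in range(n_volunteers):
        unit = WorkUnit(state, steps, snapshot_every, seed=volunteer)
        snapshots = run_work_unit(unit)
        trajectory.extend(snapshots)
        state = snapshots[-1]  # hand-off point for the next volunteer
    return trajectory


trajectory = run_chain(initial_state=0.0, n_volunteers=3)
print(len(trajectory), "snapshots collected across 3 volunteers")
```

In this toy version the chain is run in a single loop; in a real volunteer network each work unit would run on a different machine and the snapshots would travel over the internet.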
How AI helps
Given the scale of the project, Folding@Home has to be smart about its mapping procedures. A blind approach that relentlessly simulates everything is probably not necessary. Instead, Folding@Home alternates between running simulations and building maps from the results, and the maps in turn tell researchers where to look next. AI helps with this decision-making as it sifts through results and determines which parts of the protein are more of the same and which ones are likely to yield more interesting results. After all, some regions of proteins are featureless, like plains, and others have more things going on, like New York City, Bowman explained.
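One simple way to picture this loop is count-based adaptive sampling: keep a tally of how often each state has been visited, then start the next round of simulations from the least-visited states. The sketch below uses that stand-in technique on a toy random walk. It is an assumption for illustration only, not a description of the models Folding@Home actually uses, and all function names are hypothetical.

```python
import random
from collections import Counter


def simulate(start, steps, rng):
    """Toy random walk on integer states, standing in for a protein simulation."""
    state = start
    visited = [state]
    for _ in range(steps):
        state += rng.choice((-1, 1))
        visited.append(state)
    return visited


def adaptive_sampling(rounds=5, walkers=4, steps=20, seed=0):
    """Alternate between simulating and updating the map of visited states."""
    rng = random.Random(seed)
    counts = Counter()        # the "map": how often each state has been seen
    starts = [0] * walkers    # every walker begins at the same state
    for _ in range(rounds):
        # 1. Simulate from the current starting points.
        for start in starts:
            counts.update(simulate(start, steps, rng))
        # 2. Rank states from least to most visited (ties broken by state value).
        ranked = sorted(counts, key=lambda s: (counts[s], s))
        # 3. Restart from the least-explored states, reusing states if too few exist.
        starts = [ranked[i % len(ranked)] for i in range(walkers)]
    return counts


counts = adaptive_sampling()
print("distinct states discovered:", len(counts))
```

Restarting from rarely seen states pushes computing time toward the unexplored edges of the map rather than the well-trodden "plains".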
Another challenge that AI addresses: the heterogeneity of volunteer computing resources. Computers with more power, for example, should be assigned more complex simulations. Folding@Home’s unsupervised AI learning models understand the resource match and make recommendations accordingly.
Finally, unsupervised AI is also helping researchers at Folding@Home find differences in proteins that can be tied more emphatically to disease. “We have developed some deep learning tools where we can take different data sets and learn what distinguishes them,” Bowman said. In such cases, AI can parse through multiple sets of “normal” proteins and learn what “abnormal” looks like.
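The matching idea can be illustrated without any machine learning. The sketch below swaps in a plain greedy rule (fastest machine gets the costliest job) in place of the learned models the article mentions. The machine names, job names, and relative scores are all hypothetical.

```python
def match_jobs_to_machines(machines, jobs):
    """Pair the fastest machines with the costliest jobs.

    machines: dict mapping machine name -> relative speed (higher is faster)
    jobs:     dict mapping job name -> relative cost (higher is more complex)
    Returns a dict mapping machine name -> assigned job name.
    If the counts differ, the extra machines or jobs are left unassigned.
    """
    fastest_first = sorted(machines, key=machines.get, reverse=True)
    costliest_first = sorted(jobs, key=jobs.get, reverse=True)
    return dict(zip(fastest_first, costliest_first))


# Hypothetical volunteers and simulations, with made-up relative scores.
machines = {"laptop": 1.0, "gaming_pc": 8.0, "workstation": 4.0}
jobs = {"small_peptide": 2, "spike_protein": 50, "enzyme": 10}

for machine, job in match_jobs_to_machines(machines, jobs).items():
    print(machine, "->", job)
```

A learned model would presumably estimate the speed and cost figures from past runs instead of taking them as given, but the pairing step would look much the same.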
More recently, Folding@Home shifted attention to SARS-CoV-2, the virus that causes COVID-19. Simulations of the spike protein on the virus and its behavior over time have helped scientists with vaccine and drug development, through the COVID Moonshot collaboration, which crowdsources cures for COVID-19.
Folding@Home has moved beyond a focus on protein folding mechanisms, Bowman said. He likens the process to studying a car and graduating to its ecosystem of many moving parts. “What would I need to do to change my car design to make it go faster, carry more cargo, or take on more difficult terrain?” Bowman asked. Folding@Home is asking similar questions of proteins.
The exascale project (Folding@Home performed a billion billion operations per second) is just getting started with a few proteins. Given that the human body is estimated to contain 80,000-400,000 proteins, there is still plenty of unexplored territory. “It feels very much like being an explorer. Only we’re studying intellectual space instead of new continents,” Bowman said.