This article was contributed by Luis Voloch, cofounder and chief technology officer at Immunai.
Digital biology is at the same stage of development (early, exciting, and transformative) as the internet was in the 1990s. At the time, the concept of IP addresses was new, and being “tech-savvy” meant you knew how to use the internet. Fast-forward three decades, and today we enjoy industrialized communication on the internet without having to know anything about how it works. The internet has a mature infrastructure that the entire world benefits from.
We need to bring similar industrialization to biology. Fully tapping into its potential will help us fight devastating diseases like cancer. A16z has rephrased its famous motto of “Software is eating the world” to “Biology is eating the world.” Biology is not just a science; it’s also becoming an engineering discipline. We are getting closer to being able to ‘program biology’ for diagnostic and treatment purposes.
Integrating advanced technology like machine learning into fields such as drug discovery will accelerate the digitization of biology. However, there are large challenges to overcome before we get there.
Digitized biology: Swimming in oceans of data
Not long ago, gigabytes of biological data were considered a lot. Now we expect the biological data generated over the coming years to be counted in exabytes. Working with data at these scales is a massive challenge. To face this challenge, the industry has to develop and adopt modern data management and processing practices.
The biotech industry does not yet have a mature culture of data management. Results of experiments are gathered and stored in different locations, in a variety of messy formats. This is a significant obstacle to preparing the data for machine learning training and doing analyses quickly. It can take months to prepare digitized data and biological datasets for analysis.
Advancing biological data management practices will also require standards for describing digitized biology and biological data, similar to our standards for communication protocols.
Indexing datasets in central data stores and following data management practices that have become mainstream in the software industry will make it much easier to prepare and use datasets at the scale we collectively need. For this to happen, biopharma companies will need C-suite support and widespread cultural and operational changes.
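To make the idea of a central index concrete, here is a minimal sketch in Python. It is an illustration only: the `DatasetRecord` fields, the `DatasetCatalog` class, and the example storage URIs are all hypothetical and do not describe any particular company's system or an existing metadata standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical minimal metadata describing one experimental dataset."""
    dataset_id: str
    assay: str          # e.g. "single-cell RNA-seq"
    organism: str       # e.g. "human"
    storage_uri: str    # where the raw files live (made-up URIs below)
    file_format: str    # e.g. "h5ad", "fastq"


class DatasetCatalog:
    """Toy in-memory index; a real one would sit on a shared database."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        # Refuse duplicate IDs so that every dataset has exactly one entry.
        if record.dataset_id in self._records:
            raise ValueError(f"duplicate dataset_id: {record.dataset_id}")
        self._records[record.dataset_id] = record

    def find(self, **criteria):
        # Return every record whose fields match all of the given criteria.
        return [
            record for record in self._records.values()
            if all(getattr(record, key) == value
                   for key, value in criteria.items())
        ]


catalog = DatasetCatalog()
catalog.register(DatasetRecord(
    "ds-001", "single-cell RNA-seq", "human",
    "s3://example-bucket/ds-001/", "h5ad"))
catalog.register(DatasetRecord(
    "ds-002", "bulk RNA-seq", "mouse",
    "s3://example-bucket/ds-002/", "fastq"))

matches = catalog.find(assay="single-cell RNA-seq", organism="human")
print([record.dataset_id for record in matches])
```

Once every dataset is described with the same fields, finding all the data relevant to a question becomes a query rather than a months-long search across labs and file shares.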
Welcome to the world of simulation
It can cost millions of dollars to run a single biological experiment. Costs of this magnitude make it prohibitive to run experiments at the scale we would need, for example, to bring true personalization to healthcare, from drug discovery to treatment planning. The only way to address this challenge is to use simulation (in-silico experiments) to augment biological experiments. This means that we need to integrate machine learning (ML) workflows into biological research as a top priority.
With the artificial intelligence industry booming and with the development of computer chips designed specifically for machine learning workloads, we will soon be able to run millions of in-silico experiments in a matter of days for the same cost as a single live experiment that takes months to run.
Of course, simulated experiments have lower fidelity than biological experiments. One way to compensate is to run experiments in silico first and then test only the most promising results in vitro or in vivo. Integrating in-silico predictions with in vitro and in vivo experiments creates a feedback loop: the results of the in vitro and in vivo experiments become training data for future predictions, which increases accuracy and reduces experimental costs in the long run. Several academic groups and companies are already using such approaches and have reduced costs by a factor of 50.
This approach of using machine learning models to select experiments and to consistently feed experimental data to ML training should become an industry standard.
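The feedback loop between in-silico prediction and live experiments can be sketched in a few lines of Python. Everything here is a toy assumption made for illustration: `run_wet_lab_experiment` is a hypothetical stand-in for an expensive assay, the nearest-neighbor `predict` function stands in for a real ML model, and the candidates are plain numbers rather than real experimental conditions.

```python
def run_wet_lab_experiment(x):
    """Hypothetical stand-in for an expensive in vitro/in vivo assay."""
    return -(x - 0.7) ** 2  # toy ground truth: best response at x = 0.7


def predict(x, training_data, k=3):
    """Toy in-silico model: mean outcome of the k nearest measured points."""
    nearest = sorted(training_data, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(outcome for _, outcome in nearest) / len(nearest)


def feedback_loop(candidates, seed_points, rounds=5, batch_size=2):
    # Start from a handful of experiments that have already been run.
    training_data = [(x, run_wet_lab_experiment(x)) for x in seed_points]
    tested = set(seed_points)
    for _ in range(rounds):
        untested = [x for x in candidates if x not in tested]
        if not untested:
            break
        # Cheap step: score every untested candidate with the model.
        ranked = sorted(untested,
                        key=lambda x: predict(x, training_data),
                        reverse=True)
        # Expensive step: run only the most promising candidates for real,
        # then feed the measured outcomes back in as training data.
        for x in ranked[:batch_size]:
            training_data.append((x, run_wet_lab_experiment(x)))
            tested.add(x)
    return training_data


candidates = [i / 20 for i in range(21)]
data = feedback_loop(candidates, seed_points=[0.0, 0.5, 1.0])
best_x, best_y = max(data, key=lambda pair: pair[1])
print(f"{len(data)} experiments run; best candidate so far: {best_x}")
```

In this sketch only 13 of the 21 candidates are ever tested for real; the model decides which ones, and each round of measurements improves the next round of predictions. Real systems replace the toy model with a trained predictor and usually balance exploiting high predictions against exploring uncertain regions.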
Masters of the universe
As Steve Jobs once famously said, “The people who are crazy enough to think they can change the world are the ones who do.”
The last two decades have brought epic technological advancements in genome sequencing, software development, and machine learning. All these advancements are immediately applicable to the field of biology. All of us have the chance to participate and to create products that can significantly improve conditions for humanity as a whole.
Biology needs more software engineers, more infrastructure engineers, and more machine learning engineers. Without their help, it will take decades to digitize biology. The main challenge is that biology as a domain is so complex that it intimidates people. In this sense, biology reminds me of computer science in the late 1980s, when developers needed to know electrical engineering in order to develop software.
For anyone in the software industry, perhaps I can suggest a different way of viewing this complexity: Think of the complexity of biology as an opportunity rather than an insurmountable challenge. Computing and software have become powerful enough to shift us into an entirely new gear of biological understanding. You are the first generation of programmers to have this opportunity. Grab it with both hands.
Bring your skills, your intelligence, and your expertise to biology. Help biologists to scale the capacity of technologies like CRISPR, single-cell genomics, immunology, and cell engineering. Help discover new treatments for cancer, Alzheimer’s, and so many other conditions against which we have been powerless for millennia. Until now.