Jul 10 '24
Mining for Molecular Function from a Universe of Knowledge: Building Terpene DB
#AIFS
#Education
Did you know that many antioxidant bioactives, anticancer therapeutics, essential oils, and food flavors all come from the same class of molecules?
In the world of natural molecules, terpenes are one of the largest, most studied, and most widely commercialized classes. Among other qualities, terpenes are known for their strong and often pleasant odors, which we encounter in essential oils and food flavorings.
Biomolecules such as terpenes provide an enormous opportunity to better our lives — from life-saving therapeutics, to foods, to energy. Each molecule has a unique property, and is produced through a unique set of proteins.
The key to unlocking this potential is identifying which protein has the functionality we need to generate the terpene of interest. The massive challenge today is determining which protein sequences we want. The possibilities are beyond astronomical, literally: about 10 to the 400th power.
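The article does not say how the 10-to-the-400 figure is derived. One plausible reading, offered here only as a back-of-the-envelope sketch, is that it counts every possible sequence for a protein of a few hundred residues. The alphabet of 20 canonical amino acids and the function names below are assumptions made for illustration, not details from the talk.

```python
import math

# Assumption: proteins are built from the 20 canonical amino acids.
AMINO_ACIDS = 20


def log10_sequence_space(length: int, alphabet: int = AMINO_ACIDS) -> float:
    """Return log10 of the number of distinct sequences of the given length."""
    return length * math.log10(alphabet)


def min_length_for_exponent(exponent: int, alphabet: int = AMINO_ACIDS) -> int:
    """Return the shortest length whose sequence space reaches 10**exponent."""
    return math.ceil(exponent / math.log10(alphabet))


if __name__ == "__main__":
    # Sequence space grows as alphabet**length, so report it as a power of ten.
    for length in (100, 300, 308):
        exponent = log10_sequence_space(length)
        print(f"length {length}: about 10^{exponent:.0f} possible sequences")
    print("shortest length reaching 10^400:", min_length_for_exponent(400))
```

Under these assumptions, a protein of roughly 300 residues already has a sequence space near 10 to the 400th power, which is why exhaustive search is not an option and predictive tools are needed.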
To get to the function, we need to connect a protein sequence to its structure, which then dictates function.
In the early 21st century, we learned how to readily obtain billions of protein sequences, but understanding how sequence relates to structure and function remained a bottleneck. In 2021, Google's DeepMind achieved a major breakthrough by applying cutting-edge AI tools to predict protein structure accurately, rapidly, and cost-effectively, turning structure prediction into a commodity.
With a 'Holy Grail' of biosciences in sight, a race to the finish line is on. In the last six months alone, we have seen about $2 billion of startup investment towards crossing the last line connecting protein sequence to function, unlocking biomolecular solutions to the world's biggest problems.
In academia, we don't have the money to compete with these large companies, so what is our role?
- To provide training to students who can work at these companies or start their own
- To provide public knowledge about technology gaps
- To generate public datasets to address knowledge gaps
Dr. Siegel is working with graduate student Ian Anderson to illuminate approaches that use natural language processing and machine learning for the isolation and utilization of high-value terpenes, expanding their footprint in the therapeutics, pesticides, flavors, and materials markets.
In his talk, Dr. Siegel explains the steps they used to create Terpene DB and the surprises they encountered along the way. The same approach can be replicated with other classes of molecules to expand the library of publicly available datasets in the area of biomolecular function.
FEATURED SPEAKER: Justin B. Siegel, PhD, Professor of Chemistry, Biochemistry & Molecular Medicine and Faculty Director of the Innovation Institute for Food and Health (IIFH) at the University of California, Davis.