Predicting Health Effects from Food Composition via Large-Scale Information Extraction

Description

BACKGROUND

Considerable effort has gone into systematically cataloguing food composition beyond the ~160 nutrients quantified in USDA food composition tables. FooDB, for example, covers approximately 800 foods and approximately 70,000 known and predicted compounds, of which approximately 15,000 have been observed in food. Populating FooDB and similar databases relies on literature curation, which is manual and error-prone. As a result, these databases still leave a large body of “nutritional dark matter”: compounds that are reported in papers but not quantified in any centralized database. Private and academic efforts have been made to predict health effects from food, but so far no approach integrates large-scale automated text mining with machine learning to improve predictive capability.

GOALS

Create a software package for the automated curation of nutrition data from published papers
Apply the software package to a corpus of hundreds of food and nutrition papers, and organize the resulting information in a Knowledge Base (KB)
Use the novel KB to augment published datasets linking food intake to health outcomes, and from the combined data build a predictor of health effects based on the chemical composition of food
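
To make the three goals concrete, the following is a minimal sketch of the extract, organize, and featurize flow, not the project's actual method. Everything in it is an assumption made for illustration: the sentence template, the function names (extract, build_kb, composition_vector), and the toy sentences with placeholder foods, compounds, and values. A real curation package would need far more robust text mining than a single regular expression.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical sentence template, assumed only for this sketch:
# "<food> contains <amount> <unit> of <compound> per 100 g."
PATTERN = re.compile(
    r"(?P<food>.+?) contains (?P<amount>\d+(?:\.\d+)?) (?P<unit>mg|g)"
    r" of (?P<compound>.+?) per 100 g\."
)

# Invented placeholder sentences; they are not data from any paper.
SENTENCES = [
    "Food A contains 2.5 mg of compound X per 100 g.",
    "Food B contains 0.4 mg of compound X per 100 g.",
    "This sentence reports nothing.",
]

Record = Tuple[str, str, float, str]
KnowledgeBase = Dict[str, Dict[str, Tuple[float, str]]]


def extract(sentence: str) -> Optional[Record]:
    """Return (food, compound, amount, unit) if the sentence fits the template."""
    match = PATTERN.fullmatch(sentence)
    if match is None:
        return None
    return (
        match["food"],
        match["compound"],
        float(match["amount"]),
        match["unit"],
    )


def build_kb(sentences: List[str]) -> KnowledgeBase:
    """Organize extracted records as food -> compound -> (amount, unit)."""
    kb: KnowledgeBase = {}
    for sentence in sentences:
        record = extract(sentence)
        if record is None:
            continue  # sentence carries no composition statement
        food, compound, amount, unit = record
        kb.setdefault(food, {})[compound] = (amount, unit)
    return kb


def composition_vector(
    kb: KnowledgeBase, food: str, compounds: List[str]
) -> List[float]:
    """Turn one food's KB entry into a fixed-order numeric feature vector.

    Compounds with no recorded value are encoded as 0.0, which is a
    simplifying assumption: missing is not the same as absent.
    """
    entry = kb.get(food, {})
    return [entry.get(compound, (0.0, ""))[0] for compound in compounds]


if __name__ == "__main__":
    kb = build_kb(SENTENCES)
    print(kb)
    print(composition_vector(kb, "Food B", ["compound X", "compound Y"]))
```

Vectors of this kind could then be joined to published intake and outcome datasets and passed to a machine learning model, which is the step the third goal describes.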

IMPACT

Help create a healthier food system by understanding what food contains and how it affects our health
Reveal the food composition information hidden in published papers and integrate it into a KB
Provide a way to organize, interrogate and connect this data trove to other data resources that together can be used to link food and health states