AI-Powered Data-Mining Study Illuminates Clinical Complexity of Tooth Decay
Study identifies new subtypes and patterns for tooth decay, which may require different types of screening and risk prediction approaches
Philadelphia — In a study from researchers at Penn Dental Medicine, an artificial intelligence-powered “data cleaning-and-analysis” process fixed data quality problems in a large health dataset, then analyzed the data to identify previously unrecognized clinical subtypes of tooth decay (caries) and illuminate its risk factors.
Caries is considered the most common chronic disease afflicting humans. Although it has long been associated with sugary diets and poor dental hygiene, the landscape of associated factors and patient characteristics is complex and far from being fully understood.
In the new study, published in the Journal of Dental Research, the researchers developed a process based on machine learning, a type of artificial intelligence, to organize and analyze dental and other data from the National Health and Nutrition Examination Survey (NHANES). Their demonstration analysis revealed patterns of data suggesting new clinical subtypes of caries and illuminating links to factors such as lead exposure.
“This kind of machine-learning pipeline can turn complex national health data into clearer hypotheses and better predictive models—starting with oral health, and potentially extending to other areas of medicine,” said study co-senior author Hyun (Michel) Koo, DDS, MS, PhD, a professor in the Department of Orthodontics and Divisions of Community Oral Health and Pediatrics at Penn Dental Medicine and Co-Founding Director of the Center for Innovation & Precision Dentistry (CiPD), a joint center between Penn Dental Medicine and Penn Engineering.
The study was a collaboration that, in addition to Koo, included a postdoctoral trainee and a DMD student in the CiPD training program, which is funded by the National Institute of Dental and Craniofacial Research, along with collaborators from the Penn Dental Medicine Department of Community Oral Health, Penn Nursing, the Penn Institute for Biomedical Informatics, and Cedars-Sinai Medical Center.
NHANES surveys, overseen by the Centers for Disease Control and Prevention and conducted in two-year cycles since 1999, are rich in information relating to Americans’ health and various determinants of health. These datasets are somewhat messy, however. There are often missing data and other non-uniform aspects within a given survey, plus changes in data collection from one survey to the next. This creates a substantial “pre-processing” challenge for researchers who hope to apply sophisticated computational methods to find patterns in the data.
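To make the pre-processing challenge concrete, the short Python sketch below shows one common way to handle gaps in survey-style data: drop columns that are mostly empty, then fill the remaining gaps with the column median. It is only an illustration and is not the pipeline the researchers built. The column names, the sample values, and the 40 percent threshold are all hypothetical.

```python
from statistics import median

# Hypothetical records standing in for survey rows; None marks a missing value.
# These names and numbers are invented for illustration, not taken from NHANES.
records = [
    {"age": 4, "blood_lead": 0.8, "sleep_hours": None},
    {"age": 71, "blood_lead": None, "sleep_hours": 6.5},
    {"age": 35, "blood_lead": 1.2, "sleep_hours": 7.0},
    {"age": 9, "blood_lead": 0.6, "sleep_hours": None},
]


def missing_fraction(rows, column):
    """Share of rows in which `column` is absent or None."""
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def clean(rows, max_missing=0.4):
    """Drop sparse columns, then fill remaining gaps with the column median."""
    columns = sorted({key for r in rows for key in r})
    # Keep only columns whose share of missing values is within the threshold.
    kept = [c for c in columns if missing_fraction(rows, c) <= max_missing]
    # Compute each kept column's median from the values that are present.
    medians = {
        c: median(r[c] for r in rows if r.get(c) is not None) for c in kept
    }
    return [
        {c: r[c] if r.get(c) is not None else medians[c] for c in kept}
        for r in rows
    ]


cleaned = clean(records)
```

In this toy example the sleep column is dropped because half of its values are missing, while the single gap in the blood lead column is filled with that column's median. Real survey data would also need steps this sketch omits, such as survey weighting and reconciling variables that change between survey cycles.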
The researchers developed their machine-learning-based process to organize and then analyze relevant 2017–2018 NHANES dental and other data. When the analysis examined caries cases by age, for example, it found that the strongest signs of cavities showed up at two life stages: very young children and older adults.
As expected, sugar mattered, but the analysis added granular detail by identifying “socially recognizable” clusters of certain sugar-laden products associated with caries, including apple juice, energy drinks, flavored milk, and ice cream.
The study also examined lead-related patterns. People with caries showed higher blood levels of lead in NHANES, supporting findings from past research, but the analysis also linked these cases to higher levels of the heavy metal cadmium and the nicotine metabolite cotinine. Together, these patterns suggest that elevated blood lead levels may be an indication of broader high-risk environmental conditions for caries, rather than evidence of a specific causal role for lead.
In addition, sleep habits (number of hours slept) surfaced as a factor that may interact with exposures and caries susceptibility—an unexpected finding that the authors say warrants further study.
Overall, the findings underscore the complexity of caries and the need for more precise, multidimensional strategies tailored to different groups.
“One-size-fits-all won’t close the cavity gap,” Koo said. “Our results point to the importance of age-targeted prevention and prediction—especially for young children and older adults—guided by real-world diet patterns, lab signals, environmental risk context, and potentially other signals such as sleep.”
The authors note the current analysis is limited to NHANES 2017–2018 and say future multi-year analyses will be needed to assess trends over time.
The study's co-authors are A. Orlenko, J.D. Mure, J.I. Gluch, J. Gregg, C.W. Compher, Z. Ren, H. Koo, and J.H. Moore.
The work was supported in part by the National Library of Medicine (LM010098) and by the National Institute of Dental and Craniofacial Research (R90DE031532).