One of the goals of CEINT is to develop skill in mapping nanoparticle properties to environmental risk. This ability would simplify nanoparticle risk assessment by allowing groups of particles with similar properties to be treated as a class instead of studying every variant individually. Furthermore, identifying information-rich particle attributes contributes to the discussion of the minimum set of attributes to be measured and reported in nanotoxicology studies. We have developed methods for identifying nanoparticle attributes with high information content for predicting nanotoxicology endpoints, using a group of machine learning techniques that includes regression trees and random forests. An analysis of CNT inhalational toxicity data found that, while contaminants (especially cobalt, iron, and oxidized carbon) play a role in CNT toxicity, their contribution is smaller than that of the variables describing particle and aggregate geometry. Of the descriptors of aggregation, aggregate diameter is the most important variable, whether reported as the median diameter based on particle count or as the mode diameter based on the mass distribution. Shape variables for unaggregated CNT fibers, both length and diameter, also influence toxicity. The shortest CNT lengths in a length distribution and larger diameters (a surrogate variable for MWCNT) are the most information-rich variables. Among the variables with little information content, the N2-BET measurement of CNT surface area dose does not seem to be relevant to the biological interaction with aggregated CNTs, nor do the longest lengths in a distribution of unaggregated CNT fibers or the contaminants Cu and Cr.
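To make the idea of "information content" concrete, the sketch below scores each attribute by the share of response variance removed by its single best threshold split, which is the criterion a regression tree uses to choose a split. It is a simplified stand-in and not the analysis described above: a random forest would average such gains over many trees grown on resampled data. The attribute names, the function names, and the six records are made up for illustration and are not measurements.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split_gain(xs, ys):
    """Largest reduction in SSE achievable by one split of the form x <= t."""
    total = sse(ys)
    best = 0.0
    for t in sorted(set(xs))[:-1]:  # the largest value cannot split the data
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        best = max(best, total - sse(left) - sse(right))
    return best


def rank_attributes(rows, response):
    """Rank attributes by the fraction of response SSE removed by their best split."""
    ys = [r[response] for r in rows]
    total = sse(ys)
    scores = {}
    for name in rows[0]:
        if name == response:
            continue
        xs = [r[name] for r in rows]
        scores[name] = best_split_gain(xs, ys) / total if total else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic toy records (hypothetical names and values, for illustration only).
rows = [
    {"aggregate_diameter": 1.0, "cu_content": 0.3, "response": 2.0},
    {"aggregate_diameter": 2.0, "cu_content": 0.1, "response": 2.2},
    {"aggregate_diameter": 3.0, "cu_content": 0.4, "response": 1.9},
    {"aggregate_diameter": 4.0, "cu_content": 0.2, "response": 6.1},
    {"aggregate_diameter": 5.0, "cu_content": 0.3, "response": 5.8},
    {"aggregate_diameter": 6.0, "cu_content": 0.1, "response": 6.0},
]

ranking = rank_attributes(rows, "response")
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy data the response was constructed to jump with the first attribute, so that attribute ranks first. A single-split score of this kind ignores interactions between attributes, which is one reason ensemble methods such as random forests are preferred in practice.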
In collaboration with researchers at the EPA National Center for Computational Toxicology (ORD), we are analyzing an EPA database of inhalational toxicity studies of nano-TiO2. The study compares random forest models with linear regression and principal components models of toxicity, and it employs a variety of sensitivity tests to identify the most (and least) consistently important factors explaining variation in experimental results (A. Wang, J. Gernand, Q. Meng, K. Houck, W. Setzer and E. Casman, “Factors explaining differences in pulmonary toxicity of nano-titanium dioxide,” in preparation).
Thrust C has also been refining a method for prioritizing research allocations among sets of linked research questions. The concept is that knowledge in one area of a field influences progress in other areas, so the sequence in which research resources are allocated can be chosen to increase knowledge of the entire system as efficiently as possible. A method like this might be used, for example, to determine which parts of a proposal to study first when a grant has been funded at less than the full amount, or in other scenarios of resource limitation. The method uses text analysis to develop a research map of hypothetical and causal statements about a field of study. Experts are interviewed to obtain quantitative estimates of the current level of understanding of each hypothesis (link) in the map. The map is then analyzed to find the links that, if incremented by a unit of knowledge, most increase the overall knowledge of the system. Those links are incremented, and the process is repeated until the entire system is completely “understood.” The result is a temporal sequence of research investments that represents the theoretically optimal strategy for increasing knowledge.
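The increment-and-repeat loop described above can be sketched as a greedy search. The text does not specify how the overall knowledge of the system is computed from the link estimates, so the scoring rule here is an assumption made only for illustration: each causal path through the map is scored as the product of its link levels (on a 0 to 1 scale), and the system score is the mean over paths. The link names, the starting levels, the step size, and the function names are likewise hypothetical.

```python
from math import prod


def system_knowledge(levels, paths):
    """Assumed scoring rule: mean over paths of the product of link levels."""
    return sum(prod(levels[link] for link in path) for path in paths) / len(paths)


def greedy_sequence(levels, paths, step=0.25):
    """Repeatedly increment the link whose unit increase most raises the score.

    Returns the order in which links were funded and the final levels.
    Ties are broken in favor of the link listed first.
    """
    levels = dict(levels)  # work on a copy
    order = []
    while any(v < 1.0 for v in levels.values()):
        base = system_knowledge(levels, paths)
        best_link, best_gain = None, -1.0
        for link, v in levels.items():
            if v >= 1.0:
                continue  # this link is already fully understood
            trial = dict(levels)
            trial[link] = min(1.0, v + step)
            gain = system_knowledge(trial, paths) - base
            if gain > best_gain:
                best_link, best_gain = link, gain
        levels[best_link] = min(1.0, levels[best_link] + step)
        order.append(best_link)
    return order, levels


# Hypothetical three-link map: "release" lies on both causal paths.
paths = [["release", "transport"], ["release", "uptake"]]
expert_levels = {"release": 0.5, "transport": 0.5, "uptake": 0.75}

order, final_levels = greedy_sequence(expert_levels, paths)
print(order)
```

Under the assumed product rule, a link shared by several paths tends to be funded early because raising it lifts every path that passes through it, which mirrors the idea that knowledge in one area influences progress in others. A different scoring rule would generally produce a different sequence.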