A novel framework for predicting in vivo toxicities from in vitro data using optimal methods for dense and sparse matrix reordering and logistic regression.

Authors: DiMaggio PA, Subramani A, Judson RS, Floudas CA.
Publisher/Year: Toxicological Sciences. 2010;118(1):251–265.
Pub Med ID/Journal ID: PMID: 20702588


In this work, we combine the strengths of mixed-integer linear optimization (MILP) and logistic regression for predicting the in vivo toxicity of chemicals using only their measured in vitro assay data. The proposed approach utilizes a biclustering method based on iterative optimal reordering (DiMaggio, P. A., McAllister, S. R., Floudas, C. A., Feng, X. J., Rabinowitz, J. D., and Rabitz, H. A. (2008). Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies. BMC Bioinformatics 9, 458–474.; DiMaggio, P. A., McAllister, S. R., Floudas, C. A., Feng, X. J., Rabinowitz, J. D., and Rabitz, H. A. (2010b). A network flow model for biclustering via optimal re-ordering of data matrices. J. Global. Optim. 47, 343–354.) to identify biclusters corresponding to subsets of chemicals that have similar responses over distinct subsets of the in vitro assays. The biclustering of the in vitro assays is shown to result in significant clustering based on assay target (e.g., cytochrome P450 [CYP] and nuclear receptors) and type (e.g., downregulated BioMAP and biochemical high-throughput screening protein kinase activity assays). An optimal method based on mixed-integer linear optimization for reordering sparse data matrices (DiMaggio, P. A., McAllister, S. R., Floudas, C. A., Feng, X. J., Li, G. Y., Rabinowitz, J. D., and Rabitz, H. A. (2010a). Enhancing molecular discovery using descriptor-free rearrangement clustering techniques for sparse data sets. AIChE J. 56, 405–418.; McAllister, S. R., DiMaggio, P. A., and Floudas, C. A. (2009). Mathematical modeling and efficient optimization methods for the distance-dependent rearrangement clustering problem. J. Global. Optim. 45, 111–129) is then applied to the in vivo data set (21.7% sparse) in order to cluster end points that have similar lowest effect level (LEL) values, where it is observed that the end points are effectively clustered according to (1) animal species (i.e., the chronic mouse and chronic rat end points were clearly separated) and (2) similar physiological attributes (i.e., liver- and reproductive-related end points were found to separately cluster together). As the liver and reproductive end points exhibited the largest degree of correlation, we further analyzed them using regularized logistic regression in a rank-and-drop framework to identify which subset of in vitro features could be utilized for in vivo toxicity prediction. It was observed that the in vivo end points that had similar LEL responses over the 309 chemicals (as determined by the sparse clustering results) also shared a significant subset of selected in vitrodescriptors. Comparing the significant descriptors between the two different categories of end points revealed a specificity of the CYP assays for the liver end points and preferential selection of the estrogen/androgen nuclear receptors by the reproductive end points.