Genetic programming based quantitative structure-retention relationships for the prediction of Kovats retention indices

Publication Type: Journal Article
Year of Publication: 2015
Authors: Goel, P.; Bapat, S.; Vyas, R.; Tambe, A.; Tambe, S.S.
Journal: Journal of Chromatography A
Volume: 1420
Pagination: 98-109
Date Published: November
ISSN: 0021-9673
Keywords: Artificial intelligence, Gas chromatography, Genetic programming, Kovats retention index, Molecular descriptors, Quantitative structure-retention relationships
Abstract

The development of quantitative structure-retention relationships (QSRRs) aims at constructing an appropriate linear or nonlinear model for predicting the retention behavior (such as the Kovats retention index) of a solute on a chromatographic column. Multiple linear regression and artificial neural networks are commonly used to develop QSRRs in gas chromatography (GC). In this study, an artificial intelligence based data-driven modeling formalism, namely genetic programming (GP), has been introduced for developing quantitative structure-based models that predict Kovats retention indices (KRIs). The novelty of the GP formalism is that, given an example dataset, it searches for and optimizes both the form (structure) and the parameters of an appropriate linear or nonlinear data-fitting model. Thus, the form of the data-fitting model need not be specified in advance in GP-based modeling. The resulting models are also less complex, easy to understand, and easy to deploy. The effectiveness of GP in constructing QSRRs has been demonstrated by developing models that predict the KRIs of light hydrocarbons (case study I) and adamantane derivatives (case study II). In each case study, two-, three-, and four-descriptor models were developed using KRI data available in the literature. The results clearly indicate that the GP-based models possess excellent KRI prediction accuracy and generalization capability. Specifically, the best-performing four-descriptor models in both case studies yielded high (>0.9) values of the coefficient of determination (R²) and low values of the root mean squared error (RMSE) and mean absolute percent error (MAPE) for the training, test, and validation set data. The distinguishing feature of this study is that it introduces a practical and effective GP-based method for developing QSRRs in gas chromatography, one that can also be gainfully used to develop other types of data-driven models in chromatography science.
© 2015 Elsevier B.V. All rights reserved.
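The abstract reports model quality in terms of R², RMSE, and MAPE. The following is a minimal Python sketch of how these three statistics are conventionally computed; it is not taken from the paper. It assumes R² is defined as 1 − SS_res/SS_tot (some studies instead report the squared Pearson correlation coefficient), and the function names and sample values are hypothetical and purely illustrative, not KRI data from the study.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both sequences must be non-empty and of equal length.
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares about the mean
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the units of the target (here, KRI units)."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percent error; assumes no observed value is zero."""
    _check(y_true, y_pred)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical observed and predicted retention indices (illustrative only).
    observed = [500.0, 600.0, 700.0]
    predicted = [510.0, 590.0, 700.0]
    print(f"R2   = {r_squared(observed, predicted):.3f}")
    print(f"RMSE = {rmse(observed, predicted):.3f}")
    print(f"MAPE = {mape(observed, predicted):.3f} %")
```

In a study like this one, such statistics would be computed separately for the training, test, and validation sets, so that a model's fit to the data used to evolve it can be compared with its performance on unseen compounds.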

DOI: 10.1016/j.chroma.2015.09.086
Type of Journal (Indian or Foreign): Foreign

Impact Factor (IF): 3.926
Division category: Chemical Engineering & Process Development