<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Nandi, Sutanu</style></author><author><style face="normal" font="default" size="100%">Subramanian, Abhishek</style></author><author><style face="normal" font="default" size="100%">Sarkar, Ram Rup</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Integrative machine learning strategy for improved prediction of essential genes in Escherichia coli metabolism using flux-coupled features</style></title><secondary-title><style face="normal" font="default" size="100%">Molecular Biosystems</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2017</style></year><pub-dates><date><style  face="normal" font="default" size="100%">AUG</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">13</style></volume><pages><style face="normal" font="default" size="100%">1584-1596</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">Prediction of essential genes helps to identify a minimal set of genes that are absolutely required for the appropriate functioning and survival of a cell. The available machine learning techniques for essential gene prediction have inherent problems, like imbalanced provision of training datasets, biased choice of the best model for a given balanced dataset, choice of a complex machine learning algorithm, and data-based automated selection of biologically relevant features for classification. 
Here, we propose a simple support vector machine-based learning strategy for the prediction of essential genes in Escherichia coli K-12 MG1655 metabolism that integrates a non-conventional combination of an appropriate sample balanced training set, a unique organism-specific genotype, phenotype attributes that characterize essential genes, and optimal parameters of the learning algorithm to generate the best machine learning model (the model with the highest accuracy among all the models trained for different sample training sets). For the first time, we also introduce flux-coupled metabolic subnetwork-based features for enhancing the classification performance. Our strategy proves to be superior as compared to previous SVM-based strategies in obtaining a biologically relevant classification of genes with high sensitivity and specificity. This methodology was also trained with datasets of other recent supervised classification techniques for essential gene classification and tested using reported test datasets. The testing accuracy was always high as compared to the known techniques, proving that our method outperforms known methods. 
Observations from our study indicate that essential genes are conserved among homologous bacterial species, demonstrate high codon usage bias, GC content and gene expression, and predominantly possess a tendency to form physiological flux modules in metabolism.</style></abstract><issue><style face="normal" font="default" size="100%">8</style></issue><work-type><style face="normal" font="default" size="100%">Article</style></work-type><custom3><style face="normal" font="default" size="100%">Foreign</style></custom3><custom4><style face="normal" font="default" size="100%">2.829</style></custom4></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Bose, Samik</style></author><author><style face="normal" font="default" size="100%">Dhawan, Diksha</style></author><author><style face="normal" font="default" size="100%">Nandi, Sutanu</style></author><author><style face="normal" font="default" size="100%">Sarkar, Ram Rup</style></author><author><style face="normal" font="default" size="100%">Ghosh, Debashree</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Machine learning prediction of interaction energies in rigid water clusters</style></title><secondary-title><style face="normal" font="default" size="100%">Physical Chemistry Chemical Physics</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2018</style></year><pub-dates><date><style  face="normal" font="default" size="100%">SEP</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">20</style></volume><pages><style face="normal" font="default" size="100%">22987-22996</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">Classical force fields form a 
computationally efficient avenue for calculating the energetics of large systems. However, due to the constraints of the underlying analytical form, it is sometimes not accurate enough. Quantum mechanical (QM) methods, although accurate, are computationally prohibitive for large systems. In order to circumvent the bottle-neck of interaction energy estimation of large systems, data driven approaches based on machine learning (ML) have been employed in recent years. In most of these studies, the method of choice is artificial neural networks (ANN). In this work, we have shown an alternative ML method, support vector regression (SVR), that provides comparable accuracy with better computational efficiency. We have further used many body expansion (MBE) along with SVR to predict interaction energies in water clusters (decamers). In the case of dimer and trimer interaction energies, the root mean square errors (RMSEs) of the SVR based scheme are 0.12 kcal mol(-1) and 0.34 kcal mol(-1), respectively. 
We show that the SVR and MBE based scheme has a RMSE of 2.78% in the estimation of decamer interaction energy against the parent QM method in a computationally efficient way.</style></abstract><issue><style face="normal" font="default" size="100%">35</style></issue><work-type><style face="normal" font="default" size="100%">Article</style></work-type><custom3><style face="normal" font="default" size="100%">Foreign</style></custom3><custom4><style face="normal" font="default" size="100%">3.906</style></custom4></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Nandi, Sutanu</style></author><author><style face="normal" font="default" size="100%">Ganguli, Piyali</style></author><author><style face="normal" font="default" size="100%">Sarkar, Ram Rup</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Essential gene prediction using limited gene essentiality information-an integrative semi-supervised machine learning strategy</style></title><secondary-title><style face="normal" font="default" size="100%">PLOS ONE</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2020</style></year><pub-dates><date><style  face="normal" font="default" size="100%">NOV</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">15</style></volume><pages><style face="normal" font="default" size="100%">e0242943</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">Essential gene prediction helps to find minimal genes indispensable for the survival of any organism. Machine learning (ML) algorithms have been useful for the prediction of gene essentiality. 
However, currently available ML pipelines perform poorly for organisms with limited experimental data. The objective is the development of a new ML pipeline to help in the annotation of essential genes of less explored disease-causing organisms for which minimal experimental data is available. The proposed strategy combines unsupervised feature selection technique, dimension reduction using the Kamada-Kawai algorithm, and semi-supervised ML algorithm employing Laplacian Support Vector Machine (LapSVM) for prediction of essential and non-essential genes from genome-scale metabolic networks using very limited labeled dataset. A novel scoring technique, Semi-Supervised Model Selection Score, equivalent to area under the ROC curve (auROC), has been proposed for the selection of the best model when supervised performance metrics calculation is difficult due to lack of data. The unsupervised feature selection followed by dimension reduction helped to observe a distinct circular pattern in the clustering of essential and non-essential genes. LapSVM then created a curve that dissected this circle for the classification and prediction of essential genes with high accuracy (auROC &gt; 0.85) even with 1% labeled data for model training. After successful validation of this ML pipeline on both Eukaryotes and Prokaryotes that show high accuracy even when the labeled dataset is very limited, this strategy is used for the prediction of essential genes of organisms with inadequate experimentally known data, such as Leishmania sp. Using a graph-based semi-supervised machine learning scheme, a novel integrative approach has been proposed for essential gene prediction that shows universality in application to both Prokaryotes and Eukaryotes with limited labeled data. 
The essential genes predicted using the pipeline provide an important lead for the prediction of gene essentiality and identification of novel therapeutic targets for antibiotic and vaccine development against disease-causing parasites.
</style></abstract><issue><style face="normal" font="default" size="100%">11</style></issue><work-type><style face="normal" font="default" size="100%">Article</style></work-type><custom3><style face="normal" font="default" size="100%">Foreign
</style></custom3><custom4><style face="normal" font="default" size="100%">2.740
</style></custom4></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Saurabh, Rochi</style></author><author><style face="normal" font="default" size="100%">Nandi, Sutanu</style></author><author><style face="normal" font="default" size="100%">Sinha, Noopur</style></author><author><style face="normal" font="default" size="100%">Shukla, Mudita</style></author><author><style face="normal" font="default" size="100%">Sarkar, Ram Rup</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Prediction of survival rate and effect of drugs on cancer patients with somatic mutations of genes: an AI-based approach</style></title><secondary-title><style face="normal" font="default" size="100%">Chemical Biology &amp; Drug Design</style></secondary-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">gene expression and copy number variation</style></keyword><keyword><style  face="normal" font="default" size="100%">gliomas</style></keyword><keyword><style  face="normal" font="default" size="100%">grade and survival prediction</style></keyword><keyword><style  face="normal" font="default" size="100%">machine learning strategy</style></keyword><keyword><style  face="normal" font="default" size="100%">significant gene prediction and effect of drugs</style></keyword><keyword><style  face="normal" font="default" size="100%">somatic mutation</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2020</style></year><pub-dates><date><style  face="normal" font="default" size="100%">SEP</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">96</style></volume><pages><style face="normal" font="default" size="100%">1005-1019</style></pages><language><style face="normal" font="default" 
size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">The causal role of somatic mutation and its interrelationship with gene expression profile during tumor development has already been observed, which plays a major role to decide the cancer grades and overall survival. Accurate and robust prediction of tumor grades and patients' overall survival are important for prognosis, risk factors identification and betterment of the treatment strategy, especially for highly lethal tumors, like gliomas. Here, with the help of more accurate and widely used machine learning-based approaches, we propose an integrative computational pipeline that incorporates somatic mutations and gene expression profile for survival and grade prediction of glioma patients and simultaneously relates it to the drugs to be administered. This study gives us a clear understanding that the same drug is not effective for the treatment of same grade of cancer if the gene mutations are different. The alteration in a specific gene plays a very important role in tumor progression and should also be considered for the selection of appropriate drugs. This proposed framework includes all the necessary factors required for enhancement of therapeutic designs and could be useful for clinicians in determining an accurate and personalized treatment strategy for individual patients suffering from different life threatening diseases.
</style></abstract><issue><style face="normal" font="default" size="100%">3</style></issue><work-type><style face="normal" font="default" size="100%">Article</style></work-type><custom3><style face="normal" font="default" size="100%">Foreign
</style></custom3><custom4><style face="normal" font="default" size="100%">2.548
</style></custom4></record></records></xml>