Prediction of bacterial resistance phenotypes from whole-genome sequences using k-mers and a stability selection approach
Thursday, 21 September 2017 - 14:00
Abstract:
Several recent studies have demonstrated the feasibility of predicting bacterial antibiotic resistance phenotypes from whole-genome sequences. The prediction usually amounts to detecting the presence of genes involved in antibiotic resistance mechanisms, or of specific mutations within these genes, previously identified from a training panel of strains. In this work we address the problem from a supervised statistical learning perspective, without relying on prior information about such genetic resistance determinants. To this end, we use a k-mer-based strain genotyping scheme and the logistic regression model, thereby combining several k-mers into a probabilistic model. To identify a small yet predictive set of k-mers, we rely on the stability selection approach of Meinshausen and Bühlmann (2010), which consists of penalizing logistic regression models with a Lasso penalty, coupled with extensive resampling procedures. Using public datasets from previous studies, we applied the resulting classifiers to two bacterial species and achieved predictive performance equivalent to the state of the art. The models are extremely sparse, involving 1 to 8 k-mers per antibiotic, and are therefore remarkably easy and fast to evaluate on new genomes (from raw reads or assemblies). This proof of concept demonstrates that stability-based selection is a powerful approach to investigating bacterial genotype/phenotype relationships.
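The pipeline outlined in the abstract (k-mer presence/absence genotyping, Lasso-penalized logistic regression, and stability selection by repeated subsampling) can be sketched in pure Python. This is a minimal, hypothetical illustration and not the speaker's implementation. Assumptions: k-mers are read from a single strand with no reverse-complement canonicalization or read filtering; the Lasso logistic regression is fitted with a plain proximal-gradient (ISTA) loop rather than a production solver; each subsample contains half of the strains; a feature is kept when its selection frequency, maximized over the penalty grid `lambdas`, reaches `threshold`. All function names, defaults, and the toy data are illustrative.

```python
import math
import random


def kmer_profile(seq, k):
    """Distinct k-mers of a sequence (presence/absence genotyping).
    Assumption: single strand, no reverse-complement canonicalization."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_matrix(genomes, k):
    """Binary strain-by-k-mer matrix plus the sorted k-mer vocabulary."""
    profiles = [kmer_profile(g, k) for g in genomes]
    vocab = sorted(set().union(*profiles))
    X = [[1.0 if km in prof else 0.0 for km in vocab] for prof in profiles]
    return X, vocab


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def l1_logistic(X, y, lam, steps=200, lr=0.5):
    """Lasso logistic regression via proximal gradient descent (ISTA).
    Minimizes mean log-loss + lam * ||w||_1; intercept b is unpenalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += r / n
            for j in range(p):
                gw[j] += r * xi[j] / n
        b -= lr * gb
        for j in range(p):
            v = w[j] - lr * gw[j]
            # Soft-thresholding (proximal operator of the L1 penalty)
            # sets weak coefficients exactly to zero.
            w[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)
    return w, b


def stability_selection(X, y, lambdas, n_subsamples=30, threshold=0.6, seed=0):
    """Refit on random halves of the strains; return the features whose
    selection frequency (max over lambdas) reaches threshold, and all
    frequencies."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = {lam: [0] * p for lam in lambdas}
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # subsample without replacement
        Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        for lam in lambdas:
            w, _ = l1_logistic(Xs, ys, lam)
            for j in range(p):
                if w[j] != 0.0:
                    counts[lam][j] += 1
    freq = [max(counts[lam][j] / n_subsamples for lam in lambdas)
            for j in range(p)]
    return [j for j in range(p) if freq[j] >= threshold], freq


# Toy usage: feature 0 copies the phenotype, features 1-5 are random noise.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[float(yi)] + [float(rng.random() < 0.5) for _ in range(5)] for yi in y]
selected, freq = stability_selection(X, y, lambdas=[0.05, 0.1])
print(selected, [round(f, 2) for f in freq])
```

The resampling step is what makes the selected set small and stable: a k-mer that enters the Lasso path only on a few subsamples is discarded, so the final classifier involves only the handful of features that are consistently chosen, which is what makes such models cheap to evaluate on new genomes.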
Speaker's institution:
BioMérieux / UGA
Research theme:
Probability
Room:
IMAG, room 106