Comparison of machine learning methods for the classification of high-throughput gene expression data
High-throughput technologies make it possible to measure the expression of thousands of genes in a single assay and to examine the effect of many genes on an organism. This has led to the development of prognostic and predictive tests based on classifying such high-dimensional data with various machine learning methods (for instance, support vector machines, random forests, and linear and nonlinear regression methods). However, the performance of these methods is highly problem dependent, so there is no standard methodology for a given problem.
The main aim of the project is to implement and compare various machine learning methods on high-throughput transcriptomics data (RNA-seq, microarray). Further, we want to identify optimal genetic signatures that distinguish normal from mutated M. marinum (Mycobacterium marinum, which causes opportunistic infections in humans), using the best-performing machine learning method. These signatures will then be used to reconstruct a gene regulatory network of M. marinum.
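To make the idea of comparing classification methods concrete, here is a minimal sketch of a cross-validated comparison. It is an illustration under stated assumptions, not the project's methodology: the expression matrix is synthetic (random values with a few shifted "informative" genes), the two classifiers (nearest centroid and k-nearest neighbours) are simple stand-ins for the methods named above, and all function names and parameters are hypothetical. Python is used only for illustration; the same scheme carries over to R or Matlab.

```python
import random


def make_synthetic_expression(n_per_class=20, n_genes=200,
                              n_informative=10, shift=1.5, seed=0):
    """Hypothetical stand-in for real data: rows are samples, columns are genes.

    Class 1 ("mutated") has the first n_informative genes shifted upward.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in range(n_informative):
                    row[g] += shift
            X.append(row)
            y.append(label)
    return X, y


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_centroid(train_X, train_y, test_X):
    """Assign each test sample to the class with the closest mean profile."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, t in zip(train_X, train_y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return [min(centroids, key=lambda c: sq_dist(x, centroids[c]))
            for x in test_X]


def knn(train_X, train_y, test_X, k=3):
    """Majority vote among the k closest training samples."""
    preds = []
    for x in test_X:
        order = sorted(range(len(train_X)),
                       key=lambda i: sq_dist(x, train_X[i]))
        votes = [train_y[i] for i in order[:k]]
        preds.append(max(set(votes), key=votes.count))
    return preds


def cross_val_accuracy(classifier, X, y, n_folds=5, seed=0):
    """Plain (unstratified) k-fold cross-validation; returns overall accuracy."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        preds = classifier([X[i] for i in train],
                           [y[i] for i in train],
                           [X[i] for i in fold])
        correct += sum(p == y[i] for p, i in zip(preds, fold))
    return correct / len(X)


X, y = make_synthetic_expression()
results = {name: cross_val_accuracy(clf, X, y)
           for name, clf in [("nearest_centroid", nearest_centroid),
                             ("knn", knn)]}
for name, acc in results.items():
    print(f"{name}: cross-validated accuracy = {acc:.2f}")
```

Every method is scored on the same folds, so the accuracies are directly comparable. With real data, any feature selection or tuning would need to happen inside each training fold to avoid optimistic estimates.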
Master's students with a background in Bioinformatics, Informatics, Electrical Engineering, Physics, or Applied Mathematics.
Programming skills for this project
R or MATLAB