A MapReduce Implementation of a Hybrid Spectral Library-Database Search Method for Peptide Identification
Kalyanaraman, A., Latt, B., Baxter, D. J. and Cannon, W. R. Bioinformatics (2011) 27 (21): 3072-3073. doi: 10.1093/bioinformatics/btr523
A MapReduce-based implementation called MR-MSPolygraph for parallelizing peptide identification from mass spectrometry data is presented. The underlying serial method, MSPolygraph, uses a novel hybrid approach to match an experimental spectrum against a combination of a protein sequence database and a spectral library. Our MapReduce implementation can run on any Hadoop cluster environment. Experimental results demonstrate that, relative to the serial version, MR-MSPolygraph reduces the time to solution from weeks to hours when processing tens of thousands of experimental spectra. Speedup and other related performance studies are also reported on a 400-core Hadoop cluster using spectral data sets from environmental microbial communities as inputs.
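The general shape of such a parallelization can be illustrated with a small, single-process sketch. It is not the MR-MSPolygraph code: the abstract does not describe how the job is decomposed, so the per-spectrum map step, the reduce step that keeps the best hit, and the names `toy_score`, `map_spectrum`, `reduce_hits` and `run_job` are all assumptions made for illustration. The peak-counting score is a placeholder and not MSPolygraph's scoring model. Candidates from the spectral library and the sequence database are merged into one dictionary to mimic the hybrid search.

```python
from collections import defaultdict

def toy_score(peaks, reference, tol=0.5):
    """Placeholder score: count query peaks within tol of any reference peak."""
    return sum(1 for p in peaks if any(abs(p - r) <= tol for r in reference))

def map_spectrum(spectrum, candidates):
    """Map step (assumed): score one spectrum against every candidate peptide.

    candidates maps peptide -> (source, reference_peaks), where source is
    "library" or "database", so both kinds of candidate are scored together.
    """
    spec_id, peaks = spectrum
    for peptide, (source, ref_peaks) in candidates.items():
        yield spec_id, (peptide, source, toy_score(peaks, ref_peaks))

def reduce_hits(spec_id, hits, top_k=1):
    """Reduce step (assumed): keep the top_k highest-scoring hits per spectrum."""
    ranked = sorted(hits, key=lambda h: h[2], reverse=True)
    return spec_id, ranked[:top_k]

def run_job(spectra, candidates, top_k=1):
    """Run map, group by key, then reduce, all in one process.

    On a Hadoop cluster the framework would do the grouping and would run
    the map calls on different nodes, since spectra are independent.
    """
    grouped = defaultdict(list)
    for spectrum in spectra:
        for key, value in map_spectrum(spectrum, candidates):
            grouped[key].append(value)
    return dict(reduce_hits(k, v, top_k) for k, v in grouped.items())

# Made-up example data: one library entry and one database entry.
candidates = {
    "PEPTIDEA": ("library", [100.0, 200.0, 300.0]),
    "PEPTIDEB": ("database", [150.0, 250.0]),
}
spectra = [
    ("s1", [100.1, 200.2, 299.8]),
    ("s2", [150.0, 250.3, 400.0]),
]
result = run_job(spectra, candidates)
```

Because each experimental spectrum is scored independently of the others in this sketch, the map calls share no state, which is the property that lets this kind of workload spread across many cores.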