Anyone who has tried to match an unfamiliar bird’s features to its field guide portrait knows that reality rarely provides a perfect comparison to the ideal specimen.
Scientists have faced a similar problem when attempting to decode protein patterns found in living cells, a field known as proteomics. Using mass spectrometry, the technology of choice for protein identification, scientists try to match protein fragments, or peptides, against idealized patterns in peptide databases. These databases often provide a poor correspondence: the industry standard for positive peptide identification is usually a dismal 15 to 20 percent.
But using bioinformatics techniques, researchers at Pacific Northwest National Laboratory (PNNL) have developed a pattern-matching algorithm that improves the accuracy of peptide identification by between 50 and 150 percent, compared with standard approaches.
Cannon, W. R., Rawlins, M. M., Baxter, D. J., Lipton, M., Callister, S., and Bryant, D. A., J. Proteome Res., 2011, 10 (5), pp 2306–2317, DOI: 10.1021/pr101130b
We report a hybrid search method combining database and spectral library searches that allows for a straightforward approach to characterizing the error rates from the combined data. Using these methods, we demonstrate significantly increased sensitivity and specificity in matching peptides to tandem mass spectra. The hybrid search method increased the number of spectra that can be assigned to a peptide in a global proteomics study by 57−147% at an estimated false discovery rate of 5%, with clear room for even greater improvements. The approach combines the general utility of using consensus model spectra typical of database search methods with the accuracy of the intensity information contained in spectral libraries. A common scoring metric based on recent developments linking data analysis and statistical thermodynamics is used, which allows the use of a conservative estimate of error rates for the combined data. We applied this approach to proteomics analysis of Synechococcus sp. PCC 7002, a cyanobacterium that is a model organism for studies of photosynthetic carbon fixation and biofuels development. The increased specificity and sensitivity of this approach allowed us to identify many more peptides involved in the processes important for photoautotrophic growth.
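The abstract does not spell out how the two searches are merged or how the 5% false discovery rate is estimated. As a rough illustration only, the sketch below assumes that both searches already report scores on one common scale and that error rates are estimated with a target-decoy strategy, a common choice in proteomics that may differ from what the authors did. The function names, the `(score, is_decoy)` tuple layout, and the spectrum IDs are all hypothetical.

```python
from typing import Dict, List, Tuple

Hit = Tuple[float, bool]  # (score, is_decoy); a higher score means a better match


def merge_best_hits(database_hits: Dict[str, Hit],
                    library_hits: Dict[str, Hit]) -> Dict[str, Hit]:
    """Keep, for each spectrum, the higher-scoring hit from either search.

    Assumes both searches score on the same scale (an assumption of this sketch).
    """
    merged = dict(database_hits)
    for spectrum_id, hit in library_hits.items():
        if spectrum_id not in merged or hit[0] > merged[spectrum_id][0]:
            merged[spectrum_id] = hit
    return merged


def score_threshold_at_fdr(hits: List[Hit], max_fdr: float = 0.05) -> Tuple[float, int]:
    """Return (score threshold, accepted target count) for the largest accepted
    set whose estimated FDR (decoys / targets) stays at or below max_fdr.

    Tied scores are not treated specially in this simplified version.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    targets = decoys = 0
    best = (float("inf"), 0)  # nothing accepted yet
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best = (score, targets)
    return best


# Toy data: spectrum "s2" is rescued by a better library match.
db = {"s1": (5.0, False), "s2": (2.0, True)}
lib = {"s2": (6.0, False), "s3": (4.0, False)}
merged = merge_best_hits(db, lib)
threshold, accepted = score_threshold_at_fdr(list(merged.values()), max_fdr=0.05)
```

On the toy data, the merged set contains three target hits and no decoys, so all three are accepted. In a real analysis the accepted count would be compared between the database-only search and the merged search to quantify the gain.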