Maximizing IDs by Combining Multiple Search Engines
I have posted a few times regarding multiple search engines to increase PSMs (peptide spectrum matches). There are quite a few search engines out there but using all of them seem to be unreasonable. In this post I am going to discuss about how to maximize IDs using multiple search programs. First, the search programs for mass spec can be categorized into 3 kinds.
1) Search against protein sequence database
2) Search against spectra libraries
3) denovo sequencing
Well known sequence search engines such as SEQUEST, X!tandem, MSGF+ and Myrimatch are based on searching against theoretical fragmentation of peptide generated from sequence database, so they are belong to 1).
Sample prep for mass spec is time consuming and costly, while computation is getting faster and faster. If you have access to cloud computer or cluster, you will get your searches done quickly.
HOW MANY SEARCH PROGRAMS SHOULD WE USE?
The short answer is….. depends on your computational power. If you have unlimited computer resources, you can use as many as search engines you want. According to David Shteynberg’s paper (MCP 12.9, 2013, 2383-2393), the more engines you add, the more IDs you get even at strict false discovery rate. They tested combination of multiple engines including SEQUEST, Inspect, X!tandem, MASCOT, Myrimatch and OMSSA. Many people are familiar with these search programs. Their results show that you get maximum IDs when you search with all these programs. Since you may not have such computational resources to perform 6 searches per mass spec sample, you may want to know individual performance. If you use only one program, the best performer to worst performer is
I believe the results vary with different samples and parameters (e.g. modification), therefore one should be cautious about which ones should be chosen. For example, you can specify precursor ion tolerance asymmetrically for X!tandem (e.g -0.5 and +2.0m/z) and it will give better results than symmetric error tolerance. Some programs don’t allow such an option (e.g. myrimatch). Nevertheless the performance above is somewhat similar to what I experienced too. I routinely use MSGF+ and MSGF+ usually perform better than most of programs with similar FDR. That’s why currently my default search is with MSGF+, Myrimatch and X!tandem. Anyway, If you want to use two programs from the list, 1) + 2) works the best as expected. For three programs, 1) + 2) + 3), 1) + 2) +4), 2) + 3) + 4) and 1)+2) +5) perform similarly.
Shteynberg et al., (MCP, 2013)
It is interesting to note that two programs SEQUEST and X!tandem perform well by itself, combining them didn’t do so well. In fact, InSpect is the worst performer by itself, but if you combine InSpect with SEQUEST, they perform pretty decently. The authors mentioned that if two algorithms with similar algorithms such as SEQUEST and X!tandem are used, they don’t necessary performs better than using two programs with more different algorthms.
SPECTRAL LIBRARY SEARCH PROGRAM SHOULD BE INCLUDED IF YOU CAN
Spectral library search is very different from database search program in terms of algorithm and very sensitive because it actually compares to real spectrum obtained by mass spectrometry. Database search programs create peptide sequences based on enzyme specificity (normally Trypsin) and generate artificial spectrum (-y and -b ions). If precursor ion m/z is within the error tolerance, and the artificially generated spectrum match to your ms/ms spectrum, you get IDs. Fragmentation pattern may look quite different in a real life and if it is the case, you don’t get IDs. Unfortunately, fragmentation depends on the type of instrument (ion trap/collision cell) and fragmentation method (CID, PQD, ETD, HCD). If you have phosphopeptide enriched samples, it may not work well unless it contains such spectrum. If you go to National Institute of Standard Technology (NIST) website, there are MS/MS spectral libraries for certain instrument and species.
In the Shteynberg’s paper, they compared SpectraST, a spectral library search program with 6 search engines combined. Surprisingly, SpectraST search (with Human spectral library) gives quite a few more IDs than 6 programs combined (15% more). In the end, if they combine SpectraST and 6 search programs combined, they got even more IDs (25% more than 6 search programs combined) .
Bottom of the line
One can increase the number of IDs with high confidence by combining multiple search engines. The number of programs used will be dependent on the computational resources he/she has. If one uses an instrument and species matches to the one in the spectral library, he/she should consider spectra library search as it will likely increase the number of IDs.