Determining Unknown Peptide Identification & Modification- Who is the best?

The Proteome Informatics Research Group (iPRG) held a competition on peptide identification and modification analysis among experts in proteomics and bioinformatics. The results of the 2012 competition were recently published in MCP.

First, 24 participants were given ~18000 spectra and protein sequence databases with 42K entries. Decoy sequences, generated by scrambling the amino acid sequences of the true database entries, were also provided. The spectra were generated from tryptic digests of yeast lysate plus 70 spike controls with various modifications.
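To make the decoy idea concrete, here is a minimal sketch of scrambling target sequences into decoys. The function names, the `DECOY_` prefix, the fixed seed, and the two toy sequences are my own assumptions for illustration; the post does not say how the iPRG organizers actually built their decoys.

```python
import random

def scramble_decoy(sequence, rng):
    """Return a shuffled copy of a sequence (same composition, new order)."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)

def make_decoys(targets, seed=0, prefix="DECOY_"):
    """Build one scrambled decoy per target entry.

    targets: dict mapping accession -> amino acid sequence.
    The prefix is a hypothetical naming convention for decoy entries.
    """
    rng = random.Random(seed)  # fixed seed so the decoys are reproducible
    return {prefix + acc: scramble_decoy(seq, rng) for acc, seq in targets.items()}

# Toy sequences, made up for this example.
targets = {"P1": "MKTAYIAKQRQISFVK", "P2": "MSDNGELEDKPPAPPVR"}
decoys = make_decoys(targets)
for acc, seq in decoys.items():
    print(acc, seq)
```

Because scrambling keeps the amino acid composition, each decoy has the same length and mass as its target, which is what makes decoy hits a reasonable stand-in for random false matches.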

Each participant was encouraged to use whatever methods they liked to identify as many CID spectra as possible at a <1% false discovery rate (FDR). For modified peptides, participants were required to report the types of modifications and their localizations. However, the possible modifications were not named or offered as choices; participants were told only that a wide variety of modifications, both biological and chemical in nature, were present in the samples.
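For readers new to the <1% FDR requirement, here is a minimal sketch of one common target-decoy estimate: rank PSMs by score and keep the largest top-ranked set in which decoys / targets stays below the threshold. The function name, the (score, is_decoy) tuple layout, and the toy scores are assumptions for illustration; participants may have estimated FDR quite differently.

```python
def accept_at_fdr(psms, max_fdr=0.01):
    """Return target PSMs accepted below max_fdr.

    psms: list of (score, is_decoy) tuples, higher score = better match.
    FDR at each rank is estimated as (#decoys so far) / (#targets so far).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best_cut = 0
    for i, (_score, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest rank at which the estimate is still acceptable.
        if targets and decoys / targets < max_fdr:
            best_cut = i
    return [p for p in ranked[:best_cut] if not p[1]]

# Toy data: mostly targets at high scores, decoys creeping in at low scores.
psms = (
    [(1000 - i, False) for i in range(200)]
    + [(800, True)]
    + [(799 - i, False) for i in range(100)]
    + [(699 - i, True) for i in range(10)]
)
accepted = accept_at_fdr(psms)
print(len(accepted), "target PSMs accepted")
```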

The summary of the results is shown below. First, the number of peptide spectrum matches (PSMs) varied widely among research groups: the highest count was >7000 and the lowest was <2500. A total of 13 different search engines were used, and some groups used multiple search engines.
[Figure: summary of results for each participant]

Many participants used Mascot as their search engine, but their performance varied widely. This is likely due to how each group handled variable modifications. There are several approaches to identifying post-translational modifications (PTMs). If you search with all possible modifications, the search space becomes too large and you will likely end up with fewer identifications.
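A quick back-of-the-envelope sketch shows how fast the search space grows with variable modifications. It counts the modified forms of a single peptide when every listed modification is optional at each matching residue. The peptide and the function name are made up for illustration, and real search engines usually cap the number of modifications per peptide, so treat this as an upper bound rather than what any particular engine does.

```python
from math import prod

def count_modified_forms(peptide, variable_mods):
    """Count candidate forms of a peptide under optional modifications.

    variable_mods: dict mapping modification name -> set of target residues.
    Each residue can stay unmodified or carry any one matching modification.
    """
    return prod(
        1 + sum(1 for sites in variable_mods.values() if aa in sites)
        for aa in peptide
    )

peptide = "MSTKYSMK"  # toy peptide, not from the iPRG data

few = {"Oxidation": {"M"}}
more = {**few, "Phospho": {"S", "T", "Y"}}
many = {**more, "Acetyl": {"K"}, "Methyl": {"K"}}

for label, mods in [("few", few), ("more", more), ("many", many)]:
    print(label, count_modified_forms(peptide, mods))
```

Every extra candidate is another chance for a random high-scoring match, so score thresholds have to rise to hold the same FDR, which is why a bigger search space can mean fewer identifications.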

A multiple-path search is another way to determine PTMs: first search with a few common modifications, then search the unmatched spectra with less common modifications. However, the authors noted that the multiple-path strategy made it hard for participants to determine a proper FDR and required more manual examination. Some search engines also handle certain modifications more efficiently than others.
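The two-step idea can be sketched roughly as follows. The `search` function here is a toy precursor-mass matcher that I made up to keep the example self-contained; it is not how Mascot or any other engine works, and the peptide masses are invented round numbers. Only the modification mass shifts are real monoisotopic values.

```python
def search(spectra, peptides, mod_shifts, tol=0.01):
    """Toy matcher: return {spectrum_id: (peptide, modification)}.

    spectra: dict spectrum_id -> precursor mass.
    peptides: dict peptide -> unmodified mass (toy values here).
    mod_shifts: dict modification name -> mass shift in Da.
    """
    hits = {}
    for sid, mass in spectra.items():
        for pep, pep_mass in peptides.items():
            for mod, shift in mod_shifts.items():
                if abs(mass - (pep_mass + shift)) <= tol:
                    hits[sid] = (pep, mod)
    return hits

def multi_pass_search(spectra, peptides, passes):
    """Run each pass only on spectra left unmatched by earlier passes."""
    results = {}
    remaining = dict(spectra)
    for pass_no, mod_shifts in enumerate(passes, start=1):
        for sid, hit in search(remaining, peptides, mod_shifts).items():
            results[sid] = hit + (pass_no,)
            del remaining[sid]
    return results, remaining

peptides = {"PEPTIDEK": 1000.0, "SAMPLER": 800.0}  # invented masses
spectra = {"s1": 1000.0, "s2": 815.9949, "s3": 1079.9663, "s4": 1234.5}
passes = [
    {"none": 0.0, "Oxidation": 15.9949},       # pass 1: common
    {"Phospho": 79.9663, "Acetyl": 42.0106},   # pass 2: less common
]

results, unmatched = multi_pass_search(spectra, peptides, passes)
print(results)
print("unmatched:", sorted(unmatched))
```

Note that each pass searches a different set of spectra against a different search space, so a single FDR estimate over the pooled results is not straightforward. That is the difficulty the authors point out.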

Unfortunately, there is no known correct answer for the identifications in the yeast lysate tryptic digests. The authors therefore generated “consensus” PSMs, defined as PSMs identified by at least three groups. Blue bars indicate the number of consensus PSMs identified by each participant. Without truly “correct” answers, it is difficult to assess who is the best performer. The consensus PSMs likely contain more PSMs from search engines with similar algorithms. If you look at the gray and yellow bars, those participants used less common search engines. However, it is interesting to see that the highest number of PSMs can be obtained from a single search engine. This suggests that many factors affect the final results, so optimizing the search is more critical than using different programs.
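The consensus rule itself is simple to sketch: a PSM counts as consensus if at least three participants reported the same peptide for the same spectrum. The data layout, participant labels, and peptides below are hypothetical, and I am assuming agreement means an identical peptide string; the paper may define agreement differently, for example with respect to modification localization.

```python
from collections import defaultdict

def consensus_psms(submissions, min_groups=3):
    """Return the set of (spectrum_id, peptide) pairs reported by >= min_groups.

    submissions: dict participant -> dict spectrum_id -> peptide string.
    """
    support = defaultdict(set)
    for participant, psms in submissions.items():
        for spectrum_id, peptide in psms.items():
            support[(spectrum_id, peptide)].add(participant)
    return {psm for psm, groups in support.items() if len(groups) >= min_groups}

# Hypothetical submissions from four participants.
submissions = {
    "A": {"s1": "PEPTIDEK", "s2": "SAMPLER", "s3": "ELVISK"},
    "B": {"s1": "PEPTIDEK", "s2": "SAMPLER"},
    "C": {"s1": "PEPTIDEK", "s2": "SAMPLEK", "s3": "ELVISK"},
    "D": {"s1": "PEPTIDEK", "s2": "SAMPLER", "s3": "ELVISR"},
}

consensus = consensus_psms(submissions)
per_participant = {
    name: sum((sid, pep) in consensus for sid, pep in psms.items())
    for name, psms in submissions.items()
}
print(sorted(consensus))
print(per_participant)
```

The sketch also shows the bias mentioned above: a correct PSM found by only one or two unusual engines (like `s3` here) never enters the consensus, no matter how good the match is.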

[Figure: results for the spike controls by participant]

I wish they had also added spike controls without modifications to see how each participant optimized searches. Nevertheless, it is quite interesting to see how each participant performed in identifying these control peptides. Participants 11211 and 58409 were top performers in total PSMs, but they didn’t do well in identifying spike controls. It seems that localization of modifications is still a difficult task.

In any case, even the best performer couldn’t identify ~1500 consensus PSMs (roughly 8% of the ~18000 original spectra), and the authors note that there is quite a bit of room to improve each group’s approach.

The original data including spectrum and database files can be downloaded here (use login and password provided in the text). Why don’t you try your own search and see if you can beat these expert participants?

About bioinfomagician

Bioinformatic Scientist @ UCLA

2 responses to “Determining Unknown Peptide Identification & Modification- Who is the best?”

  1. Brett Chapman says :

    Interesting post. I would have liked to have seen how they compared when searching for unmodified peptides. I’ve been paying close attention to MS-GFDB/MS-GF+. I was surprised to see it didn’t fare as well as I expected, based on what I’ve read, which includes Sangtae’s thesis. I noticed Byonic did well, with the most consensus matches, although I’m not surprised. A read of their web site indicates they do rigorous screening of PTMs when searching with wildcards, unlike the strategy Inspect or MS-GFDB takes of just accounting for larger mass shifts. They seem to be more suited to PTM identification. I think a lot of developers should take notes and consider the strategies other developers take (without infringing on IP, of course, where applicable).

  2. Johnc673 says :

    This is one awesome blog post. Keep writing.
