Archive | September 2013

Command Line Interface of Globus Online

This week I am exploring the command line mode of Globus Online. The web-based Globus Online is very easy to use and has enough features for normal use. However, it has some limitations; for example, you cannot change file names when you transfer. Command line mode allows finer tuning of transfers and lets you modify details. You can also give each transfer a name, so you can keep track of each task more easily.

I am going to do this in a Linux terminal. If you are on Windows, one way to work on the command line is to install Cygwin. This software provides a Unix/Linux-like environment, so you can run the same commands as on Unix/Linux. If you are already using Linux, you don't need to do anything: just open a terminal and start there.

After installing Cygwin on a Windows machine, go to the Cygwin directory (usually C:\cygwin) and edit the Cygwin.bat file so that it reads:

@echo off

C:
chdir C:\cygwin\bin
set CYGWIN=binmode ntsec
bash --login -i

After editing, save the file and double-click Cygwin.bat to run it. It will open the command line terminal shown below. Then type

cygrunsrv -h

If Cygwin is installed successfully, you will see the options for the cygrunsrv command.

[Screenshot: Cygwin terminal showing the cygrunsrv -h output]

CONNECTING USING SSH

OK, from here on I will be doing everything in a Linux terminal. A lot of details are provided here (intro), here (getting started) and here (beyond basics), so please refer to these sites if you need more information. I am also assuming you already have a user ID and several endpoints activated on Globus Online. The first time, you need to generate SSH keys.

>ssh-keygen -t rsa -b 2048

It will generate a key pair: by default the private key is saved as id_rsa and the public key as id_rsa.pub.
It will also ask you to enter a passphrase. Please remember what you type.

Generating public/private rsa key pair.
Enter file in which to save the key (/home/user_name/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):

Open the id_rsa.pub file and copy its entire content. Then go to Manage Identities and click “Add SSH Public Key”. Enter an alias (name) and paste the key into the SSH Public Key box. Click “Add SSH Key”.
[Screenshot: the Add SSH Public Key form]

Now go back to the Linux terminal and try connecting to Globus Online. The format for connecting to Globus Online is

ssh globus_username@cli.globusonline.org

You may see this error message:

Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

This error is fixed by making the private key file readable only by you. ssh ignores a private key that other users can access:

 chmod 400 /path/id_rsa
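If you want to check the key permissions from a script, here is a minimal Python sketch. It assumes the default key location ~/.ssh/id_rsa, and the helper names are hypothetical. It reports whether the private key grants any access to group or others. The second helper does the same as chmod 400.

```python
import os
import stat


def key_is_private(path: str) -> bool:
    """Return True if the file grants no permissions to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0


def make_owner_read_only(path: str) -> None:
    """Same effect as `chmod 400 path`: owner may read, nobody else may do anything."""
    os.chmod(path, stat.S_IRUSR)


if __name__ == "__main__":
    # Assumption: the default private key location chosen by ssh-keygen.
    key = os.path.expanduser("~/.ssh/id_rsa")
    if not os.path.exists(key):
        print(f"{key} not found")
    elif key_is_private(key):
        print(f"{key} permissions look fine")
    else:
        print(f"{key} is accessible by other users; try: chmod 400 {key}")
```

The script only prints a suggestion and leaves the file alone. Call make_owner_read_only yourself if you want it to fix the permissions.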

Try the ssh command again. This time, after asking for the passphrase you specified above, it should show a welcome message.

Welcome to globusonline.org, user_name. 
Type 'help' for help
$

Now your machine is connected to Globus Online. Note the ‘$’ at the command line prompt.

TRANSFERRING FILES

Let’s try transferring a file on the command line. The basic format for a transfer is

transfer -- user_name#endpoint1/path/to/source/file user_name#endpoint2/path/to/destination/dir

Once the command is executed, you will see a message like this:

Task ID: 26116978-2703-11e3-99f8-12313d2005b7
Created transfer task with 1 file(s)
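If you drive the CLI from a script, you will probably want to keep the task ID so you can check on the task later. Here is a minimal Python sketch with a hypothetical helper name. It assumes the output looks exactly like the message above. It pulls out the task ID and, if present, the file count.

```python
import re
from typing import Optional, Tuple

# Matches a line such as "Task ID: 26116978-2703-11e3-99f8-12313d2005b7"
TASK_ID_RE = re.compile(r"^Task ID:\s*([0-9a-fA-F-]{36})\s*$", re.MULTILINE)
# Matches a line such as "Created transfer task with 1 file(s)"
FILE_COUNT_RE = re.compile(r"Created transfer task with (\d+) file\(s\)")


def parse_transfer_output(output: str) -> Optional[Tuple[str, Optional[int]]]:
    """Return (task_id, file_count) from the CLI message, or None if no task ID is found."""
    task = TASK_ID_RE.search(output)
    if task is None:
        return None
    count = FILE_COUNT_RE.search(output)
    file_count = int(count.group(1)) if count else None
    return task.group(1), file_count
```

For the message above, this returns ("26116978-2703-11e3-99f8-12313d2005b7", 1).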

There are a number of options you can use. To see all options, go here or type transfer -help.

If you want to change the file name during the transfer, put the new file name at the end of the destination path.
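Renaming is just a matter of what goes at the end of the destination path. Here is a small Python sketch that assembles a command line in the format shown above. The helper build_transfer_command and the example endpoints and paths are hypothetical placeholders, not part of the Globus CLI.

```python
import posixpath
from typing import Optional


def build_transfer_command(src_endpoint: str, src_path: str,
                           dst_endpoint: str, dst_dir: str,
                           new_name: Optional[str] = None) -> str:
    """Build a `transfer -- SRC DST` line; if new_name is given, the file is renamed on arrival."""
    # Keep the source file name unless a new one is requested.
    name = new_name if new_name else posixpath.basename(src_path)
    dst_path = posixpath.join(dst_dir, name)
    # Assumption: paths contain no spaces, so no shell quoting is applied.
    return f"transfer -- {src_endpoint}{src_path} {dst_endpoint}{dst_path}"
```

Consider build_transfer_command("user_name#endpoint1", "/data/run1.raw", "user_name#endpoint2", "/backup/", new_name="run1_copy.raw"). It produces a line that copies run1.raw and stores it as run1_copy.raw.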

RENEWING CREDENTIALS

You cannot transfer files if your credentials for the endpoint have expired. In that case, renew them by typing

endpoint-activate

This will prompt you to enter a username and password for each endpoint you have access to.

If you want to activate a specific endpoint, type

endpoint-activate -m myproxy_server

It will prompt you to enter the username and password for this MyProxy server.

OTHER COMMANDS

There are a number of other commands you can use on the command line. mkdir creates a new directory, rename changes the name of a file or directory, and ls shows the contents of a directory on a remote endpoint. Please refer here for more details.

If you want to quit the Globus Online command line, simply type

quit

Maximizing IDs by Combining Multiple Search Engines

I have posted a few times about using multiple search engines to increase PSMs (peptide-spectrum matches). There are quite a few search engines out there, but using all of them seems unreasonable. In this post I am going to discuss how to maximize IDs using multiple search programs. First, search programs for mass spec can be categorized into three kinds:

1) Search against protein sequence databases
2) Search against spectral libraries
3) De novo sequencing

Well-known sequence search engines such as SEQUEST, X!tandem, MSGF+ and Myrimatch search against theoretical fragmentation spectra of peptides generated from a sequence database, so they belong to 1).
Sample prep for mass spec is time-consuming and costly, while computation keeps getting faster, so spending extra computing time to get more IDs out of each sample is worthwhile. If you have access to a cloud computer or a cluster, you can get your searches done quickly.

HOW MANY SEARCH PROGRAMS SHOULD WE USE?

The short answer is... it depends on your computational power. If you have unlimited computing resources, you can use as many search engines as you want. According to David Shteynberg’s paper (MCP 12.9, 2013, 2383-2393), the more engines you add, the more IDs you get, even at a strict false discovery rate. They tested combinations of multiple engines including SEQUEST, Inspect, X!tandem, MASCOT, Myrimatch and OMSSA, all programs many people are familiar with. Their results show that you get the maximum number of IDs when you search with all of these programs. Since you may not have the computational resources to perform 6 searches per mass spec sample, you may want to know how each program performs on its own. If you use only one program, the ranking from best to worst performer is

1) SEQUEST
2) Myrimatch
3) X!tandem
4) OMSSA
5) Inspect

I believe the results vary with samples and parameters (e.g. modifications), so one should be careful when choosing programs. For example, X!tandem lets you specify the precursor ion tolerance asymmetrically (e.g. -0.5 and +2.0 m/z), which gives better results than a symmetric error tolerance. Some programs (e.g. Myrimatch) don’t allow such an option. Nevertheless, the ranking above is similar to what I have experienced. I routinely use MSGF+, and it usually performs better than most programs at a similar FDR. That’s why my current default search uses MSGF+, Myrimatch and X!tandem. Anyway, if you want to use two programs from the list, 1) + 2) works best, as expected. For three programs, 1) + 2) + 3), 1) + 2) + 4), 2) + 3) + 4) and 1) + 2) + 5) perform similarly.
[Figure: number of IDs obtained with different combinations of search engines]
Shteynberg et al., (MCP, 2013)
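Here is a minimal Python sketch of what an asymmetric precursor window does. The function name is hypothetical. The sketch assumes the error is defined as observed minus theoretical m/z, using the -0.5/+2.0 values from the example above. One common reason for a wider window on the plus side is that the instrument sometimes picks a heavier isotope peak as the precursor.

```python
def within_precursor_tolerance(observed_mz: float, theoretical_mz: float,
                               minus: float = 0.5, plus: float = 2.0) -> bool:
    """Accept a candidate if -minus <= (observed - theoretical) <= plus."""
    # Assumption: error is observed minus theoretical; sign conventions differ between tools.
    error = observed_mz - theoretical_mz
    return -minus <= error <= plus
```

With the default -0.5/+2.0 window, an observed 501.0 against a theoretical 500.0 passes. A symmetric ±0.5 window (minus=0.5, plus=0.5) would reject it.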

Interestingly, although SEQUEST and X!tandem each perform well on their own, combining them didn’t do so well. In contrast, InSpect is the worst performer by itself, but combined with SEQUEST the pair performs pretty decently. The authors noted that two programs with similar algorithms, such as SEQUEST and X!tandem, don’t necessarily perform better together than two programs with more different algorithms.
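The overlap argument can be illustrated with plain set arithmetic. The Python sketch below is only an illustration. The paper combines engines with a statistical model, not a simple union. The engine names and peptide sets in the example are made-up toy data. An engine whose IDs mostly overlap with another adds little, while a complementary engine adds a lot.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def combined_ids(results: Dict[str, Set[str]], engines: Iterable[str]) -> Set[str]:
    """Union of the peptide IDs reported by the chosen engines."""
    merged: Set[str] = set()
    for engine in engines:
        merged |= results[engine]
    return merged


def best_combination(results: Dict[str, Set[str]], k: int) -> Tuple[Tuple[str, ...], int]:
    """Try every k-engine combination and return the one with the most combined IDs."""
    best = max(combinations(sorted(results), k),
               key=lambda combo: len(combined_ids(results, combo)))
    return best, len(combined_ids(results, best))


def unique_contribution(results: Dict[str, Set[str]], engine: str) -> Set[str]:
    """IDs that only this engine finds; a small set means heavy overlap with the others."""
    others = combined_ids(results, (e for e in results if e != engine))
    return results[engine] - others
```

Take toy results {"A": {"p1", "p2", "p3"}, "B": {"p1", "p2"}, "C": {"p4", "p5"}}. The best pair is ("A", "C") with 5 IDs, and engine B contributes nothing unique.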

SPECTRAL LIBRARY SEARCH PROGRAM SHOULD BE INCLUDED IF YOU CAN

Spectral library search is very different from database search in terms of algorithm. It is also very sensitive, because it compares your spectrum to real spectra obtained by mass spectrometry. Database search programs generate peptide sequences based on enzyme specificity (normally trypsin) and generate theoretical spectra (b and y ions). If the precursor ion m/z is within the error tolerance and the theoretical spectrum matches your MS/MS spectrum, you get an ID. Real fragmentation patterns may look quite different, and in that case you don’t get an ID. Unfortunately, fragmentation depends on the type of instrument (ion trap/collision cell) and the fragmentation method (CID, PQD, ETD, HCD), so a spectral library has to match your setup. If you have phosphopeptide-enriched samples, library search may not work well unless the library contains such spectra. The National Institute of Standards and Technology (NIST) website has MS/MS spectral libraries for certain instruments and species.
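The database-search side described above rests on theoretical b and y ions. Here is a minimal Python sketch that computes singly charged b and y ion m/z values from standard monoisotopic residue masses. It assumes unmodified residues (including cysteine) and charge 1+. Real search engines also handle modifications, higher charge states and neutral losses.

```python
from typing import List, Tuple

# Monoisotopic residue masses (Da); cysteine is unmodified here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def b_y_ions(peptide: str) -> Tuple[List[float], List[float]]:
    """Return singly charged b1..b(n-1) and y1..y(n-1) m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)           # N-terminal fragment
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions
```

For a tryptic peptide ending in K, y1 comes out at about 147.113. A search engine scores how many of these predicted peaks appear in your MS/MS spectrum.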

[Screenshot: list of NIST MS/MS spectral libraries]

The list is pretty short at the moment, but I believe it will grow. Other websites, such as Peptide Atlas and X!Hunter, also provide spectral libraries.

In Shteynberg’s paper, they compared SpectraST, a spectral library search program, with the 6 search engines combined. Surprisingly, the SpectraST search (with a human spectral library) gave quite a few more IDs than the 6 programs combined (15% more). When they combined SpectraST with the 6 search programs, they got even more IDs (25% more than the 6 search programs combined).

Bottom line

One can increase the number of high-confidence IDs by combining multiple search engines. The number of programs used will depend on the computational resources available. If your instrument and species match those of an available spectral library, you should consider spectral library search, as it will likely increase the number of IDs.

Transferring Files to Your Own Server with Globus Online

I have benefited a lot from Globus Online, since I have many mass spec files to search every day on the computer cluster at my institution. In this post, I want to explore how to set up your own server so you can send files back and forth from your desktop PC. This is useful in general for sending relatively large files from one place to another.

First, I am assuming you have a server you have full access to. In my case, I have a server at home running Ubuntu 10.04.02. You need to find the right distribution of the globus-connect-multiuser package. You can see the list here. The installation instructions for Globus Connect Multiuser are here; please use them as guidance.

In my case, I couldn’t find one for Ubuntu, so I asked the Globus team. They told me to use “globus-repository-5.2-stable-lucid_0.0.3_all.deb”. Here are the steps to configure the server for Globus Online multiuser.

1) Download the package
>sudo curl -LOs http://www.globus.org/ftppub/gt5/5.2/stable/installers/repo/globus-repository-5.2-stable-lucid_0.0.3_all.deb

2) Install the repository package (Debian-based distributions)
>sudo dpkg -i globus-repository-5.2-stable-lucid_0.0.3_all.deb

3) Update the package lists
>sudo aptitude update

4) Install globus-connect-multiuser
>sudo aptitude -y install globus-connect-multiuser

5) Update the configuration file. This file is in the /etc/ directory, and you need root permission to modify it.
>sudo vim /etc/globus-connect-multiuser.conf

[Screenshot: /etc/globus-connect-multiuser.conf opened in vim]

There are quite a few things you need to change to get it working. What I show here is a minimal setup. For more detailed settings, please consult Globus Online customer service.
First, change the following lines. Note: for each line you configure, remove the leading semicolon (;) that comments it out, as well as the % and s characters around the placeholder.

L11  User = user_name_you_use_to_log_in_globus_online
L16  Password = your_password_for_globus_online
L22  Endpoint = same_as_User
L29  Name = server (whatever you want to call your server)
L103 Server = XXX.XXX.XXX.XXX  (the server’s IP address)
L112 ServerBehindNAT = True
L193 server = XXX.XXX.XXX.XXX  (the server’s IP address)
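A forgotten placeholder is easy to miss in a long file. Here is a minimal Python sketch that flags unfilled values. It assumes the file is INI-style, that comment lines start with ";" and that unfilled values still look like %(NAME)s, which is how I read the note above. The helper name is hypothetical. Lines you have not uncommented are skipped by the parser, so they will not show up here.

```python
import configparser
import re
from typing import List, Tuple

# Assumption: unfilled values look like "%(SOME_NAME)s".
PLACEHOLDER = re.compile(r"%\(\w+\)s")


def unfilled_settings(conf_text: str) -> List[Tuple[str, str, str]]:
    """Return (section, key, value) for every active setting that still holds a placeholder."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the key spelling as written (e.g. "Password")
    parser.read_string(conf_text)
    problems = []
    for section in parser.sections():
        for key, value in parser.items(section):
            if PLACEHOLDER.search(value):
                problems.append((section, key, value))
    return problems
```

To run it on the real file, pass the text of /etc/globus-connect-multiuser.conf. An empty list means every active setting has a concrete value.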

6) Run the setup program. It will take a few moments to take effect.
>sudo globus-connect-multiuser-setup

7) Check that the essential ports are open (LISTEN) by typing sudo lsof -i

[Screenshot: output of sudo lsof -i]
Pay attention to the far right column, which shows the status of the ports currently used on your server. You can see that port 7512 is open (LISTEN) for MyProxy and that gsiftp is also open (LISTEN). If you want to know the port number for gsiftp, you can look it up in /etc/services:
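If you would rather check this from a script, here is a minimal Python sketch with a hypothetical helper name. It pulls the listening sockets out of lsof -i output. It assumes the default column layout, where the last column of a listening TCP socket reads like "*:7512 (LISTEN)". lsof prints service names such as gsiftp instead of numbers unless you add -P.

```python
from typing import List, Tuple


def listening_ports(lsof_output: str) -> List[Tuple[str, str]]:
    """Return (command, port-or-service) pairs for sockets in the LISTEN state."""
    found = []
    for line in lsof_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # A listening TCP socket ends with e.g. "*:7512 (LISTEN)" in the NAME column.
        if len(fields) >= 2 and fields[-1] == "(LISTEN)":
            address = fields[-2]
            found.append((fields[0], address.rsplit(":", 1)[-1]))
    return found
```

You can feed it the text captured from subprocess.run(["sudo", "lsof", "-i"], capture_output=True, text=True).stdout. Then check that 7512 and gsiftp (or 2811) appear in the result.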

>vim /etc/services
[Screenshot: the gsiftp entry in /etc/services]
This shows only part of the file, but you can see that port 2811 is used for gsiftp. Now the ports are open for Globus Connect Multiuser. However, you also need to make sure they are accessible from a remote computer. This site makes it easy to test whether certain ports on your server are actually open: simply enter the IP address and port number (7512 and 2811). If it says the ports are closed, check whether port forwarding is set correctly on your router.
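The same check can be scripted. Here is a minimal Python sketch with a hypothetical helper name. It simply tries to open a TCP connection. Run it from a machine outside your home network: run on the server itself, it only shows that something is listening, not that the router forwards the port.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name lookup failure
        return False
```

For example, from a remote machine you could call port_is_open("XXX.XXX.XXX.XXX", 2811) and port_is_open("XXX.XXX.XXX.XXX", 7512), substituting your server's public IP address.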

8) Go to the Globus Online website. Log in, go to Manage Data, and click Manage Endpoints. Here you are going to add your server.

[Screenshot: the Add Endpoint form]

Enter Endpoint Name: username#server
Choose MyProxy for Identity Provider. Hostname should be the same IP address used above.
Leave Server DN empty.
Server Domain should be the same IP address used above.
Keep the default server port: 2811
Hit the Create Endpoint button. Then click the Activate tab and hit the Activate Now button. It will ask you to enter User Name, Passphrase, Server DN and Credential Lifetime. Enter the username and passphrase you use to log in to your Linux server. You can leave Server DN empty and enter a number (e.g. 24) for the credential lifetime. You will then see an error message. Copy the text after MYPROXY_SERVER_DN=, go to Server, and paste it into Server DN (without the double quotation marks). Hit Save.
[Screenshot: activation error message containing MYPROXY_SERVER_DN]
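If you need to grab the DN from a copied error message, here is a minimal Python sketch with a hypothetical helper name. The exact wording of the error may differ. The sketch only assumes that the value follows MYPROXY_SERVER_DN= and is either wrapped in double quotes or runs to the end of the line. It returns the value without the quotes, ready to paste into Server DN.

```python
import re
from typing import Optional

# Assumption: the value is either double-quoted or runs to the end of the line.
DN_RE = re.compile(r'MYPROXY_SERVER_DN=\s*"?([^"\n]+)"?')


def extract_server_dn(error_text: str) -> Optional[str]:
    """Return the text after MYPROXY_SERVER_DN= without quotes, or None if absent."""
    match = DN_RE.search(error_text)
    return match.group(1).strip() if match else None
```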
Enter the Linux user ID and passphrase and activate again. This time it should be activated.
[Screenshot: endpoint successfully activated]
Now your server is activated and ready to transfer files. Go to Manage Data and click Start Transfer. Enter the endpoint name for your server and click Go. You need to enter your Linux user ID, passphrase and credential lifetime again.

[Screenshot: Start Transfer page with the server endpoint]

Once everything is configured successfully, you should see the directory listing in the window, and you can start transferring files. Essentially this is like setting up an FTP server, but you can transfer files much faster.

Initially I had a problem transferring files. I could see the directory structures on both sides, but when I initiated a transfer, the transferred files had no content. I could create and delete files and directories, but file transfers failed. If you encounter a similar problem, consider these possibilities:

1) The Linux firewall and/or your router firewall is blocking ports
2) Port forwarding is not set up correctly on your router

If a firewall is blocking certain ports, it may cause trouble sending files. Remember that globus-connect-multiuser uses ports 50000-51000 by default. In my case, 2) was the problem. My router has port forwarding settings, but it separates single-port forwarding from port-range forwarding. Once I fixed that, it worked flawlessly.
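To catch the kind of mistake I made, you can compare your router's forwarding rules against the ports mentioned in this post: 2811, 7512 and 50000-51000. Here is a minimal Python sketch with a hypothetical helper name. The rules are (first port, last port) pairs that you copy by hand from your router's settings page. Single-port and range rules are treated the same.

```python
from typing import Iterable, List, Tuple

# Ports mentioned in this post: gsiftp, MyProxy, and the default data port range.
REQUIRED: List[Tuple[int, int]] = [(2811, 2811), (7512, 7512), (50000, 51000)]


def uncovered_ports(forwarded: Iterable[Tuple[int, int]],
                    required: List[Tuple[int, int]] = REQUIRED) -> List[int]:
    """Return every required port that no (first, last) forwarding rule covers."""
    rules = list(forwarded)
    missing = []
    for start, end in required:
        for port in range(start, end + 1):
            if not any(lo <= port <= hi for lo, hi in rules):
                missing.append(port)
    return missing
```

For example, forwarding only port 50000 instead of the whole range leaves 1000 data ports (50001-51000) uncovered. In that situation the directory listing works but file contents do not arrive.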

If server activation, the connection to the server and the port settings are all done correctly, Globus Connect lets you transfer files between your PC and the server. If you look at port usage, you will see that a new connection has been established.

[Screenshot: lsof output after initiating a transfer]

Batch XML Conversion of X!Tandem Output Files (Windows)

This post is going to be a short one. When you run an X!tandem search on Linux, you get output files with a .t.xml extension. These files can be opened on the GPM website to view the models, and they can also be opened in PeptideShaker. However, if you want to use IDPicker to look at the results, you need to convert them to an XML format that IDPicker accepts. The conversion utility is called Tandem2XML.exe, and it is one of the tools in TPP. If you have already installed TPP on your PC, you can find the program in the following directory.

C:\Inetpub\tpp-bin\

You can also download it from here (it may not be fast). To use it on the command line, type

>Tandem2XML.exe [FILE_PATH\INPUT_FILE_NAME.t.xml] [FILE_PATH\OUTPUT_FILE_NAME.xml]

Once the file is converted to XML (pepXML), you can import it into IDPicker. This utility is simple to use, but typing the file path and name for multiple files is pretty cumbersome. So I wrote a script that finds all .t.xml files in a specified directory and automatically converts them to .xml.

@ECHO OFF
set "msfd=C:\FILE_PATH_FOR_THE_TANDEM_RESULTS_DIRECTORY"
:: read file names from the directory and create a new file with all file names
echo %msfd%
dir "%msfd%\*.t.xml" /a:-d /b > file_name.txt
:: convert each listed file; the output is written next to it with .xml appended
for /f "tokens=*" %%l in (file_name.txt) do Tandem2XML.exe "%msfd%\%%l" "%msfd%\%%l.xml"

Save this script as AutoTandem2XML.bat and place Tandem2XML.exe in the same directory as the script.

To use it, just change the second line to the directory containing your .t.xml files. When you run it, either by double-clicking or from the command line, it first creates file_name.txt, which lists the names of the files ending with .t.xml.

[Screenshot: file_name.txt listing the .t.xml files]

It then converts each .t.xml file to an IDPicker-compatible XML file in the same directory. You may see some warnings, for example that the output files will not contain retention times, but the output files still work in IDPicker.

[Screenshot: Tandem2XML warning messages]
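For anyone who prefers Python to a .bat file, here is a rough equivalent of the batch script. It is a sketch, not a replacement tested against TPP. It assumes Tandem2XML takes an input and an output path as shown earlier, and the helper names are hypothetical. It uses the same output naming as the script (x.t.xml becomes x.t.xml.xml). It records each exit code instead of stopping, since the retention-time warnings are harmless.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def conversion_jobs(results_dir) -> List[Tuple[Path, Path]]:
    """Pair every *.t.xml file with an output name; like the batch script, x.t.xml -> x.t.xml.xml."""
    jobs = []
    for src in sorted(Path(results_dir).glob("*.t.xml")):
        if src.is_file():
            jobs.append((src, src.with_name(src.name + ".xml")))
    return jobs


def convert_all(results_dir, tandem2xml: str = "Tandem2XML.exe") -> List[Tuple[Path, int]]:
    """Run Tandem2XML on each file and return (input file, exit code) pairs."""
    outcomes = []
    for src, dst in conversion_jobs(results_dir):
        proc = subprocess.run([tandem2xml, str(src), str(dst)])
        outcomes.append((src, proc.returncode))
    return outcomes
```

Call convert_all(r"C:\FILE_PATH_FOR_THE_TANDEM_RESULTS_DIRECTORY"), pointing it at your results directory, with Tandem2XML.exe on the PATH or passed in as the full path.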
