Batch X!tandem Search on Linux

Every day I generate 6-10 MS/MS spectrum files, and I got tired of repeatedly clicking and typing file names, output names, and species for X! Tandem. I want to automate the search so that I can save time and do something else. I know there is a program for batch X! Tandem searches, but your Linux system may not have a graphical interface (at least mine doesn't), so it is useful and more flexible to do this from the command line. To run X! Tandem, you need 4 files in the same directory:
1) input.xml
2) default.xml
3) taxonomy.xml
4) tandem.exe
The MS/MS spectrum file name and the output file name are the values you change most often, and both are stored in input.xml. There are several ways to automate the search:

1) Create as many input.xml files as there are MS/MS spectrum files, then write a script that runs tandem.exe on each one in turn.
2) Create a file that lists the MS/MS spectrum file names, then write a script that reads it line by line, modifies input.xml, and runs tandem.exe until all files are searched.
3) Place all MS/MS spectrum files in one directory and write a script that runs tandem.exe on every file in that directory.

Method 1) is easy to implement if you have only a few files. With more files (>10) it becomes cumbersome, and 2) works better. With many files (>50), typing (or copying and pasting) the file names takes time, so 3) works best.
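As a rough sketch of method 3), the list of spectrum files can be generated from a directory instead of being typed by hand. The helper name list_spectra, the temporary demo directory, and the .mgf extension are assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch of method 3): print every .mgf file in a directory, one path per line.
# list_spectra is a hypothetical helper; the .mgf extension is an assumption.
list_spectra() {
  dir=$1
  for f in "$dir"/*.mgf; do
    # Skip the unexpanded pattern when the directory holds no .mgf files
    [ -e "$f" ] || continue
    printf '%s\n' "$f"
  done
}

# Demo on a temporary directory so the sketch runs anywhere
demo_dir=$(mktemp -d)
touch "$demo_dir/exp1.mgf" "$demo_dir/exp2.mgf" "$demo_dir/notes.txt"
list_spectra "$demo_dir" > "$demo_dir/file_name.txt"
cat "$demo_dir/file_name.txt"
```

In real use, something like `list_spectra ../msdata > file_name.txt` would produce a list that a line-by-line script can read.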

Here I will show how to implement method 2). First, create a file, say "file_name.txt", that lists every MS/MS spectrum file name together with its directory. For example,

../msdata/073013_exp1.mgf
../msdata/073013_exp2.mgf
../msdata/073013_exp3.mgf
../msdata/073013_exp4.mgf
………
………

Place this file in the same directory as the other necessary files listed above. Then write a shell script to automate the search.

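while read -r line
do
  echo -e "Writing $line in input.xml\n"
  # Replace the spectrum path; "=" is the sed delimiter because $line contains "/"
  sed 's=spectrum, path">[^<]*<=spectrum, path">'"$line"'<=' <input.xml >input1.xml
  # Replace the output path, using the spectrum file name without its extension
  sed 's=output, path">[^<]*<=output, path">'"${line%.*}"'_output.xml<=' <input1.xml >input2.xml
  # Run the search with the modified copy; input.xml itself is left untouched
  ./tandem.exe input2.xml
done < file_name.txt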

I used a while loop to read each line of file_name.txt until the end of the file. Each line is stored in a variable ($line), which is then inserted at specific places in input.xml. The expansion ${line%.*} is the same path with the file extension removed, so each output name is based on its spectrum file name. If you look at the input.xml file that comes with X! Tandem, the input and output file names are defined on two lines (the 2nd and 3rd lines from the end).
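To see what the two substitutions do, here is a small sketch that runs them on a cut-down input.xml. The note elements below are an assumed layout containing only the two relevant lines, not a complete X! Tandem input file, and the temporary working directory is only there so that nothing real is overwritten.

```shell
#!/bin/sh
# Work in a temporary directory so no real file is overwritten
work=$(mktemp -d)

# Cut-down input.xml holding only the two lines being edited (assumed layout)
cat > "$work/input.xml" <<'EOF'
<?xml version="1.0"?>
<bioml>
  <note type="input" label="spectrum, path">test_spectra.mgf</note>
  <note type="input" label="output, path">output.xml</note>
</bioml>
EOF

line=../msdata/073013_exp1.mgf

# ${line%.*} removes the shortest trailing ".something", here the ".mgf" extension
base=${line%.*}

# "=" is the sed delimiter; [^<]* matches the old value up to the closing tag
sed 's=spectrum, path">[^<]*<=spectrum, path">'"$line"'<=' <"$work/input.xml" >"$work/input1.xml"
sed 's=output, path">[^<]*<=output, path">'"$base"'_output.xml<=' <"$work/input1.xml" >"$work/input2.xml"

cat "$work/input2.xml"
```

After both passes, input2.xml carries the new spectrum path and an output path derived from it, while input.xml keeps its original values for the next iteration.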

The sed command is very useful for finding and replacing a character string. The basic format is

sed 's/abc/def/' <file

Here the first occurrence of the string abc on each line of file is replaced with def. The slash (/) is used as the delimiter. However, you have to choose the delimiter carefully: because $line contains slashes, a slash cannot be used as the delimiter. I used the equals sign (=), which does not appear anywhere in this expression. For more detailed usage, see the sed manual (man sed).
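The delimiter issue can be checked with a short experiment. This is only an illustrative sketch: the placeholder text OLD and the variable names are made up for the demonstration.

```shell
#!/bin/sh
path=../msdata/073013_exp1.mgf

# With "=" as the delimiter, the slashes in $path are ordinary characters
with_equals=$(printf 'file: OLD\n' | sed 's=OLD='"$path"'=')
echo "$with_equals"

# With "/" as the delimiter, the first slash in $path ends the replacement early,
# so sed rejects the rest of the expression and exits with an error
if printf 'file: OLD\n' | sed 's/OLD/'"$path"'/' 2>/dev/null; then
  slash_ok=yes
else
  slash_ok=no
fi
echo "slash delimiter worked: $slash_ok"
```

Any character that appears in neither the pattern nor the replacement can serve as the delimiter. Note that a path containing = would break the = delimiter in the same way.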

Finally, the results are written as input_file_name_output_XXXX_XX_XX_XX_XX_XX.t.xml in the same directory as the input MS/MS files.
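Before starting a long batch, it may be worth confirming that everything the search needs is actually in the working directory. The following is a minimal sketch; missing_files is a hypothetical helper, and the demo deliberately leaves two files out of a temporary directory.

```shell
#!/bin/sh
# missing_files is a hypothetical helper: it prints each required file
# that is absent from the given directory
missing_files() {
  dir=$1
  for f in input.xml default.xml taxonomy.xml tandem.exe file_name.txt; do
    [ -e "$dir/$f" ] || printf '%s\n' "$f"
  done
}

# Demo: a temporary directory where taxonomy.xml and tandem.exe are left out
demo=$(mktemp -d)
touch "$demo/input.xml" "$demo/default.xml" "$demo/file_name.txt"
missing=$(missing_files "$demo")
echo "$missing"
```

Calling `missing_files .` in the search directory and getting no output suggests the batch can start.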

About bioinfomagician

Bioinformatic Scientist @ UCLA
