Archive | December 2013

Making the Best of PubMed Search

The PubMed website has been improved constantly, and many features have been added since I started using it more than 15 years ago. In this post, I want to discuss how to improve your PubMed searches so that you get what you want.

As of December 2013, PubMed contains more than 23 million citations for biomedical literature. Without proper search terms, you will quickly get overwhelmed. PubMed has extensive tutorial sections, but you may not have time to watch all the videos and read the documents. Many parts of PubMed are quite intuitive, so I am not going to discuss the features people can easily figure out.

Use Special Characters Properly to Enhance Your Search

PubMed search recognizes several special characters that make your search more specific. These are “” (double quotation marks), AND/OR/NOT (Boolean operators), * (asterisk), parentheses ( ) and square brackets [ ].

Put Words between Double Quotations to Search Phrases

As in a Google search, if you place words between double quotation marks, PubMed will search for articles that contain that specific phrase. For example, “Lung cancer” will not match lung and breast cancer, which would be matched without the double quotation marks. PubMed search is NOT case sensitive, so it doesn’t matter if any part of the word is capitalized.

Boolean Operators To Retrieve Intersections & Unions

AND/OR/NOT will be recognized as Boolean operators whether or not they are capitalized. If you want the intersection of two terms, use AND. OR works as a union, so you will get articles that contain one of the terms or both. If you have too many hits, NOT is useful for refining your search.


Using Asterisk For Ambiguous Search

The asterisk (*) is called a truncation symbol in PubMed search. It works like a wild card in regular expressions. If you want to perform an ambiguous search, this is useful. For example, ferment* will match fermented, fermentation, fermenting and any other word that starts with ferment.
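
As a rough analogy only, a truncated term behaves like a prefix match. The base R sketch below illustrates the idea on a made-up word list; it is not how PubMed implements truncation, and the word list is purely an assumption for illustration.

```r
# Made-up word list for illustration only
words <- c("ferment", "fermented", "fermentation", "fermenting", "cement")

# ferment* behaves roughly like "starts with ferment"
matches <- startsWith(words, "ferment")

# Keep only the words that the truncated term would match
matched_words <- words[matches]
print(matched_words)
```

Every word that begins with the stem is kept, while "cement" is dropped because it only contains part of the stem.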

Parentheses with Boolean Operators For Complex Searches

Words in parentheses are treated as a set. By combining parentheses and Boolean operators, you can perform more complex searches.

Example: vaccine that may either cause red skin or muscle pain

Answer: vaccine AND (red skin OR muscle pain)

It is up to you whether to use double quotation marks to make the search more specific.
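
If you build search strings in a script, the example above can be assembled programmatically. This is a minimal R sketch; the helper name or_group is hypothetical, and the resulting string is simply what you would paste into the PubMed search box.

```r
# Hypothetical helper: quote each phrase and join them with OR inside parentheses
or_group <- function(phrases) {
  quoted <- paste0('"', phrases, '"')
  paste0("(", paste(quoted, collapse = " OR "), ")")
}

# Combine the single term and the OR group with AND
query <- paste("vaccine", "AND", or_group(c("red skin", "muscle pain")))
print(query)
```

Here the phrases are quoted, which is the stricter of the two choices mentioned above; drop the quoting step if you want the looser match.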

In an advanced search, you can combine previous searches from your history using Boolean operators. If you are doing complicated searches or you want to exclude references you have already seen, this may be useful.


Square Brackets To Specify Search Field

You probably know that the interpretation of your search term is shown in the search details box on the right. Notice that every word is followed by square brackets that contain field information. Try to remember the common field names so that you can do an advanced search without using the advanced search page. Examples: [Author], [Title], [Text], [ad], [Journal]

Finding the Right Author with a Common Name

It often happens that I meet a person at a meeting and want to find his/her paper(s) on PubMed. But if the name is very common and you know only the initial of the first name, a PubMed search will return a huge number of hits.

Use [ad] option to search

Even if the author you are looking for has a common last name, knowing part of the name of his/her institution will significantly help the search.

Use the Journal Name

If you know the name of the journal in which the author published, you can use the [Journal] option in addition to the author name.
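
Putting the last two tips together, an author search can be narrowed with the [ad] and [Journal] fields. The sketch below only builds the query string in R; the author name, institution and journal are made-up placeholders, and tag() is a hypothetical helper.

```r
# Hypothetical helper: append a PubMed field tag to a search term
tag <- function(term, field) paste0(term, "[", field, "]")

# Placeholder values; replace them with the author you are looking for
author  <- tag("Smith J", "Author")
affil   <- tag("Stanford", "ad")
journal <- tag("J Biol Chem", "Journal")

# Combine the pieces with AND to narrow the search
query <- paste(author, affil, journal, sep = " AND ")
print(query)
```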

Finding the Correct Abbreviation of Journal Title

Journal abbreviations are commonly used in citations in CVs and many other places. You can also use them in a PubMed search. However, you need to use the exact abbreviation for each journal; otherwise the search will not return the correct results. First, go to the PubMed main page and click “Journals in NCBI Database” in the “more resources” column on the right. If you cannot find it, click here.

Next, type the journal title in the search field. If you see the exact title while you are typing the name, select it and press Enter. You will see two abbreviation types for The Journal of Biological Chemistry (known as JBC) below. You cannot use the term “JBC” to search for JBC articles; you need to use one of the abbreviations in the red rectangle. For a PubMed search, both abbreviations in the red rectangle will work fine.

[Screenshot: journal abbreviations for The Journal of Biological Chemistry, marked with a red rectangle]

Want to Find a Trend in a Particular Research Area

[Screenshot: trends]

PubMed results show the trend by year only if the search term returns more than 10,000 citations. With fewer citations, the search doesn’t show the trend. Fortunately, there is a website that does the same job without such a limitation. Open the following URL and type in the search term (it takes a little time to show the results).

http://dan.corlan.net/medline-trend.html


Copy the results and paste them into a text editor (e.g. Word, Notepad). Then save the file as plain text (.txt).

Run Excel and open the text file you just saved. When you open the document, Excel will ask whether you want to separate the fields by certain characters. Select “Delimited” and click Next.

Then select Space as the delimiter (see below).

[Screenshot: Excel text import step with Space selected as the delimiter]

Now each value is placed in its own cell, so you can manipulate the data and create a trend graph.
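
If you prefer to stay in R instead of Excel, the same space-delimited text can be read with read.table(). The sketch below uses a small invented sample with assumed column names (year and count); the actual layout of the text you copied from the website may differ, so adjust the header and columns accordingly.

```r
# Invented sample data standing in for the pasted text (not real results)
txt <- "year count
2010 120
2011 150
2012 190"

# Read the space-delimited text into a data frame
trend <- read.table(text = txt, header = TRUE, sep = " ")

# Plot citations per year as a simple line graph
plot(trend$year, trend$count, type = "b",
     xlab = "Year", ylab = "Citations")
```

To read the file you saved instead of the inline sample, pass the file name as the first argument of read.table() in place of text = txt.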

Be Greedy When You Find a Good Reference

If you find a good hit with a PubMed search, try clicking the related citations. You may find more articles that you like.

[Screenshot: related citations]

Mobile App for PubMed Search

I found this pretty easy to use and convenient for mobile users. You can make comments on articles and save on your mobile.



Rmpi Tutorial 4: Getting Data Back From Slaves

This is going to be the last tutorial for Rmpi. In this post I am going to cover how to receive data from slaves in Rmpi. Consider the situation in the picture below. You want to gather data from the slaves and combine them with the data in the master.

[Figure: mpi.gather.Robj() collecting data from the master and each slave]

mpi.gather.Robj() will retrieve data from each slave and put them together as in the diagram above. Let’s try some examples.

Example 1: Getting slave number information from each slave

library('Rmpi')
mpi.spawn.Rslaves(nslaves=3)
mpi.bcast.cmd(id<-mpi.comm.rank())
mpi.bcast.cmd(x<-paste("I am slave no.",id))
mpi.bcast.cmd(mpi.gather.Robj(x))
x<-"I am a master"
mpi.gather.Robj(x)
mpi.remote.exec(x)
mpi.close.Rslaves()

Here is the output (showing only the last part)

> mpi.gather.Robj(x)
[1] "I am a master"    "I am slave no. 1" 
"I am slave no. 2" "I am slave no. 3"
> mpi.remote.exec(x)
$slave1
[1] "I am slave no. 1"
$slave2
[1] "I am slave no. 2"
$slave3
[1] "I am slave no. 3"

If you want to retrieve data from each slave and give the whole data to all slaves, you use mpi.allgather.Robj().
[Figure: mpi.allgather.Robj() sending the gathered data to the master and all slaves]

Example 2: Send the string “fruits” to the master and “apple”, “banana” and “orange” to slaves 1 to 3. Then retrieve the data from each slave and send all of the data to the master and all slaves.

library('Rmpi')
mpi.spawn.Rslaves(nslaves=3)
x<-c("fruits","apple","banana","orange")
mpi.bcast.cmd(x<-mpi.scatter.Robj())
x<-mpi.scatter.Robj(x)
mpi.remote.exec(x)
mpi.bcast.cmd(x<-mpi.allgather.Robj(x))
mpi.allgather.Robj(x)
mpi.remote.exec(x)
mpi.close.Rslaves()

Here is the output

> mpi.remote.exec(x) 
$slave1
[1] "apple" 
$slave2 
[1] "banana" 
$slave3 
[1] "orange" 
....
>mpi.allgather.Robj(x)
[1] "fruits" "apple" "banana" "orange" 
>mpi.remote.exec(x) 
$slave1 
[1] "fruits" "apple" "banana" "orange" 
$slave2 
[1] "fruits" "apple" "banana" "orange" 
$slave3 
[1] "fruits" "apple" "banana" "orange"

mpi.reduce and mpi.allreduce Will Reduce Data By Simple Operation

[Figure: mpi.reduce with maxloc and minloc]

The mpi.reduce command examines a variable in the slaves and the master, performs a simple operation such as finding the minimum or maximum value, and then returns the result. The variable needs to exist in the master and in every slave, and the returned value is a single value. For it to work, you need to call this command from the master and all slaves; otherwise it will wait forever.

Example 3: Set the value of x to 1 in the master and to 2, 3 and 4 in slaves 1, 2 and 3. Then use mpi.reduce to return the sum of all x.

library('Rmpi')
mpi.spawn.Rslaves(nslaves=3)
# Define function for reduction
red<-function(option="sum"){
    mpi.reduce(x,type=2,op=option)
}
# Send a function to all slaves
mpi.bcast.Robj2slave(red)
# Set object x and send to slaves
x<-c(1,2,3,4)
mpi.bcast.cmd(x<-mpi.scatter.Robj())
x<-mpi.scatter.Robj(x)
mpi.remote.exec(x)
# call the function in slaves
mpi.remote.exec(red("sum"))
# call the same function in master
mpi.reduce(x,2,"sum")
mpi.close.Rslaves()

Here is the output

> mpi.remote.exec(red("sum"))
  X1 X2 X3
1  2  3  4
> mpi.reduce(x,2,"sum")
[1] 10

If you use mpi.allreduce, it will send the final value to all slaves. mpi.reduce has two more options, maxloc and minloc. If you use these options, the command will return two values: the value resulting from the operation (either the maximum or the minimum) and the location of that value. This can be useful for finding the slave that provides the value.

> mpi.reduce(x,2,"maxloc")
[1] 4 3
> mpi.reduce(x,2,"minloc")
[1] 1 0

Note: the rank for the master is 0
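
To see what maxloc and minloc compute, here is a plain R sketch that mimics the result on a single machine without MPI. It is only an analogy: the vector x stands for the values held by the master (rank 0) and slaves 1 to 3 in Example 3, and the helper names maxloc and minloc are hypothetical functions, not part of Rmpi.

```r
# Values of x held by rank 0 (master) and ranks 1-3 (slaves) in Example 3
x <- c(1, 2, 3, 4)
ranks <- 0:3

# Hypothetical stand-ins for op="maxloc" and op="minloc":
# return the extreme value and the rank that holds it
maxloc <- function(v, r) c(max(v), r[which.max(v)])
minloc <- function(v, r) c(min(v), r[which.min(v)])

print(maxloc(x, ranks))  # value 4 held by rank 3
print(minloc(x, ranks))  # value 1 held by rank 0
print(sum(x))            # same total as op="sum"
```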

Rmpi Tutorial 3: Sending Data to A Specific Slave

[Figure: mpi.send and mpi.recv]

Today’s topic is point-to-point communication using Rmpi. If you want to send data to a specific slave, you need to use a pair of commands, mpi.send and mpi.recv. These commands have to be used as a pair, and mpi.send must always come before mpi.recv; otherwise R will crash. This is because both mpi.send and mpi.recv are blocking calls, so mpi.recv will wait for data that have not been sent yet.

Another way of explaining blocking calls is that the call won’t return until the send or receive completes. Thus, if receive is called before send, it waits forever. This situation is called deadlock.

In addition, because the mpi.recv command needs to be executed on a slave CPU, code containing mpi.recv() is usually sent to the slaves as a function first. The function is then called after the master calls mpi.send. Here is the basic process for sending data to a specific slave.

1. Define a function to receive data from the master
2. Send the function to the slaves using mpi.bcast.Robj2slave()
3. Call mpi.send() from the master
4. Call the function
5. Check the results

The syntax for mpi.send() and mpi.recv() is

mpi.send(x,type,dest,tag, comm)

mpi.recv(x,type,source,tag,comm)

Arguments:

x     : data to be sent
type  : 1 for integer, 2 for double and 3 for character
dest  : destination rank
source: source rank 
       (Use mpi.any.source for any source)
tag   : non-negative number 
       (Use mpi.any.tag for any tag flag)
comm  : communication number (default=1)

For tag and source, you can use wild cards (mpi.any.tag and mpi.any.source). If you use them, the receiver will accept data with any tag value or from any source.

Example: Send an integer from the master CPU to slave #2

library('Rmpi')
mpi.spawn.Rslaves(nslaves=3)

#define function to receive data in slave
#only the slave with rank 2 calls mpi.recv
srecv<-function(){
    if(mpi.comm.rank()==2)
        x<-mpi.recv(x,1,0,1,1)
}

#send the function to all slaves
mpi.bcast.Robj2slave(srecv)

#send an integer from master
x<-as.integer(21.34)
mpi.send(x,1,2,1,1)

#create x to receive data in slaves
mpi.bcast.cmd(x<-integer(1))

#call the function
mpi.bcast.cmd(srecv())

#check results
mpi.remote.exec(x)
mpi.close.Rslaves()

Here is the output

> mpi.remote.exec(x)
  X1 X2 X3
1  0 21  0

The mpi.send command can be placed inside the function instead of being called in the main program. In this case, you place mpi.send inside an if-clause that uses mpi.comm.rank so that it is run only by the master. However, the if-statement needs to come before the mpi.recv command to work properly.

if (mpi.comm.rank()==0){
mpi.send (.......)
}

mpi.send and mpi.recv are used to send small data to a slave. For large data, you can use mpi.send.Robj and mpi.recv.Robj. The syntax for these commands is similar to that of mpi.send and mpi.recv.

mpi.send.Robj(obj,dest,tag,comm=1)
mpi.recv.Robj(source,tag,comm=1,status=0)

mpi.isend and mpi.irecv are Non-Blocking Calls

If you use blocking calls such as mpi.send or mpi.recv, you may run into a deadlock. To avoid a crash, you can use non-blocking calls such as mpi.isend and mpi.irecv (i stands for immediate). These commands do not wait for the send/receive to complete, and move on to the next line of code immediately. This means that even if the data have not been sent to the slaves successfully, the program will go on to the next line.

What is a Benefit of Using Non-Blocking Calls?

Besides avoiding a crash, using non-blocking calls may give better performance. For example, if you send data but don’t need to use the data right away, you can perform some tasks before you check whether the send has completed.

### slave side
# do not wait to complete receiving data
x<-mpi.irecv(...)
... some codes (no access to x)
... some codes (no access to x)
mpi.wait()
# code using x starts here

If you have multiple requests, you can use mpi.waitall. Please see more details for these functions here.

Rmpi Tutorial 2: Sending Data

In MPI, there are a number of commands for sending data. It is important to meet the requirement when you send data to slaves, otherwise R will crash. I am going to cover a few important ones in this post.

Sending Identical Data to Slave CPUs

For large-scale data analysis, you may want to divide the work among slave CPUs. These CPUs may need to receive some constant values that are necessary for downstream computation. To send an object to each slave, you can use the mpi.bcast.Robj() command.

Example: Send a character string to 3 slaves

1. Spawn 3 slaves and create a variable x in each slave.

>mpi.spawn.Rslaves(nslaves=3)
>mpi.bcast.cmd(x<-mpi.bcast.Robj())
>x<-c("This is a test.")

2. Send x to each slave

>mpi.bcast.Robj(x)

3. Print x in each slave

>mpi.remote.exec(x)
$slave1
[1] "This is a test."
$slave2
[1] "This is a test."
$slave3
[1] "This is a test."

4. Close mpi

>mpi.close.Rslaves()

Sending Non-identical Data to Slave CPUs

When you have a large set of data, you first divide the data and put the pieces in a list, then send the list to the slaves. There are two commands, mpi.scatter.Robj() and mpi.scatter.Robj2slave(), for sending a list to slave CPUs.

When you use these commands, the number of portions in the object must match exactly the number of CPUs that receive them. If not, you will get an error message. For example, if you spawn 3 slave CPUs and the object to send is a list of 4, mpi.scatter.Robj2slave() will not work because the numbers are not equal. However, mpi.scatter.Robj() will work because the master plus the slaves equals the number of elements in the object.
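
The length rule can be checked before scattering. The helper below is a hypothetical sketch in plain R (it does not call Rmpi); it assumes, as described above, that mpi.scatter.Robj() needs one element per slave plus one for the master, while mpi.scatter.Robj2slave() needs one element per slave.

```r
# Hypothetical helper: does the object have the right number of pieces?
can_scatter <- function(obj, nslaves, include_master = TRUE) {
  needed <- if (include_master) nslaves + 1 else nslaves
  length(obj) == needed
}

x <- list("a", "b", "c", "d")   # an object with 4 portions

# mpi.scatter.Robj case: master + 3 slaves = 4 pieces
ok_with_master <- can_scatter(x, nslaves = 3, include_master = TRUE)

# mpi.scatter.Robj2slave case: 3 slaves only, so 4 pieces do not fit
ok_slaves_only <- can_scatter(x, nslaves = 3, include_master = FALSE)

print(c(ok_with_master, ok_slaves_only))
```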

Example 1: Split and send an object of list of 4 to master and slaves

[Figure: mpi.scatter.Robj() splitting an object among the master and slaves]

This code is very similar to the one above except that x is a character vector with 4 elements.

>mpi.spawn.Rslaves(nslaves=3)
>x<-c("This","is","an","example")
>mpi.bcast.cmd(x<-mpi.scatter.Robj())
>mpi.scatter.Robj(x)
[1] "This"
>mpi.remote.exec(x)
$slave1
[1] "is"
$slave2
[1] "an"
$slave3
[1] "example"
>mpi.close.Rslaves()

Note that the master receives its portion of x, but this does not overwrite the existing x.

Example 2: Divide an 8×4 matrix into 4 blocks and send them to slaves

[Figure: mpi.scatter.Robj2slave() sending blocks of a matrix to slaves]

1. Spawn 4 slave CPUs, and create an 8×4 matrix with random numbers

>mpi.spawn.Rslaves(nslaves=4)
>mat<-matrix(rnorm(32),8)
>mat
           [,1]       [,2]       [,3]       [,4]
[1,] -0.3718508  0.8075626 -0.1145767  1.2152244
[2,]  1.2414776 -1.7983161  0.8113792 -0.0577753
[3,]  0.2291987 -1.8194346  0.5902288 -0.5519079
[4,] -0.6056088  0.7028118 -1.0299552  1.4069104
[5,]  0.5006542  1.2469203 -1.5266182 -0.2712369
[6,]  0.9899981  1.0211666 -1.0916166  0.9721620
[7,] -1.6689545  0.2618148 -1.0774920 -0.4962599
[8,] -0.3919911  0.1678641  0.5198690  0.7932334

2. Split the matrix into four 2×4 matrices

>smat<-lapply(.splitIndices(nrow(mat),4),function(i)
mat[i,])
>smat
[[1]]
           [,1]       [,2]       [,3]       [,4]
[1,] -0.3718508  0.8075626 -0.1145767  1.2152244
[2,]  1.2414776 -1.7983161  0.8113792 -0.0577753

[[2]]
           [,1]       [,2]       [,3]       [,4]
[1,]  0.2291987 -1.8194346  0.5902288 -0.5519079
[2,] -0.6056088  0.7028118 -1.0299552  1.4069104

[[3]]
          [,1]     [,2]      [,3]       [,4]
[1,] 0.5006542 1.246920 -1.526618 -0.2712369
[2,] 0.9899981 1.021167 -1.091617  0.9721620

[[4]]
           [,1]      [,2]      [,3]       [,4]
[1,] -1.6689545 0.2618148 -1.077492 -0.4962599
[2,] -0.3919911 0.1678641  0.519869  0.7932334

3. Send each matrix to slave CPUs

> mpi.scatter.Robj2slave(smat)
> mpi.remote.exec(smat)
$slave1
           [,1]       [,2]       [,3]       [,4]
[1,] -0.3718508  0.8075626 -0.1145767  1.2152244
[2,]  1.2414776 -1.7983161  0.8113792 -0.0577753

$slave2
           [,1]       [,2]       [,3]       [,4]
[1,]  0.2291987 -1.8194346  0.5902288 -0.5519079
[2,] -0.6056088  0.7028118 -1.0299552  1.4069104

$slave3
          [,1]     [,2]      [,3]       [,4]
[1,] 0.5006542 1.246920 -1.526618 -0.2712369
[2,] 0.9899981 1.021167 -1.091617  0.9721620

$slave4
           [,1]      [,2]      [,3]       [,4]
[1,] -1.6689545 0.2618148 -1.077492 -0.4962599
[2,] -0.3919911 0.1678641  0.519869  0.7932334

4. Close mpi

>mpi.close.Rslaves()

Note: The results may vary as the seed was not set for this code.
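
To make the example reproducible, you can set the seed before generating the matrix. The sketch below runs without MPI and uses splitIndices() from the base parallel package as a stand-in for .splitIndices(); treating the two as equivalent here is an assumption. The seed value is arbitrary.

```r
library(parallel)

# Fix the seed so the random matrix is the same on every run
set.seed(123)
mat <- matrix(rnorm(32), 8)

# Split the 8 rows into 4 consecutive blocks of row indices
idx <- splitIndices(nrow(mat), 4)

# Build a list of four 2x4 matrices, one per slave
smat <- lapply(idx, function(i) mat[i, ])
print(sapply(smat, dim))
```

The resulting smat has the same structure as the list passed to mpi.scatter.Robj2slave() in step 3.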
