Solution to Transferring Extremely Large Files

Bandwidth is the communication speed (bit rate) at which you can move data to and from a computer resource. It is usually expressed in bit/s, kbit/s, Mbit/s or Gbit/s (these are bits, not bytes: 8 bits = 1 byte). If you have home internet access, you usually pay for the speed. In the case of Time Warner Cable, you pay $19 for 1Mbps, $29 for 3Mbps, $34 for 15Mbps per month and so on. One Mbps is 1 megabit per second, which is 125kbyte/sec, and this usually defines the maximum transfer speed of the data.

As I work at a university, I have a pretty fast connection (100Mbps = 12.5Mbyte per sec). But I hardly ever get such speed for any transfer. For example, if I use SCP, the transfer rate is usually 80-200kbyte per sec, which is about 1-2% of the maximum speed. Even with the Globus Online service, I get about 1-1.5Mbyte per second. At this rate, transferring a 1GB file takes 10 min or more. Not so bad, but if I want to transfer 100GB, it will take 16+ hrs. I don’t know if that is tolerable to you; it depends on your situation. Certainly, if you want to move 1TB, this will take 160+ hrs, which is about a week. And that assumes the transfer goes flawlessly. Obviously, in this case you would rather send a 1TB drive by FedEx overnight than use Globus Online.
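The arithmetic here is easy to check with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and it assumes decimal units (1GB = 1000Mbyte) and a transfer that runs at a constant rate with no stalls or retries.

```python
def mbps_to_mbyte_per_s(mbps):
    """Convert a link speed in megabits/s to megabytes/s (8 bits = 1 byte)."""
    return mbps / 8.0


def transfer_hours(size_gb, rate_mbyte_per_s):
    """Hours needed to move size_gb gigabytes at a constant rate.

    Assumes decimal units (1 GB = 1000 MB) and a flawless transfer.
    """
    seconds = size_gb * 1000.0 / rate_mbyte_per_s
    return seconds / 3600.0


if __name__ == "__main__":
    print(f"100 Mbps link = {mbps_to_mbyte_per_s(100):.1f} Mbyte/s")
    # 1.0 and 1.5 Mbyte/s are the Globus Online rates quoted in the text.
    for rate in (1.0, 1.5):
        for size_gb in (1, 100, 1000):
            hours = transfer_hours(size_gb, rate)
            print(f"{size_gb:>5} GB at {rate} Mbyte/s: {hours:7.1f} h")
```

At 1.5Mbyte per second this gives about 11 min for 1GB, about 18.5 hrs for 100GB and about 185 hrs for 1TB, so the rough figures in the text are, if anything, on the optimistic side.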

Why can we use only a fraction of the bandwidth for data transfer?

The problem with using FTP or HTTP for data transfer is that you are relying on TCP (Transmission Control Protocol). TCP, or TCP/IP, provides reliable, ordered, error-checked delivery over the net. But it can be very slow over long distances. This is an inherent problem of TCP: long paths have higher latency and lose packets more frequently, and therefore the speed drops.

The sender of TCP packets can only send a limited amount of data before it has to wait for an acknowledgement from the receiver. When the acknowledgement is not received, the sender slows down the transfer to try to avoid congestion (even if there is no congestion).
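To get a feel for why waiting on acknowledgements caps the speed, here is a rough sketch of two standard back-of-the-envelope bounds: the window limit (at most one window of unacknowledged data per round trip) and the Mathis et al. approximation for throughput under random packet loss. The window size, segment size, round-trip times and loss rate are illustrative assumptions, not measurements from any real connection.

```python
import math


def window_limited_mbps(window_bytes, rtt_s):
    """Upper bound when at most window_bytes may be unacknowledged
    per round trip: window / RTT, returned in Mbit/s."""
    return window_bytes * 8 / rtt_s / 1e6


def mathis_mbps(mss_bytes, rtt_s, loss_rate):
    """Mathis et al. approximation for TCP under random loss:
    rate ~ (MSS / RTT) * 1.22 / sqrt(loss), returned in Mbit/s."""
    return (mss_bytes * 8 / rtt_s) * 1.22 / math.sqrt(loss_rate) / 1e6


if __name__ == "__main__":
    # Assumed values: 64 KiB window, 1460-byte segments, 0.1% loss.
    for rtt_ms in (10, 50, 150):
        rtt = rtt_ms / 1000.0
        w = window_limited_mbps(64 * 1024, rtt)
        m = mathis_mbps(1460, rtt, 0.001)
        print(f"RTT {rtt_ms:>3} ms: window limit {w:6.1f} Mbps, "
              f"loss limit {m:6.1f} Mbps")
```

With these assumed numbers, both bounds shrink in proportion to the round-trip time, so a single TCP stream over a 150 ms intercontinental path ends up at a few Mbps no matter how fast the link itself is.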

Solution to the slow transfer

Aspera is a company which provides a solution to the bottleneck caused by using TCP for transfer. According to Aspera, their technology is largely independent of network delay and suffers little from packet loss, even over long distances (e.g. between continents). It is called fasp, a new protocol for large data transfer. With this protocol, they say that even at 10% packet loss it achieves 90% utilization of the bandwidth with minimal redundant data transfer.

It is really fast

They claim the transfer speed is up to 1000 times that of standard FTP. The benchmark on their website is shown below.

[Figure: aspera_benchmark]

You can see that with their program you can transfer large files at blazing speed. If you have the fastest bandwidth, a 100GB file transfer takes only 1.4 min. Wow!!

Is it really true?

I tried downloading a few files using the Aspera program. Using their program to send your own files is not free, but downloading files from a site that uses Aspera can be free (if the site doesn’t charge for downloads). The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has a collection of proteomic research data that can be freely downloaded from their website. When you download files from their website, you can use the Aspera plug-in for free.

[Figure: CPTAC_site]

It is pretty fast

Downloading 700MB of files (in total) took about 2 min. You can see that with my bandwidth of 100Mbps, it uses 43% of the total capacity for downloading. 43Mbps is 5.4Mbyte per second, which is roughly 30-70 times faster than the 80-200kbyte per sec I get with SCP and about 4-5 times faster than Globus Online. At this speed, I can download a 100GB file in a little over 5 hours. It doesn’t seem to be able to use 90% of the bandwidth I have, but it is still significantly faster. If I had a gigabit per second connection, this should be done in less than 1 hr. I can certainly see the advantage of using their program for very large file transfers. In fact, large companies such as Netflix use Aspera, as they need to transfer large amounts of data every day.
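The figures in this section can be reproduced from the measured rate. A small sketch, assuming decimal units and assuming that the 43% utilization would carry over unchanged to a faster link (that part is a guess, not something that was tested):

```python
LINK_MBPS = 100.0      # nominal bandwidth of the connection
OBSERVED_MBPS = 43.0   # rate reported during the Aspera download


def hours_to_transfer(size_gb, rate_mbps):
    """Hours to move size_gb (decimal GB) at rate_mbps megabits/s."""
    mbyte_per_s = rate_mbps / 8.0
    return size_gb * 1000.0 / mbyte_per_s / 3600.0


if __name__ == "__main__":
    utilization = OBSERVED_MBPS / LINK_MBPS
    print(f"utilization: {utilization:.0%}")
    print(f"rate: {OBSERVED_MBPS / 8:.1f} Mbyte/s")
    print(f"100 GB at 43 Mbps: {hours_to_transfer(100, OBSERVED_MBPS):.1f} h")
    # Hypothetical case: the same utilization on a 1 Gbps link.
    gig_hours = hours_to_transfer(100, 1000 * utilization)
    print(f"100 GB at 430 Mbps: {gig_hours * 60:.0f} min")
```

Run as-is, the 100GB download comes out at about 5.2 hours on the 100Mbps link and about half an hour on the hypothetical gigabit link.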

[Figure: aspera_transfer]

How much does it cost?

It is not free, unfortunately. Good service doesn’t come for free. I searched the web for pricing and found that it is charged by the hour. I guess you need to be a frequent user to justify the cost. But if you have a very fast connection and lots of data to transfer, this could be the solution.

[Figure: aspera_price]

About bioinfomagician

Bioinformatic Scientist @ UCLA

3 responses to “Solution to Transferring Extremely Large Files”

  1. Karen Ketchum says :

    Hi, Nice post. If you happen to try the download from the CPTAC portal again, you may be able to increase your speed by adjusting the control in the Transfer Monitor. In your figure aspera_transfer.jpg, this is the second window that you open from the main Transfer-Aspera Connect progress window. The horizontal bar on the monitor can be raised to 300 Mbps; this is the top speed of our Aspera Connect Server. It would be interesting to see what speed you obtain with that modification. Karen at ESAC (CPTAC DCC)

    • bioinfomagician says :

      Hi Karen,
      Thanks for the tip about adjusting the speed control on the monitor window. My transfer speed is limited to 100Mbps, and when I increased the speed to the max, the transfer became unstable and either took longer or failed completely. So at least in my case, 50% of max speed seems to provide both decent speed and good stability.

      • Karen Ketchum says :

        Interesting observations. We will follow up with our Aspera technical reps to see if they have some ideas on the instability. Thanks for taking the time to check. Best, Karen
