RNA-seq vs Microarray: what is the take?
When I was young, video tapes came in two formats, VHS and Beta. For computer data storage, there were cassette tapes and floppy disks. Floppies came in various sizes back then; I used at least 8″, 5″, 3.5″ and 3″ disks. As technology matured, many of them faded away, and the one that remains on the market is the result of survival of the fittest. In transcriptome analysis, DNA microarray has dominated the last decade. Recently, however, next-generation sequencing (NGS) technology has provided a new path for gene expression analysis. In this post, I want to compare gene expression analysis on two platforms: RNA-seq and DNA microarray.
High Reproducibility for RNA-seq and Microarray
When DNA microarray technology was first introduced, spotted cDNA microarrays were quite variable between arrays, and it was necessary to run technical replicates as well as dye-swap experiments. In more recent years, however, both techniques have become highly reproducible, and several reports say that technical replicates for either method have a correlation higher than 0.99. If you start with the same RNA, the results are essentially the same. There is no need for technical replicates for either RNA-seq or DNA microarray.
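As a rough illustration of how such a replicate correlation is computed, here is a minimal sketch in Python. The expression values are made up for illustration, and the log2 transform with a pseudocount of 1 is an assumption on my part, not something taken from the reports mentioned above.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def log2_expr(values, pseudocount=1.0):
    """Log2-transform expression values; the pseudocount avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]

# Hypothetical expression values for six genes, measured twice from the same RNA.
replicate_a = [5, 40, 120, 900, 3500, 15000]
replicate_b = [6, 37, 131, 870, 3620, 14800]

r = pearson(log2_expr(replicate_a), log2_expr(replicate_b))
print(f"Pearson r on log2 scale: {r:.4f}")
```

With real data you would use all genes, not six, and a value above 0.99 is what the reports describe for technical replicates.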
RNA-seq has a wider dynamic range
In the cell, the dynamic range of mRNA abundance is huge. Some mRNAs have only a few copies per cell, while the most abundant ones have >10,000 copies per cell. Before talking about dynamic range, let’s talk about how similar the data obtained by RNA-seq and microarray are. Correlation between RNA-seq and microarray is usually pretty good. In my survey of dozens of papers, R-squared is around 0.8. When log-transformed data for RNA-seq and microarray are plotted against each other, however, the points are not uniformly distributed around the trend line (see figure below).
Reference: Zhao et al. (2014), PLOS One
This is due to the difference in dynamic range. The dynamic range of RNA-seq depends on the depth of sequencing, while microarray has a more or less fixed dynamic range. In theory, this means that if you sequence deep enough, RNA-seq can cover the full range of actual RNA abundances in the sample.
The majority of recent RNA-seq papers have 10-50 million mapped reads on average, and this depth of sequencing already gives more dynamic range (>10^5) than DNA microarray (10^3-10^4). At the high end, DNA microarray shows saturation, while at the low end it suffers loss of signal (smaller signal than actual). In the middle part, the two technologies are highly correlated with each other. These effects are evident in the figure.
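To make the link between depth and dynamic range concrete, here is a back-of-envelope sketch. It assumes that the lowest measurable signal is a single read and that the most abundant transcript takes up 1% of all mapped reads; that 1% figure is a hypothetical value chosen for illustration, and the sketch ignores transcript length and mapping biases.

```python
import math

def rnaseq_dynamic_range(mapped_reads, top_fraction=0.01):
    """Ratio between the largest expected count and a single read.

    top_fraction is the share of reads taken by the most abundant
    transcript (an illustrative assumption, not a measured value).
    """
    max_count = mapped_reads * top_fraction
    min_count = 1  # the smallest non-zero signal is one read
    return max_count / min_count

for depth in (2_000_000, 10_000_000, 50_000_000):
    dr = rnaseq_dynamic_range(depth)
    print(f"{depth:>11,} reads -> about 10^{math.log10(dr):.1f}")
```

Under these assumptions the range grows linearly with depth, which is the point of the paragraph above: the ceiling for RNA-seq is set by how much you sequence, not by the chemistry of a probe.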
RNA-seq is more sensitive than microarray
At least a dozen papers have run both RNA-seq and microarray on the same samples and reported that RNA-seq identified significantly more genes than microarray. According to Illumina, the sensitivity of a human microarray from the major vendor is equivalent to 2 million mapped reads. Since most recent research papers report >10 million mapped reads per sample on average, RNA-seq should provide much higher sensitivity than microarray.
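One way to see why more reads means higher sensitivity is to treat read sampling as a Poisson process. The sketch below makes that assumption and uses a hypothetical transcript that accounts for one in a million mapped reads; neither the model nor the number comes from the papers above.

```python
import math

def detection_probability(mapped_reads, transcript_fraction, min_reads=1):
    """Probability of seeing at least min_reads reads from a transcript,
    assuming read counts follow a Poisson distribution."""
    expected = mapped_reads * transcript_fraction
    # P(X < min_reads) is the sum of the Poisson terms for 0 .. min_reads-1.
    p_below = sum(
        math.exp(-expected) * expected ** i / math.factorial(i)
        for i in range(min_reads)
    )
    return 1.0 - p_below

rare_fraction = 1e-6  # hypothetical: one read in a million comes from this gene
for depth in (2_000_000, 10_000_000):
    p = detection_probability(depth, rare_fraction, min_reads=5)
    print(f"{depth:>11,} reads: P(at least 5 reads) = {p:.3f}")
```

Requiring five reads rather than one is an arbitrary threshold; the qualitative picture (a rare transcript is usually missed at 2 million reads and usually seen at 10 million) does not depend much on the exact cutoff.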
How accurate are the data?
To check whether the results obtained from RNA-seq or microarray are accurate, quantitative real-time PCR (q-RT-PCR) is most commonly used. If primers are carefully designed so that they amplify only a specific gene with very high amplification efficiency, q-RT-PCR should provide the most accurate abundance of a particular RNA. In one study, RNA-seq and microarray results were validated by q-RT-PCR for 488 significantly changed genes (>2.0-fold).
Platform      Correct   Total   % correct
RNA-seq       415       460     90%
Microarray    314       340     93%
Both          296       312     95%
In both cases, agreement with q-RT-PCR on the direction of gene expression change was greater than 90%. When both technologies agreed, the accuracy was 95%. In one other study I saw, the accuracy was considerably lower than this. However, if the same RNA is saved for q-RT-PCR and primers are carefully designed, similar accuracy should be obtained. If your q-RT-PCR results don’t agree, careful examination of the PCR primers and of the exon/probe-level RNA-seq/microarray results is required.
While the above calculation does not determine actual accuracy, generally speaking, RNA-seq proves more accurate in terms of fold-change values. I would guess there is no significant difference between RNA-seq and microarray for genes with medium to high expression; however, the lower and higher ends of gene expression are likely more accurate for RNA-seq due to its better dynamic range.
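The percentages in the table are simply correct calls divided by total calls. A small sketch that recomputes them from the counts as printed:

```python
# Counts copied from the validation table above: (correct, total).
validation = {
    "RNA-seq": (415, 460),
    "Microarray": (314, 340),
    "Both": (296, 312),
}

def percent_correct(correct, total):
    """Share of q-RT-PCR-validated genes that agree, as a percentage."""
    return 100.0 * correct / total

for platform, (correct, total) in validation.items():
    print(f"{platform:<11} {correct}/{total} = {percent_correct(correct, total):.1f}%")
```

From these counts the microarray row works out to roughly 92.4%, a little under the 93% shown in the table, so one of the printed microarray figures is probably slightly off or was rounded up.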
Splicing Variant Detection
Let’s say you find a potentially very interesting differentially expressed gene in a microarray experiment. You go ahead and try cloning this gene for further biochemical study, but you find that it has multiple splice variants. You then examine the probes for this gene on the microarray and find that they only cover exons shared among the splice variants.
In this case, you need to figure out which form(s) of the gene is (are) actually differentially expressed. While this can be done with q-RT-PCR or northern blot analysis, it takes more time and effort to confirm. The more probes a microarray contains for possible variants, the fewer problems there are in identifying and quantifying specific variants. RNA-seq, in contrast, can detect splice variants, including novel ones, directly from the reads.
RNA-seq is also capable of detecting single nucleotide polymorphisms (SNPs). Although it would be difficult to find de novo SNPs for low-abundance RNAs (the error rate for Illumina’s Genome Analyzer is ~1%), RNA-seq can detect a single nucleotide change as well as sequence changes caused by RNA editing.
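To see why a ~1% error rate makes SNP calling hard for low-abundance RNAs, consider how often sequencing errors alone would produce mismatches at a given position. The sketch below assumes errors are independent between reads and uses a simple binomial model; the coverage values and thresholds are hypothetical.

```python
import math

def prob_mismatches_by_error(coverage, min_mismatches, error_rate=0.01):
    """Probability that at least min_mismatches of the reads covering a
    position carry a mismatch purely from sequencing error (binomial model)."""
    return sum(
        math.comb(coverage, k)
        * error_rate ** k
        * (1 - error_rate) ** (coverage - k)
        for k in range(min_mismatches, coverage + 1)
    )

# Low-abundance RNA: 5 reads cover the position, 1 shows a mismatch.
print(f"5x coverage, >=1 mismatch:    {prob_mismatches_by_error(5, 1):.4f}")
# Well-covered RNA: 100 reads cover the position, 20 show a mismatch.
print(f"100x coverage, >=20 mismatch: {prob_mismatches_by_error(100, 20):.2e}")
```

With only a handful of reads, a single mismatch is quite plausibly an error, whereas at high coverage a consistent minority of mismatching reads is very unlikely to be noise.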
Is there a downside of RNA-seq?
RNA-seq is highly reliable and has higher dynamic range and sensitivity than microarray. In addition, it is capable of detecting novel splicing variants and mutations. However, RNA-seq is more costly ($300-$1000/sample) than microarray ($100-$200/sample). This is due to the extensive bioinformatic analysis required and the use of newer machines for RNA-seq.
There are a lot of tools for RNA-seq analysis, and there is not yet one standard protocol. RNA-seq files are also much bigger than microarray files. An uncompressed RNA-seq raw file can easily be >5GB, while the equivalent microarray file is 30-40 times smaller.
Analysis of RNA-seq data requires extensive bioinformatic skills and computer resources (CPU and RAM). Large files are hard to share and costly to store, especially for large data sets.
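For a feel of where those file sizes come from, here is a back-of-envelope estimate for an uncompressed FASTQ file. It assumes single-end reads, four lines per read, and a fixed header length; the read length, header length, and read count used below are illustrative guesses, not figures from any particular run.

```python
def estimate_fastq_bytes(num_reads, read_length, header_length=40):
    """Approximate size in bytes of an uncompressed single-end FASTQ file.

    Each record has four lines: header, sequence, '+', and quality string,
    each ending in a newline character.
    """
    bytes_per_record = (
        header_length + 1   # header line (including '@') + newline
        + read_length + 1   # sequence + newline
        + 1 + 1             # '+' separator + newline
        + read_length + 1   # quality string + newline
    )
    return num_reads * bytes_per_record

size = estimate_fastq_bytes(num_reads=30_000_000, read_length=100)
print(f"About {size / 1e9:.1f} GB uncompressed")
```

Compression helps a great deal in practice, but alignment files and intermediate outputs add to the total, so storage still needs planning for large projects.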
For a quick-and-easy experiment, microarray can provide reliable and sensitive results. With accurate probe annotations and probe designs to distinguish splice variants and detect non-coding RNAs (e.g. miRNA or lincRNA), microarray analysis can get pretty close to what RNA-seq can offer at a significantly lower cost.
While the cost and complexity of analysis can improve over time, I am more excited about the advent of third-generation (3-gen) sequencing. With a lower rate of sequencing error (not enzyme based), no need for amplification, and deeper sequencing, the future of RNA-seq is certainly promising.