
Identification of Sources of Platform Specific Bias in Single Cell RNA Sequencing

Life Sciences


High-throughput single-cell transcriptomics is a powerful tool for unbiased, marker-free discovery of new cell types and activation states. The process involves reverse transcribing RNA using beads that carry oligonucleotide barcodes, followed by whole-transcriptome amplification, so that a barcoded cDNA library is produced. This library can then be sequenced and aligned to a reference genome to create a gene expression table. Open-source packages like Seurat can read in such data and convert them into sparse matrices to identify genes that vary across cells for subsequent analysis. These packages allow users to perform clustering and dimensionality reduction before finding markers specific to groups of cells, visualizing data, and identifying cell-specific responses to varying conditions. However, the final stage of this process has proven difficult, because different libraries have often been sequenced on different platforms before their gene expression data are compared. In fact, preliminary studies have shown that, when cell-specific responses to treatments are visualized via dimensionality reduction, subpopulations of cells appear to cluster based largely on ribosomal genes, whose detection has been shown to vary between platforms. Thus, in order to confirm the validity of studies that combine data from both sequencing platforms, it is important to examine why this pattern persists and to determine the sources of bias arising from each platform. To this end, we have compared sequencing results from five genomic libraries, each sequenced on both NextSeq and HiSeq instruments. Here we report the identification of a systematic bias in the detection of specific genes and, using computational and statistical approaches, demonstrate how this bias originates during data acquisition, propagates through bioinformatics pipelines, and affects the estimation of differentially expressed genes. Our findings are of high importance for large-scale integrative studies, such as the Human Cell Atlas project. We also propose computational approaches for mitigating this bias.
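
To make the ribosomal-gene observation concrete, here is a minimal sketch of one way to compare the share of counts assigned to ribosomal protein genes between two sequencing runs of the same library. It is not the authors' pipeline: the function names, the toy count tables, and the choice of a simple `RPS`/`RPL` gene-symbol prefix match are all assumptions made for illustration. The prefix match is crude; for example, it would also catch kinase genes such as `RPS6KA1`, so a curated gene list would be preferable in practice.

```python
from statistics import mean

# Assumption: human ribosomal protein genes are identified by symbol prefix.
RIBOSOMAL_PREFIXES = ("RPS", "RPL")


def ribosomal_fraction(cell_counts):
    """Return the fraction of one cell's total counts that come from
    genes whose symbol starts with a ribosomal prefix."""
    total = sum(cell_counts.values())
    if total == 0:
        return 0.0  # empty cell: avoid division by zero
    ribo = sum(
        count
        for gene, count in cell_counts.items()
        if gene.upper().startswith(RIBOSOMAL_PREFIXES)
    )
    return ribo / total


def mean_ribosomal_fraction(cells):
    """Average the per-cell ribosomal fraction over a list of cells."""
    return mean(ribosomal_fraction(cell) for cell in cells)


# Hypothetical toy data: the same two cells, as counted on two platforms.
platform_a = [
    {"RPS6": 30, "RPL13": 20, "ACTB": 50},
    {"RPS6": 10, "RPL13": 10, "ACTB": 80},
]
platform_b = [
    {"RPS6": 10, "RPL13": 10, "ACTB": 80},
    {"RPS6": 5, "RPL13": 5, "ACTB": 90},
]

# A consistently nonzero difference across libraries would hint at a
# platform-dependent shift in ribosomal gene detection.
diff = mean_ribosomal_fraction(platform_a) - mean_ribosomal_fraction(platform_b)
print(f"difference in mean ribosomal fraction: {diff:.2f}")
```

With real data, the per-cell dictionaries would be replaced by columns of the sparse count matrix, and the comparison would be repeated for each library sequenced on both instruments.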

Rohan Verma, et al.

Pulmonary and Critical Care

April, 2018

DOI: 10.21985/N2CX07