SINCERA: A Computational Pipeline for Single Cell RNA-seq Profiling Analysis

=================================
DESCRIPTION
=================================

Author: Minzhe Guo (minzhe.guo@cchmc.org)
Version: a10142015
License: GNU General Public License v3

This is the R source code of the SINCERA analytic pipeline for the analysis of single-cell RNA-seq data from heterogeneous cell populations.

To use this script, you will need the R statistical computing environment (version 3.2.0 or later) and several packages freely available through Bioconductor and CRAN.

SINCERA is in the ALPHA stage of development. Core features have been implemented. We are improving the documentation and features and refining the interfaces.

SINCERA is under heavy active development. Updates will be distributed primarily through the SINCERA website at http://research.cchmc.org/pbge/sincera.html. We are also in the process of submitting the package to Bioconductor (http://www.bioconductor.org).

If you publish results obtained using SINCERA, please cite:

Guo M, Wang H, Potter S, Whitsett JA, Xu Y. 2015. SINCERA: A Computational Pipeline for Single Cell RNA-seq Profiling Analysis. PLoS Computational Biology. In Press.

=================================
FILES
=================================

1) E16.5.Rd: R data object file containing single-cell RNA-seq expression data from mouse lung at E16.5, and gene-celltype association data downloaded from EBI Expression Atlas.
    expressions: the FPKM values of 36188 Ensembl genes in 148 cells
    cells: the information of 148 cells; the cluster membership of cells used in the manuscript is encoded in the column "CLUSTER"
    genes: the information of 36188 Ensembl genes
    mouse.ribosomal.genes: a list of ribosomal genes for determining a threshold for the specificity filter
    associations.01112014: processed cell type and gene association data downloaded from EBI Expression Atlas (http://www.ebi.ac.uk/gxa/)
    Nkx2.1_data: data for demonstrating the consensus-maximization-based refinement of regulatory target prediction for Nkx2-1 in epithelial cells
2) sincera: R functions to implement the pipeline.
3) demo.R: R script that uses SINCERA to analyze the E16.5 data.
4) markers.txt: cell type markers in tab-delimited format.

=================================
INSTALLATION AND RUNNING THE DEMO
=================================

1) Download and unzip the SINCERA package.
2) Open the R GUI (instructions for downloading and installing the latest version of the R computing environment can be found at http://cran.rstudio.com/).
3) Change the working directory of the R GUI to the SINCERA directory.
4) Run the demo by sourcing demo.R in the R GUI.

=================================
DEPENDENCIES
=================================

The following R and Bioconductor packages are needed by SINCERA:

* Biobase
* ROCR
* RobustRankAggreg
* G1DBN
* igraph
* ggplot2
* grid
* ggdendro
* plyr
* zoo

SINCERA will try to resolve the dependencies automatically. If the dependencies cannot be resolved, please try the following scripts for installation or refer to the website of each package for more information.

# Bioconductor package.
# Note: on R 3.5.0 or later, biocLite is no longer available;
# use BiocManager::install("Biobase") instead.
if (!require(Biobase)) {
    source("http://bioconductor.org/biocLite.R")
    biocLite(c("Biobase"))
    library(Biobase)
}
if (!require(ROCR)) {
    install.packages("ROCR")
    library(ROCR)
}
if (!require(RobustRankAggreg)) {
    install.packages("RobustRankAggreg")
    library(RobustRankAggreg)
}
if (!require(G1DBN)) {
    install.packages("G1DBN")
    library(G1DBN)
}
if (!require(igraph)) {
    install.packages("igraph")
    library(igraph)
}

# Visualization Dependent Packages
if (!require(ggplot2)) {
    install.packages("ggplot2")
    library(ggplot2)
}
if (!require(grid)) {
    install.packages("grid")
    library(grid)
}
if (!require(ggdendro)) { # for dendrogram visualization
    install.packages("ggdendro")
    library(ggdendro)
}
if (!require(plyr)) { # for dendrogram visualization
    install.packages("plyr")
    library(plyr)
}
if (!require(zoo)) { # for dendrogram visualization
    install.packages("zoo")
    library(zoo)
}

Additional dependencies

tightClust is required for cell cluster identification based on tight clustering.
ConsensusClusterPlus is required for cell cluster identification based on consensus clustering.
samr is required for the differential expression test based on SAMseq.

The following packages are required for performing cell type enrichment analysis with mouse scRNA-seq data using Entrez Gene or Entrez Symbol as unique identifiers:

    AnnotationDbi
    org.Mm.eg.db

The following packages are required for performing cell type enrichment analysis with human scRNA-seq data:

    AnnotationDbi
    org.Mm.eg.db
    hom.Hs.inp.db
    hom.Mm.inp.db
    org.Hs.eg.db

Change log

* add function plotProfiles
* rename cluster.export4viz to expression.export4viz
* improve functions expression.export4viz, get.diff.genes, consensus_maximization
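
=================================
EXAMPLE: CHECKING DEPENDENCIES
=================================

Before running the demo, it may help to see in one pass which of the packages listed under DEPENDENCIES are missing. The sketch below is not part of SINCERA: the helper name missing_packages and the variables core and optional are hypothetical, introduced here only for illustration. It reports missing packages and installs nothing. The package name ConsensusClusterPlus is the name under which the consensus clustering package is distributed on Bioconductor.

```r
# Hypothetical helper (not part of SINCERA): return the names of the
# packages in `pkgs` that are not installed in the current R library.
# requireNamespace() loads a package's namespace without attaching it.
missing_packages <- function(pkgs) {
    installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!installed]
}

# Package names taken from the DEPENDENCIES section of this README.
core <- c("Biobase", "ROCR", "RobustRankAggreg", "G1DBN", "igraph",
          "ggplot2", "grid", "ggdendro", "plyr", "zoo")

# Optional packages for clustering and differential expression.
optional <- c("tightClust", "ConsensusClusterPlus", "samr")

missing_core <- missing_packages(core)
if (length(missing_core) > 0) {
    message("Missing required packages: ",
            paste(missing_core, collapse = ", "))
} else {
    message("All required packages are installed.")
}

missing_optional <- missing_packages(optional)
if (length(missing_optional) > 0) {
    message("Missing optional packages: ",
            paste(missing_optional, collapse = ", "))
}
```

Any package reported as missing can then be installed with the scripts in the DEPENDENCIES section. The check is kept separate from installation so that nothing is downloaded without the user's knowledge.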