4.3 From multiple samples

What if you have multiple samples for one project? It is now common to profile multiple samples of the same tissue under different conditions (e.g. drug treatment) in one project. Analyzing these samples one by one is crucial, and analyzing them together may provide additional insights. For this purpose, scMINER provides a function, combineSparseEset(), to combine the sparse eSet objects of multiple samples.

## create a sparse eSet object for each sample to be combined
demo1_mtx <- readInput_10x.dir(input_dir = system.file("extdata/demo_inputs/cell_matrix_10x", package = "scMINER"), featureType = "gene_symbol", removeSuffix = TRUE)

## Reading 10x Genomcis data from: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/scMINER/extdata/demo_inputs/cell_matrix_10x ...
##  Multiple data modalities were found: Gene Expression, Peaks . Only the gene expression data (under "Gene Expression") was kept.
## Done! The sparse gene expression matrix has been generated: 500 genes, 100 cells.

demo1.eset <- createSparseEset(input_matrix = demo1_mtx, projectID = "demo1", addMetaData = TRUE)

## Creating sparse eset from the input_matrix ...
##  Adding meta data based on input_matrix ...
## Done! The sparse eset has been generated: 500 genes, 100 cells.

demo2_mtx <- readInput_table(table_file = system.file("extdata/demo_inputs/table_file/demoData3.txt.gz", package = "scMINER"), sep = "\t", is.geneBYcell = TRUE, removeSuffix = TRUE)

## Reading table file: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/scMINER/extdata/demo_inputs/table_file/demoData3.txt.gz ...
##  Suffix removal was specified but skipped, since some barcodes do not carry "-1" suffix.
## Done! The sparse gene expression matrix has been generated: 1000 genes, 100 cells.

demo2.eset <- createSparseEset(input_matrix = demo2_mtx, projectID = "demo2", addMetaData = TRUE)

## Creating sparse eset from the input_matrix ...
##  Adding meta data based on input_matrix ...
## Done! The sparse eset has been generated: 1000 genes, 100 cells.

## combine the 2 sparse eSet objects
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following object is masked from 'package:Biobase':
## 
##     combine

## The following objects are masked from 'package:BiocGenerics':
## 
##     combine, intersect, setdiff, union

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

combined.eset <- combineSparseEset(eset_list = c(demo1.eset, demo2.eset),
                                   projectID = c("sample1", "sample2"),
                                   addPrefix = c("demo1", "demo2"),
                                   addSurfix = NULL, addMetaData = TRUE, imputeNA = TRUE)

## Combining the input sparse eSets ...
## NA values were found in the merged matrix and have been replaced by the minimum value:  0 .
## Adding meta data based on merged data matrix ...
## Done! The combined sparse eset has been generated: 1500 genes, 200 cells.

dim(combined.eset)

## Features  Samples 
##     1500      200

A few questions you may have about the combineSparseEset() function:

What if the input eSets have different features? combineSparseEset() always keeps all features from the input eSets and generates NA values wherever data is not available. By default, the function imputes the NA values with the minimum value of the combined matrix, which is usually, but not always, zero. If this imputation method doesn’t fit your study, you can set imputeNA to FALSE to disable it. The NAs are then retained in the eSet object, and you can impute them manually with your own method.
What if the input eSets share some cell barcodes? combineSparseEset() always keeps all cells from the input eSets and reports an error when the same barcode is found in different input eSets. The function provides two arguments, addPrefix and addSurfix, to solve this issue. You can avoid duplicate barcodes across input eSets by adding an eSet-specific prefix and/or suffix to the barcodes.
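The feature handling described above can be illustrated with a minimal base R sketch. It does not call scMINER or reproduce its internals: the toy matrices m1 and m2 and all object names are hypothetical, and the code only mimics the idea of keeping the union of features, leaving NA where a sample has no data, and then filling the NAs with the minimum observed value.

```r
## Two toy gene-by-cell matrices with partly different genes (hypothetical data)
m1 <- matrix(c(1, 0, 3, 2), nrow = 2,
             dimnames = list(c("GeneA", "GeneB"), c("s1_cell1", "s1_cell2")))
m2 <- matrix(c(5, 4, 0, 6), nrow = 2,
             dimnames = list(c("GeneB", "GeneC"), c("s2_cell1", "s2_cell2")))

## Keep the union of features; entries a sample does not provide start as NA
all_genes <- union(rownames(m1), rownames(m2))
merged <- matrix(NA_real_,
                 nrow = length(all_genes), ncol = ncol(m1) + ncol(m2),
                 dimnames = list(all_genes, c(colnames(m1), colnames(m2))))
merged[rownames(m1), colnames(m1)] <- m1
merged[rownames(m2), colnames(m2)] <- m2

## Count the NAs introduced by the non-shared genes
n_missing <- sum(is.na(merged))

## Fill the NAs with the minimum observed value of the merged matrix
imputed <- merged
imputed[is.na(imputed)] <- min(merged, na.rm = TRUE)
```

If the minimum is not a sensible fill value for your data, the same pattern applies with any other replacement: keep the NAs and assign your own values to the positions returned by is.na().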

head(pData(combined.eset))

##                                        CellID projectID nUMI nFeature pctMito
## demo1_AAACAGCCAAACGGGC demo1_AAACAGCCAAACGGGC   sample1  119       43       0
## demo1_AAACAGCCAACTAGCC demo1_AAACAGCCAACTAGCC   sample1   55       28       0
## demo1_AAACAGCCAATTAGGA demo1_AAACAGCCAATTAGGA   sample1   45       20       0
## demo1_AAACAGCCAGCCAGTT demo1_AAACAGCCAGCCAGTT   sample1  175       44       0
## demo1_AAACATGCAAAGCTCC demo1_AAACATGCAAAGCTCC   sample1   51       31       0
## demo1_AAACATGCAATAGCCC demo1_AAACATGCAATAGCCC   sample1  121       44       0
##                        pctSpikeIn
## demo1_AAACAGCCAAACGGGC          0
## demo1_AAACAGCCAACTAGCC          0
## demo1_AAACAGCCAATTAGGA          0
## demo1_AAACAGCCAGCCAGTT          0
## demo1_AAACATGCAAAGCTCC          0
## demo1_AAACATGCAATAGCCC          0
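Before combining, it can help to check whether any barcodes are shared between samples. The following is a small base R sketch with hypothetical barcode vectors (bc1, bc2); the underscore separator is an assumption taken from the CellID values printed above, not from the scMINER source.

```r
## Hypothetical barcodes from two samples; the first one appears in both
bc1 <- c("AAACAGCCAAACGGGC", "AAACAGCCAACTAGCC")
bc2 <- c("AAACAGCCAAACGGGC", "AAACATGCAAAGCTCC")

## Barcodes present in both samples would collide after combining
shared <- intersect(bc1, bc2)

## Adding a sample-specific prefix makes every barcode unique
bc1_prefixed <- paste("demo1", bc1, sep = "_")
bc2_prefixed <- paste("demo2", bc2, sep = "_")

## anyDuplicated() returns 0 when no duplicates remain
n_dup <- anyDuplicated(c(bc1_prefixed, bc2_prefixed))
```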

I have some customized columns in the phenoData and/or featureData slots. How does combineSparseEset() handle them? combineSparseEset() only keeps the columns of phenoData and featureData that are shared by all input eSets. Your customized columns are kept only when they are available in all input eSets.
Are the 5 meta data statistics in the combined eSet still the same as those generated in each eSet? No. By default, combineSparseEset() updates (or adds, if they are not available in the input eSets) these 5 meta data statistics based on the combined matrix. It’s not recommended, but you can disable this by setting addMetaData to FALSE.
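To see which customized columns would survive, you can compare the column names of the metadata tables before combining. The sketch below uses two hypothetical data frames (pd1, pd2) in place of real phenoData tables and plain base R; it only illustrates the "shared columns only" rule and is not the scMINER implementation.

```r
## Hypothetical phenoData tables; "treatment" and "batch" are customized columns
pd1 <- data.frame(CellID = c("demo1_c1", "demo1_c2"),
                  nUMI = c(119, 55),
                  treatment = c("drug", "drug"))
pd2 <- data.frame(CellID = "demo2_c1",
                  nUMI = 45,
                  batch = "b2")

## Columns present in every table are the only ones that can be kept
shared_cols <- Reduce(intersect, list(colnames(pd1), colnames(pd2)))

## Columns that would be dropped from each table
dropped_pd1 <- setdiff(colnames(pd1), shared_cols)
dropped_pd2 <- setdiff(colnames(pd2), shared_cols)

## Stack the tables using the shared columns only
combined_pd <- rbind(pd1[, shared_cols, drop = FALSE],
                     pd2[, shared_cols, drop = FALSE])
```

If a customized column matters for downstream analysis, one option is to add it to every input table before combining (filling it with NA where it does not apply) so that it counts as shared.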