4.3 From multiple samples
What if your project has multiple samples? It is now common to profile multiple samples of the same tissue under different conditions (e.g. drug treatment) in one project. Analyzing these samples one by one is crucial, but analyzing them together may offer additional insights. For this purpose, scMINER provides a function, combineSparseEset(), to easily combine the sparse eSet objects of multiple samples.
## create a sparse eSet object for each sample to be combined
demo1_mtx <- readInput_10x.dir(input_dir = system.file("extdata/demo_inputs/cell_matrix_10x", package = "scMINER"), featureType = "gene_symbol", removeSuffix = TRUE)
## Reading 10x Genomcis data from: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/scMINER/extdata/demo_inputs/cell_matrix_10x ...
## Multiple data modalities were found: Gene Expression, Peaks . Only the gene expression data (under "Gene Expression") was kept.
## Done! The sparse gene expression matrix has been generated: 500 genes, 100 cells.
demo1.eset <- createSparseEset(input_matrix = demo1_mtx, projectID = "demo1", addMetaData = TRUE)
## Creating sparse eset from the input_matrix ...
## Adding meta data based on input_matrix ...
## Done! The sparse eset has been generated: 500 genes, 100 cells.
demo2_mtx <- readInput_table(table_file = system.file("extdata/demo_inputs/table_file/demoData3.txt.gz", package = "scMINER"), sep = "\t", is.geneBYcell = TRUE, removeSuffix = TRUE)
## Reading table file: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/scMINER/extdata/demo_inputs/table_file/demoData3.txt.gz ...
## Suffix removal was specified but skipped, since some barcodes do not carry "-1" suffix.
## Done! The sparse gene expression matrix has been generated: 1000 genes, 100 cells.
demo2.eset <- createSparseEset(input_matrix = demo2_mtx, projectID = "demo2", addMetaData = TRUE)
## Creating sparse eset from the input_matrix ...
## Adding meta data based on input_matrix ...
## Done! The sparse eset has been generated: 1000 genes, 100 cells.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:Biobase':
##
## combine
## The following objects are masked from 'package:BiocGenerics':
##
## combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
combined.eset <- combineSparseEset(eset_list = c(demo1.eset, demo2.eset),
                                   projectID = c("sample1", "sample2"),
                                   addPrefix = c("demo1", "demo2"),
                                   addSurfix = NULL, addMetaData = TRUE, imputeNA = TRUE)
## Combining the input sparse eSets ...
## NA values were found in the merged matrix and have been replaced by the minimum value: 0 .
## Adding meta data based on merged data matrix ...
## Done! The combined sparse eset has been generated: 1500 genes, 200 cells.
dim(combined.eset)
## Features Samples
## 1500 200
A few questions you may have about the combineSparseEset() function:
- What if the input eSets have different features? combineSparseEset() ALWAYS keeps all features from the input eSets, and generates NA values wherever the data is not available. By default, this function imputes the NA values with the minimum value of the combined matrix, which is usually but not always zero. If this imputation method doesn't fit your study, you can set imputeNA to FALSE to disable it. In that case, the NAs will remain in the eSet object, and you can impute them manually with your own method.
- What if the input eSets share some cell barcodes? combineSparseEset() ALWAYS keeps all cells from the input eSets, and reports an error when the same barcode is found in different input eSets. This function provides two arguments, addPrefix and addSurfix, to solve this issue. You can easily avoid duplicated barcodes across input eSets by adding an eSet-specific prefix and/or suffix to the barcodes.
## check the phenoData of the combined eSet: barcodes now carry the sample-specific prefix
head(pData(combined.eset))
##                                        CellID projectID nUMI nFeature pctMito
## demo1_AAACAGCCAAACGGGC demo1_AAACAGCCAAACGGGC   sample1  119       43       0
## demo1_AAACAGCCAACTAGCC demo1_AAACAGCCAACTAGCC   sample1   55       28       0
## demo1_AAACAGCCAATTAGGA demo1_AAACAGCCAATTAGGA   sample1   45       20       0
## demo1_AAACAGCCAGCCAGTT demo1_AAACAGCCAGCCAGTT   sample1  175       44       0
## demo1_AAACATGCAAAGCTCC demo1_AAACATGCAAAGCTCC   sample1   51       31       0
## demo1_AAACATGCAATAGCCC demo1_AAACATGCAATAGCCC   sample1  121       44       0
##                        pctSpikeIn
## demo1_AAACAGCCAAACGGGC          0
## demo1_AAACAGCCAACTAGCC          0
## demo1_AAACAGCCAATTAGGA          0
## demo1_AAACAGCCAGCCAGTT          0
## demo1_AAACATGCAAAGCTCC          0
## demo1_AAACATGCAATAGCCC          0
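To make the behavior described in the first two questions concrete, here is a minimal base-R sketch on toy matrices. It is not scMINER's implementation: the matrices m1 and m2, the helper expand_rows(), and the barcodes are all hypothetical, chosen only to illustrate barcode prefixing, the union of features with NA filling, and imputation with the minimum value.

```r
# Hypothetical toy data: two gene-by-cell count matrices with different
# features and one shared barcode ("AAAC").
m1 <- matrix(c(1, 0, 3, 2), nrow = 2,
             dimnames = list(c("geneA", "geneB"), c("AAAC", "AAAG")))
m2 <- matrix(c(5, 4, 0, 7), nrow = 2,
             dimnames = list(c("geneB", "geneC"), c("AAAC", "AAAT")))

# Step 1: make barcodes unique by adding a sample-specific prefix.
colnames(m1) <- paste0("demo1_", colnames(m1))
colnames(m2) <- paste0("demo2_", colnames(m2))
stopifnot(!anyDuplicated(c(colnames(m1), colnames(m2))))

# Step 2: keep the union of features; a feature that is missing from a
# sample is filled with NA for all cells of that sample.
all_genes <- union(rownames(m1), rownames(m2))
expand_rows <- function(m, genes) {  # hypothetical helper
  out <- matrix(NA_real_, nrow = length(genes), ncol = ncol(m),
                dimnames = list(genes, colnames(m)))
  out[rownames(m), ] <- m
  out
}
merged <- cbind(expand_rows(m1, all_genes), expand_rows(m2, all_genes))

# Step 3: replace NA values with the minimum observed value of the merged
# matrix (zero for this toy data, but not necessarily zero in general).
imputed <- merged
imputed[is.na(imputed)] <- min(merged, na.rm = TRUE)
```

If minimum-value imputation does not suit your data, the merged matrix from step 2 is the point at which a custom imputation method would be applied instead.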
- I have some customized columns in the phenoData and/or featureData slots. How does combineSparseEset() handle them? combineSparseEset() only keeps the columns of phenoData and featureData that are shared by all input eSets. Your customized columns are kept only when they are available in all input eSets.
- Are the 5 meta data statistics in the combined eSet still the same as those generated in each eSet? No. By default, combineSparseEset() updates these 5 meta data statistics (or adds them, if they are not available in the input eSets) based on the combined matrix. It's not recommended, but you can disable this by setting addMetaData to FALSE.
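The last two answers can be illustrated with a small base-R sketch. It is not scMINER's code: the data frames pd1 and pd2, their columns treatment and batch, and the toy matrix are hypothetical. The definitions used here for nUMI (total counts per cell) and nFeature (number of features with non-zero counts per cell) are assumptions made for illustration.

```r
# Hypothetical per-sample cell annotations: each sample has one customized
# column that the other sample lacks.
pd1 <- data.frame(CellID = c("demo1_c1", "demo1_c2"),
                  projectID = "sample1",
                  treatment = c("drug", "drug"))
pd2 <- data.frame(CellID = c("demo2_c1", "demo2_c2"),
                  projectID = "sample2",
                  batch = c("b1", "b1"))

# Keep only the columns shared by all inputs, then stack the rows.
pd_list <- list(pd1, pd2)
shared_cols <- Reduce(intersect, lapply(pd_list, colnames))
combined_pd <- do.call(rbind, lapply(pd_list, function(pd) {
  pd[, shared_cols, drop = FALSE]
}))

# Recompute per-cell statistics from the combined matrix rather than
# carrying over the values stored in each input.
combined_mtx <- matrix(c(2, 0, 1, 0, 0, 3, 4, 1), nrow = 2,
                       dimnames = list(c("geneA", "geneB"), combined_pd$CellID))
combined_pd$nUMI <- colSums(combined_mtx)          # assumed: total counts per cell
combined_pd$nFeature <- colSums(combined_mtx > 0)  # assumed: detected features per cell
```

In this sketch, treatment and batch are dropped because neither appears in both inputs; adding the same column to every input before combining would preserve it.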