4.1 Solely from the gene expression matrix

This is the most commonly used way to create the sparse eSet object with scMINER:

pbmc14k_raw.eset <- createSparseEset(input_matrix = pbmc14k_rawCount, projectID = "PBMC14k", addMetaData = TRUE)

## Creating sparse eset from the input_matrix ...
##  Adding meta data based on input_matrix ...
## Done! The sparse eset has been generated: 17986 genes, 14000 cells.

pbmc14k_raw.eset

## SparseExpressionSet (storageMode: environment)
## assayData: 17986 features, 14000 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: CACTTTGACGCAAT GTTACGGAAACGAA ... ACGTGCCTTAAAGG (14000
##     total)
##   varLabels: CellID projectID ... pctSpikeIn (6 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: AL627309.1 AP006222.2 ... SRSF10.1 (17986 total)
##   fvarLabels: GeneSymbol nCell
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:

input_matrix: usually, but not necessarily, a sparse matrix of raw UMI counts.
- Data format: it accepts dgCMatrix, dgTMatrix, dgeMatrix, matrix and data.frame objects.
- Quantification measure: it takes raw counts, normalized counts (e.g. CPM or CP10k), TPM (Transcripts Per Million), FPKM/RPKM (Fragments/Reads Per Kilobase of transcript per Million) and others.
- What if a data frame is given? When a non-matrix table is passed to the input_matrix argument, createSparseEset() automatically converts it to a matrix. If the matrix, whether converted from another format or passed directly by the user, is not sparse, createSparseEset() converts it into a sparse matrix by default. This is controlled by another argument, do.sparseConversion, whose default is TRUE. Setting it to FALSE disables the conversion, and createSparseEset() then creates the eSet from the regular matrix; this is not recommended.
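The two conversion steps described above can be illustrated outside scMINER. The following is a minimal sketch using only the Matrix package; the toy data frame and the object names (toy_df, toy_dense, toy_sparse) are hypothetical, and this is not necessarily how createSparseEset() performs the conversion internally.

```r
library(Matrix)

# Hypothetical toy input: 4 genes x 3 cells stored as a data frame
toy_df <- data.frame(
  cell1 = c(0, 3, 0, 1),
  cell2 = c(2, 0, 0, 0),
  cell3 = c(0, 0, 5, 0),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Step 1: non-matrix table -> regular (dense) matrix
toy_dense <- as.matrix(toy_df)

# Step 2: dense matrix -> column-compressed sparse matrix
toy_sparse <- as(toy_dense, "CsparseMatrix")

# Inspect the result: class, dimensions and number of stored non-zero values
class(toy_sparse)
dim(toy_sparse)
length(toy_sparse@x)
```

Only the non-zero entries are stored in the sparse representation, which is why the conversion is worthwhile for single-cell count matrices, where most entries are zero.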
addMetaData: when this argument is set to TRUE (the default), createSparseEset() automatically generates 5 statistics, 4 for cells (nUMI, nFeature, pctMito and pctSpikeIn) and 1 for features (nCell), and adds them to the phenoData and featureData slots. These 5 statistics are used in quality control and data filtering.

## check the phenoData: metadata of cells
head(pData(pbmc14k_raw.eset))

##                        CellID projectID nUMI nFeature    pctMito pctSpikeIn
## CACTTTGACGCAAT CACTTTGACGCAAT   PBMC14k  764      354 0.01832461          0
## GTTACGGAAACGAA GTTACGGAAACGAA   PBMC14k  956      442 0.01569038          0
## AGTCACGACAGGAG AGTCACGACAGGAG   PBMC14k 7940     2163 0.01977330          0
## TTCGAGGACCAGTA TTCGAGGACCAGTA   PBMC14k 4177     1277 0.01149150          0
## CACTTATGAGTCGT CACTTATGAGTCGT   PBMC14k  629      323 0.02066773          0
## GCATGTGATTCTGT GCATGTGATTCTGT   PBMC14k  875      427 0.02628571          0

## check the featureData: metadata of features
head(fData(pbmc14k_raw.eset))

##                  GeneSymbol nCell
## AL627309.1       AL627309.1    50
## AP006222.2       AP006222.2     2
## RP11-206L10.3 RP11-206L10.3     1
## RP11-206L10.2 RP11-206L10.2    33
## RP11-206L10.9 RP11-206L10.9    17
## LINC00115         LINC00115   115
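
To make the meaning of the five statistics concrete, the sketch below recomputes them by hand on a hypothetical toy count matrix using the Matrix package. It is an illustration under stated assumptions, not scMINER's own implementation: mitochondrial genes are assumed to be those whose names start with "MT-" and spike-ins those starting with "ERCC-", and pctMito and pctSpikeIn are computed as fractions of nUMI, in line with the values printed above.

```r
library(Matrix)

# Hypothetical toy counts: 5 genes x 3 cells, filled column by column
counts <- Matrix(
  c(4, 0, 1, 0, 5,   # cell1
    0, 2, 0, 0, 8,   # cell2
    3, 3, 0, 1, 3),  # cell3
  nrow = 5, ncol = 3, sparse = TRUE,
  dimnames = list(
    c("geneA", "geneB", "MT-CO1", "ERCC-00002", "geneC"),
    c("cell1", "cell2", "cell3")
  )
)

# Assumed naming patterns (not confirmed as scMINER's defaults)
is_mito  <- grepl("^MT-", rownames(counts))
is_spike <- grepl("^ERCC-", rownames(counts))

# Per-cell statistics
nUMI       <- colSums(counts)                 # total counts per cell
nFeature   <- colSums(counts > 0)             # genes detected per cell
pctMito    <- colSums(counts[is_mito, , drop = FALSE]) / nUMI
pctSpikeIn <- colSums(counts[is_spike, , drop = FALSE]) / nUMI

# Per-gene statistic
nCell <- rowSums(counts > 0)                  # cells in which each gene is detected

cell_stats <- data.frame(CellID = colnames(counts), nUMI, nFeature, pctMito, pctSpikeIn)
gene_stats <- data.frame(GeneSymbol = rownames(counts), nCell)

cell_stats
gene_stats
```

The resulting tables mirror the layout of pData() and fData() shown above, except that the projectID column is omitted here.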