4.2 Using self-customized meta data

In some cases, you may have additional meta data for cells (e.g. sample ID, treatment condition) or features (e.g. full gene name, gene type, genomic location) that will be used in downstream analysis and that you want to store in the sparse eSet object. The createSparseEset() function provides two further arguments, cellData and featureData, to accept this self-customized meta data. For the PBMC14k dataset, the true cell type labels are available, and we would like to add them to the sparse eSet object.

## read the true labels of cell type for PBMC14k dataset
true_label <- read.table(
  system.file("extdata/demo_pbmc14k/PBMC14k_trueLabel.txt.gz", package = "scMINER"),
  header = TRUE, row.names = 1, sep = "\t", quote = "", stringsAsFactors = FALSE
)

head(true_label)
##                trueLabel_full trueLabel
## CACTTTGACGCAAT CD14+ Monocyte  Monocyte
## GTTACGGAAACGAA CD14+ Monocyte  Monocyte
## AGTCACGACAGGAG CD14+ Monocyte  Monocyte
## TTCGAGGACCAGTA CD14+ Monocyte  Monocyte
## CACTTATGAGTCGT CD14+ Monocyte  Monocyte
## GCATGTGATTCTGT CD14+ Monocyte  Monocyte
table(true_label$trueLabel_full)
## 
##               CD14+ Monocyte                      CD19+ B 
##                         2000                         2000 
##              CD4+/CD25 T Reg   CD4+/CD45RA+/CD25- Naive T 
##                         2000                         2000 
##          CD4+/CD45RO+ Memory                     CD56+ NK 
##                         2000                         2000 
## CD8+/CD45RA+ Naive Cytotoxic 
##                         2000
## the true_label must cover all cells in the expression matrix
table(colnames(pbmc14k_rawCount) %in% row.names(true_label))
## 
##  TRUE 
## 14000
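The check above confirms that every cell in the matrix has a label. As a minimal sketch on toy data, the same idea can be extended to list any missing barcodes and to put the meta data rows in the same order as the matrix columns. The objects `toy_counts`, `toy_label` and `toy_label_aligned` are hypothetical and not part of scMINER. Whether createSparseEset() requires pre-ordered rows is not stated here, so treat the reordering as a defensive step rather than a requirement.

```r
# Toy count matrix: 3 genes x 4 cells (hypothetical data for illustration)
toy_counts <- matrix(
  1:12, nrow = 3,
  dimnames = list(paste0("gene", 1:3), c("cellA", "cellB", "cellC", "cellD"))
)

# Toy meta data, deliberately in a different order and with one extra cell
toy_label <- data.frame(
  trueLabel = c("NK", "B", "Monocyte", "B", "NK"),
  row.names = c("cellD", "cellB", "cellA", "cellC", "cellE"),
  stringsAsFactors = FALSE
)

# Barcodes present in the matrix but absent from the meta data
missing_cells <- setdiff(colnames(toy_counts), row.names(toy_label))
stopifnot(length(missing_cells) == 0)

# Keep only the cells found in the matrix, in matrix column order
toy_label_aligned <- toy_label[colnames(toy_counts), , drop = FALSE]
stopifnot(identical(row.names(toy_label_aligned), colnames(toy_counts)))
```

Using `setdiff()` instead of `table(... %in% ...)` reports which barcodes are missing, which is easier to act on when the check fails.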
## create the sparse eSet object using the true_label
pbmc14k_raw.eset <- createSparseEset(input_matrix = pbmc14k_rawCount, cellData = true_label, featureData = NULL, projectID = "PBMC14k", addMetaData = TRUE)
## Creating sparse eset from the input_matrix ...
##  Adding meta data based on input_matrix ...
## Done! The sparse eset has been generated: 17986 genes, 14000 cells.
## check the true labels of cell type from sparse eSet object
head(pData(pbmc14k_raw.eset))
##                trueLabel_full trueLabel projectID nUMI nFeature    pctMito
## CACTTTGACGCAAT CD14+ Monocyte  Monocyte   PBMC14k  764      354 0.01832461
## GTTACGGAAACGAA CD14+ Monocyte  Monocyte   PBMC14k  956      442 0.01569038
## AGTCACGACAGGAG CD14+ Monocyte  Monocyte   PBMC14k 7940     2163 0.01977330
## TTCGAGGACCAGTA CD14+ Monocyte  Monocyte   PBMC14k 4177     1277 0.01149150
## CACTTATGAGTCGT CD14+ Monocyte  Monocyte   PBMC14k  629      323 0.02066773
## GCATGTGATTCTGT CD14+ Monocyte  Monocyte   PBMC14k  875      427 0.02628571
##                pctSpikeIn         CellID
## CACTTTGACGCAAT          0 CACTTTGACGCAAT
## GTTACGGAAACGAA          0 GTTACGGAAACGAA
## AGTCACGACAGGAG          0 AGTCACGACAGGAG
## TTCGAGGACCAGTA          0 TTCGAGGACCAGTA
## CACTTATGAGTCGT          0 CACTTATGAGTCGT
## GCATGTGATTCTGT          0 GCATGTGATTCTGT
table(pData(pbmc14k_raw.eset)$trueLabel_full)
## 
##               CD14+ Monocyte                      CD19+ B 
##                         2000                         2000 
##              CD4+/CD25 T Reg   CD4+/CD45RA+/CD25- Naive T 
##                         2000                         2000 
##          CD4+/CD45RO+ Memory                     CD56+ NK 
##                         2000                         2000 
## CD8+/CD45RA+ Naive Cytotoxic 
##                         2000
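The featureData argument was left as NULL in the example above. Below is a minimal sketch of preparing a feature-level table in base R. It assumes, by analogy with cellData, that the row names of the table should match the row names of the input matrix; this is an assumption, not something confirmed in this section. The names `toy_matrix` and `toy_feature` and all annotation values are invented placeholders.

```r
# Toy matrix: 3 genes x 2 cells (hypothetical data for illustration)
toy_matrix <- matrix(
  0, nrow = 3, ncol = 2,
  dimnames = list(c("gene1", "gene2", "gene3"), c("cellA", "cellB"))
)

# Toy feature meta data, one row per gene, row names are gene identifiers
toy_feature <- data.frame(
  geneType   = c("lncRNA", "protein_coding", "protein_coding"),
  chromosome = c("chr2", "chr1", "chrX"),
  row.names  = c("gene3", "gene1", "gene2"),
  stringsAsFactors = FALSE
)

# Assumed requirement: every feature in the matrix has a meta data row
stopifnot(all(row.names(toy_matrix) %in% row.names(toy_feature)))

# Put the meta data rows in the same order as the matrix rows
toy_feature <- toy_feature[row.names(toy_matrix), , drop = FALSE]
```

A table prepared this way could then be supplied through the featureData argument of createSparseEset() in place of NULL.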