8.2 Unsupervised cell type annotation

Reference studies from the same or a similar context are not always available, and their reliability is a real concern: it depends largely on the expertise of the original authors who defined the markers and assigned the cell types. We therefore strongly encourage users to also try unsupervised methods, which can serve as a form of cross-validation.

scMINER provides a function, getDE(), to perform differential expression analysis and identify the markers of each cluster. getDE() supports three methods for the differential expression analysis (limma, wilcoxon and t.test) and lets users define the groups to compare in a flexible way:

## 1. To perform differential expression analysis in a 1-vs-rest manner for all groups
de_res1 <- getDE(input_eset = pbmc14k_log2cpm.eset[500,], group_by = "clusterID", use_method = "limma")
## 7 groups were found in group_by column [ clusterID ].
## Since no group was specified, the differential analysis will be conducted among all groups in the group_by column [ clusterID ] in the 1-vs-rest manner.
##   1 / 7 : group 1 ( 1 ) vs the rest...
##   2505 cells were found for g1.
##   11100 cells were found for g0.
##   2 / 7 : group 1 ( 2 ) vs the rest...
##   2022 cells were found for g1.
##   11583 cells were found for g0.
##   3 / 7 : group 1 ( 3 ) vs the rest...
##   2014 cells were found for g1.
##   11591 cells were found for g0.
##   4 / 7 : group 1 ( 4 ) vs the rest...
##   1918 cells were found for g1.
##   11687 cells were found for g0.
##   5 / 7 : group 1 ( 5 ) vs the rest...
##   1912 cells were found for g1.
##   11693 cells were found for g0.
##   6 / 7 : group 1 ( 6 ) vs the rest...
##   1786 cells were found for g1.
##   11819 cells were found for g0.
##   7 / 7 : group 1 ( 7 ) vs the rest...
##   1448 cells were found for g1.
##   12157 cells were found for g0.
head(de_res1)
##  [1] feature g1_tag  g0_tag  g1_avg  g0_avg  g1_pct  g0_pct  log2FC  Pval   
## [10] FDR     Zscore 
## <0 rows> (or 0-length row.names)

Here is a brief introduction to the columns returned by getDE():

  • feature: feature name;
  • g1_tag: a vector of clusters or subgroups involved in g1, the foreground group;
  • g0_tag: a vector of clusters or subgroups involved in g0, the background group;
  • g1_avg: mean gene expression of the cells in g1;
  • g0_avg: mean gene expression of the cells in g0;
  • g1_pct: percentage of cells in g1 expressing the corresponding gene;
  • g0_pct: percentage of cells in g0 expressing the corresponding gene;
  • log2FC: log2 fold change of gene expression between g1 and g0;
  • Pval: P value of the g1-g0 comparison;
  • FDR: FDR of the g1-g0 comparison;
  • Zscore: Z score of the g1-g0 comparison, signed by log2FC.

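To make the column definitions above concrete, here is a minimal base-R sketch that computes analogous quantities on a made-up toy matrix. It is not the getDE() implementation. It assumes log2-scale input (so log2FC is taken as the difference of group means), reports the "pct" columns as fractions of cells with expression above zero, uses a Welch t-test with Benjamini-Hochberg adjustment, and derives the Z score from the two-sided P value; all of these choices are illustrative assumptions.

```r
# Toy log2-scale expression matrix: 3 genes x 8 cells (values are made up).
expr <- rbind(
  geneA = c(5, 6, 5, 7, 0, 0, 1, 0),
  geneB = c(0, 1, 0, 0, 4, 5, 4, 6),
  geneC = c(2, 2, 3, 2, 2, 3, 2, 2)
)
cluster <- c("1", "1", "1", "1", "2", "2", "2", "2")

in_g1 <- cluster == "1"   # foreground cells (g1)
in_g0 <- !in_g1           # background cells (g0), i.e. "the rest"

toy_de <- data.frame(
  feature = rownames(expr),
  g1_avg  = rowMeans(expr[, in_g1]),
  g0_avg  = rowMeans(expr[, in_g0]),
  g1_pct  = rowMeans(expr[, in_g1] > 0),  # assumption: fraction of cells with expression > 0
  g0_pct  = rowMeans(expr[, in_g0] > 0),
  stringsAsFactors = FALSE
)

# Assumption: on log2-scale data, log2FC is the difference of the group means.
toy_de$log2FC <- toy_de$g1_avg - toy_de$g0_avg

# Welch t-test per gene, then Benjamini-Hochberg adjustment across genes.
toy_de$Pval <- apply(expr, 1, function(x) t.test(x[in_g1], x[in_g0])$p.value)
toy_de$FDR  <- p.adjust(toy_de$Pval, method = "BH")

# Assumption: Z score from the two-sided P value, signed by log2FC.
toy_de$Zscore <- sign(toy_de$log2FC) * qnorm(toy_de$Pval / 2, lower.tail = FALSE)

print(toy_de)
```

In this toy example geneA is higher in g1, geneB is higher in g0, and geneC has identical means in both groups, so its log2FC and Z score are zero.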
## 2. To perform differential expression analysis in a 1-vs-rest manner for one specific group
de_res2 <- getDE(input_eset = pbmc14k_log2cpm.eset, group_by = "clusterID", g1 = c("1"), use_method = "limma")

## 3. To perform differential expression analysis in a rest-vs-1 manner for one specific group
de_res3 <- getDE(input_eset = pbmc14k_log2cpm.eset, group_by = "clusterID", g0 = c("1"), use_method = "limma")

## 4. To perform differential expression analysis in a 1-vs-1 manner for any two groups
de_res4 <- getDE(input_eset = pbmc14k_log2cpm.eset, group_by = "clusterID", g1 = c("1"), g0 = c("3"), use_method = "limma")
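
The three comparison modes above differ only in which cells end up in the foreground (g1) and background (g0). The sketch below illustrates one plausible reading of that logic in base R: an unspecified side defaults to "all remaining groups". The helper split_groups() is hypothetical and is not part of scMINER.

```r
# Hypothetical helper (not part of scMINER): resolve g1/g0 into cell indices.
split_groups <- function(labels, g1 = NULL, g0 = NULL) {
  if (is.null(g1) && is.null(g0)) stop("specify g1, g0, or both")
  all_groups <- unique(labels)
  if (is.null(g1)) g1 <- setdiff(all_groups, g0)  # rest-vs-g0: g1 is everything else
  if (is.null(g0)) g0 <- setdiff(all_groups, g1)  # g1-vs-rest: g0 is everything else
  list(g1 = which(labels %in% g1), g0 = which(labels %in% g0))
}

# Made-up cluster labels for 7 cells.
labels <- c("1", "1", "2", "3", "3", "3", "2")

one_vs_rest <- lengths(split_groups(labels, g1 = "1"))             # 1-vs-rest
rest_vs_one <- lengths(split_groups(labels, g0 = "1"))             # rest-vs-1
one_vs_one  <- lengths(split_groups(labels, g1 = "1", g0 = "3"))   # 1-vs-1

print(rbind(one_vs_rest, rest_vs_one, one_vs_one))
```

Note that the rest-vs-1 comparison is the 1-vs-rest comparison with foreground and background swapped, so the sign of log2FC and Zscore would be expected to flip.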

scMINER also provides a function, getTopFeatures(), to easily extract the group-specific markers from the differential expression result:

cluster_markers <- getTopFeatures(input_table = de_res1, number = 10, group_by = "g1_tag", sort_by = "log2FC", sort_decreasing = TRUE)
dim(cluster_markers)
## [1]  0 11
head(cluster_markers)
##  [1] feature g1_tag  g0_tag  g1_avg  g0_avg  g1_pct  g0_pct  log2FC  Pval   
## [10] FDR     Zscore 
## <0 rows> (or 0-length row.names)
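
Conceptually, extracting group-specific markers amounts to sorting the differential expression table within each group and keeping the top rows. The base-R sketch below shows that idea on a made-up table; the function top_n_per_group() is hypothetical, mirrors the argument names used above only for readability, and is not the getTopFeatures() implementation.

```r
# Made-up differential expression table with two foreground groups.
toy_de <- data.frame(
  feature = c("geneA", "geneB", "geneC", "geneD", "geneE", "geneF"),
  g1_tag  = c("1", "1", "1", "2", "2", "2"),
  log2FC  = c(2.1, 0.3, 1.4, -0.5, 3.0, 0.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the top `number` rows per group, ranked by `sort_by`.
top_n_per_group <- function(tbl, number, group_by, sort_by, sort_decreasing = TRUE) {
  pieces <- lapply(split(tbl, tbl[[group_by]]), function(d) {
    d <- d[order(d[[sort_by]], decreasing = sort_decreasing), , drop = FALSE]
    head(d, number)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

toy_top <- top_n_per_group(toy_de, number = 2, group_by = "g1_tag", sort_by = "log2FC")
print(toy_top)
```

Ranking by log2FC alone can favor genes with large but noisy changes; in practice it may be worth filtering on FDR or the expression percentages before taking the top rows.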