8.1 Supervised cell type annotation

Over the past few years, scRNA-seq data have increased dramatically in both quality and quantity. For most tissue types, existing studies of the same or a similar tissue are likely available. From these studies, we can compile a list of candidate cell types to expect and curate a list of markers for each of them. For the PBMC14k dataset used here, we know the 7 cell types involved and curated a marker list from existing PBMC studies.

8.1.1 Annotate using signature scores

Given a marker list of candidate cell types, scMINER can estimate a signature score for each candidate cell type across all cell clusters. Mathematically, the signature score is the weighted mean of the expression of the marker genes involved. To do so, you need to generate a signature table with three columns:

  • signature_name: name of the cell type/signature;
  • signature_feature: marker gene/feature of the corresponding cell type/signature;
  • weight: weight of the corresponding marker/feature in the corresponding cell type/signature. It ranges from -1 to 1, so both positive and negative markers are supported.

## Signature table of PBMC14k dataset
signature_table <- read.table(system.file("extdata/demo_pbmc14k/PBMC14k_signatureTable.txt", package = "scMINER"), header = TRUE, sep = "\t", quote = "", stringsAsFactors = FALSE)
head(signature_table)
##   signature_name signature_feature weight
## 1       Monocyte              CD14      1
## 2       Monocyte               LYZ      1
## 3       Monocyte            S100A8      1
## 4       Monocyte            S100A9      1
## 5       Monocyte           S100A12      1
## 6             NK            FCGR3A      1
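
To make the "weighted mean" concrete, here is a minimal sketch in base R on a toy expression matrix. It is not scMINER's internal code: the toy values, the helper name `signature_score()`, the use of CD3D as a negative monocyte marker, and the choice to divide by the sum of absolute weights are all assumptions made for illustration, and the package may normalize differently.

```r
## Toy log2CPM-like matrix: genes in rows, cells in columns (made-up values)
expr <- matrix(
  c(5, 4, 0, 1,   # CD14
    6, 5, 1, 0,   # LYZ
    0, 1, 4, 5,   # FCGR3A
    3, 3, 0, 0),  # CD3D
  nrow = 4, byrow = TRUE,
  dimnames = list(c("CD14", "LYZ", "FCGR3A", "CD3D"), paste0("cell", 1:4))
)

## Toy signature table with the three required columns.
## CD3D as a negative monocyte marker is hypothetical, for illustration only.
toy_table <- data.frame(
  signature_name    = c("Monocyte", "Monocyte", "Monocyte", "NK"),
  signature_feature = c("CD14", "LYZ", "CD3D", "FCGR3A"),
  weight            = c(1, 1, -1, 1),
  stringsAsFactors  = FALSE
)

## Hypothetical helper: weighted mean of marker expression per cell.
## Assumption: the denominator is the sum of absolute weights.
signature_score <- function(expr, sig_table, signature) {
  sig <- sig_table[sig_table$signature_name == signature, ]
  sig <- sig[sig$signature_feature %in% rownames(expr), ]  # keep markers present in the data
  stopifnot(nrow(sig) > 0, all(abs(sig$weight) <= 1))
  sub <- expr[sig$signature_feature, , drop = FALSE]
  ## Each row (marker) is multiplied by its weight, then summed per cell
  colSums(sub * sig$weight) / sum(abs(sig$weight))
}

## One column of scores per signature, one row per cell
scores <- sapply(unique(toy_table$signature_name),
                 function(s) signature_score(expr, toy_table, s))
print(round(scores, 2))
```

In this toy example, the negative weight lowers the monocyte score of cells that express CD3D, which is the practical effect of supporting negative markers.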

With this signature table, draw_bubbleplot() can estimate the signature scores and visualize them as a bubble plot:

## Bubble plot of signature scores across clusters
draw_bubbleplot(input_eset = pbmc14k_log2cpm.eset, signature_table = signature_table, group_by = "clusterID")
## 31 features of 7 signatures were found in the input eset and will be used in calculation.

In the bubble plot above, the color of each bubble is proportional to the mean signature score, and the size of each bubble is proportional to the percentage of cells whose signature score is higher than the mean. The cell type of each cluster is clear, except for cluster 7, which shows equally high signature scores for both CD4+ TCM and CD4+ Reg, and a higher percentage of cells for CD4+ TCM.
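
The two quantities encoded by a bubble can be sketched in base R as follows. This is an illustration on made-up scores, not the code behind draw_bubbleplot(); in particular, it assumes that "the mean" used for the percentage is the overall mean across all cells, and the variable names are hypothetical.

```r
## Made-up signature scores for 8 cells and their cluster assignments
scores  <- c(2.5, 3.0, 2.8, 0.2, 0.4, 0.3, 1.6, 0.1)
cluster <- c("1", "1", "1", "2", "2", "2", "3", "3")

## Assumption: the reference mean is taken over all cells
overall_mean <- mean(scores)

## Bubble color: mean signature score per cluster
cluster_mean <- tapply(scores, cluster, mean)

## Bubble size: percentage of cells in each cluster scoring above the reference mean
pct_above <- tapply(scores > overall_mean, cluster, mean) * 100

bubble_stats <- data.frame(
  clusterID      = names(cluster_mean),
  mean_score     = as.numeric(cluster_mean),
  pct_above_mean = as.numeric(pct_above)
)
print(bubble_stats)
```

Looking at both columns together is useful when two signatures have similar mean scores in one cluster, because the percentage of high-scoring cells can still differ between them.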

8.1.2 Annotate using individual marker genes

scMINER also provides a variety of functions to visualize the selected features:

## For demonstration purposes, we picked two well-known markers for each of the 7 known cell types, plus "CD3D" and "CD4".
genes_of_interest <- c("CD14", "LYZ", "GZMB", "NKG7", "CD19", "MS4A1", "CD8A", "CD8B", "SELL", "CCR7", "IL2RA", "FOXP3", "IL7R", "S100A4", "CD3D", "CD4")
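
Before plotting, it can help to confirm that the selected genes are actually present in the input data. The sketch below uses plain character vectors so that it runs on its own; `genes_to_check` and `available_features` are hypothetical stand-ins, and with a real ExpressionSet the feature names would typically be obtained with something like Biobase::featureNames() (an assumption about the object type, not something shown in this section).

```r
## Hypothetical short list of genes to look up
genes_to_check <- c("CD14", "LYZ", "CD8A", "CD8B", "FOXP3", "CD4")

## Stand-in for the feature names of the input object (made-up subset)
available_features <- c("CD14", "LYZ", "CD8A", "CD4", "ACTB", "GAPDH")

## Guard against accidental duplicates in the gene list
stopifnot(!anyDuplicated(genes_to_check))

## Split the list into genes that can be plotted and genes that cannot
found         <- intersect(genes_to_check, available_features)
missing_genes <- setdiff(genes_to_check, available_features)

message(length(found), " of ", length(genes_to_check), " features found.")
if (length(missing_genes) > 0) {
  message("Not found: ", paste(missing_genes, collapse = ", "))
}
```

Checking up front makes it easier to spot gene-symbol typos or aliases before they surface as missing panels in the plots.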

8.1.2.1 feature visualization: violin plot

## Violin plot of marker genes across clusters
feature_vlnplot(input_eset = pbmc14k_log2cpm.eset, features = genes_of_interest, group_by = "clusterID", ncol = 4)

8.1.2.2 feature visualization: box plot

## Box plot of marker genes across clusters
feature_boxplot(input_eset = pbmc14k_log2cpm.eset, features = genes_of_interest, group_by = "clusterID", ncol = 4)

8.1.2.3 feature visualization: scatter plot

## UMAP scatter plot of marker genes
feature_scatterplot(input_eset = pbmc14k_log2cpm.eset, features = genes_of_interest, ncol = 4, location_x = "UMAP_1", location_y = "UMAP_2", point.size = 0.5, legend.key_height = 0.3, legend.key_width = 0.2, fontsize.legend_title = 8, fontsize.legend_text = 6, fontsize.axis_title = 8, legend.position = "none")

8.1.2.4 feature visualization: bubble plot

## Bubble plot of marker genes across clusters
feature_bubbleplot(input_eset = pbmc14k_log2cpm.eset, features = genes_of_interest, group_by = "clusterID", xlabel.angle = 45)

8.1.2.5 feature visualization: heatmap

## Heatmap of marker genes across clusters
feature_heatmap(input_eset = pbmc14k_log2cpm.eset, features = genes_of_interest, group_by = "clusterID", scale_method = "none", annotation_columns = c("trueLabel"))