8.1 Supervised cell type annotation
Over the past few years, scRNA-seq data have increased dramatically in both quality and quantity. For most tissue types, existing studies of the same or a similar tissue are likely to be available. From these studies, we can derive a list of candidate cell types to expect and curate a list of markers for each of them. For this dataset, we know the 7 cell types involved and curated a marker list from existing PBMC studies.
8.1.1 Annotate using signature scores
Given a marker list of candidate cell types, scMINER can estimate a signature score for each candidate cell type across all cell clusters. Mathematically, the signature score is the weighted mean of the expression of the marker genes involved. To do so, you will need to generate a signature table with three columns:
signature_name: name of the cell type/signature;
signature_feature: marker genes/features of the corresponding cell type/signature;
weight: weight of the corresponding marker/feature in the corresponding cell type/signature. It ranges from -1 to 1, so both positive and negative markers are supported.
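To make the three columns and the idea of a weighted mean concrete, here is a minimal base R sketch on a made-up expression matrix. The helper `signature_score()`, the toy values, the use of CD3D as a negative marker, and the normalization by the sum of absolute weights are all assumptions for illustration; scMINER's internal calculation may differ in its details.

```r
# Toy log2CPM-like matrix: rows are genes, columns are cells (values are made up).
expr <- matrix(
  c(5, 4, 0, 1,   # CD14
    6, 5, 1, 0,   # LYZ
    0, 1, 4, 5),  # CD3D
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CD14", "LYZ", "CD3D"), paste0("cell", 1:4))
)

# Hand-built signature table with the three required columns.
# CD3D as a negative monocyte marker is a hypothetical choice for illustration.
toy_signature <- data.frame(
  signature_name    = "Monocyte",
  signature_feature = c("CD14", "LYZ", "CD3D"),
  weight            = c(1, 1, -1),
  stringsAsFactors  = FALSE
)

# Hypothetical helper (not part of scMINER).
# Assumed scoring rule: sum(weight * expression) / sum(|weight|), per cell.
signature_score <- function(expr, sig) {
  # Keep only markers that are present in the expression matrix
  sig <- sig[sig$signature_feature %in% rownames(expr), , drop = FALSE]
  w <- sig$weight
  x <- expr[sig$signature_feature, , drop = FALSE]
  # x * w multiplies each row (gene) by its weight
  colSums(x * w) / sum(abs(w))
}

monocyte_score <- signature_score(expr, toy_signature)
print(round(monocyte_score, 2))
```

In this toy example, cells that express CD14 and LYZ but not CD3D get a positive score, while cells with the opposite pattern get a negative one, which is the behavior a negative weight is meant to capture.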
## Signature table of PBMC14k dataset
signature_table <- read.table(system.file("extdata/demo_pbmc14k/PBMC14k_signatureTable.txt", package = "scMINER"), header = TRUE, sep = "\t", quote = "", stringsAsFactors = FALSE)
head(signature_table)
##   signature_name signature_feature weight
## 1       Monocyte              CD14      1
## 2       Monocyte               LYZ      1
## 3       Monocyte            S100A8      1
## 4       Monocyte            S100A9      1
## 5       Monocyte           S100A12      1
## 6             NK            FCGR3A      1
With this signature table, draw_bubbleplot() can estimate the signature scores and visualize them in a bubble plot:
## Bubble plot of signature scores across clusters
draw_bubbleplot(input_eset = pbmc14k_log2cpm.eset, signature_table = signature_table, group_by = "clusterID")
## 31 features of 7 signatures were found in the input eset and will be used in calculation.
In the bubble plot above, the color of each bubble is proportional to the mean signature score, and its size is proportional to the percentage of cells whose signature score is higher than the mean. The cell type of each cluster is clear, except for cluster 7, which shows equally high signature scores for both CD4+ TCM and CD4+ Reg and a higher percentage of CD4+ TCM cells.
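The two quantities encoded by each bubble can be sketched in base R as a per-cluster summary. This is only an illustration on made-up scores: the variable names are hypothetical, and taking "mean" as the mean score over all cells is an assumption about how the threshold is defined, not a statement of how draw_bubbleplot() computes it.

```r
# Toy per-cell signature scores and cluster labels (values are made up).
score     <- c(3.0, 2.5, 0.5, 0.2, 2.8, 0.1)
clusterID <- c("1", "1", "2", "2", "1", "2")

# Assumption: the reference "mean" is the mean score over all cells.
overall_mean <- mean(score)

# One row per cluster: mean score (bubble color) and
# percentage of cells scoring above the overall mean (bubble size).
bubble_summary <- data.frame(
  clusterID  = sort(unique(clusterID)),
  mean_score = as.numeric(tapply(score, clusterID, mean)),
  pct_above  = as.numeric(tapply(score > overall_mean, clusterID, mean)) * 100
)
print(bubble_summary)
```

A cluster whose bubble is both dark and large would correspond to a high `mean_score` together with a high `pct_above` in a table like this one.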
8.1.2 Annotate using individual marker genes
scMINER also provides a variety of functions to visualize the selected features:
## For demonstration purposes, we picked two well-known markers for each of the 7 known cell types, plus "CD3D" and "CD4".
genes_of_interest <- c("CD14", "LYZ", "GZMB", "NKG7", "CD19", "MS4A1", "CD8A", "CD8B", "SELL", "CCR7", "IL2RA", "FOXP3", "IL7R", "S100A4", "CD3D", "CD4")
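Before plotting, it can be useful to confirm that the selected genes are actually present among the features of the input object. The sketch below uses a made-up vector of feature names; `available_features`, `markers`, `found` and `absent` are hypothetical names. With the real object, the feature names of the expression set would take the place of the toy vector (for example via Biobase::featureNames(), assuming the object behaves like a standard eSet).

```r
# Hypothetical feature names standing in for those of the expression set.
available_features <- c("CD14", "LYZ", "NKG7", "MS4A1", "CD3D", "CD4")

# A short, made-up marker selection to check.
markers <- c("CD14", "LYZ", "GZMB", "NKG7", "FOXP3")

# Split the selection into markers that can and cannot be plotted.
found  <- intersect(markers, available_features)
absent <- setdiff(markers, available_features)

cat(length(found), "of", length(markers), "markers found\n")
if (length(absent) > 0) {
  cat("Not found:", paste(absent, collapse = ", "), "\n")
}
```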
8.1.2.1 Feature visualization: violin plot
## Violin plot of marker genes across clusters
feature_vlnplot(input_eset = pbmc14k_log2cpm.eset, features = genes_of_interest, group_by = "clusterID", ncol = 4)
8.1.2.2 Feature visualization: box plot
## Box plot of marker genes across clusters
feature_boxplot(input_eset = pbmc14k_log2cpm.eset, features = genes_of_interest, group_by = "clusterID", ncol = 4)
8.1.2.3 Feature visualization: scatter plot
## UMAP scatter plot of marker genes
feature_scatterplot(input_eset = pbmc14k_log2cpm.eset, features = genes_of_interest, ncol = 4, location_x = "UMAP_1", location_y = "UMAP_2", point.size = 0.5, legend.key_height = 0.3, legend.key_width = 0.2, fontsize.legend_title = 8, fontsize.legend_text = 6, fontsize.axis_title = 8, legend.position = "none")