7.4 Integrate MICA outputs into SparseEset object
MICA generates several files and saves all of them in the output directory specified by the user with the -o argument. The core, and only, output file we need for subsequent analysis is the clustering label file, named in the format ProjectName_clustering_VisualizeMethod_euclidean_NumberOfDimensions_Resolution.txt. In this case, since we used a range of resolutions, several clustering label files were generated, one for each resolution. Based on our knowledge of the PBMC14k dataset, we compared the results of the different resolutions and picked clustering_UMAP_euclidean_20_2.05.txt for subsequent analysis.
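When a resolution sweep produces many label files, it can help to read the fields straight out of the file names. The base-R sketch below splits a name into visualization method, distance, number of dimensions and resolution by taking the last four underscore-separated fields, then selects the file for resolution 2.05. The helper parse_mica_filename and every file name except clustering_UMAP_euclidean_20_2.05.txt are hypothetical, for illustration only; they are not part of scMINER or MICA.

```r
# Hypothetical helper (not part of scMINER or MICA): split a clustering label
# file name following the convention
#   [ProjectName_]clustering_VisualizeMethod_euclidean_NumberOfDimensions_Resolution.txt
# into its fields. Fields are taken from the end, so a ProjectName prefix
# (even one containing underscores) does not shift them.
parse_mica_filename <- function(fname) {
  stem  <- sub("\\.txt$", "", basename(fname))
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  n     <- length(parts)
  data.frame(
    file       = basename(fname),
    method     = parts[n - 3],              # e.g. "UMAP"
    distance   = parts[n - 2],              # e.g. "euclidean"
    dims       = as.integer(parts[n - 1]),  # e.g. 20
    resolution = as.numeric(parts[n]),      # e.g. 2.05
    stringsAsFactors = FALSE
  )
}

# Hypothetical resolution sweep; only the 2.05 file name comes from the text
files <- c("clustering_UMAP_euclidean_20_1.65.txt",
           "clustering_UMAP_euclidean_20_2.05.txt",
           "clustering_UMAP_euclidean_20_2.45.txt")

info   <- do.call(rbind, lapply(files, parse_mica_filename))
chosen <- info$file[info$resolution == 2.05]
print(info)
print(chosen)
```

On real output, the candidate names could come from list.files(output_dir, pattern = "clustering_.*\\.txt$"), where output_dir is the directory passed to MICA with -o.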
# Read the selected clustering label file (tab-delimited, with a header row)
micaOutput <- read.table(system.file("extdata/demo_pbmc14k/PBMC14k/MICA/clustering_UMAP_euclidean_20_2.05.txt", package = "scMINER"), header = TRUE, sep = "\t", quote = "", stringsAsFactors = FALSE)
head(micaOutput)
## ID X Y label
## 1 CACTTTGACGCAAT 14.91650 13.04096 6
## 2 GTTACGGAAACGAA 14.57031 10.27093 6
## 3 CACTTATGAGTCGT 14.28869 13.61674 6
## 4 GCATGTGATTCTGT 14.12546 13.36319 6
## 5 TAGAATACGTATCG 14.91227 11.19407 6
## 6 CAAGAAGACCCTCA 15.34154 12.25821 6
As shown above, the clustering label file contains four columns:
ID: cell barcodes;
X: coordinates of UMAP_1 or tSNE_1;
Y: coordinates of UMAP_2 or tSNE_2;
label: labels of predicted clusters.
The clustering result can be easily added to the SparseExpressionSet object with addMICAoutput():
# Add the UMAP coordinates and cluster labels from the selected MICA file to the phenotype data
pbmc14k_log2cpm.eset <- addMICAoutput(pbmc14k_log2cpm.eset, mica_output_file = system.file("extdata/demo_pbmc14k/PBMC14k/MICA/clustering_UMAP_euclidean_20_2.05.txt", package = "scMINER"), visual_method = "umap")
head(pData(pbmc14k_log2cpm.eset))
## trueLabel_full trueLabel projectID nUMI nFeature pctMito
## CACTTTGACGCAAT CD14+ Monocyte Monocyte PBMC14k 764 354 0.01832461
## GTTACGGAAACGAA CD14+ Monocyte Monocyte PBMC14k 956 442 0.01569038
## CACTTATGAGTCGT CD14+ Monocyte Monocyte PBMC14k 629 323 0.02066773
## GCATGTGATTCTGT CD14+ Monocyte Monocyte PBMC14k 875 427 0.02628571
## TAGAATACGTATCG CD14+ Monocyte Monocyte PBMC14k 1060 445 0.03207547
## CAAGAAGACCCTCA CD14+ Monocyte Monocyte PBMC14k 849 384 0.01531213
## pctSpikeIn CellID UMAP_1 UMAP_2 clusterID
## CACTTTGACGCAAT 0 CACTTTGACGCAAT 14.91650 13.04096 6
## GTTACGGAAACGAA 0 GTTACGGAAACGAA 14.57031 10.27093 6
## CACTTATGAGTCGT 0 CACTTATGAGTCGT 14.28869 13.61674 6
## GCATGTGATTCTGT 0 GCATGTGATTCTGT 14.12546 13.36319 6
## TAGAATACGTATCG 0 TAGAATACGTATCG 14.91227 11.19407 6
## CAAGAAGACCCTCA 0 CAAGAAGACCCTCA 15.34154 12.25821 6
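The printed phenotype data shows four new columns: CellID, UMAP_1, UMAP_2 and clusterID. To make that mapping concrete, the base-R sketch below approximates it on a plain data frame: MICA rows are aligned to cells by barcode with match(), then X, Y and label are copied into columns named after the visualization method. add_mica_columns is a hypothetical helper and is not scMINER's implementation of addMICAoutput(). The tSNE column prefix is an assumption taken from the column description above.

```r
# Hypothetical base-R approximation (NOT scMINER's implementation) of attaching
# MICA clustering output to a cell-level metadata table keyed by barcode.
add_mica_columns <- function(pheno, mica, visual_method = "umap") {
  idx <- match(rownames(pheno), mica$ID)  # align MICA rows to cell order
  if (anyNA(idx)) stop("some cells are missing from the MICA output")
  prefix <- switch(tolower(visual_method),
                   umap = "UMAP",
                   tsne = "tSNE",         # assumed prefix for t-SNE coordinates
                   stop("visual_method must be 'umap' or 'tsne'"))
  pheno$CellID <- mica$ID[idx]
  pheno[[paste0(prefix, "_1")]] <- mica$X[idx]
  pheno[[paste0(prefix, "_2")]] <- mica$Y[idx]
  pheno$clusterID <- mica$label[idx]
  pheno
}

# Two cells from the output above; MICA rows deliberately given out of order
pheno <- data.frame(
  trueLabel = c("Monocyte", "Monocyte"),
  nUMI      = c(764, 956),
  row.names = c("CACTTTGACGCAAT", "GTTACGGAAACGAA"),
  stringsAsFactors = FALSE
)
mica <- data.frame(
  ID    = c("GTTACGGAAACGAA", "CACTTTGACGCAAT"),
  X     = c(14.57031, 14.91650),
  Y     = c(10.27093, 13.04096),
  label = c(6L, 6L),
  stringsAsFactors = FALSE
)

annotated <- add_mica_columns(pheno, mica, visual_method = "umap")
print(annotated)
```

Matching on barcodes rather than relying on row order keeps the coordinates attached to the right cells even if MICA writes its rows in a different order from the expression set.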