7.4 Integrate MICA outputs into SparseEset object

MICA generates several files and saves all of them in the output directory specified by the user with the -o argument. The only output file needed for subsequent analysis is the clustering label file, named in the format ProjectName_clustering_VisualizeMethod_euclidean_NumberOfDimensions_Resolution.txt. Because we ran MICA over a range of resolutions in this case, it generated one clustering label file per resolution. Based on prior knowledge of the PBMC14k dataset, we compared the results across resolutions and picked clustering_UMAP_euclidean_20_2.05.txt for subsequent analysis.
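When several resolutions have been run, it can help to index the label files by the resolution encoded in their names before comparing them. The following is a minimal sketch in base R, not part of scMINER: apart from the 2.05 file named above, the file names and resolutions are hypothetical, and the parsing assumes the resolution is always the last underscore-separated field before .txt.

```r
# Hypothetical file names following the naming pattern described above.
# Only the 2.05 file is named in the text; the other two are made up.
label_files <- c(
  "clustering_UMAP_euclidean_20_1.8.txt",
  "clustering_UMAP_euclidean_20_2.05.txt",
  "clustering_UMAP_euclidean_20_3.0.txt"
)

# Assumption: the resolution is the last "_"-separated field before ".txt".
resolution <- as.numeric(sub("^.*_([0-9.]+)\\.txt$", "\\1", label_files))

# Index the files by resolution, sorted from low to high.
file_index <- data.frame(file = label_files, resolution = resolution,
                         stringsAsFactors = FALSE)
file_index <- file_index[order(file_index$resolution), ]
print(file_index)

# Select the file for the chosen resolution (tolerance avoids exact
# floating-point comparison).
chosen <- file_index$file[abs(file_index$resolution - 2.05) < 1e-8]
```

In a real run, the vector of file names would more likely come from list.files() on the MICA output directory with a pattern such as "clustering_.*\\.txt$".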

micaOutput <- read.table(system.file("extdata/demo_pbmc14k/PBMC14k/MICA/clustering_UMAP_euclidean_20_2.05.txt", package = "scMINER"), header = TRUE, sep = "\t", quote = "", stringsAsFactors = FALSE)
head(micaOutput)
##               ID        X        Y label
## 1 CACTTTGACGCAAT 14.91650 13.04096     6
## 2 GTTACGGAAACGAA 14.57031 10.27093     6
## 3 CACTTATGAGTCGT 14.28869 13.61674     6
## 4 GCATGTGATTCTGT 14.12546 13.36319     6
## 5 TAGAATACGTATCG 14.91227 11.19407     6
## 6 CAAGAAGACCCTCA 15.34154 12.25821     6

As shown above, the clustering label file contains four columns:

  • ID: cell barcodes;
  • X: coordinates of UMAP_1 or tSNE_1;
  • Y: coordinates of UMAP_2 or tSNE_2;
  • label: labels of predicted clusters.
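Before adding the labels to the expression set, a quick structural check of the label file can catch truncated or mismatched output. The sketch below uses only base R and rebuilds the six rows printed above as a stand-in data frame (micaOutput_demo is a name introduced here for illustration); the same checks could be applied to micaOutput directly.

```r
# First six rows of the label file, copied from the output shown above.
micaOutput_demo <- data.frame(
  ID = c("CACTTTGACGCAAT", "GTTACGGAAACGAA", "CACTTATGAGTCGT",
         "GCATGTGATTCTGT", "TAGAATACGTATCG", "CAAGAAGACCCTCA"),
  X = c(14.91650, 14.57031, 14.28869, 14.12546, 14.91227, 15.34154),
  Y = c(13.04096, 10.27093, 13.61674, 13.36319, 11.19407, 12.25821),
  label = c(6, 6, 6, 6, 6, 6),
  stringsAsFactors = FALSE
)

# All four expected columns are present.
stopifnot(all(c("ID", "X", "Y", "label") %in% colnames(micaOutput_demo)))

# Cell barcodes are unique and every cell has a cluster label.
stopifnot(!anyDuplicated(micaOutput_demo$ID), !anyNA(micaOutput_demo$label))

# Number of cells assigned to each predicted cluster.
cluster_sizes <- table(micaOutput_demo$label)
print(cluster_sizes)
```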

The clustering result can be easily added to the SparseExpressionSet object with addMICAoutput():

pbmc14k_log2cpm.eset <- addMICAoutput(pbmc14k_log2cpm.eset, mica_output_file = system.file("extdata/demo_pbmc14k/PBMC14k/MICA/clustering_UMAP_euclidean_20_2.05.txt", package = "scMINER"), visual_method = "umap")
head(pData(pbmc14k_log2cpm.eset))
##                trueLabel_full trueLabel projectID nUMI nFeature    pctMito
## CACTTTGACGCAAT CD14+ Monocyte  Monocyte   PBMC14k  764      354 0.01832461
## GTTACGGAAACGAA CD14+ Monocyte  Monocyte   PBMC14k  956      442 0.01569038
## CACTTATGAGTCGT CD14+ Monocyte  Monocyte   PBMC14k  629      323 0.02066773
## GCATGTGATTCTGT CD14+ Monocyte  Monocyte   PBMC14k  875      427 0.02628571
## TAGAATACGTATCG CD14+ Monocyte  Monocyte   PBMC14k 1060      445 0.03207547
## CAAGAAGACCCTCA CD14+ Monocyte  Monocyte   PBMC14k  849      384 0.01531213
##                pctSpikeIn         CellID   UMAP_1   UMAP_2 clusterID
## CACTTTGACGCAAT          0 CACTTTGACGCAAT 14.91650 13.04096         6
## GTTACGGAAACGAA          0 GTTACGGAAACGAA 14.57031 10.27093         6
## CACTTATGAGTCGT          0 CACTTATGAGTCGT 14.28869 13.61674         6
## GCATGTGATTCTGT          0 GCATGTGATTCTGT 14.12546 13.36319         6
## TAGAATACGTATCG          0 TAGAATACGTATCG 14.91227 11.19407         6
## CAAGAAGACCCTCA          0 CAAGAAGACCCTCA 15.34154 12.25821         6
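Once the cluster labels sit alongside the existing annotations, one simple way to inspect the result is to cross-tabulate clusterID against trueLabel. The sketch below is a base R illustration rather than an scMINER function: pd_demo is a hypothetical stand-in built from the six rows printed above, and in a real session the same code would be run on pData(pbmc14k_log2cpm.eset).

```r
# Stand-in for pData(pbmc14k_log2cpm.eset), using the six rows shown above.
barcodes <- c("CACTTTGACGCAAT", "GTTACGGAAACGAA", "CACTTATGAGTCGT",
              "GCATGTGATTCTGT", "TAGAATACGTATCG", "CAAGAAGACCCTCA")
pd_demo <- data.frame(
  trueLabel = rep("Monocyte", 6),
  CellID    = barcodes,
  UMAP_1    = c(14.91650, 14.57031, 14.28869, 14.12546, 14.91227, 15.34154),
  UMAP_2    = c(13.04096, 10.27093, 13.61674, 13.36319, 11.19407, 12.25821),
  clusterID = rep(6, 6),
  row.names = barcodes,
  stringsAsFactors = FALSE
)

# The row names and the CellID column should refer to the same cells.
stopifnot(identical(rownames(pd_demo), pd_demo$CellID))

# Cross-tabulate known cell types against predicted clusters.
confusion <- table(trueLabel = pd_demo$trueLabel,
                   clusterID = pd_demo$clusterID)
print(confusion)
```

With the full dataset, each row of this table shows how the cells of one known type are distributed across the predicted clusters.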