3.1 From data directory by 10x Genomics

This is the most common input format for scRNA-seq data generated on the 10x Genomics platform. The data directory usually contains three files:

  • matrix.mtx: a sparse matrix in Matrix Market format containing the raw UMI count for each gene-cell combination
  • barcodes.tsv: a tab-separated file listing the cell barcodes, one per line
  • features.tsv: a tab-separated file listing the features/genes and their annotations

For more details about this format, see the 10x Genomics documentation on feature-barcode matrices.
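
To make the relationship between the three files concrete, the sketch below writes a tiny toy dataset to a temporary directory and reads it back with the Matrix package. It is only an illustration of the file layout, not how readInput_10x.dir() is implemented; the directory name, barcodes, and gene names are all invented for this example.

```r
library(Matrix)

# Toy data (invented for illustration): 3 genes x 2 cells.
toy_dir <- file.path(tempdir(), "toy_10x")
dir.create(toy_dir, showWarnings = FALSE)

counts <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(5, 1, 2), dims = c(3, 2))
writeMM(counts, file = file.path(toy_dir, "matrix.mtx"))
writeLines(c("AAAC-1", "AAAG-1"), file.path(toy_dir, "barcodes.tsv"))
write.table(
  data.frame(id = c("ENSG01", "ENSG02", "ENSG03"),
             symbol = c("GeneA", "GeneB", "GeneC"),
             type = "Gene Expression"),
  file = file.path(toy_dir, "features.tsv"),
  sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE
)

# Read the three files back: rows of the matrix follow the order of
# features.tsv, and columns follow the order of barcodes.tsv.
mtx <- readMM(file.path(toy_dir, "matrix.mtx"))
barcodes <- readLines(file.path(toy_dir, "barcodes.tsv"))
features <- read.delim(file.path(toy_dir, "features.tsv"),
                       header = FALSE, stringsAsFactors = FALSE)
rownames(mtx) <- features[[2]]  # second column holds the gene symbols
colnames(mtx) <- barcodes
```

The matrix file carries no names of its own, which is why the barcode and feature files must be kept alongside it and in the same order.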

data_dir <- system.file("extdata/demo_inputs/cell_matrix_10x", package = "scMINER")
list.files(path = data_dir, full.names = FALSE)
## [1] "barcodes.tsv.gz" "features.tsv.gz" "matrix.mtx.gz"
demo1_mtx <- readInput_10x.dir(input_dir = data_dir, featureType = "gene_symbol", removeSuffix = TRUE, addPrefix = "demo1")
## Reading 10x Genomcis data from: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/scMINER/extdata/demo_inputs/cell_matrix_10x ...
##  Multiple data modalities were found: Gene Expression, Peaks . Only the gene expression data (under "Gene Expression") was kept.
## Done! The sparse gene expression matrix has been generated: 500 genes, 100 cells.
demo1_mtx[1:5,1:5]
## 5 x 5 sparse Matrix of class "dgTMatrix"
##            demo1_AAACAGCCAAACGGGC demo1_AAACAGCCAACTAGCC demo1_AAACAGCCAATTAGGA
## AL590822.3                      .                      .                      .
## MORN1                           .                      .                      .
## AL589739.1                      .                      .                      .
## AL513477.2                      .                      .                      .
## RER1                            .                      .                      .
##            demo1_AAACAGCCAGCCAGTT demo1_AAACATGCAAAGCTCC
## AL590822.3                      .                      .
## MORN1                           .                      .
## AL589739.1                      .                      .
## AL513477.2                      .                      .
## RER1                            1                      .

The readInput_10x.dir() function can handle these conditions:

  • Alternative file names for feature data: for datasets generated by Cell Ranger versions earlier than 3.0, the file is named genes.tsv;
  • Compressed input files: one or more input files are compressed, usually in “.gz” format;
  • Data with multiple modalities: for example, single-cell multiome data. In this case, readInput_10x.dir() retains only the “Gene Expression” data by default.
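
As a rough illustration of what handling these conditions can involve, the base R sketch below locates a feature file under either name, compressed or not, and keeps only the gene expression rows. This is not the scMINER source code: find_10x_file() is a hypothetical helper, and the toy file contents are invented for the example.

```r
# Hypothetical helper: return the first existing file among the candidate
# names, trying the plain and the ".gz" variant of each in turn.
find_10x_file <- function(input_dir, candidates) {
  paths <- file.path(input_dir, c(rbind(candidates, paste0(candidates, ".gz"))))
  hits <- paths[file.exists(paths)]
  if (length(hits) == 0) {
    stop("None of these files were found: ", paste(candidates, collapse = ", "))
  }
  hits[1]
}

# Toy directory (invented): a gzip-compressed feature file with two modalities.
toy_dir <- file.path(tempdir(), "toy_10x_multiome")
dir.create(toy_dir, showWarnings = FALSE)
con <- gzfile(file.path(toy_dir, "features.tsv.gz"), "w")
writeLines(c("ENSG01\tGeneA\tGene Expression",
             "peak01\tchr1:100-200\tPeaks",
             "ENSG02\tGeneB\tGene Expression"), con)
close(con)

feature_file <- find_10x_file(toy_dir, c("features.tsv", "genes.tsv"))

# read.delim() opens gzip-compressed files transparently.
features <- read.delim(feature_file, header = FALSE, stringsAsFactors = FALSE)

# Keep only gene expression rows when a modality column (column 3) is present;
# older two-column files are kept as they are.
keep <- if (ncol(features) >= 3) {
  features[[3]] == "Gene Expression"
} else {
  rep(TRUE, nrow(features))
}
gex_features <- features[keep, , drop = FALSE]
```

In a full reader, the same logical vector would also be used to subset the rows of the count matrix so that features and counts stay aligned.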