3.1 From a 10x Genomics data directory
This is the most common input format for scRNA-seq data generated by 10x Genomics. The data directory usually contains three files:
- matrix.mtx: a sparse matrix in Matrix Market format containing the raw UMI count for each gene-cell combination
- barcodes.tsv: a tab-separated file listing the cell barcodes, one per line
- features.tsv: a tab-separated file listing the features/genes and their annotations
For more details about this format, see the 10x Genomics documentation.
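To make the roles of the three files concrete, the sketch below builds a tiny, made-up 10x-style directory and reads it back with the Matrix package alone. The directory name, barcodes, gene names, and counts are all hypothetical, and this is not how scMINER reads the data internally; it only illustrates how the matrix, barcodes, and features fit together.

```r
library(Matrix)

# Build a tiny, made-up 10x-style directory in a temporary location.
toy_dir <- file.path(tempdir(), "toy_10x")
dir.create(toy_dir, showWarnings = FALSE)

# 3 genes x 2 cells; the non-zero entries are hypothetical raw UMI counts.
counts <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(5, 1, 2), dims = c(3, 2))
writeMM(counts, file.path(toy_dir, "matrix.mtx"))
writeLines(c("AAACAGCC-1", "AAACATGC-1"), file.path(toy_dir, "barcodes.tsv"))
writeLines(c("ENSG01\tGENE_A\tGene Expression",
             "ENSG02\tGENE_B\tGene Expression",
             "ENSG03\tGENE_C\tGene Expression"),
           file.path(toy_dir, "features.tsv"))

# Read the three files back and assemble a labelled genes-by-cells matrix.
mtx <- readMM(file.path(toy_dir, "matrix.mtx"))
barcodes <- readLines(file.path(toy_dir, "barcodes.tsv"))
features <- read.delim(file.path(toy_dir, "features.tsv"), header = FALSE,
                       col.names = c("id", "symbol", "type"),
                       stringsAsFactors = FALSE)
rownames(mtx) <- features$symbol  # rows are features/genes
colnames(mtx) <- barcodes         # columns are cell barcodes
```

Rows of the matrix follow the order of features.tsv and columns follow the order of barcodes.tsv, which is why the two text files are enough to label the matrix.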
data_dir <- system.file("extdata/demo_inputs/cell_matrix_10x", package = "scMINER")
list.files(path = data_dir, full.names = FALSE)
## [1] "barcodes.tsv.gz" "features.tsv.gz" "matrix.mtx.gz"
demo1_mtx <- readInput_10x.dir(input_dir = data_dir, featureType = "gene_symbol", removeSuffix = TRUE, addPrefix = "demo1")
## Reading 10x Genomcis data from: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/scMINER/extdata/demo_inputs/cell_matrix_10x ...
## Multiple data modalities were found: Gene Expression, Peaks . Only the gene expression data (under "Gene Expression") was kept.
## Done! The sparse gene expression matrix has been generated: 500 genes, 100 cells.
demo1_mtx[1:5, 1:5]
## 5 x 5 sparse Matrix of class "dgTMatrix"
## demo1_AAACAGCCAAACGGGC demo1_AAACAGCCAACTAGCC demo1_AAACAGCCAATTAGGA
## AL590822.3 . . .
## MORN1 . . .
## AL589739.1 . . .
## AL513477.2 . . .
## RER1 . . .
## demo1_AAACAGCCAGCCAGTT demo1_AAACATGCAAAGCTCC
## AL590822.3 . .
## MORN1 . .
## AL589739.1 . .
## AL513477.2 . .
## RER1 1 .
The readInput_10x.dir() function can handle these conditions:
- Alternative file names for feature data: for datasets generated by Cell Ranger versions earlier than 3.0, the file is named genes.tsv;
- Compressed input files: one or more input files are compressed, usually in “.gz” format;
- Data with multiple modalities: such as single-cell multiome data. In this case, readInput_10x.dir() retains only the “Gene Expression” data by default.
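The conditions listed above can be sketched in base R. The helper `find_10x_file()` and the toy directory below are hypothetical and written only for illustration; they are not part of scMINER and may differ from what readInput_10x.dir() does internally. The sketch assumes the feature type is stored in the third column of the features file.

```r
# Hypothetical helper: return the first existing file among candidate base
# names, trying the plain name first and then its gzip-compressed variant.
find_10x_file <- function(input_dir, base_names) {
  candidates <- file.path(input_dir, c(rbind(base_names, paste0(base_names, ".gz"))))
  hits <- candidates[file.exists(candidates)]
  if (length(hits) == 0) {
    stop("None of these files were found: ", paste(base_names, collapse = ", "))
  }
  hits[1]
}

# Toy multiome-style feature file (made-up content), gzip-compressed.
toy_dir <- file.path(tempdir(), "toy_10x_multiome")
dir.create(toy_dir, showWarnings = FALSE)
con <- gzfile(file.path(toy_dir, "features.tsv.gz"), "w")
writeLines(c("ENSG01\tGENE_A\tGene Expression",
             "chr1:100-200\tchr1:100-200\tPeaks",
             "ENSG02\tGENE_B\tGene Expression"), con)
close(con)

# Accept either the newer (features.tsv) or the older (genes.tsv) file name.
feature_file <- find_10x_file(toy_dir, c("features.tsv", "genes.tsv"))

# gzfile() reads both compressed and uncompressed files.
features <- read.delim(gzfile(feature_file), header = FALSE, stringsAsFactors = FALSE)

# With multiple modalities, keep only the rows labelled "Gene Expression";
# the same logical index would be used to subset the rows of the count matrix.
keep <- if (ncol(features) >= 3) {
  features[[3]] == "Gene Expression"
} else {
  rep(TRUE, nrow(features))
}
gene_symbols <- features[[2]][keep]
```

Resolving file names before reading keeps the rest of the code independent of the naming scheme and of compression, and the logical index `keep` can be reused to drop the non-expression rows from the count matrix.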