9.1 Generate SJARACNe input files
Network inference is usually conducted on a cluster- or cell type-specific basis. Given the name of the column used for grouping, generateSJARACNeInput() creates one folder per group, named after the group label.
IMPORTANT NOTE: Illegal path characters in group labels may cause issues in subsequent analysis. To avoid this, scMINER only accepts letters (A-Za-z), numbers (0-9), underscores ('_') and periods ('.') in group labels.
In this case, the true labels of cell type are available, so we use them to define the groups for network inference.
## Columns with any illegal characters cannot be used for grouping
generateSJARACNeInput(input_eset = pbmc14k_log2cpm.eset, group_name = "trueLabel", sjaracne_dir = "/work-path/PBMC14k/SJARACNe", species_type = "hg", driver_type = "TF_SIG", downSample_N = NULL)
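Because illegal characters in group labels can cause problems downstream, it may help to screen the labels before generating the input files. The following is a minimal sketch in base R; find_illegal_labels() is a hypothetical helper, not part of scMINER, and the example labels are made up for illustration.

```r
# Hypothetical helper (not part of scMINER): return the unique labels that
# contain anything other than letters, digits, underscores and periods.
find_illegal_labels <- function(labels) {
  labels <- unique(as.character(labels))
  labels[!grepl("^[A-Za-z0-9_.]+$", labels)]
}

# Made-up labels for illustration only
example_labels <- c("B", "CD4TCM", "CD8 TN", "NK/T", "Monocyte.1")
illegal <- find_illegal_labels(example_labels)
print(illegal)
```

On a real object, the same helper could be applied to the grouping column, for example the trueLabel column used above, assuming that column can be extracted as a character or factor vector from the eset's cell metadata.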
For big datasets, generateSJARACNeInput() provides an argument, downSample_N, that lets users down-sample each group. The default value of downSample_N is 1,000: any group with 1,000 or more cells is down-sampled to 1,000 cells.
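To illustrate what per-group down-sampling means, here is a minimal sketch in base R. It is not scMINER's implementation: downsample_by_group() is a hypothetical helper, and random sampling without replacement is an assumption about how the cells are chosen.

```r
# Hypothetical helper (not scMINER's code): keep at most n cells per group.
# Assumes random sampling without replacement within each group.
downsample_by_group <- function(cell_ids, groups, n = 1000, seed = 1) {
  set.seed(seed)  # make the random draw reproducible
  keep <- lapply(split(cell_ids, groups), function(ids) {
    # Groups larger than n are sampled down; smaller groups are kept whole
    if (length(ids) > n) ids[sample.int(length(ids), n)] else ids
  })
  unlist(keep, use.names = FALSE)
}

# Toy data: 2,500 "B" cells and 300 "NK" cells
cell_ids <- paste0("cell", seq_len(2800))
groups <- rep(c("B", "NK"), times = c(2500, 300))

kept <- downsample_by_group(cell_ids, groups, n = 1000)
counts <- table(groups[match(kept, cell_ids)])
print(counts)
```

In this toy example the large group is capped at 1,000 cells while the small group is left untouched, which is the behavior the downSample_N argument is described as providing.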
## one folder for each group
list.dirs(system.file("extdata/demo_pbmc14k/PBMC14k/SJARACNe", package = "scMINER"), full.names = FALSE, recursive = FALSE)
## [1] "B" "CD4TCM" "CD4TN" "CD4Treg" "CD8TN" "Monocyte" "NK"
## file structure of each folder
list.files(system.file("extdata/demo_pbmc14k/PBMC14k/SJARACNe/B", package = "scMINER"), full.names = FALSE, recursive = TRUE, include.dirs = FALSE, pattern = "[^consensus_network_ncol_.txt]")
## [1] "B.8572_1902.exp.txt" "config_cwlexec.json"
## [3] "runSJARACNe.sh" "SIG/B.4148_1902.sig.txt"
## [5] "TF/B.835_1902.tf.txt"
The standard input files of SJARACNe, for each group, include:
- a “.exp.txt” file: a tab-separated genes/transcripts/proteins by cells/samples expression matrix with the first two columns being ID and symbol.
- a “TF” folder containing a “.tf.txt” file: a list of significant gene/transcript/protein IDs of TF drivers.
- a “SIG” folder containing a “.sig.txt” file: a list of significant gene/transcript/protein IDs of SIG drivers.
- a bash script (runSJARACNe.sh) to run SJARACNe. Further modification is needed to run it.
- a json file (config_cwlexec.json) containing parameters to run SJARACNe.
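The file layout listed above can be verified programmatically before launching SJARACNe. The sketch below uses only base R; check_sjaracne_inputs() is a hypothetical helper, not part of scMINER, and the throwaway folder it inspects is created purely for demonstration (its SIG folder is deliberately left empty).

```r
# Hypothetical helper (not part of scMINER): report which of the expected
# SJARACNe input files are present in one group folder.
check_sjaracne_inputs <- function(group_dir) {
  files <- list.files(group_dir, recursive = TRUE)
  c(
    exp    = any(grepl("^[^/]+\\.exp\\.txt$", files)),
    tf     = any(grepl("^TF/[^/]+\\.tf\\.txt$", files)),
    sig    = any(grepl("^SIG/[^/]+\\.sig\\.txt$", files)),
    script = "runSJARACNe.sh" %in% files,
    config = "config_cwlexec.json" %in% files
  )
}

# Build a throwaway folder that mimics the layout, minus the .sig.txt file
demo_dir <- file.path(tempdir(), "B")
dir.create(file.path(demo_dir, "TF"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_dir, "SIG"), showWarnings = FALSE)
file.create(file.path(demo_dir, c("B.8572_1902.exp.txt", "runSJARACNe.sh",
                                  "config_cwlexec.json", "TF/B.835_1902.tf.txt")))

status <- check_sjaracne_inputs(demo_dir)
print(status)
```

Any entry reported as FALSE points to a missing input; here that is the sig entry, because no “.sig.txt” file was created in the demo folder.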
Usually, the ground truth of cell types is not available. In that case, the cluster labels, or the cell type annotations of the clusters, can be used for grouping in network rewiring, since cells with the same cluster label or annotated cell type are expected to have similar gene expression profiles. To generate the input files from annotated cell types, you can run: