5.2 Filter the sparse eset object
The quality control report generated above gives a better sense of the data quality and of the cutoffs to use for filtration. scMINER provides the function filterSparseEset() for this purpose, and it can work in two modes:
auto
: in this mode, scMINER uses the cutoffs estimated by Median ± 3*MAD (median absolute deviation). Based on our tests, this mode works well in most cases with matrices of either raw UMI counts or TPM values.
manual
: in this mode, users manually specify the cutoffs, both low and high, of all 5 metrics: nUMI, nFeature, pctMito and pctSpikeIn for cells, and nCell for genes. Under the default cutoffs of each metric, no cells or features are removed.
Whichever mode you use, filterSparseEset() returns a summary table with detailed filtration statistics. You can refer to it and adjust the cutoffs accordingly.
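As a rough illustration of the Median ± 3*MAD rule used by the auto mode, the base-R sketch below derives low and high cutoffs for a toy vector of per-cell UMI totals. The values are made up, and both the use of constant = 1 (unscaled MAD) and the flooring of the low cutoff at 0 are assumptions chosen for readable arithmetic. The text does not state how scMINER scales the MAD internally, so the cutoffs it reports may differ.

```r
# Toy per-cell UMI totals (hypothetical values, not taken from pbmc14k)
nUMI <- c(900, 1100, 1200, 1250, 1300, 1350, 1400, 1500, 1700, 9000)

center <- median(nUMI)
# stats::mad() multiplies by 1.4826 by default; constant = 1 gives the raw MAD.
# Which variant scMINER uses is an assumption here.
spread <- mad(nUMI, constant = 1)

# Median +/- 3*MAD, with the low cutoff floored at 0 (assumption)
cutoff_low  <- max(center - 3 * spread, 0)
cutoff_high <- center + 3 * spread

# Cells inside the interval are kept (inclusive bounds assumed)
keep <- nUMI >= cutoff_low & nUMI <= cutoff_high

c(low = cutoff_low, high = cutoff_high)
sum(keep)
```

In this toy vector only the extreme value 9000 falls outside the interval, which is the kind of outlier the rule is meant to catch.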
5.2.1 Data filtration with auto mode
To conduct the filtering using the cutoffs recommended by scMINER:
## Filter eSet under the auto mode
pbmc14k_filtered.eset <- filterSparseEset(pbmc14k_raw.eset, filter_mode = "auto", filter_type = "both")
## Checking the availability of the 5 metrics ('nCell', 'nUMI', 'nFeature', 'pctMito', 'pctSpikeIn') used for filtration ...
## Checking passed! All 5 metrics are available.
## Filtration is done!
## Filtration Summary:
## 8846/17986 genes passed!
## 13605/14000 cells passed!
##
## For more details:
## Gene filtration statistics:
## Metrics      nCell
## Cutoff_Low   70
## Cutoff_High  Inf
## Gene_total   17986
## Gene_passed  8846(49.18%)
## Gene_failed  9140(50.82%)
##
## Cell filtration statistics:
## Metrics      nUMI           nFeature        pctMito        pctSpikeIn      Combined
## Cutoff_Low   458            221             0              0               NA
## Cutoff_High  3694           Inf             0.0408         0.0000          NA
## Cell_total   14000          14000           14000          14000           14000
## Cell_passed  13826(98.76%)  14000(100.00%)  13778(98.41%)  14000(100.00%)  13605(97.18%)
## Cell_failed  174(1.24%)     0(0.00%)        222(1.59%)     0(0.00%)        395(2.82%)
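Note that the Combined column is not the sum of the per-metric failures, because a single cell can fail more than one metric. Here 174 + 222 = 396 per-metric failures correspond to only 395 failed cells, which means one cell failed both nUMI and pctMito. The toy sketch below shows the idea of combining per-metric passes with a logical AND. It uses made-up values in a plain data frame rather than a SparseEset, and it assumes the cutoffs are inclusive.

```r
# Toy per-cell metrics (hypothetical values, not taken from pbmc14k)
cells <- data.frame(
  nUMI    = c(300, 1200, 2500, 4000, 5000),
  pctMito = c(0.01, 0.02, 0.09, 0.03, 0.08)
)

# Per-metric pass flags, using the auto-mode cutoffs from the summary above
# (inclusive bounds are an assumption)
pass_nUMI    <- cells$nUMI >= 458 & cells$nUMI <= 3694
pass_pctMito <- cells$pctMito >= 0 & cells$pctMito <= 0.0408

# A cell is kept only if it passes every metric
pass_combined <- pass_nUMI & pass_pctMito

sum(!pass_nUMI)      # cells failing nUMI
sum(!pass_pctMito)   # cells failing pctMito
sum(!pass_combined)  # cells failing at least one metric
```

In the toy data the last cell fails both metrics, so the combined failure count is one less than the sum of the two per-metric counts.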
In some cases, you may find that most of the cutoffs generated by the auto mode are good, except for one or two. Though there is no ‘hybrid’ mode, scMINER does allow you to customize some of the cutoffs generated by the auto mode. To do so, pass the cutoffs you want to customize in the same call that uses the auto mode:
## Filter eSet under the auto mode, with customized values
pbmc14k_filtered.eset <- filterSparseEset(pbmc14k_raw.eset, filter_mode = "auto", filter_type = "both", gene.nCell_min = 5)
With the code above, scMINER filters the eSet using all of the cutoffs generated by the auto mode except gene.nCell_min, which is set to 5.
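Conceptually, this behaves as if each cutoff were resolved separately: a value you pass explicitly wins, and anything left unset falls back to the auto estimate. The sketch below mimics that pattern in base R with a hypothetical helper, resolve_cutoffs(). It is only an illustration of the override logic, not scMINER's internal code.

```r
# Hypothetical helper: user-supplied cutoffs override the auto estimates
resolve_cutoffs <- function(auto, user = list()) {
  utils::modifyList(auto, user)
}

# Auto-mode estimates copied from the summary printed earlier
auto_cutoffs <- list(
  gene.nCell_min   = 70,
  cell.nUMI_min    = 458,
  cell.nUMI_max    = 3694,
  cell.pctMito_max = 0.0408
)

final <- resolve_cutoffs(auto_cutoffs, user = list(gene.nCell_min = 5))

final$gene.nCell_min  # 5, the customized value
final$cell.nUMI_max   # 3694, unchanged from the auto estimate
```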
5.2.2 Data filtration with manual mode
To apply your own cutoffs:
## Filter eSet under the manual mode
pbmc14k_filtered.eset <- filterSparseEset(pbmc14k_raw.eset, filter_mode = "manual", filter_type = "both", gene.nCell_min = 10, cell.nUMI_min = 500, cell.nUMI_max = 6500, cell.nFeature_min = 200, cell.nFeature_max = 2500, cell.pctMito_max = 0.1)
For any unspecified cutoff argument, such as gene.nCell_max, filterSparseEset() automatically assigns the default value. The default value of any cutoff argument does not filter out any cells or features, so if you want to skip a metric, just leave its cutoffs unspecified. For example, in the code above, gene.nCell_max is unspecified, so filterSparseEset() assigns it the default value, Inf, and no features are filtered out by this argument.
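An unspecified upper cutoff removes nothing because every finite value satisfies a bound of Inf. The small base-R check below uses made-up nCell values together with the gene.nCell_min = 10 cutoff from the manual example. Inclusive comparisons are an assumption.

```r
# Toy number of expressing cells per gene (hypothetical values)
nCell <- c(0, 3, 12, 250, 14000)

gene.nCell_min <- 10   # manual cutoff from the example above
gene.nCell_max <- Inf  # default value stated in the text

pass_max <- nCell <= gene.nCell_max              # always TRUE with Inf
pass     <- nCell >= gene.nCell_min & pass_max   # only the low cutoff filters

all(pass_max)  # TRUE
sum(pass)      # 3
```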