5.2 Filter the sparse eset object

The quality control report generated above gives us a better sense of the data quality and of the cutoffs to use for filtration. scMINER provides a function, filterSparseEset(), for this purpose. It can work in two modes:

  • auto: in this mode, scMINER uses the cutoffs estimated by Median ± 3*MAD (median absolute deviation). Based on our tests, this mode works well in most cases with both raw UMI count matrices and TPM matrices.
  • manual: in this mode, users can specify the low and high cutoffs of all 5 metrics: nUMI, nFeature, pctMito and pctSpikeIn for cells, and nCell for genes. The default cutoffs of each metric do not remove any cells or features.
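As a rough illustration of the auto-mode rule, the R sketch below computes Median ± 3*MAD cutoffs for a single numeric QC metric using base R's median() and mad(). It is not scMINER's internal code: the helper name mad_cutoffs() and the toy values are hypothetical, and we assume mad()'s default scaling constant (1.4826). scMINER may transform a metric or clip the resulting cutoffs differently; for example, several auto cutoffs in the summary below are 0 or Inf.

```r
# Hypothetical helper (not part of scMINER): Median +/- nmads * MAD cutoffs.
# stats::mad() is the median absolute deviation, scaled by 1.4826 by default.
mad_cutoffs <- function(x, nmads = 3) {
  center <- median(x, na.rm = TRUE)
  spread <- mad(x, na.rm = TRUE)
  c(low = center - nmads * spread, high = center + nmads * spread)
}

# Toy per-cell UMI counts (made-up values, for illustration only)
toy_nUMI <- c(900, 1100, 1000, 1050, 950, 1200, 800, 5000)
cutoffs <- mad_cutoffs(toy_nUMI)
print(cutoffs)

# Cells whose nUMI falls inside [low, high] would be kept
keep <- toy_nUMI >= cutoffs["low"] & toy_nUMI <= cutoffs["high"]
print(keep)
```

Here the median is 1025 and the scaled MAD is 148.26, so only the outlying cell with 5000 UMIs falls outside the cutoffs.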

Whichever mode is used, filterSparseEset() reports a summary table with detailed filtration statistics. You can refer to it and adjust the cutoffs accordingly.

5.2.1 Data filtration with auto mode

To conduct the filtering using the cutoffs recommended by scMINER:

## Filter eSet under the auto mode
pbmc14k_filtered.eset <- filterSparseEset(pbmc14k_raw.eset, filter_mode = "auto", filter_type = "both")
## Checking the availability of the 5 metrics ('nCell', 'nUMI', 'nFeature', 'pctMito', 'pctSpikeIn') used for filtration ...
## Checking passed! All 5 metrics are available.
## Filtration is done!
## Filtration Summary:
##  8846/17986 genes passed!
##  13605/14000 cells passed!
## 
## For more details:
##  Gene filtration statistics:
##      Metrics     nCell
##      Cutoff_Low  70
##      Cutoff_High Inf
##      Gene_total  17986
##      Gene_passed 8846(49.18%)
##      Gene_failed 9140(50.82%)
## 
##  Cell filtration statistics:
##      Metrics     nUMI           nFeature        pctMito        pctSpikeIn      Combined
##      Cutoff_Low  458            221             0              0               NA
##      Cutoff_High 3694           Inf             0.0408         0.0000          NA
##      Cell_total  14000          14000           14000          14000           14000
##      Cell_passed 13826(98.76%)  14000(100.00%)  13778(98.41%)  14000(100.00%)  13605(97.18%)
##      Cell_failed 174(1.24%)     0(0.00%)        222(1.59%)     0(0.00%)        395(2.82%)
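The Combined column is consistent with a cell being kept only if it passes every metric. Under that assumption, the overlap between the two metrics that actually remove cells follows from the counts above: 174 + 222 - 395 = 1, i.e. one cell fails both the nUMI and the pctMito cutoffs. The R sketch below reproduces this kind of bookkeeping on a toy logical matrix. The function name summarize_filtering() and the toy pass/fail flags are hypothetical, and the formatting only mimics the printed summary; it is not how filterSparseEset() builds its report.

```r
# Format a count and its percentage of the total, e.g. "13605(97.18%)"
fmt_count <- function(n, total) sprintf("%d(%.2f%%)", n, 100 * n / total)

# Hypothetical sketch: per-metric and combined pass statistics.
# pass: logical matrix, one row per cell, one named column per metric.
summarize_filtering <- function(pass) {
  combined <- apply(pass, 1, all)          # a cell must pass every metric
  flags <- cbind(pass, Combined = combined)
  total <- nrow(flags)
  n_passed <- colSums(flags)
  out <- rbind(
    Cell_total  = rep(as.character(total), ncol(flags)),
    Cell_passed = fmt_count(n_passed, total),
    Cell_failed = fmt_count(total - n_passed, total)
  )
  colnames(out) <- colnames(flags)
  out
}

# Toy flags for 5 cells; cell 3 fails both metrics
pass <- cbind(
  nUMI    = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  pctMito = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)
res <- summarize_filtering(pass)
print(res, quote = FALSE)
```

Because cell 3 fails both metrics, the Combined failure count (2) is smaller than the sum of the per-metric failures (1 + 2), mirroring the 395 vs. 396 relationship in the real summary.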

In some cases, you may find that most of the cutoffs generated by the auto mode are good, except for one or two. Although there is no ‘hybrid’ mode, scMINER does allow you to override individual cutoffs generated by the auto mode: simply pass the cutoffs you want to customize as arguments while keeping filter_mode = "auto":

## Filter eSet under the auto mode, with customized values
pbmc14k_filtered.eset <- filterSparseEset(pbmc14k_raw.eset, filter_mode = "auto", filter_type = "both", gene.nCell_min = 5)

With the code above, scMINER filters the eSet using all of the cutoffs generated by the auto mode, except gene.nCell_min, which is set to 5.
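One way to picture this override behaviour is as a merge of two named lists: the auto-estimated cutoffs form the base, and any cutoff passed explicitly replaces the matching entry. The sketch below uses base R's modifyList() for the merge. The list values are copied from the auto-mode summary above, but this is only an illustration of the semantics, not how filterSparseEset() is implemented internally.

```r
# Cutoffs estimated in auto mode (values copied from the summary above)
auto_cutoffs <- list(
  gene.nCell_min   = 70,
  cell.nUMI_min    = 458,
  cell.nUMI_max    = 3694,
  cell.pctMito_max = 0.0408
)

# Cutoffs supplied explicitly by the user, as in the call above
user_cutoffs <- list(gene.nCell_min = 5)

# User values replace the matching auto values; all others are kept
final_cutoffs <- modifyList(auto_cutoffs, user_cutoffs)
str(final_cutoffs)
```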

5.2.2 Data filtration with manual mode

To apply user-defined cutoffs:

## Filter eSet under the manual mode
pbmc14k_filtered.eset <- filterSparseEset(pbmc14k_raw.eset,
                                          filter_mode = "manual", filter_type = "both",
                                          gene.nCell_min = 10,
                                          cell.nUMI_min = 500, cell.nUMI_max = 6500,
                                          cell.nFeature_min = 200, cell.nFeature_max = 2500,
                                          cell.pctMito_max = 0.1)

filterSparseEset() automatically assigns default values to any unspecified cutoff arguments, such as gene.nCell_max, and these defaults never filter out any cells or features. So, to skip a metric, simply leave its cutoffs unspecified. For example, gene.nCell_max is not specified in the code above, so filterSparseEset() assigns it the default value, Inf, and no features are filtered out by this argument.
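The "defaults never filter" behaviour can be illustrated with a range check whose bounds default to -Inf and Inf. The helper within_cutoffs() and the toy nCell values below are hypothetical; apart from Inf for gene.nCell_max, the exact default values scMINER uses are not shown here, so -Inf as the low default is an assumption chosen to keep every value.

```r
# Hypothetical helper (not scMINER code): keep values inside [low, high].
# With the defaults, -Inf and Inf, every finite value passes.
within_cutoffs <- function(x, low = -Inf, high = Inf) {
  x >= low & x <= high
}

toy_nCell <- c(0, 3, 12, 250, 9000)   # made-up cells-per-gene counts

# Only the low cutoff is given, so the high cutoff stays at Inf
print(within_cutoffs(toy_nCell, low = 10))

# No cutoffs given: nothing is removed
print(within_cutoffs(toy_nCell))
```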