SCReadCounts Usage
Synopsis
Graphical User Interface:
scReadCounts
Command-line:
scReadCounts -r <bam_file> -s <snv_list_file> -o <output_file> [options]
Description
SCReadCounts has two programs. The program readCounts requires two input files: a pooled single-cell alignment and a list of genomic positions of interest. readCounts uses the barcode information from the pooled single-cell alignments and outputs, in a tab-separated text file, the variant and reference read counts (nvar and nref, respectively) for each barcode (cell) present in the barcodes.tsv file. The readCounts output file is then used as input for the second program, readCountsMatrix, which, given an output prefix, generates two outputs: (1) a cell-position matrix with nvar and nref estimates, and (2) a cell-position matrix with the expressed variant allele fraction (VAFRNA = nvar / (nvar + nref)). VAFRNA is estimated at a user-defined threshold of minimum required sequencing reads (minR); default minR = 5. readCountsMatrix is time-efficient and can be re-run multiple times at various minR thresholds.
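The VAFRNA formula and the minR threshold can be illustrated with a minimal Python sketch. This is not the readCountsMatrix implementation: the function name vaf_rna, the choice to apply minR to nvar + nref, the use of None for cells below the threshold, and the example barcodes are all assumptions made for illustration.

```python
from typing import Optional


def vaf_rna(n_var: int, n_ref: int, min_reads: int = 5) -> Optional[float]:
    """Return nvar / (nvar + nref), or None if coverage is below min_reads.

    Assumption: the threshold is applied to the total nvar + nref, and
    positions that fail it are reported as None (hypothetical convention).
    """
    total = n_var + n_ref
    if total == 0 or total < min_reads:
        return None
    return n_var / total


# Hypothetical per-cell counts: barcode -> (nvar, nref)
counts = {
    "AAACCTGAGAAACCAT": (3, 9),   # 12 reads, passes the default minR = 5
    "AAACCTGAGAAACCGC": (1, 2),   # 3 reads, below the default minR
}
vafs = {cell: vaf_rna(n_var, n_ref) for cell, (n_var, n_ref) in counts.items()}
```

Re-running the same counts with a different min_reads value mirrors re-running readCountsMatrix at a different minR threshold.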
Some methods for extracting cell-barcodes from the BAM files (see Read Grouping) can use a file of cell-barcodes to restrict these to an acceptable list. The STARsolo cell-barcode read grouping method, for example, presumes valid cell-barcodes are in the file barcodes.tsv in the current working directory (from where the script is executed). See also the Valid Cell Barcodes option (-b, --barcode_acceptlist command-line options) below.
Graphical User Interface
Click the help icon (question mark) at the top right of the GUI and then click an input field to get help for that field. Multiple files can be selected in the file-chooser using Ctrl-Click or Shift-Click. Fields can be reset to their default values using the Reset button. Click OK to execute SCReadCounts.
Additional GUI option tabs are documented below.
Options
SNVs, -s SNVS, --snvs=SNVS
Single-nucleotide variants (SNVs). Tabular and VCF format SNVs are supported. Multiple files can be specified inside quotes, separated by spaces, or by using file globbing. The list of genomic positions of interest is accepted in a tab-separated format with no header, and contains the chromosome, position, reference nucleotide, and variant nucleotide. Examples of genomic positions of interest include single-nucleotide variant (SNV) sites, somatic mutations, or RNA-editing loci. The list of genomic positions of interest can be generated from a variant call on the corresponding datasets, or pre-defined from existing sources, such as COSMIC or dbSNP. See Input Files for more information.
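As an illustration of the tabular SNV format described above (tab-separated, no header, four columns), the following Python sketch parses such a list. The names Snv and read_snvs are hypothetical, the coordinates in the example are made up for illustration, and the sketch does not cover VCF input or any validation that SCReadCounts itself may perform.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class Snv(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def read_snvs(handle: TextIO) -> List[Snv]:
    """Parse a headerless, tab-separated list: chromosome, position, ref, variant."""
    snvs = []
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not any(field.strip() for field in row):
            continue  # skip blank lines
        if len(row) < 4:
            raise ValueError(f"line {line_no}: expected 4 tab-separated fields")
        chrom, pos, ref, alt = (field.strip() for field in row[:4])
        snvs.append(Snv(chrom, int(pos), ref, alt))
    return snvs


# Illustrative content only; these positions are not taken from any dataset.
example = "1\t1014143\tC\tT\n17\t7674220\tG\tA\n"
snvs = read_snvs(io.StringIO(example))
```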
Read Alignment Files, -r ALIGNMENTS, --readalignments=ALIGNMENTS
Read alignment files in indexed BAM format, with extension .bam. The BAM index, with extension .bam.bai, must be located in the same directory. Multiple files can be specified inside quotes, separated by spaces, or by using file globbing. scReadCounts accepts alignment files generated by popular alignment tools; the test dataset uses a STAR-generated alignment. See Input Files for more information.
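Because each BAM file needs its .bam.bai index in the same directory, a quick check before launching a run can save time. The sketch below is a hypothetical helper, not part of SCReadCounts; it only tests that the index file exists and does not verify that the index is valid or up to date.

```python
from pathlib import Path
from typing import Iterable, List


def missing_bam_indexes(bam_paths: Iterable[str]) -> List[str]:
    """Return the BAM paths that have no sibling '<name>.bam.bai' file."""
    missing = []
    for bam in bam_paths:
        index = Path(str(bam) + ".bai")  # e.g. sample.bam -> sample.bam.bai
        if not index.is_file():
            missing.append(str(bam))
    return missing
```

For example, missing_bam_indexes(["sample1.bam", "sample2.bam"]) returns the subset of those paths for which no index was found, and an empty list when every index is present.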
Cell Barcode, -C CELLBARCODE, --cellbarcode=CELLBARCODE
Group reads based on cell-barcodes extracted from read name/identifiers or BAM-file tags. See Read Grouping for more details. Default: UMI-tools.
UMI Count, -U UMICOUNT, --umicount=UMICOUNT
Count unique molecular identifiers (UMIs) based on read name/identifiers or BAM-file tags. Default: None, count reads not UMIs.
Alignment Filter, -f FILTER, --alignmentfilter=FILTER
Alignment filtering strategy. See Read Filtering for more details. Default: Basic.
Output Folder, -o OUTPUT, --output=OUTPUT
Output file. Requires extension-specific filenames. Acceptable extensions: csv, tsv, xls, xlsx, e.g. output.csv. Note: all three outputs will have this extension.
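The extension requirement can be checked before a run with a small Python sketch. The helper name check_output_name is hypothetical, and treating the match as case-sensitive is an assumption; SCReadCounts may handle mixed-case extensions differently.

```python
from pathlib import Path

# Extensions listed in the option description above.
ACCEPTED_EXTENSIONS = {"csv", "tsv", "xls", "xlsx"}


def check_output_name(output: str) -> str:
    """Return the output extension, or raise ValueError if it is not accepted.

    Assumption: the comparison is case-sensitive.
    """
    extension = Path(output).suffix.lstrip(".")
    if extension not in ACCEPTED_EXTENSIONS:
        accepted = ", ".join(sorted(ACCEPTED_EXTENSIONS))
        raise ValueError(f"{output!r}: extension must be one of {accepted}")
    return extension
```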
--version
Show program’s version number and exit.
-h, --help
Show program help and exit.
Advanced Options
Min. Reads, -m MINREADS, --minreads=MINREADS
Minimum number of good reads at each SNV locus per alignment file. Default=5. This affects only VAF calculations.
Max. Reads, -M MAXREADS, --maxreads=MAXREADS
Scale read counts at high-coverage loci to ensure at most this many good reads at each SNV locus per alignment file. Values greater than 1 indicate absolute read counts; otherwise the value indicates the coverage distribution percentile. Default=No maximum.
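The two interpretations of the Max. Reads value can be sketched in Python as follows. This is only one reading of the description, not the SCReadCounts algorithm: the function names are hypothetical, a value of 1 or less is assumed to be a fraction (for example 0.95 for the 95th percentile), the nearest-rank percentile method is an assumption, and scaling both counts by the same factor is an assumption.

```python
import math
from typing import Sequence, Tuple


def resolve_max_reads(max_reads: float, coverages: Sequence[int]) -> float:
    """Turn the option value into an absolute cap on reads per locus."""
    if max_reads > 1:
        return max_reads  # absolute read count
    if not coverages:
        raise ValueError("a coverage distribution is needed for a percentile")
    ordered = sorted(coverages)
    # Nearest-rank percentile (assumed method).
    rank = max(1, math.ceil(max_reads * len(ordered)))
    return ordered[rank - 1]


def scale_counts(n_ref: float, n_var: float, cap: float) -> Tuple[float, float]:
    """Scale both counts by a common factor so their sum does not exceed cap."""
    total = n_ref + n_var
    if total <= cap:
        return n_ref, n_var
    factor = cap / total
    return n_ref * factor, n_var * factor
```

Scaling both counts by a common factor leaves the ratio nvar / (nvar + nref) unchanged, which is why this form of capping was chosen for the sketch.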
Directional, -D, --directional
Output directional (forward and reverse complement) VAF and read counts. Default: False
Valid Cell Barcodes, -b BARCODES, --barcode_acceptlist BARCODES
File of whitespace-separated acceptable cell-barcodes. Overrides the accept list, if any, specified by the Cell Barcode option. Use None to remove a default accept list.
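A whitespace-separated accept list is straightforward to read and apply, as the Python sketch below shows. The function names are hypothetical, and representing "no accept list" with Python's None is an assumption chosen to mirror the None value of the option; the sketch does not reproduce how SCReadCounts extracts barcodes from reads.

```python
from typing import Optional, Set


def read_accept_list(text: str) -> Set[str]:
    """Split file contents on any whitespace (spaces, tabs, newlines)."""
    return set(text.split())


def keep_barcode(barcode: str, accept_list: Optional[Set[str]]) -> bool:
    """Keep every barcode when there is no accept list, else only listed ones."""
    return accept_list is None or barcode in accept_list


# Illustrative barcodes only.
accepted = read_accept_list("AAACCTGAGAAACCAT AAACCTGAGAAACCGC\nAAACCTGAGAAACCTA\n")
```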
Threads, -t THREADS, --threads=THREADS
Worker threads. Default: 0, indicating single-threaded serial execution.
Force, -F, --force
Force all output files to be re-computed, even if already present. Default: False.
Quiet, -q, --quiet
Quiet. Default=verbose.