Complete orientation-aware counting system for genomic variants
- High Performance: Rust-powered core engine with multi-threading
- Complete Variant Support: SNP, MNP, insertion, deletion, and complex variants (DelIns, SNP+Indel)
- WFA + PairHMM Phase 3: Pangenomic fast-path WFA alignment with PairHMM fallback for complex multi-allelic classification
- Orientation-Aware: Forward and reverse strand analysis with fragment counting
- mFSD (Mutant Fragment Size Distribution): Per-allele cfDNA fragment size profiling with KS test and log-likelihood ratio
- Statistical Analysis: Fisher's exact test for strand bias (read-level and fragment-level)
- Flexible I/O: VCF and MAF input/output formats
- Quality Filters: 8 configurable read and quality filtering options with heuristic BAQ
- RNA Mode: Transcriptome-aware counting with strandedness, splice detection, and A-to-I editing
- UMI Support: Molecule-level deduplication with UMI-aware fragment grouping
- Normalize Command: Standalone variant normalization (left-align + REF validation) without counting
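To make the strand-bias feature concrete, here is a minimal pure-Python sketch of a two-sided Fisher's exact test on a 2x2 table of read counts. The table layout (REF/ALT rows, forward/reverse columns) and the function name `fisher_exact_two_sided` are assumptions for illustration; this is not the gbcms Rust implementation and the tool's actual table layout may differ.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (illustrative only):
        a = REF forward, b = REF reverse
        c = ALT forward, d = ALT reverse
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than
    # the observed one (small tolerance guards against float rounding).
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Balanced ALT reads: no evidence of strand bias.
balanced = fisher_exact_two_sided(50, 50, 10, 10)
# ALT reads seen on the forward strand only: strong evidence of strand bias.
biased = fisher_exact_two_sided(50, 50, 20, 0)
print(balanced, biased)
```

A small p-value means the ALT allele's strand distribution differs from the REF allele's more than chance would explain, which is commonly treated as a sign of a strand-specific artifact.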
Quick install:

pip install gbcms

From source (requires Rust):

git clone https://github.com/msk-access/gbcms.git
cd gbcms
pip install .

Docker:

docker pull ghcr.io/msk-access/gbcms:X.Y.Z # Replace X.Y.Z with the latest version on PyPI

Full documentation: https://msk-access.github.io/gbcms/
gbcms can be used in two ways:
CLI. Best for: quick analysis, local processing, direct control
gbcms dna \
--variants variants.vcf \
--bam sample1.bam \
--fasta reference.fa \
--output-dir results/

Output: results/sample1.vcf
Learn more:
- CLI Quick Start
- CLI Reference: DNA
- CLI Reference: RNA
- CLI Reference: Normalize
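If you want to drive the CLI from a script, the following sketch assembles the `gbcms dna` command as an argument list. It uses only flags that appear in this README (`--variants`, `--bam`, `--fasta`, `--output-dir`, `--threads`). The helper names `build_dna_command` and `expected_output` are hypothetical, and the output naming rule (BAM file stem plus `.vcf`) is inferred from the single `sample1.bam` example here, so treat it as an assumption rather than documented behavior.

```python
import shlex
from pathlib import Path


def build_dna_command(variants, bam, fasta, output_dir, threads=None):
    """Assemble a `gbcms dna` invocation as an argv list (hypothetical helper)."""
    cmd = [
        "gbcms", "dna",
        "--variants", str(variants),
        "--bam", str(bam),
        "--fasta", str(fasta),
        "--output-dir", str(output_dir),
    ]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd


def expected_output(bam, output_dir):
    """Guess the output path.

    Assumption: the output file is named after the BAM file stem, as in
    the README example (sample1.bam -> results/sample1.vcf).
    """
    return Path(output_dir) / (Path(bam).stem + ".vcf")


cmd = build_dna_command("variants.vcf", "sample1.bam", "reference.fa", "results/")
# Print the shell-quoted command instead of running it, so this sketch
# works even where gbcms is not installed.
print(shlex.join(cmd))
print(expected_output("sample1.bam", "results/"))
```

Once gbcms is installed, the list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a string avoids shell quoting problems with unusual file paths.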
Nextflow. Best for: many samples, HPC clusters (SLURM), reproducible pipelines
nextflow run nextflow/main.nf \
--input samplesheet.csv \
--variants variants.vcf \
--fasta reference.fa \
--mode dna \
-profile slurm

Features:
- Automatic parallelization across samples
- SLURM/HPC integration
- Container support (Docker/Singularity)
- Resume failed runs
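The `--input samplesheet.csv` file can be generated rather than written by hand. The sketch below writes the `sample,bam,bai` header shown in this README's samplesheet example and leaves the `bai` column empty when no index path is given, mirroring that example. The function name `write_samplesheet` and the tuple layout are illustrative assumptions; check the Nextflow workflow documentation for any additional columns or validation rules.

```python
import csv
import io


def write_samplesheet(samples, handle):
    """Write (sample, bam, bai) rows as CSV with the README's header.

    `bai` may be None, in which case the column is left empty.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["sample", "bam", "bai"])
    for name, bam, bai in samples:
        writer.writerow([name, bam, bai or ""])


samples = [
    ("tumor1", "/path/to/tumor1.bam", None),
    ("tumor2", "/path/to/tumor2.bam", None),
]

# Write to an in-memory buffer for demonstration; in practice, use
# open("samplesheet.csv", "w", newline="") as the handle.
buf = io.StringIO()
write_samplesheet(samples, buf)
print(buf.getvalue(), end="")
```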
| Scenario | Recommendation |
|---|---|
| 1-10 samples, local machine | CLI |
| 10+ samples, HPC cluster | Nextflow |
| Quick ad-hoc analysis | CLI |
| Production pipeline | Nextflow |
| Need auto-parallelization | Nextflow |
| Full manual control | CLI |
gbcms dna \
--variants variants.vcf \
--bam tumor.bam \
--fasta hg19.fa \
--output-dir results/ \
--threads 4

gbcms rna \
--variants variants.vcf \
--bam rna_sample:aligned.bam \
--fasta hg19.fa \
--rna-editing-db TABLE1_hg38.txt.gz \
--output-dir results/

gbcms normalize \
--variants variants.vcf \
--fasta hg19.fa \
--output-dir results/

gbcms dna \
--variants variants.vcf \
--bam-list samples.txt \
--fasta hg19.fa \
--output-dir results/

# samplesheet.csv:
# sample,bam,bai
# tumor1,/path/to/tumor1.bam,
# tumor2,/path/to/tumor2.bam,
nextflow run nextflow/main.nf \
--input samplesheet.csv \
--variants variants.vcf \
--fasta hg19.fa \
--mode dna \
--outdir results \
-profile slurm

Full Documentation: https://msk-access.github.io/gbcms/
Quick Links:
- Installation
- CLI Quick Start
- Nextflow Workflow
- CLI Reference: DNA
- CLI Reference: RNA
- CLI Reference: Normalize
- Input Formats
- Output Formats
- Architecture
See CONTRIBUTING.md for development guidelines.
To contribute to documentation, see the gh-pages branch.
If you use gbcms in your research, please cite:
Shah, R. et al. (2026). gbcms: A high-performance orientation-aware genotype counting system for genomic variants. Available at: https://github.com/msk-access/gbcms
BibTeX:
@software{pygbcms,
author = {Shah, Ronak and contributors},
title = {gbcms: A high-performance orientation-aware genotype counting system for genomic variants},
year = {2026},
url = {https://github.com/msk-access/gbcms},
note = {GitHub repository}
}

License: AGPL-3.0. See LICENSE for details.
- Issues: https://github.com/msk-access/gbcms/issues
- Discussions: https://github.com/msk-access/gbcms/discussions