gbcms

Complete orientation-aware counting system for genomic variants

Features

🚀 High Performance: Rust-powered core engine with multi-threading
🧬 Complete Variant Support: SNP, MNP, insertion, deletion, and complex variants (DelIns, SNP+Indel)
🧪 WFA + PairHMM Phase 3: Pangenomic fast-path WFA alignment with PairHMM fallback for complex multi-allelic classification
📊 Orientation-Aware: Forward and reverse strand analysis with fragment counting
📏 mFSD (Mutant Fragment Size Distribution): Per-allele cfDNA fragment size profiling with KS test and log-likelihood ratio
🔬 Statistical Analysis: Fisher's exact test for strand bias (read-level and fragment-level)
📁 Flexible I/O: VCF and MAF input/output formats
🎯 Quality Filters: 8 configurable read and quality filtering options with heuristic BAQ
🧬 RNA Mode: Transcriptome-aware counting with strandedness, splice detection, and A-to-I editing
🔗 UMI Support: Molecule-level deduplication with UMI-aware fragment grouping
🔧 Normalize Command: Standalone variant normalization (left-align + REF validation) without counting

Installation

Quick install:

pip install gbcms

From source (requires Rust):

git clone https://github.com/msk-access/gbcms.git
cd gbcms
pip install .

Docker:

docker pull ghcr.io/msk-access/gbcms:X.Y.Z  # Replace X.Y.Z with latest from PyPI

💡 Find the latest version on PyPI or GHCR.

📖 Full documentation: https://msk-access.github.io/gbcms/

Usage

gbcms can be used in two ways:

🔧 Option 1: Standalone CLI (1-10 samples)

Best for: Quick analysis, local processing, direct control

gbcms dna \
    --variants variants.vcf \
    --bam sample1.bam \
    --fasta reference.fa \
    --output-dir results/

Output: results/sample1.vcf

🔄 Option 2: Nextflow Workflow (10+ samples, HPC)

Best for: Many samples, HPC clusters (SLURM), reproducible pipelines

nextflow run nextflow/main.nf \
    --input samplesheet.csv \
    --variants variants.vcf \
    --fasta reference.fa \
    --mode dna \
    -profile slurm

Features:

✅ Automatic parallelization across samples
✅ SLURM/HPC integration
✅ Container support (Docker/Singularity)
✅ Resume failed runs

Learn more:

🔄 Nextflow Workflow Guide
📋 Usage Patterns Comparison

Which Should I Use?

Scenario	Recommendation
1-10 samples, local machine	CLI
10+ samples, HPC cluster	Nextflow
Quick ad-hoc analysis	CLI
Production pipeline	Nextflow
Need auto-parallelization	Nextflow
Full manual control	CLI

Quick Examples

CLI: DNA Single Sample

gbcms dna \
    --variants variants.vcf \
    --bam tumor.bam \
    --fasta hg19.fa \
    --output-dir results/ \
    --threads 4

CLI: RNA-seq

gbcms rna \
    --variants variants.vcf \
    --bam rna_sample:aligned.bam \
    --fasta hg38.fa \
    --rna-editing-db TABLE1_hg38.txt.gz \
    --output-dir results/

CLI: Normalize Variants

gbcms normalize \
    --variants variants.vcf \
    --fasta hg19.fa \
    --output-dir results/

CLI: Multiple Samples (Sequential)

gbcms dna \
    --variants variants.vcf \
    --bam-list samples.txt \
    --fasta hg19.fa \
    --output-dir results/
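
When the BAM files simply sit in one directory, a per-file loop over the single-sample form is another way to run them one after another. The sketch below only prints the commands, using the flags shown in the examples above. The helper name `print_gbcms_commands` and the directory layout are assumptions for illustration, and whether the outputs match those of a `--bam-list` run is not stated here.

```shell
# Hypothetical helper (not part of gbcms): print one `gbcms dna` command
# per *.bam file in a directory, using only flags shown in this README.
# Paths are wrapped in single quotes, so paths that themselves contain
# single quotes are not supported by this sketch.
print_gbcms_commands() {
    bam_dir="$1"
    variants="$2"
    fasta="$3"
    out_dir="$4"
    for bam in "$bam_dir"/*.bam; do
        [ -e "$bam" ] || continue   # the glob matched nothing
        printf "gbcms dna --variants '%s' --bam '%s' --fasta '%s' --output-dir '%s'\n" \
            "$variants" "$bam" "$fasta" "$out_dir"
    done
}
```

Review the printed commands first, then pipe them to `sh` to run them sequentially, for example `print_gbcms_commands bams/ variants.vcf hg19.fa results/ | sh`.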

Nextflow: Many Samples (Parallel)

# samplesheet.csv:
# sample,bam,bai
# tumor1,/path/to/tumor1.bam,
# tumor2,/path/to/tumor2.bam,

nextflow run nextflow/main.nf \
    --input samplesheet.csv \
    --variants variants.vcf \
    --fasta hg19.fa \
    --mode dna \
    --outdir results \
    -profile slurm
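
For more than a handful of samples, the samplesheet can be generated rather than typed by hand. The sketch below writes the `sample,bam,bai` header shown above and one row per `*.bam` file in a directory. The helper name `make_samplesheet` is hypothetical, the sample name is assumed to be the BAM file name without its extension, and the `bai` column is left empty as in the example rows.

```shell
# Hypothetical helper (not part of gbcms): build a samplesheet with the
# sample,bam,bai header, one row per *.bam file found in a directory.
make_samplesheet() {
    bam_dir="$1"
    out_csv="$2"
    printf 'sample,bam,bai\n' > "$out_csv"
    for bam in "$bam_dir"/*.bam; do
        [ -e "$bam" ] || continue          # the glob matched nothing
        sample=$(basename "$bam" .bam)     # tumor1.bam -> tumor1
        # Trailing comma leaves the bai column empty, as in the example rows.
        printf '%s,%s,\n' "$sample" "$bam" >> "$out_csv"
    done
}
```

For example, `make_samplesheet /path/to/bams samplesheet.csv` writes a file that can be passed to `--input`. Pass an absolute directory so that the `bam` column holds absolute paths like those in the example.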

Documentation

📚 Full Documentation: https://msk-access.github.io/gbcms/

Contributing

See CONTRIBUTING.md for development guidelines.

To contribute to documentation, see the gh-pages branch.

Citation

If you use gbcms in your research, please cite:

Shah, R. et al. (2026). gbcms: A high-performance orientation-aware genotype counting system for genomic variants. Available at: https://github.com/msk-access/gbcms

BibTeX:

@software{pygbcms,
  author       = {Shah, Ronak and contributors},
  title        = {gbcms: A high-performance orientation-aware genotype counting system for genomic variants},
  year         = {2026},
  url          = {https://github.com/msk-access/gbcms},
  note         = {GitHub repository}
}

License

AGPL-3.0 - see LICENSE for details.

Support

🐛 Issues: https://github.com/msk-access/gbcms/issues
💬 Discussions: https://github.com/msk-access/gbcms/discussions
