
gbcms

Complete orientation-aware counting system for genomic variants

Requires Python 3.10 or later.

Features

  • 🚀 High Performance: Rust-powered core engine with multi-threading
  • 🧬 Complete Variant Support: SNP, MNP, insertion, deletion, and complex variants (DelIns, SNP+Indel)
  • 🧪 WFA + PairHMM Phase 3: Pangenomic fast-path WFA alignment with PairHMM fallback for complex multi-allelic classification
  • 📊 Orientation-Aware: Forward and reverse strand analysis with fragment counting
  • 📏 mFSD (Mutant Fragment Size Distribution): Per-allele cfDNA fragment size profiling with KS test and log-likelihood ratio
  • 🔬 Statistical Analysis: Fisher's exact test for strand bias (read-level and fragment-level)
  • 📁 Flexible I/O: VCF and MAF input/output formats
  • 🎯 Quality Filters: 8 configurable read and quality filtering options with heuristic BAQ
  • 🧬 RNA Mode: Transcriptome-aware counting with strandedness, splice detection, and A-to-I editing
  • 🔗 UMI Support: Molecule-level deduplication with UMI-aware fragment grouping
  • 🔧 Normalize Command: Standalone variant normalization (left-align + REF validation) without counting
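
Two of the statistics named above can be illustrated in a few lines. The following is a minimal pure-Python sketch, not taken from the gbcms Rust core: a two-sided Fisher's exact test on a 2×2 strand table, assumed here to be laid out as [[ref_fwd, ref_rev], [alt_fwd, alt_rev]], and the two-sample Kolmogorov-Smirnov D statistic that could compare fragment sizes of reference- versus mutant-supporting fragments. The function names and table layout are hypothetical, and the mFSD log-likelihood ratio is not shown.

```python
from bisect import bisect_right
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # P(top-left cell == x) with all row and column totals held fixed
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance keeps tables tied with the observed one
    probs = [prob(x) for x in range(lo, hi + 1)]
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def ks_statistic(sizes_a: list[int], sizes_b: list[int]) -> float:
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    a, b = sorted(sizes_a), sorted(sizes_b)
    d = 0.0
    for v in a + b:  # the supremum of |F_a - F_b| is reached at an observed value
        d = max(d, abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b)))
    return d


# Hypothetical strand counts: [[ref_fwd, ref_rev], [alt_fwd, alt_rev]]
print(fisher_exact_two_sided(8, 2, 1, 5))  # ≈ 0.035
# Hypothetical fragment sizes (bp) for reference- vs mutant-supporting fragments
print(ks_statistic([166, 167, 170, 180], [140, 145, 150, 166]))  # 0.75
```

A low Fisher p-value indicates that alt-supporting reads are distributed across strands differently from ref-supporting reads. A large D indicates that the two fragment-size distributions differ.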

Installation

Quick install:

pip install gbcms

From source (requires Rust):

git clone https://github.com/msk-access/gbcms.git
cd gbcms
pip install .

Docker:

docker pull ghcr.io/msk-access/gbcms:X.Y.Z  # Replace X.Y.Z with latest from PyPI

💡 Find the latest version on PyPI or GHCR.

📖 Full documentation: https://msk-access.github.io/gbcms/


Usage

gbcms can be used in two ways:

🔧 Option 1: Standalone CLI (1-10 samples)

Best for: Quick analysis, local processing, direct control

gbcms dna \
    --variants variants.vcf \
    --bam sample1.bam \
    --fasta reference.fa \
    --output-dir results/

Output: results/sample1.vcf


🔄 Option 2: Nextflow Workflow (10+ samples, HPC)

Best for: Many samples, HPC clusters (SLURM), reproducible pipelines

nextflow run nextflow/main.nf \
    --input samplesheet.csv \
    --variants variants.vcf \
    --fasta reference.fa \
    --mode dna \
    -profile slurm

Features:

  • ✅ Automatic parallelization across samples
  • ✅ SLURM/HPC integration
  • ✅ Container support (Docker/Singularity)
  • ✅ Resume failed runs


Which Should I Use?

| Scenario                    | Recommendation |
|-----------------------------|----------------|
| 1-10 samples, local machine | CLI            |
| 10+ samples, HPC cluster    | Nextflow       |
| Quick ad-hoc analysis       | CLI            |
| Production pipeline         | Nextflow       |
| Need auto-parallelization   | Nextflow       |
| Full manual control         | CLI            |

Quick Examples

CLI: DNA Single Sample

gbcms dna \
    --variants variants.vcf \
    --bam tumor.bam \
    --fasta hg19.fa \
    --output-dir results/ \
    --threads 4

CLI: RNA-seq

gbcms rna \
    --variants variants.vcf \
    --bam rna_sample:aligned.bam \
    --fasta hg38.fa \
    --rna-editing-db TABLE1_hg38.txt.gz \
    --output-dir results/
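
RNA mode lists strandedness and A-to-I editing among its features, and the example above supplies a table of known editing sites via `--rna-editing-db`. How gbcms uses that table is not described here. The following is only a minimal sketch of the strand logic involved. It uses hypothetical function names and assumes an in-memory set of (chrom, 1-based position) sites instead of a parser for the real file.

```python
def is_a_to_i_signature(ref: str, alt: str, transcript_strand: str) -> bool:
    """True if a genomic (+ strand) ref>alt SNV matches A-to-I editing.

    Inosine is read as G, so editing shows up as A>G on a '+' strand
    transcript and as the complementary T>C on a '-' strand transcript.
    """
    change = (ref.upper(), alt.upper())
    if transcript_strand == "+":
        return change == ("A", "G")
    if transcript_strand == "-":
        return change == ("T", "C")
    raise ValueError(f"unknown transcript strand: {transcript_strand!r}")


def classify_rna_snv(chrom: str, pos: int, ref: str, alt: str, strand: str,
                     known_sites: set[tuple[str, int]]) -> str:
    """Label an SNV as a known editing site, a novel editing candidate, or neither.

    known_sites is a hypothetical set of (chrom, 1-based pos) loaded from an
    editing database; the database file format is not handled here.
    """
    if not is_a_to_i_signature(ref, alt, strand):
        return "not_a_to_i"
    return "known_editing" if (chrom, pos) in known_sites else "candidate_editing"


known = {("chr1", 1000)}  # hypothetical editing-database contents
print(classify_rna_snv("chr1", 1000, "A", "G", "+", known))  # known_editing
print(classify_rna_snv("chr1", 2000, "T", "C", "-", known))  # candidate_editing
```

Without the strand argument, an A>G call on a minus-strand gene would be indistinguishable from an ordinary SNV. This is one reason stranded libraries matter for editing analysis.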

CLI: Normalize Variants

gbcms normalize \
    --variants variants.vcf \
    --fasta hg19.fa \
    --output-dir results/
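
`gbcms normalize` is described above as left-alignment plus REF validation. Below is a minimal sketch of those two steps. It assumes the chromosome sequence is already loaded as a Python string and follows the trim-and-extend procedure described by Tan et al. (2015). This is an illustration, not the gbcms implementation, and it handles one biallelic record at a time.

```python
def left_align(chrom_seq: str, pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Validate REF, then left-align and trim a biallelic variant.

    pos is 1-based; chrom_seq is the full sequence of the variant's chromosome.
    """
    ref, alt = ref.upper(), alt.upper()
    if chrom_seq[pos - 1 : pos - 1 + len(ref)].upper() != ref:
        raise ValueError(f"REF {ref} does not match the reference at position {pos}")
    changed = True
    while changed:
        changed = False
        # Drop a shared rightmost base
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # An emptied allele is re-anchored on the preceding reference base
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = chrom_seq[pos - 1].upper()
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leftmost bases, keeping at least one base per allele
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    return pos, ref, alt


# "CA" deleted from the CACACA repeat, written at its rightmost position
print(left_align("TCACACAG", 5, "ACA", "A"))  # (1, 'TCA', 'T')
```

A production normalizer must also handle multi-allelic records, symbolic alleles, and FASTA access, all of which this sketch omits.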

CLI: Multiple Samples (Sequential)

gbcms dna \
    --variants variants.vcf \
    --bam-list samples.txt \
    --fasta hg19.fa \
    --output-dir results/

Nextflow: Many Samples (Parallel)

# samplesheet.csv:
# sample,bam,bai
# tumor1,/path/to/tumor1.bam,
# tumor2,/path/to/tumor2.bam,

nextflow run nextflow/main.nf \
    --input samplesheet.csv \
    --variants variants.vcf \
    --fasta hg19.fa \
    --mode dna \
    --outdir results \
    -profile slurm
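
The samplesheet layout is given in the comment above: a `sample,bam,bai` header, with the `bai` column left blank in the example. Below is a minimal Python sketch for generating one from a directory of BAMs. It makes two hypothetical assumptions: each sample takes its name from the BAM file stem, and an index, when present, is named `<file>.bam.bai` and sits next to its BAM.

```python
import csv
from pathlib import Path


def write_samplesheet(bam_dir: str, out_csv: str) -> int:
    """Write a sample,bam,bai CSV for every *.bam in bam_dir; return the row count.

    The bai column is filled only when <name>.bam.bai exists next to the BAM,
    otherwise it is left blank as in the example samplesheet.
    """
    rows = []
    for bam in sorted(Path(bam_dir).glob("*.bam")):
        bai = bam.with_name(bam.name + ".bai")
        rows.append({
            "sample": bam.stem,  # hypothetical naming rule: file stem = sample ID
            "bam": str(bam.resolve()),
            "bai": str(bai.resolve()) if bai.exists() else "",
        })
    with open(out_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["sample", "bam", "bai"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `write_samplesheet("/path/to/bams", "samplesheet.csv")` produces a file that can be passed to `--input`. Absolute paths are written so the sheet still works when Nextflow tasks run from their own work directories.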

Documentation

📚 Full Documentation: https://msk-access.github.io/gbcms/


Contributing

See CONTRIBUTING.md for development guidelines.

To contribute to documentation, see the gh-pages branch.


Citation

If you use gbcms in your research, please cite:

Shah, R. et al. (2026). gbcms: A high-performance orientation-aware genotype counting system for genomic variants. Available at: https://github.com/msk-access/gbcms

BibTeX:

@software{pygbcms,
  author       = {Shah, Ronak and contributors},
  title        = {gbcms: A high-performance orientation-aware genotype counting system for genomic variants},
  year         = {2026},
  url          = {https://github.com/msk-access/gbcms},
  note         = {GitHub repository}
}

License

AGPL-3.0 - see LICENSE for details.

