SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
Updated Mar 19, 2026 · Python
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.
Compresses context data to reduce memory use and improve performance in C++ large language model applications built with the llm-cpp toolkit.
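The repositories above implement SmoothQuant, which makes activations easier to quantize by migrating their per-channel outliers into the weights with an equivalent rescaling. A minimal NumPy sketch of that scaling idea follows; the function name, the `alpha` default, and the example magnitudes are illustrative and not any listed toolkit's API:

```python
import numpy as np

def smooth_scales(act_absmax, weight_absmax, alpha=0.5):
    """Per-channel smoothing factor s_j = max|X_j|**alpha / max|W_j|**(1 - alpha)."""
    return act_absmax**alpha / weight_absmax**(1 - alpha)

# Activations concentrate outliers in a few channels; weights are fairly uniform.
act_absmax = np.array([60.0, 0.5, 1.2, 40.0])   # per-channel activation max magnitudes
w_absmax = np.array([0.4, 0.5, 0.3, 0.6])        # per-channel weight max magnitudes

s = smooth_scales(act_absmax, w_absmax)

# Dividing activations by s and multiplying weights by s leaves X @ W
# unchanged while flattening the activation range, so a single per-tensor
# INT8 scale covers it with far less clipping error.
smoothed_act = act_absmax / s
smoothed_w = w_absmax * s
```

At `alpha=0.5` the smoothed activation and weight magnitudes meet in the middle (both become `sqrt(act * w)` per channel), which is why 0.5 is a common default; `alpha` can be shifted toward 1.0 when activations are much harder to quantize than weights.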