A from-scratch implementation of a Transformer-based language model, built to demystify the internals of modern LLMs: tokenization, attention, training, and text generation.
LEmma implements the core components of the original "Attention Is All You Need" Transformer:
- Multi-Head Self-Attention — allows each token to attend to every other token in the sequence
- Position-wise Feed-Forward Network — processes each token representation independently
- Sinusoidal Positional Encoding — injects token position information into embeddings
- Residual Connections + Layer Norm — pre-norm style for stable training
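The sinusoidal encoding above can be sketched as follows (a NumPy illustration of the standard formula from the paper; LEmma's actual implementation lives in `positionalencoding.py`, and the function name here is illustrative):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal position encodings."""
    positions = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)  # broadcast to (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions: cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=16, d_model=32)
assert pe.shape == (16, 32)
assert pe[0, 0] == 0.0 and pe[0, 1] == 1.0  # position 0: sin(0)=0, cos(0)=1
```

Because each dimension pair oscillates at a different frequency, every position receives a unique pattern that the model can use to recover relative and absolute order.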
- Custom Character Tokenizer: simple character-level tokenizer (`CharTokenizer`) for converting text into token sequences and back.
- Transformer Model: a minimal Transformer with multi-head self-attention, feed-forward layers, and positional encoding.
- Training Pipeline: training script that feeds text data through the Transformer and optimizes with cross-entropy loss.
- Text Generation: Sampling script with temperature and top-k support for generating sequences from a trained model.
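A character-level tokenizer like the one above is small enough to sketch in full. This is a plausible shape for `CharTokenizer`, not necessarily the exact API in `chartokenizer.py`:

```python
class CharTokenizer:
    """Minimal character-level tokenizer: one integer id per unique character."""

    def __init__(self, text: str):
        chars = sorted(set(text))                            # fixed vocabulary from the corpus
        self.stoi = {ch: i for i, ch in enumerate(chars)}    # char -> id
        self.itos = {i: ch for ch, i in self.stoi.items()}   # id -> char

    def encode(self, text: str) -> list[int]:
        return [self.stoi[ch] for ch in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)

tok = CharTokenizer("hello world")
assert tok.decode(tok.encode("hello")) == "hello"  # lossless round trip
```

Character-level tokenization keeps the vocabulary tiny (every distinct character in the corpus) at the cost of longer sequences than subword schemes like the one behind `huggingface_tokenizer.py`.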
```
LEmma/
├── src/
│   └── lemma/
│       ├── models/
│       │   ├── attention.py
│       │   ├── feedforward.py
│       │   ├── positionalencoding.py
│       │   ├── transformerblock.py
│       │   └── transformer.py
│       ├── tokenizer/
│       │   ├── chartokenizer.py
│       │   └── huggingface_tokenizer.py
│       └── utils/
│           ├── train.py
│           ├── sample.py
│           ├── prepare_data.py
│           └── hf_tokenize.py
├── data/
├── checkpoints/
├── configs/
├── tests/
├── pyproject.toml
└── README.md
```
Set up a virtual environment and install LEmma:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
```

Train a model:

```bash
python src/lemma/utils/train.py
```

Generate text:

```bash
python src/lemma/utils/sample.py
```
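The temperature and top-k options mentioned above shape the next-token distribution before sampling. A minimal NumPy sketch of the idea (the function name and exact flags in `sample.py` may differ):

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 1.0, top_k: int = 0) -> int:
    """Sample one token id from raw logits with temperature and optional top-k filtering."""
    logits = logits / max(temperature, 1e-8)   # <1.0 sharpens, >1.0 flattens the distribution
    if top_k > 0:
        cutoff = np.sort(logits)[-top_k]       # k-th largest logit
        logits = np.where(logits < cutoff, -np.inf, logits)  # mask everything below it
    probs = np.exp(logits - logits.max())      # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

logits = np.array([2.0, 1.0, 0.1, -1.0])
token_id = sample_next_token(logits, temperature=0.8, top_k=2)
assert token_id in (0, 1)  # top_k=2 restricts sampling to the two highest logits
```

With `top_k=1` this degenerates to greedy decoding; larger `top_k` and higher temperature trade determinism for diversity.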