Statistical Arbitrage in Cryptocurrency Markets
Cointegration-Based Pairs Trading on Kraken
This repository contains my quantitative finance capstone project exploring statistical arbitrage opportunities in cryptocurrency markets using cointegration-based pairs trading.
The system ingests historical market data from the Kraken API, identifies statistically related asset pairs, models mean-reverting spreads, and evaluates trading strategies through historical backtesting and simulated execution.
The objective of this project is to demonstrate a complete quantitative research and trading workflow, including:
- Market data ingestion from exchange APIs
- Statistical relationship discovery between assets
- Cointegration testing and hedge ratio estimation
- Spread construction and z-score normalization
- Strategy signal generation
- Backtesting with transaction cost modeling
- Paper-trading execution framework
The system is implemented in Python and designed as a modular environment suitable for quantitative strategy research, backtesting, and deployment.
- Statistical Arbitrage Strategy
- Cointegration Analysis
- Mean Reversion Modeling
- Backtesting Engine
- Paper Trading Simulation
- Project Overview
- Strategy Pipeline
- Strategy Concept
- Statistical Model
- Trading Rules
- System Architecture
- Spread Mean Reversion
- Example Strategy Results
- Repository Structure
- Research Notebooks
- Backtest Performance
- Backtesting Framework
- Key Research Questions
- Pair Selection Research
- Running the Project
- Future Improvements
- Technologies Used
- Author
Cryptocurrency markets contain many assets that move together due to:
- sector correlations
- liquidity flows
- arbitrage relationships
- shared market sentiment
Pairs trading attempts to exploit these relationships by identifying cointegrated assets whose price spreads revert to a long-term equilibrium.
Instead of predicting direction, the strategy focuses on relative mispricing.
The workflow is:
- Identify pairs with strong statistical relationships
- Construct a spread using a hedge ratio
- Monitor deviations from the equilibrium
- Trade when spreads diverge
- Close trades when spreads revert
This produces a market-neutral statistical arbitrage strategy.
The statistical arbitrage pipeline follows a structured quantitative workflow:
- Download historical market data from the Kraken API
- Clean and align price series
- Identify candidate pairs using correlation filtering
- Perform cointegration testing
- Estimate hedge ratios
- Construct spreads
- Generate z-score signals
- Execute simulated trades
- Evaluate performance metrics
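
The correlation-filtering step in the list above can be sketched as follows. This is a minimal illustration, not the code in `research/pair_selection.py`: the function name `candidate_pairs`, the use of log returns, and the `min_corr=0.8` threshold are all assumptions made for the example.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def candidate_pairs(prices: pd.DataFrame, min_corr: float = 0.8):
    """Return column pairs whose log-return correlation is at least min_corr.

    `prices` is assumed to hold one aligned price column per asset.
    The threshold is an illustrative choice, not a value from this project.
    """
    # Drop rows with missing prices so every series shares the same timestamps.
    aligned = prices.dropna()
    # Correlate log returns rather than price levels, since trending
    # price levels can look correlated even when the assets are unrelated.
    returns = np.log(aligned).diff().dropna()
    corr = returns.corr()
    return [
        (a, b)
        for a, b in combinations(prices.columns, 2)
        if corr.loc[a, b] >= min_corr
    ]
```

Pairs that pass this filter are only candidates; correlation of returns does not imply cointegration of prices, so each pair still needs the cointegration test that follows in the pipeline.
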
The core idea is that some crypto assets maintain long-run equilibrium relationships.
When the spread temporarily diverges from equilibrium, the strategy:
- Shorts the overpriced asset
- Buys the underpriced asset
The trade profits when the spread reverts toward its historical mean.
The spread between two assets is defined as:
Spread = Price_A − β × Price_B
Where:
- β (beta) is the hedge ratio, estimated by regressing Price_A on Price_B.
- The spread should be stationary if the pair is cointegrated.
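
As a rough sketch of the definitions above, the hedge ratio can be estimated with an ordinary least squares fit of Price_A on Price_B, and the spread built from it. The function names here are hypothetical, and fitting an intercept is an assumption; the project's own `research/spread_model.py` may do this differently.

```python
import numpy as np


def estimate_hedge_ratio(price_a, price_b) -> float:
    """OLS slope from regressing price_a on price_b (intercept included)."""
    a = np.asarray(price_a, dtype=float)
    b = np.asarray(price_b, dtype=float)
    # np.polyfit returns coefficients highest degree first: [slope, intercept].
    beta, _intercept = np.polyfit(b, a, deg=1)
    return float(beta)


def build_spread(price_a, price_b, beta: float) -> np.ndarray:
    """Spread = Price_A - beta * Price_B, as defined in the text."""
    a = np.asarray(price_a, dtype=float)
    b = np.asarray(price_b, dtype=float)
    return a - beta * b
```

Because the intercept is fitted but not subtracted, the resulting spread fluctuates around a nonzero level; the z-score normalization removes that level.
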
To normalize deviations, we compute a z-score:
Z = (Spread − Mean) / Standard Deviation
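
One way to compute this z-score is over a rolling window, so that each value uses only the current and earlier observations. The text does not say whether the mean and standard deviation are rolling or full-sample, so the rolling form and the 60-bar default below are assumptions for illustration.

```python
import pandas as pd


def rolling_zscore(spread: pd.Series, window: int = 60) -> pd.Series:
    """Z = (Spread - rolling mean) / rolling standard deviation.

    The first `window - 1` values are NaN because the window is not yet full.
    A window with zero variance yields NaN or inf and should be filtered
    before generating signals.
    """
    mean = spread.rolling(window).mean()
    std = spread.rolling(window).std()  # sample standard deviation (ddof=1)
    return (spread - mean) / std
```
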
Trading signals are generated when the z-score crosses threshold levels.
| Condition | Action |
|---|---|
| Z > +2 | Short spread |
| Z < −2 | Long spread |
| Z returns to 0 | Close trade |

Long spread (Z < −2):
- Buy Asset A
- Sell Asset B

Short spread (Z > +2):
- Sell Asset A
- Buy Asset B

This creates a market-neutral portfolio where profit depends on relative price movement, not overall market direction.
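
The trading rules above amount to a small state machine over the z-score series. The sketch below is one possible reading of them, with hypothetical names: it enters at ±2, treats "returns to 0" as the z-score crossing the exit level from the side the trade was opened on, and holds the previous position through missing values.

```python
import math


def generate_positions(zscores, entry: float = 2.0, exit_level: float = 0.0):
    """Map z-scores to spread positions: +1 long, -1 short, 0 flat."""
    position = 0
    positions = []
    for z in zscores:
        if math.isnan(z):
            # No signal available: keep whatever position is open.
            positions.append(position)
            continue
        if position == 0:
            if z > entry:
                position = -1  # short spread: sell A, buy B
            elif z < -entry:
                position = 1   # long spread: buy A, sell B
        elif position == 1 and z >= exit_level:
            position = 0       # spread reverted upward to the mean
        elif position == -1 and z <= exit_level:
            position = 0       # spread reverted downward to the mean
        positions.append(position)
    return positions
```

For example, the z-score path `[0.5, 2.5, 1.0, -0.1, -2.3, -1.0, 0.2]` gives positions `[0, -1, -1, 0, 1, 1, 0]`: a short-spread trade followed by a long-spread trade.
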
The system is structured as a modular quantitative research and execution pipeline.
Once a cointegrated pair is identified, a spread is constructed using the hedge ratio.
Spread = Price_A − β × Price_B
The spread is normalized using a z-score to identify deviations from equilibrium.
Trading signals occur when the spread moves outside statistical thresholds.
Preliminary backtests suggest that cointegrated crypto pairs can exhibit strong mean-reverting behavior.
Key observations from initial experiments:
- Spreads frequently revert within short time windows
- Z-score thresholds provide effective signal triggers
- Strategy performance is sensitive to transaction costs
- Multiple pairs improve diversification
Detailed performance metrics and visualizations are shown below.
Quant_Capstone/
│
├── configs/ # Strategy parameters and exchange configuration
│
├── data/ # Raw and processed market data
│ ├── raw/
│ ├── processed/
│ └── signals/
│
├── research/ # Statistical research modules
│ ├── cointegration.py
│ ├── pair_selection.py
│ └── spread_model.py
│
├── strategies/ # Trading strategy logic
│ └── pairs_trading_strategy.py
│
├── execution/ # Paper trading and execution engine
│ └── paper_trader.py
│
├── scripts/ # Data ingestion pipelines
│ └── download_kraken_data.py
│
├── notebooks/ # Research notebooks
│
├── utils/ # Shared utilities
│
├── logs/ # Backtest and execution logs
│
├── main.py # Project entry point
│
└── README.md
The research process is documented in Jupyter notebooks.

| Notebook | Purpose |
|---|---|
| 01_data_exploration.ipynb | Explore cryptocurrency price data |
| 02_cointegration_analysis.ipynb | Identify statistically related asset pairs |
| 03_strategy_backtest.ipynb | Evaluate pairs trading strategy |
| 04_backtest_results.ipynb | Results of the backtests |
| 05_strategy_backtest.ipynb | Results of the strategy backtests |

The strategy is evaluated using historical market data.
Performance metrics include:
- Sharpe Ratio
- Maximum Drawdown
- Win Rate
- Profit Factor
- Average Trade Duration
Equity curves allow visualization of cumulative strategy performance over time.
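
Several of these metrics have short standard definitions, sketched below with NumPy. The function names are hypothetical, the Sharpe ratio assumes a zero risk-free rate, and the 365-period annualization (daily bars in a market that trades every day) is an assumption that should match the bar frequency actually used.

```python
import numpy as np


def sharpe_ratio(returns, periods_per_year: int = 365) -> float:
    """Annualized mean/std of per-period returns (risk-free rate assumed 0)."""
    r = np.asarray(returns, dtype=float)
    std = r.std(ddof=1)
    if std == 0:
        return float("nan")
    return float(r.mean() / std * np.sqrt(periods_per_year))


def max_drawdown(equity) -> float:
    """Largest peak-to-trough decline of an equity curve, as a negative fraction."""
    e = np.asarray(equity, dtype=float)
    peaks = np.maximum.accumulate(e)
    return float(((e - peaks) / peaks).min())


def win_rate(trade_pnls) -> float:
    """Fraction of closed trades with positive PnL."""
    p = np.asarray(trade_pnls, dtype=float)
    return float((p > 0).mean())


def profit_factor(trade_pnls) -> float:
    """Gross profit divided by gross loss across closed trades."""
    p = np.asarray(trade_pnls, dtype=float)
    gross_loss = -p[p < 0].sum()
    if gross_loss == 0:
        return float("inf")
    return float(p[p > 0].sum() / gross_loss)
```
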
The backtesting module evaluates the strategy using historical data with:
- Rolling cointegration tests
- Dynamic hedge ratio estimation
- Transaction cost modeling
- Position sizing
- Risk management rules
Key metrics evaluated:
- Sharpe Ratio
- Maximum Drawdown
- Win Rate
- Profit Factor
- Average Trade Duration
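
To illustrate how transaction costs enter a backtest, the sketch below computes per-bar PnL for a spread position and charges a flat cost each time the position changes. It is a simplified model under stated assumptions, not the project's backtesting engine: PnL is in spread units (one unit of Asset A against β units of Asset B), the name `backtest_spread` and the `cost_per_turnover` parameter are hypothetical, and a position chosen at bar t only earns the spread change from t to t+1, which avoids look-ahead.

```python
import numpy as np


def backtest_spread(spread, positions, cost_per_turnover: float = 0.0) -> np.ndarray:
    """Per-bar PnL of holding `positions` (+1, 0, -1) in the spread.

    Costs are charged on the absolute change in position at each bar.
    """
    s = np.asarray(spread, dtype=float)
    pos = np.asarray(positions, dtype=float)
    # Position held coming into each bar (flat before the first bar).
    prev_pos = np.concatenate(([0.0], pos[:-1]))
    # Spread change over each bar; the first bar has no prior value.
    d_spread = np.diff(s, prepend=s[0])
    gross = prev_pos * d_spread
    costs = cost_per_turnover * np.abs(pos - prev_pos)
    return gross - costs
```

Summing the returned array gives total PnL, and its cumulative sum gives an equity curve in spread units. Re-running with a higher `cost_per_turnover` shows directly how sensitive a given signal is to fees.
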
- Do cryptocurrency assets exhibit statistically stable cointegration relationships?
- How persistent are these relationships over time?
- Can mean-reverting spreads generate consistent risk-adjusted returns?
- How sensitive is the strategy to transaction costs and execution latency?
To identify potential trading pairs, the system evaluates statistical relationships between assets using correlation and cointegration tests.
Below is an example visualization used during the research phase.
Cointegration heatmaps allow quick identification of asset pairs that may form stable mean-reverting spreads.
Clone the repository:
git clone https://github.com/MarkRobertson67/Quant_Capstone.git
Install dependencies:
pip install -r requirements.txt
Download data:
python scripts/download_kraken_data.py
Run research pipeline:
python main.py
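
For readers who want to see what the data download step involves, here is a minimal standalone sketch against Kraken's public OHLC endpoint. It is not the contents of `scripts/download_kraken_data.py`; the function names, the default pair `XBTUSD`, and the 60-minute interval are assumptions chosen for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

KRAKEN_OHLC_URL = "https://api.kraken.com/0/public/OHLC"


def parse_ohlc(payload: dict) -> list:
    """Convert a Kraken OHLC response into a list of typed bar dicts."""
    if payload.get("error"):
        raise RuntimeError(f"Kraken API error: {payload['error']}")
    result = payload["result"]
    # The result holds one key for the pair plus a "last" timestamp.
    rows = next(value for key, value in result.items() if key != "last")
    # Each row is [time, open, high, low, close, vwap, volume, count];
    # prices and volume arrive as strings.
    return [
        {
            "time": int(row[0]),
            "open": float(row[1]),
            "high": float(row[2]),
            "low": float(row[3]),
            "close": float(row[4]),
            "volume": float(row[6]),
        }
        for row in rows
    ]


def fetch_ohlc(pair: str = "XBTUSD", interval: int = 60) -> list:
    """Download recent OHLC bars for one pair (interval is in minutes)."""
    query = urlencode({"pair": pair, "interval": interval})
    with urlopen(f"{KRAKEN_OHLC_URL}?{query}", timeout=30) as response:
        return parse_ohlc(json.load(response))
```

Keeping the parsing separate from the network call makes the conversion logic easy to test against a saved response.
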
Potential extensions include:
- Kalman filter dynamic hedge ratios
- Machine learning pair selection
- Multi-pair portfolio optimization
- Real-time execution integration
- Risk-adjusted capital allocation
- Regime detection

Technologies used:
- Python
- Pandas
- NumPy
- Statsmodels
- SciPy
- Kraken API

Author: Mark Robertson
GitHub: https://github.com/MarkRobertson67
This repository was developed as part of a quantitative finance capstone project focused on statistical arbitrage and algorithmic trading.
Note: Figures shown in this README are placeholder visuals during the initial repository setup phase. They will be replaced with actual research outputs, spread charts, and backtest results as the project develops.



