Welcome to my enhanced fork of The Exoplanet Classifier (now renamed TransitIQ). Originally developed with my teammates from Ontohin 4b for the NASA Space Apps Challenge 2025, this project is the upgraded, research-extended version of that submission. To check out the original repository, click here.
The original repository remains archived under Ontohin 4b and licensed as such. This fork exists purely for further research, experimentation, and personal development to make the classifier far more powerful and accurate than the hackathon version.
A robust, data-driven Machine Learning tool that classifies transit data into three categories: Confirmed Exoplanets, False Positives, or Candidates.
This version enhances the original submission with stacking ensemble learning, robust Pydantic validation, and a FastAPI backend for high-performance inference. It features extensive preprocessing, imputation, and synthetic oversampling (SMOTE) to ensure a stable, generalizable model.
- TransitIQ
NASA's exoplanet survey missions (Kepler, K2, and others) have generated thousands of data points using the transit method: tracking dips in starlight caused by orbiting planets.
These datasets contain both confirmed exoplanets and false positives, and the aim of this project is to build an AI classifier capable of making preliminary predictions on new candidates.
The classifier runs inside a FastAPI-powered web interface, allowing anyone, from students to researchers, to enter transit parameters and instantly receive a prediction.
The goal is to provide a scientifically meaningful, intuitive, and educational experience for users interested in exoplanet research.
Screenshots: Landing Page · Input Fields · Batch Prediction · Output

- Python 3.11 or above: Core programming language
- Pandas, NumPy: Data processing and numerical computation
- Scikit-learn: Pipeline, scaling, imputation, model stacking, metrics
- XGBoost: Gradient boosting sub-model for the ensemble
- Imbalanced-learn (SMOTE): Class balancing for improved fairness
- FastAPI: Backend web framework
- HTML/CSS/JavaScript (Vanilla): Frontend for the interactive web UI
- Marimo Notebook: Sandbox (`notebook/research.py`) for experimenting with different model architectures, hyperparameters, and feature engineering before finalizing `fit.py`
- Python 3.11+
- Docker (Optional, for containerized deployment)
1. Clone the repository

   ```bash
   git clone https://github.com/ByteBard58/TransitIQ
   cd TransitIQ
   ```

2. Set up a virtual environment

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

3. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

4. Run the application

   ```bash
   uvicorn app.app:app --reload
   ```

   Note: On first run, the app will automatically download the required model weights (approx. 200 MB) from Hugging Face if they are not found locally.

5. Access the UI: Open http://localhost:8000 in your browser.
The application is fully containerized and available on Docker Hub. It supports both ARM64 and AMD64 architectures.
```bash
# Pull and run the latest image
docker run --rm -p 8000:8000 bytebard101/exoplanet_classifier:latest
```

If port 8000 is already in use, map it to a different port:

```bash
docker run --rm -p 8001:8000 bytebard101/exoplanet_classifier:latest
```

- Real-time Prediction: Enter transit parameters manually to get instant classification and probability scores.
- Educational Tooltips: Integrated documentation explains the significance of each scientific parameter (Orbital Period, Impact Parameter, etc.).
For large-scale analysis, TransitIQ supports batch processing via CSV upload:
- Navigate to the Batch Upload tab in the UI.
- Upload a `.csv` file containing the required transit features (follow the sample format provided in `/data/sample_generator.py`).
- Download the results or visualize the distribution of predictions directly in the dashboard.
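To see the batch-upload flow end to end, here is a minimal sketch of producing a small input CSV with pandas. Note that the column names below are illustrative placeholders only; the authoritative format comes from `/data/sample_generator.py` in the repo.

```python
# Hedged sketch: build a tiny batch-input CSV for the upload tab.
# Column names are placeholders, NOT the project's real schema.
import os
import tempfile

import pandas as pd

rows = [
    {"orbital_period": 10.5, "transit_depth": 0.002, "planet_radius": 1.9},
    {"orbital_period": 3.2, "transit_depth": 0.010, "planet_radius": 11.2},
]
csv_path = os.path.join(tempfile.mkdtemp(), "batch_input.csv")
pd.DataFrame(rows).to_csv(csv_path, index=False)  # one candidate per row
```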
```
TransitIQ/
├── app/            # FastAPI Application
│   ├── schema/     # Pydantic validation models
│   ├── static/     # Frontend assets (CSS, JS, Images)
│   └── templates/  # HTML entry points
├── data/           # Raw transit datasets and data generators
├── models/         # ML pipelines and HF integration scripts
├── notebook/       # Research sandboxes (Marimo / research.py)
├── tests/          # Pytest suite (API & Schema testing)
├── .github/        # CI/CD Workflows (Docker Hub & Testing)
└── Dockerfile      # Production container configuration
```
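The `app/schema/` directory holds the Pydantic validation models mentioned above. As a rough illustration of what such a model looks like (field names and bounds here are invented for the example, not the project's actual schema):

```python
# Hedged sketch of a Pydantic input model like those in app/schema/.
# Field names/constraints are illustrative placeholders.
from pydantic import BaseModel, Field, ValidationError

class TransitInput(BaseModel):
    orbital_period: float = Field(gt=0, description="Orbital period in days")
    transit_depth: float = Field(gt=0, description="Fractional flux dip")
    planet_radius: float = Field(gt=0, description="Radius in Earth radii")

# Valid input parses cleanly; a non-physical value is rejected.
ok = TransitInput(orbital_period=10.5, transit_depth=0.002, planet_radius=1.9)
try:
    TransitInput(orbital_period=-1, transit_depth=0.002, planet_radius=1.9)
    rejected = False
except ValidationError:
    rejected = True
```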
- Automated Testing: Every push to `main` triggers a GitHub Action that runs the `pytest` suite to ensure API stability.
- Docker Hub Integration: On successful tests, the application is automatically built for multi-arch support and pushed to Docker Hub.
- Model Hosting: Large serialized model files are hosted on Hugging Face, ensuring the repository remains lightweight while the application can "self-heal" by downloading missing artifacts at runtime.
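The "self-heal" behaviour boils down to a download-if-missing check at startup. A minimal sketch of that pattern, where `fetch` is a placeholder for the project's actual Hugging Face download call:

```python
# Hedged sketch of the self-heal pattern: download the serialized model
# only when it is absent locally. `fetch` stands in for the real
# Hugging Face download used by the project.
import os

def ensure_model(path: str, fetch) -> str:
    """Return the model path, invoking fetch(path) only if the file is missing."""
    if not os.path.exists(path):
        fetch(path)  # e.g. stream the ~200 MB weights to `path`
    return path
```

Because the check is idempotent, calling it on every startup is cheap once the artifact exists.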
The upgraded classifier uses a stacking ensemble combining multiple base models with a meta-classifier:
- Base Models:
  - `RandomForestClassifier(n_estimators=1000, max_depth=None, class_weight="balanced")`
  - `XGBClassifier(n_estimators=1000, max_depth=None, learning_rate=0.5)`
- Meta-classifier:
  - `LogisticRegression(solver="saga", penalty="l2", C=0.1, class_weight="balanced", max_iter=5000)`
The stacking classifier uses 5-fold cross-validation internally and passes original features to the meta-classifier for better learning.
Before feeding data into the model, the following preprocessing steps are applied via a Pipeline:
- Imputation: `SimpleImputer(strategy="mean")` to handle missing values.
- Scaling: `StandardScaler` to normalize features.
- Class Balancing: `SMOTE` (Synthetic Minority Oversampling Technique) to address class imbalance.
- Model Training: Stacking ensemble as described above.
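The preprocessing and stacking steps above can be sketched as a scikit-learn pipeline. Two deliberate simplifications keep this sketch dependency-light and fast, so it is not the project's actual `fit.py`: `GradientBoostingClassifier` stands in for `XGBClassifier`, SMOTE is omitted (in the real pipeline it would slot in between scaling and the model via `imblearn.pipeline.Pipeline`), and `n_estimators` is cut from 1000 to 50.

```python
# Hedged sketch of the preprocessing + stacking setup (simplified; see note above).
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy 3-class dataset standing in for the 13 transit features
X, y = make_classification(n_samples=300, n_features=13, n_informative=8,
                           n_classes=3, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, max_depth=None,
                                      class_weight="balanced", random_state=42)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=42)),
    ],
    final_estimator=LogisticRegression(solver="saga", penalty="l2", C=0.1,
                                       class_weight="balanced", max_iter=5000),
    cv=5,              # internal 5-fold cross-validation
    passthrough=True,  # meta-classifier also sees the original features
)

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("model", stack),
])
pipe.fit(X, y)
proba = pipe.predict_proba(X[:5])  # per-class probability scores
```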
The model uses 13 transit and orbital-related features, including:
- Orbital period, transit epoch, transit depth
- Planetary radius, semi-major axis, inclination
- Equilibrium temperature, insolation, impact parameter
- Radius ratios, density ratios, duration ratios
- Number of observed transits
Targets are mapped as follows:
- `0` → FALSE POSITIVE or REFUTED
- `1` → CANDIDATE
- `2` → CONFIRMED
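In code, turning integer predictions back into human-readable labels is a simple lookup following the mapping above (the helper name `decode` is illustrative, not from the repo):

```python
# Decode integer class predictions using the target mapping above.
LABELS = {0: "FALSE POSITIVE / REFUTED", 1: "CANDIDATE", 2: "CONFIRMED"}

def decode(preds):
    """Map a sequence of integer predictions to class-name strings."""
    return [LABELS[p] for p in preds]
```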
- Train/test split: 2/3 training, 1/3 testing with stratification.
- Pipeline is trained end-to-end in `models/fit.py`.
- Hyperparameters and model choices were extensively tested in `notebook/research.py`, which served as a sandbox for experimentation and optimization.
- Final trained pipeline is saved as `models/pipe.pkl`.
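The training workflow can be sketched in a few lines: stratified 2/3–1/3 split, end-to-end fit, then serialization. This is a hedged approximation of the flow in `models/fit.py`, not its actual code; a plain `LogisticRegression` stands in for the full stacking pipeline for speed, and the output goes to a temp directory rather than `models/pipe.pkl`.

```python
# Hedged sketch of the training flow (simplified stand-in model).
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=90, n_features=13, n_informative=6,
                           n_classes=3, random_state=0)
# 2/3 training, 1/3 testing, stratified on the label
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3,
                                          stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
out = os.path.join(tempfile.mkdtemp(), "pipe.pkl")  # real path: models/pipe.pkl
joblib.dump(clf, out)  # serialize the fitted pipeline
```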
Here is the classification report:
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 (FALSE POSITIVE / REFUTED) | 0.82 | 0.81 | 0.82 | 1718 |
| 1 (CANDIDATE) | 0.56 | 0.55 | 0.56 | 1118 |
| 2 (CONFIRMED) | 0.79 | 0.81 | 0.80 | 1687 |
Overall Metrics:
- Accuracy: 0.75
- Macro Avg: Precision = 0.72, Recall = 0.72, F1-score = 0.72
- Weighted Avg: Precision = 0.74, Recall = 0.75, F1-score = 0.75
This demonstrates that the upgraded stacking classifier maintains strong performance on confirmed and false positive classes, with room for improvement on candidate predictions.
The model balances accuracy, generalization, and class fairness, making it reliable for preliminary exoplanet classification tasks.
Despite extensive experimentation, this represents the current performance ceiling achievable with the available data.
Numerous optimizations were explored, including hyperparameter tuning, feature scaling, class rebalancing, and ensemble variations, yet no improvement beyond ~0.75 accuracy was observed.
This suggests a data limitation rather than a model limitation: the available features may simply not carry enough separable information for higher classification accuracy.
The research process behind this version involved significant model testing and fine-tuning efforts (see notebook/research.py).
Suggestions and improvements are highly welcome; contributions or insights from the community could help push the model beyond its present ceiling.
- NASA Kepler and K2 Missions for providing the training datasets
- Scikit-learn, XGBoost, and Imbalanced-learn teams for exceptional libraries
- Inspiration from data science projects exploring real-world astrophysics datasets
- The scientists engaged in exoplanet research, whose problem inspired us to create this project from the ground up
- The Ontohin 4b team for the original NASA SAC 2025 version of this project
Thank you for checking out this upgraded version of The Exoplanet Classifier. This repository is a personal continuation of a NASA Space Apps Challenge project, rebuilt with the intent to learn, improve, and explore the depths of real-world astrophysics through Machine Learning.
Have a great day!