Welcome to your Reinforcement Learning assignment!
In this project, you will step beyond simply "solving" the CartPole environment. Your goal is to design a reward function that trains an agent to be robust, smooth, and precise.
- `student_assignment_dqn.py`: [🛠️ Your Workspace] This is the ONLY file you need to modify. It contains the DQN training loop and the reward function you need to design.
- `evaluate_performance.py`: [📊 Score Card] Runs your trained model to calculate a comprehensive score (Survival, Centering, Smoothness).
- `evaluate_challenge.py`: [🌪️ The Challenge] Tests your agent in a "windy" environment where random forces push the pole.
- `cartpole_basic.py`: [📉 Baseline] A simple rule-based agent (if-else logic) for comparison.
- `requirements_*.txt`: Dependency files for Linux and Windows.
First, create a clean Python environment to run the project.
Using Conda:
```bash
# 1. Create environment
conda create -n rl_hw python=3.10 -y

# 2. Activate environment
conda activate rl_hw
```

Option 1: Auto Install (Recommended - auto-detects GPU/CPU)
```bash
# Linux/Mac one-liner: auto-detects GPU and installs the appropriate PyTorch version
bash install.sh
```

Option 2: Manual Install
Option A: GPU environment (requires NVIDIA GPU + CUDA)
```bash
# For Linux/Mac:
pip install -r requirements_linux.txt

# For Windows:
pip install -r requirements_win.txt
```

Option B: CPU environment (no GPU, or CUDA install failed)
If your machine does not have an NVIDIA GPU, or the GPU version fails to install, install the CPU-only version of PyTorch first:
```bash
# 1. Install CPU-only PyTorch
pip install torch --index-url https://download.pytorch.org/whl/cpu

# 2. Install remaining dependencies (torch is already installed, pip will skip it)
# For Linux/Mac:
pip install -r requirements_linux.txt

# For Windows:
pip install -r requirements_win.txt
```

Note: The model in this project is very small (a 3-layer fully connected network). CPU training is perfectly sufficient; no GPU is required to complete all tasks.
Verify the installation:

```bash
python -c "import torch; print(f'PyTorch {torch.__version__}, Device: {\"cuda\" if torch.cuda.is_available() else \"cpu\"}')"
```

Before training anything, let's see how a "dumb" rule-based agent performs. This gives you a baseline score to beat.
Run the simple agent (no rendering):

```bash
python cartpole_basic.py
```

Run with visual rendering (recommended):

```bash
python cartpole_dqn_run.py --baseline
```

Observation: the pole wobbles heavily and the agent collapses quickly (~40 steps).
Check its score:

```bash
python evaluate_performance.py --baseline
```

Expected score: around 20-25 points. It fails Survival (~3/40) and Smoothness (0/30) due to heavy wobble and the quick pole collapse. Centering may appear decent (~19/30) simply because the episodes are too short for the cart to drift far.
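For intuition, the baseline's if-else logic is roughly of this shape (a sketch for illustration; the actual rules in `cartpole_basic.py` may differ):

```python
def baseline_action(state):
    """Naive rule-based policy: push toward the side the pole is falling.

    state = [cart_position, cart_velocity, pole_angle, pole_angular_velocity]
    Returns 0 (push left) or 1 (push right), as CartPole expects.
    """
    pole_angle = state[2]
    return 1 if pole_angle > 0 else 0

# Pole tilting right (positive angle) -> push right (action 1)
print(baseline_action([0.0, 0.0, 0.05, 0.0]))  # -> 1
```

A policy like this reacts only to the current angle, so it overcorrects constantly, which is exactly the heavy wobble you see in the baseline run.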
Now, let's run the DQN training script as-is to make sure everything works.

```bash
python student_assignment_dqn.py
```

- What happens: it trains a DQN agent for 2000 episodes using a "Dummy Reward" (currently just basic survival).
- Output: it saves a model file named `student_model.pth`.
- Plot: it generates `student_training_curve.png` showing the reward over time.
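At the heart of any DQN training loop is epsilon-greedy action selection: explore randomly with probability epsilon, otherwise act greedily on the Q-values. A stripped-down sketch (not the starter file's exact code):

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy one.

    q_values: list of Q-value estimates, one per action.
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

# Greedy case (epsilon=0): always the argmax
print(epsilon_greedy([0.1, 0.9], epsilon=0.0))  # -> 1
```

During training, epsilon typically starts near 1.0 and decays toward a small floor, which is what the `epsilon_decay` hyperparameter controls.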
This is the core of the assignment.

- Open `student_assignment_dqn.py` in your editor.
- Locate the function `calculate_custom_reward`.
- MODIFY IT!

Currently, the reward logic is very basic. You need to design a reward function that encourages:

- Centering: penalize the agent if `state[0]` (Cart Position) is far from 0.
- Stability: penalize the agent if `state[3]` (Pole Angular Velocity) is high.
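Combining the two penalties, a complete function might look like the sketch below. The weights (0.5, 0.1, 0.5) and the extra pole-angle term are illustrative assumptions, not known-good values; match the signature to the one in the starter file:

```python
def calculate_custom_reward(state):
    """Hypothetical shaped reward for CartPole (weights are starting points).

    state = [cart_position, cart_velocity, pole_angle, pole_angular_velocity]
    """
    cart_position, _, pole_angle, pole_angular_velocity = state
    reward = 1.0                                # base survival bonus
    reward -= abs(cart_position) * 0.5          # Centering: penalize drift
    reward -= abs(pole_angular_velocity) * 0.1  # Stability: penalize wobble
    reward -= abs(pole_angle) * 0.5             # optional: keep the pole upright
    return reward
```

Keep the survival bonus dominant: if the penalties outweigh it, the agent can learn that ending the episode early is cheaper than balancing.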
Example Logic (Pseudocode):

```python
reward = 1.0                        # survival bonus
reward -= abs(cart_position) * 0.5  # penalize drift from center
reward -= abs(pole_velocity) * 0.1  # penalize wobble
```

After modifying the code, re-run the training:

```bash
python student_assignment_dqn.py
```

How good is your new model?
1. Quantitative Score (The Grade):

```bash
python evaluate_performance.py
```

- Target score: > 75
- It evaluates 50 episodes and scores you on Survival, Centering, and Smoothness.
2. Visual Inspection (The Eye Test):

Want to see your agent in action? Add the `--render` flag:

```bash
python evaluate_performance.py --render
```

- Watch: does the cart stay in the middle? Does the pole shake or stay still?
3. The Robustness Challenge (The Final Boss): can your agent survive being pushed?

```bash
python evaluate_challenge.py
```

- Random forces will push the pole. A robust policy should recover quickly.
- Pass condition: survive > 300 steps on average.
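Conceptually, the "windy" disturbance amounts to occasionally injecting a random impulse into the pole's angular velocity. A minimal sketch of the idea (`push_prob` and `max_push` are made-up magnitudes; the real `evaluate_challenge.py` may perturb the environment differently):

```python
import random

def apply_wind(state, push_prob=0.1, max_push=0.5):
    """Occasionally add a random impulse to state[3] (pole angular velocity).

    Magnitudes here are hypothetical, for illustration only.
    """
    state = list(state)
    if random.random() < push_prob:
        state[3] += random.uniform(-max_push, max_push)
    return state

# With push_prob=1.0 the push always happens:
perturbed = apply_wind([0.0, 0.0, 0.0, 0.0], push_prob=1.0)
```

A policy trained only to minimize wobble in calm conditions can fail here, which is why a balanced reward (survival plus stability, rather than stability alone) tends to hold up better.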
| Component | Weight | Description |
|---|---|---|
| Survival | 40% | Can it stay alive for 500 steps? |
| Centering | 30% | Does it stay near x=0? (Penalty if \|x\| > 0.2) |
| Smoothness | 30% | Is the movement stable? (Penalty if \|angular_vel\| > 0.2) |
| Robustness | Pass/Fail | Must survive > 300 steps in evaluate_challenge.py |
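To make the table concrete, here is one way such a 40/30/30 weighted score could be computed. The formulas are guesses for illustration; the actual math in `evaluate_performance.py` may differ:

```python
def composite_score(survival_steps, mean_abs_x, mean_abs_ang_vel):
    """Hypothetical weighted score mirroring the table's 40/30/30 weights.

    The 0.2 thresholds come from the table; everything else is assumed.
    """
    survival = 40.0 * min(survival_steps / 500.0, 1.0)
    centering = 30.0 if mean_abs_x <= 0.2 else \
        max(0.0, 30.0 * (1.0 - (mean_abs_x - 0.2)))
    smoothness = 30.0 if mean_abs_ang_vel <= 0.2 else \
        max(0.0, 30.0 * (1.0 - (mean_abs_ang_vel - 0.2)))
    return survival + centering + smoothness

# A perfect run: 500 steps, centered, no wobble
print(composite_score(500, 0.0, 0.0))  # -> 100.0
```

Whatever the exact formula, the weighting means a policy that merely survives tops out at 40 points; you must also shape for centering and smoothness to clear the 75-point target.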
You MAY modify:

- `calculate_custom_reward()` in `student_assignment_dqn.py`: this is the primary task. Design your reward function here.
- Training hyperparameters: you may adjust values such as `episodes`, learning rate (`lr`), `batch_size`, `gamma`, `epsilon_decay`, etc. to improve training.
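For reference, plausible starting values for those knobs (hypothetical; check the actual variable names and defaults in `student_assignment_dqn.py`):

```python
# Illustrative defaults only; the starter script's names/values may differ.
episodes = 2000        # number of training episodes (the script's default)
lr = 1e-3              # optimizer learning rate
batch_size = 64        # replay-buffer minibatch size
gamma = 0.99           # discount factor for future rewards
epsilon_decay = 0.995  # per-episode multiplicative decay of exploration rate
```

If training looks unstable after you add penalties, lowering `lr` or slowing `epsilon_decay` (so the agent explores longer) are common first adjustments.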
You must NOT modify:

- The `DQN` network architecture (`class DQN`)
- The evaluation scripts: `evaluate_performance.py` and `evaluate_challenge.py`
- Any other files outside of `student_assignment_dqn.py`
Submit the following two files only:

- `student_assignment_dqn.py`: your modified training script
- `student_model.pth`: the trained model weights produced by your script
This assignment is adapted from the classic CartPole-v1 environment provided by Farama Foundation Gymnasium.
Special thanks to the open-source community for the initial implementations of DQN algorithms.
