# CTM-DQN: Dynamic Speed Limit Control System
A Deep Q-Network (DQN) based dynamic speed limit control system using the Cell Transmission Model (CTM) for traffic flow simulation.
## Project Structure
```
ctm/
├── config.yaml # Configuration file for all parameters
├── main.py # Main entry point
├── ctm_model.py # Cell Transmission Model implementation
├── dqn_agent.py # DQN agent with replay buffer
├── environment.py # Training environment
├── train.py # Training script
├── test.py # Testing/evaluation script
├── utils.py # Utility functions
├── checkpoints/ # Saved model checkpoints (created automatically)
└── logs/ # Training logs and plots (created automatically)
```
## Features
- **CTM Traffic Model**: Realistic highway traffic flow simulation
- **DQN Agent**: Deep reinforcement learning for speed limit control
- **Flexible Configuration**: Easy parameter adjustment via YAML config
- **Training & Testing**: Separate modes for model training and evaluation
- **Visualization**: Automatic plotting of training results and traffic patterns
- **Checkpointing**: Regular model saving during training
## Installation
Install dependencies using uv:
```bash
uv sync
```
Or install them manually with pip:
```bash
pip install torch numpy matplotlib pyyaml tqdm
```
## Quick Start
### Training
Train the DQN agent with default configuration:
```bash
python main.py --mode train
```
Train with custom configuration:
```bash
python main.py --mode train --config custom_config.yaml
```
### Testing
Test the trained model:
```bash
python main.py --mode test
```
Test with specific model checkpoint:
```bash
python main.py --mode test --model checkpoints/model_episode_500.pt
```
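
The commands above use three flags: `--mode`, `--config`, and `--model`. As a rough illustration only, an entry point could parse them as sketched below. The function name `parse_args`, the choice to make `--mode` required, and both default paths are assumptions made for this example, not a description of the actual `main.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags used in the Quick Start commands (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="CTM-DQN speed limit control")
    # Assumption: --mode is required and accepts only "train" or "test".
    parser.add_argument("--mode", choices=["train", "test"], required=True,
                        help="run training or evaluation")
    # Assumption: config.yaml in the working directory is the default.
    parser.add_argument("--config", default="config.yaml",
                        help="path to the YAML configuration file")
    # Assumption: test mode falls back to the final checkpoint.
    parser.add_argument("--model", default="checkpoints/model_final.pt",
                        help="checkpoint to load in test mode")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"mode={args.mode} config={args.config} model={args.model}")
```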
## Configuration
All parameters can be adjusted in [config.yaml](config.yaml). Key configuration sections:
### Environment Parameters
- `num_cells`: Number of road cells (default: 10)
- `cell_length`: Length of each cell in meters (default: 500.0)
- `free_flow_speed`: Free flow speed in m/s (default: 30.0)
- `demand_pattern`: Traffic demand pattern - "constant", "sine", or "random"
- `num_speed_actions`: Number of discrete speed limit actions (default: 5)
### DQN Agent Parameters
- `hidden_layers`: Neural network architecture (default: [128, 128])
- `learning_rate`: Learning rate (default: 0.0001)
- `gamma`: Discount factor (default: 0.99)
- `epsilon_start`, `epsilon_end`, `epsilon_decay`: Initial value, final value, and decay rate of the epsilon-greedy exploration rate
- `buffer_size`: Replay buffer capacity (default: 50000)
- `batch_size`: Training batch size (default: 64)
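
To show how the exploration and replay parameters interact, here is a minimal sketch in plain Python. It assumes a multiplicative epsilon decay clipped at `epsilon_end`; the project may use a different schedule. The starting values `1.0`, `0.01`, and `0.995`, and the names `decayed_epsilon`, `select_action`, and `ReplayBuffer`, are illustrative assumptions rather than the contents of `dqn_agent.py`.

```python
import random
from collections import deque


def decayed_epsilon(step, eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Assumed schedule: multiply by eps_decay each step, never below eps_end."""
    return max(eps_end, eps_start * eps_decay ** step)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise take the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


class ReplayBuffer:
    """Fixed-capacity buffer; the oldest transitions are dropped when full."""

    def __init__(self, capacity=50000):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)
```

A larger `buffer_size` keeps older experience around for longer, while `batch_size` sets how many transitions each gradient step draws from it.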
### Training Parameters
- `num_episodes`: Number of training episodes (default: 500)
- `save_freq`: Model checkpoint frequency (default: 50)
- `log_freq`: Logging frequency (default: 10)
### Reward Function Weights
- `throughput_weight`: Weight for throughput reward (default: 1.0)
- `speed_weight`: Weight for average speed reward (default: 0.5)
- `density_weight`: Weight for density penalty (default: -0.3)
- `action_change_weight`: Weight for action change penalty (default: -0.1)
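
One plausible reading of these weights is a plain weighted sum in which the penalty terms carry negative weights. The sketch below assumes exactly that, and also assumes each term has been normalized to a comparable range such as [0, 1]. The function name `compute_reward` and its arguments are hypothetical; the actual reward in `environment.py` may be computed differently.

```python
DEFAULT_WEIGHTS = {
    "throughput_weight": 1.0,
    "speed_weight": 0.5,
    "density_weight": -0.3,
    "action_change_weight": -0.1,
}


def compute_reward(throughput, avg_speed, avg_density, action_changed,
                   weights=DEFAULT_WEIGHTS):
    """Weighted sum of objectives (assumed form; inputs assumed normalized)."""
    return (weights["throughput_weight"] * throughput
            + weights["speed_weight"] * avg_speed
            + weights["density_weight"] * avg_density
            # The action-change penalty applies only when the limit was changed.
            + weights["action_change_weight"] * float(action_changed))


# Example: high throughput, moderate speed and density, speed limit just changed.
reward = compute_reward(throughput=0.8, avg_speed=0.6, avg_density=0.5,
                        action_changed=True)
print(round(reward, 3))
```

Because the density and action-change weights are already negative, making them more negative strengthens the penalty.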
## Customization Guide
### Changing Traffic Scenarios
Edit the environment parameters in [config.yaml](config.yaml):
```yaml
environment:
  demand_pattern: "sine"  # Change to "constant" or "random"
  demand_mean: 2000.0     # Adjust traffic demand
  num_cells: 15           # Increase road length
```
### Modifying DQN Architecture
Adjust the neural network structure:
```yaml
agent:
  hidden_layers: [256, 256, 128]  # Deeper network
  learning_rate: 0.0005           # Faster learning
  gamma: 0.95                     # Different discount factor
```
### Tuning Reward Function
Balance different objectives by adjusting weights:
```yaml
reward:
  throughput_weight: 2.0      # Prioritize throughput
  speed_weight: 1.0           # Increase speed importance
  density_weight: -0.5        # Stronger density penalty
  action_change_weight: -0.2  # Discourage frequent changes
```
## Output Files
After training and testing, the following files will be generated:
- `checkpoints/model_episode_*.pt`: Model checkpoints saved during training
- `checkpoints/model_final.pt`: Final trained model
- `logs/training_results.png`: Training curves (rewards, loss, throughput)
- `logs/test_results.png`: Test visualization (density heatmap and speed control)
## Model Architecture
The DQN agent uses:
- **State**: Concatenation of traffic densities and speed limits for all cells
- **Action**: Discrete speed limit values, evenly spaced between the minimum and maximum speed limit
- **Network**: Fully connected layers with ReLU activation
- **Training**: Experience replay + target network for stable learning
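
The state and action definitions above can be sketched as follows. This reads the action set as evenly spaced values between the two bounds, and it normalizes the state by a jam density and the free flow speed, which is an assumption for illustration. The names `speed_limit_actions` and `build_state`, and the numeric bounds in the example, are hypothetical.

```python
def speed_limit_actions(min_speed, max_speed, num_actions):
    """Evenly spaced speed limits from min_speed to max_speed, inclusive."""
    if num_actions < 2:
        return [float(max_speed)]
    step = (max_speed - min_speed) / (num_actions - 1)
    return [min_speed + i * step for i in range(num_actions)]


def build_state(densities, speed_limits, jam_density, free_flow_speed):
    """Concatenate per-cell densities and speed limits into one flat vector.

    Dividing by jam_density and free_flow_speed is an assumed normalization.
    """
    return ([k / jam_density for k in densities]
            + [v / free_flow_speed for v in speed_limits])


# Example with 5 actions and 10 cells (bounds of 10 and 30 m/s are assumed).
actions = speed_limit_actions(10.0, 30.0, 5)
state = build_state([0.02] * 10, [30.0] * 10,
                    jam_density=0.12, free_flow_speed=30.0)
print(actions, len(state))
```

With this layout the network input size is twice the number of cells and the output size equals `num_speed_actions`.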
## CTM Model
The Cell Transmission Model simulates traffic flow based on:
- **Sending flow**: Limited by density and speed limit
- **Receiving flow**: Limited by downstream capacity
- **Conservation**: Vehicles are conserved across cell boundaries
- **Fundamental diagram**: Relationship between density, flow, and speed
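
The four ingredients above fit together in a single update step. The following is a minimal sketch of a standard CTM step with a triangular fundamental diagram, not the code in `ctm_model.py`. All numeric defaults except `cell_length` are assumed for illustration: densities in veh/m, flows in veh/s, a 10 s time step, a backward wave speed of 6 m/s, a jam density of 0.12 veh/m, and a capacity of 0.6 veh/s.

```python
def ctm_step(density, speed_limit, inflow_demand, *, cell_length=500.0, dt=10.0,
             wave_speed=6.0, jam_density=0.12, capacity=0.6):
    """Advance all cell densities by one time step (illustrative sketch).

    Stability requires max(speed_limit) * dt <= cell_length.
    """
    n = len(density)
    # Sending flow: what a cell can emit, limited by density and speed limit.
    sending = [min(speed_limit[i] * density[i], capacity) for i in range(n)]
    # Receiving flow: what a cell can accept, limited by remaining space.
    receiving = [min(capacity, wave_speed * (jam_density - density[i]))
                 for i in range(n)]
    # flow[i] enters cell i; flow[n] leaves the last cell.
    flow = [min(inflow_demand, receiving[0])]
    flow += [min(sending[i - 1], receiving[i]) for i in range(1, n)]
    flow.append(sending[n - 1])  # assumed unrestricted exit downstream
    # Conservation: density changes by net inflow per unit length.
    new_density = [density[i] + dt / cell_length * (flow[i] - flow[i + 1])
                   for i in range(n)]
    return new_density, flow


# Example: a lower speed limit in cell 5 reduces the flow it sends downstream.
limits = [30.0] * 10
limits[4] = 15.0
new_density, flow = ctm_step([0.015] * 10, limits, inflow_demand=0.45)
print(flow[4], flow[5])
```

Lowering a cell's speed limit reduces its sending flow, which is the lever the agent uses to meter traffic upstream of a bottleneck.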
## Example Workflow
1. **Adjust configuration** for your scenario in [config.yaml](config.yaml)
2. **Train the model**: `python main.py --mode train`
3. **Monitor progress** in console output and `logs/training_results.png`
4. **Test the model**: `python main.py --mode test`
5. **Analyze results** in `logs/test_results.png`
6. **Iterate**: Adjust parameters and retrain as needed
## Troubleshooting
- **CUDA out of memory**: Reduce `batch_size` or use `device: "cpu"` in config
- **Slow training**: Reduce `num_episodes` or `episode_length`
- **Poor performance**: Adjust reward weights or increase network capacity
- **Unstable training**: Reduce `learning_rate` or increase `target_update_freq`