修复单独测试时无法正常保存的bug,修改ctm模型运算单位

This commit is contained in:
Zihan Ye 2026-01-13 15:34:47 +08:00
parent e8f32f9942
commit 9fa47127ab
6 changed files with 76 additions and 299 deletions

101
README.md
View File

@ -13,13 +13,15 @@ ctm/
├── ctm_model.py # 元胞传输模型实现
├── dqn_agent.py # DQN智能体与经验回放
├── environment.py # 训练环境
├── vectorized_env.py # 并行环境包装器
├── demand_loader.py # CSV流量数据加载器
├── train.py # 训练脚本
├── test.py # 测试/评估脚本
├── utils.py # 工具函数
├── checkpoints/ # 模型检查点(自动创建)
└── logs/ # 训练日志和图表(自动创建)
├── latest/ # 最新训练运行目录(自动创建)
│ ├── checkpoints/ # 模型检查点
│ ├── logs/ # 训练日志和图表
│ └── config.yaml # 训练时使用的配置
└── runs/ # 历史训练运行归档(自动创建)
```
## 主要特性
@ -28,12 +30,13 @@ ctm/
- **DQN智能体**: 基于深度强化学习的限速控制
- **灵活配置**: 通过YAML配置文件轻松调整参数
- **训练与测试**: 独立的训练和评估模式
- **Baseline对比**: 自动对比无控制策略的性能
- **可视化**: 自动生成训练结果和交通模式图表
- **检查点保存**: 训练过程中定期保存模型
- **并行训练**: 支持多环境并行训练,显著提升训练效率
- **CSV流量输入**: 支持从CSV文件读取真实交通流量数据
- **随机种子固定**: 确保训练结果可重复
- **最佳模型保存**: 自动保存效果最好的模型
- **运行管理**: 自动管理训练运行最新运行保存在latest/历史运行归档到runs/
## 安装
@ -100,7 +103,7 @@ python main.py --mode test --model checkpoints/model_best.pt
- `save_freq`: 模型检查点保存频率默认50
- `log_freq`: 日志记录频率默认10
- `random_seed`: 随机种子默认42
- `num_parallel_envs`: 并行环境数量默认4
- `train_freq`: 训练频率每N步训练一次默认1
### 奖励函数权重
- `throughput_weight`: 通行量奖励权重默认1.0
@ -112,59 +115,7 @@ python main.py --mode test --model checkpoints/model_best.pt
## 高级功能
### 1. 并行环境训练
系统支持多环境并行训练,可显著提高训练效率。
#### 配置方法
`config.yaml` 中设置:
```yaml
training:
num_parallel_envs: 4 # 使用4个并行环境
```
#### 推荐配置
- **CPU训练**: 2-4个并行环境
- **GPU训练**: 4-8个并行环境
- **高性能GPU**: 8-16个并行环境
#### 性能对比
| 并行环境数 | 相对速度 | 内存占用 | 推荐场景 |
|----------|---------|---------|---------|
| 1 | 1x | 低 | 调试、小规模实验 |
| 2 | ~1.8x | 中 | CPU训练 |
| 4 | ~3.5x | 中高 | 标准训练 |
| 8 | ~6.5x | 高 | GPU训练 |
#### 使用示例
```yaml
training:
num_episodes: 500
num_parallel_envs: 4
agent:
batch_size: 128 # 建议增加批量大小
buffer_size: 100000
```
运行训练:
```bash
python main.py --mode train
```
输出示例:
```
Random seed set to: 42
Created 4 parallel environments
Using 4 parallel environments
Starting training for 500 episodes...
```
### 2. CSV流量输入
### 1. CSV流量输入
系统支持从CSV文件读取真实交通流量数据。
@ -283,8 +234,8 @@ agent:
**问题:训练速度慢**
```yaml
training:
num_parallel_envs: 4 # 启用并行环境
num_episodes: 200 # 减少episode数量
train_freq: 5 # 降低训练频率
environment:
episode_length: 180 # 减少episode长度
```
@ -305,21 +256,6 @@ agent:
hidden_layers: [256, 256, 128] # 增加网络容量
```
### 并行训练相关
**问题:并行训练速度提升不明显**
- 检查GPU利用率是否已经很高
- 瓶颈可能在网络训练而非数据收集
- 尝试减少训练频率
**问题:内存不足**
```yaml
training:
num_parallel_envs: 2 # 减少并行环境数
agent:
buffer_size: 50000 # 减小缓冲区
```
### CSV流量输入相关
**问题找不到CSV文件**
@ -335,6 +271,20 @@ agent:
## 技术细节
### 运行管理
系统自动管理训练运行,保持目录整洁:
- 最新训练保存在 `latest/` 目录
- 每次新训练开始时,旧的 `latest/` 会自动移动到 `runs/run_YYYYMMDD_HHMMSS/`
- 每个运行包含:配置文件、模型检查点、训练日志和图表
- 测试时自动使用 `latest/checkpoints/model_best.pt``latest/config.yaml`
### Baseline 对比测试
测试模式会自动运行 baseline 对比实验:
- Baseline 使用固定的最大速度限制(无动态控制)
- 自动计算 DQN 相对于 baseline 的性能提升百分比
- 对比指标包括:平均奖励、平均吞吐量
- 帮助评估 DQN 控制策略的实际效果
### 随机种子固定
系统在训练开始时会固定所有随机种子Python、NumPy、PyTorch确保
- 训练结果可重复
@ -344,8 +294,9 @@ agent:
### 最佳模型保存
训练过程中会自动跟踪并保存效果最好的模型:
- 基于episode总奖励评估
- 保存为 `checkpoints/model_best.pt`
- 保存为 `latest/checkpoints/model_best.pt`
- 训练结束时显示最佳奖励值
- 训练完成后自动使用最佳模型进行测试
---

View File

@ -57,7 +57,6 @@ training:
checkpoint_dir: "checkpoints" # Checkpoint directory
log_dir: "logs" # Log directory
random_seed: 42 # Random seed for reproducibility
num_parallel_envs: 1 # Number of parallel environments (1 = no parallelization)
train_freq: 5 # Train every N steps (higher = faster but less stable)
# Testing Parameters

View File

@ -69,24 +69,36 @@ class CTMModel:
return np.concatenate([self.densities, self.speed_limits])
def _calculate_sending_flow(self, cell_idx: int) -> float:
"""Calculate sending flow from a cell."""
"""
Calculate sending flow from a cell (vehicles per time step).
Returns flow in vehicles that can leave the cell during one time step.
"""
density = self.densities[cell_idx]
speed_limit = self.speed_limits[cell_idx]
effective_speed = min(speed_limit, self.free_flow_speed)
# Flow = density (veh/km) * speed (m/s) * 3.6 to get veh/h, then convert to veh/timestep
# Simplified: density * speed * time_step / 1000 gives vehicles per time step
sending_flow = min(
density * effective_speed,
self.capacity
density * effective_speed * self.time_step / 1000.0,
self.capacity * self.time_step / 1000.0
)
return sending_flow
def _calculate_receiving_flow(self, cell_idx: int) -> float:
"""Calculate receiving flow to a cell."""
"""
Calculate receiving flow to a cell (vehicles per time step).
Returns flow in vehicles that can enter the cell during one time step.
"""
density = self.densities[cell_idx]
# Receiving capacity based on available space
# congestion_wave_speed (m/s) * (jam_density - density) (veh/km) * time_step / 1000
receiving_flow = min(
self.capacity,
self.congestion_wave_speed * (self.jam_density - density)
self.capacity * self.time_step / 1000.0,
self.congestion_wave_speed * (self.jam_density - density) * self.time_step / 1000.0
)
return receiving_flow
@ -105,14 +117,16 @@ class CTMModel:
new_densities = self.densities.copy()
flows = np.zeros(self.num_cells + 1)
flows[0] = inflow / 3600.0 * self.time_step
# Convert inflow from veh/h to vehicles per time step
flows[0] = inflow * self.time_step / 3600.0
for i in range(self.num_cells):
sending = self._calculate_sending_flow(i)
if i < self.num_cells - 1:
receiving = self._calculate_receiving_flow(i + 1)
else:
receiving = outflow / 3600.0 * self.time_step
# Convert outflow from veh/h to vehicles per time step
receiving = outflow * self.time_step / 3600.0
flows[i + 1] = min(sending, receiving)
@ -130,6 +144,7 @@ class CTMModel:
"flows": flows.copy(),
"average_density": np.mean(self.densities),
"total_vehicles": np.sum(self.densities) * self.cell_length / 1000.0,
# flows[-1] is in vehicles per time step, convert to veh/h
"throughput": flows[-1] * 3600.0 / self.time_step,
}

View File

@ -42,15 +42,17 @@ def main():
# Auto-detect config from model path if using default model
config_path = args.config
model_path = args.model
output_dir = None
# If using latest model and default config, use latest config instead
if model_path.startswith("latest/") and args.config == "config.yaml":
latest_config = os.path.join("latest", "config.yaml")
output_dir = os.path.join("latest", "logs")
if os.path.exists(latest_config):
config_path = latest_config
print(f"Auto-detected config from latest run: {config_path}")
test(config_path, model_path)
test(config_path, model_path, output_dir=output_dir)
if __name__ == "__main__":

View File

@ -1,138 +0,0 @@
"""
Multiprocessing-based parallel environment for true parallelization.
"""
import numpy as np
from typing import List, Tuple, Dict
from multiprocessing import Process, Pipe
from environment import TrafficEnvironment
def worker(remote, parent_remote, config):
"""
Worker process for running environment.
Args:
remote: Child end of pipe
parent_remote: Parent end of pipe (closed in child)
config: Environment configuration
"""
parent_remote.close()
env = TrafficEnvironment(config)
while True:
try:
cmd, data = remote.recv()
if cmd == 'step':
state, reward, done, info = env.step(data)
if done:
state = env.reset()
remote.send((state, reward, done, info))
elif cmd == 'reset':
state = env.reset()
remote.send(state)
elif cmd == 'get_metrics':
remote.send(env.episode_metrics)
elif cmd == 'close':
remote.close()
break
except EOFError:
break
class ParallelEnvironment:
"""Multiprocessing-based parallel environment wrapper."""
def __init__(self, config: dict, num_envs: int = 4):
"""
Initialize parallel environment with multiprocessing.
Args:
config: Configuration dictionary
num_envs: Number of parallel environments
"""
self.num_envs = num_envs
self.config = config
# Create pipes and processes
self.remotes, self.work_remotes = zip(*[Pipe() for _ in range(num_envs)])
self.processes = []
for work_remote, remote in zip(self.work_remotes, self.remotes):
proc = Process(
target=worker,
args=(work_remote, remote, config)
)
proc.daemon = True
proc.start()
self.processes.append(proc)
work_remote.close()
# Get dimensions from config
temp_env = TrafficEnvironment(config)
self.state_dim = temp_env.state_dim
self.action_dim = temp_env.action_dim
print(f"Created {num_envs} parallel environments with multiprocessing")
def reset(self) -> np.ndarray:
"""
Reset all environments in parallel.
Returns:
Array of initial states, shape: (num_envs, state_dim)
"""
for remote in self.remotes:
remote.send(('reset', None))
states = [remote.recv() for remote in self.remotes]
return np.array(states)
def step(self, actions: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray, List[Dict]]:
"""
Step all environments in parallel.
Args:
actions: Array of actions, shape: (num_envs,)
Returns:
states: Array of next states, shape: (num_envs, state_dim)
rewards: Array of rewards, shape: (num_envs,)
dones: Array of done flags, shape: (num_envs,)
infos: List of info dictionaries
"""
# Send actions to all workers
for remote, action in zip(self.remotes, actions):
remote.send(('step', action))
# Receive results from all workers
results = [remote.recv() for remote in self.remotes]
states, rewards, dones, infos = zip(*results)
return np.array(states), np.array(rewards), np.array(dones), list(infos)
def get_episode_metrics(self) -> List[List[Dict]]:
"""
Get episode metrics from all environments.
Returns:
List of episode metrics for each environment
"""
for remote in self.remotes:
remote.send(('get_metrics', None))
metrics = [remote.recv() for remote in self.remotes]
return metrics
def close(self):
"""Close all worker processes."""
for remote in self.remotes:
remote.send(('close', None))
for proc in self.processes:
proc.join()

100
train.py
View File

@ -9,7 +9,6 @@ from tqdm import tqdm
import matplotlib.pyplot as plt
from utils import load_config, create_run_directory
from environment import TrafficEnvironment
from parallel_env import ParallelEnvironment
from dqn_agent import DQNAgent
from test import test
@ -38,14 +37,8 @@ def train(config_path: str = "config.yaml"):
set_random_seed(random_seed)
print(f"Random seed set to: {random_seed}")
# Create environment (parallel or single)
num_parallel_envs = config["training"].get("num_parallel_envs", 1)
if num_parallel_envs > 1:
env = ParallelEnvironment(config, num_envs=num_parallel_envs)
print(f"Using {num_parallel_envs} parallel environments with multiprocessing")
else:
env = TrafficEnvironment(config)
print("Using single environment")
env = TrafficEnvironment(config)
print("Using single environment")
agent = DQNAgent(
state_dim=env.state_dim,
action_dim=env.action_dim,
@ -81,88 +74,43 @@ def train(config_path: str = "config.yaml"):
print(f"Train frequency: every {train_freq} steps")
# Check if using parallel environments
is_vectorized = num_parallel_envs > 1
# is_vectorized = num_parallel_envs > 1
for episode in tqdm(range(num_episodes), desc="Training"):
states = env.reset()
step_count = 0 # Track steps for training frequency
episode_reward = 0
episode_loss = 0
loss_count = 0
state = states
step_count = 0
if is_vectorized:
# Parallel environment training
episode_rewards_vec = np.zeros(num_parallel_envs)
episode_loss = 0
loss_count = 0
done_flags = np.zeros(num_parallel_envs, dtype=bool)
while True:
action = agent.select_action(state, training=True)
next_state, reward, done, info = env.step(action)
while not np.all(done_flags):
# Select actions for all environments
actions = np.array([agent.select_action(states[i], training=True)
for i in range(num_parallel_envs)])
agent.store_transition(state, action, reward, next_state, done)
next_states, rewards, dones, infos = env.step(actions)
# Train agent every train_freq steps
step_count += 1
if step_count % train_freq == 0:
loss = agent.train()
if loss > 0:
episode_loss += loss
loss_count += 1
# Store transitions for all environments
for i in range(num_parallel_envs):
if not done_flags[i]:
agent.store_transition(states[i], actions[i], rewards[i],
next_states[i], dones[i])
episode_rewards_vec[i] += rewards[i]
if dones[i]:
done_flags[i] = True
episode_reward += reward
state = next_state
# Train agent every train_freq steps
step_count += 1
if step_count % train_freq == 0:
loss = agent.train()
if loss > 0:
episode_loss += loss
loss_count += 1
states = next_states
episode_reward = np.mean(episode_rewards_vec)
else:
# Single environment training
episode_reward = 0
episode_loss = 0
loss_count = 0
state = states
step_count = 0
while True:
action = agent.select_action(state, training=True)
next_state, reward, done, info = env.step(action)
agent.store_transition(state, action, reward, next_state, done)
# Train agent every train_freq steps
step_count += 1
if step_count % train_freq == 0:
loss = agent.train()
if loss > 0:
episode_loss += loss
loss_count += 1
episode_reward += reward
state = next_state
if done:
break
if done:
break
agent.end_episode()
episode_rewards.append(episode_reward)
episode_losses.append(episode_loss / max(1, loss_count))
# Calculate average throughput
if is_vectorized:
all_metrics = env.get_episode_metrics()
avg_throughput = np.mean([
np.mean([m["throughput"] for m in metrics])
for metrics in all_metrics if len(metrics) > 0
])
else:
avg_throughput = np.mean([m["throughput"] for m in env.episode_metrics])
avg_throughput = np.mean([m["throughput"] for m in env.episode_metrics])
episode_throughputs.append(avg_throughput)
# Save best model based on episode reward