[2025-10-11 22:32:58,892][__main__][INFO] - Training for 50000 timesteps with NormalQNetwork and NormalReplayBuffer
[2025-10-11 22:33:08,427][core][INFO] - Step: 2000, Eval mean: 9.2, Eval std: 0.6
[2025-10-11 22:33:19,195][core][INFO] - Step: 4000, Eval mean: 9.2, Eval std: 0.6
[2025-10-11 22:33:29,736][core][INFO] - Step: 6000, Eval mean: 9.2, Eval std: 0.6
[2025-10-11 22:33:40,937][core][INFO] - Step: 8000, Eval mean: 9.2, Eval std: 0.6
[2025-10-11 22:33:52,238][core][INFO] - Step: 10000, Eval mean: 9.4, Eval std: 0.66332495807108
[2025-10-11 22:34:03,847][core][INFO] - Step: 12000, Eval mean: 9.4, Eval std: 0.66332495807108
[2025-10-11 22:34:15,649][core][INFO] - Step: 14000, Eval mean: 9.4, Eval std: 0.66332495807108
[2025-10-11 22:34:28,471][core][INFO] - Step: 16000, Eval mean: 9.2, Eval std: 0.6
[2025-10-11 22:34:41,494][core][INFO] - Step: 18000, Eval mean: 9.2, Eval std: 0.6
[2025-10-11 22:34:53,973][core][INFO] - Step: 20000, Eval mean: 9.4, Eval std: 0.66332495807108
[2025-10-11 22:35:06,951][core][INFO] - Step: 22000, Eval mean: 9.4, Eval std: 0.66332495807108
[2025-10-11 22:35:19,579][core][INFO] - Step: 24000, Eval mean: 9.4, Eval std: 0.66332495807108
[2025-10-11 22:35:33,048][core][INFO] - Step: 26000, Eval mean: 9.4, Eval std: 0.66332495807108