[2025-10-12 00:15:02,316][__main__][INFO] - Training for 50000 timesteps with NormalQNetwork and 128StepReplayBuffer