PPO

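Proximal Policy Optimization - a policy gradient method for reinforcement learning that stabilizes training by limiting how far each update can move the policy away from the one that collected the data, most commonly by clipping the probability ratio between the new and old policies.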
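
PPO is most often implemented with a clipped surrogate objective, and a small example may make the idea concrete. The following is a minimal sketch, assuming per-sample log-probabilities under the new and old policies and advantage estimates have already been computed. The function name `ppo_clip_loss` and the default `clip_eps=0.2` are illustrative assumptions, not part of this entry.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate loss (a value to minimize) over a batch.

    new_logp, old_logp: log-probabilities of the taken actions under the
    current policy and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    clip_eps: clipping range for the probability ratio (assumed default).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Negate because the surrogate is maximized, while losses are minimized.
    return -total / len(advantages)
```

When the ratio moves outside the clipping range in the direction that would increase the objective, the clipped term takes over and the sample stops contributing gradient, which is what keeps a single update from moving the policy too far.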