AdaGrad

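Optimizer that adapts the learning rate of each parameter by dividing the base rate by the square root of that parameter's accumulated squared gradients. Parameters tied to sparse (rarely active) features accumulate less, so they keep larger effective step sizes than frequently updated ones.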
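
As an illustration of the per-parameter scaling, here is a minimal sketch in plain Python. The function name `adagrad_step`, the defaults `lr=0.1` and `eps=1e-8`, and the toy gradients are assumptions chosen for the example, not taken from any particular library.

```python
import math

def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    """Apply one AdaGrad update in place (hypothetical helper for illustration)."""
    for i, g in enumerate(grads):
        # Running sum of squared gradients for parameter i.
        accum[i] += g * g
        # Step size shrinks as the accumulated squared gradient grows.
        # eps (assumed value) avoids division by zero before any gradient is seen.
        params[i] -= lr * g / (math.sqrt(accum[i]) + eps)
    return params

# Toy data: parameter 0 receives a gradient on every step (dense feature),
# parameter 1 only on the last step (sparse feature).
params = [0.0, 0.0]
accum = [0.0, 0.0]
for grads in ([1.0, 0.0], [1.0, 0.0], [1.0, 1.0]):
    adagrad_step(params, grads, accum)
```

On the last step both parameters see the same gradient, yet the sparse parameter moves by about `lr / sqrt(1)` while the dense one moves by about `lr / sqrt(3)`, because the dense one has accumulated more squared gradient.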