Supervised Fine-Tuning (SFT)

Initial training phase in RLHF where model learns from labeled examples before preference learning.