Direct Preference Optimization vs Proximal Policy Optimization (DPO vs PPO)

Learn how LLMs are fine-tuned!

