Interactive RLHF Preference Demo

A simulation of human preference elicitation from Christiano et al. (2017): a reward model is fit to your pairwise comparisons between trajectory segments.

Click on the trajectory segment you prefer, or mark the two as equally good. Watch the reward model learn from your preferences!

Trajectory Segment σ¹

VS

Trajectory Segment σ²

Live statistics: Comparisons Made · Reward Model Accuracy · Cross-Entropy Loss
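Under the hood, the demo's cross-entropy loss follows the Bradley-Terry formulation from Christiano et al. (2017): the probability that segment σ¹ is preferred over σ² is a softmax over the summed predicted rewards of each segment, and the reward model is trained to match the human's choice. A minimal sketch (the function name, reward inputs, and label encoding here are illustrative assumptions, not the demo's actual code):

```python
import numpy as np

def preference_loss(r1, r2, pref):
    """Bradley-Terry cross-entropy loss for one pairwise comparison.

    r1, r2: per-step rewards the model predicts for segments sigma1 and sigma2.
    pref:   human label as probability mass on sigma1
            (1.0 = prefer sigma1, 0.0 = prefer sigma2, 0.5 = marked equal).
    """
    s1, s2 = np.sum(r1), np.sum(r2)
    # P(sigma1 preferred) = exp(s1) / (exp(s1) + exp(s2)),
    # computed with a max-shift for numerical stability.
    m = max(s1, s2)
    p1 = np.exp(s1 - m) / (np.exp(s1 - m) + np.exp(s2 - m))
    p1 = np.clip(p1, 1e-7, 1 - 1e-7)
    # Cross-entropy between the model's preference probability and the label.
    return -(pref * np.log(p1) + (1 - pref) * np.log(1 - p1))
```

When both segments score equally and the human marks them equal, the loss sits at its minimum of log 2; each click you make produces one such loss term, which is why the running cross-entropy shown above falls as the reward model improves.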