Interactive RLHF Preference Demo

A simulation of human preference elicitation from Christiano et al. (2017): a reward model is fit to your pairwise comparisons between trajectory segments.

Click on the trajectory segment you prefer, or mark the two as equally good. Watch the reward model learn from your preferences!

Trajectory Segment σ¹

VS

Trajectory Segment σ²

Live statistics: Comparisons Made · Reward Model Accuracy · Cross-Entropy Loss
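Under the hood, the demo's cross-entropy loss follows the Bradley-Terry formulation from Christiano et al. (2017): the probability that segment σ¹ is preferred over σ² is a softmax over the summed predicted rewards of each segment, and the reward model is trained to match the human's choice. A minimal sketch (the function name, reward inputs, and label encoding here are illustrative assumptions, not the demo's actual code):

```python
import numpy as np

def preference_loss(r1, r2, pref):
    """Bradley-Terry cross-entropy loss for one pairwise comparison.

    r1, r2: per-step rewards the model predicts for segments sigma1 and sigma2.
    pref:   human label as probability mass on sigma1
            (1.0 = prefer sigma1, 0.0 = prefer sigma2, 0.5 = marked equal).
    """
    s1, s2 = np.sum(r1), np.sum(r2)
    # P(sigma1 preferred) = exp(s1) / (exp(s1) + exp(s2)),
    # computed with a max-shift for numerical stability.
    m = max(s1, s2)
    p1 = np.exp(s1 - m) / (np.exp(s1 - m) + np.exp(s2 - m))
    p1 = np.clip(p1, 1e-7, 1 - 1e-7)
    # Cross-entropy between the model's preference probability and the label.
    return -(pref * np.log(p1) + (1 - pref) * np.log(1 - p1))
```

When both segments score equally and the human marks them equal, the loss sits at its minimum of log 2; each click you make produces one such loss term, which is why the running cross-entropy shown above falls as the reward model improves.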