Simulating human preference elicitation from Christiano et al., 2017
Click on the trajectory you prefer, or mark them as equal. Watch the reward model learn from your preferences!
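
For readers curious how a reward model can learn from clicks like these, here is a minimal sketch of the preference loss described in Christiano et al., 2017: the probability that one trajectory segment is preferred is a logistic function of the difference in summed predicted rewards, and the model is fit by cross-entropy against the human labels, with an "equal" choice treated as a 0.5/0.5 label. Everything else is an assumption made for illustration and not necessarily how this demo works: the linear reward model, the plain gradient descent loop, the toy data, and all function names (`segment_return`, `preference_probability`, `preference_loss`, `train`) are hypothetical.

```python
import math

def segment_return(weights, segment):
    """Sum of predicted per-step rewards r(s) = w . s over a trajectory segment."""
    return sum(sum(w * x for w, x in zip(weights, step)) for step in segment)

def preference_probability(weights, seg_a, seg_b):
    """Modelled probability that seg_a is preferred over seg_b."""
    diff = segment_return(weights, seg_a) - segment_return(weights, seg_b)
    return 1.0 / (1.0 + math.exp(-diff))

def preference_loss(weights, comparisons):
    """Mean cross-entropy over (seg_a, seg_b, mu) triples.

    mu is the label: 1.0 if seg_a was chosen, 0.0 if seg_b was chosen,
    and 0.5 if the two were marked as equal.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for seg_a, seg_b, mu in comparisons:
        p = preference_probability(weights, seg_a, seg_b)
        total -= mu * math.log(p + eps) + (1.0 - mu) * math.log(1.0 - p + eps)
    return total / len(comparisons)

def feature_sum(segment):
    """Per-feature sum over the steps of a segment."""
    return [sum(column) for column in zip(*segment)]

def train(comparisons, n_features, lr=0.1, steps=500):
    """Fit a linear reward model (an assumption of this sketch) by gradient descent."""
    weights = [0.0] * n_features
    for _ in range(steps):
        grad = [0.0] * n_features
        for seg_a, seg_b, mu in comparisons:
            p = preference_probability(weights, seg_a, seg_b)
            fa, fb = feature_sum(seg_a), feature_sum(seg_b)
            for i in range(n_features):
                # d(loss)/d(w_i) for one comparison is (p - mu) * (fa_i - fb_i)
                grad[i] += (p - mu) * (fa[i] - fb[i])
        weights = [w - lr * g / len(comparisons) for w, g in zip(weights, grad)]
    return weights

# Toy data: each step is a 2-feature state vector (made up for illustration).
seg_hi = [[1.0, 0.0], [1.0, 0.0]]
seg_lo = [[0.0, 1.0], [0.0, 1.0]]
seg_c = [[1.0, 0.0]]
seg_d = [[1.0, 0.0]]
comparisons = [
    (seg_hi, seg_lo, 1.0),  # the user clicked seg_hi
    (seg_c, seg_d, 0.5),    # the user marked the pair as equal
]

initial_loss = preference_loss([0.0, 0.0], comparisons)
learned = train(comparisons, n_features=2)
final_loss = preference_loss(learned, comparisons)
```

After training, the learned weights should score `seg_hi` above `seg_lo`, and `final_loss` should be lower than `initial_loss`. A neural network would typically replace the linear model in practice; the loss itself stays the same.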