Making AI papers lucid -- Learn, Code, Document
Read and deeply understand the paper, its context, and contributions.
Write visualization code hands-on -- practice by building static figures, interactive demos, and animations (a minimal static-figure sketch follows these steps).
Publish and share the work -- GitHub Pages demos, structured notes, and paper walkthroughs.
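A minimal sketch of the kind of static figure this workflow produces, assuming matplotlib and numpy are installed. It plots the logistic (Bradley-Terry) preference probability that recurs throughout the reading list below; the file name and figure styling are illustrative choices, not part of any paper.

```python
import numpy as np
import matplotlib.pyplot as plt

# Reward difference between two candidate responses/segments, r(A) - r(B).
delta_r = np.linspace(-6, 6, 200)
# Bradley-Terry / logistic probability that A is preferred over B.
p_prefer_a = 1.0 / (1.0 + np.exp(-delta_r))

fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(delta_r, p_prefer_a)
ax.set_xlabel("reward difference  r(A) - r(B)")
ax.set_ylabel("P(A preferred over B)")
ax.set_title("Preference probability vs. reward gap")
fig.tight_layout()
fig.savefig("preference_curve.png", dpi=150)  # static figure for notes / GitHub Pages
```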
Christiano, Leike, Brown, Martic, Legg, Amodei (2017) -- The foundational RLHF paper. Trains RL agents from human preference comparisons between trajectory segments instead of a hand-designed reward function; see the reward-model sketch after this list.
Schulman, Wolski, Dhariwal, Radford, Klimov (2017) -- PPO -- a simple, stable policy-gradient method built on a clipped surrogate objective; the RL optimizer used inside RLHF (clipped-loss sketch after this list).
Stiennon, Ouyang, Wu, Ziegler, Lowe, Voss, Radford, Amodei, Christiano (2020) -- Applies RLHF to text summarization, demonstrating that preference-based training scales to NLP tasks.
Ouyang, Wu, Jiang, Almeida, Wainwright, Mishkin, Zhang, et al. (2022) -- RLHF at scale (InstructGPT) -- fine-tunes GPT-3 with human feedback to follow instructions, combining a learned reward model with PPO.
Rafailov, Sharma, Mitchell, Ermon, Manning, Finn (2023) -- DPO -- eliminates the explicit reward model entirely, optimizing preferences directly via a simple classification-style loss (sketch after this list).
Chen, Shen, Hong, Chen, Jiao, Zhang, Ma, Liu (2024) -- Aligns LLMs via self-play without human preference data.
Bordes, Pang, Ajay, et al. (2024) -- Comprehensive survey of VLM families: contrastive, masking, and generative approaches (a contrastive-loss sketch follows this list).
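A hedged sketch of the pairwise reward-model objective shared by Christiano et al. (2017) and Stiennon et al. (2020): maximize the Bradley-Terry likelihood that the human-preferred segment gets the higher scalar reward. `RewardModel`, the embedding dimension, and the toy tensors are assumptions for illustration, not the papers' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps a pooled segment/response embedding to a scalar reward (hypothetical architecture)."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)  # (batch,) scalar rewards

def preference_loss(rm: RewardModel, preferred: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: P(preferred > rejected) = sigmoid(r_preferred - r_rejected);
    # minimize the negative log-likelihood over the batch of comparisons.
    return -F.logsigmoid(rm(preferred) - rm(rejected)).mean()

# Toy usage: random embeddings standing in for two candidate responses per comparison.
rm = RewardModel(dim=16)
opt = torch.optim.Adam(rm.parameters(), lr=1e-3)
preferred, rejected = torch.randn(32, 16), torch.randn(32, 16)
loss = preference_loss(rm, preferred, rejected)
opt.zero_grad()
loss.backward()
opt.step()
print(f"pairwise reward loss: {loss.item():.3f}")
```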
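A hedged sketch of PPO's clipped surrogate objective (Schulman et al., 2017). The function and argument names are assumptions, and a real implementation also adds a value-function loss, an entropy bonus, and advantage estimation.

```python
import torch

def ppo_clip_loss(log_probs_new: torch.Tensor,
                  log_probs_old: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    # Probability ratio between the current policy and the data-collecting policy.
    ratio = torch.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    # Clipping the ratio removes the incentive to move the policy far from the old one.
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (elementwise minimum) term, then negate because we minimize.
    return -torch.min(unclipped, clipped).mean()

# Toy usage with random per-action log-probabilities and advantages.
lp_new = torch.randn(64, requires_grad=True)
lp_old, adv = torch.randn(64), torch.randn(64)
loss = ppo_clip_loss(lp_new, lp_old, adv)
loss.backward()
```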
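A hedged sketch of the DPO loss (Rafailov et al., 2023): the implicit rewards are beta-scaled log-probability ratios against a frozen reference model, and the preference pair is fit with a single logistic, classification-style loss. The per-sequence log-probabilities below are random placeholders, not real model outputs.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Implicit rewards: beta * log(pi_policy / pi_ref) for each sequence.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Logistic loss pushing the chosen response's implicit reward above the rejected one's.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy usage: sequence log-probabilities for 8 preference pairs.
pc = torch.randn(8, requires_grad=True)
pr = torch.randn(8, requires_grad=True)
rc, rr = torch.randn(8), torch.randn(8)
loss = dpo_loss(pc, pr, rc, rr)
loss.backward()
```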
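A hedged sketch of a CLIP-style contrastive objective, representative of the "contrastive" VLM family the survey describes; the embedding dimension, batch size, and temperature are illustrative assumptions rather than values from any specific model.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    # L2-normalize both modalities, then compute the full batch similarity matrix.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature
    # Matching image/text pairs sit on the diagonal; average the image-to-text
    # and text-to-image cross-entropy terms to symmetrize the loss.
    targets = torch.arange(logits.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy usage: 16 image/text embedding pairs of dimension 128.
loss = clip_contrastive_loss(torch.randn(16, 128), torch.randn(16, 128))
```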