All Posts

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

May 9 2024 · 3 min read
#RLHF
TL;DR: 1) on-policy sampling + 2) negative gradients → mode-seeking objectives → better efficiency. (A hedged KL sketch of the mode-seeking vs. mode-covering contrast follows this entry.)
Table of Contents: Characterizing preference fine-tuning methods · Experiments · Setup · Observation 1: On-policy sampling generally improves performance and efficiency · Observation 2: Sample reuse can enable leveraging off-policy data · Observation 3: Negative gradient enables faster convergence · Observation 4: On-policy sampling and negative gradients are complementary · Theoretical intuition: mode-seeking vs …
Read More…
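A minimal sketch of the mode-seeking vs. mode-covering distinction referenced in the TL;DR, using the standard KL definitions rather than the post's own notation (π_θ is the model being tuned, p* a target distribution such as the reward-induced one):

$$
\underbrace{\mathrm{KL}\!\left(\pi_\theta \,\|\, p^*\right) = \mathbb{E}_{y \sim \pi_\theta}\!\left[\log \tfrac{\pi_\theta(y \mid x)}{p^*(y \mid x)}\right]}_{\text{reverse KL: mode-seeking}}
\qquad
\underbrace{\mathrm{KL}\!\left(p^* \,\|\, \pi_\theta\right) = \mathbb{E}_{y \sim p^*}\!\left[\log \tfrac{p^*(y \mid x)}{\pi_\theta(y \mid x)}\right]}_{\text{forward KL: mode-covering}}
$$

Minimizing the reverse KL takes expectations under samples from π_θ itself (on-policy) and heavily penalizes putting mass where p* is small, so the model concentrates on a few high-probability modes; minimizing the forward KL instead penalizes missing any mass of p*, so the model spreads out. The TL;DR's claim is that on-policy sampling plus negative gradients push preference fine-tuning toward the reverse-KL, mode-seeking regime, which is where the efficiency gains come from.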

An Emulator for Fine-tuning Large Language Models using Small Language Models

Feb 5 2024 · 4 min read
#Weak-to-strong Generalization
TL;DR: we can transplant >80% of instruction-following performance from small models to large models, without actually tuning the large models. (The EFT decomposition is sketched after this entry.)
Table of Contents: Emulated fine-tuning (EFT) · A model-as-RM perspective · EFT for scale decoupling · Experiments · Observation 1: Pretraining vs. fine-tuning ⇒ factuality vs. helpfulness · Observation 2: EFT enables dynamic test-time reward interpolation · Observation 3: Speculative decoding speeds up EFT up-scaling · Observation 4: Up-scaling can be further amplified · Observation 5: …
Read More…
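A hedged sketch of the EFT up-scaling idea behind the entry above; the symbols are placeholders rather than the post's notation. The fine-tuning "delta" learned at a small scale is applied on top of a large pretrained base, emulating a fine-tuned large model without tuning it:

$$
\tilde{\pi}(y \mid x) \;\propto\; \pi^{\text{large}}_{\text{base}}(y \mid x)\,\cdot\,\frac{\pi^{\text{small}}_{\text{ft}}(y \mid x)}{\pi^{\text{small}}_{\text{base}}(y \mid x)}
$$

The small-model ratio plays the role of an implicit reward extracted from the small-scale fine-tune (the "model-as-RM" perspective in the table of contents), and in practice the combination is applied per token and renormalized over the vocabulary; speculative decoding (Observation 3) is one way to make sampling from this combination cheaper.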

Measuring Faithfulness in Chain-of-Thought Reasoning

Sep 19 2023 · 8 min read
#CoT #Faithfulness
Generated reasoning is faithful to the model's true reasoning if it "accurately represents the reasoning process behind the model's prediction." This is particularly important 1) in high-stakes settings, such as medical decision-making, and 2) for gaining a better understanding of how reasoning works in LLMs. This work provides a timely investigation into the faithfulness of CoT reasoning in LLMs, adding to previous research suggesting that LLM-generated reasoning may not be faithful.
Read More…