Papers
Short, opinionated notes on the papers that shape how we build AI systems. Each page has a TLDR, the key contributions, why it still matters, and where to read next.
Foundational ML & Deep Learning (4 papers)
Adam (2014): An adaptive optimizer that tracks first and second moments of the gradient per parameter. Became the default optimizer for deep learning almost overnight.
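The per-parameter bookkeeping is only a few lines. A minimal NumPy sketch of the update rule on a toy quadratic (defaults match the paper's except the learning rate, which is tuned for the toy problem):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: EMAs of gradient and squared gradient, bias-corrected."""
    m = b1 * m + (1 - b1) * g            # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * g * g        # second moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)            # bias correction for the zero init
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy problem: minimize f(w) = (w - 3)^2 starting from w = 0.
w = np.array([0.0]); m = np.zeros(1); v = np.zeros(1)
for t in range(1, 5001):
    g = 2.0 * (w - 3.0)                  # gradient of (w - 3)^2
    w, m, v = adam_step(w, g, m, v, t, lr=0.05)
```

Note how the step is normalized by the running gradient scale, which is why one learning rate works across parameters of very different magnitudes.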
AlexNet (2012): The paper that kicked off the deep learning era. A large CNN trained on GPUs crushed ImageNet 2012, cutting the runner-up's top-5 error from 26% to 15%.
Batch Normalization (2015): Normalize each layer's activations over the mini-batch, then rescale with learned parameters. Dramatically speeds up training and reduces sensitivity to initialization.
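The train-time forward pass fits in a few lines; this toy sketch leaves out the inference path, which swaps in running averages of the batch statistics:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then rescale and shift
    with the learned per-feature parameters gamma and beta."""
    mu = x.mean(axis=0)                       # per-feature batch mean
    var = x.var(axis=0)                       # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)     # zero mean, unit variance
    return gamma * x_hat + beta               # restore representational freedom

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(256, 8))    # batch of 256, 8 features
y = batchnorm_forward(x, gamma=np.ones(8), beta=np.zeros(8))
```

With gamma = 1 and beta = 0 the output is exactly standardized; training then moves gamma and beta wherever the network finds useful.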
ResNet (2015): Adds identity "skip connections" across layers so gradients flow through very deep networks. Made training 150+ layer CNNs practical and won ImageNet 2015.
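Real residual blocks use convolutions plus batch norm; a linear-layer sketch keeps the core idea visible, namely that the block computes x + F(x) rather than F(x):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = x + F(x): the identity path lets the gradient skip the transform,
    so stacking many blocks doesn't starve early layers of signal."""
    return x + relu(x @ W1) @ W2

rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=(4, d))
W1 = 0.1 * rng.normal(size=(d, d))
W2 = np.zeros((d, d))   # zero init: the block starts as the identity map
y = residual_block(x, W1, W2)
```

Because a zero-initialized block is exactly the identity, a very deep stack starts out as a shallow network and only gradually becomes deeper as the residual branches learn.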
Transformers & Language Models (5 papers)
Attention Is All You Need (2017): Replaces recurrence and convolutions with self-attention. Introduces the Transformer architecture that powers every modern LLM.
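A single attention head, minus the multi-head split, masking, and learned layer norms, is a short NumPy function:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over X of shape (seq, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise token affinities
    A = softmax(scores)                      # each row: a distribution over tokens
    return A @ V, A                          # each output: a weighted mix of values

rng = np.random.default_rng(0)
seq, d = 5, 8
X = rng.normal(size=(seq, d))
Wq, Wk, Wv = [rng.normal(size=(d, d)) for _ in range(3)]
out, A = self_attention(X, Wq, Wk, Wv)
```

Every token attends to every other in one matrix multiply, which is what makes the architecture parallelize so much better than an RNN.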
BERT (2018): Pretrain an encoder-only transformer with masked language modeling and next-sentence prediction, then fine-tune for downstream tasks. Set state of the art on 11 NLP benchmarks.
Chinchilla (2022): For a fixed training compute budget, you should scale model size and training tokens roughly equally. GPT-3 and PaLM were massively undertrained; a 70B model trained on 1.4T tokens beats a 280B one.
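The paper's precise optimum comes from fitted loss curves, but two rules of thumb capture the takeaway: training FLOPs C is roughly 6 * N * D, and the compute-optimal token count D is roughly 20x the parameter count N, so both scale as sqrt(C). A back-of-envelope sketch under those assumptions:

```python
# Back-of-envelope compute-optimal sizing. Both the C ~ 6*N*D FLOP estimate
# and the ~20 tokens-per-parameter ratio are rough heuristics, not the
# paper's fitted curves.
def chinchilla_optimal(C, tokens_per_param=20.0):
    N = (C / (6.0 * tokens_per_param)) ** 0.5    # parameters
    D = tokens_per_param * N                     # training tokens
    return N, D

# Sanity check against Chinchilla itself: 70B params on 1.4T tokens.
C = 6 * 70e9 * 1.4e12        # ~5.9e23 training FLOPs
N, D = chinchilla_optimal(C)
```

Plugging Chinchilla's own budget back in recovers 70B parameters and 1.4T tokens, since 1.4T / 70B is exactly 20.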
GPT-3 (2020): Scales the GPT decoder to 175B parameters and shows that a single model, with no gradient updates, can do many tasks from a handful of in-context examples.
LLaMA (2023): A family of 7B–65B decoder transformers trained on public data. The 13B model matches GPT-3 and the weights (eventually) leaked, kicking off the modern open-weights era.
RAG & Retrieval (3 papers)
ColBERT (2020): Keep per-token embeddings for both query and document and score via a sum of max-similarities. Much more expressive than a single-vector dual encoder, while document embeddings can still be precomputed and indexed offline.
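The MaxSim scoring rule itself is tiny. A toy sketch with random unit vectors standing in for BERT token embeddings (the real model normalizes its per-token embeddings similarly):

```python
import numpy as np

def maxsim_score(Q, D):
    """Late interaction: for each query token embedding, take its best match
    among the document's token embeddings, then sum over query tokens."""
    return (Q @ D.T).max(axis=1).sum()

def unit(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
Q = unit(rng.normal(size=(4, 16)))    # 4 query token embeddings (unit norm)
# One document that contains the query's tokens verbatim, one that doesn't.
doc_match = unit(np.vstack([Q, rng.normal(size=(6, 16))]))
doc_other = unit(rng.normal(size=(10, 16)))
s_match = maxsim_score(Q, doc_match)
s_other = maxsim_score(Q, doc_other)
```

A document containing exact matches for all 4 query tokens scores 4.0 (each max-similarity hits 1.0); an unrelated document scores well below that.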
Dense Passage Retrieval (2020): Train a BERT-based dual encoder with in-batch negatives to embed queries and passages into the same space. Beats BM25 for open-domain QA retrieval.
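The in-batch-negatives trick is just a softmax cross-entropy over the batch's score matrix. A toy sketch with random vectors standing in for the BERT encoders, to show the loss shape:

```python
import numpy as np

def in_batch_nll(Q, P):
    """Softmax cross-entropy where passage i is the positive for query i and
    every other passage in the batch serves as a free negative."""
    scores = Q @ P.T                                      # (batch, batch)
    scores = scores - scores.max(axis=1, keepdims=True)   # stabilize softmax
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # -log P(positive | query)

rng = np.random.default_rng(0)
B, d = 8, 32
Q = rng.normal(size=(B, d))
P_aligned = Q + 0.01 * rng.normal(size=(B, d))   # positives near their queries
P_random = rng.normal(size=(B, d))               # untrained embeddings
loss_good = in_batch_nll(Q, P_aligned)
loss_bad = in_batch_nll(Q, P_random)
```

One batch of B pairs yields B positives and B*(B-1) negatives for free, which is why the technique is so sample-efficient.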
Retrieval-Augmented Generation (2020): Introduces the term "RAG" and a model that retrieves passages from a Wikipedia index and conditions a seq2seq generator on them. The blueprint for every RAG system that came after.
Prompting & Reasoning (3 papers)
Chain-of-Thought Prompting (2022): Asking a large model to "think step by step", or showing a few examples that include reasoning traces, dramatically improves accuracy on math and multi-step problems.
ReAct (2022): Interleave reasoning traces ("Thought") with tool calls ("Action") and their results ("Observation") in a single prompt loop. The template every agent framework still uses.
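The control flow is the interesting part, and it is runnable without a model. In this sketch both the "LLM" and the single tool are hard-coded stubs (the `lookup` tool, its tiny knowledge base, and the scripted `respond` are all invented for illustration; in a real agent `respond` would be an LLM call):

```python
def lookup(term):
    """Stubbed tool: a tiny hard-coded knowledge base."""
    kb = {"capital of France": "Paris"}
    return kb.get(term, "not found")

def respond(transcript):
    """Scripted stand-in for the LLM: think and act, then answer once a
    tool result (Observation) is present in the transcript."""
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: lookup[capital of France]"
    return "Answer: Paris"

def react(question, max_steps=5):
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = respond(transcript)
        transcript += "\n" + step
        if step.startswith("Answer:"):
            return step.removeprefix("Answer: ").strip(), transcript
        # Parse the Action line, run the tool, append the Observation.
        term = step.split("Action: lookup[")[1].rstrip("]")
        transcript += f"\nObservation: {lookup(term)}"
    return None, transcript

answer, transcript = react("What is the capital of France?")
```

The whole pattern is that growing transcript: each model call sees its own prior thoughts plus the environment's observations, in one prompt.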
Tree of Thoughts (2023): Search over a tree of partial reasoning paths with lookahead and backtracking, using the LLM to both generate branches and evaluate them. Big wins on puzzles that trip up linear CoT.
Fine-Tuning & Alignment (4 papers)
Constitutional AI (2022): Replace most human harmlessness labels with model-generated self-critiques guided by a written set of principles ("constitution"). Scales alignment data and gives more transparent rules.
Direct Preference Optimization (2023): Derive a closed-form loss that optimizes a policy against preference data without training a separate reward model or running PPO. Much simpler than RLHF, competitive quality.
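The loss itself is one line once you have per-response log-probabilities from the policy and the frozen reference model. A sketch with made-up scalar log-probs:

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO: -log sigmoid(beta * margin), where the margin compares how much
    the policy prefers the chosen response relative to the frozen reference.
    No explicit reward model, no PPO loop."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))

# If the policy matches the reference exactly, the margin is 0 and the
# loss is -log(1/2).
base = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Raising the chosen response's likelihood above the reference lowers it.
better = dpo_loss(-8.0, -12.0, -10.0, -12.0)
```

The implicit reward is beta times the policy-to-reference log-ratio, which is how the derivation eliminates the separate reward model.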
InstructGPT (2022): The three-stage recipe (SFT on demonstrations, reward model on human preferences, PPO fine-tuning) turns a raw GPT into a model that follows instructions and refuses harmful requests.
LoRA (2021): Fine-tune giant models by only training small low-rank update matrices instead of all the original weights. Cuts trainable parameters by 10,000x with no quality loss.
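The forward pass is just the frozen weight plus a scaled rank-r product. A toy sketch (the alpha/r scaling and zero-init of one factor follow the paper; the dimensions are arbitrary):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0, r=4):
    """Forward pass with a LoRA adapter: frozen weight W plus the low-rank
    update B @ A, scaled by alpha / r. Only A and B are trained."""
    return x @ (W + (alpha / r) * (B @ A)).T

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4
W = rng.normal(size=(d_out, d_in))        # pretrained weight, frozen
A = 0.01 * rng.normal(size=(r, d_in))     # trainable "down" projection
B = np.zeros((d_out, r))                  # zero init: adapter starts as a no-op
x = rng.normal(size=(2, d_in))
y = lora_forward(x, W, A, B, r=r)
```

Here A and B together hold 512 trainable numbers versus 4,096 in W; at GPT-3 scale, applied only to attention projections, that gap is where the 10,000x comes from. After training, B @ A can be merged into W, so inference costs nothing extra.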
Computer Vision (3 papers)
CLIP (2021): Train an image encoder and a text encoder to embed matching (image, caption) pairs nearby in a shared space. Gives zero-shot image classification across thousands of categories.
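The zero-shot classifier is just nearest-neighbor by cosine similarity between the image embedding and one embedded prompt per class. A toy sketch with random vectors standing in for the trained encoders (the prompts and dimensions are invented):

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def zero_shot_classify(image_emb, text_embs):
    """CLIP-style zero-shot classification: cosine similarity between the
    image embedding and one text embedding per class prompt; highest wins."""
    sims = unit(text_embs) @ unit(image_emb)
    return int(np.argmax(sims)), sims

rng = np.random.default_rng(0)
d = 32
prompts = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text_embs = rng.normal(size=(len(prompts), d))   # stand-ins for encoded prompts
# Pretend the encoders were trained well: this image lands near the "cat" text.
image_emb = text_embs[1] + 0.05 * rng.normal(size=d)
pred, sims = zero_shot_classify(image_emb, text_embs)
```

Swapping the class list changes the classifier with no retraining, which is the whole point: the "weights" of the classification head are just embedded text.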
Latent Diffusion (2022): Run the diffusion denoising process in the compressed latent space of a pretrained autoencoder instead of in pixel space. Cuts compute by an order of magnitude and enables text-to-image at home.
Vision Transformer (2020): Slice an image into 16x16 patches, treat them as tokens, run a plain transformer. With enough data, it matches or beats CNNs on ImageNet and other benchmarks.
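The patchify step is a pair of reshapes. ViT then linearly projects each flattened patch, prepends a class token, and adds position embeddings; this sketch covers only the slicing:

```python
import numpy as np

def patchify(img, p=16):
    """Slice an (H, W, C) image into non-overlapping p x p patches and
    flatten each patch into a token vector of length p * p * C."""
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0
    x = img.reshape(H // p, p, W // p, p, C)   # split both spatial axes
    x = x.transpose(0, 2, 1, 3, 4)             # bring the patch grid together
    return x.reshape((H // p) * (W // p), p * p * C)

img = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
tokens = patchify(img)    # 224x224 RGB -> 196 tokens of dimension 768
```

A standard 224x224 input becomes a sequence of 196 tokens, short enough that the quadratic cost of full self-attention over patches is no problem.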