LLM Papers
Attention Is All You Need
Paper
• 1706.03762
• Published
• 115
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper
• 1810.04805
• Published
• 26
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper
• 1910.01108
• Published
• 21
Language Models are Few-Shot Learners
Paper
• 2005.14165
• Published
• 19
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Paper
• 2201.11903
• Published
• 15
Training language models to follow instructions with human feedback
Paper
• 2203.02155
• Published
• 24
PaLM: Scaling Language Modeling with Pathways
Paper
• 2204.02311
• Published
• 3
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
Paper
• 2301.13688
• Published
• 9
LLaMA: Open and Efficient Foundation Language Models
Paper
• 2302.13971
• Published
• 21
GPT-4 Technical Report
Paper
• 2303.08774
• Published
• 7
PaLM 2 Technical Report
Paper
• 2305.10403
• Published
• 8
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Paper
• 2305.10601
• Published
• 15
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper
• 2307.09288
• Published
• 250
Attention Is Not All You Need Anymore
Paper
• 2308.07661
• Published
• 1
Mistral 7B
Paper
• 2310.06825
• Published
• 58
Gemini: A Family of Highly Capable Multimodal Models
Paper
• 2312.11805
• Published
• 49
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Paper
• 2403.05530
• Published
• 65
Gemma: Open Models Based on Gemini Research and Technology
Paper
• 2403.08295
• Published
• 50
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper
• 2404.14619
• Published
• 126
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
Paper
• 2407.01370
• Published
• 89
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
Paper
• 2407.16741
• Published
• 76
The Llama 3 Herd of Models
Paper
• 2407.21783
• Published
• 117
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Paper
• 2408.06292
• Published
• 128
Qwen2.5-Coder Technical Report
Paper
• 2409.12186
• Published
• 153
GPT-4o System Card
Paper
• 2410.21276
• Published
• 87
DeepSeek-V3 Technical Report
Paper
• 2412.19437
• Published
• 76
Evolving Deeper LLM Thinking
Paper
• 2501.09891
• Published
• 115
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper
• 2502.02737
• Published
• 255
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper
• 2501.12948
• Published
• 440
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper
• 2502.14499
• Published
• 194