Reading list
• No More Adam: Learning Rate Scaling at Initialization is All You Need (arXiv:2412.11768)
• TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks (arXiv:2412.14161)
• HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments (arXiv:2408.10945)
• PDFTriage: Question Answering over Long, Structured Documents (arXiv:2309.08872)
• Compressed Chain of Thought: Efficient Reasoning Through Dense Representations (arXiv:2412.13171)
• The Matrix Calculus You Need For Deep Learning (arXiv:1802.01528)
• A Modern Self-Referential Weight Matrix That Learns to Modify Itself (arXiv:2202.05780)
• Recurrent Memory Transformer (arXiv:2207.06881)
• How many words does ChatGPT know? The answer is ChatWords (arXiv:2309.16777)
• Weaver: Foundation Models for Creative Writing (arXiv:2401.17268)
• Graph of Thoughts: Solving Elaborate Problems with Large Language Models (arXiv:2308.09687)
• SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking (arXiv:2306.05426)
• Think before you speak: Training Language Models With Pause Tokens (arXiv:2310.02226)
• What do tokens know about their characters and how do they know it? (arXiv:2206.02608)
• Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (arXiv:2404.07143)
• Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching (arXiv:2503.05179)
• Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers (arXiv:2504.18412)
• Chain of Draft: Thinking Faster by Writing Less (arXiv:2502.18600)
• Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models (arXiv:2506.19697)
• Jasper and Stella: distillation of SOTA embedding models (arXiv:2412.19048)
• The Flan Collection: Designing Data and Methods for Effective Instruction Tuning (arXiv:2301.13688)
• Puzzle: Distillation-Based NAS for Inference-Optimized LLMs (arXiv:2411.19146)
• Chain-of-Thought Reasoning Without Prompting (arXiv:2402.10200)
• Robust and Fine-Grained Detection of AI Generated Texts (arXiv:2504.11952)
• Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning (arXiv:2507.00432)
• The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation (arXiv:2507.05578)
• Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (arXiv:2507.09089)
• Stochastic LLMs do not Understand Language: Towards Symbolic, Explainable and Ontologically Based LLMs (arXiv:2309.05918)
• The Debate Over Understanding in AI's Large Language Models (arXiv:2210.13966)
• Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task (arXiv:2210.13382)
• Evidence of Meaning in Language Models Trained on Programs (arXiv:2305.11169)
• arXiv:2202.00666
• Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation (arXiv:2408.13586)
• Language Models are Injective and Hence Invertible (arXiv:2510.15511)
• Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples (arXiv:2510.07192)
• Not All Bits Are Equal: Scale-Dependent Memory Optimization Strategies for Reasoning Models (arXiv:2510.10964)
• SETOL: A Semi-Empirical Theory of (Deep) Learning (arXiv:2507.17912)
• Attention Is Not What You Need (arXiv:2512.19428)
• The Geometry of Reasoning: Flowing Logics in Representation Space (arXiv:2510.09782)
• GrokAlign: Geometric Characterisation and Acceleration of Grokking (arXiv:2506.12284)
• Intelligence per Watt: Measuring Intelligence Efficiency of Local AI (arXiv:2511.07885)
• Base Models Beat Aligned Models at Randomness and Creativity (arXiv:2505.00047)
• The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning (arXiv:2601.06002)