view article Article Glaze and the Effectiveness of Anti-AI Methods for Diffusion Models parsee-mizuhashi • May 15, 2024 • 10
view article Article Mixture of Experts Explained +4 osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Dec 11, 2023 • 1.12k
view article Article How NuminaMath Won the 1st AIMO Progress Prize +6 yfleureau, liyongsea, edbeeching, lewtun, benlipkin, romansoletskyi, vwxyzjn, kashif • Jul 11, 2024 • 128
Large Language Models for Mathematical Reasoning: Progresses and Challenges Paper • 2402.00157 • Published Jan 31, 2024 • 1
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models Paper • 2404.05567 • Published Apr 8, 2024 • 10
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding Paper • 2402.16671 • Published Feb 26, 2024 • 27
Programming datasets Collection Web-scrapes of datasets to boost code performance • 3 items • Updated Mar 2 • 1
haiku Collection 🌸 This is a collection of synthetic datasets built to help improve the ability of open language models to better write haikus through the use of DPO • 3 items • Updated Jun 21, 2024 • 6
LLM Augmented LLMs: Expanding Capabilities through Composition Paper • 2401.02412 • Published Jan 4, 2024 • 38
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset Paper • 2309.04662 • Published Sep 9, 2023 • 26