NLP Papers
Multilingual Instruction Tuning With Just a Pinch of Multilinguality (arXiv:2401.01854)
LLaMA Beyond English: An Empirical Study on Language Capability Transfer (arXiv:2401.01055)
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning (arXiv:2401.01325)
Improving Text Embeddings with Large Language Models (arXiv:2401.00368)
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math (arXiv:2312.17120)
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (arXiv:2312.15166)
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 (arXiv:2312.16171)
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation (arXiv:2312.14187)
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models (arXiv:2401.01335)
LLaMA Pro: Progressive LLaMA with Block Expansion (arXiv:2401.02415)
arXiv:2401.04088
SeaLLMs -- Large Language Models for Southeast Asia (arXiv:2312.00738)
System 2 Attention (is something you might need too) (arXiv:2311.11829)
Contrastive Chain-of-Thought Prompting (arXiv:2311.09277)
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM (arXiv:2401.02994)
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon (arXiv:2401.03462)
Self-Rewarding Language Models (arXiv:2401.10020)
Tuning Language Models by Proxy (arXiv:2401.08565)
ReFT: Reasoning with Reinforced Fine-Tuning (arXiv:2401.08967)
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv:2402.17764)
BitNet: Scaling 1-bit Transformers for Large Language Models (arXiv:2310.11453)
Orca-Math: Unlocking the potential of SLMs in Grade School Math (arXiv:2402.14830)
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (arXiv:2403.03507)
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM (arXiv:2403.07816)
RAFT: Adapting Language Model to Domain Specific RAG (arXiv:2403.10131)
ORPO: Monolithic Preference Optimization without Reference Model (arXiv:2403.07691)
Evolutionary Optimization of Model Merging Recipes (arXiv:2403.13187)
RakutenAI-7B: Extending Large Language Models for Japanese (arXiv:2403.15484)
sDPO: Don't Use Your Data All at Once (arXiv:2403.19270)
Jamba: A Hybrid Transformer-Mamba Language Model (arXiv:2403.19887)
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity (arXiv:2403.14403)
Long-context LLMs Struggle with Long In-context Learning (arXiv:2404.02060)
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders (arXiv:2404.05961)
JetMoE: Reaching Llama2 Performance with 0.1M Dollars (arXiv:2404.07413)
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (arXiv:2404.14219)
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework (arXiv:2404.14619)
Mixture-of-Agents Enhances Large Language Model Capabilities (arXiv:2406.04692)