NLP Papers
Multilingual Instruction Tuning With Just a Pinch of Multilinguality (arXiv:2401.01854)
LLaMA Beyond English: An Empirical Study on Language Capability Transfer (arXiv:2401.01055)
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning (arXiv:2401.01325)
Improving Text Embeddings with Large Language Models (arXiv:2401.00368)
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math (arXiv:2312.17120)
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (arXiv:2312.15166)
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 (arXiv:2312.16171)
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation (arXiv:2312.14187)
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models (arXiv:2401.01335)
LLaMA Pro: Progressive LLaMA with Block Expansion (arXiv:2401.02415)
arXiv:2401.04088
SeaLLMs -- Large Language Models for Southeast Asia (arXiv:2312.00738)
System 2 Attention (is something you might need too) (arXiv:2311.11829)
Contrastive Chain-of-Thought Prompting (arXiv:2311.09277)
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM (arXiv:2401.02994)
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon (arXiv:2401.03462)
Self-Rewarding Language Models (arXiv:2401.10020)
Tuning Language Models by Proxy (arXiv:2401.08565)
ReFT: Reasoning with Reinforced Fine-Tuning (arXiv:2401.08967)
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv:2402.17764)
BitNet: Scaling 1-bit Transformers for Large Language Models (arXiv:2310.11453)
Orca-Math: Unlocking the potential of SLMs in Grade School Math (arXiv:2402.14830)
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (arXiv:2403.03507)
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM (arXiv:2403.07816)
RAFT: Adapting Language Model to Domain Specific RAG (arXiv:2403.10131)
ORPO: Monolithic Preference Optimization without Reference Model (arXiv:2403.07691)
Evolutionary Optimization of Model Merging Recipes (arXiv:2403.13187)
RakutenAI-7B: Extending Large Language Models for Japanese (arXiv:2403.15484)
sDPO: Don't Use Your Data All at Once (arXiv:2403.19270)
Jamba: A Hybrid Transformer-Mamba Language Model (arXiv:2403.19887)
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity (arXiv:2403.14403)
Long-context LLMs Struggle with Long In-context Learning (arXiv:2404.02060)
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders (arXiv:2404.05961)
JetMoE: Reaching Llama2 Performance with 0.1M Dollars (arXiv:2404.07413)
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (arXiv:2404.14219)
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework (arXiv:2404.14619)
Mixture-of-Agents Enhances Large Language Model Capabilities (arXiv:2406.04692)