leonardlin's Collections: sota
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 53
Qwen Technical Report
Paper • 2309.16609 • Published • 38
GPT-4 Technical Report
Paper • 2303.08774 • Published • 7
Gemini: A Family of Highly Capable Multimodal Models
Paper • 2312.11805 • Published • 49
An In-depth Look at Gemini's Language Abilities
Paper • 2312.11444 • Published • 1
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Paper • 2312.10868 • Published • 1
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
Paper • 2312.17661 • Published • 15
Mistral 7B
Paper • 2310.06825 • Published • 58
TinyLlama: An Open-Source Small Language Model
Paper • 2401.02385 • Published • 95
Textbooks Are All You Need II: phi-1.5 technical report
Paper • 2309.05463 • Published • 89
Textbooks Are All You Need
Paper • 2306.11644 • Published • 154
Mixtral of Experts
Paper • 2401.04088 • Published • 160
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 74
Magicoder: Source Code Is All You Need
Paper • 2312.02120 • Published • 82
Towards Conversational Diagnostic AI
Paper • 2401.05654 • Published • 20
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 48
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 60
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Paper • 2401.15071 • Published • 37
Language Models can be Logical Solvers
Paper • 2311.06158 • Published • 20
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 85
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 141
BlackMamba: Mixture of Experts for State-Space Models
Paper • 2402.01771 • Published • 25
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
Paper • 2402.03216 • Published • 6
Matryoshka Representation Learning
Paper • 2205.13147 • Published • 25
Not all layers are equally as important: Every Layer Counts BERT
Paper • 2311.02265 • Published • 1
An Interactive Agent Foundation Model
Paper • 2402.05929 • Published • 30
Advancing State of the Art in Language Modeling
Paper • 2312.03735 • Published • 1
Large Language Models: A Survey
Paper • 2402.06196 • Published • 4
ChemLLM: A Chemical Large Language Model
Paper • 2402.06852 • Published • 30
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Paper • 2402.07456 • Published • 46
Grandmaster-Level Chess Without Search
Paper • 2402.04494 • Published • 69
Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
Paper • 2401.02731 • Published • 3
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper • 2402.14905 • Published • 134
Yi: Open Foundation Models by 01.AI
Paper • 2403.04652 • Published • 65
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 129
InternLM2 Technical Report
Paper • 2403.17297 • Published • 34
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Paper • 2404.12387 • Published • 39
Your Transformer is Secretly Linear
Paper • 2405.12250 • Published • 157
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper • 2405.12981 • Published • 33
Observational Scaling Laws and the Predictability of Language Model Performance
Paper • 2405.10938 • Published • 14