MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22, 2024 • 134
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published Feb 10, 2025 • 153
KV Caching Explained: Optimizing Transformer Inference Efficiency Article • Published Jan 30, 2025 • 219
jiogenes/Llama-2-7b-hf-finetuned-open-korean-instructions Text Generation • 7B • Updated Jan 16, 2024 • 1
jiogenes/gpt2-medium-finetuned-open-korean-instructions Text Generation • 0.4B • Updated Jan 11, 2024 • 3