Rabbits Leaderboard 💊: Visualize and analyze language model robustness to drug name synonyms
Open LLM Leaderboard 🏆: Track, rank and evaluate open LLMs and chatbots
MMLU-Pro Leaderboard 🥇: More advanced and challenging multi-task evaluation