Mike Ravkine PRO

mike-ravkine

the-crypt-keeper

AI & ML interests

LLM Research / Development / Evaluation

Recent Activity

posted an update about 14 hours ago

gpt-oss-120b has held on to the ReasonScape crown since its release on Aug 5, 2025 - 7 months in the LLM space is *impressive*.

With the release of Qwen-3.5 the king has been dethroned by not one but two models: the mid-dense https://huggingface.co/Qwen/Qwen3.5-27B and the large-MoE https://huggingface.co/Qwen/Qwen3.5-122B-A10B-FP8.

The old king is dead - long live the new king 👑

Note that these rankings are based on `r12`, the third iteration of the ReasonScape evaluation, with 27k prompts across 12 task domains. Compared to the previous m12x ranking, this evaluation fixes a slew of test bugs, refines the task set to add table-extraction, and lifts the context ceiling to 16k - so these rankings differ considerably from the previous m12x Leaderboard (which has an 8k context limit).

liked a model 8 days ago

Qwen/Qwen3.5-35B-A3B

liked a model 15 days ago

Nanbeige/Nanbeige4.1-3B

