
Alexander

AlexanderKyng

AI & ML interests

Deeply passionate about all things AI. Junior Data Scientist.


Organizations

None yet

liked a model about 13 hours ago

unsloth/GLM-4.7-Flash-GGUF

Text Generation • 30B • Updated 3 minutes ago • 54k • 195

reacted to unmodeled-tyler's post with 👍 19 days ago

Post

1388

Happy New Year, Hugging Face!

It's been a crazy year for me! This year I launched VANTA Research as a solo operator and managed to push out 14 original open source finetunes and 5 datasets in the span of about 4 months, completely on my own.

The reception has been much better than I ever anticipated, and I sincerely appreciate everyone who's checked out my work thus far.

The good news is, I'm just getting started! In 2026 you can expect even more original models from VANTA Research, more open source datasets, and maybe some other cool things as well? 👀

2026 is gonna be big for AI in general, and I can't wait to experience it with all of you!

1 reply

liked a model about 1 month ago

nomic-ai/nomic-embed-text-v1.5

New activity in AlexanderKyng/Devstral-Small-2-24B-Instruct-2512-exl3-4.5bpw-optimized about 1 month ago

Missing file preprocessor_config.json?

#1 opened about 1 month ago by

atisharma

updated a model about 1 month ago

AlexanderKyng/Devstral-Small-2-24B-Instruct-2512-exl3-4.5bpw-optimized

8B • Updated Dec 10, 2025 • 23 • 2

published a model about 1 month ago

AlexanderKyng/Devstral-Small-2-24B-Instruct-2512-exl3-4.5bpw-optimized

8B • Updated Dec 10, 2025 • 23 • 2

liked 2 models about 1 month ago

mistralai/Devstral-Small-2-24B-Instruct-2512

24B • Updated about 1 month ago • 280k • 491

apple/DiffuCoder-7B-cpGRPO

8B • Updated Dec 8, 2025 • 2.56k • 316

reacted to Kseniase's post with ❤️ about 2 months ago

Post

6248

9 Recent advances in Multi-Agent Systems (all open-source)

The idea to split tasks across multiple agents instead of relying on one universal agent is now seen as one of the most effective ways to build an AI stack. Concepts like “agent swarms” were highlighted at the AI Engineer Code Summit in NYC (Nov 20–21) as the winning architecture. And this trend is not only about coding and software. It applies across all AI domains.

So here is some recent research that helps make multi-agent systems (MAS) better and keeps you up to date:

1. LatentMAS → Latent Collaboration in Multi-Agent Systems (2511.20639)
AI agents share their hidden "thoughts" directly in latent space instead of talking through text. This makes collaboration and reasoning much faster and more accurate (no extra training needed)

2. Puppeteer → Multi-Agent Collaboration via Evolving Orchestration (2505.19591)
Uses a “puppeteer” LLM that dynamically decides which agents (“puppets”) to call and in what order. By learning this orchestration with reinforcement learning (RL), the system solves complex tasks more efficiently and at lower compute cost

3. MADD → MADD: Multi-Agent Drug Discovery Orchestra (2511.08217)
A MAS with 4 agents for drug discovery. It lets researchers describe a drug discovery task in plain language. Then MADD automatically builds and runs the full hit-identification pipeline, making AI-driven drug design a simple end-to-end workflow

4. Multi-Agent Tool-Integrated Policy Optimization (MATPO) → Multi-Agent Tool-Integrated Policy Optimization (2510.04678)
Lets one LLM act as multiple agents (like a planner and a worker) by using different prompts and training them together with RL. So you get the benefits of a multi-agent system without needing multiple models

If you're interested in multi-agent trends for the future of software development, explore my article with the emergent playbook. This is super interesting → https://www.turingpost.com/p/aisoftwarestack
Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

Read further below ⬇️

2 replies

liked a model about 2 months ago

PrimeIntellect/INTELLECT-3

Text Generation • 107B • Updated Nov 27, 2025 • 2.54k • 203

liked 2 models 2 months ago

nex-agi/Qwen3-30B-A3B-Nex-N1

31B • Updated Dec 5, 2025 • 36 • 15

janhq/Jan-v2-VL-high

Image-Text-to-Text • 9B • Updated Nov 14, 2025 • 468 • 90

updated a model 2 months ago

AlexanderKyng/qwen3-coder-30b-a3b-instruct-exl3-4.0bpw-optimized

Text Generation • 8B • Updated Nov 17, 2025 • 1

published a model 2 months ago

AlexanderKyng/qwen3-coder-30b-a3b-instruct-exl3-4.0bpw-optimized

Text Generation • 8B • Updated Nov 17, 2025 • 1

liked a model 2 months ago

cerebras/MiniMax-M2-REAP-162B-A10B

Text Generation • 162B • Updated Nov 15, 2025 • 347 • 76

reacted to TravisMuhlestein's post with 🔥 2 months ago

Post

1407

Building Smarter AI Agents: A Tool-Based Architecture for Modularity and Trust

Over the past year, our AI engineering team at GoDaddy has been rethinking how to make agent systems more modular, transparent, and production-ready. Instead of viewing an AI agent as a monolithic process, we’ve decomposed it into four core tools that separate decision-making from execution — a design that’s proving critical for scale and observability:

🧩 MemoryTool – maintains persistent context and user continuity
✅ CompletionTool – determines when a task is truly complete
💬 UserInteractionTool – manages clarifications, approvals, and confirmations
🔁 DelegationTool – enables agents to hand off tasks to other agents or humans

This approach makes every step of an agent’s workflow explicit, testable, and auditable, allowing us to scale AI systems in production with higher confidence. We see this as a step toward a more open, composable agent ecosystem — one where frameworks can interoperate and agents can build trust through transparency and version control.
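To make the four-tool split above concrete, here is a minimal sketch in Python. Only the four tool names come from the post; every method name, the `step` function, the `delegable` mapping, and the audit log format are hypothetical illustrations, not GoDaddy's implementation. The point is that each decision (finish, delegate, or ask the user) goes through a separate object and leaves a log entry, which is what makes a workflow testable and auditable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MemoryTool:
    """Keeps context across turns (hypothetical in-memory store)."""
    store: Dict[str, str] = field(default_factory=dict)

    def remember(self, key: str, value: str) -> None:
        self.store[key] = value

    def recall(self, key: str) -> Optional[str]:
        return self.store.get(key)


class CompletionTool:
    """Decides whether the task is done: here, when every required key is known."""

    def is_complete(self, required: List[str], memory: MemoryTool) -> bool:
        return all(memory.recall(key) is not None for key in required)


class UserInteractionTool:
    """Queues questions for the user instead of blocking on input."""

    def __init__(self) -> None:
        self.pending: List[str] = []

    def ask(self, question: str) -> None:
        self.pending.append(question)


class DelegationTool:
    """Records hand-offs to another agent or a human."""

    def __init__(self) -> None:
        self.handoffs: List[Tuple[str, str]] = []

    def delegate(self, target: str, task: str) -> None:
        self.handoffs.append((target, task))


def step(required, delegable, memory, completion, user, delegation, audit_log):
    """Make one explicit, logged decision: complete, delegate, or ask the user."""
    if completion.is_complete(required, memory):
        audit_log.append("complete")
        return "complete"
    # First required item that memory does not hold yet.
    missing = next(key for key in required if memory.recall(key) is None)
    if missing in delegable:
        delegation.delegate(delegable[missing], missing)
        audit_log.append(f"delegate:{missing}")
        return "delegated"
    user.ask(f"Please provide: {missing}")
    audit_log.append(f"ask:{missing}")
    return "waiting_for_user"
```

Because the decision logic only talks to the tools, each tool can be swapped for a stub in a unit test, and the audit log gives a replayable record of what the agent decided at every step.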

Read the full write-up here → Building AI Agents at GoDaddy – An Agent’s Toolkit https://www.godaddy.com/resources/news/building-ai-agents-at-godaddy-an-agents-toolkit

We’d love to collaborate and exchange ideas with the community:

- How are you designing modular agent architectures?
- What design patterns or abstractions have helped you manage agent complexity?

Let’s build smarter, safer agents together.

#AI #Agents #Architecture #MachineLearning #OpenSource #AgentFrameworks #TrustInAI

reacted to mike-ravkine's post with 🔥 2 months ago

Post

2773

There is no anxiety quite like powering up 2KW of basement compute after rewiring it all. Small bit of trouble with the horizontal 3090 because I misread my motherboard manual, but otherwise so far so good. Next we see if I've built up enough cooling to hit my target TDP on those 3-slot nvlinked cards especially. The 4-slot bridges are much easier to work with but their prices went bananas and I couldn't acquire a second, so gotta get a little creative with intakes.

5 replies

reacted to codelion's post with 🔥 3 months ago

Post

3613

On this day in 2019, OpenAI released the final GPT-2 model as part of their staged release. I still remember that November well - so much was happening, but GPT-2's release felt like a watershed moment for the field. It showed us what was possible with carefully trained language models.

To recreate some of that GPT-2 magic, I recently tackled an interesting challenge: can you pretrain a language model with just 1 billion tokens - roughly 1/10th of what GPT-2 used - and still get comparable performance? After 50+ systematic experiments testing different dataset mixtures, the answer is yes.

The result is codelion/gpt-2-70m, which achieves over 90% of GPT-2's benchmark performance despite being trained on 10x less data. The key was finding the optimal dataset composition: 50% high-quality textbook PDFs, 30% filtered web content, and 20% educational resources. It even beats GPT-2 on TruthfulQA (47.31% vs 40.69%).
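As a rough illustration of what the 50/30/20 mixture means for a 1 billion token budget, the sketch below splits the budget per source and draws training documents in proportion to the weights. The source labels, function names, and the use of simple weighted sampling are assumptions for illustration; the post does not describe how the mixture was actually implemented.

```python
import random

TOTAL_TOKENS = 1_000_000_000  # token budget stated in the post

# Mixture weights from the post; the key names are hypothetical labels.
MIXTURE = {
    "textbook_pdfs": 0.50,
    "filtered_web": 0.30,
    "educational": 0.20,
}


def token_budget(total, mixture):
    """Split a total token budget across sources according to mixture weights."""
    if abs(sum(mixture.values()) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return {name: round(total * weight) for name, weight in mixture.items()}


def sample_source(mixture, rng):
    """Pick which source the next document comes from, proportionally to its weight."""
    names = list(mixture)
    weights = list(mixture.values())
    return rng.choices(names, weights=weights, k=1)[0]


budget = token_budget(TOTAL_TOKENS, MIXTURE)
# textbook_pdfs: 500,000,000 / filtered_web: 300,000,000 / educational: 200,000,000
for name, tokens in budget.items():
    print(f"{name}: {tokens:,} tokens")

rng = random.Random(0)  # fixed seed so the draw order is reproducible
next_source = sample_source(MIXTURE, rng)
```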

If you're interested in the full story of how we discovered this optimal mixture and why curriculum learning catastrophically failed, check out the complete article: https://huggingface.co/blog/codelion/optimal-dataset-mixing

Sometimes less really is more - when you mix it right.

1 reply

reacted to abidlabs's post with 👍 3 months ago

Post

9223

Why I think local, open-source models will eventually win.

The most useful AI applications are moving toward multi-turn agentic behavior: systems that take hundreds or even thousands of iterative steps to complete a task, e.g. Claude Code, computer-control agents that click, type, and test repeatedly.

In these cases, the power of the model lies not in how smart it is per token, but in how quickly it can interact with its environment and tools across many steps. In that regime, model quality becomes secondary to latency.

An open-source model that can call tools quickly, check that the right thing was clicked, or verify that a code change actually passes tests can easily outperform a slightly “smarter” closed model that has to make remote API calls for every move.

Eventually, the balance tips: it becomes impractical for an agent to rely on remote inference for every micro-action. Just as no one would tolerate a keyboard that required a network request per keystroke, users won’t accept agent workflows bottlenecked by latency. All devices will ship with local, open-source models that are “good enough” and the expectation will shift toward everything running locally. It’ll happen sooner than most people think.

8 replies

reacted to nouamanetazi's post with 🤗 3 months ago

Post

4310

After training 𝐒𝐦𝐨𝐥𝐋𝐌𝟑 on 𝟑𝟖𝟒 𝐇𝟏𝟎𝟎𝐬 for nearly a month, I've come to realize something most people overlook: 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐢𝐬 𝐭𝐡𝐞 𝐦𝐚𝐤𝐞-𝐨𝐫-𝐛𝐫𝐞𝐚𝐤 𝐟𝐚𝐜𝐭𝐨𝐫 𝐢𝐧 𝐋𝐋𝐌 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠. 🔥

Everyone talks about model architecture and data quality. And yes, those matter immensely. But here's what nobody tells you: when your training run fails at 2 AM because of mysterious 𝐍𝐂𝐂𝐋 𝐞𝐫𝐫𝐨𝐫𝐬, or when your expensive GPU cluster is running at 𝟔𝟎% 𝐞𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐜𝐲, the problem isn't your model. It's most probably a 𝐦𝐢𝐬𝐮𝐬𝐞 𝐨𝐟 𝐭𝐡𝐞 𝐡𝐚𝐫𝐝𝐰𝐚𝐫𝐞. 🛠️

Questions that seemed simple but had no clear answers: Why is 𝐌𝐨𝐄 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐬𝐥𝐨𝐰𝐞𝐫 𝐭𝐡𝐚𝐧 𝐝𝐞𝐧𝐬𝐞 𝐦𝐨𝐝𝐞𝐥𝐬? Which 𝐍𝐂𝐂𝐋 𝐟𝐥𝐚𝐠𝐬 should we actually set? How often should we checkpoint without killing throughput?

That's why we built 𝐓𝐡𝐞 𝐒𝐦𝐨𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐏𝐥𝐚𝐲𝐛𝐨𝐨𝐤 📖: a complete guide covering everything from model architecture and data curation to the SmolLM3 training marathon, post-training techniques, and crucially, the 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐥𝐚𝐲𝐞𝐫 that most teams get wrong.

We validated real vs theoretical bandwidth across the entire stack: 𝐇𝐁𝐌𝟑 𝐡𝐢𝐭𝐭𝐢𝐧𝐠 𝟑 𝐓𝐁/𝐬, 𝐍𝐕𝐋𝐢𝐧𝐤 𝟒.𝟎 𝐫𝐞𝐚𝐜𝐡𝐢𝐧𝐠 𝟕𝟖𝟔 𝐆𝐁/𝐬, 𝐏𝐂𝐈𝐞 𝐆𝐞𝐧𝟒 𝐚𝐭 𝟏𝟒.𝟐 𝐆𝐁/𝐬. Then we ran collective operations across 𝟏𝟐𝟖 𝐆𝐏𝐔𝐬 (16 nodes, 8xH100s each) and measured how performance degrades at scale: all-reduce drops from 𝟒𝟖𝟎 𝐆𝐁/𝐬 on a single node to 𝟑𝟐𝟎-𝟑𝟓𝟎 𝐆𝐁/𝐬 across 16 nodes.
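To put the all-reduce figures above in relative terms, this small calculation shows how much of the single-node bandwidth survives at 16 nodes. The numbers are the ones quoted in the post; the variable and function names are only illustrative.

```python
SINGLE_NODE_GBPS = 480.0          # all-reduce bandwidth on one node, from the post
MULTI_NODE_GBPS = (320.0, 350.0)  # reported range across 16 nodes, from the post


def retained_fraction(multi_node, single_node):
    """Fraction of single-node all-reduce bandwidth still available at scale."""
    return multi_node / single_node


low, high = (retained_fraction(bw, SINGLE_NODE_GBPS) for bw in MULTI_NODE_GBPS)
print(f"Retained at 16 nodes: {low:.1%} to {high:.1%}")
# Retained at 16 nodes: 66.7% to 72.9%
```

In other words, going from one node to 16 costs roughly a quarter to a third of the measured all-reduce bandwidth.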

If you've ever wondered why your training runs are slower than they should be, or you're planning to scale up and want to avoid expensive mistakes, this guide might save you weeks of debugging.

𝐓𝐡𝐞 𝐒𝐦𝐨𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐏𝐥𝐚𝐲𝐛𝐨𝐨𝐤: https://lnkd.in/e5MKXUHS

Shared with ❤️ by the HuggingFace team
