All HF Hub posts

SeaWolf-AI 
posted an update about 8 hours ago
🌍 World Model Bench — does your world model actually think?

FID measures realism. FVD measures smoothness. But neither tells you whether the model understood the scene.

We just released WM Bench — the first benchmark for cognitive intelligence in world models. The core question: when a beast charges from 3 meters away, does the model know to sprint — not walk? Does it respond differently to a human vs an animal? Does it remember the left corridor was blocked two steps ago?

Those are cognitive questions. No existing benchmark asks them. So we built one.

3 Pillars · 10 Categories · 100 Scenarios · 1,000-point scale

- 👁 P1 Perception (25%) — Can it read the scene?
- 🧠 P2 Cognition (45%) — Does it predict threats, escalate emotions, utilize memory?
- 🔥 P3 Embodiment (30%) — Does the body respond with the right motion?

All evaluation is via simple JSON I/O — no 3D engine, no special hardware. Any model with an API can participate.
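For illustration only, here is what a JSON scenario/response exchange might look like. The benchmark's actual schema is not shown in this post, so every field name below is an invented placeholder, not WM Bench's format:

```python
import json

# Hypothetical scenario sent to the model under test.
# All field names are illustrative stand-ins, not the real WM Bench schema.
scenario = json.dumps({
    "pillar": "P2-cognition",
    "observation": {"entity": "beast", "distance_m": 3.0, "behavior": "charging"},
    "memory": ["left corridor blocked (t-2)"],
})

# Hypothetical model answer, which a rubric would then score.
response = json.loads(
    '{"action": "sprint", "direction": "right", "reason": "threat within 3 m"}'
)
print(response["action"])
```

The point of a pure JSON interface like this is that any model with an API can be wired in, with no 3D engine in the loop.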

We also built PROMETHEUS as a live reference implementation — runs in your browser on a T4, no install needed. Combines FloodDiffusion motion generation with an LLM cognitive brain (Perceive → Predict → Decide → Act). Scored 726/1000 (Grade B) on Track C — the only directly verified model so far. Submissions from other teams very welcome.

---

🗂 Dataset → FINAL-Bench/World-Model
🌍 Demo → FINAL-Bench/World-Model
🏆 Leaderboard → FINAL-Bench/worldmodel-bench
📝 Article → https://huggingface.co/blog/FINAL-Bench/world-model

Part of the FINAL Bench Family — alongside FINAL Bench (Feb 2026). Feedback on rubrics and missing models always welcome!
reaperdoesntknow 
posted an update 2 days ago
We present a methodology for training small language models on CPU at FP32 precision that achieves capability-per-dollar efficiency orders of magnitude beyond GPU-based training. Across 15 models spanning four novel architecture families, Mixture of Attentions (MoA), cross-architecture fusion (Qemma), swarm intelligence (SAGI), and metric-space causal language models (DiscoverLM), total compute cost was $24 on a single AMD EPYC 9454P processor. We introduce seven methodological pillars: (1) FP32 precision preservation, with experiments demonstrating 5,810× single-operation error and 23,225× compounding error ratio for FP16 at network depth; (2) sparse cognitive architectures where 0.02–7% of parameters activate per token, matching CPU branching rather than GPU SIMD; (3) developmental curriculum training progressing from language to logic to transfer to depth; (4) continuous belt-fed data ingestion eliminating truncation waste; (5) hardware-native optimization for AMD Zen 4 via AOCL/OpenMP/NUMA-aware allocation; (6) self-regulating thermodynamic governance with emergent temperature measurement grounded in L2-star discrepancy; and (7) open-standard compute (AVX2 SIMD at FP32) free of proprietary vendor dependency. We argue that transformers were designed for GPU hardware rather than for mathematical optimality, and that architectures designed for geometric correctness (metric-space attention, triangle-inequality enforcement, sparse expert routing) naturally favor CPU execution. For sub-2B-parameter models, CPU training produces more capable models at a fraction of the cost.
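The general phenomenon behind the FP16 compounding-error claim can be illustrated with a toy depth experiment (this does not reproduce the paper's 5,810× and 23,225× figures, which come from its own experiments; the network below is an invented example):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
W = rng.standard_normal((256, 256)) / 32  # roughly unit spectral norm

def forward(dtype, depth=32):
    h = x.astype(dtype)
    Wd = W.astype(dtype)
    for _ in range(depth):
        # residual tanh layer keeps activations O(1) at depth,
        # so rounding error accumulates layer after layer
        h = np.tanh(h + Wd @ h)
    return h.astype(np.float64)

ref = forward(np.float64)                      # high-precision reference
err16 = np.abs(forward(np.float16) - ref).max()
err32 = np.abs(forward(np.float32) - ref).max()
print(f"fp16/fp32 error ratio at depth 32: {err16 / err32:.0f}x")
```

Even in this tiny sketch, the FP16 path diverges from the reference by orders of magnitude more than the FP32 path, because each layer's rounding error feeds the next.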
prithivMLmods 
posted an update 3 days ago
Flux-Klein-KV-Edit-Consistency demo is now available on Spaces. It preserves character identity and delivers high-quality, realistic results after edits. No special prompts needed: just upload an image, type your prompt, and get the resulting image blazing fast.

🔥 Demo Space: prithivMLmods/flux-klein-kv-edit-consistency
🤗 Model: black-forest-labs/FLUX.2-klein-9b-kv
🤗 Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
🔗 Gradio Server Mode: https://www.gradio.app/main/guides/server-mode

➔ Built with Headless Gradio, an alternative to using gr.Blocks for creating the frontend and triggering events, powered by FastAPI + Gradio. You can now design the frontend however you want, with continued support for APIs, MCP, and ZeroGPU.

➔ Gradio Server Mode is now available from gradio@v6.10.0.

To learn more, visit the app page or the respective model pages.
branikita 
posted an update 3 days ago
We have received the majority of components for our first small commercial batch of SO-ARM101 robotic arms: Feetech STS3250 servo drives, a parallel gripper, and a depth camera.
prometechinc 
posted an update 3 days ago
Cicikuş v4-5B (POFUDUK Edition) is a next-generation compact language model engineered for high-efficiency reasoning, adaptive intelligence, and behavioral coherence. Built on the Gemma 4B IT foundation and enhanced through advanced LoRA optimization and selective layer reconstruction, this model delivers powerful performance without the overhead of massive parameter counts.
🔗 Explore the model: pthinc/pofuduk_cicikus_v4_5B
🧠 Why Cicikuş?
In a world dominated by massive LLMs, Cicikuş takes a different path:
⚡ Fast & Efficient — Designed for edge deployment and low-resource environments
🎯 High Reasoning Accuracy — Strong results across MMLU, GSM8K, HumanEval, and more
🧩 Behavior-Aware Intelligence — Powered by the Behavioral Consciousness Engine (BCE)
🔍 Low Hallucination Rate — ~3% with built-in ethical filtering
🌍 Multilingual Capable — Optimized for English and Turkish
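The "advanced LoRA optimization" mentioned above is the standard low-rank adapter pattern; a minimal numpy sketch of that pattern (illustrative only — shapes, names, and values are invented, not the model's actual training code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 8                             # hidden size, LoRA rank

W = rng.standard_normal((d, d))          # frozen base weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-init

def lora_forward(x, alpha=16):
    # base path plus low-rank update, scaled by alpha / r
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((4, d))
# with B zero-initialized, the adapted model starts exactly at the base model
assert np.allclose(lora_forward(x), x @ W.T)
```

Because only A and B (2·d·r parameters per layer) are trained while W stays frozen, adapters like this keep fine-tuning cheap enough for compact models such as this one.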
kanaria007 
posted an update 2 days ago
✅ Article highlight: *Adversarial SI* (art-60-050, v0.1)

TL;DR:
If SI-Core is meant for real deployment, it cannot assume benevolent actors. This article looks at *adversarial SI*: malicious Jumps, malicious RML calls, poisoned Genius Traces, metric gaming, compromised peers, and policy-plane artifacts as attack surfaces.

The core claim is simple: *OBS / ID / MEM / ETH / EVAL / PoLB are not just governance layers — they are also a defensive fabric.*

Read:
kanaria007/agi-structural-intelligence-protocols

Why it matters:
• treats SI-Core invariants as security invariants, not just safety abstractions
• makes abuse structurally expensive through traceability, fail-closed ETH, and scoped capabilities
• reuses *SCover / SCI / CAS* as security and forensics signals
• treats red-teaming as structured experimentation, not ad hoc chaos

What’s inside:
• an SI-native threat taxonomy: malicious Jumps, RML abuse, peer spoofing, metric gaming, policy-plane tampering
• defensive uses of *ID / OBS / MEM / ETH / EVAL / PoLB*
• malicious Genius Traces and how to vet or quarantine them
• *incident response as an SIR-native process*
• federated trust, revocation, quarantine, and graceful degradation
• red-team EvalSurfaces and abuse-resistant PoLB recipes

Key idea:
The goal is not invincibility. It is to make abuse *hard to execute, easy to detect, and easy to learn from* using the same structural language as the rest of SI-Core.
prabhatkr 
posted an update 3 days ago
🚀 Is Vector RAG Dead? Why We Built FastMemory to Beat PageIndex
If you've built a RAG pipeline for complex financial documents, you already know the painful truth: Standard vector search fails when things get complicated.

While tools like PageIndex and Mafin 2.5 provide great out-of-the-box PDF chat experiences, they hit structural bottlenecks the second you push them past basic queries.

We just published a comprehensive benchmark study comparing FastMemory against PageIndex across 5 advanced datasets. The results fundamentally change how we should think about document ingestion.

Read more: https://x.com/FastBuilderAI/status/2037404008978018493
lakj7 
posted an update 3 days ago
Neural Gas is a classical unsupervised learning algorithm for vector quantization and topology learning, introduced in the early 1990s. It maintains a set of prototype vectors that move through the data space and gradually approximate the underlying distribution by ranking samples and adapting all units accordingly.

While the original formulation is algorithmically elegant, most existing implementations remain procedural and non-differentiable, which limits their integration with modern deep learning systems.

This project introduces a **differentiable** implementation of Neural Gas in PyTorch:
https://github.com/francesco-p/ngas-pytorch

The key idea is to reinterpret the update rules in a way that is compatible with autograd, allowing the algorithm to be embedded inside end-to-end trainable pipelines.
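One way to realize this (a sketch of the general idea, not the repository's actual API) is to express Neural Gas as a rank-weighted quantization loss: the ranks come from `argsort` and carry no gradient, while gradients flow through the squared distances, which matches the gradient of the classical Neural Gas cost function:

```python
import torch

def neural_gas_loss(batch, prototypes, lam=1.0):
    """Neural Gas energy: rank-weighted quantization error."""
    d2 = torch.cdist(batch, prototypes) ** 2          # (B, K) squared distances
    ranks = d2.argsort(dim=1).argsort(dim=1).float()  # 0 = closest prototype
    h = torch.exp(-ranks / lam)                       # neighbourhood weights
    return (h * d2).mean()

torch.manual_seed(0)
data = torch.randn(512, 2)
protos = torch.randn(8, 2, requires_grad=True)
opt = torch.optim.SGD([protos], lr=0.1)

initial = neural_gas_loss(data, protos, lam=0.5).item()
for _ in range(200):
    opt.zero_grad()
    neural_gas_loss(data, protos, lam=0.5).backward()
    opt.step()
final = neural_gas_loss(data, protos, lam=0.5).item()
```

Because the loss is an ordinary differentiable scalar, the prototypes can just as well live downstream of an encoder and be trained jointly with it.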

This shift enables several directions that are difficult or impossible with standard implementations:

- joint optimization of Neural Gas with neural networks
- inclusion of topology-learning modules inside differentiable models
- gradient-based tuning of algorithm parameters
- hybrid architectures combining representation learning and vector quantization

The repository provides a clean PyTorch implementation and focuses on making the core mechanism usable as a first-class differentiable component, rather than a standalone preprocessing step.

In parallel, an interactive playground was built to visualize the behavior of Neural Gas during training and better understand how prototypes adapt to the data distribution:
https://francesco-p.github.io/res/neural-gas/playground.html

The goal is to revisit a well-known algorithm and make it compatible with current machine learning workflows, where differentiability is a central constraint rather than an afterthought.
PhysiQuanty 
posted an update about 5 hours ago
🧬 Can an LLM speak in binary?
YES ... RADIX 2 / VOCAB 4

>_ Can an LLM execute logic gates and boolean arithmetic?

We need to collaboratively create datasets:
- Neural Arithmetic and Logic Unit (NALU)
- Neural Application Binary Interface (NABI)
(deterministic task)
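Because the task is deterministic, such a dataset can be generated rather than collected; a minimal sketch for binary-addition pairs (the prompt/target format here is invented for illustration, not the repos' actual schema):

```python
import random

def binary_addition_example(rng, bits=8):
    a = rng.randrange(2 ** bits)
    b = rng.randrange(2 ** bits)
    # prompt and target are both in radix 2; the answer is deterministic,
    # so labels are exact by construction
    prompt = f"{a:0{bits}b} + {b:0{bits}b} ="
    target = f"{a + b:0{bits + 1}b}"
    return {"prompt": prompt, "target": target}

rng = random.Random(0)
dataset = [binary_addition_example(rng) for _ in range(1000)]
print(dataset[0])
```

Exhaustive coverage is even possible at small bit widths (2^16 pairs at 8 bits), which makes held-out generalization to longer operands easy to measure.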

This opens the way for code writing and execution by the LLMs themselves without an external CLI.

🙏 Please share 🙏

The more of us who want it, the more possible it will become.

PhysiQuanty/Binary-LLM-POC
PhysiQuanty/Binary-Addition-LLM-POC