TRL v0.29.0 introduces trl-training: an agent-native training skill.
This makes the TRL CLI a structured, agent-readable capability, allowing AI agents to reliably execute training workflows such as:
- Supervised Fine-Tuning (SFT)
- Direct Preference Optimization (DPO)
- Group Relative Policy Optimization (GRPO)
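As a rough sketch, each of these workflows maps onto a single TRL CLI invocation. The model and dataset names below are placeholders, and flag names can vary between TRL versions, so check `trl sft --help` against your installed release:

```shell
# Sketch only: model/dataset names are placeholders, not recommendations.

# Supervised Fine-Tuning (SFT)
trl sft \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name trl-lib/Capybara \
  --output_dir ./sft-out

# Direct Preference Optimization (DPO)
trl dpo \
  --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
  --dataset_name trl-lib/ultrafeedback_binarized \
  --output_dir ./dpo-out
```

Because each workflow is a single, flag-driven command, an agent can template these invocations rather than writing bespoke training scripts.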
We're excited to see what the community builds on top of this.
If you're working on AI agents, alignment research, or scalable RL training infrastructure: give TRL v0.29.0 a try!
Introducing NoesisLab/Kai-3B-Instruct. What happens when you force a 3B model to reason entirely in its latent space? Meet Kai-3B, our latest industrial-grade reasoning model, fine-tuned using the Adaptive Dual Search (ADS) algorithm.
- GSM8K (0-shot, direct answer): 39.27% (Llama-2-7B: ~14.6%)
- HumanEval (pass@1): 39.02% (overtakes Gemma-2-2B's 30%)
- MMLU (5-shot): 53.62% (crushing the 50% barrier)
- ARC-Challenge: 51.88%
- PIQA: 77.53%
- HellaSwag: 69.53%

Kai-3B proves that reasoning density doesn't strictly require parameter bloat or verbose generation. It acts as a perfect, cold-blooded agent action engine: ideal for JSON routing, SWE-bench patch generation, and anywhere you need absolute structured certainty without token waste.
This repo will contain the next experimental stage, built entirely on that research and the structural boundaries it established. It'll be a little rigid while I get Claude set up.
To train these layered topological response patchworks directly, you must install and use the geovocab2, geofractal, and wide_compiler repos.
This is because wide_compiler's wide_linear offers high-speed ensemble processing, geovocab2 provides a factory structure with multiple formulas (including highly efficient designs meant for kernel compilation), and geofractal contains a series of reusable utilities, including some of the more complex losses and the hard-to-tune gate structures surrounding them.
Using or training the pretrained (or untrained) geolip patchwork will be as simple as loading the model in PyTorch, with no external dependencies beyond the geolip package, NumPy, and PyTorch (depending on the task). It will come packaged with recommended losses, but I encourage experimentation, since I simply cannot cover every use case.
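As a minimal sketch of that "just load it in PyTorch" workflow (the checkpoint name and module layout here are stand-ins, not the actual geolip API), saving and reloading a patchwork-style model needs nothing beyond plain PyTorch:

```python
# Hypothetical sketch: "geolip_patchwork.pt" and the module layout
# are assumptions for illustration, not the package's real API.
import torch
import torch.nn as nn

# Stand-in for the patchwork autoencoder described above.
model = nn.Sequential(nn.Linear(16, 4), nn.ReLU(), nn.Linear(4, 16))

# Save and reload with plain torch.save/torch.load -- no extra deps.
torch.save(model.state_dict(), "geolip_patchwork.pt")
model.load_state_dict(torch.load("geolip_patchwork.pt"))

# Run a reconstruction pass; shape in == shape out for an autoencoder.
recon = model(torch.randn(2, 16))
print(recon.shape)
```

Since the interface mirrors existing autoencoder systems, swapping in your own loss on top of `recon` is the intended extension point.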
More details to come as development progresses. The system is coming together, and the usable autoencoder should be ready within a couple of weeks. The whole thing is built for convenience and reusability, so it will be structured like existing autoencoder systems, with a few tweaks here and there for important elements; the interface will be familiar to anyone who has used them.
What happens when you make an LLM drive a car where physics are real and actions can't be undone?
I ported CARLA, the autonomous driving simulator, to OpenEnv and added training support via TRL + Hugging Face Spaces.
The model interacts with the simulator through tool calls (observe, brake, change lane) and learns from a reward signal.
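As a rough illustration of how a tool call plus reward signal fits together (every name here is hypothetical, not the project's actual API), a rubric for the pedestrian-avoidance scenario might look like:

```python
# Hypothetical sketch: Observation fields, action strings, and reward
# values are illustrative, not the carla-env/OpenEnv implementation.
from dataclasses import dataclass

@dataclass
class Observation:
    distance_to_pedestrian_m: float  # meters ahead; inf if none in view
    speed_kmh: float

def reward(obs: Observation, action: str) -> float:
    """Toy rubric: reward braking when a pedestrian is close."""
    danger = obs.distance_to_pedestrian_m < 10.0
    if danger and action == "brake":
        return 1.0   # reacted correctly to the emergency
    if danger and action != "brake":
        return -1.0  # failed to react
    return 0.1 if action == "observe" else 0.0  # mild shaping otherwise

print(reward(Observation(5.0, 40.0), "brake"))
```

A trainer like GRPO then only needs this scalar per rollout; the simulator supplies the observations and executes the chosen tool call.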
In 50 training steps, Qwen 0.6B learns to swerve and brake to avoid pedestrians in emergency situations.
The project supports text and vision (VLMs can see through a camera sensor), open-world driving with traffic, and multiple driving scenarios.
This builds on the carla-env project by sinatras, which originally placed LLMs inside CARLA for evaluation. We extended it with vision, new scenarios, rubric-based rewards, and made it trainable end-to-end.
Quick update on the SecureCode dataset family. We've restructured things and fixed several issues:
**What changed:**
- The datasets are now properly split into three repos: [unified](scthornton/securecode) (2,185), [web](scthornton/securecode-web) (1,378), [AI/ML](scthornton/securecode-aiml) (750)
- All repos now use Parquet format -- load_dataset() just works, no deprecated loading scripts
- SecureCode Web now includes 219 framework-specific examples (Express, Django, Spring Boot, Flask, Rails, Laravel, ASP.NET Core, FastAPI, NestJS)
- Data cards have been corrected and split sizes fixed
**Why it matters:**
With AI-generated code accounting for 60%+ of some codebases (Checkmarx 2025), security training data is more important than ever. Every example in SecureCode is grounded in a real CVE with 4-turn conversations that mirror actual developer-AI workflows.
If you're working on code generation models, I'd love to hear how you're approaching the security angle. Are there vulnerability categories or frameworks you'd like to see covered?
Just open sourced LavaSR v2: a model that can enhance 5,000 seconds of audio in 1 second while delivering higher quality than giant, slow 6 GB diffusion models!
It works at any sampling rate from 8 to 48 kHz and is nearly 5,000x faster than the competition while scoring higher on objective benchmarks.
LavaSR v2 is perfect for:
- Enhancing TTS models.
- Fixing old audio datasets.
- Restoring low-quality recordings.
You can check out the examples and run it locally or online: