I saw that this project on GitHub has been evaluated on Kaggle datasets and won 22 gold medals. Could there be a data contamination problem here? Could we try it on live Kaggle competitions instead?
I roughly get it now, and it's very interesting. It feels like an area nobody has explored deeply yet. I'm still curious what exactly this real-time data consists of.
I may not have fully understood: do you deploy an agent onto the GPU machine you want to monitor, and then have that agent detect problems the GPU may run into during operation?
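For context on what such a host-side agent could look like, here is a minimal sketch under my own assumptions (not the project's actual implementation): a small process that polls GPU metrics through the pynvml bindings (nvidia-ml-py) and flags simple threshold-based problems.

```python
# Minimal sketch of a host-side GPU monitoring agent, assuming the
# nvidia-ml-py package and an NVIDIA driver are present on the machine.
# This is an illustration of the idea, not the project's actual agent.
import time
import pynvml


def monitor(device_index=0, temp_limit_c=85, poll_seconds=5):
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    try:
        while True:
            temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            # Flag simple threshold-based problems; a real agent would report
            # these to a central service rather than just printing them.
            if temp >= temp_limit_c:
                print(f"WARNING: GPU {device_index} temperature {temp}C")
            if mem.used / mem.total > 0.95:
                print(f"WARNING: GPU {device_index} memory nearly full")
            print(f"util={util.gpu}% temp={temp}C mem={mem.used // 2**20}MiB")
            time.sleep(poll_seconds)
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    monitor()
```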
Even though this approach doesn't solve the problems in the current evaluation, it is still a very good move in terms of decentralization and mobilizing the community to build things together.
Since skills have become such a big boost to model capabilities, could we try distilling skills the way we used to do model distillation? I think this could be achieved through multiple rounds of iteration.
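One way to read this "iterative skill distillation" idea is the loop sketched below. Everything here is hypothetical: the teacher_write_skill and student_run_with_skill callables are placeholders I introduce for illustration, not functions from the project.

```python
# Sketch of iterative skill distillation: a teacher model rewrites a skill,
# a student model's score on the task feeds back into the next rewrite.
# teacher_write_skill and student_run_with_skill are hypothetical callables.
from typing import Callable, List


def distill_skill(
    teacher_write_skill: Callable[[str, List[str]], str],  # (task, feedback) -> skill text
    student_run_with_skill: Callable[[str, str], float],   # (task, skill) -> score in [0, 1]
    task: str,
    rounds: int = 3,
) -> str:
    skill = teacher_write_skill(task, [])
    feedback: List[str] = []
    for _ in range(rounds):
        score = student_run_with_skill(task, skill)
        # Feedback nudges the teacher to compress the skill to what the
        # student actually needed, mirroring how distillation shrinks models.
        feedback.append(f"score={score:.2f}; keep only the steps the student used")
        skill = teacher_write_skill(task, feedback)
    return skill
```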
upskill's current features are already fairly complete, but I wonder if we could have it generate a compatibility matrix across multiple skills, so that the combined effect is greater than the sum of the parts. Beyond that, Model A could generate skills while Model B looks for counterexamples, so the two evolve together.
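To make the "compatibility matrix" idea concrete, here is a small sketch under my own assumptions: a hypothetical evaluate callable runs the benchmark with a given set of skills enabled, and the matrix records whether each pair of skills gains more together than the sum of their individual gains.

```python
# Sketch of a pairwise skill compatibility matrix. The evaluate callable is
# hypothetical: it should run the benchmark with the given skills enabled
# and return a score. A positive entry means the pair has synergy.
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def compatibility_matrix(
    skills: List[str],
    evaluate: Callable[[List[str]], float],
) -> Dict[Tuple[str, str], float]:
    baseline = evaluate([])
    solo_gain = {s: evaluate([s]) - baseline for s in skills}
    matrix: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(skills, 2):
        pair_gain = evaluate([a, b]) - baseline
        # Synergy = gain of the pair minus the sum of the individual gains.
        matrix[(a, b)] = pair_gain - (solo_gain[a] + solo_gain[b])
    return matrix
```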
Actually, I think one important point is that most independent developers don't have enough case studies to back up their work, and on top of that, the cost of deploying online is fairly high.