Post
386
🐦🔥 I've just published Sentence Transformers v5.2.0! It introduces multi-processing for CrossEncoder (rerankers), multilingual NanoBEIR evaluators, similarity score outputs in mine_hard_negatives, Transformers v5 support and more. Details:
- CrossEncoder multi-processing: Similar to SentenceTransformer and SparseEncoder, you can now use multi-processing with CrossEncoder rerankers. Useful for multi-GPU and CPU settings, and simple to configure: just pass device=["cuda:0", "cuda:1"] or device=["cpu"]*4 on the model.predict or model.rank calls (see the sketch after this list).
- Multilingual NanoBEIR Support: You can now use community translations of the tiny NanoBEIR retrieval benchmark instead of only the English one, by passing a dataset_id, e.g. dataset_id="lightonai/NanoBEIR-de" for the German benchmark.
- Similarity scores in Hard Negatives Mining: When mining for hard negatives to create a strong training dataset, you can now pass output_scores=True to get similarity scores returned. This can be useful for some distillation losses!
- Transformers v5: This release works with both Transformers v4 and the upcoming v5. In the future, Sentence Transformers will only work with Transformers v5, but not yet!
- Python 3.9 deprecation: Now that Python 3.9 has lost security support, Sentence Transformers no longer supports it.
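For reference, here's a minimal sketch of the CrossEncoder multi-processing usage, based on the device-list example above; the model name and example data are just for illustration.

from sentence_transformers import CrossEncoder

# Any reranker works here; this model is just an example.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do I bake sourdough bread?"
passages = [
    "Sourdough is baked after a long fermentation with a wild-yeast starter.",
    "The Eiffel Tower is located in Paris.",
    "A hot Dutch oven gives sourdough a crisp crust.",
]

# Spread scoring across two GPUs; use device=["cpu"] * 4 for four CPU workers instead.
scores = model.predict([(query, passage) for passage in passages], device=["cuda:0", "cuda:1"])
print(scores)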
Check out the full changelog for more details: https://github.com/huggingface/sentence-transformers/releases/tag/v5.2.0
I'm quite excited about what's coming. There's a huge draft PR with a notable refactor in the works that should bring some exciting support. Specifically, better multimodality, rerankers, and perhaps some late interaction in the future!
sergiopaniego
posted an update
1 day ago
Post
1520
ICYMI, you can fine-tune open LLMs using Claude Code
just tell it:
“Fine-tune Qwen3-0.6B on open-r1/codeforces-cots”
and Claude submits a real training job on HF GPUs using TRL.
it handles everything:
> dataset validation
> GPU selection
> training + Trackio monitoring
> job submission + cost estimation
when it’s done, your model is on the Hub, ready to use
read more about the process: https://huggingface.co/blog/hf-skills-training
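for a rough idea of what gets submitted under the hood, here's a hedged sketch of a TRL SFT job for that prompt; the actual generated script, dataset config, and hyperparameters will differ.

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Subset/split names may need adjusting for open-r1/codeforces-cots.
dataset = load_dataset("open-r1/codeforces-cots", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B",
    train_dataset=dataset,
    args=SFTConfig(output_dir="qwen3-0.6b-codeforces-cots", push_to_hub=True),
)
trainer.train()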
sergiopaniego
posted an update
1 day ago
Post
671
Z-Image Turbo + LoRA ⚡
ovi054/Z-Image-LORA
Z-Image Turbo is the No. 1 trending Text-to-Image model right now. You can add a custom LoRA and generate images with this Space.
👉 Try it now: ovi054/Z-Image-LORA
Post
1569
installama.sh at the TigerBeetle 1000x World Tour!
Last week I had the chance to give a short talk during the TigerBeetle 1000x World Tour (organized by @jedisct1 👏), a fantastic event celebrating high-performance engineering and the people who love pushing systems to their limits!
In the talk, I focused on the CPU and Linux side of things, with a simple goal in mind: making the installation of llama.cpp instant, automatic, and optimal, no matter your OS or hardware setup.
For the curious, here are the links worth checking out:
Event page: https://tigerbeetle.com/event/1000x
GitHub repo: https://github.com/angt/installama.sh
Talk: https://youtu.be/pg5NOeJZf0o?si=9Dkcfi2TqjnT_30e
More improvements are coming soon. Stay tuned!
melvindave
posted an update
2 days ago
Post
2473
Currently having a blast learning the transformers library.
I noticed that model cards usually have Transformers code as usage examples.
So I tried to figure out how to load a model using just the Transformers library, without Ollama, LM Studio, or llama.cpp.
Learned how to install the dependencies required to make it work, like PyTorch and CUDA. I also used Conda to manage the Python environment.
Once I got the model loaded and sample inference working, I made an API to serve it.
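Roughly, loading the model with plain Transformers looks something like this (a minimal sketch, not my exact code; the image URL and generation settings are placeholders):

from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-8B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, dtype="auto", device_map="auto")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/cat.jpg"},  # any reachable image
        {"type": "text", "text": "Describe this image."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])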
I know it's very basic stuff for the machine learning experts here on HF, but I'm completely new to this, so I'm happy to get it working!
Model used: Qwen/Qwen3-VL-8B-Instruct
GPU: NVIDIA GeForce RTX 3090
Here's the result of my experimentation
Post
570
Introducing 🇨🇭WindowSeat🇨🇭, our new method for removing reflections from photos taken through windows on planes, in malls, in offices, and in other glass-filled environments.
Finetuning a foundation diffusion transformer for reflection removal quickly runs up against the limits of what existing datasets and techniques can offer. To fill that gap, we generate physically accurate examples in Blender that simulate realistic glass and reflection effects. This data enables strong performance on both established benchmarks and previously unseen images.
To make this practical, the open-source Apache-2 model builds on Qwen-Image-Edit-2509, a 20B image-editing diffusion transformer that runs on a single GPU and can be fine-tuned in about a day. WindowSeat keeps its use of the underlying DiT cleanly separated from the data and training recipe, allowing future advances in base models to be incorporated with minimal friction.
Try it out with your own photos in this interactive demo:
🤗 toshas/windowseat-reflection-removal
Other resources:
🌎 Website: huawei-bayerlab/windowseat-reflection-removal-web
🎓 Paper: Reflection Removal through Efficient Adaptation of Diffusion Transformers (2512.05000)
🤗 Model: huawei-bayerlab/windowseat-reflection-removal-v1-0
🐙 Code: https://github.com/huawei-bayerlab/windowseat-reflection-removal
Team: Daniyar Zakarin ( @daniyarzt )*, Thiemo Wandel ( @thiemo-wandel )*, Anton Obukhov ( @toshas ), Dengxin Dai.
*Work done during internships at HUAWEI Bayer Lab
Post
603
Muon vs. MuonClip vs. Muon+AdamW
Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine‑tuning? We ran head‑to‑head tests on Qwen3‑4B (10k+ high‑quality instruction rows) to find out.
Short story: Pure Muon converged fastest at the start, but its gradient-norm spikes made training unstable. MuonClip (Kimi K2's clipping) stabilizes long pretraining runs, yet in our small-scale fine-tune it underperformed, with lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance and even beat vanilla AdamW.
Takeaway: for small-scale fine-tuning, hybrid = practical and reliable.
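For reference, a hedged sketch of the hybrid setup: 2D weight matrices go to Muon, everything else (biases, norms, embeddings) to AdamW. The muon import below is an assumption; substitute whichever Muon implementation you use and adjust its constructor arguments.

import torch
from muon import Muon  # hypothetical import; depends on your Muon implementation

def build_hybrid_optimizers(model, muon_lr=0.02, adamw_lr=1e-5):
    muon_params, adamw_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # 2D weight matrices (attention/MLP projections) -> Muon;
        # 1D params (biases, norms) plus embeddings/lm_head -> AdamW.
        if param.ndim >= 2 and "embed" not in name and "lm_head" not in name:
            muon_params.append(param)
        else:
            adamw_params.append(param)
    return [
        Muon(muon_params, lr=muon_lr),
        torch.optim.AdamW(adamw_params, lr=adamw_lr, weight_decay=0.01),
    ]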
Next Step: scale to larger models/datasets to see if Muon’s spikes become catastrophic or if clipping wins out.
Full Blog Link: https://huggingface.co/blog/KingNish/optimizer-part1
leonardlin
posted an update
1 day ago
Post
1637
We just released our latest Shisa V2.1 Japanese multi-lingual models: https://huggingface.co/collections/shisa-ai/shisa-v21
Besides updates to our 14B and 70B, we have a new LFM2-based 1.2B, a Llama 3.2-based 3B, and a Qwen 3-based 8B, all with class-leading Japanese language capabilities.
Per usual, lots of details in the Model Cards for those interested.