Pre-compiled wheels for llama-cpp-python on Windows. No Visual Studio, no CUDA Toolkit setup. pip install and run.
### New in this update
- **sm_120 (consumer/workstation Blackwell) support.** A single wheel now covers both sm_100 (datacenter) and sm_120 (RTX 5090 / 5080 / 5070 / 5060 / 5050, RTX PRO 6000 / 5000 / 4500 / 4000 / 2000 Blackwell).
- **llama-cpp-python 0.3.20** across all four architectures (Blackwell, Ada, Ampere, Turing). Brings Gemma 4 support via the updated llama.cpp core.
- **One wheel covers Python 3.10 through 3.13.** The 0.3.20 builds use py3-none tagging, so per-interpreter builds are no longer needed.
- **Fixed three mislabeled 0.3.16 sm_86 wheels** that linked against the wrong CUDA cuBLAS. Properly built replacements are available.
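For anyone curious what the py3-none tagging buys, here's a small stdlib-only sketch of how a wheel filename encodes compatibility (the filename below is hypothetical; real names come from the release assets):

```python
# A wheel filename ends with {python tag}-{abi tag}-{platform tag}.whl.
# (Hypothetical filename for illustration; check the release page for real ones.)
wheel = "llama_cpp_python-0.3.20-py3-none-win_amd64.whl"

name, version, py_tag, abi_tag, plat_tag = wheel[:-4].split("-")

# "py3" matches any Python 3 interpreter, and "none" declares no
# dependence on a specific CPython ABI -- which is why the same file
# can serve Python 3.10, 3.11, 3.12, and 3.13.
assert py_tag == "py3" and abi_tag == "none"
print(f"{name} {version}: one wheel for all of Python 3 on {plat_tag}")
```

A cp310/cp311-style tag, by contrast, pins the wheel to one interpreter version, which is what forced the old per-interpreter builds.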
I got tired of fighting with Visual Studio and CUDA Toolkit every time I wanted to use llama-cpp-python on Windows, so I've been building pre-compiled wheels for the community.
## What's Available:

- ✅ RTX 50/40/30/20 Series support (Blackwell, Ada, Ampere, Turing)
- ✅ CUDA 11.8, 12.1, 13.0 (Blackwell is CUDA 13 only)
- ✅ Python 3.10-3.13
- ✅ Just `pip install` and run - no build tools needed
## Why this matters:

Windows users face a painful setup process with llama-cpp-python. These wheels eliminate:

- Visual Studio installation
- CUDA Toolkit setup
- Compilation errors
- Hours of troubleshooting
## Why I think local, open-source models will eventually win
The most useful AI applications are moving toward multi-turn agentic behavior: systems that take hundreds or even thousands of iterative steps to complete a task, e.g. Claude Code, or computer-control agents that click, type, and test repeatedly.
In these cases, the power of a model lies not in how smart it is per token, but in how quickly it can interact with its environment and tools across many steps. In that regime, model quality becomes secondary to latency.
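A back-of-envelope sketch makes the point; the step count and per-call times below are illustrative assumptions, not benchmarks:

```python
# An agent pays per-call overhead on every micro-action it takes.
steps = 1000             # assumed number of steps for one agentic task
remote_rtt_s = 0.20      # assumed network round trip per remote API call
local_overhead_s = 0.005 # assumed local inference dispatch overhead

remote_total_s = steps * remote_rtt_s     # time spent purely on round trips
local_total_s = steps * local_overhead_s

print(f"remote: {remote_total_s:.0f}s of round trips vs local: {local_total_s:.0f}s")
```

Even before counting token-generation time, the remote agent loses minutes to round trips alone, and the gap widens linearly with the step count.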
An open-source model that can call tools quickly, check that the right thing was clicked, or verify that a code change actually passes tests can easily outperform a slightly “smarter” closed model that has to make remote API calls for every move.
Eventually, the balance tips: it becomes impractical for an agent to rely on remote inference for every micro-action. Just as no one would tolerate a keyboard that required a network request per keystroke, users won’t accept agent workflows bottlenecked by latency. All devices will ship with local, open-source models that are “good enough” and the expectation will shift toward everything running locally. It’ll happen sooner than most people think.