Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking (paper 2602.21196)
hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag! Running uvx hf-mem --model-id ... --experimental will automatically pull the required information from the Hugging Face Hub to include the KV cache estimation, when applicable. You can also set the --max-model-len, --batch-size, and --kv-cache-dtype arguments (à la vLLM) manually if preferred.

hf: a faster, friendlier Hugging Face CLI ✨ Isn't hf auth login easier to type and remember?

Custom models work with transformers 🚀 — including the generate interface 🤓 — with minimal effort. You'll also have access to all Hub features: a landing page for your creation, discussions, usage metrics, ... 🤓

Upgrade with pip install -U huggingface_hub[hf_xet]

from huggingface_hub import InferenceClient
client = InferenceClient(provider="fal-ai", bill_to="my-cool-company")
image = client.text_to_image(
    "A majestic lion in a fantasy forest",
    model="black-forest-labs/FLUX.1-schnell",
)
image.save("lion.png")
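How hf-mem computes its KV cache estimate isn't shown here, but the usual back-of-the-envelope formula can be sketched as follows. Everything in this snippet is an illustration: the function name and the Llama-style config numbers are hypothetical, not taken from hf-mem or any real model.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   max_model_len: int, batch_size: int, dtype_bytes: int = 2) -> int:
    """Rough KV cache size: 2 tensors (keys and values) per layer, each of
    shape [batch, num_kv_heads, max_model_len, head_dim], at dtype_bytes
    per element."""
    return 2 * num_layers * num_kv_heads * head_dim * max_model_len * batch_size * dtype_bytes

# Hypothetical Llama-style GQA config: 32 layers, 8 KV heads, head_dim 128,
# fp16 cache (2 bytes/element), 8192-token context, batch size 1.
size = kv_cache_bytes(32, 8, 128, max_model_len=8192, batch_size=1)
print(f"{size / 2**30:.2f} GiB")  # → 1.00 GiB
```

The three manual arguments (--max-model-len, --batch-size, --kv-cache-dtype) map directly onto inputs of such a formula; the remaining values come from the model's config on the Hub.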