CADD-Base-7B

CADD-Base-7B is a masked diffusion language model for code generation, augmented with Continuously Augmented Discrete Diffusion (CADD): a continuous flow-matching signal that guides the discrete denoising process.

Key idea: At each diffusion step, a continuous embedding z_continuous is added to the masked-token embeddings. It follows a linear flow-matching trajectory from noise to the clean embeddings. This is orthogonal to the discrete unmasking strategy, so any masked diffusion model (MDM) algorithm can be combined with CADD.
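The key idea can be illustrated with a small, dependency-free sketch. It is not the model's implementation: the helper names (`linear_flow_point`, `augment_masked_embeddings`), the toy dimensions, and the time convention (t=0 is pure noise, t=1 is the clean embedding) are all assumptions made for illustration.

```python
import random

def linear_flow_point(noise, clean, t):
    """Point at time t on the straight line from `noise` (t=0) to `clean` (t=1).

    The direction of time is an assumption of this sketch.
    """
    return [(1.0 - t) * n + t * c for n, c in zip(noise, clean)]

def augment_masked_embeddings(token_embs, is_masked, z_continuous):
    """Add z_continuous[i] to token_embs[i] only where position i is masked."""
    out = []
    for emb, masked, z in zip(token_embs, is_masked, z_continuous):
        out.append([e + zi for e, zi in zip(emb, z)] if masked else list(emb))
    return out

rng = random.Random(0)
dim = 4
clean = [[1.0] * dim, [2.0] * dim, [3.0] * dim]   # toy clean embeddings
noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in clean]

mask_emb = [0.5] * dim                            # toy [MASK] embedding
is_masked = [False, True, True]                   # position 0 is already decoded
token_embs = [clean[0], mask_emb, mask_emb]

t = 0.25                                          # early in the trajectory
z_t = [linear_flow_point(n, c, t) for n, c in zip(noise, clean)]
augmented = augment_masked_embeddings(token_embs, is_masked, z_t)
```

Unmasked positions pass through unchanged, while each masked position carries a continuous hint that moves toward its clean embedding as t grows. Which positions get unmasked at a given step is decided separately, which is why the two mechanisms can be combined freely.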

Usage

import torch
from transformers import AutoModel, AutoTokenizer

model_path = "apple/CADD-Base-7B"
model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = model.to("cuda").eval()

prompt = "def fibonacci(n):\n"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

output = model.diffusion_generate(
    input_ids,
    max_new_tokens=512,
    steps=512,
    temperature=0.1,
    alg="entropy",
    alg_temp=0.0,
    use_cadd=True,
    cadd_sampling_mode="weighted",
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
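The call above selects `alg="entropy"` as the unmasking strategy. As a rough illustration of what an entropy-based strategy can look like, the sketch below ranks masked positions by the entropy of their predicted token distribution and unmasks the most confident ones first. The function names and toy probabilities are hypothetical, and the model's actual generation utilities may differ in detail.

```python
import math

def entropy(probs, eps=1e-10):
    """Shannon entropy of a categorical distribution (natural log)."""
    return -sum(p * math.log(p + eps) for p in probs)

def pick_positions_to_unmask(probs_per_position, masked_positions, k):
    """Return the k masked positions with the lowest predictive entropy."""
    ranked = sorted(masked_positions, key=lambda i: entropy(probs_per_position[i]))
    return ranked[:k]

# Toy predictions over a 4-token vocabulary for three masked positions.
probs = [
    [0.25, 0.25, 0.25, 0.25],   # uniform: least confident
    [0.97, 0.01, 0.01, 0.01],   # peaked: most confident
    [0.60, 0.30, 0.05, 0.05],   # in between
]
chosen = pick_positions_to_unmask(probs, masked_positions=[0, 1, 2], k=1)
```

With `steps` equal to `max_new_tokens`, as in the usage example, a schedule of this kind would commit roughly one token per step.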

CADD Sampling Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| use_cadd | bool | True | Enable CADD continuous augmentation |
| cadd_sampling_mode | str | "argmax" | How to estimate z_0 from logits: "weighted" or "argmax" |
| alg | str | "origin" | Unmasking strategy: "entropy", "origin", "maskgit_plus", "topk_margin" |
| temperature | float | 1.0 | Sampling temperature for token prediction |
| steps | int | 512 | Number of diffusion steps |
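To make the two `cadd_sampling_mode` options concrete, here is one plausible reading, sketched in plain Python: "argmax" takes the embedding of the most likely token, while "weighted" takes the probability-weighted average of all token embeddings. The function `estimate_z0` and the toy embedding table are hypothetical and only illustrate the distinction; they are not the model's code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def estimate_z0(logits, embedding_table, mode="argmax"):
    """Estimate the clean embedding z_0 for one position from its logits."""
    if mode == "argmax":
        # Hard choice: embedding of the single most likely token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return list(embedding_table[best])
    if mode == "weighted":
        # Soft choice: expectation of the embedding under the predicted distribution.
        probs = softmax(logits)
        dim = len(embedding_table[0])
        return [sum(p * row[d] for p, row in zip(probs, embedding_table))
                for d in range(dim)]
    raise ValueError(f"unknown mode: {mode!r}")

# Toy vocabulary of two tokens with 2-dimensional embeddings.
embedding_table = [[1.0, 0.0], [0.0, 1.0]]
hard = estimate_z0([0.1, 2.0], embedding_table, mode="argmax")
soft = estimate_z0([0.0, 0.0], embedding_table, mode="weighted")
```

Under this reading, the weighted estimate keeps uncertainty in the continuous signal when the prediction is ambiguous, whereas argmax commits to a single token's embedding.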

More details: see the paper cited below.

Citation

@article{zheng2025continuously,
  title={Continuously augmented discrete diffusion model for categorical generative modeling},
  author={Zheng, Huangjie and Gong, Shansan and Zhang, Ruixiang and Chen, Tianrong and Gu, Jiatao and Zhou, Mingyuan and Jaitly, Navdeep and Zhang, Yizhe},
  journal={arXiv preprint arXiv:2510.01329},
  year={2025}
}

Acknowledgment

This Hugging Face model release builds on and improves DiffuCoder, and reuses Dream's modeling architecture and generation utilities.
