Daizee committed · Commit 1eb80af · verified · 1 Parent(s): 684a3c2

Create README.md

Files changed (1)
  1. README.md +35 -0
README.md ADDED
@@ -0,0 +1,35 @@
---
tags:
- mlx
- apple-silicon
- text-generation
- gemma3
library_name: mlx-lm
pipeline_tag: text-generation
base_model: Daizee/Gemma3-Callous-Calla-4B
---

# Gemma3-Callous-Calla-4B — **MLX** builds (Apple Silicon)

This repo hosts **MLX-converted** variants of **Daizee/Gemma3-Callous-Calla-4B** for fast, local inference on Apple Silicon (M-series).
Tokenizer/config are included at the repo root. MLX weight folders live under `mlx/`.

> **Note on vocab padding:** For MLX compatibility, the tokenizer/embeddings were padded to the next multiple of 64 tokens.
> In this build: **262,208 tokens** (added 64 placeholder tokens named `<pad_ex_*>`).

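For reference, the padding arithmetic works out as below. This is a minimal sketch: the unpadded vocab size of 262,144 is inferred from the figures quoted above rather than read from this repo's config.

```python
# Sketch of the vocab-padding arithmetic quoted in the note above.
# Assumption: the unpadded vocab is 262,144 tokens; the padded figure of
# 262,208 and the <pad_ex_*> names come from the note, not from this code.
ALIGN = 64
unpadded = 262_144
padded = (unpadded // ALIGN + 1) * ALIGN  # next multiple of 64 above the original
placeholders = [f"<pad_ex_{i}>" for i in range(padded - unpadded)]

print(padded, len(placeholders))  # 262208 64
```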
## Variants

| Path        | Bits | Group Size | Notes                            |
|-------------|------|------------|----------------------------------|
| `mlx/g128/` | int4 | 128        | Smallest & fastest               |
| `mlx/g64/`  | int4 | 64         | Slightly larger, often steadier  |
| `mlx/int8/` | int8 | —          | Closest to fp16 quality (slower) |

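If you'd rather reproduce a variant yourself, mlx-lm's converter can quantize the base model directly. A minimal sketch for the `g64` layout is below; the output path mirrors the table above, and keyword names may differ slightly across mlx-lm releases, so check your installed version.

```python
# Sketch: reproduce the int4 / group-size-64 variant with mlx-lm's converter.
# Keyword names follow recent mlx-lm releases; the output path is illustrative.
from mlx_lm import convert

convert(
    "Daizee/Gemma3-Callous-Calla-4B",  # base model listed in the header
    mlx_path="mlx/g64",                # matches the folder layout in the table
    quantize=True,
    q_bits=4,
    q_group_size=64,
)
```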
## Quickstart (MLX-LM)

### Run from Hugging Face (no cloning needed)
```bash
python -m mlx_lm.generate \
  --model hf://Daizee/Gemma3-Callous-Calla-4B-mlx/mlx/g64 \
  --prompt "Summarize the Bill of Rights for 7th graders in 4 bullet points." \
  --max-tokens 180 --temp 0.3 --top-p 0.92
```
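
The same weights can also be driven from Python via mlx-lm's `load`/`generate` helpers. A minimal sketch, assuming the `mlx/g64` folder has been downloaded locally (for example with `huggingface-cli download`); sampling controls such as temperature are configured differently across mlx-lm versions, so they are omitted here.

```python
# Minimal Python sketch using mlx-lm's load/generate helpers.
# Assumes ./mlx/g64 exists locally; see the download note above.
from mlx_lm import load, generate

model, tokenizer = load("mlx/g64")
prompt = "Summarize the Bill of Rights for 7th graders in 4 bullet points."
print(generate(model, tokenizer, prompt=prompt, max_tokens=180))
```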