Update README.md
README.md
CHANGED
@@ -53,7 +53,7 @@ MiniCPM 4 is an extremely efficient edge-side large model that has undergone eff
 
 ## Usage
 
-###
+### Using Eagle Speculative Decoding with [vLLM](https://github.com/vllm-project/vllm)
 For now, you need to install the latest version of vLLM.
 ```
 pip install -U vllm \
@@ -61,7 +61,7 @@ pip install -U vllm \
     --extra-index-url https://wheels.vllm.ai/nightly
 ```
 
-Then you can inference MiniCPM4-8B with vLLM
+Then you can use Eagle Speculative Decoding to run inference on MiniCPM4-8B with vLLM. Use `speculative_config` to set the draft model.
 ```python
 from transformers import AutoTokenizer
 from vllm import LLM, SamplingParams
@@ -77,7 +77,13 @@ llm = LLM(
     trust_remote_code=True,
     max_num_batched_tokens=32768,
    dtype="bfloat16",
-    gpu_memory_utilization=0.8,
+    gpu_memory_utilization=0.8,
+    speculative_config={
+        "method": "eagle",
+        "model": "openbmb/MiniCPM4-8B-Eagle-vLLM",
+        "num_speculative_tokens": 2,
+        "max_model_len": 32768,
+    },
 )
 sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02)
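The install step above pulls a nightly wheel, and the dict-valued `speculative_config` in the last hunk needs a recent build. A quick sanity check before running the example (an editorial sketch, not part of the README):

```python
# Confirm the nightly vLLM wheel is importable before running the example.
# Any recent nightly should do; the README only asks for "the latest version".
import vllm

print(vllm.__version__)
```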
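The hunks above only show fragments of the final script. Assembled end to end, the new usage section would read roughly like the sketch below. The base model path `openbmb/MiniCPM4-8B`, the example prompt, and the chat-template/`generate` calls are assumptions filled in for illustration; the `LLM(...)` arguments, `speculative_config`, and `SamplingParams` values are taken from the diff.

```python
# A sketch of the assembled usage example, not verbatim from the README:
# the base model path, the prompt, and the generate call are assumptions;
# the LLM(...) arguments and SamplingParams come from the diff above.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "openbmb/MiniCPM4-8B"  # assumed base-model repo for the Eagle draft

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

llm = LLM(
    model=model_name,
    trust_remote_code=True,
    max_num_batched_tokens=32768,
    dtype="bfloat16",
    gpu_memory_utilization=0.8,
    # Draft-model configuration from the diff: Eagle proposes 2 tokens per step.
    speculative_config={
        "method": "eagle",
        "model": "openbmb/MiniCPM4-8B-Eagle-vLLM",
        "num_speculative_tokens": 2,
        "max_model_len": 32768,
    },
)
sampling_params = SamplingParams(
    top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02
)

# Format a single-turn chat prompt with the model's template, then decode.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a short introduction to edge-side LLMs."}],
    tokenize=False,
    add_generation_prompt=True,
)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```

Here `num_speculative_tokens: 2` means the Eagle draft model proposes two tokens per step for the base model to verify in parallel, a conservative setting that keeps verification overhead low.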