hyx21 committed
Commit 1d5249b · verified · 1 Parent(s): dd100b9

Update README.md

Files changed (1):
  1. README.md +9 -3

README.md CHANGED
@@ -53,7 +53,7 @@ MiniCPM 4 is an extremely efficient edge-side large model that has undergone eff
 
 ## Usage
 
-### Inference with [vLLM](https://github.com/vllm-project/vllm)
+### Using Eagle Speculative Decoding with [vLLM](https://github.com/vllm-project/vllm)
 For now, you need to install the latest version of vLLM.
 ```
 pip install -U vllm \
@@ -61,7 +61,7 @@ pip install -U vllm \
     --extra-index-url https://wheels.vllm.ai/nightly
 ```
 
-Then you can inference MiniCPM4-8B with vLLM:
+Then you can use Eagle Speculative Decoding to run inference on MiniCPM4-8B with vLLM. Use `speculative_config` to set the draft model.
 ```python
 from transformers import AutoTokenizer
 from vllm import LLM, SamplingParams
@@ -77,7 +77,13 @@ llm = LLM(
     trust_remote_code=True,
     max_num_batched_tokens=32768,
     dtype="bfloat16",
-    gpu_memory_utilization=0.8,
+    gpu_memory_utilization=0.8,
+    speculative_config={
+        "method": "eagle",
+        "model": "openbmb/MiniCPM4-8B-Eagle-vLLM",
+        "num_speculative_tokens": 2,
+        "max_model_len": 32768,
+    },
 )
 sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02)
 
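
For reference, here is the example as it reads after this commit, stitched into one runnable sketch. The hunks above elide the target model id and the generation call, so the model id `openbmb/MiniCPM4-8B`, the chat-template step, and the demo prompt below are assumptions; the `LLM(...)` arguments and `speculative_config` values are copied verbatim from the diff.

```python
# Hedged sketch: reassembles the post-commit README example end to end.
# Assumed (not shown in this diff): the target model id, the chat-template
# call, and the demo prompt. Everything else is copied from the hunks above.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "openbmb/MiniCPM4-8B"  # assumption: target model id
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

llm = LLM(
    model=model_name,
    trust_remote_code=True,
    max_num_batched_tokens=32768,
    dtype="bfloat16",
    gpu_memory_utilization=0.8,
    # Draft-model settings added by this commit: vLLM loads the Eagle head
    # as the draft model and verifies its proposed tokens with the target.
    speculative_config={
        "method": "eagle",
        "model": "openbmb/MiniCPM4-8B-Eagle-vLLM",
        "num_speculative_tokens": 2,
        "max_model_len": 32768,
    },
)
sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02)

# Assumption: format a single-turn chat prompt with the model's template.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write an article about Artificial Intelligence."}],
    tokenize=False,
    add_generation_prompt=True,
)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```

With `"num_speculative_tokens": 2`, the Eagle draft head proposes two tokens per decoding step and the target model accepts or rejects them in a single verification pass, trading a small verification overhead for fewer sequential decoding steps.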