Lemon-03 committed Β· Commit f53ae76 Β· verified Β· Parent: 4caea57

Update README.md

Files changed (1): README.md (+1 βˆ’107)
 
---
datasets:
- lerobot/pusht
library_name: lerobot
license: apache-2.0
model_name: diffusion
pipeline_tag: robotics
tags:
- lerobot
- robotics
- diffusion
- pusht
- imitation-learning
- benchmark
---
 
> **Summary:** This model demonstrates the capabilities of **Diffusion Policy** on the precision-demanding **Push-T** task. It was trained using the [LeRobot](https://github.com/huggingface/lerobot) framework as part of a thesis research project benchmarking imitation-learning algorithms.

- **🧩 Task**: Push-T (simulated)
- **🧠 Algorithm**: [Diffusion Policy](https://huggingface.co/papers/2303.04137) (DDPM)
- **πŸ”„ Training Steps**: 200,000 (100,000 initial + 100,000 resumed; see Model Details)
- **πŸŽ“ Author**: Graduate student, **UESTC** (University of Electronic Science and Technology of China)

---

## πŸ”¬ Benchmark Results (vs ACT)

Compared to the ACT baseline, which achieved a **0%** success rate in our controlled experiments, this Diffusion Policy model demonstrates significantly better control precision and trajectory stability.

### πŸ“Š Evaluation Metrics (50 Episodes)

| Metric | Value | Comparison to ACT Baseline | Status |
| :--- | :---: | :--- | :---: |
| **Success Rate** | **14.0%** | **Significant Improvement** (ACT: 0%) | πŸ† |
| **Avg Max Reward** | **0.81** | **+58% Higher Precision** (ACT: ~0.51) | πŸ“ˆ |
| **Avg Sum Reward** | **130.46** | **+147% More Stable** (ACT: ~52.7) | βœ… |

> **Note:** The Push-T environment requires **>95% target coverage** for success. An average max reward of `0.81` indicates the policy consistently moves the block very close to the target position, demonstrating strong manipulation capability despite the strict success threshold.
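
To make the relationship between these aggregates concrete, here is a minimal sketch of the metric computation over evaluation episodes. The reward arrays are placeholder data, and the assumption that the per-step reward tracks target coverage is illustrative, not taken from the lerobot evaluation code:

```python
import numpy as np

# Placeholder per-step rewards for 50 episodes of up to 300 steps each.
# Assumption: the per-step reward roughly tracks target coverage in [0, 1].
rng = np.random.default_rng(0)
episode_rewards = [rng.random(300) for _ in range(50)]

max_rewards = [r.max() for r in episode_rewards]   # "Avg Max Reward" input
sum_rewards = [r.sum() for r in episode_rewards]   # "Avg Sum Reward" input
successes = [m > 0.95 for m in max_rewards]        # >95% coverage threshold

print(f"Success Rate:   {100 * np.mean(successes):.1f}%")
print(f"Avg Max Reward: {np.mean(max_rewards):.2f}")
print(f"Avg Sum Reward: {np.mean(sum_rewards):.2f}")
```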

---

## βš™οΈ Model Details

| Parameter | Description |
| :--- | :--- |
| **Architecture** | ResNet18 (Vision Backbone) + U-Net (Diffusion Head) |
| **Prediction Horizon** | 16 steps |
| **Observation History** | 2 steps |
| **Action Steps** | 8 steps |

- **Training Strategy** (config sketch below):
  - Phase 1: Initial training (100,000 steps) -> Model: `Lemon-03/DP_PushT_test`
  - Phase 2: Resumed fine-tuning (+100,000 steps) -> Model: `Lemon-03/DP_PushT_test_Resume`
  - **Total**: 200,000 steps
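
As a config sketch, the table above maps onto lerobot's `DiffusionConfig` roughly as follows. The import path and exact field names have shifted across lerobot releases, so treat this as illustrative rather than the exact training config:

```python
# Illustrative mapping of the table above onto lerobot's diffusion policy config.
# Import path follows older lerobot releases; adjust for your installed version.
from lerobot.common.policies.diffusion.configuration_diffusion import DiffusionConfig

config = DiffusionConfig(
    n_obs_steps=2,               # observation history: 2 steps
    horizon=16,                  # prediction horizon: 16 steps
    n_action_steps=8,            # actions executed per prediction: 8 steps
    vision_backbone="resnet18",  # ResNet18 vision backbone
    crop_shape=(84, 84),         # random crop used during training
)
```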
 
 
 
  ---

## πŸ”§ Training Configuration (Reference)

For reproducibility, here are the key parameters used during the training session (an optimizer/scheduler sketch follows the list):

- **Batch Size**: 64
- **Optimizer**: AdamW (`lr=1e-4`)
- **Scheduler**: Cosine with warmup
- **Vision**: ResNet18 with random crop (84x84)
- **Precision**: Mixed Precision (AMP) enabled
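
A minimal PyTorch sketch of this optimization recipe. The warmup length, model, and loss are placeholders rather than values from the actual run:

```python
import math
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(10, 2).to(device)  # stand-in for the policy network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

total_steps, warmup_steps = 200_000, 500  # warmup length is an assumption

def warmup_cosine(step: int) -> float:
    """Linear warmup, then cosine decay over the remaining steps."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, warmup_cosine)
scaler = torch.amp.GradScaler(device)  # mixed precision (AMP)

for _ in range(3):  # skeleton of the training loop
    batch = torch.randn(64, 10, device=device)  # batch size 64
    with torch.amp.autocast(device_type=device):
        loss = model(batch).pow(2).mean()  # placeholder loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    optimizer.zero_grad()
    scheduler.step()
```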

#### Original Training Command (Resume Mode)

```bash
python -m lerobot.scripts.lerobot_train \
    --policy.type diffusion \
    --env.type pusht \
    --dataset.repo_id lerobot/pusht \
    --wandb.enable true \
    --eval.batch_size 8 \
    --job_name DP_PushT_Resume \
    --policy.repo_id Lemon-03/DP_PushT_test_Resume \
    --policy.pretrained_path outputs/train/2025-12-02/14-33-35_DP_PushT/checkpoints/last/pretrained_model \
    --steps 100000
```
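
Here `--policy.pretrained_path` points at the Phase 1 checkpoint, so this run adds 100,000 steps on top of the initial 100,000 to reach the 200,000-step total reported above.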
 
  ---

## πŸš€ Evaluation

Run the following command in your terminal to evaluate the local training checkpoint for 50 episodes and save the visualization videos:

```bash
python -m lerobot.scripts.lerobot_eval \
    --policy.type diffusion \
    --policy.pretrained_path outputs/train/2025-12-04/14-47-37_DP_PushT_Resume/checkpoints/last/pretrained_model \
    --eval.n_episodes 50 \
    --eval.batch_size 10 \
    --env.type pusht \
    --env.task PushT-v0
```
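
With `--eval.n_episodes 50` and `--eval.batch_size 10`, lerobot rolls the episodes out across 10 parallel environments (my reading of the eval script's batching; verify against your lerobot version).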

To evaluate the published model directly from the Hub instead, point `--policy.pretrained_path` at the repository ID:

```bash
python -m lerobot.scripts.lerobot_eval \
    --policy.type diffusion \
    --policy.pretrained_path Lemon-03/DP_PushT_test_Resume \
    --eval.n_episodes 50 \
    --eval.batch_size 10 \
    --env.type pusht \
    --env.task PushT-v0
```
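
The checkpoint can also be loaded programmatically. This sketch follows the pattern of lerobot's pretrained-policy examples; the import path and the observation keys (`observation.state`, `observation.image`) are assumptions that have changed across lerobot releases, so adjust them to your installed version:

```python
import gym_pusht  # noqa: F401 -- registers the gym_pusht/PushT-v0 env
import gymnasium as gym
import torch

# Older lerobot releases; newer ones expose lerobot.policies.diffusion instead.
from lerobot.common.policies.diffusion.modeling_diffusion import DiffusionPolicy

policy = DiffusionPolicy.from_pretrained("Lemon-03/DP_PushT_test_Resume")
policy.eval()
policy.reset()

env = gym.make("gym_pusht/PushT-v0", obs_type="pixels_agent_pos", max_episode_steps=300)
obs, _ = env.reset(seed=42)

# Batch a single observation: float CHW image in [0, 1] plus the agent state.
state = torch.from_numpy(obs["agent_pos"]).float().unsqueeze(0)
image = torch.from_numpy(obs["pixels"]).float().permute(2, 0, 1).unsqueeze(0) / 255.0

with torch.inference_mode():
    action = policy.select_action({"observation.state": state, "observation.image": image})

obs, reward, terminated, truncated, info = env.step(action.squeeze(0).numpy())
print(f"reward after one step: {reward:.3f}")
```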

---