Commit b179c47
Parent(s): 905c83b

Update README.md
README.md CHANGED
@@ -134,7 +134,7 @@ inference:
 num_beams: 4
 ---

-# Multi-purpose Summarizer (Fine-tuned google/flan-t5-xl
+# Multi-purpose Summarizer (Fine-tuned 3B google/flan-t5-xl on several Summarization datasets)

 <a href="https://colab.research.google.com/gist/pszemraj/3eba944ddc9fc9a4a1bfb21e83b57620/summarization-token-batching.ipynb">
 <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
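The hunk above only renames the model-card heading; for readers landing on this commit, here is a minimal, hedged usage sketch of the summarizer the README describes. The repository id below is a placeholder (the diff does not show it), and `num_beams=4` mirrors the inference setting visible in the context lines.

```python
# Minimal sketch (not from the commit): running the fine-tuned FLAN-T5-XL
# summarizer via the transformers pipeline API.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="your-org/flan-t5-xl-summarizer",  # placeholder id; the diff does not name the repo
    device_map="auto",                       # assumes `accelerate` is installed; omit to run on CPU
)

long_text = "..."  # any long document to condense
result = summarizer(
    long_text,
    num_beams=4,      # mirrors the `num_beams: 4` inference setting above
    max_length=256,   # illustrative generation length
    truncation=True,
)
print(result[0]["summary_text"])
```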
@@ -202,60 +202,28 @@ If having computing constraints, try the base version [`pszemraj/led-base-book-s

 ## Training procedure

-- Training
--
-
+- Training was done in BF16, deepspeed stage 2 for 6 epochs with ROUGE2 monitored on the validation set.
+-
 ### Training hyperparameters

-#### Initial Three Epochs

 The following hyperparameters were used during training:
-- learning_rate:
-- train_batch_size:
-- eval_batch_size:
+- learning_rate: 3e-05
+- train_batch_size: 5
+- eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
-- gradient_accumulation_steps:
--
+- gradient_accumulation_steps: 2
+- effective_train_batch_size: 80
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
--
-
-#### In-between Epochs
-
-Unfortunately, don't have all records on-hand for middle epochs; the following should be representative:
-
-- learning_rate: 4e-05
-- train_batch_size: 2
-- eval_batch_size: 2
-- seed: 42
-- distributed_type: multi-GPU
-- gradient_accumulation_steps: 16
-- total_train_batch_size: 32
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.05
-- num_epochs: 6 (in addition to prior model)
-
-#### Final Two Epochs
-
-The following hyperparameters were used during training:
-- learning_rate: 2e-05
-- train_batch_size: 1
-- eval_batch_size: 1
-- seed: 42
-- distributed_type: multi-GPU
-- gradient_accumulation_steps: 16
-- total_train_batch_size: 16
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.03
-- num_epochs: 2 (in addition to prior model)
+- warmup_steps: 2000
+- num_epochs: 10


 ### Framework versions

-- Transformers 4.
-- Pytorch 1.
--
--
+- Transformers 4.24.0
+- Pytorch 1.9.1+cu111
+- Deepspeed 0.7.4
+- Pytorch-lightning 1.8.1
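The new "Training procedure" bullet in the hunk above says training ran in BF16 with DeepSpeed ZeRO stage 2. The actual DeepSpeed config file is not part of this commit; the dictionary below is a hedged sketch of what a stage-2/BF16 config typically looks like, with "auto" values left for the training framework to fill in.

```python
# Illustrative DeepSpeed ZeRO stage-2 + BF16 configuration (not the author's file).
# "auto" defers the value to the framework that consumes this config.
ds_config = {
    "bf16": {"enabled": True},               # matches "Training was done in BF16"
    "zero_optimization": {
        "stage": 2,                          # "deepspeed stage 2"
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
}
```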
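For context, the sketch below maps the updated hyperparameters onto Hugging Face `Seq2SeqTrainingArguments`. This is only an illustration: the framework versions list PyTorch Lightning, so the original run may not have used the HF Trainer at all. The effective batch size of 80 is consistent with 5 (per device) × 2 (gradient accumulation) × 8 GPUs, where the GPU count is an assumption not stated in the commit, and `ds_config_zero2.json` is a hypothetical file name.

```python
# Hedged mapping of the listed hyperparameters onto Seq2SeqTrainingArguments.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-xl-summarizer",  # placeholder output path
    learning_rate=3e-5,
    per_device_train_batch_size=5,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,       # 5 * 2 * 8 GPUs = effective batch size 80 (GPU count assumed)
    num_train_epochs=10,
    lr_scheduler_type="linear",
    warmup_steps=2000,
    seed=42,
    bf16=True,                           # BF16 training
    deepspeed="ds_config_zero2.json",    # hypothetical path to a ZeRO stage-2 config
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="rouge2",      # "ROUGE2 monitored on the validation set"
    greater_is_better=True,
    predict_with_generate=True,
    # Adam defaults already match betas=(0.9, 0.999) and epsilon=1e-08.
)
```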