# T5Lae-Large-WeightedLoss
This model was fine-tuned on the HuggingFaceFW/fineweb sample-350BT dataset. It achieves the following results on the evaluation set (each perplexity is the exponential of the corresponding loss; see the check after the list):
- Perplexity: 56.6106
- Loss: 4.0362
- Accuracy: 0.0260
- Lookahead Perplexity: 672.4668
- Lookahead Loss: 6.5110
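Since perplexity is just the exponential of the cross-entropy loss, the headline metrics can be verified against each other with a one-liner in Python:

```python
import math

# Perplexity = exp(cross-entropy loss) for both metric pairs above.
print(math.exp(4.0362))  # ~56.61  (reported Perplexity: 56.6106)
print(math.exp(6.5110))  # ~672.5  (reported Lookahead Perplexity: 672.4668)
```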
## Model description
More information needed
## Intended uses & limitations
More information needed
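In the absence of documented usage, here is a minimal loading sketch. It assumes the checkpoint resolves through the standard `transformers` auto classes; since T5Lae appears to be a custom T5 variant, both `trust_remote_code=True` and the seq2seq interface are assumptions, not confirmed details of this model.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumption: the custom T5Lae architecture is resolvable via
# trust_remote_code; the exact auto class may differ.
model_id = "hrezaei/T5Lae-Large-WeightedLoss"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```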
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- training_steps: 524288
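As a rough sketch, these settings correspond to the following `transformers.TrainingArguments`; everything not listed above (data and model wiring, the output path) is a placeholder, and all other arguments keep their library defaults:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
args = TrainingArguments(
    output_dir="T5Lae-Large-WeightedLoss",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=524288,
)
```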
### Training results
| Training Loss | Epoch | Step | Accuracy | Lookahead Loss | Lookahead Perplexity | Validation Loss | Perplexity |
|---|---|---|---|---|---|---|---|
| 6.4709 | 0.0095 | 5000 | 0.0300 | 7.3655 | 1580.4451 | 6.4112 | 608.6332 |
| 6.1634 | 0.0191 | 10000 | 0.0288 | 7.2945 | 1472.1925 | 6.1387 | 463.4569 |
| 6.0273 | 0.0286 | 15000 | 0.0312 | 7.2446 | 1400.5552 | 5.9474 | 382.7631 |
| 5.89 | 0.0381 | 20000 | 0.0314 | 7.2101 | 1353.0672 | 5.8083 | 333.0537 |
| 5.7665 | 0.0477 | 25000 | 0.0330 | 7.1880 | 1323.4656 | 5.7197 | 304.8037 |
| 5.6891 | 0.0572 | 30000 | 0.0311 | 7.2027 | 1343.0104 | 5.6905 | 296.0397 |
| 5.6926 | 0.0668 | 35000 | 0.0323 | 7.1358 | 1256.1099 | 5.5811 | 265.3738 |
| 5.6076 | 0.0763 | 40000 | 0.0324 | 7.1160 | 1231.5257 | 5.5175 | 249.0057 |
| 5.5375 | 0.0858 | 45000 | 0.0315 | 7.1181 | 1234.1559 | 5.4709 | 237.6680 |
| 5.5712 | 0.0954 | 50000 | 0.0323 | 7.0829 | 1191.4170 | 5.4163 | 225.0337 |
| 5.5664 | 0.1049 | 55000 | 0.0305 | 7.0561 | 1159.9263 | 5.3879 | 218.7341 |
| 5.4285 | 0.1144 | 60000 | 0.0319 | 7.0428 | 1144.5391 | 5.3540 | 211.4576 |
| 5.3761 | 0.1240 | 65000 | 0.0324 | 7.0227 | 1121.8277 | 5.2855 | 197.4554 |
| 5.3878 | 0.1335 | 70000 | 0.0317 | 7.0129 | 1110.8984 | 5.2550 | 191.5256 |
| 5.3311 | 0.1431 | 75000 | 0.0311 | 6.9915 | 1087.3640 | 5.2316 | 187.0996 |
| 5.192 | 0.1526 | 80000 | 0.0304 | 6.9784 | 1073.2082 | 5.1762 | 177.0060 |
| 5.3575 | 0.1621 | 85000 | 0.0307 | 6.9608 | 1054.4677 | 5.1340 | 169.6862 |
| 5.2409 | 0.1717 | 90000 | 0.0296 | 6.9574 | 1050.9358 | 5.1046 | 164.7766 |
| 5.2178 | 0.1812 | 95000 | 0.0299 | 6.9414 | 1034.2007 | 5.0751 | 159.9887 |
| 5.1463 | 0.1907 | 100000 | 0.0297 | 6.9335 | 1026.1214 | 5.0439 | 155.0754 |
| 5.1763 | 0.2003 | 105000 | 0.0295 | 6.9240 | 1016.3639 | 5.0195 | 151.3366 |
| 5.1507 | 0.2098 | 110000 | 0.0289 | 6.9072 | 999.3971 | 4.9863 | 146.3885 |
| 5.0905 | 0.2193 | 115000 | 0.0291 | 6.9061 | 998.3528 | 4.9651 | 143.3287 |
| 5.133 | 0.2289 | 120000 | 0.0289 | 6.9054 | 997.6628 | 4.9342 | 138.9648 |
| 5.0768 | 0.2384 | 125000 | 0.0292 | 6.8943 | 986.6068 | 4.9211 | 137.1481 |
| 5.0645 | 0.2480 | 130000 | 0.0288 | 6.8848 | 977.3311 | 4.8912 | 133.1113 |
| 5.0714 | 0.2575 | 135000 | 0.0291 | 6.8772 | 969.9084 | 4.8770 | 131.2382 |
| 5.0038 | 0.2670 | 140000 | 0.0283 | 6.8762 | 968.9717 | 4.8573 | 128.6715 |
| 4.9626 | 0.2766 | 145000 | 0.0283 | 6.8654 | 958.5580 | 4.8280 | 124.9669 |
| 5.0443 | 0.2861 | 150000 | 0.0285 | 6.8569 | 950.3789 | 4.8000 | 121.5045 |
| 4.9348 | 0.2956 | 155000 | 0.0284 | 6.8488 | 942.7645 | 4.7827 | 119.4311 |
| 4.8871 | 0.3052 | 160000 | 0.0279 | 6.8374 | 932.0763 | 4.7582 | 116.5389 |
| 4.9312 | 0.3147 | 165000 | 0.0292 | 6.8212 | 917.0747 | 4.7352 | 113.8875 |
| 4.9013 | 0.3242 | 170000 | 0.0285 | 6.8190 | 915.0439 | 4.7225 | 112.4514 |
| 4.8217 | 0.3338 | 175000 | 0.0279 | 6.8120 | 908.7160 | 4.7054 | 110.5448 |
| 4.9256 | 0.3433 | 180000 | 0.0290 | 6.8176 | 913.8313 | 4.6843 | 108.2374 |
| 4.8326 | 0.3529 | 185000 | 0.0284 | 6.7713 | 872.4788 | 4.6523 | 104.8223 |
| 4.7726 | 0.3624 | 190000 | 0.0287 | 6.7743 | 875.0953 | 4.6364 | 103.1737 |
| 4.8655 | 0.3719 | 195000 | 0.0281 | 6.7721 | 873.1190 | 4.6174 | 101.2283 |
| 4.7666 | 0.3815 | 200000 | 0.0271 | 6.7623 | 864.6278 | 4.5986 | 99.3420 |
| 4.7235 | 0.3910 | 205000 | 0.0280 | 6.7533 | 856.8730 | 4.5802 | 97.5293 |
| 4.8219 | 0.4005 | 210000 | 0.0278 | 6.7477 | 852.1302 | 4.5626 | 95.8318 |
| 4.7113 | 0.4101 | 215000 | 0.0278 | 6.7336 | 840.1735 | 4.5455 | 94.2049 |
| 4.7011 | 0.4196 | 220000 | 0.0276 | 6.7365 | 842.5797 | 4.5303 | 92.7869 |
| 4.7134 | 0.4292 | 225000 | 0.0275 | 6.7356 | 841.8266 | 4.5222 | 92.0394 |
| 4.6916 | 0.4387 | 230000 | 0.0277 | 6.7191 | 828.0459 | 4.4974 | 89.7795 |
| 4.6694 | 0.4482 | 235000 | 0.0267 | 6.7319 | 838.7439 | 4.5096 | 90.8881 |
| 4.6188 | 0.4578 | 240000 | 0.0274 | 6.7032 | 814.9789 | 4.4625 | 86.7048 |
| 4.6234 | 0.4673 | 245000 | 0.0272 | 6.7044 | 815.9833 | 4.4569 | 86.2231 |
| 4.5967 | 0.4768 | 250000 | 0.0277 | 6.6989 | 811.5360 | 4.4373 | 84.5446 |
| 4.5814 | 0.4864 | 255000 | 0.0271 | 6.6857 | 800.8966 | 4.4201 | 83.1059 |
| 4.5838 | 0.4959 | 260000 | 0.0276 | 6.6818 | 797.7229 | 4.4135 | 82.5570 |
| 4.575 | 0.5054 | 265000 | 0.0268 | 6.6774 | 794.2122 | 4.3935 | 80.9230 |
| 4.5786 | 0.5150 | 270000 | 0.0272 | 6.6718 | 789.7848 | 4.3830 | 80.0755 |
| 4.5582 | 0.5245 | 275000 | 0.0270 | 6.6660 | 785.2693 | 4.3724 | 79.2297 |
| 4.5571 | 0.5341 | 280000 | 0.0275 | 6.6558 | 777.2855 | 4.3549 | 77.8625 |
| 4.5238 | 0.5436 | 285000 | 0.0276 | 6.6491 | 772.0528 | 4.3424 | 76.8926 |
| 4.4993 | 0.5531 | 290000 | 0.0272 | 6.6507 | 773.2902 | 4.3351 | 76.3315 |
| 4.6375 | 0.5627 | 295000 | 0.0273 | 6.6448 | 768.7754 | 4.3190 | 75.1133 |
| 4.4756 | 0.5722 | 300000 | 0.0273 | 6.6445 | 768.5136 | 4.3120 | 74.5917 |
| 4.4725 | 0.5817 | 305000 | 0.0265 | 6.6336 | 760.1904 | 4.2939 | 73.2534 |
| 4.4974 | 0.5913 | 310000 | 0.0276 | 6.6359 | 761.9827 | 4.3011 | 73.7798 |
| 4.4651 | 0.6008 | 315000 | 0.0263 | 6.6248 | 753.5762 | 4.2764 | 71.9778 |
| 4.4339 | 0.6104 | 320000 | 0.0270 | 6.6184 | 748.7501 | 4.2654 | 71.1964 |
| 4.456 | 0.6199 | 325000 | 0.0267 | 6.6181 | 748.5140 | 4.2558 | 70.5135 |
| 4.4337 | 0.6294 | 330000 | 0.0263 | 6.6098 | 742.3199 | 4.2479 | 69.9577 |
| 4.4737 | 0.6390 | 335000 | 0.0264 | 6.6077 | 740.8061 | 4.2347 | 69.0383 |
| 4.4369 | 0.6485 | 340000 | 0.0271 | 6.6113 | 743.4451 | 4.2307 | 68.7642 |
| 4.4086 | 0.6580 | 345000 | 0.0265 | 6.5994 | 734.6393 | 4.2213 | 68.1242 |
| 4.3934 | 0.6676 | 350000 | 0.0269 | 6.5908 | 728.3871 | 4.2089 | 67.2815 |
| 4.361 | 0.6771 | 355000 | 0.0264 | 6.5858 | 724.7172 | 4.2004 | 66.7156 |
| 4.3219 | 0.6866 | 360000 | 0.0261 | 6.5836 | 723.1735 | 4.1895 | 65.9867 |
| 4.3989 | 0.6962 | 365000 | 0.0264 | 6.5908 | 728.3600 | 4.1812 | 65.4464 |
| 4.3539 | 0.7057 | 370000 | 0.0262 | 6.5788 | 719.6894 | 4.1751 | 65.0468 |
| 4.3217 | 0.7153 | 375000 | 0.0263 | 6.5773 | 718.6223 | 4.1679 | 64.5772 |
| 4.3246 | 0.7248 | 380000 | 0.0263 | 6.5662 | 710.6689 | 4.1580 | 63.9416 |
| 4.3571 | 0.7343 | 385000 | 0.0265 | 6.5718 | 714.6888 | 4.1507 | 63.4802 |
| 4.3611 | 0.7439 | 390000 | 0.0262 | 6.5647 | 709.5887 | 4.1429 | 62.9843 |
| 4.297 | 0.7534 | 395000 | 0.0259 | 6.5603 | 706.5064 | 4.1342 | 62.4369 |
| 4.2687 | 0.7629 | 400000 | 0.0259 | 6.5615 | 707.3522 | 4.1294 | 62.1408 |
| 4.3698 | 0.7725 | 405000 | 0.0264 | 6.5555 | 703.1012 | 4.1234 | 61.7694 |
| 4.2747 | 0.7820 | 410000 | 0.0263 | 6.5541 | 702.1163 | 4.1165 | 61.3433 |
| 4.2801 | 0.7915 | 415000 | 0.0261 | 6.5474 | 697.4035 | 4.1110 | 61.0067 |
| 4.2964 | 0.8011 | 420000 | 0.0262 | 6.5474 | 697.4221 | 4.1041 | 60.5896 |
| 4.2828 | 0.8106 | 425000 | 0.0262 | 6.5472 | 697.3089 | 4.0988 | 60.2670 |
| 4.2847 | 0.8202 | 430000 | 0.0260 | 6.5395 | 691.9503 | 4.0927 | 59.9032 |
| 4.2484 | 0.8297 | 435000 | 0.0261 | 6.5386 | 691.3068 | 4.0874 | 59.5869 |
| 4.2779 | 0.8392 | 440000 | 0.0260 | 6.5351 | 688.9287 | 4.0834 | 59.3466 |
| 4.2502 | 0.8488 | 445000 | 0.0263 | 6.5322 | 686.9031 | 4.0770 | 58.9670 |
| 4.2147 | 0.8583 | 450000 | 0.0261 | 6.5313 | 686.3179 | 4.0766 | 58.9433 |
| 4.3302 | 0.8678 | 455000 | 0.0263 | 6.5291 | 684.7628 | 4.0692 | 58.5075 |
| 4.2672 | 0.8774 | 460000 | 0.0262 | 6.5272 | 683.4909 | 4.0660 | 58.3241 |
| 4.2339 | 0.8869 | 465000 | 0.0262 | 6.5267 | 683.1493 | 4.0636 | 58.1843 |
| 4.2653 | 0.8965 | 470000 | 0.0263 | 6.5230 | 680.6066 | 4.0580 | 57.8584 |
| 4.2606 | 0.9060 | 475000 | 0.0262 | 6.5197 | 678.3849 | 4.0556 | 57.7179 |
| 4.1935 | 0.9155 | 480000 | 0.0261 | 6.5203 | 678.8143 | 4.0534 | 57.5926 |
| 4.2571 | 0.9251 | 485000 | 0.0261 | 6.5182 | 677.3831 | 4.0484 | 57.3040 |
| 4.2549 | 0.9346 | 490000 | 0.0258 | 6.5181 | 677.3023 | 4.0472 | 57.2349 |
| 4.2247 | 0.9441 | 495000 | 0.0259 | 6.5173 | 676.7443 | 4.0444 | 57.0751 |
| 4.206 | 0.9537 | 500000 | 0.0257 | 6.5156 | 675.6204 | 4.0428 | 56.9853 |
| 4.1828 | 1.0095 | 505000 | 0.0259 | 6.5129 | 673.7910 | 4.0405 | 56.8521 |
| 4.2161 | 1.0191 | 510000 | 0.0260 | 6.5116 | 672.9013 | 4.0381 | 56.7178 |
| 4.244 | 1.0286 | 515000 | 0.0261 | 6.5120 | 673.1997 | 4.0370 | 56.6553 |
| 4.2112 | 1.0381 | 520000 | 0.0261 | 6.5115 | 672.8140 | 4.0366 | 56.6355 |
### Framework versions
- Transformers 4.57.0.dev0
- Pytorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.1