# T5Laa-Large-WeightedLoss
This model is a fine-tuned version of an unspecified base model, trained on the HuggingFaceFW/fineweb sample-350BT dataset. It achieves the following results on the evaluation set:
- Perplexity: 186.1024
- Loss: 5.2263
- Accuracy: 0.0365
- Lookahead Perplexity: 1785.6071
- Lookahead Loss: 7.4875
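These numbers obey the usual identity perplexity = exp(cross-entropy loss), which is a handy sanity check when reading the results table below. A minimal sketch in Python:

```python
import math

# Perplexity is the exponential of the cross-entropy loss.
print(math.exp(5.2263))  # ~186.10, matches the reported Perplexity
print(math.exp(7.4875))  # ~1785.6, matches the reported Lookahead Perplexity
```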
## Model description
More information needed
## Intended uses & limitations
More information needed
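No usage instructions are published for this checkpoint. As a non-authoritative sketch, loading it with the standard transformers auto classes might look like the snippet below; the repo id is taken from this card, but whether the T5Laa architecture loads through AutoModelForSeq2SeqLM (or needs trust_remote_code=True for custom model code) is an assumption:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumption: the checkpoint exposes a standard T5-style seq2seq interface;
# trust_remote_code=True only matters if the repo ships custom model code.
tokenizer = AutoTokenizer.from_pretrained("hrezaei/T5Laa-Large-WeightedLoss")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "hrezaei/T5Laa-Large-WeightedLoss", trust_remote_code=True
)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```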
## Training and evaluation data
More information needed
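The card does name its training corpus: the sample-350BT subset of HuggingFaceFW/fineweb. A minimal sketch of streaming that subset with the datasets library (streaming avoids downloading the full sample):

```python
from datasets import load_dataset

# Stream the sample-350BT subset of FineWeb instead of downloading it in full.
ds = load_dataset(
    "HuggingFaceFW/fineweb", name="sample-350BT", split="train", streaming=True
)
print(next(iter(ds))["text"][:200])  # each row carries the document in "text"
```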
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- training_steps: 524288
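As an illustration only (the training script itself is not published), these values map onto transformers.TrainingArguments roughly as follows; output_dir is a hypothetical placeholder:

```python
from transformers import TrainingArguments

# Illustrative mapping of the hyperparameters above; not the authors' script.
args = TrainingArguments(
    output_dir="t5laa-large-weightedloss",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=524288,  # 2**19 optimizer steps
)
```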
### Training results
| Training Loss | Epoch | Step | Accuracy | Lookahead Loss | Lookahead Perplexity | Validation Loss | Perplexity |
|---|---|---|---|---|---|---|---|
| 6.6913 | 0.0095 | 5000 | 0.0282 | 39.3044 | 117399710280460224.0000 | 6.5784 | 719.3614 |
| 6.3853 | 0.0191 | 10000 | 0.0289 | 30.3654 | 15400915759075.514 | 6.2884 | 538.2917 |
| 6.2629 | 0.0286 | 15000 | 0.0305 | 26.5650 | 344358969082.5502 | 6.1319 | 460.3039 |
| 6.1226 | 0.0381 | 20000 | 0.0309 | 20.7767 | 1054904571.7535 | 6.0155 | 409.7168 |
| 5.9646 | 0.0477 | 25000 | 0.0322 | 12.7631 | 349095.7813 | 5.9123 | 369.5545 |
| 5.9014 | 0.0572 | 30000 | 0.0306 | 9.1890 | 9789.0147 | 5.9223 | 373.2783 |
| 5.9364 | 0.0668 | 35000 | 0.0326 | 8.8468 | 6952.2587 | 5.8365 | 342.5753 |
| 5.9101 | 0.0763 | 40000 | 0.0344 | 8.5723 | 5283.3060 | 5.8004 | 330.4170 |
| 5.8641 | 0.0858 | 45000 | 0.0340 | 8.8594 | 7040.4611 | 5.7838 | 325.0032 |
| 5.9081 | 0.0954 | 50000 | 0.0358 | 8.8988 | 7322.9299 | 5.7625 | 318.1431 |
| 5.9661 | 0.1049 | 55000 | 0.0356 | 8.6371 | 5636.9011 | 5.7804 | 323.8902 |
| 5.9456 | 0.1144 | 60000 | 0.0369 | 8.7998 | 6632.8089 | 5.8108 | 333.8912 |
| 5.9131 | 0.1240 | 65000 | 0.0382 | 8.9718 | 7877.6758 | 5.8246 | 338.5269 |
| 5.9408 | 0.1335 | 70000 | 0.0382 | 8.9709 | 7870.4415 | 5.7992 | 330.0488 |
| 6.0065 | 0.1431 | 75000 | 0.0394 | 8.8279 | 6821.6503 | 5.8472 | 346.2572 |
| 5.8967 | 0.1526 | 80000 | 0.0395 | 9.5302 | 13769.0647 | 5.8741 | 355.6943 |
| 6.1743 | 0.1621 | 85000 | 0.0402 | 8.8181 | 6755.5736 | 5.9579 | 386.7979 |
| 6.0193 | 0.1717 | 90000 | 0.0399 | 8.5546 | 5190.6226 | 5.8674 | 353.3374 |
| 6.0285 | 0.1812 | 95000 | 0.0404 | 8.4514 | 4681.4796 | 5.8678 | 353.4621 |
| 5.9659 | 0.1907 | 100000 | 0.0415 | 8.7269 | 6166.5850 | 5.8670 | 353.1936 |
| 6.0304 | 0.2003 | 105000 | 0.0412 | 8.4267 | 4567.2815 | 5.9103 | 368.8350 |
| 6.0583 | 0.2098 | 110000 | 0.0411 | 8.6948 | 5972.0452 | 5.8797 | 357.7156 |
| 6.053 | 0.2193 | 115000 | 0.0417 | 8.5532 | 5183.2960 | 5.9154 | 370.7001 |
| 6.1051 | 0.2289 | 120000 | 0.0426 | 8.5780 | 5313.2307 | 5.9279 | 375.3514 |
| 6.0722 | 0.2384 | 125000 | 0.0416 | 8.5247 | 5037.6345 | 5.9162 | 370.9970 |
| 6.0857 | 0.2480 | 130000 | 0.0427 | 8.3400 | 4187.9384 | 5.8883 | 360.7826 |
| 6.0764 | 0.2575 | 135000 | 0.0426 | 8.6062 | 5465.5602 | 5.9327 | 377.1547 |
| 6.0819 | 0.2670 | 140000 | 0.0417 | 8.7044 | 6029.5392 | 5.9483 | 383.0830 |
| 6.031 | 0.2766 | 145000 | 0.0420 | 8.5064 | 4946.1910 | 5.8796 | 357.6692 |
| 6.0952 | 0.2861 | 150000 | 0.0420 | 8.4238 | 4554.0414 | 5.8847 | 359.4969 |
| 6.0402 | 0.2956 | 155000 | 0.0417 | 8.2736 | 3918.9174 | 5.8895 | 361.2193 |
| 6.0109 | 0.3052 | 160000 | 0.0412 | 8.2933 | 3997.0211 | 5.8591 | 350.4191 |
| 6.0041 | 0.3147 | 165000 | 0.0419 | 7.8818 | 2648.6731 | 5.8305 | 340.5289 |
| 5.9871 | 0.3242 | 170000 | 0.0420 | 7.9843 | 2934.5865 | 5.8205 | 337.1288 |
| 5.9548 | 0.3338 | 175000 | 0.0412 | 8.1642 | 3512.9831 | 5.8359 | 342.3704 |
| 6.0324 | 0.3433 | 180000 | 0.0426 | 8.1206 | 3363.0849 | 5.7951 | 328.6756 |
| 5.9459 | 0.3529 | 185000 | 0.0421 | 7.8193 | 2488.1387 | 5.7750 | 322.1374 |
| 5.9077 | 0.3624 | 190000 | 0.0420 | 7.8562 | 2581.7259 | 5.7724 | 321.2927 |
| 5.983 | 0.3719 | 195000 | 0.0415 | 7.8718 | 2622.1947 | 5.7629 | 318.2699 |
| 5.9019 | 0.3815 | 200000 | 0.0404 | 8.0693 | 3195.0155 | 5.7540 | 315.4574 |
| 5.8664 | 0.3910 | 205000 | 0.0414 | 7.8615 | 2595.5338 | 5.7408 | 311.3156 |
| 5.9601 | 0.4005 | 210000 | 0.0420 | 7.9168 | 2742.9738 | 5.7238 | 306.0784 |
| 5.8749 | 0.4101 | 215000 | 0.0414 | 7.9843 | 2934.3893 | 5.7244 | 306.2388 |
| 5.8563 | 0.4196 | 220000 | 0.0415 | 7.8042 | 2450.8247 | 5.7055 | 300.5256 |
| 5.9083 | 0.4292 | 225000 | 0.0413 | 7.7531 | 2328.6963 | 5.6959 | 297.6468 |
| 5.8473 | 0.4387 | 230000 | 0.0415 | 7.7180 | 2248.4279 | 5.6771 | 292.0966 |
| 5.9021 | 0.4482 | 235000 | 0.0405 | 7.7686 | 2365.1709 | 5.7012 | 299.2285 |
| 5.7837 | 0.4578 | 240000 | 0.0410 | 7.7851 | 2404.3986 | 5.6569 | 286.2585 |
| 5.8015 | 0.4673 | 245000 | 0.0404 | 7.7014 | 2211.3498 | 5.6563 | 286.0834 |
| 5.7699 | 0.4768 | 250000 | 0.0419 | 7.6930 | 2193.0058 | 5.6397 | 281.3667 |
| 5.7673 | 0.4864 | 255000 | 0.0409 | 7.7949 | 2428.2159 | 5.6257 | 277.4645 |
| 5.7342 | 0.4959 | 260000 | 0.0409 | 7.7758 | 2382.1887 | 5.6136 | 274.1414 |
| 5.7296 | 0.5054 | 265000 | 0.0408 | 7.7695 | 2367.2356 | 5.6093 | 272.9645 |
| 5.7854 | 0.5150 | 270000 | 0.0418 | 7.7510 | 2323.8449 | 5.6001 | 270.4489 |
| 5.7778 | 0.5245 | 275000 | 0.0416 | 7.7293 | 2273.9757 | 5.5836 | 266.0243 |
| 5.7347 | 0.5341 | 280000 | 0.0405 | 7.7460 | 2312.1937 | 5.5753 | 263.8191 |
| 5.7254 | 0.5436 | 285000 | 0.0406 | 7.7327 | 2281.6569 | 5.5651 | 261.1643 |
| 5.7287 | 0.5531 | 290000 | 0.0405 | 7.6531 | 2107.0742 | 5.5603 | 259.8962 |
| 5.8322 | 0.5627 | 295000 | 0.0410 | 7.6627 | 2127.4718 | 5.5501 | 257.2545 |
| 5.6957 | 0.5722 | 300000 | 0.0406 | 7.6973 | 2202.4213 | 5.5512 | 257.5555 |
| 5.6885 | 0.5817 | 305000 | 0.0400 | 7.6782 | 2160.7936 | 5.5358 | 253.6155 |
| 5.6931 | 0.5913 | 310000 | 0.0413 | 7.6291 | 2057.2046 | 5.5372 | 253.9556 |
| 5.7024 | 0.6008 | 315000 | 0.0399 | 7.6305 | 2060.1648 | 5.5142 | 248.2015 |
| 5.6039 | 0.6104 | 320000 | 0.0401 | 7.6355 | 2070.4602 | 5.5001 | 244.7091 |
| 5.6658 | 0.6199 | 325000 | 0.0396 | 7.6232 | 2045.0868 | 5.4934 | 243.0818 |
| 5.6381 | 0.6294 | 330000 | 0.0398 | 7.5870 | 1972.3905 | 5.4854 | 241.1361 |
| 5.6829 | 0.6390 | 335000 | 0.0398 | 7.6267 | 2052.2820 | 5.4742 | 238.4706 |
| 5.6595 | 0.6485 | 340000 | 0.0404 | 7.5941 | 1986.4813 | 5.4733 | 238.2478 |
| 5.6537 | 0.6580 | 345000 | 0.0394 | 7.6398 | 2079.3792 | 5.4654 | 236.3743 |
| 5.6034 | 0.6676 | 350000 | 0.0399 | 7.5942 | 1986.5764 | 5.4511 | 233.0051 |
| 5.5748 | 0.6771 | 355000 | 0.0395 | 7.5792 | 1957.0270 | 5.4421 | 230.9297 |
| 5.5886 | 0.6866 | 360000 | 0.0386 | 7.6046 | 2007.4233 | 5.4313 | 228.4520 |
| 5.5795 | 0.6962 | 365000 | 0.0395 | 7.6059 | 2009.9626 | 5.4222 | 226.3869 |
| 5.6283 | 0.7057 | 370000 | 0.0387 | 7.5913 | 1980.9620 | 5.4149 | 224.7189 |
| 5.5595 | 0.7153 | 375000 | 0.0393 | 7.6170 | 2032.3993 | 5.4046 | 222.4215 |
| 5.5667 | 0.7248 | 380000 | 0.0390 | 7.6184 | 2035.3124 | 5.3961 | 220.5489 |
| 5.5508 | 0.7343 | 385000 | 0.0393 | 7.5623 | 1924.3152 | 5.3872 | 218.5816 |
| 5.5796 | 0.7439 | 390000 | 0.0391 | 7.5590 | 1918.0016 | 5.3797 | 216.9580 |
| 5.5285 | 0.7534 | 395000 | 0.0386 | 7.5661 | 1931.5695 | 5.3708 | 215.0273 |
| 5.5194 | 0.7629 | 400000 | 0.0385 | 7.5782 | 1955.1562 | 5.3634 | 213.4451 |
| 5.5849 | 0.7725 | 405000 | 0.0384 | 7.5532 | 1906.7725 | 5.3572 | 212.1205 |
| 5.48 | 0.7820 | 410000 | 0.0387 | 7.5543 | 1909.0207 | 5.3497 | 210.5396 |
| 5.5236 | 0.7915 | 415000 | 0.0379 | 7.5492 | 1899.2312 | 5.3387 | 208.2445 |
| 5.4733 | 0.8011 | 420000 | 0.0382 | 7.5329 | 1868.5760 | 5.3316 | 206.7636 |
| 5.4857 | 0.8106 | 425000 | 0.0377 | 7.5397 | 1881.2684 | 5.3250 | 205.4097 |
| 5.4883 | 0.8202 | 430000 | 0.0380 | 7.5480 | 1897.0067 | 5.3160 | 203.5777 |
| 5.4659 | 0.8297 | 435000 | 0.0373 | 7.5213 | 1846.9836 | 5.3077 | 201.8842 |
| 5.4864 | 0.8392 | 440000 | 0.0378 | 7.5217 | 1847.6301 | 5.3001 | 200.3570 |
| 5.4697 | 0.8488 | 445000 | 0.0376 | 7.5184 | 1841.5727 | 5.2961 | 199.5629 |
| 5.359 | 0.8583 | 450000 | 0.0368 | 7.5175 | 1839.8899 | 5.2916 | 198.6558 |
| 5.5236 | 0.8678 | 455000 | 0.0375 | 7.5087 | 1823.7893 | 5.2800 | 196.3660 |
| 5.4811 | 0.8774 | 460000 | 0.0370 | 7.5068 | 1820.4498 | 5.2757 | 195.5318 |
| 5.4134 | 0.8869 | 465000 | 0.0370 | 7.5021 | 1811.8785 | 5.2678 | 193.9948 |
| 5.4297 | 0.8965 | 470000 | 0.0372 | 7.5064 | 1819.7303 | 5.2622 | 192.8994 |
| 5.4657 | 0.9060 | 475000 | 0.0372 | 7.4948 | 1798.7231 | 5.2580 | 192.0886 |
| 5.398 | 0.9155 | 480000 | 0.0368 | 7.4967 | 1802.1644 | 5.2516 | 190.8767 |
| 5.4084 | 0.9251 | 485000 | 0.0367 | 7.4975 | 1803.4448 | 5.2472 | 190.0388 |
| 5.4512 | 0.9346 | 490000 | 0.0367 | 7.4937 | 1796.6301 | 5.2431 | 189.2571 |
| 5.42 | 0.9441 | 495000 | 0.0366 | 7.4922 | 1793.9652 | 5.2398 | 188.6367 |
| 5.4164 | 0.9537 | 500000 | 0.0365 | 7.4911 | 1792.0770 | 5.2357 | 187.8678 |
| 5.3912 | 1.0095 | 505000 | 0.0365 | 7.4886 | 1787.5085 | 5.2329 | 187.3297 |
| 5.3829 | 1.0191 | 510000 | 0.0366 | 7.4873 | 1785.1887 | 5.2297 | 186.7343 |
| 5.4174 | 1.0286 | 515000 | 0.0365 | 7.4879 | 1786.2572 | 5.2284 | 186.4854 |
| 5.403 | 1.0381 | 520000 | 0.0365 | 7.4876 | 1785.8340 | 5.2272 | 186.2719 |
### Framework versions
- Transformers 4.57.0.dev0
- PyTorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.1