T5Lae-Large-WeightedLoss

This model is a fine-tuned version of an unspecified base model on the HuggingFaceFW/fineweb sample-350BT dataset. It achieves the following results on the evaluation set:

  • Perplexity: 56.6106
  • Loss: 4.0362
  • Accuracy: 0.0260
  • Lookahead Perplexity: 672.4668
  • Lookahead Loss: 6.5110
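
Each reported perplexity is simply the exponential of the corresponding cross-entropy loss, and the numbers above match that relationship closely. A minimal check in Python:

```python
import math

# Reported evaluation losses (from the list above).
eval_loss = 4.0362       # standard cross-entropy loss
lookahead_loss = 6.5110  # the card's lookahead objective

# Perplexity = exp(cross-entropy loss).
print(math.exp(eval_loss))       # ~56.61, matching the reported 56.6106
print(math.exp(lookahead_loss))  # ~672.5, matching the reported 672.4668
```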

Model description

More information needed

Intended uses & limitations

More information needed
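
Until usage guidance is documented, a hedged inference sketch may help. The repo id hrezaei/T5Lae-Large-WeightedLoss is taken from this card's page; the auto classes and the trust_remote_code flag are assumptions, since T5Lae appears to be a custom T5 variant:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

repo_id = "hrezaei/T5Lae-Large-WeightedLoss"

# Assumption: the checkpoint ships custom modeling code; if it loads with
# the stock T5 classes, trust_remote_code can be dropped.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```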

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • training_steps: 524288
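
These values map one-to-one onto Hugging Face TrainingArguments. A minimal sketch of an equivalent configuration; only the hyperparameters listed above are from the card, while output_dir and anything else is an assumption:

```python
from transformers import TrainingArguments

# Reconstruction of the listed hyperparameters only; dataset wiring,
# logging cadence, and checkpointing from the original run are unknown.
args = TrainingArguments(
    output_dir="t5lae-large-weightedloss",  # hypothetical path
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=524288,
)
```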

Training results

Training Loss Epoch Step Accuracy Lookahead Loss Lookahead Perplexity Validation Loss Perplexity
6.4709 0.0095 5000 0.0300 7.3655 1580.4451 6.4112 608.6332
6.1634 0.0191 10000 0.0288 7.2945 1472.1925 6.1387 463.4569
6.0273 0.0286 15000 0.0312 7.2446 1400.5552 5.9474 382.7631
5.89 0.0381 20000 0.0314 7.2101 1353.0672 5.8083 333.0537
5.7665 0.0477 25000 0.0330 7.1880 1323.4656 5.7197 304.8037
5.6891 0.0572 30000 0.0311 7.2027 1343.0104 5.6905 296.0397
5.6926 0.0668 35000 0.0323 7.1358 1256.1099 5.5811 265.3738
5.6076 0.0763 40000 0.0324 7.1160 1231.5257 5.5175 249.0057
5.5375 0.0858 45000 0.0315 7.1181 1234.1559 5.4709 237.6680
5.5712 0.0954 50000 0.0323 7.0829 1191.4170 5.4163 225.0337
5.5664 0.1049 55000 0.0305 7.0561 1159.9263 5.3879 218.7341
5.4285 0.1144 60000 0.0319 7.0428 1144.5391 5.3540 211.4576
5.3761 0.1240 65000 0.0324 7.0227 1121.8277 5.2855 197.4554
5.3878 0.1335 70000 0.0317 7.0129 1110.8984 5.2550 191.5256
5.3311 0.1431 75000 0.0311 6.9915 1087.3640 5.2316 187.0996
5.192 0.1526 80000 0.0304 6.9784 1073.2082 5.1762 177.0060
5.3575 0.1621 85000 0.0307 6.9608 1054.4677 5.1340 169.6862
5.2409 0.1717 90000 0.0296 6.9574 1050.9358 5.1046 164.7766
5.2178 0.1812 95000 0.0299 6.9414 1034.2007 5.0751 159.9887
5.1463 0.1907 100000 0.0297 6.9335 1026.1214 5.0439 155.0754
5.1763 0.2003 105000 0.0295 6.9240 1016.3639 5.0195 151.3366
5.1507 0.2098 110000 0.0289 6.9072 999.3971 4.9863 146.3885
5.0905 0.2193 115000 0.0291 6.9061 998.3528 4.9651 143.3287
5.133 0.2289 120000 0.0289 6.9054 997.6628 4.9342 138.9648
5.0768 0.2384 125000 0.0292 6.8943 986.6068 4.9211 137.1481
5.0645 0.2480 130000 0.0288 6.8848 977.3311 4.8912 133.1113
5.0714 0.2575 135000 0.0291 6.8772 969.9084 4.8770 131.2382
5.0038 0.2670 140000 0.0283 6.8762 968.9717 4.8573 128.6715
4.9626 0.2766 145000 0.0283 6.8654 958.5580 4.8280 124.9669
5.0443 0.2861 150000 0.0285 6.8569 950.3789 4.8000 121.5045
4.9348 0.2956 155000 0.0284 6.8488 942.7645 4.7827 119.4311
4.8871 0.3052 160000 0.0279 6.8374 932.0763 4.7582 116.5389
4.9312 0.3147 165000 0.0292 6.8212 917.0747 4.7352 113.8875
4.9013 0.3242 170000 0.0285 6.8190 915.0439 4.7225 112.4514
4.8217 0.3338 175000 0.0279 6.8120 908.7160 4.7054 110.5448
4.9256 0.3433 180000 0.0290 6.8176 913.8313 4.6843 108.2374
4.8326 0.3529 185000 0.0284 6.7713 872.4788 4.6523 104.8223
4.7726 0.3624 190000 0.0287 6.7743 875.0953 4.6364 103.1737
4.8655 0.3719 195000 0.0281 6.7721 873.1190 4.6174 101.2283
4.7666 0.3815 200000 0.0271 6.7623 864.6278 4.5986 99.3420
4.7235 0.3910 205000 0.0280 6.7533 856.8730 4.5802 97.5293
4.8219 0.4005 210000 0.0278 6.7477 852.1302 4.5626 95.8318
4.7113 0.4101 215000 0.0278 6.7336 840.1735 4.5455 94.2049
4.7011 0.4196 220000 0.0276 6.7365 842.5797 4.5303 92.7869
4.7134 0.4292 225000 0.0275 6.7356 841.8266 4.5222 92.0394
4.6916 0.4387 230000 0.0277 6.7191 828.0459 4.4974 89.7795
4.6694 0.4482 235000 0.0267 6.7319 838.7439 4.5096 90.8881
4.6188 0.4578 240000 0.0274 6.7032 814.9789 4.4625 86.7048
4.6234 0.4673 245000 0.0272 6.7044 815.9833 4.4569 86.2231
4.5967 0.4768 250000 0.0277 6.6989 811.5360 4.4373 84.5446
4.5814 0.4864 255000 0.0271 6.6857 800.8966 4.4201 83.1059
4.5838 0.4959 260000 0.0276 6.6818 797.7229 4.4135 82.5570
4.575 0.5054 265000 0.0268 6.6774 794.2122 4.3935 80.9230
4.5786 0.5150 270000 0.0272 6.6718 789.7848 4.3830 80.0755
4.5582 0.5245 275000 0.0270 6.6660 785.2693 4.3724 79.2297
4.5571 0.5341 280000 0.0275 6.6558 777.2855 4.3549 77.8625
4.5238 0.5436 285000 0.0276 6.6491 772.0528 4.3424 76.8926
4.4993 0.5531 290000 0.0272 6.6507 773.2902 4.3351 76.3315
4.6375 0.5627 295000 0.0273 6.6448 768.7754 4.3190 75.1133
4.4756 0.5722 300000 0.0273 6.6445 768.5136 4.3120 74.5917
4.4725 0.5817 305000 0.0265 6.6336 760.1904 4.2939 73.2534
4.4974 0.5913 310000 0.0276 6.6359 761.9827 4.3011 73.7798
4.4651 0.6008 315000 0.0263 6.6248 753.5762 4.2764 71.9778
4.4339 0.6104 320000 0.0270 6.6184 748.7501 4.2654 71.1964
4.456 0.6199 325000 0.0267 6.6181 748.5140 4.2558 70.5135
4.4337 0.6294 330000 0.0263 6.6098 742.3199 4.2479 69.9577
4.4737 0.6390 335000 0.0264 6.6077 740.8061 4.2347 69.0383
4.4369 0.6485 340000 0.0271 6.6113 743.4451 4.2307 68.7642
4.4086 0.6580 345000 0.0265 6.5994 734.6393 4.2213 68.1242
4.3934 0.6676 350000 0.0269 6.5908 728.3871 4.2089 67.2815
4.361 0.6771 355000 0.0264 6.5858 724.7172 4.2004 66.7156
4.3219 0.6866 360000 0.0261 6.5836 723.1735 4.1895 65.9867
4.3989 0.6962 365000 0.0264 6.5908 728.3600 4.1812 65.4464
4.3539 0.7057 370000 0.0262 6.5788 719.6894 4.1751 65.0468
4.3217 0.7153 375000 0.0263 6.5773 718.6223 4.1679 64.5772
4.3246 0.7248 380000 0.0263 6.5662 710.6689 4.1580 63.9416
4.3571 0.7343 385000 0.0265 6.5718 714.6888 4.1507 63.4802
4.3611 0.7439 390000 0.0262 6.5647 709.5887 4.1429 62.9843
4.297 0.7534 395000 0.0259 6.5603 706.5064 4.1342 62.4369
4.2687 0.7629 400000 0.0259 6.5615 707.3522 4.1294 62.1408
4.3698 0.7725 405000 0.0264 6.5555 703.1012 4.1234 61.7694
4.2747 0.7820 410000 0.0263 6.5541 702.1163 4.1165 61.3433
4.2801 0.7915 415000 0.0261 6.5474 697.4035 4.1110 61.0067
4.2964 0.8011 420000 0.0262 6.5474 697.4221 4.1041 60.5896
4.2828 0.8106 425000 0.0262 6.5472 697.3089 4.0988 60.2670
4.2847 0.8202 430000 0.0260 6.5395 691.9503 4.0927 59.9032
4.2484 0.8297 435000 0.0261 6.5386 691.3068 4.0874 59.5869
4.2779 0.8392 440000 0.0260 6.5351 688.9287 4.0834 59.3466
4.2502 0.8488 445000 0.0263 6.5322 686.9031 4.0770 58.9670
4.2147 0.8583 450000 0.0261 6.5313 686.3179 4.0766 58.9433
4.3302 0.8678 455000 0.0263 6.5291 684.7628 4.0692 58.5075
4.2672 0.8774 460000 0.0262 6.5272 683.4909 4.0660 58.3241
4.2339 0.8869 465000 0.0262 6.5267 683.1493 4.0636 58.1843
4.2653 0.8965 470000 0.0263 6.5230 680.6066 4.0580 57.8584
4.2606 0.9060 475000 0.0262 6.5197 678.3849 4.0556 57.7179
4.1935 0.9155 480000 0.0261 6.5203 678.8143 4.0534 57.5926
4.2571 0.9251 485000 0.0261 6.5182 677.3831 4.0484 57.3040
4.2549 0.9346 490000 0.0258 6.5181 677.3023 4.0472 57.2349
4.2247 0.9441 495000 0.0259 6.5173 676.7443 4.0444 57.0751
4.206 0.9537 500000 0.0257 6.5156 675.6204 4.0428 56.9853
4.1828 1.0095 505000 0.0259 6.5129 673.7910 4.0405 56.8521
4.2161 1.0191 510000 0.0260 6.5116 672.9013 4.0381 56.7178
4.244 1.0286 515000 0.0261 6.5120 673.1997 4.0370 56.6553
4.2112 1.0381 520000 0.0261 6.5115 672.8140 4.0366 56.6355
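
The table follows the standard Trainer log layout, so if the run's trainer_state.json was saved alongside the checkpoint, the curve can be re-extracted. A hedged sketch; the metric key names beyond eval_loss are guesses based on the column headers:

```python
import json

# Assumption: a standard trainer_state.json from this run is available locally.
with open("trainer_state.json") as f:
    log_history = json.load(f)["log_history"]

# Trainer interleaves training entries ("loss") with evaluation entries
# ("eval_*"); keep only the evaluation rows, as in the table above.
for entry in log_history:
    if "eval_loss" in entry:
        print(entry["step"], entry["eval_loss"], entry.get("eval_perplexity"))
```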

Framework versions

  • Transformers 4.57.0.dev0
  • PyTorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.1

Model size

  • 0.8B params (Safetensors, F32)