T5Laa-Large-WeightedLoss

This model is a fine-tuned version of an unspecified base model, trained on the HuggingFaceFW/fineweb sample-350BT dataset. It achieves the following results on the evaluation set:

  • Perplexity: 186.1024
  • Loss: 5.2263
  • Accuracy: 0.0365
  • Lookahead Perplexity: 1785.6071
  • Lookahead Loss: 7.4875
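
Each reported perplexity is simply the exponential of the corresponding cross-entropy loss, which makes the numbers easy to sanity-check:

```python
import math

# Perplexity = exp(cross-entropy loss); both pairs above are consistent.
print(math.exp(5.2263))  # ~186.10  (reported Perplexity: 186.1024)
print(math.exp(7.4875))  # ~1785.6  (reported Lookahead Perplexity: 1785.6071)
```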

Model description

More information needed

Intended uses & limitations

More information needed
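
Pending more detail from the author, here is a hypothetical loading sketch. T5Laa appears to be a custom T5 variant (with a lookahead objective), so the checkpoint may need the author's modeling code; the `trust_remote_code=True` flag below is an assumption, not a documented requirement:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical usage sketch; trust_remote_code=True is assumed because
# T5Laa looks like a custom architecture, not a stock Transformers class.
tokenizer = AutoTokenizer.from_pretrained("hrezaei/T5Laa-Large-WeightedLoss")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "hrezaei/T5Laa-Large-WeightedLoss", trust_remote_code=True
)
```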

Training and evaluation data

More information needed
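
The corpus named above is public on the Hub. A minimal sketch for streaming it, assuming the standard `datasets` API (the card does not document any preprocessing):

```python
from datasets import load_dataset

# Stream the sample-350BT subset of FineWeb instead of downloading it in full.
ds = load_dataset("HuggingFaceFW/fineweb", name="sample-350BT",
                  split="train", streaming=True)
print(next(iter(ds))["text"][:200])  # peek at the first document
```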

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • training_steps: 524288
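
These values map onto `TrainingArguments` from transformers roughly as follows. This is a sketch of the implied configuration, not the author's actual training script; `output_dir` is a placeholder, and `per_device_train_batch_size` is assumed to correspond to the reported `train_batch_size`:

```python
from transformers import TrainingArguments

# Sketch of the configuration implied by the list above (assumptions noted).
args = TrainingArguments(
    output_dir="t5laa-large-weightedloss",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=4,          # assumed single-device batch size
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=524288,
)
```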

Training results

Training Loss Epoch Step Accuracy Lookahead Loss Lookahead Perplexity Validation Loss Perplexity
6.6913 0.0095 5000 0.0282 39.3044 117399710280460224.0000 6.5784 719.3614
6.3853 0.0191 10000 0.0289 30.3654 15400915759075.514 6.2884 538.2917
6.2629 0.0286 15000 0.0305 26.5650 344358969082.5502 6.1319 460.3039
6.1226 0.0381 20000 0.0309 20.7767 1054904571.7535 6.0155 409.7168
5.9646 0.0477 25000 0.0322 12.7631 349095.7813 5.9123 369.5545
5.9014 0.0572 30000 0.0306 9.1890 9789.0147 5.9223 373.2783
5.9364 0.0668 35000 0.0326 8.8468 6952.2587 5.8365 342.5753
5.9101 0.0763 40000 0.0344 8.5723 5283.3060 5.8004 330.4170
5.8641 0.0858 45000 0.0340 8.8594 7040.4611 5.7838 325.0032
5.9081 0.0954 50000 0.0358 8.8988 7322.9299 5.7625 318.1431
5.9661 0.1049 55000 0.0356 8.6371 5636.9011 5.7804 323.8902
5.9456 0.1144 60000 0.0369 8.7998 6632.8089 5.8108 333.8912
5.9131 0.1240 65000 0.0382 8.9718 7877.6758 5.8246 338.5269
5.9408 0.1335 70000 0.0382 8.9709 7870.4415 5.7992 330.0488
6.0065 0.1431 75000 0.0394 8.8279 6821.6503 5.8472 346.2572
5.8967 0.1526 80000 0.0395 9.5302 13769.0647 5.8741 355.6943
6.1743 0.1621 85000 0.0402 8.8181 6755.5736 5.9579 386.7979
6.0193 0.1717 90000 0.0399 8.5546 5190.6226 5.8674 353.3374
6.0285 0.1812 95000 0.0404 8.4514 4681.4796 5.8678 353.4621
5.9659 0.1907 100000 0.0415 8.7269 6166.5850 5.8670 353.1936
6.0304 0.2003 105000 0.0412 8.4267 4567.2815 5.9103 368.8350
6.0583 0.2098 110000 0.0411 8.6948 5972.0452 5.8797 357.7156
6.053 0.2193 115000 0.0417 8.5532 5183.2960 5.9154 370.7001
6.1051 0.2289 120000 0.0426 8.5780 5313.2307 5.9279 375.3514
6.0722 0.2384 125000 0.0416 8.5247 5037.6345 5.9162 370.9970
6.0857 0.2480 130000 0.0427 8.3400 4187.9384 5.8883 360.7826
6.0764 0.2575 135000 0.0426 8.6062 5465.5602 5.9327 377.1547
6.0819 0.2670 140000 0.0417 8.7044 6029.5392 5.9483 383.0830
6.031 0.2766 145000 0.0420 8.5064 4946.1910 5.8796 357.6692
6.0952 0.2861 150000 0.0420 8.4238 4554.0414 5.8847 359.4969
6.0402 0.2956 155000 0.0417 8.2736 3918.9174 5.8895 361.2193
6.0109 0.3052 160000 0.0412 8.2933 3997.0211 5.8591 350.4191
6.0041 0.3147 165000 0.0419 7.8818 2648.6731 5.8305 340.5289
5.9871 0.3242 170000 0.0420 7.9843 2934.5865 5.8205 337.1288
5.9548 0.3338 175000 0.0412 8.1642 3512.9831 5.8359 342.3704
6.0324 0.3433 180000 0.0426 8.1206 3363.0849 5.7951 328.6756
5.9459 0.3529 185000 0.0421 7.8193 2488.1387 5.7750 322.1374
5.9077 0.3624 190000 0.0420 7.8562 2581.7259 5.7724 321.2927
5.983 0.3719 195000 0.0415 7.8718 2622.1947 5.7629 318.2699
5.9019 0.3815 200000 0.0404 8.0693 3195.0155 5.7540 315.4574
5.8664 0.3910 205000 0.0414 7.8615 2595.5338 5.7408 311.3156
5.9601 0.4005 210000 0.0420 7.9168 2742.9738 5.7238 306.0784
5.8749 0.4101 215000 0.0414 7.9843 2934.3893 5.7244 306.2388
5.8563 0.4196 220000 0.0415 7.8042 2450.8247 5.7055 300.5256
5.9083 0.4292 225000 0.0413 7.7531 2328.6963 5.6959 297.6468
5.8473 0.4387 230000 0.0415 7.7180 2248.4279 5.6771 292.0966
5.9021 0.4482 235000 0.0405 7.7686 2365.1709 5.7012 299.2285
5.7837 0.4578 240000 0.0410 7.7851 2404.3986 5.6569 286.2585
5.8015 0.4673 245000 0.0404 7.7014 2211.3498 5.6563 286.0834
5.7699 0.4768 250000 0.0419 7.6930 2193.0058 5.6397 281.3667
5.7673 0.4864 255000 0.0409 7.7949 2428.2159 5.6257 277.4645
5.7342 0.4959 260000 0.0409 7.7758 2382.1887 5.6136 274.1414
5.7296 0.5054 265000 0.0408 7.7695 2367.2356 5.6093 272.9645
5.7854 0.5150 270000 0.0418 7.7510 2323.8449 5.6001 270.4489
5.7778 0.5245 275000 0.0416 7.7293 2273.9757 5.5836 266.0243
5.7347 0.5341 280000 0.0405 7.7460 2312.1937 5.5753 263.8191
5.7254 0.5436 285000 0.0406 7.7327 2281.6569 5.5651 261.1643
5.7287 0.5531 290000 0.0405 7.6531 2107.0742 5.5603 259.8962
5.8322 0.5627 295000 0.0410 7.6627 2127.4718 5.5501 257.2545
5.6957 0.5722 300000 0.0406 7.6973 2202.4213 5.5512 257.5555
5.6885 0.5817 305000 0.0400 7.6782 2160.7936 5.5358 253.6155
5.6931 0.5913 310000 0.0413 7.6291 2057.2046 5.5372 253.9556
5.7024 0.6008 315000 0.0399 7.6305 2060.1648 5.5142 248.2015
5.6039 0.6104 320000 0.0401 7.6355 2070.4602 5.5001 244.7091
5.6658 0.6199 325000 0.0396 7.6232 2045.0868 5.4934 243.0818
5.6381 0.6294 330000 0.0398 7.5870 1972.3905 5.4854 241.1361
5.6829 0.6390 335000 0.0398 7.6267 2052.2820 5.4742 238.4706
5.6595 0.6485 340000 0.0404 7.5941 1986.4813 5.4733 238.2478
5.6537 0.6580 345000 0.0394 7.6398 2079.3792 5.4654 236.3743
5.6034 0.6676 350000 0.0399 7.5942 1986.5764 5.4511 233.0051
5.5748 0.6771 355000 0.0395 7.5792 1957.0270 5.4421 230.9297
5.5886 0.6866 360000 0.0386 7.6046 2007.4233 5.4313 228.4520
5.5795 0.6962 365000 0.0395 7.6059 2009.9626 5.4222 226.3869
5.6283 0.7057 370000 0.0387 7.5913 1980.9620 5.4149 224.7189
5.5595 0.7153 375000 0.0393 7.6170 2032.3993 5.4046 222.4215
5.5667 0.7248 380000 0.0390 7.6184 2035.3124 5.3961 220.5489
5.5508 0.7343 385000 0.0393 7.5623 1924.3152 5.3872 218.5816
5.5796 0.7439 390000 0.0391 7.5590 1918.0016 5.3797 216.9580
5.5285 0.7534 395000 0.0386 7.5661 1931.5695 5.3708 215.0273
5.5194 0.7629 400000 0.0385 7.5782 1955.1562 5.3634 213.4451
5.5849 0.7725 405000 0.0384 7.5532 1906.7725 5.3572 212.1205
5.48 0.7820 410000 0.0387 7.5543 1909.0207 5.3497 210.5396
5.5236 0.7915 415000 0.0379 7.5492 1899.2312 5.3387 208.2445
5.4733 0.8011 420000 0.0382 7.5329 1868.5760 5.3316 206.7636
5.4857 0.8106 425000 0.0377 7.5397 1881.2684 5.3250 205.4097
5.4883 0.8202 430000 0.0380 7.5480 1897.0067 5.3160 203.5777
5.4659 0.8297 435000 0.0373 7.5213 1846.9836 5.3077 201.8842
5.4864 0.8392 440000 0.0378 7.5217 1847.6301 5.3001 200.3570
5.4697 0.8488 445000 0.0376 7.5184 1841.5727 5.2961 199.5629
5.359 0.8583 450000 0.0368 7.5175 1839.8899 5.2916 198.6558
5.5236 0.8678 455000 0.0375 7.5087 1823.7893 5.2800 196.3660
5.4811 0.8774 460000 0.0370 7.5068 1820.4498 5.2757 195.5318
5.4134 0.8869 465000 0.0370 7.5021 1811.8785 5.2678 193.9948
5.4297 0.8965 470000 0.0372 7.5064 1819.7303 5.2622 192.8994
5.4657 0.9060 475000 0.0372 7.4948 1798.7231 5.2580 192.0886
5.398 0.9155 480000 0.0368 7.4967 1802.1644 5.2516 190.8767
5.4084 0.9251 485000 0.0367 7.4975 1803.4448 5.2472 190.0388
5.4512 0.9346 490000 0.0367 7.4937 1796.6301 5.2431 189.2571
5.42 0.9441 495000 0.0366 7.4922 1793.9652 5.2398 188.6367
5.4164 0.9537 500000 0.0365 7.4911 1792.0770 5.2357 187.8678
5.3912 1.0095 505000 0.0365 7.4886 1787.5085 5.2329 187.3297
5.3829 1.0191 510000 0.0366 7.4873 1785.1887 5.2297 186.7343
5.4174 1.0286 515000 0.0365 7.4879 1786.2572 5.2284 186.4854
5.403 1.0381 520000 0.0365 7.4876 1785.8340 5.2272 186.2719

Framework versions

  • Transformers 4.57.0.dev0
  • Pytorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.1