oguzhanercan's Collections
Training Theory
Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
Paper • 2502.03738 • Published • 11
Better Embeddings with Coupled Adam
Paper • 2502.08441 • Published • 2
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Paper • 2502.16894 • Published • 32
SALT: Singular Value Adaptation with Low-Rank Transformation
Paper • 2503.16055 • Published • 8
Decoupling Angles and Strength in Low-rank Adaptation
Paper • 2503.18225 • Published • 3
Entropy-Based Adaptive Weighting for Self-Training
Paper • 2503.23913 • Published • 3
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 263
DiffusionBlocks: Blockwise Training for Generative Models via Score-Based Diffusion
Paper • 2506.14202 • Published • 2
Selective Contrastive Learning for Weakly Supervised Affordance Grounding
Paper • 2508.07877 • Published • 12
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
Paper • 2510.04212 • Published • 23