arxiv:2604.00590

UniMixer: A Unified Architecture for Scaling Laws in Recommendation Systems

Published on Apr 1 · Submitted by Mingming Ha on Apr 2

Abstract

A unified scaling architecture for recommendation systems called UniMixer is proposed, featuring a generalized parameterized token mixing module that optimizes token mixing patterns and improves scaling efficiency across different architectures.

AI-generated summary

In recent years, the scaling laws of recommendation models have attracted increasing attention; these laws govern the relationship between performance and the parameters/FLOPs of recommenders. Currently, there are three mainstream architectures for achieving scaling in recommendation models, namely attention-based, TokenMixer-based, and factorization-machine-based methods, which exhibit fundamental differences in both design philosophy and architectural structure. In this paper, we propose a unified scaling architecture for recommendation systems, namely UniMixer, to improve scaling efficiency and establish a unified theoretical framework that unifies the mainstream scaling blocks. By transforming the rule-based TokenMixer into an equivalent parameterized structure, we construct a generalized parameterized feature mixing module that allows the token mixing patterns to be optimized and learned during model training. Meanwhile, the generalized parameterized token mixing removes the constraint in TokenMixer that requires the number of heads to be equal to the number of tokens. Furthermore, we establish a unified scaling module design framework for recommender systems, which bridges the connections among attention-based, TokenMixer-based, and factorization-machine-based methods. To further boost scaling ROI, a lightweight UniMixing module, UniMixing-Lite, is designed, which further compresses the model parameters and computational cost while significantly improving model performance. Extensive offline and online experiments are conducted to verify the superior scaling abilities of UniMixer.
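The core idea above, replacing a rule-based token-mixing pattern with one whose weights are free parameters learned during training, can be illustrated with a minimal sketch. This is not the authors' code; `mix_tokens` and the uniform-averaging pattern are illustrative assumptions standing in for a TokenMixer-style fixed rule, and the "learned" matrix simply shows that a parameterized matrix of the same shape can express (and then move beyond) the fixed rule:

```python
# Hedged sketch (not the paper's implementation): contrast a fixed,
# rule-based token mix with a parameterized mix of the same shape whose
# entries a training loop would update by gradient descent.

def mix_tokens(tokens, mixing_matrix):
    """Mix token vectors along the token axis: out[i] = sum_j M[i][j] * tokens[j]."""
    T, d = len(tokens), len(tokens[0])
    return [
        [sum(mixing_matrix[i][j] * tokens[j][k] for j in range(T)) for k in range(d)]
        for i in range(T)
    ]

tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # T=3 tokens, d=2 features each

# Rule-based pattern: fixed uniform averaging, never updated.
rule_based = [[1.0 / 3.0] * 3 for _ in range(3)]

# Parameterized pattern: same shape, but the entries are free parameters.
# Initialized here to the rule-based pattern, so both mixes start equal;
# training could then reshape it into any T-by-T mixing pattern.
learned = [row[:] for row in rule_based]

assert mix_tokens(tokens, rule_based) == mix_tokens(tokens, learned)
```

Because the mixing matrix is an explicit T-by-T parameter rather than a rule tied to the token layout, nothing forces the number of heads to equal the number of tokens, which is the constraint the abstract says the generalized parameterization removes.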

Community

Recent scaling laws in recommendation models have highlighted three dominant architectures—attention-based, TokenMixer-based, and factorization-machine-based methods—which differ fundamentally in design and structure. This paper proposes UniMixer, a unified scaling architecture that generalizes token mixing into a learnable parameterized module, bridges these paradigms under a unified framework, and introduces a lightweight variant (UniMixing-Lite) to improve scaling efficiency, achieving strong performance gains validated by extensive experiments.


Get this paper in your agent:

hf papers read 2604.00590
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
