Transformers Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers Paper • 2410.13184 • Published Oct 17, 2024 • 3
Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers Paper • 2410.13184 • Published Oct 17, 2024 • 3
Multiplexer MoH: Multi-Head Attention as Mixture-of-Head Attention Paper • 2410.11842 • Published Oct 15, 2024 • 21
MoH: Multi-Head Attention as Mixture-of-Head Attention Paper • 2410.11842 • Published Oct 15, 2024 • 21
Transformers Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers Paper • 2410.13184 • Published Oct 17, 2024 • 3
Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers Paper • 2410.13184 • Published Oct 17, 2024 • 3
Multiplexer MoH: Multi-Head Attention as Mixture-of-Head Attention Paper • 2410.11842 • Published Oct 15, 2024 • 21
MoH: Multi-Head Attention as Mixture-of-Head Attention Paper • 2410.11842 • Published Oct 15, 2024 • 21