Submitted by Maksim Afanasyev
SLIME: Stabilized Likelihood Implicit Margin Enforcement for Preference Optimization
Floating Point Sigma Lab