FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning • arXiv:2601.18150 • Published 5 days ago
AACR-Bench: Evaluating Automatic Code Review with Holistic Repository-Level Context • arXiv:2601.19494 • Published 4 days ago
Linear Representations in Language Models Can Change Dramatically over a Conversation • arXiv:2601.20834 • Published 3 days ago
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation • arXiv:2601.20614 • Published 3 days ago
ECO: Quantized Training without Full-Precision Master Weights • arXiv:2601.22101 • Published 1 day ago
FROST: Filtering Reasoning Outliers with Attention for Efficient Reasoning • arXiv:2601.19001 • Published 4 days ago
Self-Improving Pretraining: Using Post-Trained Models to Pretrain Better Models • arXiv:2601.21343 • Published 2 days ago
Beyond Imitation: Reinforcement Learning for Active Latent Planning • arXiv:2601.21598 • Published 2 days ago
Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening • arXiv:2601.21590 • Published 2 days ago
Language-Based Trial and Error Falls Behind in the Era of Experience • arXiv:2601.21754 • Published 1 day ago
ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation • arXiv:2601.21420 • Published 2 days ago
Scaling Embeddings Outperforms Scaling Experts in Language Models • arXiv:2601.21204 • Published 2 days ago
Knowledge Is Not Enough: Injecting RL Skills for Continual Adaptation • arXiv:2601.11258 • Published 15 days ago
One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment • arXiv:2601.18731 • Published 5 days ago
Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents • arXiv:2601.18217 • Published 5 days ago