Patronus AI

Team
company
Verified
https://patronus.ai
patronusai

AI & ML interests

LLM Evaluation

Papers

Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis

MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments

Team members: Rebecca Qian, Anand Kannappan, Bartosz Mielczarek, Varun Joshi, Arek, Darshan Deshpande, Maciej Gełdon, Shivani Jain, Varun Gangal, Edgar Colque, Jedrzej, Chinmayee Kulkarni, Devanshu Bansal, Bartlomiej Olechno, Josh W, Tobi Akomolede, Yoshinari Fujinuma

PatronusAI's Spaces (5)

pinned
Sleeping
7

TRAIL Leaderboard

🥇

Trace Reasoning and Agentic Issue Localization Leaderboard

May 15, 2025
pinned
Build error
105

Enterprise Scenarios Leaderboard

🥇

Jun 12, 2024
Running
3

BLUR Leaderboard

BLUR leaderboard.

Apr 2, 2025
Runtime error
7

GLIDER

🦅

GLIDER: Grading LLM Interactions and Decisions using Explain

Dec 19, 2024
Runtime error
6

LynxDemo

🔥

Evaluate answer fidelity to document

Aug 15, 2024