VibecoderMcSwaggins committed
Commit 11ad1dd · 1 Parent(s): d1f91a4

fix(judges): fallback to heuristic extraction when HF quota exhausted

docs/bugs/INVESTIGATION_QUOTA_BLOCKER.md ADDED
@@ -0,0 +1,49 @@
+# Bug Investigation: HF Free Tier Quota Exhaustion
+
+## Status
+- **Date:** 2025-11-29
+- **Reporter:** CLI User
+- **Component:** `HFInferenceJudgeHandler`
+- **Priority:** High (UX Blocker for Free Tier)
+- **Resolution:** FIXED
+
+## Issue Description
+On a fresh run with a simple query ("What drugs improve female libido post-menopause?"), the system retrieved 20 valid sources but failed during the Judge/Analysis phase with:
+`⚠️ Free Tier Quota Exceeded ⚠️`
+
+This results in a "Synthesis" step that has 0 candidates and 0 findings, rendering the application useless for free users once the (very low) limit is hit, despite having valid search results.
+
+## Evidence
+Output provided:
+```
+### Citations (20 sources)
+...
+### Reasoning
+⚠️ **Free Tier Quota Exceeded** ⚠️
+```
+
+## Root Cause Analysis
+1. **Search Success:** `SearchAgent` correctly found 20 documents (PubMed/EuropePMC).
+2. **Judge Failure:** `HFInferenceJudgeHandler` called the HF Inference API.
+3. **Quota Trap:** The API returned a 402 (Payment Required) or Quota error.
+4. **Previous Handling:** The handler caught this error and returned a `JudgeAssessment` with `sufficient=True` (to stop the loop) and *empty* fields.
+5. **Data Loss:** The 20 valid search results were effectively discarded from the "Analysis" perspective.
+
+## The "Deep Blocker"
+The system had a "hard failure" mode for quota exhaustion, assuming that if the LLM can't judge, we have *no* useful information. This "bricked" the UX for free users immediately upon hitting the limit.
+
+## Solution Implemented
+Modified `HFInferenceJudgeHandler._create_quota_exhausted_assessment` to:
+1. Accept the `evidence` list as an argument.
+2. Perform basic heuristic extraction (borrowed from `MockJudgeHandler` logic):
+   - Use titles as "Key Findings" (first 5 sources).
+   - Add a clear message in "Drug Candidates" telling the user to upgrade.
+3. Return this "Partial" assessment instead of an empty one.
+
+## Verification
+- Created `tests/unit/agent_factory/test_judges_hf_quota.py` to verify that:
+  - 402 errors are caught.
+  - `sufficient` is set to `True` (stops loop).
+  - `key_findings` are populated from search result titles.
+  - `reasoning` contains the warning message.
+- Ran existing tests `tests/unit/agent_factory/test_judges_hf.py` - All passed.
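Reviewer note: the heuristic extraction described under "Solution Implemented" can be tried in isolation. The sketch below is a minimal standalone version, assuming stand-in `Citation` and `Evidence` dataclasses (the project defines its own models in `src.utils.models`) and a hypothetical helper name `extract_fallback_findings`; only the limits (first 5 sources, 150-character titles) and the placeholder message come from the commit.

```python
from dataclasses import dataclass


# Stand-in types for illustration only; they are NOT the project's models.
@dataclass
class Citation:
    source: str
    title: str
    url: str
    date: str


@dataclass
class Evidence:
    content: str
    citation: Citation


MAX_FINDINGS = 5     # the commit slices evidence[:5]
MAX_TITLE_LEN = 150  # the commit truncates titles longer than 150 characters


def extract_fallback_findings(evidence: list[Evidence]) -> list[str]:
    """Hypothetical helper: use source titles as findings when the LLM is unavailable."""
    findings = []
    for item in evidence[:MAX_FINDINGS]:
        title = item.citation.title
        if len(title) > MAX_TITLE_LEN:
            # Keep the result at exactly MAX_TITLE_LEN characters, ellipsis included.
            title = title[: MAX_TITLE_LEN - 3] + "..."
        findings.append(title)

    if not findings:
        # Same placeholder message the commit uses for an empty evidence list.
        findings = ["No findings available (Quota exceeded and no search results)."]
    return findings
```

Keeping the extraction in a small pure function like this would also make the edge cases (empty evidence, over-long titles) testable without mocking the inference client.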
src/agent_factory/judges.py CHANGED
@@ -218,7 +218,7 @@ class HFInferenceJudgeHandler:
                     or "payment required" in error_str.lower()
                 ):
                     logger.error("HF Quota Exhausted", error=error_str)
-                    return self._create_quota_exhausted_assessment(question)
+                    return self._create_quota_exhausted_assessment(question, evidence)
 
                 logger.warning("Model failed", model=model, error=str(e))
                 last_error = e
@@ -342,16 +342,31 @@ IMPORTANT: Respond with ONLY valid JSON matching this schema:
 
         return None
 
-    def _create_quota_exhausted_assessment(self, question: str) -> JudgeAssessment:
+    def _create_quota_exhausted_assessment(
+        self, question: str, evidence: list[Evidence]
+    ) -> JudgeAssessment:
         """Create an assessment that stops the loop when quota is exhausted."""
+        # Heuristic extraction for fallback
+        findings = []
+        for e in evidence[:5]:
+            title = e.citation.title
+            if len(title) > 150:
+                title = title[:147] + "..."
+            findings.append(title)
+
+        if not findings:
+            findings = ["No findings available (Quota exceeded and no search results)."]
+
         return JudgeAssessment(
             details=AssessmentDetails(
                 mechanism_score=0,
-                mechanism_reasoning="Free tier quota exhausted.",
+                mechanism_reasoning="Free tier quota exhausted. Unable to analyze mechanism.",
                 clinical_evidence_score=0,
-                clinical_reasoning="Free tier quota exhausted.",
-                drug_candidates=[],
-                key_findings=[],
+                clinical_reasoning=(
+                    "Free tier quota exhausted. Unable to analyze clinical evidence."
+                ),
+                drug_candidates=["Upgrade to paid API for drug extraction."],
+                key_findings=findings,
             ),
             sufficient=True,  # STOP THE LOOP
             confidence=0.0,
@@ -360,6 +375,8 @@ IMPORTANT: Respond with ONLY valid JSON matching this schema:
             reasoning=(
                 "⚠️ **Free Tier Quota Exceeded** ⚠️\n\n"
                 "The HuggingFace Inference API free tier limit has been reached. "
+                "The search results listed below were retrieved but could not be "
+                "analyzed by the AI. "
                 "Please try again later, or add an OpenAI/Anthropic API key above "
                 "for unlimited access."
             ),
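Reviewer note: the first hunk shows only the tail of the quota check (`"payment required" in error_str.lower()`); the rest of the condition is outside the diff context. A minimal sketch of what such a string-based check might look like follows. The function name `is_quota_error` and the `"402"` and `"quota"` markers are assumptions drawn from the wording of the investigation doc, not from the visible code.

```python
# Markers are matched case-insensitively against the exception text.
# Only "payment required" is visible in the diff; the others are assumed.
QUOTA_MARKERS = ("402", "quota", "payment required")


def is_quota_error(exc: Exception) -> bool:
    """Hypothetical helper: guess from the message whether an error is quota-related."""
    message = str(exc).lower()
    return any(marker in message for marker in QUOTA_MARKERS)
```

Matching on message text is brittle: a bare `"402"` could appear in an unrelated message (for example inside a request ID). If the client library exposes a structured status code, checking that first would likely be more reliable.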
tests/unit/agent_factory/test_judges_hf_quota.py ADDED
@@ -0,0 +1,49 @@
+"""Unit tests for HFInferenceJudgeHandler Quota Logic."""
+
+from unittest.mock import patch
+
+import pytest
+
+from src.agent_factory.judges import HFInferenceJudgeHandler
+from src.utils.models import Citation, Evidence
+
+
+@pytest.mark.unit
+class TestHFInferenceJudgeHandlerQuota:
+    """Tests for HFInferenceJudgeHandler Quota handling."""
+
+    @pytest.mark.asyncio
+    async def test_assess_quota_exhausted(self):
+        """Test that quota exhaustion triggers fallback extraction."""
+        handler = HFInferenceJudgeHandler()
+
+        # Create some dummy evidence
+        evidence = [
+            Evidence(
+                content="Content 1",
+                citation=Citation(
+                    source="pubmed", title="Important Drug A Findings", url="u1", date="d1"
+                ),
+            ),
+            Evidence(
+                content="Content 2",
+                citation=Citation(
+                    source="pubmed", title="Clinical Trial of Drug B", url="u2", date="d2"
+                ),
+            ),
+        ]
+
+        # Mock _call_with_retry to raise a Quota error
+        with patch.object(
+            handler, "_call_with_retry", side_effect=Exception("402 Payment Required")
+        ):
+            result = await handler.assess("question", evidence)
+
+        # Check that it caught the error and stopped
+        assert result.sufficient is True
+        assert "Free Tier Quota Exceeded" in result.reasoning
+
+        # CRITICAL: Check that it extracted findings from titles
+        assert "Important Drug A Findings" in result.details.key_findings
+        assert "Clinical Trial of Drug B" in result.details.key_findings
+        assert result.details.drug_candidates == ["Upgrade to paid API for drug extraction."]
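Reviewer note: the test relies on `patch.object` replacing an `async` method with a mock that raises when awaited. The standalone sketch below illustrates that mechanism without the project code. `FakeHandler` and its methods are hypothetical stand-ins, and the fallback branch is deliberately simplified to a string return.

```python
import asyncio
from unittest.mock import patch


class FakeHandler:
    """Hypothetical stand-in for a judge handler; not the project's class."""

    async def _call_with_retry(self, prompt: str) -> str:
        return "real response"

    async def assess(self, prompt: str) -> str:
        try:
            return await self._call_with_retry(prompt)
        except Exception as exc:
            # Simplified quota check, for demonstration only.
            if "402" in str(exc):
                return "fallback"
            raise


async def demo() -> str:
    handler = FakeHandler()
    # Because the target is a coroutine function, patch.object substitutes an
    # AsyncMock (Python 3.8+); awaiting it raises the side_effect exception.
    with patch.object(
        handler, "_call_with_retry", side_effect=Exception("402 Payment Required")
    ):
        return await handler.assess("question")


result = asyncio.run(demo())
unpatched = asyncio.run(FakeHandler().assess("question"))
```

Here `result` takes the fallback path while `unpatched` reaches the original method, which is the same contrast the project test draws between a quota failure and a normal call.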