papercast / CLAUDE.md
batuhanozkose
feat: Implement initial PaperCast application with core modules, documentation, a periodic curl script, and a Gradio certificate.
472739a

A newer version of the Gradio SDK is available: 6.2.0

Upgrade

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

PaperCast is an AI agent application that transforms research papers into engaging podcast-style audio conversations. It takes arXiv URLs or PDF uploads as input, analyzes the paper, generates a natural dialogue between a host and expert, and produces downloadable audio with distinct voices.

Target Platform: HuggingFace Spaces (Gradio 6 application)
Hackathon: MCP 1st Birthday - Track 2 (MCP in Action - Consumer)
Required Tag: mcp-in-action-track-consumer

Development Commands

Environment Setup

pip install -r requirements.txt

Running Locally

python app.py
# Or: gradio app.py

Testing on HuggingFace Spaces

The application must be deployed to HuggingFace Spaces under the MCP-1st-Birthday organization.

Architecture Overview

Core Pipeline Flow

  1. Input Processing: Accept arXiv URL or PDF upload
  2. Paper Extraction: Extract text content from PDF
  3. Agent Analysis: Identify paper structure (abstract, methodology, findings, conclusions)
  4. Script Generation: Create natural dialogue between Host and Guest characters
  5. Audio Synthesis: Generate audio with distinct voices for each speaker
  6. Output Delivery: Provide transcript and audio file for download

Agent Behaviors (Critical for Track 2)

The application MUST demonstrate autonomous agent capabilities:

  • Planning: Analyze paper structure and determine conversation flow strategy
  • Reasoning: Identify which concepts need simplification, determine appropriate depth
  • Execution: Orchestrate multi-step pipeline (fetch β†’ extract β†’ analyze β†’ generate β†’ synthesize)
  • Context Management: Maintain coherence across the dialogue, referencing earlier points

MCP Integration Requirements

Must use MCP (Model Context Protocol) servers as tools. Potential use cases:

  • Web fetching for URL-based paper retrieval
  • PDF processing and text extraction
  • Document parsing and structured analysis
  • Vector database operations if implementing RAG

Character Design

  • Host: Enthusiastic, asks clarifying questions, explains for general audience, keeps conversation flowing
  • Guest: Technical expert/researcher persona, provides depth, answers questions with appropriate detail

Key Technical Considerations

PDF Processing

Academic PDFs have inconsistent formatting. Robust error handling is essential:

  • Handle multi-column layouts
  • Extract references and citations appropriately
  • Deal with equations, figures, and tables
  • Support various paper formats (arXiv, PubMed, conference papers)

LLM Dialogue Generation

  • Use system prompts to establish distinct character personalities
  • Maintain conversation continuity (reference previous points)
  • Balance technical accuracy with accessibility
  • Target appropriate script length (aim for 5-15 minute podcasts)

Text-to-Speech

Critical for user experience:

  • Must have clearly distinct voices for Host vs Guest
  • Audio quality must be intelligible
  • Processing time should be reasonable (target: under 5 minutes total)
  • Consider voice emotion/intonation for natural conversation

Performance & UX

  • Processing can take 2-5 minutes - show clear progress indicators
  • Consider async operations for long-running tasks
  • Implement graceful error handling (invalid URLs, corrupted PDFs, API failures)
  • Optional: Allow script preview before audio generation
  • Cache generated podcasts to avoid reprocessing

Free/Open Source Priority

Budget is limited - prioritize freely available solutions:

  • HuggingFace hosted models where possible
  • Open source libraries (PyMuPDF, pdfplumber, etc.)
  • Free tier APIs within rate limits
  • Self-hosted components on HF Spaces infrastructure

Gradio 6 Interface Requirements

The UI should be simple and intuitive:

  • Input section: URL input field + PDF upload (mutually exclusive or combined)
  • Processing section: Clear status messages and progress indicators
  • Output section:
    • Audio player for immediate listening
    • Download buttons for audio file and transcript
    • Display transcript with speaker labels
  • Error messages should be user-friendly

Submission Requirements Checklist

Required for valid submission:

  • Working Gradio app deployed to HuggingFace Space
  • Published under MCP-1st-Birthday organization (not personal profile)
  • README.md includes mcp-in-action-track-consumer tag
  • Demo video (1-5 minutes) showing project in action
  • Social media post link (X/LinkedIn) in README
  • Clear documentation of purpose, usage, and technical approach
  • All dependencies in requirements.txt
  • Team member HuggingFace usernames in README

Judging Criteria Priority

When making design decisions, optimize for:

  1. Completeness: All deliverables submitted
  2. Design/UI-UX: Intuitive, polished interface
  3. Functionality: Effective use of Gradio 6, MCPs, and agent capabilities
  4. Creativity: Innovative approach to the problem
  5. Documentation: Clear README and demo video
  6. Real-world impact: Practical usefulness

Critical Implementation Notes

Agent vs API Chaining

This must demonstrate true agent behavior, not just API chaining:

  • Show decision-making (e.g., determining which sections to emphasize)
  • Demonstrate adaptive behavior (e.g., different strategies for different paper types)
  • Use MCP servers as tools the agent reasons about, not just sequential calls

Natural Dialogue Generation

Avoid robotic Q&A format:

  • Use conversational connectors ("That's fascinating...", "Building on that point...")
  • Include natural reactions and acknowledgments
  • Vary sentence structure and length
  • Use analogies and examples appropriate for general audience
  • Host should ask genuine questions that guide the conversation

Testing Strategy

Test with diverse paper types:

  • Different fields (CS, biology, physics, social sciences)
  • Various lengths (short letters vs full papers)
  • Different repositories (arXiv, bioRxiv, PubMed)
  • Papers with heavy math vs conceptual papers

File Organization (Recommended)

papercast/
β”œβ”€β”€ app.py                 # Main Gradio application
β”œβ”€β”€ requirements.txt       # Python dependencies
β”œβ”€β”€ README.md             # Project documentation (must include track tag)
β”œβ”€β”€ agents/               # Agent logic and orchestration
β”œβ”€β”€ mcp_servers/          # MCP server integrations
β”œβ”€β”€ processing/           # PDF extraction and text processing
β”œβ”€β”€ generation/           # Script and dialogue generation
β”œβ”€β”€ synthesis/            # Text-to-speech audio generation
└── utils/                # Helper functions

Known Constraints

  • Deadline: November 30, 2025, 11:59 PM UTC
  • Must be original work created November 14-30, 2025
  • HuggingFace Spaces free tier (GPU available)
  • Processing time target: under 5 minutes per paper
  • All work must demonstrate MCP integration

Reference Materials