CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
PaperCast is an AI agent application that transforms research papers into engaging podcast-style audio conversations. It takes arXiv URLs or PDF uploads as input, analyzes the paper, generates a natural dialogue between a host and expert, and produces downloadable audio with distinct voices.
Target Platform: HuggingFace Spaces (Gradio 6 application)
Hackathon: MCP 1st Birthday - Track 2 (MCP in Action - Consumer)
Required Tag: mcp-in-action-track-consumer
Development Commands
Environment Setup
pip install -r requirements.txt
Running Locally
python app.py
# Or: gradio app.py
Testing on HuggingFace Spaces
The application must be deployed to HuggingFace Spaces under the MCP-1st-Birthday organization.
Architecture Overview
Core Pipeline Flow
- Input Processing: Accept arXiv URL or PDF upload
- Paper Extraction: Extract text content from PDF
- Agent Analysis: Identify paper structure (abstract, methodology, findings, conclusions)
- Script Generation: Create natural dialogue between Host and Guest characters
- Audio Synthesis: Generate audio with distinct voices for each speaker
- Output Delivery: Provide transcript and audio file for download
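For the input-processing step, one small piece that can be pinned down early is turning a pasted arXiv link into a fetchable PDF address. The sketch below is a minimal illustration, not the project's actual code: the function name `arxiv_pdf_url` is hypothetical, and it assumes only new-style arXiv identifiers (such as `1706.03762` or `1706.03762v5`) need to be supported.

```python
import re
from urllib.parse import urlparse

# New-style arXiv identifiers: four digits, a dot, four or five digits,
# and an optional version suffix such as "v2".
_NEW_STYLE_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def arxiv_pdf_url(url: str) -> str:
    """Map an arXiv /abs/ or /pdf/ link to its canonical PDF URL.

    Hypothetical helper: raises ValueError for anything that is not
    a recognisable new-style arXiv link.
    """
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host != "arxiv.org":
        raise ValueError(f"not an arXiv URL: {url!r}")

    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 2 or parts[0] not in ("abs", "pdf"):
        raise ValueError(f"unexpected arXiv path: {parsed.path!r}")

    paper_id = parts[1]
    if paper_id.endswith(".pdf"):
        paper_id = paper_id[: -len(".pdf")]
    if not _NEW_STYLE_ID.match(paper_id):
        raise ValueError(f"unrecognised arXiv identifier: {paper_id!r}")

    return f"https://arxiv.org/pdf/{paper_id}"
```

Rejecting unknown hosts up front keeps the "invalid URL" case separate from later download or parsing failures, which makes user-facing error messages easier to write.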
Agent Behaviors (Critical for Track 2)
The application MUST demonstrate autonomous agent capabilities:
- Planning: Analyze paper structure and determine conversation flow strategy
- Reasoning: Identify which concepts need simplification, determine appropriate depth
- Execution: Orchestrate multi-step pipeline (fetch → extract → analyze → generate → synthesize)
- Context Management: Maintain coherence across the dialogue, referencing earlier points
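The execution behavior (fetch, extract, analyze, generate, synthesize) could be organized around a small runner that reports progress and records which stage failed. This is a sketch under assumptions: `run_pipeline`, `StageError`, and the `(fraction, name)` progress callback are hypothetical names, and each stage is assumed to be a function that takes the previous stage's output. It covers only the plumbing; the planning and reasoning behaviors would live inside the stages themselves.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

Stage = Tuple[str, Callable[[Any], Any]]
ProgressCallback = Callable[[float, str], None]


class StageError(RuntimeError):
    """Raised when a named pipeline stage fails (hypothetical helper)."""

    def __init__(self, stage: str, cause: Exception):
        super().__init__(f"stage '{stage}' failed: {cause}")
        self.stage = stage


def run_pipeline(
    source: Any,
    stages: Sequence[Stage],
    on_progress: Optional[ProgressCallback] = None,
) -> Any:
    """Run stages in order, feeding each stage the previous result."""
    value = source
    total = len(stages)
    for index, (name, stage) in enumerate(stages, start=1):
        try:
            value = stage(value)
        except Exception as exc:
            # Keep the stage name so the UI can show a specific message.
            raise StageError(name, exc) from exc
        if on_progress is not None:
            # Fraction of stages completed so far, plus the stage name.
            on_progress(index / total, name)
    return value
```

The progress callback is kept generic so it can be wired to whatever progress indicator the interface ends up using.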
MCP Integration Requirements
Must use MCP (Model Context Protocol) servers as tools. Potential use cases:
- Web fetching for URL-based paper retrieval
- PDF processing and text extraction
- Document parsing and structured analysis
- Vector database operations if implementing RAG
Character Design
- Host: Enthusiastic, asks clarifying questions, explains for general audience, keeps conversation flowing
- Guest: Technical expert/researcher persona, provides depth, answers questions with appropriate detail
Key Technical Considerations
PDF Processing
Academic PDFs have inconsistent formatting. Robust error handling is essential:
- Handle multi-column layouts
- Extract references and citations appropriately
- Deal with equations, figures, and tables
- Support various paper formats (arXiv, PubMed, conference papers)
LLM Dialogue Generation
- Use system prompts to establish distinct character personalities
- Maintain conversation continuity (reference previous points)
- Balance technical accuracy with accessibility
- Target appropriate script length (aim for 5-15 minute podcasts)
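The 5-15 minute target can be turned into a word budget for the script generator. The sketch below assumes a speaking rate of 150 words per minute, which is a common rule of thumb and not a figure from this document; the actual rate depends on the chosen TTS voice and should be measured. Function names are hypothetical.

```python
# Assumed speaking rate; tune this against the real TTS output.
WORDS_PER_MINUTE = 150


def script_word_budget(min_minutes=5, max_minutes=15, wpm=WORDS_PER_MINUTE):
    """Return (min_words, max_words) for a target podcast duration."""
    if min_minutes <= 0 or max_minutes < min_minutes or wpm <= 0:
        raise ValueError("durations and rate must be positive and ordered")
    return min_minutes * wpm, max_minutes * wpm


def estimated_minutes(script: str, wpm=WORDS_PER_MINUTE) -> float:
    """Estimate spoken duration from a whitespace-delimited word count."""
    return len(script.split()) / wpm
```

Under the assumed rate, the default budget works out to between 750 and 2250 words, which can be passed to the LLM prompt as a length constraint and checked again after generation.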
Text-to-Speech
Critical for user experience:
- Must have clearly distinct voices for Host vs Guest
- Audio quality must be intelligible
- Processing time should be reasonable (target: under 5 minutes total)
- Consider voice emotion/intonation for natural conversation
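Before synthesis, the script has to be split into per-speaker segments so each one can be sent to the right voice. The following is a minimal sketch assuming the transcript uses `Host:` and `Guest:` line prefixes; the voice identifiers `voice_a` and `voice_b` are placeholders, not names from any particular TTS service.

```python
from typing import Dict, List, NamedTuple

# Placeholder voice identifiers; replace with the chosen TTS engine's names.
VOICES = {"Host": "voice_a", "Guest": "voice_b"}


class Segment(NamedTuple):
    speaker: str
    voice: str
    text: str


def split_script(script: str, voices: Dict[str, str] = VOICES) -> List[Segment]:
    """Split a 'Speaker: text' transcript into voice-tagged segments."""
    if len(set(voices.values())) != len(voices):
        raise ValueError("each speaker needs a distinct voice")

    segments: List[Segment] = []
    for raw_line in script.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        speaker, sep, text = line.partition(":")
        speaker = speaker.strip()
        if sep and speaker in voices:
            segments.append(Segment(speaker, voices[speaker], text.strip()))
        elif segments:
            # No known speaker prefix: treat as a continuation of the last turn.
            last = segments[-1]
            segments[-1] = last._replace(text=f"{last.text} {line}")
        else:
            raise ValueError(f"script must start with a speaker label: {line!r}")
    return segments
```

Checking that the voices are distinct at this point catches a misconfigured mapping before any audio is generated.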
Performance & UX
- Processing can take 2-5 minutes - show clear progress indicators
- Consider async operations for long-running tasks
- Implement graceful error handling (invalid URLs, corrupted PDFs, API failures)
- Optional: Allow script preview before audio generation
- Cache generated podcasts to avoid reprocessing
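Caching can key on the paper's content, not its URL or filename, so the same PDF reached by two routes is processed once. This sketch assumes audio is stored as `<key>.mp3` in a cache directory; the function names, the `.mp3` extension, and the `settings` string (for example, a voice or length choice) are all illustrative assumptions.

```python
import hashlib
from pathlib import Path
from typing import Optional


def cache_key(pdf_bytes: bytes, settings: str = "") -> str:
    """Derive a stable key from the PDF content and generation settings."""
    digest = hashlib.sha256()
    digest.update(pdf_bytes)
    # Separator so content and settings cannot run together ambiguously.
    digest.update(b"\0")
    digest.update(settings.encode("utf-8"))
    return digest.hexdigest()


def cached_audio_path(cache_dir: Path, key: str) -> Optional[Path]:
    """Return the cached audio file for this key, or None if absent."""
    candidate = cache_dir / f"{key}.mp3"
    return candidate if candidate.exists() else None
```

Including the settings in the key means a change of voices or target length produces a fresh podcast without reusing stale audio.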
Free/Open Source Priority
Budget is limited - prioritize freely available solutions:
- HuggingFace hosted models where possible
- Open source libraries (PyMuPDF, pdfplumber, etc.)
- Free tier APIs within rate limits
- Self-hosted components on HF Spaces infrastructure
Gradio 6 Interface Requirements
The UI should be simple and intuitive:
- Input section: URL input field + PDF upload (mutually exclusive or combined)
- Processing section: Clear status messages and progress indicators
- Output section:
  - Audio player for immediate listening
  - Download buttons for audio file and transcript
  - Display transcript with speaker labels
- Error messages should be user-friendly
Submission Requirements Checklist
Required for valid submission:
- Working Gradio app deployed to HuggingFace Space
- Published under MCP-1st-Birthday organization (not personal profile)
- README.md includes mcp-in-action-track-consumer tag
- Demo video (1-5 minutes) showing project in action
- Social media post link (X/LinkedIn) in README
- Clear documentation of purpose, usage, and technical approach
- All dependencies in requirements.txt
- Team member HuggingFace usernames in README
Judging Criteria Priority
When making design decisions, optimize for:
- Completeness: All deliverables submitted
- Design/UI-UX: Intuitive, polished interface
- Functionality: Effective use of Gradio 6, MCPs, and agent capabilities
- Creativity: Innovative approach to the problem
- Documentation: Clear README and demo video
- Real-world impact: Practical usefulness
Critical Implementation Notes
Agent vs API Chaining
This must demonstrate true agent behavior, not just API chaining:
- Show decision-making (e.g., determining which sections to emphasize)
- Demonstrate adaptive behavior (e.g., different strategies for different paper types)
- Use MCP servers as tools the agent reasons about, not just sequential calls
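One way to make the decision-making visible is to have the agent emit an explicit plan before any script is generated. The sketch below is a rule-based stand-in for that step, not the agent itself: in the real application the LLM would be expected to produce this plan. The `PaperProfile` fields, the strategy names, and the equation threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PaperProfile:
    """Rough facts about a paper gathered during analysis (hypothetical)."""
    section_words: Dict[str, int]
    equation_count: int = 0


def plan_conversation(
    profile: PaperProfile,
    total_minutes: float = 10.0,
    math_threshold: int = 20,
) -> dict:
    """Pick a dialogue strategy and split the time budget across sections."""
    # Math-heavy papers get an analogy-driven approach; others a walkthrough.
    if profile.equation_count >= math_threshold:
        strategy = "intuition-first"
    else:
        strategy = "walkthrough"

    # Skip sections the paper does not have, then weight by length.
    present = {name: n for name, n in profile.section_words.items() if n > 0}
    total_words = sum(present.values())
    if total_words == 0:
        raise ValueError("paper has no usable sections")

    minutes = {
        name: total_minutes * words / total_words
        for name, words in present.items()
    }
    return {"strategy": strategy, "minutes": minutes}
```

Surfacing the plan in the UI or logs also gives the demo video something concrete to show when explaining how behavior differs between paper types.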
Natural Dialogue Generation
Avoid robotic Q&A format:
- Use conversational connectors ("That's fascinating...", "Building on that point...")
- Include natural reactions and acknowledgments
- Vary sentence structure and length
- Use analogies and examples appropriate for general audience
- Host should ask genuine questions that guide the conversation
Testing Strategy
Test with diverse paper types:
- Different fields (CS, biology, physics, social sciences)
- Various lengths (short letters vs full papers)
- Different repositories (arXiv, bioRxiv, PubMed)
- Papers with heavy math vs conceptual papers
File Organization (Recommended)
papercast/
├── app.py # Main Gradio application
├── requirements.txt # Python dependencies
├── README.md # Project documentation (must include track tag)
├── agents/ # Agent logic and orchestration
├── mcp_servers/ # MCP server integrations
├── processing/ # PDF extraction and text processing
├── generation/ # Script and dialogue generation
├── synthesis/ # Text-to-speech audio generation
└── utils/ # Helper functions
Known Constraints
- Deadline: November 30, 2025, 11:59 PM UTC
- Must be original work created November 14-30, 2025
- HuggingFace Spaces free tier (GPU available)
- Processing time target: under 5 minutes per paper
- All work must demonstrate MCP integration
Reference Materials
- Project brief: PAPERCAST_PROJECT_BRIEF.md
- Gradio 6 docs: https://www.gradio.app/
- MCP documentation: https://huggingface.co/blog/gradio-mcp
- Hackathon page: https://huggingface.co/MCP-1st-Birthday