Spaces:

MCP-1st-Birthday
/

papercast

Running

papercast / CLAUDE.md

batuhanozkose

feat: Implement initial PaperCast application with core modules, documentation, a periodic curl script, and a Gradio certificate.

472739a about 1 month ago

preview code

raw

history blame contribute delete

7.32 kB

A newer version of the Gradio SDK is available: 6.2.0

Upgrade

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

PaperCast is an AI agent application that transforms research papers into engaging podcast-style audio conversations. It takes arXiv URLs or PDF uploads as input, analyzes the paper, generates a natural dialogue between a host and expert, and produces downloadable audio with distinct voices.

Target Platform: HuggingFace Spaces (Gradio 6 application)
Hackathon: MCP 1st Birthday - Track 2 (MCP in Action - Consumer)
Required Tag: mcp-in-action-track-consumer

Development Commands

Environment Setup

pip install -r requirements.txt

Running Locally

python app.py
# Or: gradio app.py

Testing on HuggingFace Spaces

The application must be deployed to HuggingFace Spaces under the MCP-1st-Birthday organization.

Architecture Overview

Core Pipeline Flow

Input Processing: Accept arXiv URL or PDF upload
Paper Extraction: Extract text content from PDF
Agent Analysis: Identify paper structure (abstract, methodology, findings, conclusions)
Script Generation: Create natural dialogue between Host and Guest characters
Audio Synthesis: Generate audio with distinct voices for each speaker
Output Delivery: Provide transcript and audio file for download

Agent Behaviors (Critical for Track 2)

The application MUST demonstrate autonomous agent capabilities:

Planning: Analyze paper structure and determine conversation flow strategy
Reasoning: Identify which concepts need simplification, determine appropriate depth
Execution: Orchestrate multi-step pipeline (fetch → extract → analyze → generate → synthesize)
Context Management: Maintain coherence across the dialogue, referencing earlier points

MCP Integration Requirements

Must use MCP (Model Context Protocol) servers as tools. Potential use cases:

Web fetching for URL-based paper retrieval
PDF processing and text extraction
Document parsing and structured analysis
Vector database operations if implementing RAG

Character Design

Host: Enthusiastic, asks clarifying questions, explains for general audience, keeps conversation flowing
Guest: Technical expert/researcher persona, provides depth, answers questions with appropriate detail

Key Technical Considerations

PDF Processing

Academic PDFs have inconsistent formatting. Robust error handling is essential:

Handle multi-column layouts
Extract references and citations appropriately
Deal with equations, figures, and tables
Support various paper formats (arXiv, PubMed, conference papers)

LLM Dialogue Generation

Use system prompts to establish distinct character personalities
Maintain conversation continuity (reference previous points)
Balance technical accuracy with accessibility
Target appropriate script length (aim for 5-15 minute podcasts)

Text-to-Speech

Critical for user experience:

Must have clearly distinct voices for Host vs Guest
Audio quality must be intelligible
Processing time should be reasonable (target: under 5 minutes total)
Consider voice emotion/intonation for natural conversation

Performance & UX

Processing can take 2-5 minutes - show clear progress indicators
Consider async operations for long-running tasks
Implement graceful error handling (invalid URLs, corrupted PDFs, API failures)
Optional: Allow script preview before audio generation
Cache generated podcasts to avoid reprocessing

Free/Open Source Priority

Budget is limited - prioritize freely available solutions:

HuggingFace hosted models where possible
Open source libraries (PyMuPDF, pdfplumber, etc.)
Free tier APIs within rate limits
Self-hosted components on HF Spaces infrastructure

Gradio 6 Interface Requirements

The UI should be simple and intuitive:

Input section: URL input field + PDF upload (mutually exclusive or combined)
Processing section: Clear status messages and progress indicators
Output section:
- Audio player for immediate listening
- Download buttons for audio file and transcript
- Display transcript with speaker labels
Error messages should be user-friendly

Submission Requirements Checklist

Required for valid submission:

Working Gradio app deployed to HuggingFace Space
Published under MCP-1st-Birthday organization (not personal profile)
README.md includes mcp-in-action-track-consumer tag
Demo video (1-5 minutes) showing project in action
Social media post link (X/LinkedIn) in README
Clear documentation of purpose, usage, and technical approach
All dependencies in requirements.txt
Team member HuggingFace usernames in README

Judging Criteria Priority

When making design decisions, optimize for:

Completeness: All deliverables submitted
Design/UI-UX: Intuitive, polished interface
Functionality: Effective use of Gradio 6, MCPs, and agent capabilities
Creativity: Innovative approach to the problem
Documentation: Clear README and demo video
Real-world impact: Practical usefulness

Critical Implementation Notes

Agent vs API Chaining

This must demonstrate true agent behavior, not just API chaining:

Show decision-making (e.g., determining which sections to emphasize)
Demonstrate adaptive behavior (e.g., different strategies for different paper types)
Use MCP servers as tools the agent reasons about, not just sequential calls

Natural Dialogue Generation

Avoid robotic Q&A format:

Use conversational connectors ("That's fascinating...", "Building on that point...")
Include natural reactions and acknowledgments
Vary sentence structure and length
Use analogies and examples appropriate for general audience
Host should ask genuine questions that guide the conversation

Testing Strategy

Test with diverse paper types:

Different fields (CS, biology, physics, social sciences)
Various lengths (short letters vs full papers)
Different repositories (arXiv, bioRxiv, PubMed)
Papers with heavy math vs conceptual papers

File Organization (Recommended)

papercast/
├── app.py                 # Main Gradio application
├── requirements.txt       # Python dependencies
├── README.md             # Project documentation (must include track tag)
├── agents/               # Agent logic and orchestration
├── mcp_servers/          # MCP server integrations
├── processing/           # PDF extraction and text processing
├── generation/           # Script and dialogue generation
├── synthesis/            # Text-to-speech audio generation
└── utils/                # Helper functions

Known Constraints

Deadline: November 30, 2025, 11:59 PM UTC
Must be original work created November 14-30, 2025
HuggingFace Spaces free tier (GPU available)
Processing time target: under 5 minutes per paper
All work must demonstrate MCP integration

Reference Materials

Project brief: PAPERCAST_PROJECT_BRIEF.md
Gradio 6 docs: https://www.gradio.app/
MCP documentation: https://huggingface.co/blog/gradio-mcp
Hackathon page: https://huggingface.co/MCP-1st-Birthday