batuhanozkose committed
Commit 472739a · 0 Parent(s)

feat: Implement initial PaperCast application with core modules, documentation, a periodic curl script, and a Gradio certificate.

.gitignore ADDED
@@ -0,0 +1,53 @@
1
+ # Python
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+ *.so
6
+ .Python
7
+ build/
8
+ develop-eggs/
9
+ dist/
10
+ downloads/
11
+ eggs/
12
+ .eggs/
13
+ lib/
14
+ lib64/
15
+ parts/
16
+ sdist/
17
+ var/
18
+ wheels/
19
+ *.egg-info/
20
+ .installed.cfg
21
+ *.egg
22
+
23
+ # Virtual Environment
24
+ venv/
25
+ env/
26
+ ENV/
27
+ .venv
28
+
29
+ # Environment Variables
30
+ .env
31
+ .env.local
32
+
33
+ # IDEs
34
+ .vscode/
35
+ .idea/
36
+ *.swp
37
+ *.swo
38
+ *~
39
+
40
+ # OS
41
+ .DS_Store
42
+ Thumbs.db
43
+
44
+ # Project Specific
45
+ *.pdf
46
+ *.mp3
47
+ *.wav
48
+ cache/
49
+ outputs/
50
+ temp/
51
+
52
+ # HuggingFace
53
+ .cache/
.gradio/certificate.pem ADDED
@@ -0,0 +1,31 @@
1
+ -----BEGIN CERTIFICATE-----
2
+ MIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw
3
+ TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
4
+ cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4
5
+ WhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu
6
+ ZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY
7
+ MTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc
8
+ h77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+
9
+ 0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U
10
+ A5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW
11
+ T8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH
12
+ B5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC
13
+ B5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv
14
+ KBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn
15
+ OlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn
16
+ jh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw
17
+ qHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI
18
+ rU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV
19
+ HRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq
20
+ hkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL
21
+ ubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ
22
+ 3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK
23
+ NFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5
24
+ ORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur
25
+ TkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC
26
+ jNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc
27
+ oyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq
28
+ 4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA
29
+ mRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d
30
+ emyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=
31
+ -----END CERTIFICATE-----
CLAUDE.md ADDED
@@ -0,0 +1,177 @@
1
+ # CLAUDE.md
2
+
3
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4
+
5
+ ## Project Overview
6
+
7
+ PaperCast is an AI agent application that transforms research papers into engaging podcast-style audio conversations. It takes arXiv URLs or PDF uploads as input, analyzes the paper, generates a natural dialogue between a host and expert, and produces downloadable audio with distinct voices.
8
+
9
+ **Target Platform:** HuggingFace Spaces (Gradio 6 application)
10
+ **Hackathon:** MCP 1st Birthday - Track 2 (MCP in Action - Consumer)
11
+ **Required Tag:** `mcp-in-action-track-consumer`
12
+
13
+ ## Development Commands
14
+
15
+ ### Environment Setup
16
+ ```bash
17
+ pip install -r requirements.txt
18
+ ```
19
+
20
+ ### Running Locally
21
+ ```bash
22
+ python app.py
23
+ # Or: gradio app.py
24
+ ```
25
+
26
+ ### Testing on HuggingFace Spaces
27
+ The application must be deployed to HuggingFace Spaces under the `MCP-1st-Birthday` organization.
28
+
29
+ ## Architecture Overview
30
+
31
+ ### Core Pipeline Flow
32
+ 1. **Input Processing**: Accept arXiv URL or PDF upload
33
+ 2. **Paper Extraction**: Extract text content from PDF
34
+ 3. **Agent Analysis**: Identify paper structure (abstract, methodology, findings, conclusions)
35
+ 4. **Script Generation**: Create natural dialogue between Host and Guest characters
36
+ 5. **Audio Synthesis**: Generate audio with distinct voices for each speaker
37
+ 6. **Output Delivery**: Provide transcript and audio file for download
38
+
39
+ ### Agent Behaviors (Critical for Track 2)
40
+ The application MUST demonstrate autonomous agent capabilities:
41
+ - **Planning**: Analyze paper structure and determine conversation flow strategy
42
+ - **Reasoning**: Identify which concepts need simplification, determine appropriate depth
43
+ - **Execution**: Orchestrate multi-step pipeline (fetch → extract → analyze → generate → synthesize)
44
+ - **Context Management**: Maintain coherence across the dialogue, referencing earlier points
45
+
46
+ ### MCP Integration Requirements
47
+ Must use MCP (Model Context Protocol) servers as tools. Potential use cases:
48
+ - Web fetching for URL-based paper retrieval
49
+ - PDF processing and text extraction
50
+ - Document parsing and structured analysis
51
+ - Vector database operations if implementing RAG
52
+
53
+ ### Character Design
54
+ - **Host**: Enthusiastic, asks clarifying questions, explains for general audience, keeps conversation flowing
55
+ - **Guest**: Technical expert/researcher persona, provides depth, answers questions with appropriate detail
56
+
57
+ ## Key Technical Considerations
58
+
59
+ ### PDF Processing
60
+ Academic PDFs have inconsistent formatting. Robust error handling is essential:
61
+ - Handle multi-column layouts
62
+ - Extract references and citations appropriately
63
+ - Deal with equations, figures, and tables
64
+ - Support various paper formats (arXiv, PubMed, conference papers)
65
+
66
+ ### LLM Dialogue Generation
67
+ - Use system prompts to establish distinct character personalities
68
+ - Maintain conversation continuity (reference previous points)
69
+ - Balance technical accuracy with accessibility
70
+ - Target appropriate script length (aim for 5-15 minute podcasts)
71
+
72
+ ### Text-to-Speech
73
+ Critical for user experience:
74
+ - Must have clearly distinct voices for Host vs Guest
75
+ - Audio quality must be intelligible
76
+ - Processing time should be reasonable (target: under 5 minutes total)
77
+ - Consider voice emotion/intonation for natural conversation
78
+
79
+ ### Performance & UX
80
+ - Processing can take 2-5 minutes - show clear progress indicators
81
+ - Consider async operations for long-running tasks
82
+ - Implement graceful error handling (invalid URLs, corrupted PDFs, API failures)
83
+ - Optional: Allow script preview before audio generation
84
+ - Cache generated podcasts to avoid reprocessing
85
+
86
+ ### Free/Open Source Priority
87
+ Budget is limited - prioritize freely available solutions:
88
+ - HuggingFace hosted models where possible
89
+ - Open source libraries (PyMuPDF, pdfplumber, etc.)
90
+ - Free tier APIs within rate limits
91
+ - Self-hosted components on HF Spaces infrastructure
92
+
93
+ ## Gradio 6 Interface Requirements
94
+
95
+ The UI should be simple and intuitive:
96
+ - Input section: URL input field + PDF upload (mutually exclusive or combined)
97
+ - Processing section: Clear status messages and progress indicators
98
+ - Output section:
99
+ - Audio player for immediate listening
100
+ - Download buttons for audio file and transcript
101
+ - Display transcript with speaker labels
102
+ - Error messages should be user-friendly
103
+
104
+ ## Submission Requirements Checklist
105
+
106
+ Required for valid submission:
107
+ - [ ] Working Gradio app deployed to HuggingFace Space
108
+ - [ ] Published under `MCP-1st-Birthday` organization (not personal profile)
109
+ - [ ] README.md includes `mcp-in-action-track-consumer` tag
110
+ - [ ] Demo video (1-5 minutes) showing project in action
111
+ - [ ] Social media post link (X/LinkedIn) in README
112
+ - [ ] Clear documentation of purpose, usage, and technical approach
113
+ - [ ] All dependencies in requirements.txt
114
+ - [ ] Team member HuggingFace usernames in README
115
+
116
+ ## Judging Criteria Priority
117
+
118
+ When making design decisions, optimize for:
119
+ 1. **Completeness**: All deliverables submitted
120
+ 2. **Design/UI-UX**: Intuitive, polished interface
121
+ 3. **Functionality**: Effective use of Gradio 6, MCPs, and agent capabilities
122
+ 4. **Creativity**: Innovative approach to the problem
123
+ 5. **Documentation**: Clear README and demo video
124
+ 6. **Real-world impact**: Practical usefulness
125
+
126
+ ## Critical Implementation Notes
127
+
128
+ ### Agent vs API Chaining
129
+ This must demonstrate true agent behavior, not just API chaining:
130
+ - Show decision-making (e.g., determining which sections to emphasize)
131
+ - Demonstrate adaptive behavior (e.g., different strategies for different paper types)
132
+ - Use MCP servers as tools the agent reasons about, not just sequential calls
133
+
134
+ ### Natural Dialogue Generation
135
+ Avoid robotic Q&A format:
136
+ - Use conversational connectors ("That's fascinating...", "Building on that point...")
137
+ - Include natural reactions and acknowledgments
138
+ - Vary sentence structure and length
139
+ - Use analogies and examples appropriate for general audience
140
+ - Host should ask genuine questions that guide the conversation
141
+
142
+ ### Testing Strategy
143
+ Test with diverse paper types:
144
+ - Different fields (CS, biology, physics, social sciences)
145
+ - Various lengths (short letters vs full papers)
146
+ - Different repositories (arXiv, bioRxiv, PubMed)
147
+ - Papers with heavy math vs conceptual papers
148
+
149
+ ## File Organization (Recommended)
150
+
151
+ ```
152
+ papercast/
153
+ ├── app.py # Main Gradio application
154
+ ├── requirements.txt # Python dependencies
155
+ ├── README.md # Project documentation (must include track tag)
156
+ ├── agents/ # Agent logic and orchestration
157
+ ├── mcp_servers/ # MCP server integrations
158
+ ├── processing/ # PDF extraction and text processing
159
+ ├── generation/ # Script and dialogue generation
160
+ ├── synthesis/ # Text-to-speech audio generation
161
+ └── utils/ # Helper functions
162
+ ```
163
+
164
+ ## Known Constraints
165
+
166
+ - Deadline: November 30, 2025, 11:59 PM UTC
167
+ - Must be original work created November 14-30, 2025
168
+ - HuggingFace Spaces free tier (GPU available)
169
+ - Processing time target: under 5 minutes per paper
170
+ - All work must demonstrate MCP integration
171
+
172
+ ## Reference Materials
173
+
174
+ - Project brief: `PAPERCAST_PROJECT_BRIEF.md`
175
+ - Gradio 6 docs: https://www.gradio.app/
176
+ - MCP documentation: https://huggingface.co/blog/gradio-mcp
177
+ - Hackathon page: https://huggingface.co/MCP-1st-Birthday
PAPERCAST_PROJECT_BRIEF.md ADDED
@@ -0,0 +1,332 @@
1
+ # PaperCast - Project Brief
2
+
3
+ ## Hackathon Context
4
+
5
+ ### Event Details
6
+ - **Name:** MCP's 1st Birthday Hackathon
7
+ - **Organizers:** Anthropic & Gradio
8
+ - **Duration:** November 14-30, 2025 (17 days, 3 weekends)
9
+ - **Total Prize Pool:** $21,000 USD + API Credits
10
+ - **Total Registrations:** 6100+
11
+ - **Platform:** HuggingFace Spaces
12
+
13
+ ### Our Track: Track 2 - MCP in Action (Agents)
14
+
15
+ **Track Description:** Create complete AI agent Gradio applications that showcase autonomous reasoning, planning, and execution using MCP tools.
16
+
17
+ **Category:** Consumer Applications
18
+ - **Tag Required:** `mcp-in-action-track-consumer`
19
+ - **Prize Pool Per Category:**
20
+ - 🥇 First Place: $2,500 USD
21
+ - 🥈 Second Place: $1,000 USD
22
+ - 🥉 Third Place: $500 USD
23
+
24
+ ### Judging Criteria (Priority Order)
25
+ 1. **Completeness:** HF Space + Social media post + Documentation + Demo Video
26
+ 2. **Design/Polished UI-UX:** How intuitive and easy-to-use the app is
27
+ 3. **Functionality:** Effective use of Gradio 6, MCPs, Agentic capabilities
28
+ 4. **Creativity:** Innovation in idea and implementation
29
+ 5. **Documentation:** Clear communication in README and demo video
30
+ 6. **Real-world impact:** Potential for practical usefulness
31
+
32
+ ### Technical Requirements
33
+ - Must be published as HuggingFace Space under `MCP-1st-Birthday` organization
34
+ - Must be a Gradio application
35
+ - Must demonstrate autonomous agent behavior (planning, reasoning, execution)
36
+ - Must use MCP servers as tools
37
+ - Bonus points for: RAG, Context Engineering, advanced agent features
38
+ - All work must be original and created during Nov 14-30
39
+
40
+ ### Submission Requirements
41
+ 1. Working Gradio app deployed on HuggingFace Space
42
+ 2. Track tag in README.md: `mcp-in-action-track-consumer`
43
+ 3. Demo video (1-5 minutes) showing project in action
44
+ 4. Social media post link (X/LinkedIn) about the project
45
+ 5. Clear documentation of purpose, usage, and technical approach
46
+
47
+ ### Available Credits (For Registered Participants)
48
+ - OpenAI: $25 for all participants
49
+ - HuggingFace: $25 for all participants
50
+ - Modal: $250 for all participants
51
+ - Nebius Token Factory: $50 for all participants
52
+ - ElevenLabs: $44 membership credits (for 5000 participants)
53
+ - SambaNova: $25 (for 1500 participants)
54
+
55
+ **Note:** Credits are provided to support hackathon development, but availability and timing may vary. Build with freely available alternatives as the primary approach.
56
+
57
+ ---
58
+
59
+ ## Project Vision: PaperCast
60
+
61
+ ### The Problem
62
+ Research papers are incredibly valuable but present significant accessibility challenges:
63
+ - Dense, technical language requiring domain expertise
64
+ - Time-consuming to read (typically 30-60+ minutes per paper)
65
+ - Difficult to consume during daily activities (commute, exercise, chores)
66
+ - Creates a barrier between cutting-edge research and broader audiences
67
+
68
+ ### Our Solution
69
+ **PaperCast:** An AI agent that transforms research papers into engaging podcast-style conversations between a host and an expert, making complex research accessible through audio.
70
+
71
+ ### Core Value Proposition
72
+ - **Input:** arXiv/PubMed URL or PDF upload
73
+ - **Process:** AI analyzes and generates natural dialogue between two speakers
74
+ - **Output:** Downloadable podcast audio file + transcript
75
+ - **Benefit:** Consume research during any activity, in accessible language
76
+
77
+ ### Target Users
78
+ 1. **Researchers/Academics:** Stay current with literature during commutes
79
+ 2. **Students:** Understand papers more easily through conversational format
80
+ 3. **Industry Professionals:** Keep up with relevant research without a heavy time investment
81
+ 4. **Science Enthusiasts:** Access cutting-edge findings in digestible format
82
+
83
+ ---
84
+
85
+ ## Functional Requirements
86
+
87
+ ### Input Methods (Dual Support)
88
+ 1. **URL Input:** Accept links from research repositories
89
+ - arXiv (e.g., `https://arxiv.org/abs/2401.12345`)
90
+ - PubMed, bioRxiv, other common repositories
91
+ - Extract PDF from URL
92
+
93
+ 2. **PDF Upload:** Direct file upload
94
+ - Support standard academic paper PDFs
95
+ - Handle various formatting styles
96
+
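+ For arXiv specifically, the abstract URL can be mapped to the direct PDF URL (sketch; the real fetching code may normalize URLs differently):
+
+ ```python
+ import re
+
+ def arxiv_pdf_url(url: str) -> str | None:
+     """Map an arXiv abstract URL to its PDF URL,
+     e.g. https://arxiv.org/abs/2401.12345 -> https://arxiv.org/pdf/2401.12345"""
+     match = re.search(r"arxiv\.org/(?:abs|pdf)/([\w./-]+?)(?:\.pdf)?$", url)
+     return f"https://arxiv.org/pdf/{match.group(1)}" if match else None
+ ```
+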
97
+ ### Core Processing Pipeline
98
+ 1. **Paper Extraction:** Extract text content from PDF or fetched document
99
+ 2. **Analysis:** Identify key components (abstract, methodology, findings, conclusions)
100
+ 3. **Script Generation:** Create natural dialogue between two speakers:
101
+ - **Host Character:** Enthusiastic, asks clarifying questions, explains for general audience
102
+ - **Guest Character:** The expert/researcher, provides technical depth
103
+ - Natural conversation flow with context awareness
104
+ - Appropriate analogies and examples for accessibility
105
+ 4. **Audio Synthesis:** Convert dialogue to audio with distinct voices for each speaker
106
+ 5. **Output Delivery:** Provide both transcript and audio file
107
+
108
+ ### Agentic Behaviors to Demonstrate
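+ One convenient shape for the generated script is an ordered list of speaker turns, which maps directly to both the transcript and per-turn TTS (illustrative; the final format may differ):
+
+ ```python
+ script = [
+     {"speaker": "Host",  "text": "Today we're looking at a paper on ..."},
+     {"speaker": "Guest", "text": "Right, the key idea is ..."},
+     {"speaker": "Host",  "text": "Building on that point, how does it compare to ...?"},
+ ]
+
+ transcript = "\n".join(f"{turn['speaker']}: {turn['text']}" for turn in script)
+ ```
+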
109
+ - **Planning:** Analyze paper structure and determine conversation flow
110
+ - **Reasoning:** Identify which concepts need simplification or elaboration
111
+ - **Execution:** Orchestrate multiple steps (fetch → extract → analyze → generate → synthesize)
112
+ - **Context Management:** Maintain coherence across the dialogue
113
+
114
+ ### User Experience Requirements
115
+ - Simple, clean interface (Gradio 6)
116
+ - Clear loading states during processing (can take 2-5 minutes)
117
+ - Preview of generated script before audio synthesis (optional)
118
+ - Audio player for immediate listening
119
+ - Download options for both audio and transcript
120
+ - Error handling for invalid URLs or corrupted PDFs
121
+
122
+ ---
123
+
124
+ ## Technical Constraints & Considerations
125
+
126
+ ### Platform & Framework
127
+ - **Primary Framework:** Gradio 6 (latest version)
128
+ - **Deployment:** HuggingFace Spaces (free tier with GPU)
129
+ - **Language:** Python
130
+
131
+ ### MCP Integration
132
+ Must use MCP (Model Context Protocol) servers as tools. Potential MCP server use cases:
133
+ - Web fetching for URL-based paper retrieval
134
+ - PDF processing and text extraction
135
+ - Vector database operations for RAG
136
+ - Document parsing and analysis
137
+
138
+ ### Architecture Considerations
139
+ - Process can be computationally expensive (LLM calls, TTS generation)
140
+ - Consider async operations and progress indicators
141
+ - Graceful degradation if services are unavailable
142
+ - Caching strategies to avoid reprocessing same papers
143
+
144
+ ### Free/Open Source Priority
145
+ Since budget is limited, prioritize freely available solutions:
146
+ - Open source models and libraries
147
+ - Free tier APIs (within rate limits)
148
+ - HuggingFace ecosystem tools
149
+ - Self-hosted components where feasible
150
+
151
+ **Strategy:** Build core functionality with free tools, then optionally enhance with hackathon credits if/when available.
152
+
153
+ ---
154
+
155
+ ## Success Metrics
156
+
157
+ ### Minimum Viable Product (MVP)
158
+ - Accept arXiv URL or PDF upload ✓
159
+ - Extract paper text ✓
160
+ - Generate coherent dialogue script ✓
161
+ - Produce audio with 2 distinct speakers ✓
162
+ - Deployed and functional on HF Space ✓
163
+
164
+ ### Enhanced Version (If Time Permits)
165
+ - Multiple paper repository support
166
+ - Customizable podcast length (5 min vs 15 min versions)
167
+ - Voice selection or style options
168
+ - Background music/intro/outro
169
+ - Batch processing for multiple papers
170
+ - Save history of generated podcasts
171
+
172
+ ### Demo Quality Goals
173
+ - Generate a podcast in under 5 minutes
174
+ - Script should be natural and engaging (not robotic)
175
+ - Audio should be clearly intelligible
176
+ - Voices should be distinctly different
177
+ - Technical concepts appropriately explained
178
+
179
+ ---
180
+
181
+ ## Deliverables Checklist
182
+
183
+ ### Code & Deployment
184
+ - [ ] Working Gradio application
185
+ - [ ] Deployed to HuggingFace Space under MCP-1st-Birthday org
186
+ - [ ] All dependencies in requirements.txt
187
+ - [ ] Clear code organization and comments
188
+
189
+ ### Documentation (README.md)
190
+ - [ ] Project title and description
191
+ - [ ] Track tag: `mcp-in-action-track-consumer`
192
+ - [ ] How to use instructions
193
+ - [ ] Technical architecture overview
194
+ - [ ] Team member(s) HuggingFace usernames
195
+ - [ ] Demo video link (embedded)
196
+ - [ ] Social media post link
197
+ - [ ] Acknowledgment of tools/APIs used
198
+
199
+ ### Demo Video (1-5 minutes)
200
+ - [ ] Problem introduction (30 sec)
201
+ - [ ] Solution overview (30 sec)
202
+ - [ ] Live demonstration (2-3 min)
203
+ - Show URL/PDF input
204
+ - Processing visualization
205
+ - Script preview
206
+ - Audio playback (30-60 sec sample)
207
+ - [ ] Technical highlights (30 sec)
208
+ - [ ] Impact statement (30 sec)
209
+
210
+ ### Social Media Post
211
+ - [ ] Published on X (Twitter) or LinkedIn
212
+ - [ ] Includes project description
213
+ - [ ] Links to HuggingFace Space
214
+ - [ ] Relevant hashtags (#GradioHackathon #MCP)
215
+ - [ ] Demo video or GIF if possible
216
+
217
+ ---
218
+
219
+ ## Timeline Recommendation
220
+
221
+ ### Week 1 (Nov 14-21): Foundation
222
+ - Set up project structure
223
+ - Implement PDF/URL input handling
224
+ - Build text extraction pipeline
225
+ - Initial dialogue generation experiments
226
+
227
+ ### Week 2 (Nov 22-27): Core Features
228
+ - Refine script generation quality
229
+ - Implement audio synthesis
230
+ - Build Gradio interface
231
+ - Integrate MCP servers
232
+ - Testing and iteration
233
+
234
+ ### Week 3 (Nov 28-30): Polish & Submit
235
+ - Nov 28: UI refinement, error handling
236
+ - Nov 29: Demo video creation, documentation
237
+ - Nov 30: Social media post, final testing, submission
238
+
239
+ ---
240
+
241
+ ## Strategic Notes
242
+
243
+ ### Differentiation from Competitors
244
+ - Most participants will likely build generic chatbots or simple tools
245
+ - PaperCast is unique: specific use case, multimodal output, clear value
246
+ - The "podcast" angle is memorable and demo-able
247
+ - Strong real-world applicability (education/research)
248
+
249
+ ### Competitive Advantages
250
+ 1. **Clear use case:** Not just "another AI chat app"
251
+ 2. **Multimodal:** Text → conversational audio (less competition in this category)
252
+ 3. **Viral potential:** Researchers will want to share their papers as podcasts
253
+ 4. **Demo appeal:** Judges can literally listen to the output
254
+
255
+ ### Risk Mitigation
256
+ - **TTS Quality:** Critical for user experience - explore multiple options
257
+ - **Script Coherence:** May need iterative prompt engineering
258
+ - **Processing Time:** Set realistic expectations, show progress
259
+ - **PDF Parsing:** Academic PDFs have inconsistent formatting - robust error handling needed
260
+
261
+ ### Bonus Opportunities
262
+ - **Modal Innovation Award:** If we use Modal for compute ($2,500)
263
+ - **Google Gemini Award:** If we use Gemini API ($15K in credits)
264
+ - **Blaxel Award:** If we use Blaxel in submission ($2,500)
265
+ - **Community Choice:** Maximize social engagement
266
+
267
+ ---
268
+
269
+ ## Resources & Links
270
+
271
+ ### Essential Links
272
+ - **Hackathon Page:** https://huggingface.co/MCP-1st-Birthday
273
+ - **Discord:** https://discord.gg/fveShqytyh (Channel: #agents-mcp-hackathon-winter25🏆)
274
+ - **Gradio 6 Docs:** https://www.gradio.app/
275
+ - **MCP Documentation:** https://huggingface.co/blog/gradio-mcp
276
+ - **Submission Deadline:** November 30, 2025, 11:59 PM UTC
277
+
278
+ ### Inspirational Examples (June 2025 Hackathon)
279
+ Look at previous submissions for quality benchmarks and presentation style.
280
+
281
+ ---
282
+
283
+ ## Critical Reminders
284
+
285
+ 1. **Track Tag is MANDATORY:** `mcp-in-action-track-consumer` in README.md
286
+ 2. **Organization Requirement:** Must publish under MCP-1st-Birthday, not personal profile
287
+ 3. **Social Media is REQUIRED:** Submission invalid without it
288
+ 4. **Demo Video is REQUIRED:** Judges won't evaluate without seeing it
289
+ 5. **Original Work Only:** Everything must be built Nov 14-30, 2025
290
+ 6. **MCP Integration Required:** Must demonstrate MCP server usage
291
+ 7. **Agent Behavior Required:** Must show planning, reasoning, execution
292
+
293
+ ---
294
+
295
+ ## Open Questions for Implementation
296
+
297
+ These are decisions that should be made during development based on experimentation and available resources:
298
+
299
+ 1. **LLM Selection:** Which model for dialogue generation? (Consider: quality, cost, speed, availability)
300
+ 2. **TTS System:** Which text-to-speech solution? (Consider: voice quality, speaker diversity, processing time, cost)
301
+ 3. **PDF Processing:** Which library/approach? (PyMuPDF, pdfplumber, etc.)
302
+ 4. **MCP Architecture:** Which specific MCP servers to integrate and how?
303
+ 5. **RAG Strategy:** Do we need vector embeddings? Which embedding model?
304
+ 6. **Script Length:** Target word count for optimal podcast length?
305
+ 7. **Caching:** Should we cache generated podcasts? How?
306
+ 8. **Voice Personalities:** How to prompt for consistent host/guest characteristics?
307
+
308
+ ---
309
+
310
+ ## Success Definition
311
+
312
+ **We'll know we succeeded when:**
313
+ - A user can input a real arXiv paper and get a listenable podcast in under 5 minutes
314
+ - The dialogue sounds natural, not like a robotic Q&A
315
+ - Technical concepts are explained accessibly
316
+ - The demo video makes judges go "wow, I want to use this"
317
+ - The project demonstrates clear agent capabilities (not just API chaining)
318
+ - We're proud to share it publicly
319
+
320
+ **This is more than a hackathon submission - it's a tool that could genuinely help people access knowledge more easily.**
321
+
322
+ ---
323
+
324
+ ## Final Note
325
+
326
+ This document provides context and requirements, but implementation decisions are yours to make. Focus on:
327
+ - Building something that works reliably
328
+ - Creating an experience that delights users
329
+ - Demonstrating thoughtful agent design
330
+ - Shipping on time with polish
331
+
332
+ Good luck! 🎙️🚀
README.md ADDED
@@ -0,0 +1,95 @@
1
+ # PaperCast 🎙️
2
+
3
+ Transform research papers into engaging podcast-style conversations.
4
+
5
+ **Track:** `mcp-in-action-track-consumer`
6
+
7
+ ## Overview
8
+
9
+ PaperCast is an AI agent application that converts academic research papers into accessible, engaging podcast-style audio conversations between a host and an expert. Simply provide an arXiv URL or upload a PDF, and PaperCast will generate a natural dialogue that explains the research in an approachable way.
10
+
11
+ ## Features
12
+
13
+ - 📄 **Multiple Input Methods**: Accept arXiv URLs or direct PDF uploads
14
+ - 🤖 **Autonomous Agent**: Intelligent analysis and conversation planning
15
+ - 🎭 **Natural Dialogue**: Two distinct speakers (Host and Guest) with conversational flow
16
+ - 🔊 **High-Quality Audio**: Clear, distinct voices for each speaker
17
+ - 📝 **Complete Transcripts**: Download both audio and text versions
18
+ - ⚡ **Fast Processing**: Generate podcasts in under 5 minutes
19
+
20
+ ## How It Works
21
+
22
+ 1. **Input**: Provide an arXiv URL or upload a research paper PDF
23
+ 2. **Analysis**: AI agent analyzes paper structure and identifies key concepts
24
+ 3. **Script Generation**: Creates natural dialogue between host and expert
25
+ 4. **Audio Synthesis**: Converts script to audio with distinct voices
26
+ 5. **Output**: Download podcast audio and transcript
27
+
28
+ ## Technical Stack
29
+
30
+ - **Framework**: Gradio 6
31
+ - **AI Agent**: Autonomous reasoning with MCP integration
32
+ - **LLM**: Phi-4-mini-instruct / VibeThinker-1.5B
33
+ - **TTS**: Supertone/Maya for distinct voices
34
+ - **PDF Processing**: PyMuPDF, pdfplumber
35
+ - **Platform**: HuggingFace Spaces
36
+
37
+ ## Installation
38
+
39
+ ```bash
40
+ pip install -r requirements.txt
41
+ ```
42
+
43
+ ## Usage
44
+
45
+ ```bash
46
+ python app.py
47
+ ```
48
+
49
+ Then open your browser to the provided URL (typically `http://localhost:7860`).
50
+
51
+ ## Project Structure
52
+
53
+ ```
54
+ papercast/
55
+ ├── app.py # Main Gradio application
56
+ ├── requirements.txt # Python dependencies
57
+ ├── README.md # This file
58
+ ├── agents/ # Agent logic and orchestration
59
+ ├── mcp_servers/ # MCP server integrations
60
+ ├── processing/ # PDF extraction and text processing
61
+ ├── generation/ # Script and dialogue generation
62
+ ├── synthesis/ # Text-to-speech audio generation
63
+ └── utils/ # Helper functions
64
+ ```
65
+
66
+ ## Team
67
+
68
+ - [Team Member HF Username]
69
+
70
+ ## Demo
71
+
72
+ [Demo video link will be added here]
73
+
74
+ ## Social Media
75
+
76
+ [Social media post link will be added here]
77
+
78
+ ## Acknowledgments
79
+
80
+ Built for the MCP 1st Birthday Hackathon (Track 2: MCP in Action - Consumer).
81
+
82
+ Special thanks to:
83
+ - Anthropic & Gradio for organizing the hackathon
84
+ - HuggingFace for hosting infrastructure
85
+ - Open source communities for TTS and LLM models
86
+
87
+ ## License
88
+
89
+ [To be determined]
90
+
91
+ ---
92
+
93
+ **Hackathon:** MCP's 1st Birthday
94
+ **Category:** Consumer Applications
95
+ **Organization:** MCP-1st-Birthday
agents/__init__.py ADDED
@@ -0,0 +1 @@
1
+ """Agent logic and orchestration for PaperCast"""
agents/podcast_agent.py ADDED
@@ -0,0 +1,349 @@
1
+ import time
2
+
3
+ from generation.script_generator import get_generator
4
+ from processing.pdf_reader import extract_text_from_pdf
5
+ from processing.url_fetcher import fetch_paper_from_url
6
+ from synthesis.tts_engine import get_tts_engine
7
+ from utils.config import (
8
+ DEMO_INFERENCE_KEY,
9
+ DEMO_INFERENCE_URL,
10
+ DEMO_MODE,
11
+ DEMO_MODEL,
12
+ DEMO_TTS_KEY,
13
+ MAX_CONTEXT_CHARS,
14
+ )
15
+ from utils.history import save_to_history
16
+
17
+
18
+ class PodcastAgent:
19
+ def __init__(
20
+ self,
21
+ provider_mode="demo",
22
+ own_base_url=None,
23
+ own_api_key=None,
24
+ own_model=None,
25
+ openai_key=None,
26
+ openai_model=None,
27
+ tts_provider="edge-tts",
28
+ elevenlabs_key=None,
29
+ host_voice=None,
30
+ guest_voice=None,
31
+ max_tokens=None,
32
+ ):
33
+ self.logs = []
34
+
35
+ # If demo mode is enabled, override all settings with demo credentials
36
+ if DEMO_MODE:
37
+ self.provider_mode = "demo"
38
+ self.own_base_url = DEMO_INFERENCE_URL
39
+ self.own_api_key = DEMO_INFERENCE_KEY
40
+ self.own_model = DEMO_MODEL
41
+ self.openai_key = None
42
+ self.openai_model = None
43
+ self.tts_provider = "edge-tts" # Always use Edge-TTS in demo mode
44
+ self.elevenlabs_key = None
45
+ self.host_voice = host_voice
46
+ self.guest_voice = guest_voice
47
+ else:
48
+ self.provider_mode = provider_mode # "own_inference" or "openai"
49
+ self.own_base_url = own_base_url
50
+ self.own_api_key = own_api_key
51
+ self.own_model = own_model
52
+ self.openai_key = openai_key
53
+ self.openai_model = openai_model
54
+ self.tts_provider = tts_provider
55
+ self.elevenlabs_key = elevenlabs_key
56
+ self.host_voice = host_voice
57
+ self.guest_voice = guest_voice
58
+
59
+ self.max_tokens = max_tokens
60
+
61
+ def log(self, message):
62
+ timestamp = time.strftime("%H:%M:%S")
63
+ entry = f"[{timestamp}] {message}"
64
+ print(entry)
65
+ self.logs.append(entry)
66
+ return entry
67
+
68
+ def process(self, url: str = None, pdf_file=None):
69
+ """
70
+ Orchestrates the conversion from URL or uploaded PDF to Podcast.
71
+
72
+ Args:
73
+ url: Paper URL (arXiv or medRxiv)
74
+ pdf_file: Uploaded PDF file object
75
+ """
76
+ # Determine source
77
+ if pdf_file:
78
+ yield self.log(
79
+ f"Received uploaded PDF: {pdf_file.name if hasattr(pdf_file, 'name') else 'file'}"
80
+ )
81
+ pdf_path = pdf_file.name if hasattr(pdf_file, "name") else pdf_file
82
+ source_ref = "Uploaded PDF"
83
+ elif url:
84
+ yield self.log(f"Received request for URL: {url}")
85
+
86
+ # Step 1: Fetch Paper
87
+ yield self.log("Thinking: I need to download the paper first.")
88
+ yield self.log(f"Tool Call: fetch_paper({url})")
89
+ pdf_path = fetch_paper_from_url(url)
90
+ if not pdf_path:
91
+ yield self.log("Error: Failed to download paper.")
92
+ return None, "\n".join(self.logs)
93
+ yield self.log(f"Paper downloaded to: {pdf_path}")
94
+ source_ref = url
95
+ else:
96
+ yield self.log(
97
+ "Error: No input provided. Please provide either a URL or upload a PDF."
98
+ )
99
+ return None, "\n".join(self.logs)
100
+
101
+ # Step 2: Read PDF
102
+ yield self.log("Thinking: Now I need to extract the text content.")
103
+ yield self.log(f"Tool Call: read_pdf({pdf_path})")
104
+ text = extract_text_from_pdf(pdf_path)
105
+ if not text:
106
+ yield self.log("Error: Failed to extract text.")
107
+ return None, "\n".join(self.logs)
108
+ yield self.log(f"Extracted {len(text)} characters.")
109
+
110
+ # Step 3: Generate Script
111
+ yield self.log(
112
+ "Thinking: The text is ready. I will now generate a podcast script using the LLM."
113
+ )
114
+ if self.provider_mode == "demo":
115
+ yield self.log("Using Demo Inference")
116
+ elif self.provider_mode == "own_inference":
117
+ yield self.log(f"Using Own Inference: {self.own_base_url}")
118
+ else:
119
+ yield self.log(f"Using OpenAI ({self.openai_model or 'gpt-4o-mini'})")
120
+ yield self.log("Tool Call: generate_script(...)")
121
+ generator = get_generator(
122
+ provider_mode=self.provider_mode,
123
+ own_base_url=self.own_base_url,
124
+ own_api_key=self.own_api_key,
125
+ own_model=self.own_model,
126
+ openai_key=self.openai_key,
127
+ openai_model=self.openai_model,
128
+ max_tokens=self.max_tokens,
129
+ )
130
+ script = generator.generate_podcast_script(text)
131
+ if not script:
132
+ yield self.log("Error: Failed to generate script.")
133
+ return None, "\n".join(self.logs)
134
+ yield self.log(f"Generated script with {len(script)} dialogue turns.")
135
+
136
+ # Step 4: Synthesize Audio
137
+ yield self.log("Thinking: The script looks good. Sending it to the TTS engine.")
138
+ if self.tts_provider == "edge-tts":
139
+ yield self.log("Using Edge-TTS (Microsoft, free)")
140
+ elif self.tts_provider == "elevenlabs":
141
+ if self.elevenlabs_key:
142
+ yield self.log("Using custom ElevenLabs API key")
143
+ else:
144
+ yield self.log("Using demo ElevenLabs key")
145
+ yield self.log("Tool Call: synthesize_podcast(...)")
146
+ tts = get_tts_engine(
147
+ tts_provider=self.tts_provider,
148
+ custom_api_key=self.elevenlabs_key if self.tts_provider == "elevenlabs" else None,
149
+ host_voice=self.host_voice,
150
+ guest_voice=self.guest_voice
151
+ )
152
+ audio_path = tts.synthesize_dialogue(script)
153
+
154
+ if not audio_path:
155
+ yield self.log("Error: Failed to synthesize audio.")
156
+ return None, "\n".join(self.logs)
157
+
158
+ yield self.log(f"Podcast generated successfully at: {audio_path}")
159
+
160
+ # Save to history
161
+ save_to_history(source_ref, audio_path, len(script))
162
+ yield self.log("✓ Saved to history")
163
+
164
+ yield audio_path, "\n".join(self.logs)  # yield (not return) so the caller's for-loop receives the final result
165
+
166
+ def process_multiple(self, urls: list = None, pdf_files: list = None):
167
+ """
168
+ Orchestrates the conversion from multiple URLs or PDFs to a single comprehensive Podcast.
169
+
170
+ Args:
171
+ urls: List of paper URLs (arXiv or medRxiv)
172
+ pdf_files: List of uploaded PDF file objects
173
+ """
174
+ all_texts = []
175
+ source_refs = []
176
+ total_chars = 0
177
+
178
+ # Process URLs
179
+ if urls:
180
+ yield self.log(f"Received {len(urls)} URLs to process.")
181
+ yield self.log(f"Context limit: {MAX_CONTEXT_CHARS:,} characters")
182
+
183
+ for i, url in enumerate(urls, 1):
184
+ yield self.log(f"\n=== Processing Paper {i}/{len(urls)} ===")
185
+ yield self.log(f"URL: {url}")
186
+
187
+ # Step 1: Fetch Paper
188
+ yield self.log(f"Tool Call: fetch_paper({url})")
189
+ pdf_path = fetch_paper_from_url(url)
190
+ if not pdf_path:
191
+ yield self.log(f"Warning: Failed to download paper {i}, skipping.")
192
+ continue
193
+
194
+ yield self.log(f"Paper {i} downloaded successfully.")
195
+
196
+ # Step 2: Read PDF
197
+ yield self.log(f"Tool Call: read_pdf({pdf_path})")
198
+ text = extract_text_from_pdf(pdf_path)
199
+ if not text:
200
+ yield self.log(
201
+ f"Warning: Failed to extract text from paper {i}, skipping."
202
+ )
203
+ continue
204
+
205
+ text_length = len(text)
206
+ yield self.log(f"Extracted {text_length:,} characters from paper {i}.")
207
+
208
+ # Check context limit
209
+ if total_chars + text_length > MAX_CONTEXT_CHARS:
210
+ yield self.log(f"⚠️ Context limit reached!")
211
+ yield self.log(
212
+ f"Current total: {total_chars:,} chars + Paper {i}: {text_length:,} chars = {total_chars + text_length:,} chars"
213
+ )
214
+ yield self.log(f"Maximum allowed: {MAX_CONTEXT_CHARS:,} chars")
215
+ yield self.log(
216
+ f"Stopping at {len(all_texts)} papers. Remaining papers will be skipped."
217
+ )
218
+ break
219
+
220
+ all_texts.append(f"=== PAPER {i} ===\n{text}\n")
221
+ source_refs.append(url)
222
+ total_chars += text_length
223
+ yield self.log(
224
+ f"✓ Paper {i} added. Total context: {total_chars:,} chars ({(total_chars / MAX_CONTEXT_CHARS) * 100:.1f}% of limit)"
225
+ )
226
+
227
+ # Process PDFs
228
+ elif pdf_files:
229
+ yield self.log(f"Received {len(pdf_files)} PDF files to process.")
230
+ yield self.log(f"Context limit: {MAX_CONTEXT_CHARS:,} characters")
231
+
232
+ for i, pdf_file in enumerate(pdf_files, 1):
233
+ yield self.log(f"\n=== Processing PDF {i}/{len(pdf_files)} ===")
234
+ pdf_name = pdf_file.name if hasattr(pdf_file, "name") else f"file_{i}"
235
+ yield self.log(f"File: {pdf_name}")
236
+
237
+ pdf_path = pdf_file.name if hasattr(pdf_file, "name") else pdf_file
238
+
239
+ # Read PDF
240
+ yield self.log(f"Tool Call: read_pdf({pdf_path})")
241
+ text = extract_text_from_pdf(pdf_path)
242
+ if not text:
243
+ yield self.log(
244
+ f"Warning: Failed to extract text from PDF {i}, skipping."
245
+ )
246
+ continue
247
+
248
+ text_length = len(text)
249
+ yield self.log(f"Extracted {text_length:,} characters from PDF {i}.")
250
+
251
+ # Check context limit
252
+ if total_chars + text_length > MAX_CONTEXT_CHARS:
253
+ yield self.log(f"⚠️ Context limit reached!")
254
+ yield self.log(
255
+ f"Current total: {total_chars:,} chars + PDF {i}: {text_length:,} chars = {total_chars + text_length:,} chars"
256
+ )
257
+ yield self.log(f"Maximum allowed: {MAX_CONTEXT_CHARS:,} chars")
258
+ yield self.log(
259
+ f"Stopping at {len(all_texts)} files. Remaining PDFs will be skipped."
260
+ )
261
+ break
262
+
263
+ all_texts.append(f"=== PAPER {i} ===\n{text}\n")
264
+ source_refs.append(f"Uploaded PDF {i}")
265
+ total_chars += text_length
266
+ yield self.log(
267
+ f"✓ PDF {i} added. Total context: {total_chars:,} chars ({(total_chars / MAX_CONTEXT_CHARS) * 100:.1f}% of limit)"
268
+ )
269
+
270
+ if not all_texts:
271
+ yield self.log("Error: No papers were successfully processed.")
272
+ return None, "\n".join(self.logs)
273
+
274
+ # Combine all texts
275
+ yield self.log(f"\n✓ Successfully processed {len(all_texts)} papers")
276
+ yield self.log(
277
+ f"Total context: {total_chars:,} characters ({(total_chars / MAX_CONTEXT_CHARS) * 100:.1f}% of limit)"
278
+ )
279
+ yield self.log(
280
+ f"Thinking: Now I'll combine all papers into a comprehensive podcast script."
281
+ )
282
+
283
+ combined_text = "\n\n".join(all_texts)
284
+
285
+ # Step 3: Generate Comprehensive Script
286
+ yield self.log(
287
+ "\nThinking: Creating a comprehensive podcast script covering all papers."
288
+ )
289
+ if self.provider_mode == "demo":
290
+ yield self.log("Using Demo Inference")
291
+ elif self.provider_mode == "own_inference":
292
+ yield self.log(f"Using Own Inference: {self.own_base_url}")
293
+ else:
294
+ yield self.log(f"Using OpenAI ({self.openai_model or 'gpt-4o-mini'})")
295
+ yield self.log("Tool Call: generate_script(...)")
296
+ generator = get_generator(
297
+ provider_mode=self.provider_mode,
298
+ own_base_url=self.own_base_url,
299
+ own_api_key=self.own_api_key,
300
+ own_model=self.own_model,
301
+ openai_key=self.openai_key,
302
+ openai_model=self.openai_model,
303
+ max_tokens=self.max_tokens,
304
+ )
305
+
306
+ # Add instruction for multi-paper script
307
+ multi_paper_prompt = f"[MULTIPLE PAPERS - {len(all_texts)} papers total. Create a comprehensive podcast discussing all papers.]\n\n{combined_text}"
308
+ script = generator.generate_podcast_script(multi_paper_prompt)
309
+
310
+ if not script:
311
+ yield self.log("Error: Failed to generate script.")
312
+ return None, "\n".join(self.logs)
313
+
314
+ yield self.log(
315
+ f"Generated comprehensive script with {len(script)} dialogue turns."
316
+ )
317
+
318
+ # Step 4: Synthesize Audio
319
+ yield self.log(
320
+ "\nThinking: The script looks good. Sending it to the TTS engine."
321
+ )
322
+ if self.tts_provider == "edge-tts":
323
+ yield self.log("Using Edge-TTS (Microsoft, free)")
324
+ elif self.tts_provider == "elevenlabs":
325
+ if self.elevenlabs_key:
326
+ yield self.log("Using custom ElevenLabs API key")
327
+ else:
328
+ yield self.log("Using demo ElevenLabs key")
329
+ yield self.log("Tool Call: synthesize_podcast(...)")
330
+ tts = get_tts_engine(
331
+ tts_provider=self.tts_provider,
332
+ custom_api_key=self.elevenlabs_key if self.tts_provider == "elevenlabs" else None,
333
+ host_voice=self.host_voice,
334
+ guest_voice=self.guest_voice
335
+ )
336
+ audio_path = tts.synthesize_dialogue(script)
337
+
338
+ if not audio_path:
339
+ yield self.log("Error: Failed to synthesize audio.")
340
+ return None, "\n".join(self.logs)
341
+
342
+ yield self.log(f"Podcast generated successfully at: {audio_path}")
343
+
344
+ # Save to history
345
+ source_ref = f"Multiple papers: {', '.join(source_refs[:3])}{'...' if len(source_refs) > 3 else ''}"
346
+ save_to_history(source_ref, audio_path, len(script))
347
+ yield self.log("✓ Saved to history")
348
+
349
+ yield audio_path, "\n".join(self.logs)  # yield (not return) so the caller's for-loop receives the final result
app.py ADDED
@@ -0,0 +1,1203 @@
1
+ import os
2
+ from datetime import datetime
3
+
4
+ import gradio as gr
5
+
6
+ from agents.podcast_agent import PodcastAgent
7
+ from synthesis.tts_engine import EDGE_TTS_VOICES, ELEVENLABS_VOICES
8
+ from utils.config import (
9
+ DEMO_INFERENCE_KEY,
10
+ DEMO_INFERENCE_URL,
11
+ DEMO_MODE,
12
+ DEMO_MODEL,
13
+ DEMO_TTS_KEY,
14
+ OUTPUT_DIR,
15
+ SCRIPT_GENERATION_MODEL,
16
+ )
17
+ from utils.history import get_history_items, load_history
18
+
19
+ # Ensure output directory exists
20
+ os.makedirs(OUTPUT_DIR, exist_ok=True)
21
+
22
+
23
+ def validate_settings_for_generation(
24
+ llm_choice, own_base_url, own_api_key, openai_key, tts_provider, elevenlabs_key
25
+ ):
26
+ """
27
+ Validate user settings for podcast generation in non-demo mode.
28
+
29
+ Returns:
30
+ tuple: (is_valid, error_message)
31
+ """
32
+ # Skip validation if in demo mode
33
+ if DEMO_MODE:
34
+ return True, ""
35
+
36
+ errors = []
37
+
38
+ # Validate LLM settings
39
+ if llm_choice == "Own Inference":
40
+ if not own_base_url:
41
+ errors.append("❌ **Own Inference**: Base URL is required")
42
+ elif not (
43
+ own_base_url.startswith("http://") or own_base_url.startswith("https://")
44
+ ):
45
+ errors.append(
46
+ "❌ **Own Inference**: Base URL must start with http:// or https://"
47
+ )
48
+
49
+ elif llm_choice == "OpenAI":
50
+ if not openai_key:
51
+ errors.append("❌ **OpenAI**: API key is required")
52
+ elif not openai_key.startswith("sk-"):
53
+ errors.append("❌ **OpenAI**: API key must start with 'sk-'")
54
+
55
+ # Validate TTS settings
56
+ if tts_provider == "elevenlabs":
57
+ if not elevenlabs_key:
58
+ errors.append("❌ **ElevenLabs**: API key is required")
59
+ elif not elevenlabs_key.startswith("sk_"):
60
+ errors.append("❌ **ElevenLabs**: API key must start with 'sk_'")
61
+ # Edge-TTS doesn't require any validation (it's free)
62
+
63
+ if errors:
64
+ return False, "\n".join(errors)
65
+
66
+ return True, ""
67
+
68
+
69
+ def get_stats():
70
+ """Get statistics"""
71
+ history = load_history()
72
+ total = len(history)
73
+ return f"🚀 **Total Podcasts: {total}**"
74
+
75
+
76
+ def generate_progress_indicator(current_step):
77
+ """Generate visual progress indicator"""
78
+ steps = [
79
+ {"name": "Fetching Paper", "icon": "📥"},
80
+ {"name": "Extracting Text", "icon": "📄"},
81
+ {"name": "Generating Script", "icon": "✍️"},
82
+ {"name": "Synthesizing Audio", "icon": "🎙️"},
83
+ ]
84
+
85
+ progress_html = "<div style='padding: 15px; background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); border-radius: 10px; margin: 10px 0;'>"
86
+ progress_html += "<div style='display: flex; justify-content: space-between; align-items: center;'>"
87
+
88
+ for i, step in enumerate(steps):
89
+ step_num = i + 1
90
+ if step_num < current_step:
91
+ # Completed step
92
+ status_color = "#4ade80" # Green
93
+ icon = "✅"
94
+ elif step_num == current_step:
95
+ # Current step
96
+ status_color = "#fbbf24" # Yellow
97
+ icon = "⏳"
98
+ else:
99
+ # Pending step
100
+ status_color = "#9ca3af" # Gray
101
+ icon = "⏸️"
102
+
103
+ progress_html += f"""
104
+ <div style='text-align: center; flex: 1;'>
105
+ <div style='font-size: 2em; margin-bottom: 5px;'>{icon}</div>
106
+ <div style='color: white; font-weight: bold; font-size: 0.9em;'>{step['name']}</div>
107
+ <div style='color: {status_color}; font-size: 0.8em; margin-top: 3px;'>Step {step_num}/4</div>
108
+ </div>
109
+ """
110
+
111
+ progress_html += "</div></div>"
112
+ return progress_html
113
+
114
+
115
+ def validated_generate_agent(
116
+ url,
117
+ pdf_file,
118
+ advanced_mode,
119
+ multi_urls,
120
+ multi_pdfs,
121
+ user_llm_choice,
122
+ user_own_base_url,
123
+ user_own_api_key,
124
+ user_own_model,
125
+ user_openai_key,
126
+ user_openai_model,
127
+ user_tts_provider,
128
+ user_elevenlabs_key,
129
+ user_host_voice,
130
+ user_guest_voice,
131
+ user_podcast_length,
132
+ progress=gr.Progress(),
133
+ ):
134
+ """Validate settings and run podcast generation"""
135
+
136
+ # Validate settings first
137
+ is_valid, error_message = validate_settings_for_generation(
138
+ user_llm_choice,
139
+ user_own_base_url,
140
+ user_own_api_key,
141
+ user_openai_key,
142
+ user_tts_provider,
143
+ user_elevenlabs_key,
144
+ )
145
+
146
+ if not is_valid:
147
+ # Yield error message
148
+ yield "", f"⚠️ **Settings Required**\n\n{error_message}\n\nPlease complete your settings in the Settings tab before generating a podcast."
149
+ return
150
+
151
+ # If valid, run the actual generation
152
+ yield from run_agent(
153
+ url,
154
+ pdf_file,
155
+ advanced_mode,
156
+ multi_urls,
157
+ multi_pdfs,
158
+ user_llm_choice,
159
+ user_own_base_url,
160
+ user_own_api_key,
161
+ user_own_model,
162
+ user_openai_key,
163
+ user_openai_model,
164
+ user_tts_provider,
165
+ user_elevenlabs_key,
166
+ user_host_voice,
167
+ user_guest_voice,
168
+ user_podcast_length,
169
+ progress,
170
+ )
171
+
172
+
173
+ def run_agent(
174
+ url,
175
+ pdf_file,
176
+ advanced_mode,
177
+ multi_urls,
178
+ multi_pdfs,
179
+ user_llm_choice,
180
+ user_own_base_url,
181
+ user_own_api_key,
182
+ user_own_model,
183
+ user_openai_key,
184
+ user_openai_model,
185
+ user_tts_provider,
186
+ user_elevenlabs_key,
187
+ user_host_voice,
188
+ user_guest_voice,
189
+ user_podcast_length,
190
+ progress=gr.Progress(),
191
+ ):
192
+ """Run podcast generation with optional user settings"""
193
+
194
+ # Determine provider mode
195
+ if DEMO_MODE:
196
+ provider_mode = "demo"
197
+ elif user_llm_choice == "Own Inference":
198
+ provider_mode = "own_inference"
199
+ else: # OpenAI
200
+ provider_mode = "openai"
201
+
202
+ agent = PodcastAgent(
203
+ provider_mode=provider_mode,
204
+ own_base_url=user_own_base_url if user_own_base_url else None,
205
+ own_api_key=user_own_api_key if user_own_api_key else None,
206
+ own_model=user_own_model if user_own_model else None,
207
+ openai_key=user_openai_key if user_openai_key else None,
208
+ openai_model=user_openai_model if user_openai_model else None,
209
+ tts_provider=user_tts_provider if user_tts_provider else "edge-tts",
210
+ elevenlabs_key=user_elevenlabs_key if user_elevenlabs_key else None,
211
+ host_voice=user_host_voice if user_host_voice else None,
212
+ guest_voice=user_guest_voice if user_guest_voice else None,
213
+ max_tokens=user_podcast_length if user_podcast_length else 4096,
214
+ )
215
+ logs_history = ""
216
+
217
+ # Log settings being used
218
+ settings_log = "Settings: "
219
+ if provider_mode == "demo":
220
+ settings_log += "LLM: Demo Inference | TTS: Edge-TTS (Microsoft) | "
221
+ elif provider_mode == "own_inference":
222
+ settings_log += f"LLM: Own Inference | "
223
+ if user_tts_provider == "edge-tts":
224
+ settings_log += "TTS: Edge-TTS (Microsoft) | "
225
+ elif user_elevenlabs_key:
226
+ settings_log += "TTS: Custom ElevenLabs | "
227
+ else:
228
+ settings_log += "TTS: ElevenLabs (no key provided) | "
229
+ else: # openai
230
+ settings_log += f"LLM: OpenAI ({user_openai_model or 'gpt-4o-mini'}) | "
231
+ if user_tts_provider == "edge-tts":
232
+ settings_log += "TTS: Edge-TTS (Microsoft) | "
233
+ elif user_elevenlabs_key:
234
+ settings_log += "TTS: Custom ElevenLabs | "
235
+ else:
236
+ settings_log += "TTS: ElevenLabs (no key provided) | "
237
+
238
+ settings_log += (
239
+ f"Length: {user_podcast_length if user_podcast_length else 4096} tokens"
240
+ )
241
+
242
+ # Initial state
243
+ current_step = 0
244
+ yield "", f"Starting process...\n{settings_log}\n"
245
+
246
+ try:
247
+ # Advanced mode: multiple sources
248
+ if advanced_mode:
249
+ if multi_urls and multi_urls.strip():
250
+ # Multiple URLs
251
+ urls = [u.strip() for u in multi_urls.strip().split("\n") if u.strip()]
252
+ if not urls:
253
+ raise gr.Error("Please provide at least one paper URL")
254
+
255
+ current_step = 1
256
+ yield generate_progress_indicator(current_step), f"Processing {len(urls)} papers from URLs...\n"
257
+
258
+ # Process multiple URLs
259
+ for log_entry in agent.process_multiple(urls=urls):
260
+ if isinstance(log_entry, tuple):
261
+ audio_path, final_logs = log_entry
262
+ generate_transcript(audio_path, final_logs)
263
+ current_step = 5 # Completed
264
+ yield "", final_logs + f"\n\n✅ **Podcast Generated!**\n📁 Audio saved to: `{audio_path}`\n\n🎧 Check the **History** tab to listen."
265
+ else:
266
+ logs_history += log_entry + "\n"
267
+ # Update step based on log content
268
+ if "Extracted" in log_entry or "read_pdf" in log_entry:
269
+ current_step = 2
270
+ elif "generate_script" in log_entry or "Generated script" in log_entry:
271
+ current_step = 3
272
+ elif "synthesize_podcast" in log_entry or "Synthesizing" in log_entry:
273
+ current_step = 4
274
+ yield generate_progress_indicator(current_step), logs_history
275
+
276
+ elif multi_pdfs:
277
+ # Multiple PDFs
278
+ if not isinstance(multi_pdfs, list):
279
+ multi_pdfs = [multi_pdfs]
280
+
281
+ current_step = 2 # Skip fetching for PDFs
282
+ yield generate_progress_indicator(current_step), f"Processing {len(multi_pdfs)} PDF files...\n"
283
+
284
+ # Process multiple PDFs
285
+ for log_entry in agent.process_multiple(pdf_files=multi_pdfs):
286
+ if isinstance(log_entry, tuple):
287
+ audio_path, final_logs = log_entry
288
+ generate_transcript(audio_path, final_logs)
289
+ current_step = 5 # Completed
290
+ yield "", final_logs + f"\n\n✅ **Podcast Generated!**\n📁 Audio saved to: `{audio_path}`\n\n🎧 Check the **History** tab to listen."
291
+ else:
292
+ logs_history += log_entry + "\n"
293
+ # Update step based on log content
294
+ if "generate_script" in log_entry or "Generated script" in log_entry:
295
+ current_step = 3
296
+ elif "synthesize_podcast" in log_entry or "Synthesizing" in log_entry:
297
+ current_step = 4
298
+ yield generate_progress_indicator(current_step), logs_history
299
+ else:
300
+ raise gr.Error("Please provide multiple URLs or upload multiple PDFs")
301
+
302
+ # Simple mode: single source
303
+ else:
304
+ if not url and not pdf_file:
305
+ raise gr.Error("Please provide a paper URL or upload a PDF file")
306
+
307
+ # Determine starting step
308
+ if url:
309
+ current_step = 1 # Fetching
310
+ else:
311
+ current_step = 2 # Skip to extraction for uploaded PDF
312
+
313
+ for log_entry in agent.process(url=url if url else None, pdf_file=pdf_file):
314
+ if isinstance(log_entry, tuple):
315
+ audio_path, final_logs = log_entry
316
+ generate_transcript(audio_path, final_logs)
317
+ current_step = 5 # Completed
318
+ yield "", final_logs + f"\n\n✅ **Podcast Generated!**\n📁 Audio saved to: `{audio_path}`\n\n🎧 Check the **History** tab to listen."
319
+ else:
320
+ logs_history += log_entry + "\n"
321
+ # Update step based on log content
322
+ if "fetch_paper" in log_entry or "downloaded" in log_entry:
323
+ current_step = 1
324
+ elif "Extracted" in log_entry or "read_pdf" in log_entry:
325
+ current_step = 2
326
+ elif "generate_script" in log_entry or "Generated script" in log_entry:
327
+ current_step = 3
328
+ elif "synthesize_podcast" in log_entry or "Synthesizing" in log_entry:
329
+ current_step = 4
330
+ yield generate_progress_indicator(current_step), logs_history
331
+
332
+ except Exception as e:
333
+ yield "", f"Error: {str(e)}"
334
+
335
+
336
+ def generate_transcript(audio_path, logs):
337
+ """Generate transcript file"""
338
+ if not audio_path:
339
+ return None
340
+ base_name = os.path.splitext(os.path.basename(audio_path))[0]
341
+ transcript_path = os.path.join(OUTPUT_DIR, f"{base_name}_transcript.txt")
342
+ with open(transcript_path, "w") as f:
343
+ f.write("PAPERCAST TRANSCRIPT\n")
344
+ f.write("=" * 60 + "\n\n")
345
+ f.write(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n\n")
346
+ f.write(logs)
347
+ return transcript_path
348
+
349
+
350
+ def get_history_data():
351
+ """Load history for dataframe"""
352
+ items = get_history_items()
353
+ if not items:
354
+ return []
355
+
356
+ data = []
357
+ for item in items:
358
+ data.append(
359
+ [
360
+ item.get("timestamp", "N/A"),
361
+ item.get("url") or "Uploaded PDF",
362
+ item.get("audio_path", ""),
363
+ ]
364
+ )
365
+ return data
366
+
367
+
368
+ def on_history_select(evt: gr.SelectData, data):
369
+ """Handle history table selection"""
370
+ try:
371
+ # data is the dataframe value. evt.index[0] is the row index
372
+ selected_row = data.iloc[evt.index[0]]
373
+ audio_path = selected_row.iloc[2] # 3rd column is audio_path
374
+ if os.path.exists(audio_path):
375
+ return audio_path
376
+ except Exception:
377
+ pass
378
+ return None
379
+
380
+
381
+ def main():
382
+ theme = gr.themes.Soft(
383
+ primary_hue="indigo",
384
+ secondary_hue="blue",
385
+ )
386
+
387
+ with gr.Blocks(title="PaperCast", theme=theme) as demo:
388
+ # Session state for settings
389
+ if DEMO_MODE:
390
+ user_llm_choice = gr.State(value="demo")
391
+ user_own_base_url = gr.State(value=DEMO_INFERENCE_URL)
392
+ user_own_api_key = gr.State(value=DEMO_INFERENCE_KEY)
393
+ user_own_model = gr.State(value=DEMO_MODEL)
394
+ user_openai_key = gr.State(value="")
395
+ user_openai_model = gr.State(value="")
396
+ user_tts_provider = gr.State(value="edge-tts")
397
+ user_elevenlabs_key = gr.State(value="")
398
+ user_host_voice = gr.State(value="en-US-GuyNeural")
399
+ user_guest_voice = gr.State(value="en-US-JennyNeural")
400
+ else:
401
+ user_llm_choice = gr.State(value="Own Inference")
402
+ user_own_base_url = gr.State(value="")
403
+ user_own_api_key = gr.State(value="")
404
+ user_own_model = gr.State(value="")
405
+ user_openai_key = gr.State(value="")
406
+ user_openai_model = gr.State(value="")
407
+ user_tts_provider = gr.State(value="edge-tts")
408
+ user_elevenlabs_key = gr.State(value="")
409
+ user_host_voice = gr.State(value="en-US-GuyNeural")
410
+ user_guest_voice = gr.State(value="en-US-JennyNeural")
411
+ user_podcast_length = gr.State(value=4096)
412
+ settings_valid = gr.State(value=DEMO_MODE) # Settings are valid in demo mode
413
+
414
+ # Initialize generate button state based on demo mode
415
+ generate_btn_state = gr.State(value=DEMO_MODE)
416
+
417
+ with gr.Row():
418
+ gr.HTML("""
419
+ <div style='text-align: center; padding: 35px 20px 25px 20px;'>
420
+ <h1 style='font-size: 3.5em; margin-bottom: 5px; background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); -webkit-background-clip: text; -webkit-text-fill-color: transparent; background-clip: text; font-weight: bold;'>
421
+ 🎙️ PaperCast
422
+ </h1>
423
+ <p style='font-size: 1.5em; color: #444; margin-top: 12px; margin-bottom: 8px; font-weight: 400; line-height: 1.6;'>
424
+ Transform complex research papers into engaging podcast-style conversations
425
+ </p>
426
+ <p style='font-size: 1.1em; color: #888; margin-top: 0; font-weight: 300; font-style: italic;'>
427
+ AI-powered audio that makes science accessible, enjoyable, and easy to understand
428
+ </p>
429
+ </div>
430
+ """)
431
+
432
+ with gr.Tabs():
433
+ # ========== CREATE TAB ==========
434
+ with gr.Tab("🎙️ Generate Podcast"):
435
+ # Supported Platforms Banner (only in Create tab)
436
+ with gr.Row():
437
+ gr.HTML("""
438
+ <div style='text-align: center; padding: 20px; background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); border-radius: 12px; margin-bottom: 20px;'>
439
+ <h3 style='color: white; margin-bottom: 15px;'>✨ Supported Platforms</h3>
440
+ <div style='display: flex; justify-content: center; gap: 20px; flex-wrap: wrap;'>
441
+ <div style='background: rgba(255,255,255,0.95); padding: 15px 25px; border-radius: 10px; min-width: 200px;'>
442
+ <div style='font-size: 2em; margin-bottom: 5px;'>📄</div>
443
+ <strong style='color: #667eea; font-size: 1.1em;'>arXiv</strong>
444
+ <p style='margin: 5px 0 0 0; font-size: 0.9em; color: #666;'>Physics, CS, AI & Math<br/>2M+ papers</p>
445
+ </div>
446
+ <div style='background: rgba(255,255,255,0.95); padding: 15px 25px; border-radius: 10px; min-width: 200px;'>
447
+ <div style='font-size: 2em; margin-bottom: 5px;'>🏥</div>
448
+ <strong style='color: #667eea; font-size: 1.1em;'>medRxiv</strong>
449
+ <p style='margin: 5px 0 0 0; font-size: 0.9em; color: #666;'>Medical & Health Sciences<br/>Latest research</p>
450
+ </div>
451
+ <div style='background: rgba(255,255,255,0.95); padding: 15px 25px; border-radius: 10px; min-width: 200px;'>
452
+ <div style='font-size: 2em; margin-bottom: 5px;'>📎</div>
453
+ <strong style='color: #667eea; font-size: 1.1em;'>Any PDF</strong>
454
+ <p style='margin: 5px 0 0 0; font-size: 0.9em; color: #666;'>Direct Upload<br/>Any research paper</p>
455
+ </div>
456
+ </div>
457
+ </div>
458
+ """)
459
+
460
+ # Example Papers Section
461
+ with gr.Row():
462
+ with gr.Column():
463
+ gr.Markdown("### 📚 Example Papers")
464
+ gr.Markdown("*Click any example to auto-fill the URL field and try it out!*")
465
+
466
+ with gr.Row():
467
+ with gr.Column(scale=1):
468
+ example_btn1 = gr.Button(
469
+ "🤖 Attention Is All You Need\n\nThe foundational Transformer paper",
470
+ size="sm",
471
+ variant="secondary"
472
+ )
473
+ with gr.Column(scale=1):
474
+ example_btn2 = gr.Button(
475
+ "🧠 GPT-4 Technical Report\n\nOpenAI's GPT-4 capabilities",
476
+ size="sm",
477
+ variant="secondary"
478
+ )
479
+
480
+ with gr.Row():
481
+ with gr.Column(scale=1):
482
+ example_btn3 = gr.Button(
483
+ "🎨 DALL-E 2 Image Generation\n\nDiffusion models for images",
484
+ size="sm",
485
+ variant="secondary"
486
+ )
487
+ with gr.Column(scale=1):
488
+ example_btn4 = gr.Button(
489
+ "🔬 Deep Residual Learning\n\nRevolutionary ResNet architecture",
490
+ size="sm",
491
+ variant="secondary"
492
+ )
493
+
494
+ with gr.Row():
495
+ with gr.Column(scale=1):
496
+ gr.Markdown("### 1. Source Material")
497
+
498
+ # Simple Mode Inputs (default)
499
+ with gr.Group(visible=True) as simple_inputs:
500
+ simple_source_type = gr.Radio(
501
+ choices=["Paper URL", "Upload PDF"],
502
+ value="Paper URL",
503
+ label="Choose Source Type",
504
+ )
505
+
506
+ url_input = gr.Textbox(
507
+ label="Paper URL (arXiv, medRxiv)",
508
+ placeholder="https://arxiv.org/abs/...",
509
+ visible=True,
510
+ )
511
+
512
+ pdf_upload = gr.File(
513
+ label="Upload PDF", file_types=[".pdf"], visible=False
514
+ )
515
+
516
+ # Advanced Mode Inputs (hidden by default)
517
+ with gr.Group(visible=False) as advanced_inputs:
518
+ source_type = gr.Radio(
519
+ choices=["Multiple URLs", "Multiple PDFs"],
520
+ value="Multiple URLs",
521
+ label="Choose Source Type",
522
+ )
523
+
524
+ multi_url_input = gr.Textbox(
525
+ label="Paper URLs (one per line)",
526
+ placeholder="https://arxiv.org/abs/2301.12345\nhttps://arxiv.org/abs/2302.67890\nhttps://www.medrxiv.org/content/...",
527
+ lines=5,
528
+ visible=True,
529
+ )
530
+
531
+ multi_pdf_upload = gr.File(
532
+ label="Upload Multiple PDFs",
533
+ file_types=[".pdf"],
534
+ file_count="multiple",
535
+ visible=False,
536
+ )
537
+
538
+ generate_btn = gr.Button(
539
+ "Generate Podcast",
540
+ variant="primary",
541
+ size="lg",
542
+ )
543
+
544
+ # Advanced Mode Toggle (below button)
545
+ advanced_mode = gr.Checkbox(
546
+ label="🚀 Advanced Mode (Multiple Papers)",
547
+ value=False,
548
+ info="Enable to process multiple papers at once",
549
+ )
550
+
551
+ # Warning message for advanced mode
552
+ advanced_warning = gr.Markdown(
553
+ """
554
+ > ⚠️ **Experimental Feature Warning**
555
+ >
556
+ > This method processes multiple papers and uses extensive context.
557
+ > Accuracy cannot be guaranteed and results may be inconsistent.
558
+ > **Not recommended for production use.**
559
+ """,
560
+ visible=False,
561
+ elem_id="advanced-warning",
562
+ )
563
+
564
+ # Toggle visibility based on advanced mode
565
+ def toggle_mode(is_advanced):
566
+ return {
567
+ simple_inputs: gr.update(visible=not is_advanced),
568
+ advanced_inputs: gr.update(visible=is_advanced),
569
+ advanced_warning: gr.update(visible=is_advanced),
570
+ }
571
+
572
+ advanced_mode.change(
573
+ fn=toggle_mode,
574
+ inputs=[advanced_mode],
575
+ outputs=[simple_inputs, advanced_inputs, advanced_warning],
576
+ )
577
+
578
+ # Toggle between URL and PDF in simple mode
579
+ def toggle_simple_source(source):
580
+ if source == "Paper URL":
581
+ return gr.update(visible=True), gr.update(visible=False)
582
+ else:
583
+ return gr.update(visible=False), gr.update(visible=True)
584
+
585
+ simple_source_type.change(
586
+ fn=toggle_simple_source,
587
+ inputs=[simple_source_type],
588
+ outputs=[url_input, pdf_upload],
589
+ )
590
+
591
+ # Toggle between URLs and PDFs in advanced mode
592
+ def toggle_advanced_source(source):
593
+ if source == "Multiple URLs":
594
+ return gr.update(visible=True), gr.update(visible=False)
595
+ else:
596
+ return gr.update(visible=False), gr.update(visible=True)
597
+
598
+ source_type.change(
599
+ fn=toggle_advanced_source,
600
+ inputs=[source_type],
601
+ outputs=[multi_url_input, multi_pdf_upload],
602
+ )
603
+
604
+ # Example paper button handlers
605
+ example_btn1.click(
606
+ fn=lambda: "https://arxiv.org/abs/1706.03762",
607
+ outputs=[url_input],
608
+ )
609
+ example_btn2.click(
610
+ fn=lambda: "https://arxiv.org/abs/2303.08774",
611
+ outputs=[url_input],
612
+ )
613
+ example_btn3.click(
614
+ fn=lambda: "https://arxiv.org/abs/2204.06125",
615
+ outputs=[url_input],
616
+ )
617
+ example_btn4.click(
618
+ fn=lambda: "https://arxiv.org/abs/1512.03385",
619
+ outputs=[url_input],
620
+ )
621
+
622
+ with gr.Column(scale=1):
623
+ gr.Markdown("### 2. Status & Output")
624
+
625
+ # Progress Indicator
626
+ progress_status = gr.Markdown(
627
+ value="",
628
+ label="Progress",
629
+ visible=False,
630
+ )
631
+
632
+ status_output = gr.Code(
633
+ label="Process Log",
634
+ language="markdown",
635
+ interactive=False,
636
+ lines=15,
637
+ )
638
+
639
+ generate_btn.click(
640
+ fn=validated_generate_agent,
641
+ inputs=[
642
+ url_input,
643
+ pdf_upload,
644
+ advanced_mode,
645
+ multi_url_input,
646
+ multi_pdf_upload,
647
+ user_llm_choice,
648
+ user_own_base_url,
649
+ user_own_api_key,
650
+ user_own_model,
651
+ user_openai_key,
652
+ user_openai_model,
653
+ user_tts_provider,
654
+ user_elevenlabs_key,
655
+ user_host_voice,
656
+ user_guest_voice,
657
+ user_podcast_length,
658
+ ],
659
+ outputs=[progress_status, status_output],
660
+ )
661
+
662
+ # ========== HISTORY TAB ==========
663
+ with gr.Tab("📚 History"):
664
+ gr.Markdown("### Past Podcasts")
665
+ with gr.Row():
666
+ refresh_btn = gr.Button("Refresh History", size="sm")
667
+
668
+ history_table = gr.Dataframe(
669
+ headers=["Date", "Source", "Audio Path"],
670
+ datatype=["str", "str", "str"],
671
+ value=get_history_data(),
672
+ interactive=False,
673
+ label="Click a row to play",
674
+ )
675
+
676
+ history_player = gr.Audio(label="Playback", type="filepath")
677
+
678
+ def refresh_history():
679
+ return get_history_data()
680
+
681
+ refresh_btn.click(fn=refresh_history, outputs=[history_table])
682
+
683
+ # Handle selection
684
+ history_table.select(
685
+ fn=on_history_select,
686
+ inputs=[history_table],
687
+ outputs=[history_player],
688
+ )
689
+
690
+ # ========== TRANSCRIPTS TAB ==========
691
+ with gr.Tab("📝 Transcripts"):
692
+ gr.Markdown("### Transcript Viewer")
693
+ gr.Markdown(
694
+ "*Coming soon: View and download transcripts from history.*"
695
+ )
696
+
697
+ # ========== SETTINGS TAB ==========
698
+ with gr.Tab("⚙️ Settings"):
699
+ with gr.Row():
700
+ with gr.Column(scale=1):
701
+ pass
702
+ with gr.Column(scale=3):
703
+ gr.Markdown("""
704
+ <div style="text-align: center;">
705
+
706
+ # ⚙️ Settings
707
+
708
+ Configure your PaperCast experience with your own API keys and preferences.
709
+
710
+ ---
711
+
712
+ </div>
713
+ """)
714
+
715
+ gr.Markdown("## 🤖 LLM Selection")
716
+ gr.Markdown(
717
+ "Choose which language model provider to use for script generation."
718
+ )
719
+
720
+ with gr.Group():
721
+ if DEMO_MODE:
722
+ gr.Markdown(
723
+ "**🔧 Demo Mode Active** - Using built-in inference and TTS services"
724
+ )
725
+ else:
726
+ llm_choice = gr.Radio(
727
+ choices=[
728
+ "Own Inference",
729
+ "OpenAI",
730
+ ],
731
+ value="Own Inference",
732
+ label="Language Model Provider",
733
+ info="Choose your language model provider for script generation",
734
+ )
735
+
736
+ # Own Inference inputs (base URL + API key)
737
+ own_inference_base_url = gr.Textbox(
738
+ label="Base URL",
739
+ placeholder="https://your-server.com/v1",
740
+ info="OpenAI-compatible endpoint",
741
+ visible=not DEMO_MODE,
742
+ )
743
+
744
+ own_inference_api_key = gr.Textbox(
745
+ label="API Key",
746
+ placeholder="Optional - leave empty if not required",
747
+ type="password",
748
+ info="API key for your inference server (if required)",
749
+ visible=not DEMO_MODE,
750
+ )
751
+
752
+ own_inference_model = gr.Textbox(
753
+ label="Model Name",
754
+ placeholder="e.g., llama-3.1-8b, mistral-7b",
755
+ info="Model name on your server",
756
+ visible=not DEMO_MODE,
757
+ )
758
+
759
+ # OpenAI inputs
760
+ openai_key_input = gr.Textbox(
761
+ label="OpenAI API Key",
762
+ placeholder="sk-...",
763
+ type="password",
764
+ info="Required when using OpenAI",
765
+ visible=False, # Hidden by default, shown only when OpenAI is selected
766
+ )
767
+
768
+ openai_model_input = gr.Textbox(
769
+ label="OpenAI Model Name",
770
+ placeholder="gpt-4o-mini",
771
+ value="gpt-4o-mini",
772
+ info="Model name (e.g., gpt-4o-mini, gpt-4, gpt-3.5-turbo)",
773
+ visible=False, # Hidden by default, shown only when OpenAI is selected
774
+ )
775
+
776
+ gr.Markdown("---")
777
+
778
+ gr.Markdown("## 🔊 Text-to-Speech (TTS)")
779
+ if DEMO_MODE:
780
+ gr.Markdown(
781
+ "**🔧 Demo Mode Active** - Using Edge-TTS (Microsoft, free)"
782
+ )
783
+ else:
784
+ gr.Markdown(
785
+ "Choose your TTS provider for audio generation"
786
+ )
787
+
788
+ with gr.Group():
789
+ tts_provider_choice = gr.Radio(
790
+ choices=[
791
+ "Edge-TTS (Free, Microsoft)",
792
+ "ElevenLabs (Paid, Better Quality)",
793
+ ],
794
+ value="Edge-TTS (Free, Microsoft)",
795
+ label="TTS Provider",
796
+ info="Edge-TTS is free and works without API key. ElevenLabs offers better voice quality.",
797
+ visible=not DEMO_MODE,
798
+ )
799
+
800
+ elevenlabs_key_input = gr.Textbox(
801
+ label="ElevenLabs API Key",
802
+ placeholder="sk_... (required for ElevenLabs)",
803
+ type="password",
804
+ info="Get your key at: elevenlabs.io",
805
+ visible=False, # Hidden by default since Edge-TTS is default
806
+ )
807
+
808
+ gr.Markdown("### 🎭 Voice Selection")
809
+ if DEMO_MODE:
810
+ gr.Markdown("*Choose voices for your podcast (Demo mode uses Edge-TTS)*")
811
+
812
+ # Edge-TTS voice selections
813
+ with gr.Group(visible=True) as edge_voice_group:  # Edge-TTS is the default provider in both modes
814
+ edge_host_voice = gr.Dropdown(
815
+ choices=list(EDGE_TTS_VOICES.keys()),
816
+ value="Guy (US Male - Casual)",
817
+ label="Host Voice (Edge-TTS)",
818
+ info="Select voice for the podcast host",
819
+ )
820
+ edge_guest_voice = gr.Dropdown(
821
+ choices=list(EDGE_TTS_VOICES.keys()),
822
+ value="Jenny (US Female - Friendly)",
823
+ label="Guest Voice (Edge-TTS)",
824
+ info="Select voice for the expert guest",
825
+ )
826
+
827
+ # ElevenLabs voice selections (hidden by default, hidden in demo mode)
828
+ if not DEMO_MODE:
829
+ with gr.Group(visible=False) as elevenlabs_voice_group:
830
+ elevenlabs_host_voice = gr.Dropdown(
831
+ choices=list(ELEVENLABS_VOICES.keys()),
832
+ value="Antoni (Male - Well-rounded)",
833
+ label="Host Voice (ElevenLabs)",
834
+ info="Select voice for the podcast host",
835
+ )
836
+ elevenlabs_guest_voice = gr.Dropdown(
837
+ choices=list(ELEVENLABS_VOICES.keys()),
838
+ value="Bella (Female - Soft)",
839
+ label="Guest Voice (ElevenLabs)",
840
+ info="Select voice for the expert guest",
841
+ )
842
+ else:
843
+ # Create dummy components for demo mode so we can reference them
844
+ elevenlabs_voice_group = None
845
+ elevenlabs_host_voice = gr.State(value="Antoni (Male - Well-rounded)")
846
+ elevenlabs_guest_voice = gr.State(value="Bella (Female - Soft)")
847
+
848
+ # Toggle visibility based on LLM choice (only when not in demo mode)
849
+ if not DEMO_MODE:
850
+
851
+ def toggle_llm_inputs(choice):
852
+ if choice == "Own Inference":
853
+ return {
854
+ own_inference_base_url: gr.update(visible=True),
855
+ own_inference_api_key: gr.update(visible=True),
856
+ own_inference_model: gr.update(visible=True),
857
+ openai_key_input: gr.update(visible=False),
858
+ openai_model_input: gr.update(visible=False),
859
+ }
860
+ elif choice == "OpenAI":
861
+ return {
862
+ own_inference_base_url: gr.update(
863
+ visible=False
864
+ ),
865
+ own_inference_api_key: gr.update(visible=False),
866
+ own_inference_model: gr.update(visible=False),
867
+ openai_key_input: gr.update(visible=True),
868
+ openai_model_input: gr.update(visible=True),
869
+ }
870
+
871
+ llm_choice.change(
872
+ fn=toggle_llm_inputs,
873
+ inputs=[llm_choice],
874
+ outputs=[
875
+ own_inference_base_url,
876
+ own_inference_api_key,
877
+ own_inference_model,
878
+ openai_key_input,
879
+ openai_model_input,
880
+ ],
881
+ )
882
+
883
+ # Toggle visibility based on TTS provider choice
884
+ def toggle_tts_inputs(choice):
885
+ if choice == "Edge-TTS (Free, Microsoft)":
886
+ return {
887
+ elevenlabs_key_input: gr.update(visible=False),
888
+ edge_voice_group: gr.update(visible=True),
889
+ elevenlabs_voice_group: gr.update(visible=False),
890
+ }
891
+ else: # ElevenLabs
892
+ return {
893
+ elevenlabs_key_input: gr.update(visible=True),
894
+ edge_voice_group: gr.update(visible=False),
895
+ elevenlabs_voice_group: gr.update(visible=True),
896
+ }
897
+
898
+ tts_provider_choice.change(
899
+ fn=toggle_tts_inputs,
900
+ inputs=[tts_provider_choice],
901
+ outputs=[elevenlabs_key_input, edge_voice_group, elevenlabs_voice_group],
902
+ )
903
+
904
+ gr.Markdown("---")
905
+
906
+ gr.Markdown("## 🎚️ Podcast Settings")
907
+
908
+ with gr.Group():
909
+ podcast_length = gr.Slider(
910
+ minimum=1000,
911
+ maximum=8000,
912
+ value=4096,
913
+ step=500,
914
+ label="Podcast Length (Max Tokens)",
915
+ info="Higher values = longer podcasts",
916
+ )
917
+
918
+ gr.Markdown("---")
919
+
920
+ save_settings_btn = gr.Button(
921
+ "💾 Save Settings", variant="primary", size="lg"
922
+ )
923
+ settings_status = gr.Markdown("")
924
+
925
+ def save_settings(
926
+ llm_choice,
927
+ own_base_url,
928
+ own_api_key,
929
+ own_model,
930
+ openai_key,
931
+ openai_model,
932
+ tts_provider,
933
+ elevenlabs_key,
934
+ edge_host,
935
+ edge_guest,
936
+ elevenlabs_host,
937
+ elevenlabs_guest,
938
+ length,
939
+ ):
940
+ status = "✅ **Settings Saved!**\n\n"
941
+
942
+ # Convert TTS provider choice to internal format
943
+ if tts_provider == "Edge-TTS (Free, Microsoft)":
944
+ tts_provider_internal = "edge-tts"
945
+ else:
946
+ tts_provider_internal = "elevenlabs"
947
+
948
+ # Validate settings first (only in non-demo mode)
949
+ is_valid, validation_message = (
950
+ validate_settings_for_generation(
951
+ llm_choice,
952
+ own_base_url,
953
+ own_api_key,
954
+ openai_key,
955
+ tts_provider_internal,
956
+ elevenlabs_key,
957
+ )
958
+ )
959
+
960
+ # LLM Settings
961
+ if DEMO_MODE:
962
+ status += "- LLM: Demo Inference ✓\n"
963
+ elif llm_choice == "Own Inference":
964
+ if own_base_url:
965
+ status += f"- LLM: Own Inference ✓\n"
966
+ status += f" - URL: {own_base_url[:50]}...\n"
967
+ status += f" - Model: {own_model or 'Default'}\n"
968
+ else:
969
+ status += "- ⚠️ LLM: Own Inference selected but no base URL provided\n"
970
+ elif llm_choice == "OpenAI":
971
+ if openai_key:
972
+ status += f"- LLM: OpenAI ({openai_model or 'gpt-4o-mini'}) ✓\n"
973
+ else:
974
+ status += "- ⚠️ LLM: OpenAI selected but no API key provided\n"
975
+
976
+ # TTS Settings
977
+ if DEMO_MODE:
978
+ status += "- TTS: Edge-TTS (Microsoft, free) ✓\n"
979
+ else:
980
+ if tts_provider_internal == "edge-tts":
981
+ status += "- TTS: Edge-TTS (Microsoft, free) ✓\n"
982
+ elif elevenlabs_key:
983
+ status += "- TTS: ElevenLabs (Custom key) ✓\n"
984
+ else:
985
+ status += "- ⚠️ TTS: ElevenLabs key required\n"
986
+
987
+ # Add validation result
988
+ if not DEMO_MODE:
989
+ if is_valid:
990
+ status += "\n✅ **All settings are valid!**\n"
991
+ status += "🎉 Generate button is now enabled.\n"
992
+ else:
993
+ status += "\n⚠️ **Settings incomplete!**\n"
994
+ status += "🚫 Generate button remains disabled.\n"
995
+ status += f"\nRequired fixes:\n{validation_message}"
996
+
997
+ status += f"\n- Podcast Length: {int(length)} tokens\n"
998
+ status += (
999
+ "\n*Settings will be used for next podcast generation.*"
1000
+ )
1001
+
1002
+ # Determine which voices to use based on TTS provider
1003
+ if tts_provider_internal == "edge-tts":
1004
+ host_voice = EDGE_TTS_VOICES.get(edge_host, "en-US-GuyNeural")
1005
+ guest_voice = EDGE_TTS_VOICES.get(edge_guest, "en-US-JennyNeural")
1006
+ else: # elevenlabs
1007
+ host_voice = ELEVENLABS_VOICES.get(elevenlabs_host, "ErXwobaYiN019PkySvjV")
1008
+ guest_voice = ELEVENLABS_VOICES.get(elevenlabs_guest, "EXAVITQu4vr4xnSDxMaL")
1009
+
1010
+ return (
1011
+ status,
1012
+ llm_choice if not DEMO_MODE else "demo",
1013
+ own_base_url if not DEMO_MODE else DEMO_INFERENCE_URL,
1014
+ own_api_key if not DEMO_MODE else DEMO_INFERENCE_KEY,
1015
+ own_model if not DEMO_MODE else DEMO_MODEL,
1016
+ openai_key,
1017
+ openai_model,
1018
+ tts_provider_internal if not DEMO_MODE else "edge-tts",
1019
+ elevenlabs_key if not DEMO_MODE else "",
1020
+ host_voice if not DEMO_MODE else "en-US-GuyNeural",
1021
+ guest_voice if not DEMO_MODE else "en-US-JennyNeural",
1022
+ int(length),
1023
+ is_valid,
1024
+ )
1025
+
1026
+ if DEMO_MODE:
1027
+ # In demo mode, settings are pre-configured but voices can be customized
1028
+ def save_demo_settings(edge_host, edge_guest, length):
1029
+ host_voice = EDGE_TTS_VOICES.get(edge_host, "en-US-GuyNeural")
1030
+ guest_voice = EDGE_TTS_VOICES.get(edge_guest, "en-US-JennyNeural")
1031
+
1032
+ return (
1033
+ f"✅ **Settings Saved!**\n\n- LLM: Demo Inference ✓\n- TTS: Edge-TTS (Microsoft, free) ✓\n- Host Voice: {edge_host}\n- Guest Voice: {edge_guest}\n\n*Demo mode is active with built-in services.*",
1034
+ "demo",
1035
+ DEMO_INFERENCE_URL,
1036
+ DEMO_INFERENCE_KEY,
1037
+ DEMO_MODEL,
1038
+ "",
1039
+ "",
1040
+ "edge-tts",
1041
+ "",
1042
+ host_voice,
1043
+ guest_voice,
1044
+ int(length),
1045
+ True, # settings_valid = True in demo mode
1046
+ )
1047
+
1048
+ save_settings_btn.click(
1049
+ fn=save_demo_settings,
1050
+ inputs=[edge_host_voice, edge_guest_voice, podcast_length],
1051
+ outputs=[
1052
+ settings_status,
1053
+ user_llm_choice,
1054
+ user_own_base_url,
1055
+ user_own_api_key,
1056
+ user_own_model,
1057
+ user_openai_key,
1058
+ user_openai_model,
1059
+ user_tts_provider,
1060
+ user_elevenlabs_key,
1061
+ user_host_voice,
1062
+ user_guest_voice,
1063
+ user_podcast_length,
1064
+ settings_valid,
1065
+ ],
1066
+ )
1067
+ else:
1068
+ save_settings_btn.click(
1069
+ fn=save_settings,
1070
+ inputs=[
1071
+ llm_choice,
1072
+ own_inference_base_url,
1073
+ own_inference_api_key,
1074
+ own_inference_model,
1075
+ openai_key_input,
1076
+ openai_model_input,
1077
+ tts_provider_choice,
1078
+ elevenlabs_key_input,
1079
+ edge_host_voice,
1080
+ edge_guest_voice,
1081
+ elevenlabs_host_voice,
1082
+ elevenlabs_guest_voice,
1083
+ podcast_length,
1084
+ ],
1085
+ outputs=[
1086
+ settings_status,
1087
+ user_llm_choice,
1088
+ user_own_base_url,
1089
+ user_own_api_key,
1090
+ user_own_model,
1091
+ user_openai_key,
1092
+ user_openai_model,
1093
+ user_tts_provider,
1094
+ user_elevenlabs_key,
1095
+ user_host_voice,
1096
+ user_guest_voice,
1097
+ user_podcast_length,
1098
+ settings_valid,
1099
+ ],
1100
+ )
1101
+
1102
+ with gr.Column(scale=1):
1103
+ pass
1104
+
1105
+ # ========== ABOUT TAB ==========
1106
+ with gr.Tab("ℹ️ About"):
1107
+ with gr.Row():
1108
+ with gr.Column(scale=1):
1109
+ pass
1110
+ with gr.Column(scale=3):
1111
+ gr.Markdown(f"""
1112
+ <div style="text-align: center;">
1113
+
1114
+ # About PaperCast
1115
+
1116
+ **PaperCast** is an AI-powered application that transforms complex research papers into engaging audio podcasts,
1117
+ making scientific knowledge more accessible, one paper at a time.
1118
+
1119
+ ---
1120
+
1121
+ ## 🎯 How It Works
1122
+
1123
+ Our intelligent agent orchestrates a multi-step pipeline to create your podcast:
1124
+
1125
+ 1. **📥 Input** - Provide a paper URL (arXiv, medRxiv) or upload any PDF
1126
+ 2. **📄 Extraction** - AI extracts and analyzes the paper content
1127
+ 3. **🎬 Script Generation** - Creates natural dialogue between Host and Expert personas
1128
+ 4. **🎤 Voice Synthesis** - Generates high-quality audio with distinct voices
1129
+ 5. **✅ Delivery** - Your podcast is ready to listen to and download
1130
+
1131
+ ---
1132
+
1133
+ ## 🌟 Key Features
1134
+
1135
+ - **Multiple Sources**: Support for arXiv, medRxiv, and direct PDF uploads
1136
+ - **Natural Dialogue**: Engaging conversation between Host and Expert characters
1137
+ - **High-Quality Audio**: Professional voice synthesis powered by ElevenLabs
1138
+ - **Smart Processing**: AI understands paper structure and creates contextual discussions
1139
+ - **History Tracking**: Keep track of all your generated podcasts
1140
+
1141
+ ---
1142
+
1143
+ ## 🔧 Technology Stack
1144
+
1145
+ - **LLM**: {SCRIPT_GENERATION_MODEL}
1146
+ - **TTS**: Edge-TTS (Microsoft, Free) / ElevenLabs API (Optional)
1147
+ - **Infrastructure**: ☁️ Remote Inference
1148
+ - **Framework**: Gradio 6
1149
+ - **PDF Processing**: PyMuPDF
1150
+
1151
+ ---
1152
+
1153
+ ## 🎓 Built For
1154
+
1155
+ **MCP 1st Birthday Hackathon** - Track 2: MCP in Action (Consumer)
1156
+
1157
+ This project demonstrates autonomous agent capabilities through intelligent orchestration
1158
+ of multiple AI tools to transform static research papers into dynamic audio content.
1159
+
1160
+ ---
1161
+
1162
+ ## 📝 About the Agent
1163
+
1164
+ PaperCast uses an autonomous agent that:
1165
+
1166
+ - **Plans** conversation flow based on paper structure
1167
+ - **Reasons** about which concepts need simplification
1168
+ - **Executes** the multi-step processing pipeline
1169
+ - **Adapts** dialogue based on paper complexity
1170
+
1171
+ ---
1172
+
1173
+ ## 💡 Use Cases
1174
+
1175
+ - 🎧 Listen to papers during commute or exercise
1176
+ - 📚 Quick overview of research before deep reading
1177
+ - 🌍 Make research accessible to broader audiences
1178
+ - 🔬 Stay updated with the latest papers in your field
1179
+
1180
+ ---
1181
+
1182
+ Made with ❤️ using AI, Gradio, and ElevenLabs
1183
+
1184
+ </div>
1185
+ """)
1186
+ with gr.Column(scale=1):
1187
+ pass
1188
+
1189
+ # Update generate button state when settings_valid changes
1190
+ def update_generate_button_validity(settings_valid):
1191
+ return gr.update(interactive=settings_valid)
1192
+
1193
+ settings_valid.change(
1194
+ fn=update_generate_button_validity,
1195
+ inputs=[settings_valid],
1196
+ outputs=[generate_btn],
1197
+ )
1198
+
1199
+ demo.launch(server_name="0.0.0.0", share=True)
1200
+
1201
+
1202
+ if __name__ == "__main__":
1203
+ main()
generation/__init__.py ADDED
@@ -0,0 +1 @@
1
+ """Script and dialogue generation for PaperCast"""
generation/script_generator.py ADDED
@@ -0,0 +1,236 @@
1
+ import base64
2
+ import json
3
+
4
+ import httpx
5
+ from openai import OpenAI
6
+
7
+ from utils.config import (
8
+ DEMO_INFERENCE_KEY,
9
+ DEMO_INFERENCE_URL,
10
+ DEMO_MODE,
11
+ DEMO_MODEL,
12
+ MAX_TOKENS,
13
+ SCRIPT_GENERATION_MODEL,
14
+ TEMPERATURE,
15
+ )
16
+
17
+
18
+ class ScriptGenerator:
19
+ def __init__(
20
+ self,
21
+ provider_mode="demo",
22
+ own_base_url=None,
23
+ own_api_key=None,
24
+ own_model=None,
25
+ openai_key=None,
26
+ openai_model=None,
27
+ max_tokens=None,
28
+ ):
29
+ """
30
+ Initialize ScriptGenerator with flexible provider support.
31
+
32
+ Args:
33
+ provider_mode: "demo", "own_inference", or "openai"
34
+ own_base_url: Base URL for own inference server
35
+ own_api_key: API key for own inference server
36
+ own_model: Model name for own inference server
37
+ openai_key: OpenAI API key
38
+ openai_model: OpenAI model name (e.g., "gpt-4o-mini", "gpt-4", "gpt-3.5-turbo")
39
+ max_tokens: Maximum tokens for generation
40
+ """
41
+ self.provider_mode = provider_mode
42
+ self.max_tokens = max_tokens or MAX_TOKENS
43
+
44
+ if provider_mode == "demo":
45
+ # Demo mode - use hardcoded credentials
46
+ print(f"Using Demo Inference: {DEMO_INFERENCE_URL}")
47
+ username, password = DEMO_INFERENCE_KEY.split(":", 1)
48
+ http_client = httpx.Client(auth=(username, password))
49
+ self.client = OpenAI(
50
+ base_url=DEMO_INFERENCE_URL,
51
+ api_key="dummy",
52
+ http_client=http_client,
53
+ )
54
+ self.model_name = DEMO_MODEL
55
+ print("✓ Demo inference client initialized")
56
+
57
+ elif provider_mode == "own_inference":
58
+ # Own inference server
59
+ print(f"Connecting to own inference API: {own_base_url}")
60
+
61
+ if own_api_key:
62
+ # If API key is provided, check if it's in "username:password" format
63
+ if ":" in own_api_key:
64
+ username, password = own_api_key.split(":", 1)
65
+ http_client = httpx.Client(auth=(username, password))
66
+ self.client = OpenAI(
67
+ base_url=own_base_url,
68
+ api_key="dummy",
69
+ http_client=http_client,
70
+ )
71
+ else:
72
+ # Regular API key
73
+ self.client = OpenAI(
74
+ base_url=own_base_url,
75
+ api_key=own_api_key,
76
+ )
77
+ else:
78
+ # No API key - some servers don't require it
79
+ self.client = OpenAI(
80
+ base_url=own_base_url,
81
+ api_key="dummy",
82
+ )
83
+
84
+ self.model_name = own_model or "default"
85
+ print(f"✓ Own inference client initialized (model: {self.model_name})")
86
+
87
+ elif provider_mode == "openai":
88
+ # OpenAI
89
+ print(f"Using OpenAI: {openai_model or 'gpt-4o-mini'}")
90
+ self.client = OpenAI(api_key=openai_key)
91
+ self.model_name = openai_model or "gpt-4o-mini"
92
+ print("✓ OpenAI client initialized")
93
+
94
+ else:
95
+ raise ValueError(f"Invalid provider_mode: {provider_mode}")
96
+
97
+ def generate_podcast_script(self, paper_text: str) -> list:
98
+ """
99
+ Generates a podcast script from the given paper text.
100
+
101
+ Args:
102
+ paper_text (str): The text content of the research paper.
103
+
104
+ Returns:
105
+ list: A list of dictionaries representing the dialogue.
106
+ """
107
+
108
+ system_prompt = """You are an expert podcast producer. Your goal is to convert technical research papers into engaging, accessible podcast dialogues between two hosts:
109
+ - Host (Alex): Enthusiastic, asks clarifying questions, guides the conversation.
110
+ - Guest (Jamie): Expert researcher, explains concepts simply but accurately.
111
+
112
+ CRITICAL RULES:
113
+ 1. The Host MUST ALWAYS start with "Welcome to PaperCast!" - This is the show's branding and must never be skipped.
114
+ 2. NEVER read URLs, links, or web addresses out loud in the dialogue. Skip them completely. They sound awkward in audio format.
115
+ 3. NEVER mention arxiv IDs, DOIs, or reference numbers. Focus on the content, not the metadata.
116
+
117
+ Output the script in a valid JSON format. The JSON should be a list of objects, where each object has:
118
+ - "speaker": "Host" or "Guest"
119
+ - "text": The dialogue text.
120
+ - "emotion": An emotion tag supported by the TTS engine (e.g., "excited", "neutral", "thoughtful", "happy").
121
+
122
+ Example:
123
+ [
124
+ {"speaker": "Host", "text": "Welcome to PaperCast! Today we're diving into something really cool.", "emotion": "excited"},
125
+ {"speaker": "Guest", "text": "That's right, Alex. We're looking at a new way to handle large language models.", "emotion": "happy"}
126
+ ]
127
+
128
+ Keep the conversation natural. Use fillers like "Um", "So", "You know" sparingly but effectively.
129
+ """
130
+
131
+ user_prompt = f"Here is the research paper text. Generate a podcast script summarizing the key findings, methodology, and implications.\n\n{paper_text[:10000]}..."
132
+
133
+ messages = [
134
+ {"role": "system", "content": system_prompt},
135
+ {"role": "user", "content": user_prompt},
136
+ ]
137
+
138
+ print(
139
+ f"Generating script with {self.provider_mode} (model: {self.model_name})..."
140
+ )
141
+
142
+ # Call LLM
143
+ response = self.client.chat.completions.create(
144
+ model=self.model_name,
145
+ messages=messages,
146
+ max_tokens=self.max_tokens,
147
+ temperature=TEMPERATURE,
148
+ )
149
+ generated_text = response.choices[0].message.content
150
+
151
+ # Extract JSON from the response
152
+ try:
153
+ # Find the first '[' and last ']'
154
+ start_index = generated_text.find("[")
155
+ end_index = generated_text.rfind("]") + 1
156
+ if start_index != -1 and end_index > start_index:
157
+ json_str = generated_text[start_index:end_index]
158
+ script = json.loads(json_str)
159
+ return script
160
+ else:
161
+ print("No JSON found in output.")
162
+ return []
163
+ except json.JSONDecodeError as e:
164
+ print(f"Error parsing JSON: {e}")
165
+ print(f"Raw output: {generated_text}")
166
+ return []
167
+
168
+
169
+ # Global instance to avoid reloading model
170
+ _generator_instance = None
171
+
172
+
173
+ def get_generator(
174
+ provider_mode="demo",
175
+ own_base_url=None,
176
+ own_api_key=None,
177
+ own_model=None,
178
+ openai_key=None,
179
+ openai_model=None,
180
+ max_tokens=None,
181
+ ):
182
+ """
183
+ Get a script generator instance with flexible provider support.
184
+
185
+ Args:
186
+ provider_mode: "demo", "own_inference", or "openai"
187
+ own_base_url: Base URL for own inference server
188
+ own_api_key: API key for own inference server
189
+ own_model: Model name for own inference server
190
+ openai_key: OpenAI API key
191
+ openai_model: OpenAI model name (default: "gpt-4o-mini")
192
+ max_tokens: Maximum tokens for generation
193
+
194
+ Returns:
195
+ ScriptGenerator instance
196
+ """
197
+ global _generator_instance
198
+
199
+ # Always create new instance for OpenAI or own_inference with custom settings
200
+ # Reuse demo instance if same config
201
+ if provider_mode == "openai":
202
+ if not openai_key:
203
+ print(
204
+ "Warning: OpenAI selected but no API key provided. Falling back to demo mode."
205
+ )
206
+ provider_mode = "demo"
207
+ else:
208
+ return ScriptGenerator(
209
+ provider_mode="openai",
210
+ openai_key=openai_key,
211
+ openai_model=openai_model,
212
+ max_tokens=max_tokens or MAX_TOKENS,
213
+ )
214
+
215
+ if provider_mode == "own_inference":
216
+ if not own_base_url:
217
+ print(
218
+ "Warning: Own Inference selected but no base URL provided. Falling back to demo mode."
219
+ )
220
+ provider_mode = "demo"
221
+ else:
222
+ return ScriptGenerator(
223
+ provider_mode="own_inference",
224
+ own_base_url=own_base_url,
225
+ own_api_key=own_api_key,
226
+ own_model=own_model,
227
+ max_tokens=max_tokens or MAX_TOKENS,
228
+ )
229
+
230
+ # Demo mode - reuse global instance
231
+ if _generator_instance is None or provider_mode == "demo":
232
+ _generator_instance = ScriptGenerator(
233
+ provider_mode="demo",
234
+ max_tokens=max_tokens or MAX_TOKENS,
235
+ )
236
+ return _generator_instance
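For reference, a minimal usage sketch of the factory above; the endpoint, key, and model name are placeholders, not values shipped with this commit:

```python
# Hypothetical usage of get_generator outside the Gradio app; all credentials below are placeholders.
from generation.script_generator import get_generator

generator = get_generator(
    provider_mode="own_inference",
    own_base_url="https://my-inference.example.com/v1",  # placeholder OpenAI-compatible endpoint
    own_api_key="user:password",                         # "username:password" keys are sent as HTTP basic auth
    own_model="llama-3.1-8b",
    max_tokens=4096,
)
script = generator.generate_podcast_script(paper_text="...extracted PDF text...")
# script is a list of {"speaker", "text", "emotion"} dicts, or [] if JSON parsing fails
```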
live.py ADDED
@@ -0,0 +1,52 @@
1
+ import time
2
+ import subprocess
3
+ import datetime
4
+
5
+ # ---------------------------------------------------------------------------
6
+ # Please paste your curl command between the quotation marks.
7
+ # Example: curl -X POST http://api.example.com/update
8
+ # ---------------------------------------------------------------------------
9
+ CURL_COMMAND = """
10
+ curl --location 'https://8000-dep-01kady4n8bfqjjatmpqtzhdcp9-d.cloudspaces.litng.ai/v1/chat/completions' \
11
+ --header 'Content-Type: application/json' \
12
+ --header 'Authorization: Basic YmF0dTpCYXR1aGFuMTIz' \
13
+ --data '{
14
+ "model": "unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit",
15
+ "messages": [
16
+ {
17
+ "role": "user",
18
+ "content": "You are a helpful assistant. How manny letters in strawberry?"
19
+ }
20
+ ]
21
+ }'
22
+ """
23
+ # ---------------------------------------------------------------------------
24
+
25
+ def run_periodically():
26
+ print(f"Script başlatıldı: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
27
+ print(f"Komut: {CURL_COMMAND.strip()}")
28
+ print("-" * 50)
29
+
30
+ while True:
31
+ try:
32
+ current_time = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
33
+ print(f"[{current_time}] İstek gönderiliyor...")
34
+
35
+ # shell=True lets the command run exactly as it would in a terminal
36
+ result = subprocess.run(CURL_COMMAND, shell=True, capture_output=True, text=True)
37
+
38
+ if result.returncode == 0:
39
+ print(f"Başarılı! Çıktı (ilk 100 karakter): {result.stdout[:100]}...")
40
+ else:
41
+ print(f"Hata kodu: {result.returncode}")
42
+ print(f"Hata çıktısı: {result.stderr}")
43
+
44
+ except Exception as e:
45
+ print(f"Beklenmedik bir hata oluştu: {e}")
46
+
47
+ print("60 saniye bekleniyor...")
48
+ print("-" * 50)
49
+ time.sleep(60)
50
+
51
+ if __name__ == "__main__":
52
+ run_periodically()
mcp_servers/__init__.py ADDED
@@ -0,0 +1 @@
1
+ """MCP server integrations for PaperCast"""
mcp_servers/paper_tools_server.py ADDED
@@ -0,0 +1,32 @@
1
+ from mcp.server.fastmcp import FastMCP
2
+ from processing.pdf_reader import extract_text_from_pdf
3
+ from processing.url_fetcher import fetch_paper_from_url
4
+ from generation.script_generator import get_generator
5
+ from synthesis.tts_engine import get_tts_engine
6
+
7
+ mcp = FastMCP("PaperCast Tools")
8
+
9
+ @mcp.tool()
10
+ def read_pdf(path: str) -> str:
11
+ """Reads text from a PDF file."""
12
+ return extract_text_from_pdf(path)
13
+
14
+ @mcp.tool()
15
+ def fetch_arxiv(url: str) -> str:
16
+ """Downloads a paper from an arXiv URL and returns the file path."""
17
+ return fetch_paper_from_url(url)
18
+
19
+ @mcp.tool()
20
+ def synthesize_podcast(script: list) -> str:
21
+ """Synthesizes a podcast from a script (list of dicts). Returns audio path."""
22
+ tts = get_tts_engine()
23
+ return tts.synthesize_dialogue(script)
24
+
25
+ @mcp.tool()
26
+ def generate_script(text: str) -> list:
27
+ """Generates a podcast script from text."""
28
+ generator = get_generator()
29
+ return generator.generate_podcast_script(text)
30
+
31
+ if __name__ == "__main__":
32
+ mcp.run()
output/history.json ADDED
@@ -0,0 +1,58 @@
1
+ [
2
+ {
3
+ "url": "https://arxiv.org/abs/2511.14650",
4
+ "audio_path": "/home/batuhan/lab/papercast/output/podcast.wav",
5
+ "script_length": "N/A",
6
+ "timestamp": "2025-11-19 16:32:00",
7
+ "audio_filename": "podcast.wav"
8
+ },
9
+ {
10
+ "url": "https://www.medrxiv.org/content/10.1101/2025.11.14.25340242v1",
11
+ "audio_path": "/home/batuhan/lab/papercast/output/podcast_20251119_170124.wav",
12
+ "script_length": 11,
13
+ "timestamp": "2025-11-19 17:01:24",
14
+ "audio_filename": "podcast_20251119_170124.wav"
15
+ },
16
+ {
17
+ "url": "Uploaded PDF",
18
+ "audio_path": "/home/batuhan/lab/papercast/output/podcast_20251119_170817.wav",
19
+ "script_length": 13,
20
+ "timestamp": "2025-11-19 17:08:17",
21
+ "audio_filename": "podcast_20251119_170817.wav"
22
+ },
23
+ {
24
+ "url": "https://arxiv.org/abs/2511.14650",
25
+ "audio_path": "/home/batuhan/lab/papercast/output/podcast_20251119_210844.wav",
26
+ "script_length": 15,
27
+ "timestamp": "2025-11-19 21:08:44",
28
+ "audio_filename": "podcast_20251119_210844.wav"
29
+ },
30
+ {
31
+ "url": "https://arxiv.org/abs/2401.08406",
32
+ "audio_path": "/home/batuhan/lab/papercast/output/podcast_20251119_212141.wav",
33
+ "script_length": 14,
34
+ "timestamp": "2025-11-19 21:21:41",
35
+ "audio_filename": "podcast_20251119_212141.wav"
36
+ },
37
+ {
38
+ "url": "https://arxiv.org/abs/1706.03762",
39
+ "audio_path": "/home/batuhan/lab/papercast/output/podcast_20251119_230027.wav",
40
+ "script_length": 12,
41
+ "timestamp": "2025-11-19 23:00:27",
42
+ "audio_filename": "podcast_20251119_230027.wav"
43
+ },
44
+ {
45
+ "url": "https://arxiv.org/abs/2303.08774",
46
+ "audio_path": "/home/batuhan/lab/papercast/output/podcast_20251119_230323.wav",
47
+ "script_length": 17,
48
+ "timestamp": "2025-11-19 23:03:23",
49
+ "audio_filename": "podcast_20251119_230323.wav"
50
+ },
51
+ {
52
+ "url": "https://www.medrxiv.org/content/10.1101/2025.05.25.25328317v2",
53
+ "audio_path": "/home/batuhan/lab/papercast/output/podcast_20251119_230742.wav",
54
+ "script_length": 11,
55
+ "timestamp": "2025-11-19 23:07:42",
56
+ "audio_filename": "podcast_20251119_230742.wav"
57
+ }
58
+ ]
plan.md ADDED
@@ -0,0 +1,90 @@
1
+ # PaperCast Implementation Plan
2
+
3
+ This plan outlines the steps to build **PaperCast**, an AI agent that converts research papers into podcast-style conversations using MCP, Gradio, and LLMs.
4
+
5
+ ## 1. Infrastructure & Dependencies
6
+
7
+ - [ ] **Update `requirements.txt`**
8
+ - Add `transformers`, `accelerate`, `bitsandbytes` (for 4-bit LLM loading).
9
+ - Add `scipy` (for audio processing).
10
+ - Add `beautifulsoup4` (for web parsing).
11
+ - Add `python-multipart` (for API handling).
12
+ - Ensure `mcp` and `gradio` versions are pinned.
13
+ - [ ] **Project Structure Setup**
14
+ - Create `app.py` (entry point).
15
+ - Ensure `__init__.py` in all subdirs.
16
+ - Create `config.py` in `utils/` for global settings (LLM model names, paths).
17
+
18
+ ## 2. Core Processing Modules
19
+
20
+ ### 2.1. PDF Processing (`processing/`)
21
+ - [ ] **Implement `pdf_reader.py`**
22
+ - Function `extract_text_from_pdf(pdf_path) -> str`.
23
+ - Use `PyMuPDF` (fitz) for fast extraction.
24
+ - Implement basic cleaning (remove headers/footers/references if possible).
25
+ - [ ] **Implement `url_fetcher.py`**
26
+ - Function `fetch_paper_from_url(url) -> str`.
27
+ - Handle arXiv URLs (convert `/abs/` to `/pdf/` or scrape abstract).
28
+ - Download PDF to temporary storage.
29
+
30
+ ### 2.2. Generation Logic (`generation/`)
31
+ - [ ] **Implement `script_generator.py`**
32
+ - **Model**: `unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit`.
33
+ - Define System Prompts for "Host" and "Guest" personas.
34
+ - Function `generate_podcast_script(paper_text) -> List[Dict]`.
35
+ - Output format: `[{"speaker": "Host", "text": "...", "emotion": "excited"}, {"speaker": "Guest", "text": "...", "emotion": "neutral"}]` (see the example after this list).
36
+ - **Key Logic**: Prompt the model to include emotion tags (e.g. `[laugh]`, `[sigh]`) supported by Maya1.
37
+
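For clarity, the intended return value of `generate_podcast_script` would look roughly like this (values are illustrative only):

```python
# Illustrative target output; the real script is produced by the LLM at runtime
script = [
    {"speaker": "Host", "text": "Welcome to PaperCast! Today we unpack a new attention mechanism.", "emotion": "excited"},
    {"speaker": "Guest", "text": "Right, the key idea is replacing recurrence with self-attention.", "emotion": "neutral"},
]
```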
38
+ ### 2.3. Audio Synthesis (`synthesis/`)
39
+ - [ ] **Implement `tts_engine.py`**
40
+ - **Model**: `maya-research/maya1`.
41
+ - Function `synthesize_dialogue(script_json) -> audio_path`.
42
+ - Parse the script for emotion tags and pass them to Maya1.
43
+ - Combine audio segments into a single file using `pydub` or `scipy`.
44
+
45
+ ## 3. MCP Server Integration (`mcp_servers/`)
46
+
47
+ To satisfy the "MCP in Action" requirement, we will expose our core tools as MCP resources/tools.
48
+
49
+ - [ ] **Create `paper_tools_server.py`**
50
+ - Implement an MCP server that provides:
51
+ - Tool: `read_pdf(path)`
52
+ - Tool: `fetch_arxiv(url)`
53
+ - Tool: `synthesize_podcast(script)`
54
+ - This allows the "Agent" to call these tools via the MCP protocol (a rough client sketch follows below).
55
+
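As a rough illustration (not part of this commit), a client could reach these tools over stdio using the `mcp` Python SDK; exact result accessors may differ by SDK version:

```python
# Hypothetical MCP client sketch, assuming the stdio transport of the `mcp` Python SDK.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def demo():
    server = StdioServerParameters(command="python", args=["mcp_servers/paper_tools_server.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "fetch_arxiv", {"url": "https://arxiv.org/abs/1706.03762"}
            )
            print(result)  # tool result wrapping the downloaded PDF path

asyncio.run(demo())
```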
56
+ ## 4. Agent Orchestration (`agents/`)
57
+
58
+ - [ ] **Implement `podcast_agent.py`**
59
+ - Create a `PodcastAgent` class.
60
+ - **Planning Loop**:
61
+ 1. Receive User Input.
62
+ 2. **Plan**: Decide to fetch/read paper.
63
+ 3. **Analyze**: Extract key topics.
64
+ 4. **Draft**: Generate script using Phi-4-mini.
65
+ 5. **Synthesize**: Create audio using Maya1.
66
+ - Use `sequential_thinking` pattern (simulated) to show "Agentic" behavior in the logs/UI.
67
+ - *Crucial*: The Agent should use the MCP Client to call the tools defined in Step 3, demonstrating "Autonomous reasoning using MCP tools" (a rough sketch of this loop follows below).
68
+
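For illustration only, a hypothetical sketch of that loop (names and structure are assumptions, not the actual `agents/podcast_agent.py`):

```python
# Hypothetical PodcastAgent planning loop; tool calls mirror the MCP tools planned in Step 3.
class PodcastAgent:
    def __init__(self, tools):
        self.tools = tools  # e.g. a thin client wrapper around the MCP paper tools

    def run(self, url=None, pdf_path=None):
        yield "Plan: fetch paper -> extract text -> draft script -> synthesize audio"
        if url:
            pdf_path = self.tools.fetch_arxiv(url)          # 1. fetch
            yield f"Fetched paper to {pdf_path}"
        text = self.tools.read_pdf(pdf_path)                 # 2. analyze
        yield f"Extracted {len(text)} characters"
        script = self.tools.generate_script(text)            # 3. draft
        yield f"Generated script with {len(script)} turns"
        audio_path = self.tools.synthesize_podcast(script)   # 4. synthesize
        yield (audio_path, "Done")                           # final tuple, as the UI expects
```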
69
+ ## 5. User Interface (`app.py`)
70
+
71
+ - [ ] **Build Gradio UI**
72
+ - Input: Textbox (URL) or File Upload (PDF).
73
+ - Output: Audio Player, Transcript Textbox, Status/Logs Markdown.
74
+ - **Agent Visualization**: Show the "Thoughts" of the agent as it plans and executes (e.g., "Fetching paper...", "Analyzing structure...", "Generating script...").
75
+ - [ ] **Deployment Config**
76
+ - Create `Dockerfile` (if needed for custom deps) or rely on HF Spaces default.
77
+
78
+ ## 6. Verification & Polish
79
+
80
+ - [ ] **Test Run**
81
+ - Run with a real arXiv paper.
82
+ - Verify audio quality and script coherence.
83
+ - [ ] **Documentation**
84
+ - Update `README.md` with usage instructions and "MCP in Action" details.
85
+ - Record Demo Video.
86
+
87
+ ## 7. Bonus Features (Time Permitting)
88
+
89
+ - [ ] **RAG Integration**: Use a vector store to answer questions about the paper after the podcast.
90
+ - [ ] **Background Music**: Mix in intro/outro music.
processing/__init__.py ADDED
@@ -0,0 +1 @@
1
+ """PDF extraction and text processing for PaperCast"""
processing/pdf_reader.py ADDED
@@ -0,0 +1,21 @@
1
+ import fitz # PyMuPDF
2
+
3
+ def extract_text_from_pdf(pdf_path: str) -> str:
4
+ """
5
+ Extracts text from a PDF file using PyMuPDF.
6
+
7
+ Args:
8
+ pdf_path (str): Path to the PDF file.
9
+
10
+ Returns:
11
+ str: Extracted text content.
12
+ """
13
+ try:
14
+ doc = fitz.open(pdf_path)
15
+ text = ""
16
+ for page in doc:
17
+ text += page.get_text()
18
+ return text
19
+ except Exception as e:
20
+ print(f"Error reading PDF {pdf_path}: {e}")
21
+ return ""
processing/url_fetcher.py ADDED
@@ -0,0 +1,56 @@
1
+ import os
2
+ import requests
3
+ from urllib.parse import urlparse
4
+ from utils.config import TEMP_DIR
5
+
6
+ def fetch_paper_from_url(url: str) -> str:
7
+ """
8
+ Downloads a PDF from a URL (supports arXiv and medRxiv).
9
+
10
+ Args:
11
+ url (str): The URL of the paper.
12
+
13
+ Returns:
14
+ str: Path to the downloaded PDF file.
15
+ """
16
+ # Handle arXiv abstract URLs
17
+ if "arxiv.org/abs/" in url:
18
+ url = url.replace("/abs/", "/pdf/")
19
+ if not url.endswith(".pdf"):
20
+ url += ".pdf"
21
+
22
+ # Handle medRxiv URLs
23
+ # Example: https://www.medrxiv.org/content/10.1101/2025.11.13.25340182v1
24
+ # or: https://www.medrxiv.org/content/10.1101/2025.11.13.25340182v1.full.pdf
25
+ elif "medrxiv.org/content/" in url:
26
+ if not url.endswith(".pdf"):
27
+ url = url + ".full.pdf"
28
+
29
+ try:
30
+ # Add headers to avoid 403 Forbidden errors from bioRxiv/medRxiv
31
+ headers = {
32
+ 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
33
+ 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
34
+ 'Accept-Language': 'en-US,en;q=0.5',
35
+ 'Connection': 'keep-alive',
36
+ }
37
+
38
+ response = requests.get(url, stream=True, headers=headers, timeout=30)
39
+ response.raise_for_status()
40
+
41
+ # Extract filename from URL or use default
42
+ parsed_url = urlparse(url)
43
+ filename = os.path.basename(parsed_url.path)
44
+ if not filename.endswith(".pdf"):
45
+ filename = "downloaded_paper.pdf"
46
+
47
+ file_path = os.path.join(TEMP_DIR, filename)
48
+
49
+ with open(file_path, "wb") as f:
50
+ for chunk in response.iter_content(chunk_size=8192):
51
+ f.write(chunk)
52
+
53
+ return file_path
54
+ except Exception as e:
55
+ print(f"Error downloading {url}: {e}")
56
+ return ""
requirements.txt ADDED
@@ -0,0 +1,12 @@
1
+ beautifulsoup4
2
+ edge-tts
3
+ elevenlabs
4
+ gradio
5
+ mcp
6
+ openai
7
+ pydub
8
+ python-dotenv
9
+ python-multipart
10
+ pymupdf
11
+ requests
12
+ scipy
synthesis/__init__.py ADDED
@@ -0,0 +1 @@
1
+ """Text-to-speech audio generation for PaperCast"""
synthesis/tts_engine.py ADDED
@@ -0,0 +1,345 @@
1
+ import asyncio
2
+ import os
3
+ from datetime import datetime
4
+ from io import BytesIO
5
+
6
+ import edge_tts
7
+ from elevenlabs import ElevenLabs, VoiceSettings
8
+ from pydub import AudioSegment
9
+
10
+ from utils.config import (
11
+ ELEVENLABS_API_KEY,
12
+ ELEVENLABS_GUEST_VOICE,
13
+ ELEVENLABS_HOST_VOICE,
14
+ OUTPUT_DIR,
15
+ )
16
+
17
+ # Edge-TTS Voice Options
18
+ EDGE_TTS_VOICES = {
19
+ # English (US) - Male
20
+ "Guy (US Male - Casual)": "en-US-GuyNeural",
21
+ "Christopher (US Male - Authoritative)": "en-US-ChristopherNeural",
22
+ "Eric (US Male - Professional)": "en-US-EricNeural",
23
+ "Steffan (US Male - Energetic)": "en-US-SteffanNeural",
24
+ "Roger (US Male - Elderly)": "en-US-RogerNeural",
25
+
26
+ # English (US) - Female
27
+ "Jenny (US Female - Friendly)": "en-US-JennyNeural",
28
+ "Aria (US Female - Professional)": "en-US-AriaNeural",
29
+ "Michelle (US Female - Enthusiastic)": "en-US-MichelleNeural",
30
+ "Sara (US Female - News Anchor)": "en-US-SaraNeural",
31
+ "Ana (US Female - Child)": "en-US-AnaNeural",
32
+
33
+ # English (UK)
34
+ "Ryan (UK Male)": "en-GB-RyanNeural",
35
+ "Thomas (UK Male - Elderly)": "en-GB-ThomasNeural",
36
+ "Sonia (UK Female)": "en-GB-SoniaNeural",
37
+ "Libby (UK Female - Enthusiastic)": "en-GB-LibbyNeural",
38
+
39
+ # English (Australia)
40
+ "William (AU Male)": "en-AU-WilliamNeural",
41
+ "Natasha (AU Female)": "en-AU-NatashaNeural",
42
+
43
+ # English (India)
44
+ "Prabhat (IN Male)": "en-IN-PrabhatNeural",
45
+ "Neerja (IN Female)": "en-IN-NeerjaNeural",
46
+ }
47
+
48
+ # ElevenLabs Voice Options (popular voices)
49
+ ELEVENLABS_VOICES = {
50
+ # Male Voices
51
+ "Antoni (Male - Well-rounded)": "ErXwobaYiN019PkySvjV",
52
+ "Josh (Male - Deep)": "TxGEqnHWrfWFTfGW9XjX",
53
+ "Arnold (Male - Crisp)": "VR6AewLTigWG4xSOukaG",
54
+ "Callum (Male - Hoarse)": "N2lVS1w4EtoT3dr4eOWO",
55
+ "Charlie (Male - Casual)": "IKne3meq5aSn9XLyUdCD",
56
+ "Clyde (Male - War veteran)": "2EiwWnXFnvU5JabPnv8n",
57
+ "Daniel (Male - Deep British)": "onwK4e9ZLuTAKqWW03F9",
58
+ "Ethan (Male - Young American)": "g5CIjZEefAph4nQFvHAz",
59
+ "Fin (Male - Irish)": "D38z5RcWu1voky8WS1ja",
60
+ "George (Male - British)": "JBFqnCBsd6RMkjVDRZzb",
61
+
62
+ # Female Voices
63
+ "Bella (Female - Soft)": "EXAVITQu4vr4xnSDxMaL",
64
+ "Rachel (Female - Calm)": "21m00Tcm4TlvDq8ikWAM",
65
+ "Domi (Female - Strong)": "AZnzlk1XvdvUeBnXmlld",
66
+ "Elli (Female - Emotional)": "MF3mGyEYCl7XYWbV9V6O",
67
+ "Emily (Female - Calm British)": "LcfcDJNUP1GQjkzn1xUU",
68
+ "Freya (Female - Young American)": "jsCqWAovK2LkecY7zXl4",
69
+ "Gigi (Female - Young Expressive)": "jBpfuIE2acCO8z3wKNLl",
70
+ "Grace (Female - Southern American)": "oWAxZDx7w5VEj9dCyTzz",
71
+ "Lily (Female - Warm British)": "pFZP5JQG7iQjIQuC4Bku",
72
+ "Matilda (Female - Warm)": "XrExE9yKIg1WjnnlVkGX",
73
+ }
74
+
75
+
76
+ def generate_unique_filename():
77
+ """Generate unique filename using timestamp"""
78
+ timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
79
+ return f"podcast_{timestamp}.wav"
80
+
81
+
82
+ class TTSEngine:
83
+ def __init__(self, tts_provider="edge-tts", custom_api_key=None, host_voice=None, guest_voice=None):
84
+ """
85
+ Initialize TTS Engine with specified provider.
86
+
87
+ Args:
88
+ tts_provider: "edge-tts" or "elevenlabs"
89
+ custom_api_key: API key for ElevenLabs (only used if provider is "elevenlabs")
90
+ host_voice: Voice ID/name for Host (optional, uses default if not provided)
91
+ guest_voice: Voice ID/name for Guest (optional, uses default if not provided)
92
+ """
93
+ self.mode = tts_provider.lower()
94
+
95
+ if self.mode == "elevenlabs":
96
+ print("Initializing ElevenLabs TTS API...")
97
+ # Use custom key if provided, otherwise use default
98
+ api_key = custom_api_key if custom_api_key else ELEVENLABS_API_KEY
99
+ self.client = ElevenLabs(api_key=api_key)
100
+
101
+ # Use custom voices or defaults
102
+ self.host_voice_id = host_voice if host_voice else ELEVENLABS_HOST_VOICE
103
+ self.guest_voice_id = guest_voice if guest_voice else ELEVENLABS_GUEST_VOICE
104
+
105
+ if custom_api_key:
106
+ print("✓ ElevenLabs TTS ready (custom API key)")
107
+ else:
108
+ print("✓ ElevenLabs TTS ready (demo API key)")
109
+
110
+ # Print selected voices
111
+ host_name = [k for k, v in ELEVENLABS_VOICES.items() if v == self.host_voice_id]
112
+ guest_name = [k for k, v in ELEVENLABS_VOICES.items() if v == self.guest_voice_id]
113
+ print(f" Host: {host_name[0] if host_name else 'Custom/Default'}")
114
+ print(f" Guest: {guest_name[0] if guest_name else 'Custom/Default'}")
115
+
116
+ elif self.mode == "edge-tts":
117
+ print("Initializing Edge-TTS (Microsoft)...")
118
+ # Use custom voices or defaults
119
+ self.host_voice = host_voice if host_voice else "en-US-GuyNeural"
120
+ self.guest_voice = guest_voice if guest_voice else "en-US-JennyNeural"
121
+ print("✓ Edge-TTS ready (free, no API key required)")
122
+
123
+ # Print selected voices
124
+ host_name = [k for k, v in EDGE_TTS_VOICES.items() if v == self.host_voice]
125
+ guest_name = [k for k, v in EDGE_TTS_VOICES.items() if v == self.guest_voice]
126
+ print(f" Host: {host_name[0] if host_name else 'Custom/Default'}")
127
+ print(f" Guest: {guest_name[0] if guest_name else 'Custom/Default'}")
128
+
129
+ else:
130
+ raise ValueError(f"Unknown TTS provider: {tts_provider}. Use 'edge-tts' or 'elevenlabs'")
131
+
132
+ def synthesize_dialogue(self, script: list) -> str:
133
+ """
134
+ Synthesize the script to audio using selected TTS provider.
135
+
136
+ Args:
137
+ script: List of dialogue items
138
+
139
+ Returns:
140
+ str: Path to the generated audio file
141
+ """
142
+ if self.mode == "elevenlabs":
143
+ return self._synthesize_elevenlabs(script)
144
+ elif self.mode == "edge-tts":
145
+ return self._synthesize_edge_tts(script)
146
+ else:
147
+ raise ValueError(f"Unknown TTS mode: {self.mode}")
148
+
149
+ def _synthesize_elevenlabs(self, script: list) -> str:
150
+ """Synthesize using ElevenLabs API"""
151
+ print("Synthesizing audio via ElevenLabs API...")
152
+ audio_segments = []
153
+
154
+ for i, item in enumerate(script):
155
+ text = item["text"]
156
+ speaker = item["speaker"]
157
+
158
+ # Select voice based on speaker
159
+ voice_id = self.guest_voice_id if speaker == "Guest" else self.host_voice_id
160
+
161
+ try:
162
+ print(f"Synthesizing line {i + 1}/{len(script)} ({speaker})...")
163
+
164
+ # Generate audio using ElevenLabs
165
+ audio_generator = self.client.text_to_speech.convert(
166
+ voice_id=voice_id,
167
+ text=text,
168
+ model_id="eleven_multilingual_v2",
169
+ voice_settings=VoiceSettings(
170
+ stability=0.5,
171
+ similarity_boost=0.75,
172
+ style=0.5,
173
+ use_speaker_boost=True,
174
+ ),
175
+ )
176
+
177
+ # Collect audio bytes
178
+ audio_bytes = b"".join(audio_generator)
179
+
180
+ # Convert to AudioSegment
181
+ audio_segment = AudioSegment.from_mp3(BytesIO(audio_bytes))
182
+ audio_segments.append(audio_segment)
183
+
184
+ # Add 500ms silence between speakers
185
+ silence = AudioSegment.silent(duration=500)
186
+ audio_segments.append(silence)
187
+
188
+ print(f"✓ Synthesized line {i + 1}/{len(script)}")
189
+
190
+ except Exception as e:
191
+ print(f"Error synthesizing line '{text[:50]}...': {e}")
192
+ # Continue with next line even if one fails
193
+
194
+ if not audio_segments:
195
+ print("No audio generated")
196
+ return ""
197
+
198
+ # Combine all segments
199
+ print("Combining audio segments...")
200
+ combined = sum(audio_segments)
201
+
202
+ # Export as WAV with unique filename
203
+ filename = generate_unique_filename()
204
+ output_path = os.path.join(OUTPUT_DIR, filename)
205
+ combined.export(output_path, format="wav")
206
+ print(f"✓ Podcast saved to: {output_path}")
207
+
208
+ return output_path
209
+
210
+ def _synthesize_edge_tts(self, script: list) -> str:
211
+ """Synthesize using Edge-TTS (Microsoft)"""
212
+ print("Synthesizing audio via Edge-TTS (Microsoft)...")
213
+ audio_segments = []
214
+
215
+ for i, item in enumerate(script):
216
+ text = item["text"]
217
+ speaker = item["speaker"]
218
+
219
+ # Select voice based on speaker
220
+ voice = self.guest_voice if speaker == "Guest" else self.host_voice
221
+
222
+ try:
223
+ print(f"Synthesizing line {i + 1}/{len(script)} ({speaker})...")
224
+
225
+ # Generate audio using Edge-TTS (synchronous wrapper for async)
226
+ audio_bytes = asyncio.run(self._edge_tts_synthesize(text, voice))
227
+
228
+ # Convert to AudioSegment
229
+ audio_segment = AudioSegment.from_mp3(BytesIO(audio_bytes))
230
+
231
+ # Trim silence from the end of the audio (Edge-TTS adds trailing silence)
232
+ # Detect silence threshold: -40 dBFS
233
+ audio_segment = self._trim_silence(audio_segment)
234
+
235
+ audio_segments.append(audio_segment)
236
+
237
+ # Add minimal silence between speakers (50ms for natural flow)
238
+ silence = AudioSegment.silent(duration=50)
239
+ audio_segments.append(silence)
240
+
241
+ print(f"✓ Synthesized line {i + 1}/{len(script)}")
242
+
243
+ except Exception as e:
244
+ print(f"Error synthesizing line '{text[:50]}...': {e}")
245
+ # Continue with next line even if one fails
246
+
247
+ if not audio_segments:
248
+ print("No audio generated")
249
+ return ""
250
+
251
+ # Combine all segments
252
+ print("Combining audio segments...")
253
+ combined = sum(audio_segments)
254
+
255
+ # Export as WAV with unique filename
256
+ filename = generate_unique_filename()
257
+ output_path = os.path.join(OUTPUT_DIR, filename)
258
+ combined.export(output_path, format="wav")
259
+ print(f"✓ Podcast saved to: {output_path}")
260
+
261
+ return output_path
262
+
263
+ async def _edge_tts_synthesize(self, text: str, voice: str) -> bytes:
264
+ """
265
+ Async helper to synthesize text using Edge-TTS.
266
+
267
+ Args:
268
+ text: Text to synthesize
269
+ voice: Voice name to use
270
+
271
+ Returns:
272
+ bytes: Audio data in MP3 format
273
+ """
274
+ communicate = edge_tts.Communicate(text, voice)
275
+ audio_data = b""
276
+
277
+ async for chunk in communicate.stream():
278
+ if chunk["type"] == "audio":
279
+ audio_data += chunk["data"]
280
+
281
+ return audio_data
282
+
283
+ def _trim_silence(self, audio_segment, silence_thresh=-40, chunk_size=10):
284
+ """
285
+ Trim silence from the end of audio segment.
286
+
287
+ Args:
288
+ audio_segment: AudioSegment to trim
289
+ silence_thresh: Silence threshold in dBFS (default: -40)
290
+ chunk_size: Size of chunks to analyze in ms (default: 10)
291
+
292
+ Returns:
293
+ Trimmed AudioSegment
294
+ """
295
+ # Start from the end and find where audio actually ends
296
+ trim_ms = 0
297
+
298
+ # Check from the end in chunks
299
+ for i in range(len(audio_segment) - chunk_size, 0, -chunk_size):
300
+ chunk = audio_segment[i:i + chunk_size]
301
+ if chunk.dBFS > silence_thresh:
302
+ # Found non-silent audio
303
+ trim_ms = i + chunk_size
304
+ break
305
+
306
+ # If we found non-silent audio, trim there
307
+ if trim_ms > 0:
308
+ return audio_segment[:trim_ms]
309
+
310
+ # Otherwise return original
311
+ return audio_segment
312
+
313
+
314
+ # Global instance
315
+ _tts_instance = None
316
+
317
+
318
+ def get_tts_engine(tts_provider="edge-tts", custom_api_key=None, host_voice=None, guest_voice=None):
319
+ """
320
+ Get TTS engine instance with optional provider, API key, and voices.
321
+
322
+ Args:
323
+ tts_provider: "edge-tts" or "elevenlabs" (default: "edge-tts")
324
+ custom_api_key: Optional custom ElevenLabs API key (only used for ElevenLabs)
325
+ host_voice: Voice ID/name for Host (optional)
326
+ guest_voice: Voice ID/name for Guest (optional)
327
+
328
+ Returns:
329
+ TTSEngine instance
330
+ """
331
+ global _tts_instance
332
+
333
+ # Always create new instance if custom settings provided
334
+ if custom_api_key or tts_provider != "edge-tts" or host_voice or guest_voice:
335
+ return TTSEngine(
336
+ tts_provider=tts_provider,
337
+ custom_api_key=custom_api_key,
338
+ host_voice=host_voice,
339
+ guest_voice=guest_voice
340
+ )
341
+
342
+ # Otherwise, reuse global instance (for default Edge-TTS)
343
+ if _tts_instance is None:
344
+ _tts_instance = TTSEngine(tts_provider="edge-tts")
345
+ return _tts_instance
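For reference, a minimal usage sketch for the TTS module above. The import path is an assumption (this hunk does not show the file name; adjust it to wherever the module actually lives), and the script follows the `speaker`/`text` keys that `synthesize_dialogue` consumes:

```python
from utils.tts import get_tts_engine  # hypothetical path; adjust to the real module location

# Two-speaker script in the format synthesize_dialogue() expects
script = [
    {"speaker": "Host", "text": "Welcome to PaperCast! Today we discuss a new diffusion survey."},
    {"speaker": "Guest", "text": "Thanks for having me. The survey covers three families of models."},
]

# Default: free Edge-TTS with the built-in Guy/Jenny voices
engine = get_tts_engine()
audio_path = engine.synthesize_dialogue(script)
print(f"Podcast written to {audio_path}")

# Alternative: ElevenLabs with a user-supplied key and explicit voice IDs
# engine = get_tts_engine(
#     tts_provider="elevenlabs",
#     custom_api_key="sk-...",                 # placeholder key
#     host_voice="ErXwobaYiN019PkySvjV",       # Antoni
#     guest_voice="EXAVITQu4vr4xnSDxMaL",      # Bella
# )
```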
todo.md ADDED
@@ -0,0 +1,105 @@
1
+ # PaperCast: New Feature Implementations
2
+
3
+ ## Vision
4
+ We are not building "another paper summarizer".
5
+ We are building **the world's first interactive, multi-modal, counterfactual-aware, visually-synced academic podcast studio** powered by MCP tools, Gradio 6, Marker-pdf, Semantic Scholar, arXiv and ElevenLabs.
6
+
7
+ We invented 4 original frameworks that will be heavily emphasized in the demo and submission:
8
+
9
+ - **PPF** — Podcast Persona Framework
10
+ - **PVF** — Paper Visual Framework
11
+ - **PAD** — Paper Auto-Discovery
12
+ - **CPM** — Counterfactual Paper Mode
13
+
14
+ We will constantly refer to these acronyms in the demo:
15
+ "We created the Podcast Persona Framework (PPF) to solve the one-size-fits-all podcast problem" → instant "wow this is professional" effect.
16
+
17
+ ## Mandatory Upgrades (Must be done first)
18
+
19
+ 1. **Full Gradio 6.x Upgrade**
20
+ - Entire project migrates to Gradio ≥6.0
21
+ - Replace gr.ChatInterface with manual gr.Chatbot + gr.Row/gr.Column structure (for full UI control)
22
+ - Launch with `launch(mcp=True)` → automatic MCP server on HF Spaces
23
+ - Full streaming support (ElevenLabs + partial transcript updates)
24
+
25
+ 2. **MCP PDF Server → Marker-pdf Ultra Edition**
26
+ - Server name: `marker-pdf-mcp-server`
27
+ - Output: **clean GitHub-flavored markdown** (LaTeX equations preserved with $$...$$, real markdown tables, figure/table metadata with page numbers)
28
+ - New parameter: `extract_mode` → "full" | "smart_focus" (Claude automatically selects the 3-4 most important sections)
29
+ - Tool name in agent: `extract_paper_with_marker`
30
+
31
+ ## Core Features
32
+
33
+ ### 1. Podcast Persona Framework (PPF) — Killer Feature #1
34
+ User selects persona via dropdown + optional custom text box.
35
+
36
+ Implemented modes (exact names):
37
+
38
+ 1. **Friendly Explainer** → Current default (two friends casually discussing)
39
+ 2. **Academic Debate** → One defends the paper, the other politely challenges ("This claim is strong, but Table 2 baseline seems weak...")
40
+ 3. **Savage Roast** → One speaker brutally roasts the paper ("This ablation is an absolute clown show", "Figure 4 is statistical noise"), the other stubbornly defends it
41
+ 4. **Pedagogical** → Speaker A = Professor, Speaker B = Curious Student (student constantly asks questions)
42
+ 5. **Interdisciplinary Clash** → Speaker A = Domain Expert, Speaker B = Complete Outsider (e.g. biologist reading ML paper → "This neuron analogy makes zero biological sense")
43
+
44
+
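A minimal sketch of how the PPF dropdown could map onto script-generation system prompts. The dictionary keys mirror the mode names above; the prompt wording and the helper name are placeholders, not the final implementation:

```python
# Hypothetical persona registry for PPF; the dropdown value selects one entry,
# and the optional custom text box can override the prompt entirely.
PPF_PERSONAS = {
    "Friendly Explainer": (
        "Speaker A and Speaker B are two friends casually discussing the paper "
        "in plain language, with light humor."
    ),
    "Academic Debate": (
        "Speaker A defends the paper's claims; Speaker B politely challenges the "
        "methodology, baselines, and statistics."
    ),
    "Savage Roast": (
        "Speaker A brutally roasts the paper's weak points; Speaker B stubbornly defends it."
    ),
    "Pedagogical": (
        "Speaker A is a professor explaining the paper; Speaker B is a curious student "
        "who keeps asking clarifying questions."
    ),
    "Interdisciplinary Clash": (
        "Speaker A is a domain expert; Speaker B is a complete outsider from another field "
        "who questions the paper's assumptions."
    ),
}

def build_system_prompt(persona: str, custom_persona: str = "") -> str:
    """Return the persona instruction for the script-generation LLM."""
    if custom_persona.strip():
        return custom_persona.strip()
    return PPF_PERSONAS.get(persona, PPF_PERSONAS["Friendly Explainer"])
```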
45
+ ### 2. Paper Auto-Discovery (PAD) — Killer Feature #2
46
+
47
+ Input methods:
48
+ - PDF upload
49
+ - Direct URL (arXiv, Semantic Scholar, HF, etc.)
50
+ - Free text query (new input method) → e.g. "Grok reasons about everything" or "diffusion survey 2025"
51
+
52
+ Workflow:
53
+ 1. Agent calls **Semantic Scholar Graph v1 API** (`/paper/search?query=...&fields=title,authors,year,abstract,openAccessPdf,url`)
54
+ 2. Parallel call to **arXiv API** (`http://export.arxiv.org/api/query?search_query=...`)
55
+ 3. Collect top 5 results → show user title + abstract + year + source in gr.Radio or clickable cards
56
+ 4. User selects → if openAccessPdf exists → download directly → Marker MCP extract
57
+ 5. Otherwise fetch from arXiv
58
+
59
+ Zero friction paper discovery.
60
+
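A minimal sketch of the PAD search step described in the workflow above. The endpoint paths and field names follow the public Semantic Scholar Graph API and arXiv Atom API as best recalled; treat them as assumptions to verify against the official docs:

```python
import requests
import xml.etree.ElementTree as ET

def search_semantic_scholar(query: str, limit: int = 5) -> list[dict]:
    """Query the Semantic Scholar Graph API and return lightweight result records."""
    resp = requests.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        params={
            "query": query,
            "limit": limit,
            "fields": "title,authors,year,abstract,openAccessPdf,url",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("data", [])

def search_arxiv(query: str, limit: int = 5) -> list[dict]:
    """Query the arXiv Atom API and return title/abstract/link per entry."""
    resp = requests.get(
        "http://export.arxiv.org/api/query",
        params={"search_query": f'all:"{query}"', "max_results": limit},
        timeout=30,
    )
    resp.raise_for_status()
    ns = {"atom": "http://www.w3.org/2005/Atom"}
    results = []
    for entry in ET.fromstring(resp.text).findall("atom:entry", ns):
        results.append({
            "title": entry.findtext("atom:title", default="", namespaces=ns).strip(),
            "abstract": entry.findtext("atom:summary", default="", namespaces=ns).strip(),
            "url": entry.findtext("atom:id", default="", namespaces=ns),
        })
    return results
```

The two calls can be issued in parallel (e.g. via a thread pool) and the merged top five results fed into the gr.Radio / card selection step.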
61
+ ### 3. Paper Visual Framework (PVF) — Killer Feature #3 (Jury will lose their minds)
62
+ Right column of Gradio interface shows embedded PDF viewer (PDF.js).
63
+
64
+ - Marker provides exact page numbers for figures/tables
65
+ - When speakers say "Let's look at Figure 8" → PDF auto-scrolls to correct page + highlights/zooms the figure
66
+ - Transcript entries become clickable timestamps that jump to the exact location
67
+ - Implementation: ElevenLabs streaming → parse chunk for figure/table mentions → emit JS event → PDF.js control
68
+
69
+ This single feature wins "Best UX" + "Most Innovative" categories alone.
70
+
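A minimal sketch of the PVF sync step: scanning streamed transcript chunks for figure/table mentions and turning them into events the front end can forward to the PDF.js viewer. The page mapping and the event schema are hypothetical; the real mapping would come from Marker's figure/table metadata:

```python
import re

# Hypothetical mapping produced by the Marker extraction step: caption label -> PDF page.
FIGURE_PAGES = {"Figure 8": 12, "Table 2": 5}

MENTION_RE = re.compile(r"\b(Figure|Table)\s+(\d+)\b", re.IGNORECASE)

def figure_events(transcript_chunk: str) -> list[dict]:
    """Turn figure/table mentions in a streamed transcript chunk into
    scroll events for the embedded PDF.js viewer."""
    events = []
    for match in MENTION_RE.finditer(transcript_chunk):
        label = f"{match.group(1).title()} {match.group(2)}"
        page = FIGURE_PAGES.get(label)
        if page is not None:
            events.append({"action": "scroll_to_page", "label": label, "page": page})
    return events

# e.g. figure_events("Let's look at Figure 8 for the ablations")
# -> [{"action": "scroll_to_page", "label": "Figure 8", "page": 12}]
```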
71
+ ### 4. Counterfactual Paper Mode ("What If?")
72
+ Post-podcast button:
73
+ "What if this paper was written by Yann LeCun? / in 2012? / if GPT-4 never existed? / by DeepMind instead of OpenAI?"
74
+
75
+ → Claude re-writes/re-interprets the same paper in alternate reality → new podcast generated.
76
+ Extremely fun, extremely memorable, extremely shareable.
77
+
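A minimal sketch of how the CPM rewrite request could be framed for the LLM; the template wording and placeholder names are illustrative only:

```python
# Hypothetical CPM prompt template; premise examples: "written by Yann LeCun",
# "written in 2012", "GPT-4 never existed", "by DeepMind instead of OpenAI".
CPM_PROMPT = (
    "Here is the extracted paper in markdown:\n\n{paper_markdown}\n\n"
    "Re-interpret this paper under the following counterfactual premise: {premise}. "
    "Keep the core research question, but rewrite the framing, related work, and "
    "likely conclusions as they would plausibly appear in that alternate reality. "
    "Then produce a two-speaker podcast script about the rewritten paper."
)

def build_cpm_prompt(paper_markdown: str, premise: str) -> str:
    return CPM_PROMPT.format(paper_markdown=paper_markdown, premise=premise)
```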
78
+ ### 5. Ultra Transcript System
79
+ - Timestamped (00:00:12)
80
+ - Speaker-labeled (Savage Critic:, Professor:, etc.)
81
+ - Clickable figure/table references (syncs with PVF)
82
+ - LaTeX equations rendered via MathJax
83
+ - Download buttons: .txt, .srt, .docx, .vtt
84
+ - Bonus: "Copy as tweet" → auto-selects the 3 spiciest quotes with citation
85
+
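As an illustration of the download formats above, a small sketch that converts timestamped transcript entries into SubRip (.srt) text; the entry field names are placeholders, not the final schema:

```python
def transcript_to_srt(entries: list[dict]) -> str:
    """Convert timestamped transcript entries into SubRip (.srt) text.

    Each entry is assumed to look like:
    {"start": 12.0, "end": 17.5, "speaker": "Professor", "text": "..."}
    """
    def fmt(seconds: float) -> str:
        ms = int(round(seconds * 1000))
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = []
    for i, e in enumerate(entries, start=1):
        blocks.append(f"{i}\n{fmt(e['start'])} --> {fmt(e['end'])}\n{e['speaker']}: {e['text']}\n")
    return "\n".join(blocks)
```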
86
+ ## Final UI Layout (Gradio 6)
87
+ ```python
+ with gr.Row():
+     with gr.Column(scale=3):
+         chatbot = gr.Chatbot(height=700, render=True)
+         controls = gr.Row()  # query input + PPF dropdown + custom persona + buttons
+         audio_player = gr.Audio(autoplay=True, streaming=True)
+         transcript = gr.Markdown()
+     with gr.Column(scale=2):
+         pdf_viewer = gr.HTML()   # PVF - embedded PDF.js
+         timeline_vis = gr.HTML() # PET timeline
+ ```
+
+ ## Required MCP Tools
+
+ - `extract_paper_with_marker` → returns markdown string
+ - `search_semantic_scholar` → returns json
+ - `search_arxiv` → returns json
+ - `fetch_pdf_from_url` → returns bytes
+ - `batch_extract_papers` (for PET)
utils/__init__.py ADDED
@@ -0,0 +1 @@
1
+ """Helper functions for PaperCast"""
utils/config.py ADDED
@@ -0,0 +1,56 @@
1
+ import os
2
+
3
+ from dotenv import load_dotenv
4
+
5
+ # Load environment variables from .env.local
6
+ load_dotenv(os.path.join(os.path.dirname(os.path.dirname(__file__)), ".env.local"))
7
+
8
+ # Demo Mode Configuration - loaded from the DEMO_MODE environment variable
9
+ # Set DEMO_MODE=false in .env.local or HuggingFace Spaces secrets to disable it (defaults to enabled)
10
+ DEMO_MODE = os.getenv("DEMO_MODE", "true").lower() == "true"
11
+
12
+ # Model Configurations
13
+ SCRIPT_GENERATION_MODEL = "unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit"
14
+
15
+ # LLM API Inference Settings (Cloud GPU) - Load from .env.local
16
+ INFERENCE_API_URL = os.getenv("DEMO_INFERENCE_URL")
17
+ INFERENCE_API_KEY = os.getenv("DEMO_INFERENCE_KEY")
18
+
19
+ # TTS API Settings (ElevenLabs)
20
+
21
+ # Load from .env.local
22
+ ELEVENLABS_API_KEY = os.getenv("DEMO_TTS_KEY")
23
+
24
+ # ElevenLabs Voice IDs (you can change these to different voices)
25
+ # Find more voices at: https://api.elevenlabs.io/v1/voices
26
+ ELEVENLABS_HOST_VOICE = "ErXwobaYiN019PkySvjV" # Antoni - male voice for Host
27
+ ELEVENLABS_GUEST_VOICE = "EXAVITQu4vr4xnSDxMaL" # Bella - female voice for Guest
28
+
29
+ # Demo Mode Settings (loaded from .env.local)
30
+ DEMO_INFERENCE_URL = INFERENCE_API_URL
31
+ DEMO_INFERENCE_KEY = INFERENCE_API_KEY
32
+ DEMO_MODEL = SCRIPT_GENERATION_MODEL
33
+ DEMO_TTS_KEY = ELEVENLABS_API_KEY
34
+
35
+ # Optional: Additional API keys for non-demo mode
36
+ OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
37
+ CUSTOM_ELEVENLABS_KEY = os.getenv("CUSTOM_ELEVENLABS_KEY", "")
38
+ CUSTOM_INFERENCE_URL = os.getenv("CUSTOM_INFERENCE_URL", "")
39
+ CUSTOM_INFERENCE_KEY = os.getenv("CUSTOM_INFERENCE_KEY", "")
40
+
41
+ # Paths
42
+ BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
43
+ TEMP_DIR = os.path.join(BASE_DIR, "temp")
44
+ OUTPUT_DIR = os.path.join(BASE_DIR, "output")
45
+
46
+ # Ensure directories exist
47
+ os.makedirs(TEMP_DIR, exist_ok=True)
48
+ os.makedirs(OUTPUT_DIR, exist_ok=True)
49
+
50
+ # Generation Settings
51
+ MAX_TOKENS = 4096 # Supports long-form content generation
52
+ TEMPERATURE = 0.7
53
+
54
+ # Context Limits for Multi-Paper Processing
55
+ MAX_CONTEXT_CHARS = 80000  # Maximum total characters for multiple papers (roughly 20K tokens at ~4 chars/token)
56
+ # This ensures we stay well within the 128K token limit while leaving room for prompts and responses
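As an illustration of how MAX_CONTEXT_CHARS might be applied when several papers are passed to the script generator, a small sketch (the helper name is hypothetical and not part of this module):

```python
from utils.config import MAX_CONTEXT_CHARS

def fit_papers_to_budget(papers: list[str], max_chars: int = MAX_CONTEXT_CHARS) -> list[str]:
    """Split the character budget evenly across papers and truncate each one to fit."""
    if not papers:
        return []
    per_paper = max_chars // len(papers)
    return [p[:per_paper] for p in papers]
```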
utils/history.py ADDED
@@ -0,0 +1,58 @@
1
+ import json
2
+ import os
3
+ from datetime import datetime
4
+ from utils.config import OUTPUT_DIR
5
+
6
+ HISTORY_FILE = os.path.join(OUTPUT_DIR, "history.json")
7
+
8
+ def load_history():
9
+ """Load podcast generation history from JSON file"""
10
+ if not os.path.exists(HISTORY_FILE):
11
+ return []
12
+
13
+ try:
14
+ with open(HISTORY_FILE, 'r') as f:
15
+ return json.load(f)
16
+ except Exception as e:
17
+ print(f"Error loading history: {e}")
18
+ return []
19
+
20
+ def save_to_history(url, audio_path, script_length):
21
+ """Save a podcast generation to history"""
22
+ history = load_history()
23
+
24
+ entry = {
25
+ "url": url,
26
+ "audio_path": audio_path,
27
+ "script_length": script_length,
28
+ "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
29
+ "audio_filename": os.path.basename(audio_path)
30
+ }
31
+
32
+ history.append(entry)
33
+
34
+ try:
35
+ with open(HISTORY_FILE, 'w') as f:
36
+ json.dump(history, f, indent=2)
37
+ print(f"✓ Saved to history: {url}")
38
+ except Exception as e:
39
+ print(f"Error saving to history: {e}")
40
+
41
+ def get_history_items():
42
+ """Get history items formatted for Gradio display"""
43
+ history = load_history()
44
+
45
+ if not history:
46
+ return []
47
+
48
+ # Return in reverse order (newest first)
49
+ items = []
50
+ for entry in reversed(history):
51
+ items.append({
52
+ "timestamp": entry["timestamp"],
53
+ "url": entry["url"],
54
+ "audio_path": entry["audio_path"],
55
+ "script_length": entry.get("script_length", "N/A")
56
+ })
57
+
58
+ return items
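A brief usage sketch for the history helpers above (the URL, path, and script length are example values):

```python
from utils.history import save_to_history, get_history_items

save_to_history(
    url="https://arxiv.org/abs/2301.00001",
    audio_path="/path/to/output/podcast_20250101_120000.wav",
    script_length=24,
)

# Newest entries first, as returned by get_history_items()
for item in get_history_items():
    print(item["timestamp"], item["url"], item["audio_path"])
```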