Spaces:
Running
Running
batuhanozkose
feat: Implement initial PaperCast application with core modules, documentation, a periodic curl script, and a Gradio certificate.
472739a
| # PaperCast Implementation Plan | |
| This plan outlines the steps to build **PaperCast**, an AI agent that converts research papers into podcast-style conversations using MCP, Gradio, and LLMs. | |
| ## 1. Infrastructure & Dependencies | |
| - [ ] **Update `requirements.txt`** | |
| - Add `transformers`, `accelerate`, `bitsandbytes` (for 4-bit LLM loading). | |
| - Add `scipy` (for audio processing). | |
| - Add `beautifulsoup4` (for web parsing). | |
| - Add `python-multipart` (for API handling). | |
| - Ensure `mcp` and `gradio` versions are pinned. | |
| - [ ] **Project Structure Setup** | |
| - Create `app.py` (entry point). | |
| - Ensure `__init__.py` in all subdirs. | |
| - Create `config.py` in `utils/` for global settings (LLM model names, paths). | |
| ## 2. Core Processing Modules | |
| ### 2.1. PDF Processing (`processing/`) | |
| - [ ] **Implement `pdf_reader.py`** | |
| - Function `extract_text_from_pdf(pdf_path) -> str`. | |
| - Use `PyMuPDF` (fitz) for fast extraction. | |
| - Implement basic cleaning (remove headers/footers/references if possible). | |
| - [ ] **Implement `url_fetcher.py`** | |
| - Function `fetch_paper_from_url(url) -> str`. | |
| - Handle arXiv URLs (convert `/abs/` to `/pdf/` or scrape abstract). | |
| - Download PDF to temporary storage. | |
| ### 2.2. Generation Logic (`generation/`) | |
| - [ ] **Implement `script_generator.py`** | |
| - **Model**: `unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit`. | |
| - Define System Prompts for "Host" and "Guest" personas. | |
| - Function `generate_podcast_script(paper_text) -> List[Dict]`. | |
| - Output format: `[{"speaker": "Host", "text": "...", "emotion": "excited"}, {"speaker": "Guest", "text": "...", "emotion": "neutral"}]`. | |
| - **Key Logic**: Prompt the model to include emotion tags (e.g. `[laugh]`, `[sigh]`) supported by Maya1. | |
| ### 2.3. Audio Synthesis (`synthesis/`) | |
| - [ ] **Implement `tts_engine.py`** | |
| - **Model**: `maya-research/maya1`. | |
| - Function `synthesize_dialogue(script_json) -> audio_path`. | |
| - Parse the script for emotion tags and pass them to Maya1. | |
| - Combine audio segments into a single file using `pydub` or `scipy`. | |
| ## 3. MCP Server Integration (`mcp_servers/`) | |
| To satisfy the "MCP in Action" requirement, we will expose our core tools as MCP resources/tools. | |
| - [ ] **Create `paper_tools_server.py`** | |
| - Implement an MCP server that provides: | |
| - Tool: `read_pdf(path)` | |
| - Tool: `fetch_arxiv(url)` | |
| - Tool: `synthesize_podcast(script)` | |
| - This allows the "Agent" to call these tools via the MCP protocol. | |
| ## 4. Agent Orchestration (`agents/`) | |
| - [ ] **Implement `podcast_agent.py`** | |
| - Create a `PodcastAgent` class. | |
| - **Planning Loop**: | |
| 1. Receive User Input. | |
| 2. **Plan**: Decide to fetch/read paper. | |
| 3. **Analyze**: Extract key topics. | |
| 4. **Draft**: Generate script using Phi-4-mini. | |
| 5. **Synthesize**: Create audio using Maya1. | |
| - Use `sequential_thinking` pattern (simulated) to show "Agentic" behavior in the logs/UI. | |
| - *Crucial*: The Agent should use the MCP Client to call the tools defined in Step 3, demonstrating "Autonomous reasoning using MCP tools". | |
| ## 5. User Interface (`app.py`) | |
| - [ ] **Build Gradio UI** | |
| - Input: Textbox (URL) or File Upload (PDF). | |
| - Output: Audio Player, Transcript Textbox, Status/Logs Markdown. | |
| - **Agent Visualization**: Show the "Thoughts" of the agent as it plans and executes (e.g., "Fetching paper...", "Analyzing structure...", "Generating script..."). | |
| - [ ] **Deployment Config** | |
| - Create `Dockerfile` (if needed for custom deps) or rely on HF Spaces default. | |
| ## 6. Verification & Polish | |
| - [ ] **Test Run** | |
| - Run with a real arXiv paper. | |
| - Verify audio quality and script coherence. | |
| - [ ] **Documentation** | |
| - Update `README.md` with usage instructions and "MCP in Action" details. | |
| - Record Demo Video. | |
| ## 7. Bonus Features (Time Permitting) | |
| - [ ] **RAG Integration**: Use a vector store to answer questions about the paper after the podcast. | |
| - [ ] **Background Music**: Mix in intro/outro music. | |