# PaperCast Implementation Plan
This plan outlines the steps to build **PaperCast**, an AI agent that converts research papers into podcast-style conversations using MCP, Gradio, and LLMs.
## 1. Infrastructure & Dependencies
- [ ] **Update `requirements.txt`**
- Add `transformers`, `accelerate`, `bitsandbytes` (for 4-bit LLM loading).
- Add `pymupdf` (PyMuPDF, for PDF text extraction — used in Section 2.1).
- Add `scipy` (for audio processing).
- Add `beautifulsoup4` (for web parsing).
- Add `python-multipart` (for API handling).
- Ensure `mcp` and `gradio` versions are pinned.
- [ ] **Project Structure Setup**
- Create `app.py` (entry point).
- Ensure an `__init__.py` exists in every package subdirectory.
- Create `config.py` in `utils/` for global settings (LLM model names, paths).
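The `utils/config.py` module could be as small as a frozen dataclass. A minimal sketch — the model IDs come from Sections 2.2 and 2.3 below, but the path and sample rate are illustrative assumptions:

```python
# Sketch of utils/config.py: one central place for the settings this plan names.
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    script_model: str = "unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit"
    tts_model: str = "maya-research/maya1"
    download_dir: Path = Path("/tmp/papercast")  # illustrative default
    sample_rate: int = 24_000  # assumed TTS output rate; verify against Maya1


SETTINGS = Settings()
```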
## 2. Core Processing Modules
### 2.1. PDF Processing (`processing/`)
- [ ] **Implement `pdf_reader.py`**
- Function `extract_text_from_pdf(pdf_path) -> str`.
- Use `PyMuPDF` (fitz) for fast extraction.
- Implement basic cleaning (remove headers/footers/references if possible).
- [ ] **Implement `url_fetcher.py`**
- Function `fetch_paper_from_url(url) -> str`.
- Handle arXiv URLs (convert `/abs/` to `/pdf/` or scrape abstract).
- Download PDF to temporary storage.
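The arXiv handling can be sketched with only the standard library; `to_pdf_url` is the pure `/abs/` → `/pdf/` rewrite, and the download step is a plain `urllib` fetch into a temp file:

```python
# Sketch of processing/url_fetcher.py.
import tempfile
import urllib.request


def to_pdf_url(url: str) -> str:
    """Rewrite an arXiv abstract URL to the direct-PDF form."""
    return url.replace("/abs/", "/pdf/")


def fetch_paper_from_url(url: str) -> str:
    """Download the PDF to temporary storage and return the local path."""
    pdf_url = to_pdf_url(url)
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        with urllib.request.urlopen(pdf_url) as resp:
            tmp.write(resp.read())
        return tmp.name
```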
### 2.2. Generation Logic (`generation/`)
- [ ] **Implement `script_generator.py`**
- **Model**: `unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit`.
- Define System Prompts for "Host" and "Guest" personas.
- Function `generate_podcast_script(paper_text) -> List[Dict]`.
- Output format: `[{"speaker": "Host", "text": "...", "emotion": "excited"}, {"speaker": "Guest", "text": "...", "emotion": "neutral"}]`.
- **Key Logic**: Prompt the model to include emotion tags (e.g. `[laugh]`, `[sigh]`) supported by Maya1.
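Only the prompt assembly and output parsing are sketched here; wiring Phi-4-mini in via `transformers` is left out, and the exact emotion/tag vocabulary is an assumption to verify against Maya1:

```python
# Sketch of generation/script_generator.py (prompt + parsing only).
import json
from typing import Dict, List

SYSTEM_PROMPT = (
    "You are writing a two-person podcast. Reply ONLY with a JSON list of "
    'turns: [{"speaker": "Host"|"Guest", "text": "...", "emotion": "..."}]. '
    'Inline tags like [laugh] and [sigh] are allowed inside "text".'
)


def build_prompt(paper_text: str, max_chars: int = 8000) -> str:
    # Truncation is a crude context-window guard; chunking would be better.
    return f"{SYSTEM_PROMPT}\n\nPaper:\n{paper_text[:max_chars]}"


def parse_script(raw: str) -> List[Dict]:
    """Validate the model's JSON reply into the plan's turn format."""
    turns = json.loads(raw)
    for turn in turns:
        if turn.get("speaker") not in ("Host", "Guest"):
            raise ValueError(f"unknown speaker: {turn.get('speaker')}")
        turn.setdefault("emotion", "neutral")
    return turns
```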
### 2.3. Audio Synthesis (`synthesis/`)
- [ ] **Implement `tts_engine.py`**
- **Model**: `maya-research/maya1`.
- Function `synthesize_dialogue(script_json) -> audio_path`.
- Parse the script for emotion tags and pass them to Maya1.
- Combine audio segments into a single file using `pydub` or `scipy`.
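Maya1's inference call is not shown here (its real API needs checking); the segment-combining step, however, can be sketched on raw 16-bit sample lists with the standard-library `wave` module. The sample rate is an assumption:

```python
# Sketch of the stitching half of synthesis/tts_engine.py.
import array
import wave
from typing import List

SAMPLE_RATE = 24_000  # assumed; match whatever Maya1 actually outputs


def stitch(segments: List[List[int]], gap_ms: int = 300) -> List[int]:
    """Concatenate per-turn samples with a short silence between turns."""
    silence = [0] * (SAMPLE_RATE * gap_ms // 1000)
    out: List[int] = []
    for i, seg in enumerate(segments):
        if i:
            out.extend(silence)
        out.extend(seg)
    return out


def write_wav(samples: List[int], path: str) -> str:
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit PCM
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(array.array("h", samples).tobytes())
    return path
```

`synthesize_dialogue` would then loop over the parsed turns, call Maya1 per turn, and feed the results through `stitch` and `write_wav`.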
## 3. MCP Server Integration (`mcp_servers/`)
To satisfy the "MCP in Action" requirement, we will expose our core tools as MCP resources/tools.
- [ ] **Create `paper_tools_server.py`**
- Implement an MCP server that provides:
- Tool: `read_pdf(path)`
- Tool: `fetch_arxiv(url)`
- Tool: `synthesize_podcast(script)`
- This allows the "Agent" to call these tools via the MCP protocol.
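A sketch of the server, assuming the official `mcp` Python SDK's `FastMCP` helper. The tool bodies delegate to the modules planned above; only the arXiv URL rewrite is implemented inline, and the imports are deferred so the plain functions stay importable without MCP installed:

```python
# Sketch of mcp_servers/paper_tools_server.py.
def fetch_arxiv(url: str) -> str:
    """Rewrite an arXiv /abs/ URL to its /pdf/ form (download handled elsewhere)."""
    return url.replace("/abs/", "/pdf/")


def main() -> None:
    from mcp.server.fastmcp import FastMCP
    from processing.pdf_reader import extract_text_from_pdf
    from synthesis.tts_engine import synthesize_dialogue

    server = FastMCP("paper-tools")
    server.tool(name="read_pdf")(extract_text_from_pdf)
    server.tool(name="fetch_arxiv")(fetch_arxiv)
    server.tool(name="synthesize_podcast")(synthesize_dialogue)
    server.run()  # stdio transport by default


if __name__ == "__main__":
    main()
```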
## 4. Agent Orchestration (`agents/`)
- [ ] **Implement `podcast_agent.py`**
- Create a `PodcastAgent` class.
- **Planning Loop**:
1. Receive User Input.
2. **Plan**: Decide to fetch/read paper.
3. **Analyze**: Extract key topics.
4. **Draft**: Generate script using Phi-4-mini.
5. **Synthesize**: Create audio using Maya1.
- Use `sequential_thinking` pattern (simulated) to show "Agentic" behavior in the logs/UI.
- *Crucial*: The Agent should use the MCP Client to call the tools defined in Step 3, demonstrating "Autonomous reasoning using MCP tools".
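The planning loop above can be sketched with tools injected as plain callables — in the real app these would be MCP client calls to the Step-3 server — and a `thoughts` list recording the plan/execute trace for the UI:

```python
# Sketch of agents/podcast_agent.py.
from typing import Callable, Dict, List


class PodcastAgent:
    def __init__(self, tools: Dict[str, Callable]):
        self.tools = tools
        self.thoughts: List[str] = []  # trace shown in the Gradio logs pane

    def _think(self, msg: str) -> None:
        self.thoughts.append(msg)

    def run(self, source: str) -> str:
        self._think(f"Plan: fetch paper from {source}, then draft and synthesize.")
        path = self.tools["fetch_arxiv"](source)
        self._think("Analyze: extracting text.")
        text = self.tools["read_pdf"](path)
        self._think("Draft: generating script with Phi-4-mini.")
        script = self.tools["generate_script"](text)
        self._think("Synthesize: rendering audio with Maya1.")
        return self.tools["synthesize_podcast"](script)
```

The tool names here mirror the Step-3 server; `generate_script` is assumed to be exposed the same way.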
## 5. User Interface (`app.py`)
- [ ] **Build Gradio UI**
- Input: Textbox (URL) or File Upload (PDF).
- Output: Audio Player, Transcript Textbox, Status/Logs Markdown.
- **Agent Visualization**: Show the "Thoughts" of the agent as it plans and executes (e.g., "Fetching paper...", "Analyzing structure...", "Generating script...").
- [ ] **Deployment Config**
- Create `Dockerfile` (if needed for custom deps) or rely on HF Spaces default.
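The Gradio wiring might look like the sketch below. `run_papercast` is a hypothetical bridge to the agent, and `format_thoughts` renders the agent trace as the Status/Logs Markdown; building the UI is deferred so the module imports without Gradio installed:

```python
# Sketch of the UI portion of app.py.
from typing import List


def format_thoughts(thoughts: List[str]) -> str:
    """Render the agent's trace as a Markdown bullet list."""
    return "\n".join(f"- {t}" for t in thoughts)


def build_ui(run_papercast):
    import gradio as gr

    with gr.Blocks(title="PaperCast") as demo:
        url = gr.Textbox(label="arXiv URL")
        pdf = gr.File(label="...or upload a PDF", file_types=[".pdf"])
        go = gr.Button("Generate podcast")
        audio = gr.Audio(label="Podcast")
        transcript = gr.Textbox(label="Transcript", lines=12)
        logs = gr.Markdown("Agent thoughts will appear here.")
        go.click(run_papercast, inputs=[url, pdf], outputs=[audio, transcript, logs])
    return demo
```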
## 6. Verification & Polish
- [ ] **Test Run**
- Run with a real arXiv paper.
- Verify audio quality and script coherence.
- [ ] **Documentation**
- Update `README.md` with usage instructions and "MCP in Action" details.
- Record Demo Video.
## 7. Bonus Features (Time Permitting)
- [ ] **RAG Integration**: Use a vector store to answer questions about the paper after the podcast.
- [ ] **Background Music**: Mix in intro/outro music.