Spaces:
Sleeping
title: User Manual Chatbot
emoji: 🔧️
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
User Manual Chatbot
Project Overview
This project is a chatbot developed as part of the Akbank GenAI Bootcamp 2025. The chatbot leverages a database of user manuals for various products to provide accurate and contextually relevant answers to technical questions. By utilizing Retrieval-Augmented Generation (RAG) technology, the chatbot retrieves relevant information from user manuals and combines it with the generative capabilities of the Gemini-2.5-flash model to deliver precise responses. The project includes a user-friendly interface built with Gradio which can be used to interact with the chatbot.
Purpose
The goal of this project is to create an intelligent chatbot capable of answering technical queries about electronic devices and products by referencing user manuals. This enables users to quickly access accurate information without manually searching through lengthy documentation.
Dataset
The dataset used in this project is sourced from the dataset described in the paper Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework. It can be accessed via this Google Drive link.
Dataset Details
- Format: Text-based user manuals for various electronic devices.
- Preprocessing: The manuals are split into overlapping chunks to facilitate efficient retrieval.
- Embedding Generation: The text chunks are converted into embeddings using the sentence-transformers/all-mpnet-base-v2 model from HuggingFace.
- Generation Script: The generate_vector_db.py script processes the dataset and generates the vector database. If the process is interrupted the generated embeddings are saved and when ran again, the program will generate new embeddings.
- Vector Database: The embeddings are stored in a Chroma vector database and can be used locally. However, a pregenerated database already exists can be used via the HuggingFace dataset syildizz/user-manuals-chromadb.
Usage
When generating the dataset, the folders that are used for the input user-manual dataset and the output Chroma dataset by the generate_vector_db.py script is specified in the config.py file.
dataset_directory = "user_manual_dataset_folder_path"
chroma_persist_directory = "chroma_dataset_folder_path"
Methods and Technologies
Solution Architecture
The chatbot employs a Retrieval-Augmented Generation (RAG) pipeline to combine information retrieval with generative AI:
- Vector Database: The embeddings are retrieved from a Chroma vector database for efficient similarity-based retrieval.
- Query Processing: When a user submits a query, the system retrieves the most relevant manual chunks using similarity search.
- Response Generation: The retrieved chunks are passed to the Gemini-2.5-flash model to generate a coherent and contextually accurate response.
- User Interface: A Gradio-based interface allows users to interact with the chatbot seamlessly.
Technologies Used
- LLM: Gemini-2.5-flash (
langchain-google-genai) - Embedding Model: sentence-transformers/all-mpnet-base-v2 (
langchain-huggingface) - Vector Database: Chroma (
langchain-chroma,chromadb) - Text Splitting:
langchain-text-splitters - Interface: Gradio (
gradio) - Environment Management:
python-dotenv,pydantic - Other Libraries:
langchain,langchain-core,langchain-community
Key Features
- RAG-based Retrieval: Ensures answers are grounded in the user manual dataset.
- Incremental Vector Database: The
generate_vector_db.pyscript supports resumable processing. - Configurability: Parameters like chunk size and overlap are adjustable in
config.py. - Interactive UI: Gradio interface for easy user interaction.
Results
The chatbot successfully answers technical questions about electronic devices by retrieving relevant information from user manuals. Key outcomes include:
- Accuracy: The RAG pipeline ensures responses are highly relevant to the query, leveraging the structured manual dataset.
- Scalability: The incremental vector database generation supports large datasets and resumable processing.
- Usability: The Gradio interface provides a seamless experience for users to query the chatbot.
- Deployment: The project is live on HuggingFace Spaces at Placeholder Link.
Setup and Installation
Prerequisites
- Python
- Git
- Virtual environment (recommended)
Installation Steps
Clone the Repository:
git clone https://github.com/syildizz/[your-repo-name].git cd [your-repo-name]Set Up a Virtual Environment:
python -m venv .venv source .venv/bin/activate # On Windows: .venv\Scripts\activateInstall Dependencies: Use the requirements.txt file to install dependencies via running:
pip install -r requirements.txtConfiguration: The public configuration information is stored in the config.py file. The global parameters in the config file specified can be changed if another value is desired for the project.
Default values:
dataset_directory = "dataset" chroma_persist_directory = "chroma_db" huggingface_embedding_model_repo_path = "sentence-transformers/all-mpnet-base-v2" huggingface_vector_embedding_database_repo_path = "syildizz/user-manuals-chromadb" google_llm_model_name = "gemini-2.5-flash"Configure Environment Variables: Create a
.envfile in the project root with the following:GEMINI_API_KEY=[your-gemini-api-key] HUGGINGFACE_TOKEN=[your-huggingface-token]Generate Vector Database (Optional): If you want to generate a local vector database, run:
python generate_vector_db.pyNOTE: Do not generate a vector database if you want to pull the public pregenerated database. If a database does not exist in the next step, app.py will pull the remote pregenerated database.
Run the Application: Launch the Gradio interface:
python app.pyThe interface will be available at
http://localhost:7860.
Web Interface & Product Guide
The chatbot is deployed on HuggingFace Spaces at Placeholder Link. The Gradio-based interface allows users to:
- Enter technical questions about electronic devices.
- Receive responses grounded in user manual content.
Usage Instructions
- Visit the HuggingFace Spaces link: Placeholder Link.
- Enter a question in the text input field (e.g., "How do I reset my [device name]?").
- The chatbot will use relevant manual sections to generate a response.
Screenshots
[Placeholder: Add screenshots or a short video demonstrating the interface]
Live Demo: Placeholder HuggingFace Spaces Link