---
sdk: streamlit
sdk_version: 1.50.0
---

# 🧪 Advanced ML Sentiment Lab

[![Streamlit](https://img.shields.io/badge/Powered%20by-Streamlit-FF4B4B)](https://streamlit.io/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-orange.svg)](LICENSE)
[![Made by Tarek Masryo](https://img.shields.io/badge/Made%20by-Tarek%20Masryo-blue)](https://github.com/tarekmasryo)

---

## 📌 Overview

Interactive **Streamlit + Plotly** app for **binary sentiment analysis**.

Upload any CSV with a **text column** and a **binary label**, then:

- Run quick EDA on text lengths, tokens, and class balance
- Build TF-IDF word + optional char features
- Train multiple classical models (LogReg / RF / GB / Naive Bayes)
- Tune the decision threshold with **FP/FN business costs**
- Inspect misclassified samples and test arbitrary texts live

Works well with the classic **IMDB 50K Reviews** dataset, but is generic enough for product reviews, tickets, surveys, etc.

---

## 📊 Dashboard Preview

### EDA & KPIs

![EDA](assets/eda-hero.png)

### Train & Validation

![Train & Validation](assets/train-validation.png)

### Error Analysis

![Error Analysis](assets/error-analysis.png)

### Deploy & Interactive Prediction

![Deploy](assets/deploy.png)

---

## 🚀 How to use (in this Space)

1. **Load data**
   - Upload a CSV file
   - Or place `IMDB Dataset.csv` / `imdb.csv` in the Space and reload

2. **Map columns**
   - Choose the **text** column
   - Choose the **label** column and map which values are *positive* vs *negative*

3. **Train models**
   - Go to **“Train & Validation”**
   - Set TF-IDF options, pick models, click **Train models**

4. **Analyse & deploy**
   - Use **“Threshold & Cost”** to pick a business-aware threshold
   - Check **“Compare Models”** + **“Error Analysis”**
   - In **“Deploy”**, try any text and see the predicted sentiment + confidence bar

No data is stored server-side beyond the current session.

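
For readers curious about what a business-aware threshold means in step 4, here is a minimal sketch of cost-based threshold selection. It is not taken from the app's source: the function name `pick_threshold`, its parameters, and the grid of 101 candidate thresholds are illustrative assumptions. The idea is to scan candidate thresholds and keep the one that minimises `fp_cost * FP + fn_cost * FN` on held-out predictions.

```python
from typing import Sequence, Tuple


def pick_threshold(
    y_true: Sequence[int],
    scores: Sequence[float],
    fp_cost: float,
    fn_cost: float,
    steps: int = 101,
) -> Tuple[float, float]:
    """Hypothetical helper: return (threshold, total_cost) that minimises
    fp_cost * FP + fn_cost * FN over an evenly spaced grid on [0, 1].

    y_true holds 0/1 labels, scores holds predicted positive-class
    probabilities. A sample is predicted positive when score >= threshold.
    """
    best_t, best_cost = 0.5, float("inf")
    for i in range(steps):
        t = i / (steps - 1)
        # False positives: negatives scored at or above the threshold.
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        # False negatives: positives scored below the threshold.
        fn = sum(1 for y, s in zip(y_true, scores) if s < t and y == 1)
        cost = fp_cost * fp + fn_cost * fn
        # Strict comparison keeps the lowest threshold among ties.
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


# Toy validation data (made up for illustration).
y_true = [0, 0, 1, 1, 1, 0]
scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.70]

# Missing a positive is 5x worse than a false alarm: threshold moves down.
t_low, cost_low = pick_threshold(y_true, scores, fp_cost=1.0, fn_cost=5.0)

# A false alarm is 5x worse than a miss: threshold moves up.
t_high, cost_high = pick_threshold(y_true, scores, fp_cost=5.0, fn_cost=1.0)
```

When false negatives are the expensive error the chosen threshold drops below 0.5, and when false positives are the expensive error it rises above it, which is the trade-off the **“Threshold & Cost”** tab is meant to expose.
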

---

## 🧠 Under the hood

- **Features**
  - Word TF-IDF (1–3 n-grams)
  - Optional char TF-IDF (3–6 n-grams)
- **Models**
  - Logistic Regression (balanced)
  - Random Forest
  - Gradient Boosting
  - Multinomial Naive Bayes
- **Artifacts**
  - Saved under `models_sentiment_lab/`:
    - `vectorizers.joblib`, `models.joblib`, `results.joblib`, `metadata.joblib`
  - Reused by Threshold, Compare, Error Analysis, and Deploy tabs

---

## 🖥 Run locally

```bash
git clone https://github.com/tarekmasryo/advanced-ml-sentiment-lab.git
cd advanced-ml-sentiment-lab

python -m venv .venv
# Windows: .venv\Scripts\activate
source .venv/bin/activate

pip install -r requirements.txt
streamlit run app.py
```

---

## 📄 License & credit

Code: **Apache 2.0**

Space & dashboard by **Tarek Masryo** 🚀