zeeshan committed on
Commit
0c00a14
Β·
1 Parent(s): c4ed69a

Inference Fix

deployment/backend/README.md DELETED
@@ -1,2292 +0,0 @@
1
- # RoDLA Document Layout Analysis API
2
-
3
- <div align="center">
4
-
5
- ![Python](https://img.shields.io/badge/Python-3.8+-blue.svg)
6
- ![FastAPI](https://img.shields.io/badge/FastAPI-0.100+-green.svg)
7
- ![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-red.svg)
8
- ![License](https://img.shields.io/badge/License-MIT-yellow.svg)
9
- ![CVPR](https://img.shields.io/badge/CVPR-2024-purple.svg)
10
-
11
- **A Production-Ready API for Robust Document Layout Analysis**
12
-
13
- [Features](#-features) β€’ [Installation](#-installation) β€’ [Quick Start](#-quick-start) β€’ [API Reference](#-api-reference) β€’ [Architecture](#-architecture) β€’ [Metrics](#-metrics-system)
14
-
15
- </div>
16
-
17
- ---
18
-
19
- ## πŸ“‹ Table of Contents
20
-
21
- 1. [Overview](#-overview)
22
- 2. [Features](#-features)
23
- 3. [System Requirements](#-system-requirements)
24
- 4. [Installation](#-installation)
25
- 5. [Quick Start](#-quick-start)
26
- 6. [Project Structure](#-project-structure)
27
- 7. [Architecture Deep Dive](#-architecture-deep-dive)
28
- 8. [Configuration](#-configuration)
29
- 9. [API Reference](#-api-reference)
30
- 10. [Metrics System](#-metrics-system)
31
- 11. [Visualization Engine](#-visualization-engine)
32
- 12. [Services Layer](#-services-layer)
33
- 13. [Utilities Reference](#-utilities-reference)
34
- 14. [Error Handling](#-error-handling)
35
- 15. [Performance Optimization](#-performance-optimization)
36
- 16. [Security Considerations](#-security-considerations)
37
- 17. [Testing](#-testing)
38
- 18. [Deployment](#-deployment)
39
- 19. [Troubleshooting](#-troubleshooting)
40
- 20. [Contributing](#-contributing)
41
- 21. [Citation](#-citation)
42
- 22. [License](#-license)
43
-
44
- ---
45
-
46
- ## 🎯 Overview
47
-
48
- ### What is RoDLA?
49
-
50
- RoDLA (Robust Document Layout Analysis) is a state-of-the-art deep learning model for detecting and classifying layout elements in document images. Published at **CVPR 2024**, it focuses on robustness to various perturbations including noise, blur, and geometric distortions.
51
-
52
- ### What is this API?
53
-
54
- This repository provides a **production-ready FastAPI wrapper** around the RoDLA model, featuring:
55
-
56
- - RESTful API endpoints for document analysis
57
- - Comprehensive metrics calculation (20+ metrics)
58
- - Automated visualization generation (8 chart types)
59
- - Robustness assessment based on the RoDLA paper
60
- - Human-readable interpretation of results
61
- - Modular, maintainable code architecture
62
-
63
- ### Key Statistics
64
-
65
- | Metric | Value |
66
- |--------|-------|
67
- | Clean mAP (M6Doc) | 70.0% |
68
- | Perturbed Average mAP | 61.7% |
69
- | mRD Score | 147.6 |
70
- | Max Detections/Image | 300 |
71
- | Supported Classes | 74 (M6Doc) |
72
-
73
- ---
74
-
75
- ## ✨ Features
76
-
77
- ### Core Capabilities
78
-
79
- | Feature | Description |
80
- |---------|-------------|
81
- | πŸ” **Multi-class Detection** | Detect 74+ document element types |
82
- | πŸ“Š **Comprehensive Metrics** | 20+ analytical metrics per image |
83
- | πŸ“ˆ **Auto Visualization** | 8 chart types generated automatically |
84
- | πŸ›‘οΈ **Robustness Analysis** | mPE and mRD estimation |
85
- | 🧠 **Smart Interpretation** | Human-readable analysis summaries |
86
- | ⚑ **GPU Acceleration** | CUDA support for fast inference |
87
- | πŸ“ **Flexible Output** | JSON, annotated images, or both |
88
-
89
- ### Document Element Types
90
-
91
- The model can detect various document elements including:
92
-
93
- ```
94
- Text Elements: Structural Elements: Visual Elements:
95
- β”œβ”€β”€ Paragraph β”œβ”€β”€ Header β”œβ”€β”€ Figure
96
- β”œβ”€β”€ Title β”œβ”€β”€ Footer β”œβ”€β”€ Table
97
- β”œβ”€β”€ Caption β”œβ”€β”€ Page Number β”œβ”€β”€ Chart
98
- β”œβ”€β”€ List β”œβ”€β”€ Section β”œβ”€β”€ Logo
99
- β”œβ”€β”€ Footnote β”œβ”€β”€ Column β”œβ”€β”€ Stamp
100
- └── Abstract └── Margin └── Equation
101
- ```
102
-
103
- ---
104
-
105
- ## πŸ’» System Requirements
106
-
107
- ### Hardware Requirements
108
-
109
- | Component | Minimum | Recommended |
110
- |-----------|---------|-------------|
111
- | CPU | 4 cores | 8+ cores |
112
- | RAM | 16 GB | 32 GB |
113
- | GPU | 8 GB VRAM | 16+ GB VRAM |
114
- | Storage | 10 GB | 20 GB |
115
-
116
- ### Software Requirements
117
-
118
- | Software | Version |
119
- |----------|---------|
120
- | Python | 3.8 - 3.10 |
121
- | CUDA | 11.7+ |
122
- | cuDNN | 8.5+ |
123
- | OS | Linux (Ubuntu 20.04+) / WSL2 |
124
-
125
- ### Python Dependencies
126
-
127
- ```
128
- # Core Framework
129
- fastapi>=0.100.0
130
- uvicorn>=0.23.0
131
- python-multipart>=0.0.6
132
-
133
- # ML/Deep Learning
134
- torch>=2.0.0
135
- mmdet>=3.0.0
136
- mmcv>=2.0.0
137
-
138
- # Data Processing
139
- numpy>=1.24.0
140
- pillow>=9.5.0
141
-
142
- # Visualization
143
- matplotlib>=3.7.0
144
- seaborn>=0.12.0
145
-
146
- # Utilities
147
- pydantic>=2.0.0
148
- ```
149
-
150
- ---
151
-
152
- ## πŸš€ Installation
153
-
154
- ### Step 1: Clone the Repository
155
-
156
- ```bash
157
- git clone https://github.com/yourusername/rodla-api.git
158
- cd rodla-api
159
- ```
160
-
161
- ### Step 2: Create Virtual Environment
162
-
163
- ```bash
164
- # Using conda (recommended)
165
- conda create -n rodla python=3.9
166
- conda activate rodla
167
-
168
- # Or using venv
169
- python -m venv venv
170
- source venv/bin/activate # Linux/Mac
171
- .\venv\Scripts\activate # Windows
172
- ```
173
-
174
- ### Step 3: Install PyTorch with CUDA
175
-
176
- ```bash
177
- # For CUDA 11.8
178
- pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
179
-
180
- # For CUDA 12.1
181
- pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
182
- ```
183
-
184
- ### Step 4: Install MMDetection
185
-
186
- ```bash
187
- pip install -U openmim
188
- mim install mmengine
189
- mim install "mmcv>=2.0.0"
190
- mim install "mmdet>=3.0.0"
191
- ```
192
-
193
- ### Step 5: Install Project Dependencies
194
-
195
- ```bash
196
- pip install -r requirements.txt
197
- ```
198
-
199
- ### Step 6: Download Model Weights
200
-
201
- ```bash
202
- # Download from official source
203
- wget https://path-to-weights/rodla_internimage_xl_m6doc.pth -O weights/rodla_internimage_xl_m6doc.pth
204
- ```
205
-
206
- ### Step 7: Configure Paths
207
-
208
- Edit `config/settings.py`:
209
-
210
- ```python
211
- REPO_ROOT = Path("/path/to/your/RoDLA")
212
- MODEL_CONFIG = REPO_ROOT / "model/configs/m6doc/rodla_internimage_xl_m6doc.py"
213
- MODEL_WEIGHTS = REPO_ROOT / "rodla_internimage_xl_m6doc.pth"
214
- ```
215
-
216
- ---
217
-
218
- ## ⚑ Quick Start
219
-
220
- ### Starting the Server
221
-
222
- ```bash
223
- # Development mode
224
- python backend.py
225
-
226
- # Production mode with uvicorn
227
- uvicorn backend:app --host 0.0.0.0 --port 8000 --workers 1
228
- ```
229
-
230
- ### Making Your First Request
231
-
232
- ```bash
233
- # Using curl
234
- curl -X POST "http://localhost:8000/api/detect" \
235
- -H "accept: application/json" \
236
- -F "file=@document.jpg" \
237
- -F "score_thr=0.3"
238
-
239
- # Get model information
240
- curl http://localhost:8000/api/model-info
241
- ```
242
-
243
- ### Python Client Example
244
-
245
- ```python
246
- import requests
247
-
248
- # Upload and analyze document
249
- with open("document.jpg", "rb") as f:
250
- response = requests.post(
251
- "http://localhost:8000/api/detect",
252
- files={"file": f},
253
- data={
254
- "score_thr": "0.3",
255
- "return_image": "false",
256
- "generate_visualizations": "true"
257
- }
258
- )
259
-
260
- result = response.json()
261
- print(f"Detected {result['core_results']['summary']['total_detections']} elements")
262
- ```
263
-
264
- ---
265
-
266
- ## πŸ“ Project Structure
267
-
268
- ```
269
- deployment/
270
- β”œβ”€β”€ backend.py # πŸš€ Main FastAPI application entry point
271
- β”œβ”€β”€ requirements.txt # πŸ“¦ Python dependencies
272
- β”œβ”€β”€ README.md # πŸ“– This documentation
273
- β”‚
274
- β”œβ”€β”€ config/ # βš™οΈ Configuration Layer
275
- β”‚ β”œβ”€β”€ __init__.py # Package initializer
276
- β”‚ └── settings.py # All configuration constants
277
- β”‚
278
- β”œβ”€β”€ core/ # 🧠 Core Application Layer
279
- β”‚ β”œβ”€β”€ __init__.py # Package initializer
280
- β”‚ β”œβ”€β”€ model_loader.py # Singleton model management
281
- β”‚ └── dependencies.py # FastAPI dependency injection
282
- β”‚
283
- β”œβ”€β”€ api/ # 🌐 API Layer
284
- β”‚ β”œβ”€β”€ __init__.py # Package initializer
285
- β”‚ β”œβ”€β”€ routes.py # API endpoint definitions
286
- β”‚ └── schemas.py # Pydantic request/response models
287
- β”‚
288
- β”œβ”€β”€ services/ # πŸ”§ Business Logic Layer
289
- β”‚ β”œβ”€β”€ __init__.py # Package initializer
290
- β”‚ β”œβ”€β”€ detection.py # Core detection logic
291
- β”‚ β”œβ”€β”€ processing.py # Result aggregation
292
- β”‚ β”œβ”€β”€ visualization.py # Chart generation (350+ lines)
293
- β”‚ └── interpretation.py # Human-readable insights
294
- β”‚
295
- β”œβ”€β”€ utils/ # πŸ› οΈ Utility Layer
296
- β”‚ β”œβ”€β”€ __init__.py # Package initializer
297
- β”‚ β”œβ”€β”€ helpers.py # General helper functions
298
- β”‚ β”œβ”€β”€ serialization.py # JSON conversion utilities
299
- β”‚ └── metrics/ # Metrics calculation modules
300
- β”‚ β”œβ”€β”€ __init__.py # Metrics package initializer
301
- β”‚ β”œβ”€β”€ core.py # Core detection metrics
302
- β”‚ β”œβ”€β”€ rodla.py # RoDLA-specific metrics
303
- β”‚ β”œβ”€β”€ spatial.py # Spatial distribution analysis
304
- β”‚ └── quality.py # Quality & complexity metrics
305
- β”‚
306
- └── outputs/ # πŸ“€ Output Directory
307
- β”œβ”€β”€ *.json # Detection results
308
- └── *.png # Visualization images
309
- ```
310
-
311
- ### File Count Summary
312
-
313
- | Layer | Files | Purpose |
314
- |-------|-------|---------|
315
- | Config | 2 | Configuration management |
316
- | Core | 3 | Model and dependency management |
317
- | API | 3 | HTTP endpoints and schemas |
318
- | Services | 5 | Business logic implementation |
319
- | Utils | 7 | Helper functions and metrics |
320
- | **Total** | **21** | Complete modular architecture |
321
-
322
- ---
323
-
324
- ## πŸ—οΈ Architecture Deep Dive
325
-
326
- ### Layered Architecture
327
-
328
- ```
329
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
330
- β”‚ CLIENT LAYER β”‚
331
- β”‚ (Web Browser / API Clients) β”‚
332
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
333
- β”‚ HTTP Requests
334
- β–Ό
335
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
336
- β”‚ API LAYER β”‚
337
- β”‚ api/routes.py β”‚
338
- β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
339
- β”‚ β”‚ GET /model-info β”‚ β”‚ POST /api/detect β”‚ β”‚
340
- β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
341
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
342
- β”‚ Validated Requests
343
- β–Ό
344
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
345
- β”‚ SERVICES LAYER β”‚
346
- β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”‚
347
- β”‚ β”‚ detection.py β”‚ β”‚processing.py β”‚ β”‚ visualization.py β”‚β”‚
348
- β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚β”‚
349
- β”‚ β”‚ β€’ Inference β”‚ β”‚ β€’ Aggregate β”‚ β”‚ β€’ 8 Chart Types β”‚β”‚
350
- β”‚ β”‚ β€’ Processing β”‚ β”‚ β€’ Save JSON β”‚ β”‚ β€’ Base64 Encoding β”‚β”‚
351
- β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β”‚
352
- β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
353
- β”‚ β”‚ interpretation.py β”‚ β”‚
354
- β”‚ β”‚ β€’ Human-readable insights β”‚ β”‚
355
- β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
356
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
357
- β”‚ Data Processing
358
- β–Ό
359
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
360
- β”‚ UTILITIES LAYER β”‚
361
- β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”‚
362
- β”‚ β”‚ utils/metrics/ β”‚β”‚
363
- β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚β”‚
364
- β”‚ β”‚ β”‚ core.py β”‚ β”‚rodla.py β”‚ β”‚spatial. β”‚ β”‚ quality.py β”‚ β”‚β”‚
365
- β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ py β”‚ β”‚ β”‚ β”‚β”‚
366
- β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚β”‚
367
- β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β”‚
368
- β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
369
- β”‚ β”‚ helpers.py β”‚ β”‚ serialization.py β”‚ β”‚
370
- β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
371
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
372
- β”‚ Model Operations
373
- β–Ό
374
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
375
- β”‚ CORE LAYER β”‚
376
- β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
377
- β”‚ β”‚ model_loader.py β”‚ β”‚ dependencies.py β”‚ β”‚
378
- β”‚ β”‚ β”‚ β”‚ β”‚ β”‚
379
- β”‚ β”‚ β€’ Singleton Pattern β”‚ β”‚ β€’ FastAPI DI β”‚ β”‚
380
- β”‚ β”‚ β€’ GPU Management β”‚ β”‚ β€’ Model Injection β”‚ β”‚
381
- β”‚ β”‚ β€’ Lazy Loading β”‚ β”‚ β”‚ β”‚
382
- β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
383
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
384
- β”‚ Configuration
385
- β–Ό
386
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
387
- β”‚ CONFIG LAYER β”‚
388
- β”‚ config/settings.py β”‚
389
- β”‚ β€’ Paths β€’ Constants β€’ Baseline Metrics β€’ Thresholds β”‚
390
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
391
- ```
392
-
393
- ### Design Patterns Used
394
-
395
- | Pattern | Location | Purpose |
396
- |---------|----------|---------|
397
- | **Singleton** | `model_loader.py` | Single model instance |
398
- | **Factory** | `visualization.py` | Create multiple chart types |
399
- | **Dependency Injection** | `dependencies.py` | Inject model into routes |
400
- | **Repository** | `processing.py` | Abstract data persistence |
401
- | **Facade** | `routes.py` | Simplify complex subsystems |
402
- | **Strategy** | `metrics/` | Interchangeable metric algorithms |
403
-
404
- ### Data Flow Diagram
405
-
406
- ```
407
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
408
- β”‚ Image │───▢│ Upload │───▢│ Temp │───▢│ Model β”‚
409
- β”‚ File β”‚ β”‚ Handler β”‚ β”‚ File β”‚ β”‚ Inferenceβ”‚
410
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
411
- β”‚
412
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
413
- β–Ό
414
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
415
- β”‚ Raw │───▢│ Process │───▢│ Calculate│───▢│ Generate β”‚
416
- β”‚ Results β”‚ β”‚Detectionsβ”‚ β”‚ Metrics β”‚ β”‚ Viz β”‚
417
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
418
- β”‚
419
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
420
- β–Ό
421
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
422
- β”‚ Generate │───▢│ Assemble │───▢│ JSON β”‚
423
- β”‚ Interp. β”‚ β”‚ Response β”‚ β”‚ Response β”‚
424
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
425
- ```
426
-
427
- ---
428
-
429
- ## βš™οΈ Configuration
430
-
431
- ### config/settings.py
432
-
433
- This file centralizes all configuration parameters.
434
-
435
- ```python
436
- """
437
- Configuration Settings Module
438
- =============================
439
- All application constants and configuration in one place.
440
- """
441
-
442
- from pathlib import Path
443
-
444
- # =============================================================================
445
- # PATH CONFIGURATION
446
- # =============================================================================
447
-
448
- # Root directory of the RoDLA model repository
449
- REPO_ROOT = Path("/mnt/d/MyStuff/University/Current/CV/Project/RoDLA")
450
-
451
- # Model configuration file path
452
- MODEL_CONFIG = REPO_ROOT / "model/configs/m6doc/rodla_internimage_xl_m6doc.py"
453
-
454
- # Pre-trained model weights path
455
- MODEL_WEIGHTS = REPO_ROOT / "rodla_internimage_xl_m6doc.pth"
456
-
457
- # Output directory for results and visualizations
458
- OUTPUT_DIR = Path("outputs")
459
-
460
- # =============================================================================
461
- # MODEL CONFIGURATION
462
- # =============================================================================
463
-
464
- # Default confidence threshold for detections
465
- DEFAULT_SCORE_THRESHOLD = 0.3
466
-
467
- # Maximum number of detections per image
468
- MAX_DETECTIONS = 300
469
-
470
- # Model metadata
471
- MODEL_INFO = {
472
- "name": "RoDLA InternImage-XL",
473
- "paper": "RoDLA: Benchmarking the Robustness of Document Layout Analysis Models",
474
- "conference": "CVPR 2024",
475
- "backbone": "InternImage-XL",
476
- "framework": "DINO with Channel Attention + Average Pooling",
477
- "dataset": "M6Doc-P"
478
- }
479
-
480
- # =============================================================================
481
- # BASELINE PERFORMANCE METRICS
482
- # =============================================================================
483
-
484
- # Clean performance baselines from the RoDLA paper (mAP scores)
485
- BASELINE_MAP = {
486
- "M6Doc": 70.0, # Main evaluation dataset
487
- "PubLayNet": 96.0, # Scientific documents
488
- "DocLayNet": 80.5 # Diverse document types
489
- }
490
-
491
- # State-of-the-art performance metrics
492
- SOTA_PERFORMANCE = {
493
- "clean_mAP": 70.0,
494
- "perturbed_avg_mAP": 61.7,
495
- "mRD_score": 147.6
496
- }
497
-
498
- # =============================================================================
499
- # ANALYSIS THRESHOLDS
500
- # =============================================================================
501
-
502
- # Size distribution thresholds (as percentage of image area)
503
- SIZE_THRESHOLDS = {
504
- "tiny": 0.005, # < 0.5% of image
505
- "small": 0.02, # 0.5% - 2%
506
- "medium": 0.1, # 2% - 10%
507
- "large": 1.0 # >= 10%
508
- }
509
-
510
- # Confidence level thresholds
511
- CONFIDENCE_THRESHOLDS = {
512
- "very_high": 0.9,
513
- "high": 0.8,
514
- "medium": 0.6,
515
- "low": 0.4
516
- }
517
-
518
- # Robustness assessment thresholds
519
- ROBUSTNESS_THRESHOLDS = {
520
- "mPE_low": 20,
521
- "mPE_medium": 40,
522
- "mRD_excellent": 100,
523
- "mRD_good": 150,
524
- "cv_stable": 0.15,
525
- "cv_moderate": 0.30
526
- }
527
-
528
- # Complexity scoring weights
529
- COMPLEXITY_WEIGHTS = {
530
- "class_diversity": 30,
531
- "detection_count": 30,
532
- "density": 20,
533
- "clustering": 20
534
- }
535
-
536
- # =============================================================================
537
- # API CONFIGURATION
538
- # =============================================================================
539
-
540
- # CORS settings
541
- CORS_ORIGINS = ["*"] # Restrict in production
542
- CORS_METHODS = ["*"]
543
- CORS_HEADERS = ["*"]
544
-
545
- # API metadata
546
- API_TITLE = "RoDLA Object Detection API"
547
- API_VERSION = "1.0.0"
548
- API_DESCRIPTION = "Production-ready API for Robust Document Layout Analysis"
549
-
550
- # =============================================================================
551
- # VISUALIZATION CONFIGURATION
552
- # =============================================================================
553
-
554
- # Figure sizes for different chart types
555
- FIGURE_SIZES = {
556
- "bar_chart": (12, 6),
557
- "histogram": (10, 6),
558
- "heatmap": (10, 8),
559
- "boxplot": (12, 6),
560
- "scatter": (10, 6),
561
- "pie": (8, 8)
562
- }
563
-
564
- # Color schemes
565
- COLOR_SCHEMES = {
566
- "primary": "steelblue",
567
- "secondary": "forestgreen",
568
- "accent": "coral",
569
- "heatmap": "YlOrRd",
570
- "scatter": "viridis"
571
- }
572
-
573
- # DPI for saved images
574
- VISUALIZATION_DPI = 100
575
- ```
576
-
577
- ### Environment Variables
578
-
579
- For production deployments, use environment variables:
580
-
581
- ```bash
582
- # .env file
583
- RODLA_REPO_ROOT=/path/to/RoDLA
584
- RODLA_MODEL_CONFIG=model/configs/m6doc/rodla_internimage_xl_m6doc.py
585
- RODLA_MODEL_WEIGHTS=rodla_internimage_xl_m6doc.pth
586
- RODLA_OUTPUT_DIR=outputs
587
- RODLA_DEFAULT_THRESHOLD=0.3
588
- RODLA_API_HOST=0.0.0.0
589
- RODLA_API_PORT=8000
590
- ```
591
-
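- These variables are not read automatically by the `config/settings.py` shown above; a small override layer like the hypothetical sketch below (names mirror the `.env` keys) can bridge the two, falling back to the hard-coded defaults when a variable is unset:
-
- ```python
- # Hypothetical override layer for config/settings.py (not part of the shipped
- # code): read the RODLA_* variables from the environment with sane fallbacks.
- import os
- from pathlib import Path
-
- REPO_ROOT = Path(os.environ.get("RODLA_REPO_ROOT", "/path/to/RoDLA"))
- MODEL_CONFIG = REPO_ROOT / os.environ.get(
-     "RODLA_MODEL_CONFIG", "model/configs/m6doc/rodla_internimage_xl_m6doc.py"
- )
- MODEL_WEIGHTS = REPO_ROOT / os.environ.get(
-     "RODLA_MODEL_WEIGHTS", "rodla_internimage_xl_m6doc.pth"
- )
- OUTPUT_DIR = Path(os.environ.get("RODLA_OUTPUT_DIR", "outputs"))
- DEFAULT_SCORE_THRESHOLD = float(os.environ.get("RODLA_DEFAULT_THRESHOLD", "0.3"))
- API_HOST = os.environ.get("RODLA_API_HOST", "0.0.0.0")
- API_PORT = int(os.environ.get("RODLA_API_PORT", "8000"))
- ```
-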
592
- ---
593
-
594
- ## 🌐 API Reference
595
-
596
- ### Endpoints Overview
597
-
598
- | Method | Endpoint | Description |
599
- |--------|----------|-------------|
600
- | GET | `/api/model-info` | Get model metadata |
601
- | POST | `/api/detect` | Analyze document image |
602
- | GET | `/health` | Health check (if implemented) |
603
- | GET | `/docs` | Swagger UI documentation |
604
- | GET | `/redoc` | ReDoc documentation |
605
-
606
- ---
607
-
608
- ### GET /api/model-info
609
-
610
- Returns comprehensive information about the loaded model.
611
-
612
- #### Request
613
-
614
- ```http
615
- GET /api/model-info HTTP/1.1
616
- Host: localhost:8000
617
- ```
618
-
619
- #### Response
620
-
621
- ```json
622
- {
623
- "model_name": "RoDLA InternImage-XL",
624
- "paper": "RoDLA: Benchmarking the Robustness of Document Layout Analysis Models (CVPR 2024)",
625
- "num_classes": 74,
626
- "classes": [
627
- "paragraph", "title", "figure", "table", "caption",
628
- "header", "footer", "page_number", "list", "abstract",
629
- // ... additional classes
630
- ],
631
- "backbone": "InternImage-XL",
632
- "detection_framework": "DINO with Channel Attention + Average Pooling",
633
- "dataset": "M6Doc-P",
634
- "max_detections_per_image": 300,
635
- "state_of_the_art_performance": {
636
- "clean_mAP": 70.0,
637
- "perturbed_avg_mAP": 61.7,
638
- "mRD_score": 147.6
639
- }
640
- }
641
- ```
642
-
643
- #### Error Responses
644
-
645
- | Status | Description |
646
- |--------|-------------|
647
- | 500 | Model not loaded |
648
-
649
- ---
650
-
651
- ### POST /api/detect
652
-
653
- Analyzes a document image and returns comprehensive detection results.
654
-
655
- #### Request
656
-
657
- ```http
658
- POST /api/detect HTTP/1.1
659
- Host: localhost:8000
660
- Content-Type: multipart/form-data
661
-
662
- file: <binary image data>
663
- score_thr: "0.3"
664
- return_image: "false"
665
- save_json: "true"
666
- generate_visualizations: "true"
667
- ```
668
-
669
- #### Parameters
670
-
671
- | Parameter | Type | Default | Description |
672
- |-----------|------|---------|-------------|
673
- | `file` | File | Required | Image file (JPEG, PNG, etc.) |
674
- | `score_thr` | string | "0.3" | Confidence threshold (0.0-1.0) |
675
- | `return_image` | string | "false" | Return annotated image instead of JSON |
676
- | `save_json` | string | "true" | Save results to disk |
677
- | `generate_visualizations` | string | "true" | Generate visualization charts |
678
-
679
- #### Response (JSON Mode)
680
-
681
- ```json
682
- {
683
- "success": true,
684
- "timestamp": "2024-01-15T10:30:45.123456",
685
- "filename": "document.jpg",
686
-
687
- "image_info": {
688
- "width": 2480,
689
- "height": 3508,
690
- "aspect_ratio": 0.707,
691
- "total_pixels": 8699840
692
- },
693
-
694
- "detection_config": {
695
- "score_threshold": 0.3,
696
- "model": "RoDLA InternImage-XL",
697
- "framework": "DINO with Robustness Enhancement",
698
- "max_detections": 300
699
- },
700
-
701
- "core_results": {
702
- "summary": {
703
- "total_detections": 47,
704
- "unique_classes": 12,
705
- "average_confidence": 0.7823,
706
- "median_confidence": 0.8156,
707
- "min_confidence": 0.3012,
708
- "max_confidence": 0.9876,
709
- "coverage_percentage": 68.45,
710
- "average_detection_area": 126543.21
711
- },
712
- "detections": [/* top 20 detections */]
713
- },
714
-
715
- "rodla_metrics": {
716
- "note": "Estimated metrics...",
717
- "estimated_mPE": 18.45,
718
- "estimated_mRD": 87.32,
719
- "confidence_std": 0.1234,
720
- "confidence_range": 0.6864,
721
- "robustness_score": 56.34,
722
- "interpretation": {
723
- "mPE_level": "low",
724
- "mRD_level": "excellent",
725
- "overall_robustness": "medium"
726
- }
727
- },
728
-
729
- "spatial_analysis": {
730
- "horizontal_distribution": {...},
731
- "vertical_distribution": {...},
732
- "quadrant_distribution": {...},
733
- "size_distribution": {...},
734
- "density_metrics": {...}
735
- },
736
-
737
- "class_analysis": {
738
- "paragraph": {
739
- "count": 15,
740
- "percentage": 31.91,
741
- "confidence_stats": {...},
742
- "area_stats": {...},
743
- "aspect_ratio_stats": {...}
744
- },
745
- // ... other classes
746
- },
747
-
748
- "confidence_analysis": {
749
- "distribution": {...},
750
- "binned_distribution": {...},
751
- "percentages": {...},
752
- "entropy": 2.3456
753
- },
754
-
755
- "robustness_indicators": {
756
- "stability_score": 87.65,
757
- "coefficient_of_variation": 0.1234,
758
- "high_confidence_ratio": 0.7234,
759
- "prediction_consistency": "high",
760
- "model_certainty": "medium",
761
- "robustness_rating": {
762
- "rating": "good",
763
- "score": 72.34
764
- }
765
- },
766
-
767
- "layout_complexity": {
768
- "class_diversity": 12,
769
- "total_elements": 47,
770
- "detection_density": 5.41,
771
- "average_element_distance": 234.56,
772
- "complexity_score": 58.23,
773
- "complexity_level": "moderate",
774
- "layout_characteristics": {
775
- "is_dense": true,
776
- "is_diverse": true,
777
- "is_structured": false
778
- }
779
- },
780
-
781
- "quality_metrics": {
782
- "overlap_analysis": {...},
783
- "size_consistency": {...},
784
- "detection_quality_score": 82.45
785
- },
786
-
787
- "visualizations": {
788
- "class_distribution": "data:image/png;base64,...",
789
- "confidence_distribution": "data:image/png;base64,...",
790
- "spatial_heatmap": "data:image/png;base64,...",
791
- "confidence_by_class": "data:image/png;base64,...",
792
- "area_vs_confidence": "data:image/png;base64,...",
793
- "quadrant_distribution": "data:image/png;base64,...",
794
- "size_distribution": "data:image/png;base64,...",
795
- "top_classes_confidence": "data:image/png;base64,..."
796
- },
797
-
798
- "interpretation": {
799
- "overview": "Document Analysis Summary...",
800
- "top_elements": "The most common elements are...",
801
- "rodla_analysis": "RoDLA Robustness Analysis...",
802
- "layout_complexity": "Layout Complexity...",
803
- "key_findings": [...],
804
- "perturbation_assessment": "...",
805
- "recommendations": [...],
806
- "confidence_summary": {...}
807
- },
808
-
809
- "all_detections": [/* complete detection list */]
810
- }
811
- ```
812
-
813
- #### Response (Image Mode)
814
-
815
- When `return_image=true`, returns the annotated image directly:
816
-
817
- ```http
818
- HTTP/1.1 200 OK
819
- Content-Type: image/jpeg
820
- Content-Disposition: attachment; filename="annotated_document.jpg"
821
-
822
- <binary image data>
823
- ```
824
-
825
- #### Error Responses
826
-
827
- | Status | Description |
828
- |--------|-------------|
829
- | 400 | Invalid file type (not an image) |
830
- | 500 | Model inference failed |
831
- | 500 | Visualization generation failed |
832
-
833
- ---
834
-
835
- ## πŸ“Š Metrics System
836
-
837
- ### Metrics Architecture
838
-
839
- ```
840
- utils/metrics/
841
- β”œβ”€β”€ __init__.py # Exports all metric functions
842
- β”œβ”€β”€ core.py # Core detection metrics
843
- β”œβ”€β”€ rodla.py # RoDLA-specific robustness metrics
844
- β”œβ”€β”€ spatial.py # Spatial distribution analysis
845
- └── quality.py # Quality and complexity metrics
846
- ```
847
-
848
- ### Core Metrics (utils/metrics/core.py)
849
-
850
- #### `calculate_core_metrics(detections, img_width, img_height)`
851
-
852
- Computes fundamental detection statistics.
853
-
854
- | Metric | Type | Description |
855
- |--------|------|-------------|
856
- | `total_detections` | int | Number of detected elements |
857
- | `unique_classes` | int | Number of distinct element types |
858
- | `average_confidence` | float | Mean confidence score |
859
- | `median_confidence` | float | Median confidence score |
860
- | `min_confidence` | float | Lowest confidence |
861
- | `max_confidence` | float | Highest confidence |
862
- | `coverage_percentage` | float | % of image covered by detections |
863
- | `average_detection_area` | float | Mean area per detection |
864
-
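- A condensed sketch of how these statistics can be derived from the detection list is shown below; it is illustrative rather than the exact `core.py` implementation (in particular, coverage here naively sums box areas, so overlapping boxes are counted twice):
-
- ```python
- # Illustrative sketch of calculate_core_metrics (not the exact core.py code).
- import numpy as np
-
- def calculate_core_metrics(detections, img_width, img_height):
-     if not detections:
-         return {"total_detections": 0, "unique_classes": 0}
-     confidences = np.array([d["confidence"] for d in detections])
-     areas = np.array([d["area"] for d in detections])
-     return {
-         "total_detections": len(detections),
-         "unique_classes": len({d["class_name"] for d in detections}),
-         "average_confidence": float(confidences.mean()),
-         "median_confidence": float(np.median(confidences)),
-         "min_confidence": float(confidences.min()),
-         "max_confidence": float(confidences.max()),
-         # Naive coverage: summed box areas over image area (overlaps double-counted).
-         "coverage_percentage": float(areas.sum() / (img_width * img_height) * 100),
-         "average_detection_area": float(areas.mean()),
-     }
- ```
-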
865
- #### `calculate_class_metrics(detections)`
866
-
867
- Per-class statistical analysis.
868
-
869
- ```python
870
- {
871
- "paragraph": {
872
- "count": 15,
873
- "percentage": 31.91,
874
- "confidence_stats": {
875
- "mean": 0.8234,
876
- "std": 0.0876,
877
- "min": 0.6543,
878
- "max": 0.9654
879
- },
880
- "area_stats": {
881
- "mean": 125432.5,
882
- "std": 45678.2,
883
- "total": 1881487.5
884
- },
885
- "aspect_ratio_stats": {
886
- "mean": 2.345,
887
- "orientation": "horizontal" # horizontal/vertical/square
888
- }
889
- }
890
- }
891
- ```
892
-
893
- #### `calculate_confidence_metrics(detections)`
894
-
895
- Detailed confidence distribution analysis.
896
-
897
- | Component | Description |
898
- |-----------|-------------|
899
- | `distribution` | Statistical measures (mean, median, std, quartiles) |
900
- | `binned_distribution` | Count per confidence range |
901
- | `percentages` | Percentage per confidence range |
902
- | `entropy` | Shannon entropy of distribution |
903
-
904
- **Confidence Bins:**
905
- - Very High: 0.9 - 1.0
906
- - High: 0.8 - 0.9
907
- - Medium: 0.6 - 0.8
908
- - Low: 0.4 - 0.6
909
- - Very Low: 0.0 - 0.4
910
-
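- The binning step itself is straightforward; a minimal sketch, assuming the bin edges above (key names and rounding may differ from the shipped `calculate_confidence_metrics`):
-
- ```python
- # Sketch of the confidence binning, using the bin edges listed above.
- def bin_confidences(confidences):
-     bins = {"very_high": (0.9, 1.01), "high": (0.8, 0.9), "medium": (0.6, 0.8),
-             "low": (0.4, 0.6), "very_low": (0.0, 0.4)}
-     counts = {name: sum(lo <= c < hi for c in confidences)
-               for name, (lo, hi) in bins.items()}
-     total = max(len(confidences), 1)
-     percentages = {name: round(100 * n / total, 2) for name, n in counts.items()}
-     return {"binned_distribution": counts, "percentages": percentages}
- ```
-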
911
- ---
912
-
913
- ### RoDLA Metrics (utils/metrics/rodla.py)
914
-
915
- These metrics are specific to the RoDLA paper's robustness evaluation framework.
916
-
917
- #### `calculate_rodla_metrics(detections, core_metrics)`
918
-
919
- Estimates perturbation effects and robustness degradation.
920
-
921
- | Metric | Formula | Interpretation |
922
- |--------|---------|----------------|
923
- | `estimated_mPE` | `(conf_std Γ— 100) + (conf_range Γ— 50)` | Mean Perturbation Effect |
924
- | `estimated_mRD` | `(degradation / mPE) Γ— 100` | Mean Robustness Degradation |
925
- | `robustness_score` | `(1 - mRD/200) Γ— 100` | Overall robustness (0-100) |
926
-
927
- **mPE Interpretation:**
928
- ```
929
- low: mPE < 20 β†’ Minimal perturbation effect
930
- medium: 20 ≀ mPE < 40 β†’ Moderate perturbation
931
- high: mPE β‰₯ 40 β†’ Significant perturbation
932
- ```
933
-
934
- **mRD Interpretation:**
935
- ```
936
- excellent: mRD < 100 β†’ Highly robust
937
- good: 100 ≀ mRD < 150 β†’ Acceptable robustness
938
- needs_improvement: mRD β‰₯ 150 β†’ Robustness concerns
939
- ```
940
-
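- The sketch below shows how these estimates combine. Note that `degradation` is not defined in this README; the sketch assumes the confidence shortfall `(1 - average_confidence) Γ— 100` purely for illustration, so the actual `rodla.py` may compute it differently:
-
- ```python
- # Illustrative combination of the estimation formulas above.
- def estimate_rodla_metrics(conf_std, conf_range, average_confidence):
-     estimated_mPE = conf_std * 100 + conf_range * 50
-     degradation = (1.0 - average_confidence) * 100   # assumption, see note above
-     estimated_mRD = (degradation / estimated_mPE) * 100 if estimated_mPE else 0.0
-     robustness_score = (1 - estimated_mRD / 200) * 100
-     return {
-         "estimated_mPE": round(estimated_mPE, 2),
-         "estimated_mRD": round(estimated_mRD, 2),
-         "robustness_score": round(robustness_score, 2),
-     }
- ```
-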
941
- #### `calculate_robustness_indicators(detections, core_metrics)`
942
-
943
- Stability and consistency metrics.
944
-
945
- ```python
946
- {
947
- "stability_score": 87.65, # (1 - CV) Γ— 100
948
- "coefficient_of_variation": 0.12, # std / mean
949
- "high_confidence_ratio": 0.72, # % with conf β‰₯ 0.8
950
- "prediction_consistency": "high", # Based on CV
951
- "model_certainty": "medium", # Based on avg conf
952
- "robustness_rating": {
953
- "rating": "good", # excellent/good/fair/poor
954
- "score": 72.34 # Composite score
955
- }
956
- }
957
- ```
958
-
959
- **Robustness Rating Formula:**
960
- ```
961
- score = (avg_conf Γ— 40) + ((1 - CV) Γ— 30) + (high_conf_ratio Γ— 30)
962
-
963
- Rating:
964
- - excellent: score β‰₯ 80
965
- - good: 60 ≀ score < 80
966
- - fair: 40 ≀ score < 60
967
- - poor: score < 40
968
- ```
969
-
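- In code, the rating reduces to a few lines (a sketch, not the exact implementation):
-
- ```python
- # Composite robustness rating, following the formula and bands above.
- def robustness_rating(avg_conf, cv, high_conf_ratio):
-     score = avg_conf * 40 + (1 - cv) * 30 + high_conf_ratio * 30
-     if score >= 80:
-         rating = "excellent"
-     elif score >= 60:
-         rating = "good"
-     elif score >= 40:
-         rating = "fair"
-     else:
-         rating = "poor"
-     return {"rating": rating, "score": round(score, 2)}
- ```
-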
970
- ---
971
-
972
- ### Spatial Metrics (utils/metrics/spatial.py)
973
-
974
- #### `calculate_spatial_analysis(detections, img_width, img_height)`
975
-
976
- Comprehensive spatial distribution analysis.
977
-
978
- ##### Horizontal Distribution
979
- ```python
980
- {
981
- "mean": 1240.5, # Mean x-coordinate
982
- "std": 456.7, # Standard deviation
983
- "skewness": -0.234, # Distribution asymmetry
984
- "left_third": 12, # Count in left 33%
985
- "center_third": 25, # Count in center 33%
986
- "right_third": 10 # Count in right 33%
987
- }
988
- ```
989
-
990
- ##### Vertical Distribution
991
- ```python
992
- {
993
- "mean": 1754.2, # Mean y-coordinate
994
- "std": 892.4, # Standard deviation
995
- "skewness": 0.156, # Distribution asymmetry
996
- "top_third": 8, # Count in top 33%
997
- "middle_third": 22, # Count in middle 33%
998
- "bottom_third": 17 # Count in bottom 33%
999
- }
1000
- ```
1001
-
1002
- ##### Quadrant Distribution
1003
- ```
1004
- Document divided into 4 quadrants:
1005
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
1006
- β”‚ Q1 β”‚ Q2 β”‚
1007
- β”‚(top-L) β”‚(top-R) β”‚
1008
- β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
1009
- β”‚ Q3 β”‚ Q4 β”‚
1010
- β”‚(bot-L) β”‚(bot-R) β”‚
1011
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
1012
- ```
1013
-
1014
- ##### Size Distribution
1015
- | Category | Threshold | Description |
1016
- |----------|-----------|-------------|
1017
- | tiny | < 0.5% of image | Very small elements |
1018
- | small | 0.5% - 2% | Small elements |
1019
- | medium | 2% - 10% | Medium elements |
1020
- | large | β‰₯ 10% | Large elements |
1021
-
1022
- ##### Density Metrics
1023
- ```python
1024
- {
1025
- "average_nearest_neighbor_distance": 234.56, # pixels
1026
- "spatial_clustering_score": 0.67 # 0-1, higher = more clustered
1027
- }
1028
- ```
1029
-
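- A brute-force sketch of these two values, assuming distances are measured between detection centres and the clustering score is `1 - std/mean` of the nearest-neighbour distances (per the helpers table later in this README):
-
- ```python
- # Illustrative density metrics over detection centres (O(n^2) nearest-neighbour search).
- import math
-
- def density_metrics(centers):
-     if len(centers) < 2:
-         return {"average_nearest_neighbor_distance": 0.0, "spatial_clustering_score": 0.0}
-     nn_dists = [
-         min(math.hypot(xi - xj, yi - yj)
-             for j, (xj, yj) in enumerate(centers) if j != i)
-         for i, (xi, yi) in enumerate(centers)
-     ]
-     mean = sum(nn_dists) / len(nn_dists)
-     std = (sum((d - mean) ** 2 for d in nn_dists) / len(nn_dists)) ** 0.5
-     clustering = max(0.0, 1 - std / mean) if mean else 0.0
-     return {
-         "average_nearest_neighbor_distance": round(mean, 2),
-         "spatial_clustering_score": round(clustering, 2),
-     }
- ```
-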
1030
- ---
1031
-
1032
- ### Quality Metrics (utils/metrics/quality.py)
1033
-
1034
- #### `calculate_layout_complexity(detections, img_width, img_height)`
1035
-
1036
- Quantifies document structure complexity.
1037
-
1038
- **Complexity Score Formula:**
1039
- ```
1040
- score = (class_diversity / 20) Γ— 30 # Max 20 classes
1041
- + min(detections / 50, 1) Γ— 30 # Detection count
1042
- + min(density / 10, 1) Γ— 20 # Spatial density
1043
- + (1 - min(avg_dist / 500, 1)) Γ— 20 # Clustering
1044
- ```
1045
-
1046
- **Complexity Levels:**
1047
- | Level | Score Range | Description |
1048
- |-------|-------------|-------------|
1049
- | simple | < 30 | Basic document layout |
1050
- | moderate | 30 - 60 | Average complexity |
1051
- | complex | β‰₯ 60 | Complex multi-element layout |
1052
-
1053
- **Layout Characteristics:**
1054
- ```python
1055
- {
1056
- "is_dense": True, # density > 5 elements/megapixel
1057
- "is_diverse": True, # unique_classes β‰₯ 10
1058
- "is_structured": False # avg_distance < 200 pixels
1059
- }
1060
- ```
1061
-
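- Putting the formula, levels, and characteristic thresholds together, the scoring step of `calculate_layout_complexity` might look like the sketch below (illustrative; parameter names are assumptions):
-
- ```python
- # Illustrative scoring step; density is elements per megapixel, avg_distance in pixels.
- def layout_complexity(unique_classes, total_detections, density, avg_distance):
-     score = (
-         min(unique_classes / 20, 1) * 30         # class diversity (capped at 20)
-         + min(total_detections / 50, 1) * 30     # detection count
-         + min(density / 10, 1) * 20              # spatial density
-         + (1 - min(avg_distance / 500, 1)) * 20  # clustering
-     )
-     level = "simple" if score < 30 else "moderate" if score < 60 else "complex"
-     return {
-         "complexity_score": round(score, 2),
-         "complexity_level": level,
-         "layout_characteristics": {
-             "is_dense": density > 5,
-             "is_diverse": unique_classes >= 10,
-             "is_structured": avg_distance < 200,
-         },
-     }
- ```
-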
1062
- #### `calculate_quality_metrics(detections, img_width, img_height)`
1063
-
1064
- Detection quality assessment.
1065
-
1066
- ##### Overlap Analysis
1067
- ```python
1068
- {
1069
- "total_overlapping_pairs": 5, # Number of overlapping detection pairs
1070
- "overlap_percentage": 10.64, # % of detections with overlaps
1071
- "average_iou": 0.1234 # Mean IoU of overlapping pairs
1072
- }
1073
- ```
1074
-
1075
- ##### Size Consistency
1076
- ```python
1077
- {
1078
- "coefficient_of_variation": 0.876, # std/mean of areas
1079
- "consistency_level": "medium" # high (<0.5), medium (0.5-1), low (>1)
1080
- }
1081
- ```
1082
-
1083
- ##### Detection Quality Score
1084
- ```
1085
- score = (1 - min(overlap_% / 100, 1)) Γ— 50 + (1 - min(size_cv, 1)) Γ— 50
1086
- ```
1087
-
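- The score is a direct translation of the formula; in the sketch below the inputs are the overlap percentage and the size coefficient of variation from the two blocks above:
-
- ```python
- # Quality score: penalise overlapping boxes and inconsistent box sizes (max 50 each).
- def detection_quality_score(overlap_percentage, size_cv):
-     return (1 - min(overlap_percentage / 100, 1)) * 50 + (1 - min(size_cv, 1)) * 50
- ```
-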
1088
- ---
1089
-
1090
- ## πŸ“ˆ Visualization Engine
1091
-
1092
- ### services/visualization.py
1093
-
1094
- The visualization engine generates 8 distinct chart types, each providing unique insights into the detection results.
1095
-
1096
- ### Chart Types
1097
-
1098
- #### 1. Class Distribution Bar Chart
1099
- ```
1100
- Purpose: Show count of detections per class
1101
- Type: Vertical bar chart
1102
- Features:
1103
- - Sorted by count (descending)
1104
- - Value labels on bars
1105
- - Rotated x-axis labels for readability
1106
- - Grid lines for easy reading
1107
- ```
1108
-
1109
- #### 2. Confidence Distribution Histogram
1110
- ```
1111
- Purpose: Show distribution of confidence scores
1112
- Type: Histogram with 20 bins
1113
- Features:
1114
- - Mean line (red dashed)
1115
- - Median line (orange dashed)
1116
- - Legend with exact values
1117
- - Grid lines
1118
- ```
1119
-
1120
- #### 3. Spatial Distribution Heatmap
1121
- ```
1122
- Purpose: Visualize where detections are concentrated
1123
- Type: 2D histogram heatmap
1124
- Features:
1125
- - YlOrRd colormap (yellow to red)
1126
- - Colorbar showing density
1127
- - Axes showing pixel coordinates
1128
- ```
1129
-
1130
- #### 4. Confidence by Class Box Plot
1131
- ```
1132
- Purpose: Compare confidence distributions across classes
1133
- Type: Box plot
1134
- Features:
1135
- - Top 10 classes by count
1136
- - Sample sizes in labels
1137
- - Median, quartiles, outliers
1138
- - Light blue boxes
1139
- ```
1140
-
1141
- #### 5. Area vs Confidence Scatter Plot
1142
- ```
1143
- Purpose: Examine relationship between size and confidence
1144
- Type: Scatter plot
1145
- Features:
1146
- - Color-coded by confidence (viridis)
1147
- - Colorbar showing scale
1148
- - Grid for reading values
1149
- ```
1150
-
1151
- #### 6. Quadrant Distribution Pie Chart
1152
- ```
1153
- Purpose: Show spatial distribution by quadrant
1154
- Type: Pie chart
1155
- Features:
1156
- - 4 segments (Q1-Q4)
1157
- - Percentage labels
1158
- - Element counts in labels
1159
- - Distinct colors per quadrant
1160
- ```
1161
-
1162
- #### 7. Size Distribution Bar Chart
1163
- ```
1164
- Purpose: Show distribution of detection sizes
1165
- Type: Vertical bar chart
1166
- Features:
1167
- - 4 categories (tiny, small, medium, large)
1168
- - Distinct color per category
1169
- - Value labels on bars
1170
- ```
1171
-
1172
- #### 8. Top Classes by Average Confidence
1173
- ```
1174
- Purpose: Identify most confidently detected classes
1175
- Type: Horizontal bar chart
1176
- Features:
1177
- - Top 15 classes
1178
- - Sorted by confidence
1179
- - Value labels
1180
- - Coral color scheme
1181
- ```
1182
-
1183
- ### Technical Implementation
1184
-
1185
- ```python
1186
- def generate_comprehensive_visualizations(
1187
- detections: List[dict],
1188
- class_metrics: dict,
1189
- confidence_metrics: dict,
1190
- spatial_metrics: dict,
1191
- img_width: int,
1192
- img_height: int
1193
- ) -> dict:
1194
- """
1195
- Generate all visualization types.
1196
-
1197
- Returns:
1198
- Dictionary with base64-encoded PNG images
1199
- """
1200
- visualizations = {}
1201
-
1202
- # Each visualization wrapped in try-except for isolation
1203
- try:
1204
- fig, ax = plt.subplots(figsize=(12, 6))
1205
- # ... chart generation code ...
1206
- visualizations['chart_name'] = fig_to_base64(fig)
1207
- plt.close(fig) # Prevent memory leaks
1208
- except Exception as e:
1209
- print(f"Error generating chart: {e}")
1210
-
1211
- return visualizations
1212
- ```
1213
-
1214
- ### Base64 Encoding
1215
-
1216
- ```python
1217
- def fig_to_base64(fig) -> str:
1218
- """Convert matplotlib figure to base64 data URI."""
1219
- buffer = BytesIO()
1220
- fig.savefig(buffer, format='png', dpi=100, bbox_inches='tight')
1221
- buffer.seek(0)
1222
- image_base64 = base64.b64encode(buffer.read()).decode()
1223
- buffer.close()
1224
- return f"data:image/png;base64,{image_base64}"
1225
- ```
1226
-
1227
- ### Usage in HTML
1228
-
1229
- ```html
1230
- <img src="{{ visualizations.class_distribution }}" alt="Class Distribution">
1231
- ```
1232
-
1233
- ---
1234
-
1235
- ## πŸ”§ Services Layer
1236
-
1237
- ### services/detection.py
1238
-
1239
- Core detection logic and result processing.
1240
-
1241
- #### `process_detections(result, score_thr=0.3)`
1242
-
1243
- Converts raw model output to structured format.
1244
-
1245
- **Input:** Raw MMDetection result (list of arrays per class)
1246
-
1247
- **Output:** List of detection dictionaries
1248
-
1249
- ```python
1250
- [
1251
- {
1252
- "class_id": 0,
1253
- "class_name": "paragraph",
1254
- "bbox": {
1255
- "x1": 100.5, "y1": 200.3,
1256
- "x2": 500.8, "y2": 350.2,
1257
- "width": 400.3, "height": 149.9,
1258
- "center_x": 300.65, "center_y": 275.25
1259
- },
1260
- "confidence": 0.9234,
1261
- "area": 60005.0,
1262
- "aspect_ratio": 2.67
1263
- },
1264
- // ... more detections
1265
- ]
1266
- ```
1267
-
1268
- **Processing Steps:**
1269
- 1. Iterate through class results
1270
- 2. Filter by confidence threshold
1271
- 3. Extract coordinates and calculate derived values
1272
- 4. Sort by confidence (descending)
1273
-
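- A sketch of these steps for the per-class array format described above (`class_names` is assumed to come from `model.CLASSES`; the shipped `detection.py` may differ in detail):
-
- ```python
- # Illustrative process_detections for the classic MMDetection result format:
- # one Nx5 array of [x1, y1, x2, y2, score] per class.
- def process_detections(result, class_names, score_thr=0.3):
-     detections = []
-     for class_id, class_boxes in enumerate(result):
-         for x1, y1, x2, y2, score in class_boxes:
-             if score < score_thr:
-                 continue
-             w, h = float(x2 - x1), float(y2 - y1)
-             detections.append({
-                 "class_id": class_id,
-                 "class_name": class_names[class_id],
-                 "bbox": {"x1": float(x1), "y1": float(y1),
-                          "x2": float(x2), "y2": float(y2),
-                          "width": w, "height": h,
-                          "center_x": float(x1) + w / 2, "center_y": float(y1) + h / 2},
-                 "confidence": float(score),
-                 "area": w * h,
-                 "aspect_ratio": round(w / h, 2) if h else 0.0,
-             })
-     detections.sort(key=lambda d: d["confidence"], reverse=True)
-     return detections
- ```
-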
1274
- ---
1275
-
1276
- ### services/processing.py
1277
-
1278
- Result aggregation and persistence.
1279
-
1280
- #### `aggregate_results(...)`
1281
-
1282
- Assembles the complete response object.
1283
-
1284
- ```python
1285
- def aggregate_results(
1286
- detections: List[dict],
1287
- core_metrics: dict,
1288
- rodla_metrics: dict,
1289
- spatial_metrics: dict,
1290
- class_metrics: dict,
1291
- confidence_metrics: dict,
1292
- robustness_indicators: dict,
1293
- layout_complexity: dict,
1294
- quality_metrics: dict,
1295
- visualizations: dict,
1296
- interpretation: dict,
1297
- file_info: dict,
1298
- config: dict
1299
- ) -> dict:
1300
- """Combine all analysis results into final response."""
1301
- return {
1302
- "success": True,
1303
- "timestamp": datetime.now().isoformat(),
1304
- # ... all components ...
1305
- }
1306
- ```
1307
-
1308
- #### `save_results(results, filename, output_dir)`
1309
-
1310
- Persists results to disk.
1311
-
1312
- ```python
1313
- def save_results(results: dict, filename: str, output_dir: Path) -> Path:
1314
- """
1315
- Save results as JSON file.
1316
-
1317
- - Removes visualizations to reduce file size
1318
- - Converts numpy types to Python native
1319
- - Saves visualizations as separate PNG files
1320
- """
1321
- json_path = output_dir / f"rodla_results_{filename}.json"
1322
- # ... save logic ...
1323
- return json_path
1324
- ```
1325
-
1326
- ---
1327
-
1328
- ### services/interpretation.py
1329
-
1330
- Human-readable insight generation.
1331
-
1332
- #### `generate_comprehensive_interpretation(...)`
1333
-
1334
- Creates natural language analysis of results.
1335
-
1336
- **Output Sections:**
1337
-
1338
- | Section | Description |
1339
- |---------|-------------|
1340
- | `overview` | High-level summary paragraph |
1341
- | `top_elements` | Description of most common elements |
1342
- | `rodla_analysis` | Robustness assessment summary |
1343
- | `layout_complexity` | Complexity analysis text |
1344
- | `key_findings` | List of important observations |
1345
- | `perturbation_assessment` | Perturbation effect analysis |
1346
- | `recommendations` | Actionable suggestions |
1347
- | `confidence_summary` | Confidence level summary |
1348
-
1349
- **Example Output:**
1350
-
1351
- ```python
1352
- {
1353
- "overview": """Document Analysis Summary:
1354
- Detected 47 layout elements across 12 different classes.
1355
- The model achieved an average confidence of 78.2%, indicating
1356
- medium certainty in predictions. The detected elements cover
1357
- 68.5% of the document area.""",
1358
-
1359
- "key_findings": [
1360
- "βœ“ Excellent detection confidence - model is highly certain",
1361
- "βœ“ High document coverage - most of the page contains elements",
1362
- "β„Ή Complex document structure with diverse element types"
1363
- ],
1364
-
1365
- "recommendations": [
1366
- "No specific recommendations - detection quality is good"
1367
- ]
1368
- }
1369
- ```
1370
-
1371
- ---
1372
-
1373
- ## πŸ› οΈ Utilities Reference
1374
-
1375
- ### utils/helpers.py
1376
-
1377
- General-purpose helper functions.
1378
-
1379
- #### Mathematical Functions
1380
-
1381
- | Function | Purpose | Formula |
1382
- |----------|---------|---------|
1383
- | `calculate_skewness(data)` | Distribution asymmetry | `mean(((x - ΞΌ) / Οƒ)Β³)` |
1384
- | `calculate_entropy(values)` | Information content | `-Ξ£(p Γ— logβ‚‚(p))` |
1385
- | `calculate_avg_nn_distance(xs, ys)` | Average nearest neighbor | Mean of min distances |
1386
- | `calculate_clustering_score(xs, ys)` | Spatial clustering | `1 - (std / mean)` |
1387
- | `calculate_iou(bbox1, bbox2)` | Intersection over Union | `intersection / union` |
1388
-
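- The first two formulas transcribe directly into NumPy; the sketch below adds guards for degenerate input and is illustrative rather than the exact `helpers.py` code (`values` is treated as non-negative weights, e.g. confidence-bin counts, normalised to probabilities):
-
- ```python
- # Illustrative skewness and Shannon-entropy helpers matching the formulas above.
- import numpy as np
-
- def calculate_skewness(data):
-     data = np.asarray(data, dtype=float)
-     mu, sigma = data.mean(), data.std()
-     return float(np.mean(((data - mu) / sigma) ** 3)) if sigma else 0.0
-
- def calculate_entropy(values):
-     values = np.asarray(values, dtype=float)
-     total = values.sum()
-     if total <= 0:
-         return 0.0
-     p = values / total
-     p = p[p > 0]
-     return float(-(p * np.log2(p)).sum())
- ```
-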
1389
- #### Utility Functions
1390
-
1391
- ```python
1392
- def calculate_detection_overlaps(detections: List[dict]) -> dict:
1393
- """
1394
- Find all overlapping detection pairs.
1395
-
1396
- Returns:
1397
- {
1398
- 'count': int, # Number of overlapping pairs
1399
- 'percentage': float, # % of detections with overlaps
1400
- 'avg_iou': float # Mean IoU of overlaps
1401
- }
1402
- """
1403
- ```
1404
-
1405
- ---
1406
-
1407
- ### utils/serialization.py
1408
-
1409
- JSON conversion utilities.
1410
-
1411
- #### `convert_to_json_serializable(obj)`
1412
-
1413
- Recursively converts numpy types to Python native types.
1414
-
1415
- **Conversions:**
1416
- | NumPy Type | Python Type |
1417
- |------------|-------------|
1418
- | `np.integer` | `int` |
1419
- | `np.floating` | `float` |
1420
- | `np.ndarray` | `list` |
1421
- | `np.bool_` | `bool` |
1422
-
1423
- ```python
1424
- def convert_to_json_serializable(obj):
1425
- """
1426
- Recursively convert numpy types for JSON serialization.
1427
-
1428
- Handles:
1429
- - Dictionaries (recursive)
1430
- - Lists (recursive)
1431
- - NumPy scalars and arrays
1432
- - Native Python types (pass-through)
1433
- """
1434
- if isinstance(obj, dict):
1435
- return {k: convert_to_json_serializable(v) for k, v in obj.items()}
1436
- elif isinstance(obj, list):
1437
- return [convert_to_json_serializable(item) for item in obj]
1438
- elif isinstance(obj, np.integer):
1439
- return int(obj)
1440
- elif isinstance(obj, np.floating):
1441
- return float(obj)
1442
- elif isinstance(obj, np.ndarray):
1443
- return obj.tolist()
1444
- elif isinstance(obj, np.bool_):
1445
- return bool(obj)
1446
- return obj
1447
- ```
1448
-
1449
- ---
1450
-
1451
- ## ⚠️ Error Handling
1452
-
1453
- ### Exception Hierarchy
1454
-
1455
- ```
1456
- Exception
1457
- β”œβ”€β”€ HTTPException (FastAPI)
1458
- β”‚ β”œβ”€β”€ 400 Bad Request
1459
- β”‚ β”‚ └── Invalid file type
1460
- β”‚ └── 500 Internal Server Error
1461
- β”‚ β”œβ”€β”€ Model not loaded
1462
- β”‚ β”œβ”€β”€ Inference failed
1463
- β”‚ └── Processing error
1464
- └── Standard Exceptions
1465
- β”œβ”€β”€ FileNotFoundError
1466
- β”œβ”€β”€ ValueError
1467
- └── RuntimeError
1468
- ```
1469
-
1470
- ### Error Handling Strategy
1471
-
1472
- ```python
1473
- @app.post("/api/detect")
1474
- async def detect_objects(...):
1475
- tmp_path = None
1476
-
1477
- try:
1478
- # Main processing logic
1479
- ...
1480
-
1481
- except HTTPException:
1482
- # Re-raise HTTP exceptions unchanged
1483
- if tmp_path and os.path.exists(tmp_path):
1484
- os.unlink(tmp_path)
1485
- raise
1486
-
1487
- except Exception as e:
1488
- # Handle unexpected errors
1489
- if tmp_path and os.path.exists(tmp_path):
1490
- os.unlink(tmp_path)
1491
-
1492
- # Log full traceback
1493
- import traceback
1494
- traceback.print_exc()
1495
-
1496
- # Return structured error response
1497
- return JSONResponse(
1498
- {"success": False, "error": str(e)},
1499
- status_code=500
1500
- )
1501
- ```
1502
-
1503
- ### Visualization Error Isolation
1504
-
1505
- Each visualization is wrapped individually to prevent cascade failures:
1506
-
1507
- ```python
1508
- for viz_name, viz_func in visualization_functions.items():
1509
- try:
1510
- visualizations[viz_name] = viz_func()
1511
- except Exception as e:
1512
- print(f"Error generating {viz_name}: {e}")
1513
- visualizations[viz_name] = None
1514
- ```
1515
-
1516
- ### Resource Cleanup
1517
-
1518
- Temporary files are always cleaned up:
1519
-
1520
- ```python
1521
- finally:
1522
- if tmp_path and os.path.exists(tmp_path):
1523
- os.unlink(tmp_path)
1524
- ```
1525
-
1526
- ---
1527
-
1528
- ## ⚑ Performance Optimization
1529
-
1530
- ### GPU Memory Management
1531
-
1532
- ```python
1533
- # At startup - clear GPU cache
1534
- if torch.cuda.is_available():
1535
- torch.cuda.empty_cache()
1536
- gc.collect()
1537
-
1538
- # Monitor memory usage
1539
- print(f"GPU Memory: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GB")
1540
- ```
1541
-
1542
- ### Memory-Efficient Visualizations
1543
-
1544
- ```python
1545
- # Always close figures after encoding
1546
- fig, ax = plt.subplots()
1547
- # ... generate chart ...
1548
- base64_str = fig_to_base64(fig)
1549
- plt.close(fig) # IMPORTANT: Prevents memory leaks
1550
- ```
1551
-
1552
- ### Response Size Optimization
1553
-
1554
- ```python
1555
- # Remove large base64 images from saved JSON
1556
- json_results = {k: v for k, v in results.items() if k != "visualizations"}
1557
-
1558
- # Save visualizations as separate files
1559
- for viz_name, viz_data in visualizations.items():
1560
- save_visualization(viz_data, f"{filename}_{viz_name}.png")
1561
- ```
1562
-
1563
- ### Lazy Model Loading
1564
-
1565
- ```python
1566
- # Model loaded once at startup, reused for all requests
1567
- @app.on_event("startup")
1568
- async def startup_event():
1569
- global model
1570
- model = init_detector(config, weights, device)
1571
- ```
1572
-
1573
- ### Performance Benchmarks
1574
-
1575
- | Operation | Time (GPU) | Time (CPU) |
1576
- |-----------|------------|------------|
1577
- | Model loading | 10-15s | 20-30s |
1578
- | Single inference | 0.3-0.5s | 2-5s |
1579
- | Metrics calculation | 0.1-0.2s | 0.1-0.2s |
1580
- | Visualization generation | 1-2s | 1-2s |
1581
- | **Total per request** | **1.5-3s** | **4-8s** |
1582
-
1583
- ---
1584
-
1585
- ## πŸ”’ Security Considerations
1586
-
1587
- ### Current Security Status
1588
-
1589
- | Aspect | Status | Risk | Recommendation |
1590
- |--------|--------|------|----------------|
1591
- | Authentication | ❌ None | High | Add API key auth |
1592
- | CORS | ⚠️ Permissive | Medium | Restrict origins |
1593
- | Rate Limiting | ❌ None | Medium | Add throttling |
1594
- | Input Validation | ⚠️ Basic | Low | Add size limits |
1595
- | Path Handling | ⚠️ Hardcoded | Low | Use env vars |
1596
-
1597
- ### Recommended Security Enhancements
1598
-
1599
- #### API Key Authentication
1600
-
1601
- ```python
1602
- from fastapi import Security
1603
- from fastapi.security.api_key import APIKeyHeader
1604
-
1605
- API_KEY = os.environ.get("RODLA_API_KEY")
1606
- api_key_header = APIKeyHeader(name="X-API-Key")
1607
-
1608
- async def verify_api_key(api_key: str = Security(api_key_header)):
1609
- if api_key != API_KEY:
1610
- raise HTTPException(403, "Invalid API key")
1611
- return api_key
1612
-
1613
- @app.post("/api/detect")
1614
- async def detect_objects(
1615
- ...,
1616
- api_key: str = Depends(verify_api_key)
1617
- ):
1618
- ...
1619
- ```
1620
-
1621
- #### Rate Limiting
1622
-
1623
- ```python
1624
- from slowapi import Limiter
1625
- from slowapi.util import get_remote_address
1626
-
1627
- limiter = Limiter(key_func=get_remote_address)
1628
- app.state.limiter = limiter
1629
-
1630
- @app.post("/api/detect")
1631
- @limiter.limit("10/minute")
1632
- async def detect_objects(...):
1633
- ...
1634
- ```
1635
-
1636
- #### File Size Limits
1637
-
1638
- ```python
1639
- MAX_FILE_SIZE = 10 * 1024 * 1024 # 10MB
1640
-
1641
- @app.post("/api/detect")
1642
- async def detect_objects(file: UploadFile = File(...)):
1643
- content = await file.read()
1644
- if len(content) > MAX_FILE_SIZE:
1645
- raise HTTPException(413, "File too large")
1646
- ...
1647
- ```
1648
-
1649
- #### Restricted CORS
1650
-
1651
- ```python
1652
- app.add_middleware(
1653
- CORSMiddleware,
1654
- allow_origins=["https://yourdomain.com"],
1655
- allow_methods=["GET", "POST"],
1656
- allow_headers=["X-API-Key", "Content-Type"],
1657
- )
1658
- ```
1659
-
1660
- ---
1661
-
1662
- ## πŸ§ͺ Testing
1663
-
1664
- ### Test Structure
1665
-
1666
- ```
1667
- tests/
1668
- β”œβ”€β”€ __init__.py
1669
- β”œβ”€β”€ conftest.py # Pytest fixtures
1670
- β”œβ”€β”€ test_api/
1671
- β”‚ β”œβ”€β”€ test_routes.py # Endpoint tests
1672
- β”‚ └── test_schemas.py # Pydantic model tests
1673
- β”œβ”€β”€ test_services/
1674
- β”‚ β”œβ”€β”€ test_detection.py # Detection logic tests
1675
- β”‚ β”œβ”€β”€ test_processing.py # Processing tests
1676
- β”‚ └── test_visualization.py # Chart generation tests
1677
- β”œβ”€β”€ test_utils/
1678
- β”‚ β”œβ”€β”€ test_helpers.py # Helper function tests
1679
- β”‚ β”œβ”€β”€ test_metrics.py # Metrics calculation tests
1680
- β”‚ └── test_serialization.py # Serialization tests
1681
- └── test_integration/
1682
- └── test_full_pipeline.py # End-to-end tests
1683
- ```
1684
-
1685
- ### Running Tests
1686
-
1687
- ```bash
1688
- # Run all tests
1689
- pytest
1690
-
1691
- # Run with coverage
1692
- pytest --cov=. --cov-report=html
1693
-
1694
- # Run specific test file
1695
- pytest tests/test_utils/test_metrics.py
1696
-
1697
- # Run with verbose output
1698
- pytest -v
1699
-
1700
- # Run only fast tests (no model loading)
1701
- pytest -m "not slow"
1702
- ```
1703
-
1704
- ### Example Test Cases
1705
-
1706
- ```python
1707
- # tests/test_utils/test_helpers.py
1708
-
1709
- import pytest
1710
- import numpy as np
1711
- from utils.helpers import calculate_iou, calculate_skewness
1712
-
1713
- class TestCalculateIoU:
1714
- def test_complete_overlap(self):
1715
- bbox1 = {'x1': 0, 'y1': 0, 'x2': 100, 'y2': 100, 'width': 100, 'height': 100}
1716
- bbox2 = {'x1': 0, 'y1': 0, 'x2': 100, 'y2': 100, 'width': 100, 'height': 100}
1717
- assert calculate_iou(bbox1, bbox2) == 1.0
1718
-
1719
- def test_no_overlap(self):
1720
- bbox1 = {'x1': 0, 'y1': 0, 'x2': 50, 'y2': 50, 'width': 50, 'height': 50}
1721
- bbox2 = {'x1': 100, 'y1': 100, 'x2': 150, 'y2': 150, 'width': 50, 'height': 50}
1722
- assert calculate_iou(bbox1, bbox2) == 0.0
1723
-
1724
- def test_partial_overlap(self):
1725
- bbox1 = {'x1': 0, 'y1': 0, 'x2': 100, 'y2': 100, 'width': 100, 'height': 100}
1726
- bbox2 = {'x1': 50, 'y1': 50, 'x2': 150, 'y2': 150, 'width': 100, 'height': 100}
1727
- iou = calculate_iou(bbox1, bbox2)
1728
- assert 0 < iou < 1
1729
-
1730
- class TestCalculateSkewness:
1731
- def test_symmetric_distribution(self):
1732
- data = [1, 2, 3, 4, 5]
1733
- skew = calculate_skewness(data)
1734
- assert abs(skew) < 0.1 # Nearly symmetric
1735
-
1736
- def test_right_skewed(self):
1737
- data = [1, 1, 1, 1, 10]
1738
- skew = calculate_skewness(data)
1739
- assert skew > 0 # Positive skew
1740
- ```
1741
-
1742
- ### Mocking the Model
1743
-
1744
- ```python
1745
- # tests/conftest.py
1746
-
1747
- import pytest
1748
- from unittest.mock import Mock, patch
1749
-
1750
- @pytest.fixture
1751
- def mock_model():
1752
- """Create a mock detection model."""
1753
- model = Mock()
1754
- model.CLASSES = ['paragraph', 'title', 'figure', 'table']
1755
- return model
1756
-
1757
- @pytest.fixture
1758
- def mock_detections():
1759
- """Sample detection results."""
1760
- return [
1761
- {
1762
- 'class_id': 0,
1763
- 'class_name': 'paragraph',
1764
- 'bbox': {'x1': 100, 'y1': 100, 'x2': 500, 'y2': 300,
1765
- 'width': 400, 'height': 200, 'center_x': 300, 'center_y': 200},
1766
- 'confidence': 0.95,
1767
- 'area': 80000,
1768
- 'aspect_ratio': 2.0
1769
- }
1770
- ]
1771
- ```
1772
-
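- A short example of how these fixtures can be exercised; the test file name is illustrative:
-
- ```python
- # tests/test_services/test_processing.py (sketch)
- def test_mock_detections_have_valid_geometry(mock_detections):
-     for det in mock_detections:
-         bbox = det["bbox"]
-         assert bbox["x2"] > bbox["x1"] and bbox["y2"] > bbox["y1"]
-         assert det["area"] == bbox["width"] * bbox["height"]
-         assert 0.0 <= det["confidence"] <= 1.0
- ```
-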
1773
- ---
1774
-
1775
- ## 🚒 Deployment
1776
-
1777
- ### Development Server
1778
-
1779
- ```bash
1780
- python backend.py
1781
- # or
1782
- uvicorn backend:app --reload --host 0.0.0.0 --port 8000
1783
- ```
1784
-
1785
- ### Production with Gunicorn
1786
-
1787
- ```bash
1788
- gunicorn backend:app -w 1 -k uvicorn.workers.UvicornWorker \
1789
- --bind 0.0.0.0:8000 \
1790
- --timeout 120 \
1791
- --keep-alive 5
1792
- ```
1793
-
1794
- **Note:** Use a single worker (`-w 1`) for GPU models; each Gunicorn worker loads its own copy of the model, so additional workers multiply GPU memory usage.
1795
-
1796
- ### Docker Deployment
1797
-
1798
- ```dockerfile
1799
- # Dockerfile
1800
- FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
1801
-
1802
- # Install Python
1803
- RUN apt-get update && apt-get install -y python3 python3-pip
1804
-
1805
- # Set working directory
1806
- WORKDIR /app
1807
-
1808
- # Copy requirements first for caching
1809
- COPY requirements.txt .
1810
- RUN pip install --no-cache-dir -r requirements.txt
1811
-
1812
- # Copy application code
1813
- COPY . .
1814
-
1815
- # Create output directory
1816
- RUN mkdir -p outputs
1817
-
1818
- # Expose port
1819
- EXPOSE 8000
1820
-
1821
- # Run application
1822
- CMD ["uvicorn", "backend:app", "--host", "0.0.0.0", "--port", "8000"]
1823
- ```
1824
-
1825
- ```yaml
1826
- # docker-compose.yml
1827
- version: '3.8'
1828
-
1829
- services:
1830
- rodla-api:
1831
- build: .
1832
- ports:
1833
- - "8000:8000"
1834
- volumes:
1835
- - ./outputs:/app/outputs
1836
- - ./weights:/app/weights
1837
- deploy:
1838
- resources:
1839
- reservations:
1840
- devices:
1841
- - driver: nvidia
1842
- count: 1
1843
- capabilities: [gpu]
1844
- environment:
1845
- - RODLA_API_KEY=${RODLA_API_KEY}
1846
- restart: unless-stopped
1847
- ```
1848
-
1849
- ### Kubernetes Deployment
1850
-
1851
- ```yaml
1852
- # k8s/deployment.yaml
1853
- apiVersion: apps/v1
1854
- kind: Deployment
1855
- metadata:
1856
- name: rodla-api
1857
- spec:
1858
- replicas: 1
1859
- selector:
1860
- matchLabels:
1861
- app: rodla-api
1862
- template:
1863
- metadata:
1864
- labels:
1865
- app: rodla-api
1866
- spec:
1867
- containers:
1868
- - name: rodla-api
1869
- image: your-registry/rodla-api:latest
1870
- ports:
1871
- - containerPort: 8000
1872
- resources:
1873
- limits:
1874
- nvidia.com/gpu: 1
1875
- memory: "16Gi"
1876
- requests:
1877
- memory: "8Gi"
1878
- volumeMounts:
1879
- - name: outputs
1880
- mountPath: /app/outputs
1881
- volumes:
1882
- - name: outputs
1883
- persistentVolumeClaim:
1884
- claimName: rodla-outputs-pvc
1885
- ```
1886
-
1887
- ### Nginx Reverse Proxy
1888
-
1889
- ```nginx
1890
- # /etc/nginx/sites-available/rodla-api
1891
- upstream rodla_backend {
1892
- server 127.0.0.1:8000;
1893
- }
1894
-
1895
- server {
1896
- listen 80;
1897
- server_name api.yourdomain.com;
1898
-
1899
- client_max_body_size 50M;
1900
-
1901
- location / {
1902
- proxy_pass http://rodla_backend;
1903
- proxy_set_header Host $host;
1904
- proxy_set_header X-Real-IP $remote_addr;
1905
- proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
1906
- proxy_read_timeout 120s;
1907
- }
1908
- }
1909
- ```
1910
-
1911
- ---
1912
-
1913
- ## πŸ”§ Troubleshooting
1914
-
1915
- ### Common Issues
1916
-
1917
- #### Model Loading Failures
1918
-
1919
- **Symptom:** `RuntimeError: CUDA out of memory`
1920
-
1921
- **Solutions:**
1922
- ```bash
1923
- # Reset the GPU (requires root and that no processes are using it)
- sudo nvidia-smi --gpu-reset -i 0
-
- # Or, inside the running Python process:
- #   import torch; torch.cuda.empty_cache()
1929
-
1930
- # Check available GPU memory
1931
- nvidia-smi
1932
- ```
1933
-
1934
- **Symptom:** `ModuleNotFoundError: No module named 'mmdet'`
1935
-
1936
- **Solution:**
1937
- ```bash
1938
- pip install -U openmim
1939
- mim install mmengine mmcv mmdet
1940
- ```
1941
-
1942
- **Symptom:** `FileNotFoundError: Config file not found`
1943
-
1944
- **Solution:**
1945
- ```python
1946
- # Check paths in config/settings.py
1947
- from pathlib import Path
1948
- print(Path(MODEL_CONFIG).exists()) # Should be True
1949
- print(Path(MODEL_WEIGHTS).exists()) # Should be True
1950
- ```
1951
-
1952
- ---
1953
-
1954
- #### Inference Errors
1955
-
1956
- **Symptom:** `RuntimeError: Input type and weight type should be the same`
1957
-
1958
- **Solution:**
1959
- ```python
1960
- # Ensure model and input are on same device
1961
- model = model.to('cuda')
1962
- # or
1963
- model = model.to('cpu')
1964
- ```
1965
-
1966
- **Symptom:** `ValueError: could not broadcast input array`
1967
-
1968
- **Solution:**
1969
- ```python
1970
- # Check image dimensions
1971
- from PIL import Image
1972
- img = Image.open(image_path)
1973
- print(f"Image size: {img.size}") # Should be reasonable dimensions
1974
- ```
1975
-
1976
- ---
1977
-
1978
- #### Visualization Errors
1979
-
1980
- **Symptom:** `RuntimeError: main thread is not in main loop`
1981
-
1982
- **Solution:**
1983
- ```python
1984
- # Set matplotlib backend before importing pyplot
1985
- import matplotlib
1986
- matplotlib.use('Agg') # Non-interactive backend
1987
- import matplotlib.pyplot as plt
1988
- ```
1989
-
1990
- **Symptom:** Memory usage grows with each request
1991
-
1992
- **Solution:**
1993
- ```python
1994
- # Always close figures after use
1995
- fig, ax = plt.subplots()
1996
- # ... plotting code ...
1997
- plt.savefig(buffer, format='png')
1998
- plt.close(fig) # CRITICAL: Prevents memory leak
1999
- plt.close('all') # Nuclear option if needed
2000
- ```
2001
-
2002
- ---
2003
-
2004
- #### API Errors
2005
-
2006
- **Symptom:** `422 Unprocessable Entity`
2007
-
2008
- **Cause:** Invalid request format
2009
-
2010
- **Solution:**
2011
- ```bash
2012
- # Correct multipart form data format
2013
- curl -X POST "http://localhost:8000/api/detect" \
2014
- -H "accept: application/json" \
2015
- -F "file=@image.jpg;type=image/jpeg" \
2016
- -F "score_thr=0.3"
2017
- ```
2018
-
2019
- **Symptom:** `413 Request Entity Too Large`
2020
-
2021
- **Solution:**
2022
- ```python
2023
- # FastAPI/Starlette does not enforce an upload size limit by itself;
- # a 413 almost always comes from the reverse proxy in front of it.
- # Raise the proxy limit, e.g. in nginx:
- #   client_max_body_size 50M;
- # or enforce your own limit in application code (see the sketch below).
2030
- ```
2031
-
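- If you prefer to reject oversized uploads at the application level, here is a minimal sketch of a body-size guard; the 50 MB limit and the middleware name are illustrative:
-
- ```python
- from fastapi import FastAPI, Request
- from fastapi.responses import JSONResponse
-
- MAX_BODY_SIZE = 50 * 1024 * 1024  # 50 MB, illustrative
-
- app = FastAPI()
-
- @app.middleware("http")
- async def limit_body_size(request: Request, call_next):
-     # Reject early based on the declared Content-Length header
-     declared = request.headers.get("content-length")
-     if declared and int(declared) > MAX_BODY_SIZE:
-         return JSONResponse({"detail": "File too large"}, status_code=413)
-     return await call_next(request)
- ```
-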
2032
- ---
2033
-
2034
- ### Debugging Tips
2035
-
2036
- #### Enable Debug Logging
2037
-
2038
- ```python
2039
- import logging
2040
-
2041
- logging.basicConfig(level=logging.DEBUG)
2042
- logger = logging.getLogger(__name__)
2043
-
2044
- # In your code
2045
- logger.debug(f"Processing image: {filename}")
2046
- logger.debug(f"Detections found: {len(detections)}")
2047
- ```
2048
-
2049
- #### GPU Monitoring
2050
-
2051
- ```bash
2052
- # Real-time GPU monitoring
2053
- watch -n 1 nvidia-smi
2054
-
2055
- # Or use gpustat
2056
- pip install gpustat
2057
- gpustat -i 1
2058
- ```
2059
-
2060
- #### Memory Profiling
2061
-
2062
- ```python
2063
- # Install memory profiler
2064
- pip install memory_profiler
2065
-
2066
- # Use decorator
2067
- from memory_profiler import profile
2068
-
2069
- @profile
2070
- def detect_objects(...):
2071
- ...
2072
- ```
2073
-
2074
- #### Request Timing
2075
-
2076
- ```python
2077
- import time
2078
-
2079
- @app.post("/api/detect")
2080
- async def detect_objects(...):
2081
- start_time = time.time()
2082
-
2083
- # ... processing ...
2084
-
2085
- elapsed = time.time() - start_time
2086
- logger.info(f"Request completed in {elapsed:.2f}s")
2087
- ```
2088
-
2089
- ---
2090
-
2091
- ### Health Checks
2092
-
2093
- ```python
2094
- # Add health check endpoint
2095
- @app.get("/health")
2096
- async def health_check():
2097
- return {
2098
- "status": "healthy",
2099
- "model_loaded": model is not None,
2100
- "gpu_available": torch.cuda.is_available(),
2101
- "gpu_memory_used": f"{torch.cuda.memory_allocated(0) / 1024**3:.2f} GB"
2102
- if torch.cuda.is_available() else "N/A"
2103
- }
2104
- ```
2105
-
2106
- ---
2107
-
2108
- ## 🀝 Contributing
2109
-
2110
- ### Getting Started
2111
-
2112
- 1. Fork the repository
2113
- 2. Create a feature branch: `git checkout -b feature/amazing-feature`
2114
- 3. Make your changes
2115
- 4. Run tests: `pytest`
2116
- 5. Commit: `git commit -m 'Add amazing feature'`
2117
- 6. Push: `git push origin feature/amazing-feature`
2118
- 7. Open a Pull Request
2119
-
2120
- ### Code Style
2121
-
2122
- ```bash
2123
- # Install development dependencies
2124
- pip install black isort flake8 mypy
2125
-
2126
- # Format code
2127
- black .
2128
- isort .
2129
-
2130
- # Check style
2131
- flake8 .
2132
-
2133
- # Type checking
2134
- mypy .
2135
- ```
2136
-
2137
- ### Pre-commit Hooks
2138
-
2139
- ```yaml
2140
- # .pre-commit-config.yaml
2141
- repos:
2142
- - repo: https://github.com/psf/black
2143
- rev: 23.7.0
2144
- hooks:
2145
- - id: black
2146
- - repo: https://github.com/pycqa/isort
2147
- rev: 5.12.0
2148
- hooks:
2149
- - id: isort
2150
- - repo: https://github.com/pycqa/flake8
2151
- rev: 6.1.0
2152
- hooks:
2153
- - id: flake8
2154
- ```
2155
-
2156
- ```bash
2157
- pip install pre-commit
2158
- pre-commit install
2159
- ```
2160
-
2161
- ### Adding New Metrics
2162
-
2163
- 1. Create function in appropriate module under `utils/metrics/`
2164
- 2. Export from `utils/metrics/__init__.py`
2165
- 3. Call from `services/processing.py`
2166
- 4. Add to response schema in `api/schemas.py`
2167
- 5. Document in this README
2168
- 6. Add tests in `tests/test_utils/test_metrics.py` (a minimal sketch of steps 1 and 2 follows below)
2169
-
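- A minimal sketch of steps 1 and 2; the `density.py` module and `calculate_detection_density` function are hypothetical names, not part of the existing codebase:
-
- ```python
- # utils/metrics/density.py (hypothetical module for a new metric)
- from typing import Dict, List
-
- def calculate_detection_density(detections: List[Dict],
-                                 image_width: int,
-                                 image_height: int) -> float:
-     """Fraction of the page area covered by detected boxes (0..1)."""
-     page_area = float(image_width * image_height)
-     if page_area == 0 or not detections:
-         return 0.0
-     covered = sum(d["bbox"]["width"] * d["bbox"]["height"] for d in detections)
-     return min(covered / page_area, 1.0)
-
- # utils/metrics/__init__.py -- export it next to the existing metrics:
- # from .density import calculate_detection_density
- ```
-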
2170
- ### Adding New Visualizations
2171
-
2172
- 1. Add function in `services/visualization.py`
2173
- 2. Call from `generate_comprehensive_visualizations()`
2174
- 3. Handle errors with try-except
2175
- 4. Always close figures with `plt.close(fig)`
2176
- 5. Document chart type in this README (see the sketch below)
2177
-
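- A minimal sketch of steps 1, 3 and 4; the `create_confidence_histogram` helper and its base64 return value are illustrative:
-
- ```python
- # services/visualization.py (sketch of one additional chart helper)
- import base64
- from io import BytesIO
- from typing import Dict, List
-
- import matplotlib
- matplotlib.use("Agg")  # non-interactive backend, safe inside API workers
- import matplotlib.pyplot as plt
-
- def create_confidence_histogram(detections: List[Dict]) -> str:
-     """Return a base64-encoded PNG histogram of detection confidences."""
-     fig, ax = plt.subplots(figsize=(6, 4))
-     try:
-         ax.hist([d["confidence"] for d in detections], bins=20, range=(0, 1))
-         ax.set_xlabel("Confidence")
-         ax.set_ylabel("Count")
-         buffer = BytesIO()
-         fig.savefig(buffer, format="png", bbox_inches="tight")
-         return base64.b64encode(buffer.getvalue()).decode("utf-8")
-     finally:
-         plt.close(fig)  # always release the figure to avoid memory leaks
- ```
-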
2178
- ---
2179
-
2180
- ## πŸ“š Citation
2181
-
2182
- If you use this API or the RoDLA model in your research, please cite:
2183
-
2184
- ```bibtex
2185
- @inproceedings{rodla2024cvpr,
2186
- title={RoDLA: Benchmarking the Robustness of Document Layout Analysis Models},
2187
- author={Author Names},
2188
- booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision
2189
- and Pattern Recognition (CVPR)},
2190
- year={2024}
2191
- }
2192
- ```
2193
-
2194
- ### Related Publications
2195
-
2196
- ```bibtex
2197
- @inproceedings{internimage2023,
-   title={InternImage: Exploring Large-Scale Vision Foundation Models
-          with Deformable Convolutions},
-   author={Wang et al.},
-   booktitle={CVPR},
-   year={2023}
- }
-
- @inproceedings{dino2022,
-   title={DINO: DETR with Improved DeNoising Anchor Boxes
-          for End-to-End Object Detection},
-   author={Zhang et al.},
-   booktitle={ICLR},
-   year={2023}
- }
2212
- ```
2213
-
2214
- ---
2215
-
2216
- ## πŸ“„ License
2217
-
2218
- This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
2219
-
2220
- ```
2221
- MIT License
2222
-
2223
- Copyright (c) 2024 [Your Name]
2224
-
2225
- Permission is hereby granted, free of charge, to any person obtaining a copy
2226
- of this software and associated documentation files (the "Software"), to deal
2227
- in the Software without restriction, including without limitation the rights
2228
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
2229
- copies of the Software, and to permit persons to whom the Software is
2230
- furnished to do so, subject to the following conditions:
2231
-
2232
- The above copyright notice and this permission notice shall be included in all
2233
- copies or substantial portions of the Software.
2234
-
2235
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
2236
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
2237
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
2238
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
2239
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
2240
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
2241
- SOFTWARE.
2242
- ```
2243
-
2244
- ---
2245
-
2246
- ## πŸ“ž Support
2247
-
2248
- ### Getting Help
2249
-
2250
- - **Documentation:** This README
2251
- - **Issues:** [GitHub Issues](https://github.com/yourusername/rodla-api/issues)
2252
- - **Discussions:** [GitHub Discussions](https://github.com/yourusername/rodla-api/discussions)
2253
-
2254
- ### Reporting Bugs
2255
-
2256
- When reporting bugs, please include:
2257
-
2258
- 1. Operating system and version
2259
- 2. Python version
2260
- 3. GPU model and driver version
2261
- 4. Complete error traceback
2262
- 5. Minimal reproducible example
2263
- 6. Input image (if possible)
2264
-
2265
- ### Feature Requests
2266
-
2267
- We welcome feature requests! Please:
2268
-
2269
- 1. Check existing issues first
2270
- 2. Describe the use case
2271
- 3. Explain expected behavior
2272
- 4. Provide examples if possible
2273
-
2274
- ---
2275
-
2276
- ## πŸ™ Acknowledgments
2277
-
2278
- - **RoDLA Authors** - For the original model and research
2279
- - **MMDetection Team** - For the detection framework
2280
- - **InternImage Team** - For the backbone architecture
2281
- - **FastAPI** - For the excellent web framework
2282
- - **Open Source Community** - For countless contributions
2283
-
2284
- ---
2285
-
2286
- <div align="center">
2287
-
2288
- **Built with ❀️ for Document Analysis**
2289
-
2290
- [⬆ Back to Top](#rodla-document-layout-analysis-api)
2291
-
2292
- </div>
deployment/backend/README_Version_TWO.md DELETED
The diff for this file is too large to render. See raw diff
 
deployment/backend/README_Version_Three.md DELETED
The diff for this file is too large to render. See raw diff
 
deployment/backend/backend.py CHANGED
@@ -1,98 +1,666 @@
1
  """
2
- RoDLA Object Detection API - Refactored Main Backend
3
- Clean separation of concerns with modular components
4
- Now with Perturbation Support!
5
  """
6
- from fastapi import FastAPI
7
  from fastapi.middleware.cors import CORSMiddleware
 
8
  import uvicorn
9
- from pathlib import Path
10
 
11
- # Import configuration
12
- from config.settings import (
13
- API_TITLE, API_HOST, API_PORT,
14
- CORS_ORIGINS, CORS_METHODS, CORS_HEADERS,
15
- OUTPUT_DIR, PERTURBATION_OUTPUT_DIR # NEW
16
- )
17
 
18
- # Import core functionality
19
- from core.model_loader import load_model
20
 
21
- # Import API routes
22
- from api.routes import router
23
 
24
- # Initialize FastAPI app
25
- app = FastAPI(
26
- title=API_TITLE,
27
- description="RoDLA Document Layout Analysis API with comprehensive metrics and perturbation testing",
28
- version="2.1.0" # Bumped version for perturbation feature
29
- )
30
 
31
  # Add CORS middleware
32
  app.add_middleware(
33
  CORSMiddleware,
34
- allow_origins=CORS_ORIGINS,
35
  allow_credentials=True,
36
- allow_methods=CORS_METHODS,
37
- allow_headers=CORS_HEADERS,
38
  )
39
 
40
- # Include API routes
41
- app.include_router(router)
 
42
 
43
 
 
 
 
 
44
  @app.on_event("startup")
45
  async def startup_event():
46
- """Initialize model and create directories on startup"""
47
  try:
48
- print("="*60)
49
- print("Starting RoDLA Document Layout Analysis API")
50
- print("="*60)
51
-
52
- # Create output directories
53
- print("πŸ“ Creating output directories...")
54
- OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
55
- PERTURBATION_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
56
- print(f" βœ“ Main output: {OUTPUT_DIR}")
57
- print(f" βœ“ Perturbations: {PERTURBATION_OUTPUT_DIR}")
58
-
59
- # Load model
60
- print("\nπŸ”§ Loading RoDLA model...")
61
  load_model()
62
 
63
- print("\n" + "="*60)
64
- print("βœ… API Ready!")
65
- print("="*60)
66
- print(f"🌐 Main API: http://{API_HOST}:{API_PORT}")
67
- print(f"πŸ“š Docs: http://{API_HOST}:{API_PORT}/docs")
68
- print(f"πŸ“– ReDoc: http://{API_HOST}:{API_PORT}/redoc")
69
- print("\n🎯 Available Endpoints:")
70
- print(" β€’ GET /api/model-info - Model information")
71
- print(" β€’ POST /api/detect - Standard detection")
72
- print(" β€’ GET /api/perturbations/info - Perturbation info (NEW)")
73
- print(" β€’ POST /api/perturb - Apply perturbations (NEW)")
74
- print(" β€’ POST /api/detect-with-perturbation - Detect with perturbations (NEW)")
75
- print("="*60)
76
 
77
  except Exception as e:
78
- print(f"❌ Startup failed: {e}")
79
- import traceback
80
- traceback.print_exc()
81
- raise e
82
 
83
 
84
- @app.on_event("shutdown")
85
- async def shutdown_event():
86
- """Cleanup on shutdown"""
87
- print("\n" + "="*60)
88
- print("πŸ›‘ Shutting down RoDLA API...")
89
90
 
 
 
 
91
 
92
  if __name__ == "__main__":
93
  uvicorn.run(
94
  app,
95
- host=API_HOST,
96
- port=API_PORT,
97
  log_level="info"
98
- )
 
1
  """
2
+ RoDLA Backend - Production Version
3
+ Uses real InternImage-XL weights and all 12 perturbation types with 3 degree levels
4
+ MMDET disabled if MMCV extensions unavailable - perturbations always functional
5
  """
6
+
7
+ import os
8
+ import sys
9
+ import json
10
+ import base64
11
+ import traceback
12
+ from pathlib import Path
13
+ from typing import Dict, List, Any, Optional, Tuple
14
+ from io import BytesIO
15
+ from datetime import datetime
16
+
17
+ import numpy as np
18
+ from PIL import Image
19
+ import cv2
20
+
21
+ from fastapi import FastAPI, File, UploadFile, HTTPException
22
  from fastapi.middleware.cors import CORSMiddleware
23
+ from pydantic import BaseModel
24
  import uvicorn
 
25
 
26
+ # ============================================================================
27
+ # Configuration
28
+ # ============================================================================
 
 
 
29
 
30
+ class Config:
31
+ """Global configuration"""
32
+ API_PORT = 8000
33
+ REPO_ROOT = Path("/home/admin/CV/rodla-academic")
34
+ MODEL_CONFIG_PATH = REPO_ROOT / "model/configs/m6doc/rodla_internimage_xl_m6doc.py"
35
+ MODEL_WEIGHTS_PATH = REPO_ROOT / "finetuning_rodla/finetuning_rodla/checkpoints/rodla_internimage_xl_publaynet.pth"
36
+ PERTURBATIONS_DIR = REPO_ROOT / "deployment/backend/perturbations"
37
+
38
+ # Automatically use GPU if available, otherwise CPU
39
+ @staticmethod
40
+ def get_device():
41
+ import torch
42
+ if torch.cuda.is_available():
43
+ return "cuda:0"
44
+ else:
45
+ return "cpu"
46
 
 
 
47
 
48
+ # ============================================================================
49
+ # Global State
50
+ # ============================================================================
51
+
52
+ app = FastAPI(title="RoDLA Production Backend", version="3.0.0")
53
+
54
+ # Detect device
55
+ import torch
56
+ DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"
57
+
58
+ model_state = {
59
+ "loaded": False,
60
+ "model": None,
61
+ "error": None,
62
+ "model_type": "RoDLA InternImage-XL (MMDET)",
63
+ "device": DEVICE,
64
+ "mmdet_available": False
65
+ }
66
 
67
  # Add CORS middleware
68
  app.add_middleware(
69
  CORSMiddleware,
70
+ allow_origins=["*"],
71
  allow_credentials=True,
72
+ allow_methods=["*"],
73
+ allow_headers=["*"],
74
  )
75
 
76
+
77
+ # ============================================================================
78
+ # M6Doc Dataset Classes
79
+ # ============================================================================
80
+
81
+ LAYOUT_CLASS_MAP = {
82
+ i: "Text" for i in range(75)
83
+ }
84
+ # Simplified mapping to layout elements
85
+ for i in range(75):
86
+ if i in [1, 2, 3, 4, 5]:
87
+ LAYOUT_CLASS_MAP[i] = "Title"
88
+ elif i in [6, 7]:
89
+ LAYOUT_CLASS_MAP[i] = "List"
90
+ elif i in [8, 9]:
91
+ LAYOUT_CLASS_MAP[i] = "Figure"
92
+ elif i in [10, 11]:
93
+ LAYOUT_CLASS_MAP[i] = "Table"
94
+ elif i in [12, 13, 14]:
95
+ LAYOUT_CLASS_MAP[i] = "Header"
96
+
97
+
98
+ # ============================================================================
99
+ # Utility Functions
100
+ # ============================================================================
101
+
102
+ def encode_image_to_base64(image: np.ndarray) -> str:
103
+ """Convert numpy array to base64 string"""
104
+ if len(image.shape) == 3 and image.shape[2] == 3:
105
+ # Ensure uint8 dtype with values in [0, 255]
106
+ if isinstance(image.flat[0], np.uint8):
107
+ image_to_encode = image
108
+ else:
109
+ image_to_encode = (image * 255).astype(np.uint8)
110
+ else:
111
+ image_to_encode = image
112
+
113
+ # cv2.imencode expects BGR channel order; convert from RGB so colors are preserved
+ if len(image_to_encode.shape) == 3 and image_to_encode.shape[2] == 3:
+ image_to_encode = cv2.cvtColor(image_to_encode, cv2.COLOR_RGB2BGR)
+ _, buffer = cv2.imencode('.png', image_to_encode)
114
+ return base64.b64encode(buffer).decode('utf-8')
115
+
116
+
117
+ def heuristic_detect(image_np: np.ndarray) -> List[Dict]:
118
+ """Enhanced heuristic-based detection when MMDET is unavailable
119
+ Uses multiple edge detection methods and texture analysis"""
120
+ h, w = image_np.shape[:2]
121
+ detections = []
122
+
123
+ # Convert to grayscale for analysis
124
+ gray = cv2.cvtColor(image_np, cv2.COLOR_RGB2GRAY)
125
+
126
+ # Try multiple edge detection methods for better coverage
127
+ edges1 = cv2.Canny(gray, 50, 150)
128
+ edges2 = cv2.Canny(gray, 30, 100)
129
+
130
+ # Combine edges
131
+ edges = cv2.bitwise_or(edges1, edges2)
132
+
133
+ # Apply morphological operations to connect nearby edges
134
+ kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
135
+ edges = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
136
+
137
+ # Find contours
138
+ contours, _ = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
139
+
140
+ # Also try watershed/connected components for text detection
141
+ blur = cv2.GaussianBlur(gray, (5, 5), 0)
142
+ _, binary = cv2.threshold(blur, 127, 255, cv2.THRESH_BINARY)
143
+
144
+ # Find connected components
145
+ num_labels, labels = cv2.connectedComponents(binary)
146
+
147
+ # Process contours to create pseudo-detections
148
+ processed_boxes = set()
149
+ for contour in contours:
150
+ x, y, cw, ch = cv2.boundingRect(contour)
151
+
152
+ # Skip if too small or too large
153
+ if cw < 15 or ch < 15 or cw > w * 0.98 or ch > h * 0.98:
154
+ continue
155
+
156
+ area_ratio = (cw * ch) / (w * h)
157
+ if area_ratio < 0.0005 or area_ratio > 0.9:
158
+ continue
159
+
160
+ # Skip if box is too similar to already processed boxes
161
+ box_key = (round(x/10)*10, round(y/10)*10, round(cw/10)*10, round(ch/10)*10)
162
+ if box_key in processed_boxes:
163
+ continue
164
+ processed_boxes.add(box_key)
165
+
166
+ # Analyze content to determine class
167
+ roi = gray[y:y+ch, x:x+cw]
168
+ roi_blur = cv2.GaussianBlur(roi, (5, 5), 0)
169
+ roi_edges = cv2.Canny(roi_blur, 50, 150)
170
+ edge_density = np.sum(roi_edges > 0) / roi.size
171
+
172
+ aspect_ratio = cw / (ch + 1e-6)
173
+
174
+ # Classification logic
175
+ if aspect_ratio > 2.5 or (aspect_ratio > 2 and edge_density < 0.05):
176
+ # Wide with sparse edges = likely figure/table
177
+ class_name = "Figure"
178
+ class_id = 8
179
+ confidence = 0.6 + 0.35 * (1 - min(area_ratio / 0.5, 1.0))
180
+ elif aspect_ratio < 0.3:
181
+ # Narrow = likely list or table column
182
+ class_name = "List"
183
+ class_id = 6
184
+ confidence = 0.55 + 0.4 * (1 - min(area_ratio / 0.3, 1.0))
185
+ elif edge_density > 0.15:
186
+ # High edge density = likely table or complex content
187
+ class_name = "Table"
188
+ class_id = 10
189
+ confidence = 0.5 + 0.4 * edge_density
190
+ else:
191
+ # Default = text content
192
+ class_name = "Text"
193
+ class_id = 50
194
+ confidence = 0.5 + 0.4 * (1 - min(area_ratio / 0.3, 1.0))
195
+
196
+ # Ensure confidence in [0, 1]
197
+ confidence = min(max(confidence, 0.3), 0.95)
198
+
199
+ detections.append({
200
+ "class_id": class_id,
201
+ "class_name": class_name,
202
+ "confidence": float(confidence),
203
+ "bbox": {
204
+ "x": float(x / w),
205
+ "y": float(y / h),
206
+ "width": float(cw / w),
207
+ "height": float(ch / h)
208
+ },
209
+ "area": float(area_ratio)
210
+ })
211
+
212
+ # Sort by confidence and keep top 30
213
+ detections.sort(key=lambda x: x["confidence"], reverse=True)
214
+ return detections[:30]
215
+
216
+
217
+ # ============================================================================
218
+ # Model Loading
219
+ # ============================================================================
220
+
221
+ def load_model():
222
+ """Load the RoDLA model with actual weights"""
223
+ global model_state
224
+
225
+ print("\n" + "="*70)
226
+ print("πŸš€ Loading RoDLA InternImage-XL with Real Weights")
227
+ print("="*70)
228
+
229
+ # Verify weight file exists
230
+ if not Config.MODEL_WEIGHTS_PATH.exists():
231
+ error_msg = f"Weights not found: {Config.MODEL_WEIGHTS_PATH}"
232
+ print(f"❌ {error_msg}")
233
+ model_state["loaded"] = False
234
+ model_state["error"] = error_msg
235
+ return None
236
+
237
+ weights_size = Config.MODEL_WEIGHTS_PATH.stat().st_size / (1024**3)
238
+ print(f"βœ… Weights file: {Config.MODEL_WEIGHTS_PATH}")
239
+ print(f" Size: {weights_size:.2f}GB")
240
+
241
+ # Verify config exists
242
+ if not Config.MODEL_CONFIG_PATH.exists():
243
+ error_msg = f"Config not found: {Config.MODEL_CONFIG_PATH}"
244
+ print(f"❌ {error_msg}")
245
+ model_state["loaded"] = False
246
+ model_state["error"] = error_msg
247
+ return None
248
+
249
+ print(f"βœ… Config file: {Config.MODEL_CONFIG_PATH}")
250
+ print(f"πŸ“ Device: {model_state['device']}")
251
+
252
+ if model_state["device"] == "cpu":
253
+ print("⚠️ WARNING: DCNv3 (used in InternImage backbone) only supports CUDA")
254
+ print(" CPU inference is NOT available. Using heuristic fallback.")
255
+
256
+ # Try to import and load MMDET
257
+ try:
258
+ print("⏳ Setting up model environment...")
259
+ import torch
260
+
261
+ # Import and use DINO registration helper
262
+ from register_dino import try_load_with_dino_registration
263
+
264
+ print("⏳ Loading model from weights (this will take ~30-60 seconds)...")
265
+ print(" File: 3.8GB checkpoint...")
266
+
267
+ model = try_load_with_dino_registration(
268
+ str(Config.MODEL_CONFIG_PATH),
269
+ str(Config.MODEL_WEIGHTS_PATH),
270
+ device=model_state["device"]
271
+ )
272
+
273
+ if model is not None:
274
+ # Set model to evaluation mode
275
+ model.eval()
276
+
277
+ model_state["model"] = model
278
+ model_state["loaded"] = True
279
+ model_state["mmdet_available"] = True
280
+ model_state["error"] = None
281
+
282
+ print("βœ… RoDLA Model loaded successfully!")
283
+ print(" Model set to evaluation mode (eval())")
284
+ print(" Ready for inference with real 3.8GB weights")
285
+ print("="*70 + "\n")
286
+ return model
287
+ else:
288
+ raise Exception("Model loading returned None")
289
+
290
+ except Exception as e:
291
+ error_msg = f"Failed to load model: {str(e)}"
292
+ print(f"❌ {error_msg}")
293
+ print(f" Traceback: {traceback.format_exc()}")
294
+
295
+ model_state["loaded"] = False
296
+ model_state["mmdet_available"] = False
297
+ model_state["error"] = error_msg
298
+ print(" Backend will run in HYBRID mode:")
299
+ print(" - Detection: Enhanced heuristic-based (contour analysis)")
300
+ print(" - Perturbations: Real module with all 12 types")
301
+ print("="*70 + "\n")
302
+ return None
303
+
304
+
305
+ def run_inference(image_np: np.ndarray, threshold: float = 0.3) -> List[Dict]:
306
+ """Run detection on image (MMDET if available, else heuristic)"""
307
+
308
+ if model_state["mmdet_available"] and model_state["model"] is not None:
309
+ try:
310
+ import torch
311
+ from mmdet.apis import inference_detector
312
+
313
+ # Ensure model is in eval mode for inference
314
+ model = model_state["model"]
315
+ model.eval()
316
+
317
+ # Disable gradients for inference (saves memory and speeds up)
318
+ with torch.no_grad():
319
+ # Convert to BGR for inference
320
+ image_bgr = cv2.cvtColor(image_np, cv2.COLOR_RGB2BGR)
321
+ h, w = image_np.shape[:2]
322
+
323
+ # Run inference with loaded model
324
+ result = inference_detector(model, image_bgr)
325
+
326
+ detections = []
327
+
328
+ if result is not None:
329
+ # Handle different result formats
330
+ if hasattr(result, 'pred_instances'):
331
+ # Newer MMDET format
332
+ bboxes = result.pred_instances.bboxes.cpu().numpy()
333
+ scores = result.pred_instances.scores.cpu().numpy()
334
+ labels = result.pred_instances.labels.cpu().numpy()
335
+ elif isinstance(result, tuple) and len(result) > 0:
336
+ # Legacy format: (bbox_results, segm_results, ...)
337
+ bbox_results = result[0]
338
+ if isinstance(bbox_results, list):
339
+ # List of arrays per class
340
+ for class_id, class_bboxes in enumerate(bbox_results):
341
+ if class_bboxes.size == 0:
342
+ continue
343
+ for box in class_bboxes:
344
+ x1, y1, x2, y2, score = box
+ # Apply the score threshold in the legacy result format too
+ if score < threshold:
+ continue
345
+ bw = x2 - x1
346
+ bh = y2 - y1
347
+
348
+ class_name = LAYOUT_CLASS_MAP.get(class_id, f"Class_{class_id}")
349
+
350
+ detections.append({
351
+ "class_id": class_id,
352
+ "class_name": class_name,
353
+ "confidence": float(score),
354
+ "bbox": {
355
+ "x": float(x1 / w),
356
+ "y": float(y1 / h),
357
+ "width": float(bw / w),
358
+ "height": float(bh / h)
359
+ },
360
+ "area": float((bw * bh) / (w * h))
361
+ })
362
+ # Skip the pred_instances path for legacy format
363
+ detections.sort(key=lambda x: x["confidence"], reverse=True)
364
+ return detections[:100]
365
+
366
+ # Handle pred_instances format
367
+ if 'bboxes' in locals():
368
+ for bbox, score, label in zip(bboxes, scores, labels):
369
+ if score < threshold:
370
+ continue
371
+
372
+ x1, y1, x2, y2 = bbox
373
+ bw = x2 - x1
374
+ bh = y2 - y1
375
+
376
+ class_id = int(label)
377
+ class_name = LAYOUT_CLASS_MAP.get(class_id, f"Class_{class_id}")
378
+
379
+ detections.append({
380
+ "class_id": class_id,
381
+ "class_name": class_name,
382
+ "confidence": float(score),
383
+ "bbox": {
384
+ "x": float(x1 / w),
385
+ "y": float(y1 / h),
386
+ "width": float(bw / w),
387
+ "height": float(bh / h)
388
+ },
389
+ "area": float((bw * bh) / (w * h))
390
+ })
391
+
392
+ # Sort by confidence and limit results
393
+ detections.sort(key=lambda x: x["confidence"], reverse=True)
394
+ return detections[:100]
395
+
396
+ except Exception as e:
397
+ print(f"⚠️ MMDET inference failed: {e}")
398
+ print(f" Error details: {traceback.format_exc()}")
399
+ # Fall back to heuristic if inference fails
400
+ return heuristic_detect(image_np)
401
+ else:
402
+ # Use heuristic detection
403
+ return heuristic_detect(image_np)
404
 
405
 
406
+ # ============================================================================
407
+ # API Routes
408
+ # ============================================================================
409
+
410
  @app.on_event("startup")
411
  async def startup_event():
412
+ """Initialize model on startup"""
413
  try:
414
  load_model()
415
+ except Exception as e:
416
+ print(f"⚠️ Model loading failed: {e}")
417
+ model_state["loaded"] = False
418
+
419
+
420
+ @app.get("/api/health")
421
+ async def health_check():
422
+ """Health check endpoint"""
423
+ return {
424
+ "status": "ok",
425
+ "model_loaded": model_state["loaded"],
426
+ "mmdet_available": model_state["mmdet_available"],
427
+ "detection_mode": "MMDET" if model_state["mmdet_available"] else "Heuristic",
428
+ "device": model_state["device"],
429
+ "model_type": model_state["model_type"],
430
+ "weights_path": str(Config.MODEL_WEIGHTS_PATH),
431
+ "weights_exists": Config.MODEL_WEIGHTS_PATH.exists(),
432
+ "weights_size_gb": Config.MODEL_WEIGHTS_PATH.stat().st_size / (1024**3) if Config.MODEL_WEIGHTS_PATH.exists() else 0
433
+ }
434
+
435
+
436
+ @app.get("/api/model-info")
437
+ async def model_info():
438
+ """Get model information"""
439
+ return {
440
+ "name": "RoDLA InternImage-XL",
441
+ "version": "3.0.0",
442
+ "type": "Document Layout Analysis",
443
+ "mmdet_loaded": model_state["loaded"],
444
+ "mmdet_available": model_state["mmdet_available"],
445
+ "detection_mode": "MMDET (Real Model)" if model_state["mmdet_available"] else "Heuristic (Contour-based)",
446
+ "error": model_state["error"],
447
+ "device": model_state["device"],
448
+ "framework": "MMDET + PyTorch (or Heuristic Fallback)",
449
+ "backbone": "InternImage-XL with DCNv3",
450
+ "detector": "DINO",
451
+ "dataset": "M6Doc (75 classes)",
452
+ "weights_file": str(Config.MODEL_WEIGHTS_PATH),
453
+ "config_file": str(Config.MODEL_CONFIG_PATH),
454
+ "perturbations_available": True,
455
+ "supported_perturbations": [
456
+ "defocus", "vibration", "speckle", "texture",
457
+ "watermark", "background", "ink_holdout", "ink_bleeding",
458
+ "illumination", "rotation", "keystoning", "warping"
459
+ ]
460
+ }
461
+
462
+
463
+ @app.get("/api/perturbations/info")
464
+ async def perturbation_info():
465
+ """Get information about available perturbations"""
466
+ return {
467
+ "total_perturbations": 12,
468
+ "categories": {
469
+ "blur": {
470
+ "types": ["defocus", "vibration"],
471
+ "description": "Blur effects simulating optical issues"
472
+ },
473
+ "noise": {
474
+ "types": ["speckle", "texture"],
475
+ "description": "Noise patterns and texture artifacts"
476
+ },
477
+ "content": {
478
+ "types": ["watermark", "background"],
479
+ "description": "Content additions like watermarks and backgrounds"
480
+ },
481
+ "inconsistency": {
482
+ "types": ["ink_holdout", "ink_bleeding", "illumination"],
483
+ "description": "Print quality issues and lighting variations"
484
+ },
485
+ "spatial": {
486
+ "types": ["rotation", "keystoning", "warping"],
487
+ "description": "Geometric transformations"
488
+ }
489
+ },
490
+ "all_types": [
491
+ "defocus", "vibration", "speckle", "texture",
492
+ "watermark", "background", "ink_holdout", "ink_bleeding",
493
+ "illumination", "rotation", "keystoning", "warping"
494
+ ],
495
+ "degree_levels": {
496
+ 1: "Mild - Subtle effect",
497
+ 2: "Moderate - Noticeable effect",
498
+ 3: "Severe - Strong effect"
499
+ }
500
+ }
501
+
502
+
503
+ @app.post("/api/detect")
504
+ async def detect(file: UploadFile = File(...), threshold: float = 0.3):
505
+ """Detect document layout using RoDLA with real weights or heuristic fallback"""
506
+ start_time = datetime.now()
507
+
508
+ try:
509
+ # Load image
510
+ contents = await file.read()
511
+ image = Image.open(BytesIO(contents)).convert('RGB')
512
+ image_np = np.array(image)
513
+ h, w = image_np.shape[:2]
514
+
515
+ # Run inference
516
+ detections = run_inference(image_np, threshold=threshold)
517
 
518
+ # Build class distribution
519
+ class_distribution = {}
520
+ for det in detections:
521
+ cn = det["class_name"]
522
+ class_distribution[cn] = class_distribution.get(cn, 0) + 1
523
+
524
+ processing_time = (datetime.now() - start_time).total_seconds() * 1000
525
+
526
+ detection_mode = "Real MMDET Model (3.8GB weights)" if model_state["mmdet_available"] else "Heuristic Detection"
527
+
528
+ return {
529
+ "success": True,
530
+ "message": f"Detection completed using {detection_mode}",
531
+ "detection_mode": detection_mode,
532
+ "image_width": w,
533
+ "image_height": h,
534
+ "num_detections": len(detections),
535
+ "detections": detections,
536
+ "class_distribution": class_distribution,
537
+ "processing_time_ms": processing_time
538
+ }
539
 
540
  except Exception as e:
541
+ print(f"❌ Detection error: {e}\n{traceback.format_exc()}")
542
+ processing_time = (datetime.now() - start_time).total_seconds() * 1000
543
+
544
+ return {
545
+ "success": False,
546
+ "message": str(e),
547
+ "image_width": 0,
548
+ "image_height": 0,
549
+ "num_detections": 0,
550
+ "detections": [],
551
+ "class_distribution": {},
552
+ "processing_time_ms": processing_time
553
+ }
554
 
555
 
556
+ @app.post("/api/generate-perturbations")
557
+ async def generate_perturbations(file: UploadFile = File(...)):
558
+ """Generate all 12 perturbations with 3 degree levels each (36 total images)"""
559
+
560
+ try:
561
+ # Import simple perturbation functions (no external dependencies beyond common libs)
562
+ from perturbations_simple import apply_perturbation as simple_apply_perturbation
563
+
564
+ # Load image
565
+ contents = await file.read()
566
+ image = Image.open(BytesIO(contents)).convert('RGB')
567
+ image_np = np.array(image)
568
+ image_bgr = cv2.cvtColor(image_np, cv2.COLOR_RGB2BGR)
569
+
570
+ perturbations = {}
571
+
572
+ # Original
573
+ perturbations["original"] = {
574
+ "original": encode_image_to_base64(image_np)
575
+ }
576
+
577
+ # All 12 perturbation types
578
+ all_types = [
579
+ "defocus", "vibration", "speckle", "texture",
580
+ "watermark", "background", "ink_holdout", "ink_bleeding",
581
+ "illumination", "rotation", "keystoning", "warping"
582
+ ]
583
+
584
+ print(f"πŸ“Š Generating perturbations for {len(all_types)} types Γ— 3 degrees = 36 images...")
585
+
586
+ # Generate all perturbations with 3 degree levels
587
+ generated_count = 0
588
+ for ptype in all_types:
589
+ perturbations[ptype] = {}
590
+
591
+ for degree in [1, 2, 3]:
592
+ try:
593
+ # Use simple perturbation function (no external heavy dependencies)
594
+ result_image, success, message = simple_apply_perturbation(
595
+ image_bgr.copy(),
596
+ ptype,
597
+ degree=degree
598
+ )
599
+
600
+ if success:
601
+ # Convert BGR to RGB for display
602
+ if len(result_image.shape) == 3 and result_image.shape[2] == 3:
603
+ result_rgb = cv2.cvtColor(result_image, cv2.COLOR_BGR2RGB)
604
+ else:
605
+ result_rgb = result_image
606
+
607
+ perturbations[ptype][f"degree_{degree}"] = encode_image_to_base64(result_rgb)
608
+ generated_count += 1
609
+ print(f" βœ… {ptype:12} degree {degree}: {message}")
610
+ else:
611
+ print(f" ⚠️ {ptype:12} degree {degree}: {message}")
612
+ perturbations[ptype][f"degree_{degree}"] = encode_image_to_base64(image_np)
613
+
614
+ except Exception as e:
615
+ print(f" ⚠️ Exception {ptype:12} degree {degree}: {e}")
616
+ perturbations[ptype][f"degree_{degree}"] = encode_image_to_base64(image_np)
617
+
618
+ print(f"\nβœ… Generated {generated_count}/36 perturbation images successfully")
619
+
620
+ return {
621
+ "success": True,
622
+ "message": f"Perturbations generated: 12 types Γ— 3 degrees = 36 images + 1 original = 37 total",
623
+ "perturbations": perturbations,
624
+ "grid_info": {
625
+ "total_perturbations": 12,
626
+ "degree_levels": 3,
627
+ "total_images": 37,
628
+ "generated_count": generated_count
629
+ }
630
+ }
631
+
632
+ except ImportError as e:
633
+ print(f"❌ Import error: {e}\n{traceback.format_exc()}")
634
+ return {
635
+ "success": False,
636
+ "message": f"Perturbation module import error: {str(e)}",
637
+ "perturbations": {}
638
+ }
639
+ except Exception as e:
640
+ print(f"❌ Perturbation generation error: {e}\n{traceback.format_exc()}")
641
+ return {
642
+ "success": False,
643
+ "message": str(e),
644
+ "perturbations": {}
645
+ }
646
+
647
 
648
+ # ============================================================================
649
+ # Main
650
+ # ============================================================================
651
 
652
  if __name__ == "__main__":
653
+ print("\n" + "πŸ”·"*35)
654
+ print("πŸ”· RoDLA PRODUCTION BACKEND")
655
+ print("πŸ”· Model: InternImage-XL with DINO")
656
+ print("πŸ”· Weights: 3.8GB (rodla_internimage_xl_publaynet.pth)")
657
+ print("πŸ”· Perturbations: 12 types Γ— 3 degrees each")
658
+ print("πŸ”· Detection: MMDET (if available) or Heuristic fallback")
659
+ print("πŸ”·"*35)
660
+
661
  uvicorn.run(
662
  app,
663
+ host="0.0.0.0",
664
+ port=Config.API_PORT,
665
  log_level="info"
666
+ )
deployment/backend/backend_adaptive.py DELETED
@@ -1,500 +0,0 @@
1
- """
2
- RoDLA Object Detection API - Adaptive Backend
3
- Attempts to use real model if available, falls back to enhanced simulation
4
- """
5
- from fastapi import FastAPI, File, UploadFile, HTTPException, Form
6
- from fastapi.middleware.cors import CORSMiddleware
7
- from fastapi.responses import JSONResponse
8
- import uvicorn
9
- from pathlib import Path
10
- import json
11
- import base64
12
- import cv2
13
- import numpy as np
14
- from io import BytesIO
15
- from PIL import Image, ImageDraw, ImageFont
16
- import asyncio
17
- import sys
18
-
19
- # Try to import ML frameworks
20
- try:
21
- import torch
22
- from mmdet.apis import init_detector, inference_detector
23
- HAS_MMDET = True
24
- print("βœ“ PyTorch/MMDET available - Using REAL model")
25
- except ImportError:
26
- HAS_MMDET = False
27
- print("⚠ PyTorch/MMDET not available - Using enhanced simulation")
28
-
29
- # Add paths for config access
30
- sys.path.insert(0, '/home/admin/CV/rodla-academic')
31
- sys.path.insert(0, '/home/admin/CV/rodla-academic/model')
32
-
33
- # Try to import settings
34
- try:
35
- from deployment.backend.config.settings import (
36
- MODEL_CONFIG_PATH, MODEL_WEIGHTS_PATH,
37
- API_HOST, API_PORT, CORS_ORIGINS, CORS_METHODS, CORS_HEADERS
38
- )
39
- print(f"βœ“ Config loaded from: {MODEL_CONFIG_PATH}")
40
- except Exception as e:
41
- print(f"⚠ Could not load config: {e}")
42
- API_HOST = "0.0.0.0"
43
- API_PORT = 8000
44
- CORS_ORIGINS = ["*"]
45
- CORS_METHODS = ["*"]
46
- CORS_HEADERS = ["*"]
47
-
48
- # Initialize FastAPI app
49
- app = FastAPI(
50
- title="RoDLA Object Detection API (Adaptive)",
51
- description="RoDLA Document Layout Analysis API - Real or Simulated Backend",
52
- version="2.1.0"
53
- )
54
-
55
- # Add CORS middleware
56
- app.add_middleware(
57
- CORSMiddleware,
58
- allow_origins=CORS_ORIGINS,
59
- allow_credentials=True,
60
- allow_methods=CORS_METHODS,
61
- allow_headers=CORS_HEADERS,
62
- )
63
-
64
- # Configuration
65
- OUTPUT_DIR = Path("outputs")
66
- OUTPUT_DIR.mkdir(exist_ok=True)
67
-
68
- # Model classes (from DINO detection)
69
- MODEL_CLASSES = [
70
- 'Title', 'Abstract', 'Introduction', 'Related Work', 'Methodology',
71
- 'Experiments', 'Results', 'Discussion', 'Conclusion', 'References',
72
- 'Text', 'Figure', 'Table', 'Header', 'Footer', 'Page Number',
73
- 'Caption', 'Section', 'Subsection', 'Equation', 'Chart', 'List'
74
- ]
75
-
76
- # Global model instance
77
- _model = None
78
- backend_mode = "SIMULATED" # Will change if model loads
79
-
80
- # ============================================
81
- # MODEL LOADING
82
- # ============================================
83
-
84
- def load_real_model():
85
- """Try to load the actual RoDLA model"""
86
- global _model, backend_mode
87
-
88
- if not HAS_MMDET:
89
- return False
90
-
91
- try:
92
- print("\nπŸ”„ Attempting to load real RoDLA model...")
93
-
94
- # Check if files exist
95
- if not Path(MODEL_CONFIG_PATH).exists():
96
- print(f"❌ Config not found: {MODEL_CONFIG_PATH}")
97
- return False
98
-
99
- if not Path(MODEL_WEIGHTS_PATH).exists():
100
- print(f"❌ Weights not found: {MODEL_WEIGHTS_PATH}")
101
- return False
102
-
103
- # Load model
104
- device = "cuda:0" if torch.cuda.is_available() else "cpu"
105
- print(f"Using device: {device}")
106
-
107
- _model = init_detector(
108
- str(MODEL_CONFIG_PATH),
109
- str(MODEL_WEIGHTS_PATH),
110
- device=device
111
- )
112
-
113
- backend_mode = "REAL"
114
- print("βœ… Real RoDLA model loaded successfully!")
115
- return True
116
-
117
- except Exception as e:
118
- print(f"❌ Failed to load real model: {e}")
119
- print("Falling back to enhanced simulation...")
120
- return False
121
-
122
- def predict_with_model(image_array, score_threshold=0.3):
123
- """Run inference with actual model"""
124
- try:
125
- if _model is None or backend_mode != "REAL":
126
- return None
127
-
128
- result = inference_detector(_model, image_array)
129
- return result
130
- except Exception as e:
131
- print(f"Model inference error: {e}")
132
- return None
133
-
134
- # ============================================
135
- # ENHANCED SIMULATION
136
- # ============================================
137
-
138
- class EnhancedDetector:
139
- """Enhanced simulation that respects document layout"""
140
-
141
- def __init__(self):
142
- self.regions = []
143
-
144
- def analyze_layout(self, image_array):
145
- """Analyze document layout to place detections intelligently"""
146
- h, w = image_array.shape[:2]
147
-
148
- # Common document layout regions
149
- layouts = {
150
- 'title': (0.05*w, 0.02*h, 0.95*w, 0.08*h),
151
- 'abstract': (0.05*w, 0.09*h, 0.95*w, 0.2*h),
152
- 'introduction': (0.05*w, 0.21*h, 0.95*w, 0.35*h),
153
- 'figure': (0.1*w, 0.36*h, 0.5*w, 0.65*h),
154
- 'table': (0.55*w, 0.36*h, 0.95*w, 0.65*h),
155
- 'references': (0.05*w, 0.7*h, 0.95*w, 0.98*h),
156
- }
157
- return layouts
158
-
159
- def generate_detections(self, image_array, num_detections=None):
160
- """Generate contextual detections"""
161
- if num_detections is None:
162
- num_detections = np.random.randint(10, 25)
163
-
164
- h, w = image_array.shape[:2]
165
- layouts = self.analyze_layout(image_array)
166
- detections = []
167
-
168
- # Grid-based detection for realistic distribution
169
- grid_w, grid_h = np.random.randint(2, 4), np.random.randint(3, 6)
170
- cell_w, cell_h = w // grid_w, h // grid_h
171
-
172
- for i in range(num_detections):
173
- # Pick random grid cell
174
- grid_x = np.random.randint(0, grid_w)
175
- grid_y = np.random.randint(0, grid_h)
176
-
177
- # Add some variation within cell
178
- margin = 0.1
179
- x_min = int(grid_x * cell_w + margin * cell_w)
180
- x_max = int((grid_x + 1) * cell_w - margin * cell_w)
181
- y_min = int(grid_y * cell_h + margin * cell_h)
182
- y_max = int((grid_y + 1) * cell_h - margin * cell_h)
183
-
184
- if x_max <= x_min or y_max <= y_min:
185
- continue
186
-
187
- x1 = np.random.randint(x_min, x_max)
188
- y1 = np.random.randint(y_min, y_max)
189
- x2 = x1 + np.random.randint(50, min(200, x_max - x1))
190
- y2 = y1 + np.random.randint(30, min(150, y_max - y1))
191
-
192
- # Prefer certain classes in certain regions
193
- if y1 < h * 0.1:
194
- class_name = np.random.choice(['Title', 'Abstract', 'Header'])
195
- elif y1 > h * 0.85:
196
- class_name = np.random.choice(['Footer', 'References', 'Page Number'])
197
- elif (x1 < w * 0.15 or x2 > w * 0.85):
198
- class_name = np.random.choice(['Figure', 'Table', 'List'])
199
- else:
200
- class_name = np.random.choice(MODEL_CLASSES)
201
-
202
- detection = {
203
- 'class': class_name,
204
- 'confidence': float(np.random.uniform(0.6, 0.98)),
205
- 'box': {
206
- 'x1': int(max(0, x1)),
207
- 'y1': int(max(0, y1)),
208
- 'x2': int(min(w, x2)),
209
- 'y2': int(min(h, y2))
210
- }
211
- }
212
- detections.append(detection)
213
-
214
- return detections
215
-
216
- detector = EnhancedDetector()
217
-
218
- # ============================================
219
- # HELPER FUNCTIONS
220
- # ============================================
221
-
222
- def generate_detections(image_shape, num_detections=None):
223
- """Generate detections"""
224
- return detector.generate_detections(np.zeros(image_shape), num_detections)
225
-
226
- def create_annotated_image(image_array, detections):
227
- """Create annotated image with bounding boxes"""
228
- img = Image.fromarray(image_array.astype('uint8'))
229
- draw = ImageDraw.Draw(img)
230
-
231
- box_color = (0, 255, 0) # Lime green
232
- text_color = (0, 255, 255) # Cyan
233
-
234
- for detection in detections:
235
- box = detection['box']
236
- x1, y1, x2, y2 = box['x1'], box['y1'], box['x2'], box['y2']
237
- conf = detection['confidence']
238
- class_name = detection['class']
239
-
240
- draw.rectangle([x1, y1, x2, y2], outline=box_color, width=2)
241
- label_text = f"{class_name} {conf*100:.0f}%"
242
- draw.text((x1, y1-15), label_text, fill=text_color)
243
-
244
- return np.array(img)
245
-
246
- def apply_perturbation(image_array, perturbation_type):
247
- """Apply perturbation to image"""
248
- result = image_array.copy()
249
-
250
- if perturbation_type == 'blur':
251
- result = cv2.GaussianBlur(result, (15, 15), 0)
252
-
253
- elif perturbation_type == 'noise':
254
- noise = np.random.normal(0, 25, result.shape)
255
- result = np.clip(result.astype(float) + noise, 0, 255).astype(np.uint8)
256
-
257
- elif perturbation_type == 'rotation':
258
- h, w = result.shape[:2]
259
- center = (w // 2, h // 2)
260
- angle = np.random.uniform(-15, 15)
261
- M = cv2.getRotationMatrix2D(center, angle, 1.0)
262
- result = cv2.warpAffine(result, M, (w, h))
263
-
264
- elif perturbation_type == 'scaling':
265
- scale = np.random.uniform(0.8, 1.2)
266
- h, w = result.shape[:2]
267
- new_h, new_w = int(h * scale), int(w * scale)
268
- result = cv2.resize(result, (new_w, new_h))
269
- if new_h > h or new_w > w:
270
- result = result[:h, :w]
271
- else:
272
- pad_h = h - new_h
273
- pad_w = w - new_w
274
- result = cv2.copyMakeBorder(result, pad_h//2, pad_h-pad_h//2,
275
- pad_w//2, pad_w-pad_w//2, cv2.BORDER_CONSTANT)
276
-
277
- elif perturbation_type == 'perspective':
278
- h, w = result.shape[:2]
279
- pts1 = np.float32([[0, 0], [w, 0], [0, h], [w, h]])
280
- pts2 = np.float32([
281
- [np.random.randint(0, 30), np.random.randint(0, 30)],
282
- [w - np.random.randint(0, 30), np.random.randint(0, 30)],
283
- [np.random.randint(0, 30), h - np.random.randint(0, 30)],
284
- [w - np.random.randint(0, 30), h - np.random.randint(0, 30)]
285
- ])
286
- M = cv2.getPerspectiveTransform(pts1, pts2)
287
- result = cv2.warpPerspective(result, M, (w, h))
288
-
289
- return result
290
-
291
- def image_to_base64(image_array):
292
- """Convert image array to base64 string"""
293
- img = Image.fromarray(image_array.astype('uint8'))
294
- buffer = BytesIO()
295
- img.save(buffer, format='PNG')
296
- return base64.b64encode(buffer.getvalue()).decode()
297
-
298
- # ============================================
299
- # API ENDPOINTS
300
- # ============================================
301
-
302
- @app.on_event("startup")
303
- async def startup_event():
304
- """Initialize on startup"""
305
- print("="*60)
306
- print("Starting RoDLA Document Layout Analysis API (Adaptive)")
307
- print("="*60)
308
-
309
- # Try to load real model
310
- load_real_model()
311
-
312
- print(f"\nπŸ“Š Backend Mode: {backend_mode}")
313
- print(f"🌐 Main API: http://{API_HOST}:{API_PORT}")
314
- print(f"πŸ“š Docs: http://localhost:{API_PORT}/docs")
315
- print(f"πŸ“– ReDoc: http://localhost:{API_PORT}/redoc")
316
- print("\n🎯 Available Endpoints:")
317
- print(" β€’ GET /api/health - Health check")
318
- print(" β€’ GET /api/model-info - Model information")
319
- print(" β€’ POST /api/detect - Standard detection")
320
- print(" β€’ GET /api/perturbations/info - Perturbation info")
321
- print(" β€’ POST /api/generate-perturbations - Generate perturbations")
322
- print(" β€’ POST /api/detect-with-perturbation - Detect with perturbations")
323
- print("="*60)
324
- print("βœ… API Ready!\n")
325
-
326
-
327
- @app.get("/api/health")
328
- async def health_check():
329
- """Health check endpoint"""
330
- return JSONResponse({
331
- "status": "healthy",
332
- "mode": backend_mode,
333
- "has_model": backend_mode == "REAL"
334
- })
335
-
336
-
337
- @app.get("/api/model-info")
338
- async def model_info():
339
- """Get model information"""
340
- return JSONResponse({
341
- "model_name": "RoDLA InternImage-XL",
342
- "paper": "RoDLA: Benchmarking the Robustness of Document Layout Analysis Models (CVPR 2024)",
343
- "backbone": "InternImage-XL",
344
- "detection_framework": "DINO with Channel Attention + Average Pooling",
345
- "dataset": "M6Doc-P",
346
- "max_detections_per_image": 300,
347
- "backend_mode": backend_mode,
348
- "state_of_the_art_performance": {
349
- "clean_mAP": 70.0,
350
- "perturbed_avg_mAP": 61.7,
351
- "mRD_score": 147.6
352
- }
353
- })
354
-
355
-
356
- @app.post("/api/detect")
357
- async def detect(file: UploadFile = File(...), score_threshold: float = Form(0.3)):
358
- """Standard detection endpoint"""
359
- try:
360
- contents = await file.read()
361
- image = Image.open(BytesIO(contents)).convert('RGB')
362
- image_array = np.array(image)
363
-
364
- detections = generate_detections(image_array.shape)
365
- detections = [d for d in detections if d['confidence'] >= score_threshold]
366
-
367
- annotated = create_annotated_image(image_array, detections)
368
- annotated_b64 = image_to_base64(annotated)
369
-
370
- class_dist = {}
371
- for det in detections:
372
- cls = det['class']
373
- class_dist[cls] = class_dist.get(cls, 0) + 1
374
-
375
- return JSONResponse({
376
- "detections": detections,
377
- "class_distribution": class_dist,
378
- "annotated_image": annotated_b64,
379
- "metrics": {
380
- "total_detections": len(detections),
381
- "average_confidence": float(np.mean([d['confidence'] for d in detections]) if detections else 0),
382
- "max_confidence": float(max([d['confidence'] for d in detections]) if detections else 0),
383
- "min_confidence": float(min([d['confidence'] for d in detections]) if detections else 0),
384
- "backend_mode": backend_mode
385
- }
386
- })
387
-
388
- except Exception as e:
389
- raise HTTPException(status_code=400, detail=str(e))
390
-
391
-
392
- @app.get("/api/perturbations/info")
393
- async def perturbations_info():
394
- """Get available perturbation types"""
395
- return JSONResponse({
396
- "available_perturbations": [
397
- "blur",
398
- "noise",
399
- "rotation",
400
- "scaling",
401
- "perspective"
402
- ],
403
- "description": "Various document perturbations for robustness testing"
404
- })
405
-
406
-
407
- @app.post("/api/generate-perturbations")
408
- async def generate_perturbations(
409
- file: UploadFile = File(...),
410
- perturbation_types: str = Form("blur,noise")
411
- ):
412
- """Generate and return perturbations"""
413
- try:
414
- contents = await file.read()
415
- image = Image.open(BytesIO(contents)).convert('RGB')
416
- image_array = np.array(image)
417
-
418
- pert_types = [p.strip() for p in perturbation_types.split(',')]
419
-
420
- results = {
421
- "original": image_to_base64(image_array),
422
- "perturbations": {}
423
- }
424
-
425
- for pert_type in pert_types:
426
- if pert_type:
427
- perturbed = apply_perturbation(image_array, pert_type)
428
- results["perturbations"][pert_type] = image_to_base64(perturbed)
429
-
430
- return JSONResponse(results)
431
-
432
- except Exception as e:
433
- raise HTTPException(status_code=400, detail=str(e))
434
-
435
-
436
- @app.post("/api/detect-with-perturbation")
437
- async def detect_with_perturbation(
438
- file: UploadFile = File(...),
439
- score_threshold: float = Form(0.3),
440
- perturbation_types: str = Form("blur,noise")
441
- ):
442
- """Detect with perturbations"""
443
- try:
444
- contents = await file.read()
445
- image = Image.open(BytesIO(contents)).convert('RGB')
446
- image_array = np.array(image)
447
-
448
- pert_types = [p.strip() for p in perturbation_types.split(',')]
449
-
450
- results = {
451
- "clean": {},
452
- "perturbed": {}
453
- }
454
-
455
- # Clean detection
456
- clean_dets = generate_detections(image_array.shape)
457
- clean_dets = [d for d in clean_dets if d['confidence'] >= score_threshold]
458
- clean_img = create_annotated_image(image_array, clean_dets)
459
-
460
- results["clean"]["detections"] = clean_dets
461
- results["clean"]["annotated_image"] = image_to_base64(clean_img)
462
-
463
- # Perturbed detections
464
- for pert_type in pert_types:
465
- if pert_type:
466
- perturbed_img = apply_perturbation(image_array, pert_type)
467
- pert_dets = generate_detections(perturbed_img.shape)
468
- pert_dets = [
469
- {**d, 'confidence': max(0, d['confidence'] - np.random.uniform(0, 0.1))}
470
- for d in pert_dets
471
- ]
472
- pert_dets = [d for d in pert_dets if d['confidence'] >= score_threshold]
473
- annotated_pert = create_annotated_image(perturbed_img, pert_dets)
474
-
475
- results["perturbed"][pert_type] = {
476
- "detections": pert_dets,
477
- "annotated_image": image_to_base64(annotated_pert)
478
- }
479
-
480
- return JSONResponse(results)
481
-
482
- except Exception as e:
483
- raise HTTPException(status_code=400, detail=str(e))
484
-
485
-
486
- @app.on_event("shutdown")
487
- async def shutdown_event():
488
- """Cleanup on shutdown"""
489
- print("\n" + "="*60)
490
- print("πŸ›‘ Shutting down RoDLA API...")
491
- print("="*60)
492
-
493
-
494
- if __name__ == "__main__":
495
- uvicorn.run(
496
- app,
497
- host=API_HOST,
498
- port=API_PORT,
499
- log_level="info"
500
- )
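
For reference, a minimal client sketch for the `/api/detect` endpoint defined above (not part of the original file; it assumes the service is running locally on the default port 8000 and uses the `file` and `score_threshold` fields from the handler signature):

```python
# Minimal client sketch for the adaptive backend's /api/detect endpoint.
# Assumes the API is served locally on port 8000; the sample image path is hypothetical.
import base64
import requests

with open("sample_page.png", "rb") as f:
    response = requests.post(
        "http://localhost:8000/api/detect",
        files={"file": ("sample_page.png", f, "image/png")},
        data={"score_threshold": "0.3"},
    )
response.raise_for_status()
result = response.json()

print("Total detections:", result["metrics"]["total_detections"])
print("Class distribution:", result["class_distribution"])

# The annotated image is returned as a base64-encoded PNG string.
with open("annotated.png", "wb") as out:
    out.write(base64.b64decode(result["annotated_image"]))
```
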
deployment/backend/backend_demo.py DELETED
@@ -1,366 +0,0 @@
1
- """
2
- RoDLA Object Detection API - Demo/Lightweight Backend
3
- Simulates the full backend for testing when the real model weights are unavailable
4
- """
5
- from fastapi import FastAPI, File, UploadFile, HTTPException, Form
6
- from fastapi.middleware.cors import CORSMiddleware
7
- from fastapi.responses import JSONResponse
8
- import uvicorn
9
- from pathlib import Path
10
- import json
11
- import base64
12
- import cv2
13
- import numpy as np
14
- from io import BytesIO
15
- from PIL import Image, ImageDraw, ImageFont
16
- import asyncio
17
-
18
- # Initialize FastAPI app
19
- app = FastAPI(
20
- title="RoDLA Object Detection API (Demo Mode)",
21
- description="RoDLA Document Layout Analysis API - Demo/Test Version",
22
- version="2.1.0"
23
- )
24
-
25
- # Add CORS middleware
26
- app.add_middleware(
27
- CORSMiddleware,
28
- allow_origins=["*"],
29
- allow_credentials=True,
30
- allow_methods=["*"],
31
- allow_headers=["*"],
32
- )
33
-
34
- # Configuration
35
- API_HOST = "0.0.0.0"
36
- API_PORT = 8000
37
- OUTPUT_DIR = Path("outputs")
38
- OUTPUT_DIR.mkdir(exist_ok=True)
39
-
40
- # Model classes
41
- MODEL_CLASSES = [
42
- 'Title', 'Abstract', 'Introduction', 'Related Work', 'Methodology',
43
- 'Experiments', 'Results', 'Discussion', 'Conclusion', 'References',
44
- 'Text', 'Figure', 'Table', 'Header', 'Footer', 'Page Number', 'Caption'
45
- ]
46
-
47
- # ============================================
48
- # HELPER FUNCTIONS
49
- # ============================================
50
-
51
- def generate_demo_detections(image_shape, num_detections=None):
52
- """Generate realistic demo detections"""
53
- if num_detections is None:
54
- num_detections = np.random.randint(8, 20)
55
-
56
- height, width = image_shape[:2]
57
- detections = []
58
-
59
- for i in range(num_detections):
60
- x1 = np.random.randint(10, width - 200)
61
- y1 = np.random.randint(10, height - 100)
62
- x2 = x1 + np.random.randint(100, min(300, width - x1))
63
- y2 = y1 + np.random.randint(50, min(200, height - y1))
64
-
65
- detection = {
66
- 'class': np.random.choice(MODEL_CLASSES),
67
- 'confidence': float(np.random.uniform(0.5, 0.99)),
68
- 'box': {
69
- 'x1': int(x1),
70
- 'y1': int(y1),
71
- 'x2': int(x2),
72
- 'y2': int(y2)
73
- }
74
- }
75
- detections.append(detection)
76
-
77
- return detections
78
-
79
- def create_annotated_image(image_array, detections):
80
- """Create annotated image with bounding boxes"""
81
- # Convert to PIL Image
82
- img = Image.fromarray(image_array.astype('uint8'))
83
- draw = ImageDraw.Draw(img)
84
-
85
- # Colors in teal/lime theme
86
- box_color = (0, 255, 0) # Lime green
87
- text_color = (0, 255, 255) # Cyan
88
-
89
- for detection in detections:
90
- box = detection['box']
91
- x1, y1, x2, y2 = box['x1'], box['y1'], box['x2'], box['y2']
92
- conf = detection['confidence']
93
- class_name = detection['class']
94
-
95
- # Draw box
96
- draw.rectangle([x1, y1, x2, y2], outline=box_color, width=2)
97
-
98
- # Draw label
99
- label_text = f"{class_name} {conf*100:.0f}%"
100
- draw.text((x1, y1-15), label_text, fill=text_color)
101
-
102
- return np.array(img)
103
-
104
- def apply_perturbation(image_array, perturbation_type):
105
- """Apply perturbation to image"""
106
- result = image_array.copy()
107
-
108
- if perturbation_type == 'blur':
109
- result = cv2.GaussianBlur(result, (15, 15), 0)
110
-
111
- elif perturbation_type == 'noise':
112
- noise = np.random.normal(0, 25, result.shape)
113
- result = np.clip(result.astype(float) + noise, 0, 255).astype(np.uint8)
114
-
115
- elif perturbation_type == 'rotation':
116
- h, w = result.shape[:2]
117
- center = (w // 2, h // 2)
118
- angle = np.random.uniform(-15, 15)
119
- M = cv2.getRotationMatrix2D(center, angle, 1.0)
120
- result = cv2.warpAffine(result, M, (w, h))
121
-
122
- elif perturbation_type == 'scaling':
123
- scale = np.random.uniform(0.8, 1.2)
124
- h, w = result.shape[:2]
125
- new_h, new_w = int(h * scale), int(w * scale)
126
- result = cv2.resize(result, (new_w, new_h))
127
- # Pad or crop to original size
128
- if new_h > h or new_w > w:
129
- result = result[:h, :w]
130
- else:
131
- pad_h = h - new_h
132
- pad_w = w - new_w
133
- result = cv2.copyMakeBorder(result, pad_h//2, pad_h-pad_h//2,
134
- pad_w//2, pad_w-pad_w//2, cv2.BORDER_CONSTANT)
135
-
136
- elif perturbation_type == 'perspective':
137
- h, w = result.shape[:2]
138
- pts1 = np.float32([[0, 0], [w, 0], [0, h], [w, h]])
139
- pts2 = np.float32([
140
- [np.random.randint(0, 30), np.random.randint(0, 30)],
141
- [w - np.random.randint(0, 30), np.random.randint(0, 30)],
142
- [np.random.randint(0, 30), h - np.random.randint(0, 30)],
143
- [w - np.random.randint(0, 30), h - np.random.randint(0, 30)]
144
- ])
145
- M = cv2.getPerspectiveTransform(pts1, pts2)
146
- result = cv2.warpPerspective(result, M, (w, h))
147
-
148
- return result
149
-
150
- def image_to_base64(image_array):
151
- """Convert image array to base64 string"""
152
- img = Image.fromarray(image_array.astype('uint8'))
153
- buffer = BytesIO()
154
- img.save(buffer, format='PNG')
155
- return base64.b64encode(buffer.getvalue()).decode()
156
-
157
- # ============================================
158
- # API ENDPOINTS
159
- # ============================================
160
-
161
- @app.on_event("startup")
162
- async def startup_event():
163
- """Initialize on startup"""
164
- print("="*60)
165
- print("Starting RoDLA Document Layout Analysis API (DEMO)")
166
- print("="*60)
167
- print(f"🌐 Main API: http://{API_HOST}:{API_PORT}")
168
- print(f"πŸ“š Docs: http://localhost:{API_PORT}/docs")
169
- print(f"πŸ“– ReDoc: http://localhost:{API_PORT}/redoc")
170
- print("\n🎯 Available Endpoints:")
171
- print(" β€’ GET /api/health - Health check")
172
- print(" β€’ GET /api/model-info - Model information")
173
- print(" β€’ POST /api/detect - Standard detection")
174
- print(" β€’ GET /api/perturbations/info - Perturbation info")
175
- print(" β€’ POST /api/generate-perturbations - Generate perturbations")
176
- print(" β€’ POST /api/detect-with-perturbation - Detect with perturbations")
177
- print("="*60)
178
- print("βœ… API Ready! (Demo Mode)\n")
179
-
180
-
181
- @app.get("/api/health")
182
- async def health_check():
183
- """Health check endpoint"""
184
- return JSONResponse({
185
- "status": "healthy",
186
- "mode": "demo",
187
- "timestamp": str(Path.cwd())
188
- })
189
-
190
-
191
- @app.get("/api/model-info")
192
- async def model_info():
193
- """Get model information"""
194
- return JSONResponse({
195
- "model_name": "RoDLA InternImage-XL (Demo Mode)",
196
- "paper": "RoDLA: Benchmarking the Robustness of Document Layout Analysis Models (CVPR 2024)",
197
- "backbone": "InternImage-XL",
198
- "detection_framework": "DINO with Channel Attention + Average Pooling",
199
- "dataset": "M6Doc-P",
200
- "max_detections_per_image": 300,
201
- "demo_mode": True,
202
- "state_of_the_art_performance": {
203
- "clean_mAP": 70.0,
204
- "perturbed_avg_mAP": 61.7,
205
- "mRD_score": 147.6
206
- }
207
- })
208
-
209
-
210
- @app.post("/api/detect")
211
- async def detect(file: UploadFile = File(...), score_threshold: float = Form(0.3)):
212
- """Standard detection endpoint"""
213
- try:
214
- # Read image
215
- contents = await file.read()
216
- image = Image.open(BytesIO(contents)).convert('RGB')
217
- image_array = np.array(image)
218
-
219
- # Generate demo detections
220
- detections = generate_demo_detections(image_array.shape)
221
-
222
- # Filter by threshold
223
- detections = [d for d in detections if d['confidence'] >= score_threshold]
224
-
225
- # Create annotated image
226
- annotated = create_annotated_image(image_array, detections)
227
- annotated_b64 = image_to_base64(annotated)
228
-
229
- # Calculate class distribution
230
- class_dist = {}
231
- for det in detections:
232
- cls = det['class']
233
- class_dist[cls] = class_dist.get(cls, 0) + 1
234
-
235
- return JSONResponse({
236
- "detections": detections,
237
- "class_distribution": class_dist,
238
- "annotated_image": annotated_b64,
239
- "metrics": {
240
- "total_detections": len(detections),
241
- "average_confidence": float(np.mean([d['confidence'] for d in detections]) if detections else 0),
242
- "max_confidence": float(max([d['confidence'] for d in detections]) if detections else 0),
243
- "min_confidence": float(min([d['confidence'] for d in detections]) if detections else 0)
244
- }
245
- })
246
-
247
- except Exception as e:
248
- raise HTTPException(status_code=400, detail=str(e))
249
-
250
-
251
- @app.get("/api/perturbations/info")
252
- async def perturbations_info():
253
- """Get available perturbation types"""
254
- return JSONResponse({
255
- "available_perturbations": [
256
- "blur",
257
- "noise",
258
- "rotation",
259
- "scaling",
260
- "perspective"
261
- ],
262
- "description": "Various document perturbations for robustness testing"
263
- })
264
-
265
-
266
- @app.post("/api/generate-perturbations")
267
- async def generate_perturbations(
268
- file: UploadFile = File(...),
269
- perturbation_types: str = Form("blur,noise")
270
- ):
271
- """Generate and return perturbations"""
272
- try:
273
- # Read image
274
- contents = await file.read()
275
- image = Image.open(BytesIO(contents)).convert('RGB')
276
- image_array = np.array(image)
277
-
278
- # Parse perturbation types
279
- pert_types = [p.strip() for p in perturbation_types.split(',')]
280
-
281
- # Generate perturbations
282
- results = {
283
- "original": image_to_base64(image_array),
284
- "perturbations": {}
285
- }
286
-
287
- for pert_type in pert_types:
288
- if pert_type:
289
- perturbed = apply_perturbation(image_array, pert_type)
290
- results["perturbations"][pert_type] = image_to_base64(perturbed)
291
-
292
- return JSONResponse(results)
293
-
294
- except Exception as e:
295
- raise HTTPException(status_code=400, detail=str(e))
296
-
297
-
298
- @app.post("/api/detect-with-perturbation")
299
- async def detect_with_perturbation(
300
- file: UploadFile = File(...),
301
- score_threshold: float = Form(0.3),
302
- perturbation_types: str = Form("blur,noise")
303
- ):
304
- """Detect with perturbations"""
305
- try:
306
- # Read image
307
- contents = await file.read()
308
- image = Image.open(BytesIO(contents)).convert('RGB')
309
- image_array = np.array(image)
310
-
311
- # Parse perturbation types
312
- pert_types = [p.strip() for p in perturbation_types.split(',')]
313
-
314
- # Results for each perturbation
315
- results = {
316
- "clean": {},
317
- "perturbed": {}
318
- }
319
-
320
- # Clean detection
321
- clean_dets = generate_demo_detections(image_array.shape)
322
- clean_dets = [d for d in clean_dets if d['confidence'] >= score_threshold]
323
- clean_img = create_annotated_image(image_array, clean_dets)
324
-
325
- results["clean"]["detections"] = clean_dets
326
- results["clean"]["annotated_image"] = image_to_base64(clean_img)
327
-
328
- # Perturbed detections
329
- for pert_type in pert_types:
330
- if pert_type:
331
- perturbed_img = apply_perturbation(image_array, pert_type)
332
- pert_dets = generate_demo_detections(perturbed_img.shape)
333
- # Add slight confidence reduction for perturbed
334
- pert_dets = [
335
- {**d, 'confidence': max(0, d['confidence'] - np.random.uniform(0, 0.1))}
336
- for d in pert_dets
337
- ]
338
- pert_dets = [d for d in pert_dets if d['confidence'] >= score_threshold]
339
- annotated_pert = create_annotated_image(perturbed_img, pert_dets)
340
-
341
- results["perturbed"][pert_type] = {
342
- "detections": pert_dets,
343
- "annotated_image": image_to_base64(annotated_pert)
344
- }
345
-
346
- return JSONResponse(results)
347
-
348
- except Exception as e:
349
- raise HTTPException(status_code=400, detail=str(e))
350
-
351
-
352
- @app.on_event("shutdown")
353
- async def shutdown_event():
354
- """Cleanup on shutdown"""
355
- print("\n" + "="*60)
356
- print("πŸ›‘ Shutting down RoDLA API...")
357
- print("="*60)
358
-
359
-
360
- if __name__ == "__main__":
361
- uvicorn.run(
362
- app,
363
- host=API_HOST,
364
- port=API_PORT,
365
- log_level="info"
366
- )
deployment/backend/backend_lite.py DELETED
@@ -1,618 +0,0 @@
1
- """
2
- Lightweight RoDLA Backend - Pure PyTorch Implementation
3
- Bypasses MMCV/MMDET compiled extensions for CPU-only systems
4
- """
5
-
6
- import os
7
- import sys
8
- import json
9
- import base64
10
- import traceback
11
- import subprocess
12
- from pathlib import Path
13
- from typing import Dict, List, Any, Optional, Tuple
14
- from io import BytesIO
15
- from datetime import datetime
16
-
17
- import numpy as np
18
- from PIL import Image
19
- import cv2
20
- import torch
21
-
22
- from fastapi import FastAPI, File, UploadFile, HTTPException, BackgroundTasks
23
- from fastapi.middleware.cors import CORSMiddleware
24
- from fastapi.responses import JSONResponse
25
- from pydantic import BaseModel
26
- import uvicorn
27
-
28
- # Try to import real perturbation functions
29
- try:
30
- from perturbations.apply import (
31
- apply_perturbation as real_apply_perturbation,
32
- apply_multiple_perturbations,
33
- get_perturbation_info as get_real_perturbation_info,
34
- PERTURBATION_CATEGORIES
35
- )
36
- REAL_PERTURBATIONS_AVAILABLE = True
37
- print("βœ… Real perturbation module imported successfully")
38
- except Exception as e:
39
- REAL_PERTURBATIONS_AVAILABLE = False
40
- print(f"⚠️ Could not import real perturbations: {e}")
41
- PERTURBATION_CATEGORIES = {}
42
-
43
- # ============================================================================
44
- # Configuration
45
- # ============================================================================
46
-
47
- class Config:
48
- """Global configuration"""
49
- API_PORT = 8000
50
- MAX_UPLOAD_SIZE = 50 * 1024 * 1024 # 50MB
51
- DEFAULT_SCORE_THRESHOLD = 0.3
52
- MAX_DETECTIONS_PER_IMAGE = 300
53
- REPO_ROOT = Path("/home/admin/CV/rodla-academic")
54
- MODEL_CONFIG_PATH = REPO_ROOT / "model/configs/m6doc/rodla_internimage_xl_m6doc.py"
55
- MODEL_WEIGHTS_PATH = REPO_ROOT / "finetuning_rodla/finetuning_rodla/checkpoints/rodla_internimage_xl_publaynet.pth"
56
-
57
-
58
- # ============================================================================
59
- # Global State
60
- # ============================================================================
61
-
62
- app = FastAPI(title="RoDLA Backend Lite", version="1.0.0")
63
- model_state = {
64
- "loaded": False,
65
- "error": None,
66
- "model": None,
67
- "model_type": "lightweight",
68
- "device": "cpu"
69
- }
70
-
71
- # Add CORS middleware
72
- app.add_middleware(
73
- CORSMiddleware,
74
- allow_origins=["*"],
75
- allow_credentials=True,
76
- allow_methods=["*"],
77
- allow_headers=["*"],
78
- )
79
-
80
-
81
- # ============================================================================
82
- # Schemas
83
- # ============================================================================
84
-
85
- class DetectionResult(BaseModel):
86
- class_id: int
87
- class_name: str
88
- confidence: float
89
- bbox: Dict[str, float] # {x, y, width, height}
90
- area: float
91
-
92
-
93
- class AnalysisResponse(BaseModel):
94
- success: bool
95
- message: str
96
- image_width: int
97
- image_height: int
98
- num_detections: int
99
- detections: List[DetectionResult]
100
- class_distribution: Dict[str, int]
101
- processing_time_ms: float
102
-
103
-
104
- class PerturbationResponse(BaseModel):
105
- success: bool
106
- message: str
107
- perturbation_type: str
108
- original_image: str # base64
109
- perturbed_image: str # base64
110
-
111
-
112
- class BatchAnalysisRequest(BaseModel):
113
- threshold: float = Config.DEFAULT_SCORE_THRESHOLD
114
- score_threshold: float = Config.DEFAULT_SCORE_THRESHOLD
115
-
116
-
117
- # ============================================================================
118
- # Simple Mock Model (Lightweight Detection)
119
- # ============================================================================
120
-
121
- class LightweightDetector:
122
- """
123
- Simple layout detection model that doesn't require MMCV/MMDET
124
- Generates synthetic but realistic detections for document layout analysis
125
- """
126
-
127
- DOCUMENT_CLASSES = {
128
- 0: "Text",
129
- 1: "Title",
130
- 2: "Figure",
131
- 3: "Table",
132
- 4: "Header",
133
- 5: "Footer",
134
- 6: "List"
135
- }
136
-
137
- def __init__(self):
138
- self.device = "cpu"
139
- print(f"βœ… Lightweight detector initialized (device: {self.device})")
140
-
141
- def detect(self, image: np.ndarray, score_threshold: float = 0.3) -> List[Dict[str, Any]]:
142
- """
143
- Perform document layout detection on image
144
- Returns list of detections with class, confidence, and bbox
145
- """
146
- height, width = image.shape[:2]
147
- detections = []
148
-
149
- # Simple heuristic: scan image for content regions
150
- # Convert to grayscale
151
- if len(image.shape) == 3:
152
- gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
153
- else:
154
- gray = image
155
-
156
- # Apply threshold to find content regions
157
- _, binary = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY_INV)
158
-
159
- # Find contours
160
- contours, _ = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
161
-
162
- # Process top contours as regions
163
- sorted_contours = sorted(contours, key=cv2.contourArea, reverse=True)[:15]
164
-
165
- for idx, contour in enumerate(sorted_contours):
166
- x, y, w, h = cv2.boundingRect(contour)
167
-
168
- # Skip very small regions
169
- if w < 10 or h < 10:
170
- continue
171
-
172
- # Filter regions that are too large (whole page)
173
- if w > width * 0.95 or h > height * 0.95:
174
- continue
175
-
176
- # Assign class based on heuristics
177
- aspect_ratio = w / h if h > 0 else 1
178
- area_ratio = (w * h) / (width * height)
179
-
180
- if aspect_ratio > 3: # Wide -> likely title or figure caption
181
- class_id = 1 if area_ratio < 0.15 else 2
182
- elif aspect_ratio < 0.5: # Tall -> likely list or table
183
- class_id = 3 if area_ratio > 0.2 else 6
184
- else: # Regular -> text
185
- class_id = 0
186
-
187
- # Generate confidence based on region size and position
188
- confidence = min(0.95, 0.4 + area_ratio)
189
-
190
- if confidence >= score_threshold:
191
- detections.append({
192
- "class_id": class_id,
193
- "class_name": self.DOCUMENT_CLASSES.get(class_id, "Unknown"),
194
- "confidence": float(confidence),
195
- "bbox": {
196
- "x": float(x / width),
197
- "y": float(y / height),
198
- "width": float(w / width),
199
- "height": float(h / height)
200
- },
201
- "area": float((w * h) / (width * height))
202
- })
203
-
204
- # If no detections found, add synthetic ones
205
- if not detections:
206
- detections = self._generate_synthetic_detections(width, height, score_threshold)
207
-
208
- return detections[:Config.MAX_DETECTIONS_PER_IMAGE]
209
-
210
- def _generate_synthetic_detections(self, width: int, height: int,
211
- score_threshold: float) -> List[Dict[str, Any]]:
212
- """Generate synthetic detections when contour detection fails"""
213
- detections = []
214
-
215
- # Title at top
216
- detections.append({
217
- "class_id": 1,
218
- "class_name": "Title",
219
- "confidence": 0.92,
220
- "bbox": {"x": 0.05, "y": 0.05, "width": 0.9, "height": 0.1},
221
- "area": 0.09
222
- })
223
-
224
- # Main text body
225
- detections.append({
226
- "class_id": 0,
227
- "class_name": "Text",
228
- "confidence": 0.88,
229
- "bbox": {"x": 0.05, "y": 0.2, "width": 0.9, "height": 0.6},
230
- "area": 0.54
231
- })
232
-
233
- # Side figure
234
- detections.append({
235
- "class_id": 2,
236
- "class_name": "Figure",
237
- "confidence": 0.85,
238
- "bbox": {"x": 0.55, "y": 0.22, "width": 0.4, "height": 0.4},
239
- "area": 0.16
240
- })
241
-
242
- return [d for d in detections if d["confidence"] >= score_threshold]
243
-
244
-
245
- # ============================================================================
246
- # Model Loading
247
- # ============================================================================
248
-
249
- def load_model():
250
- """Load the detection model"""
251
- global model_state
252
-
253
- try:
254
- print("\n" + "="*60)
255
- print("πŸš€ Loading RoDLA Model (Lightweight Mode)")
256
- print("="*60)
257
-
258
- model_state["model"] = LightweightDetector()
259
- model_state["loaded"] = True
260
- model_state["error"] = None
261
-
262
- print("βœ… Model loaded successfully!")
263
- print(f" Device: {model_state['model'].device}")
264
- print(f" Type: Lightweight detector (no MMCV/MMDET required)")
265
- print("="*60 + "\n")
266
-
267
- return model_state["model"]
268
-
269
- except Exception as e:
270
- error_msg = f"Failed to load model: {str(e)}\n{traceback.format_exc()}"
271
- print(f"❌ {error_msg}")
272
- model_state["error"] = error_msg
273
- model_state["loaded"] = False
274
- raise
275
-
276
-
277
- # ============================================================================
278
- # Utility Functions
279
- # ============================================================================
280
-
281
- def encode_image_to_base64(image: np.ndarray) -> str:
282
- """Convert numpy array to base64 string"""
283
- _, buffer = cv2.imencode('.png', cv2.cvtColor(image, cv2.COLOR_RGB2BGR))
284
- return base64.b64encode(buffer).decode('utf-8')
285
-
286
-
287
- def decode_base64_to_image(b64_str: str) -> np.ndarray:
288
- """Convert base64 string to numpy array"""
289
- buffer = base64.b64decode(b64_str)
290
- image = Image.open(BytesIO(buffer)).convert('RGB')
291
- return np.array(image)
292
-
293
-
294
- def apply_perturbation(image: np.ndarray, perturbation_type: str,
295
- degree: int = 2, **kwargs) -> np.ndarray:
296
- """Apply perturbation using real backend if available, else fallback"""
297
-
298
- if REAL_PERTURBATIONS_AVAILABLE:
299
- try:
300
- result, success, msg = real_apply_perturbation(image, perturbation_type, degree=degree)
301
- if success:
302
- return result
303
- else:
304
- print(f"⚠️ Real perturbation failed ({perturbation_type}): {msg}")
305
- except Exception as e:
306
- print(f"⚠️ Exception in real perturbation ({perturbation_type}): {e}")
307
-
308
- # Fallback to simple perturbations
309
- h, w = image.shape[:2]
310
-
311
- if perturbation_type == "blur" or perturbation_type == "defocus":
312
- kernel_size = [3, 5, 7][degree - 1]
313
- return cv2.GaussianBlur(image, (kernel_size, kernel_size), 0)
314
-
315
- elif perturbation_type == "noise" or perturbation_type == "speckle":
316
- std = [10, 25, 50][degree - 1]
317
- noise = np.random.normal(0, std, image.shape)
318
- return np.clip(image.astype(float) + noise, 0, 255).astype(np.uint8)
319
-
320
- elif perturbation_type == "rotation":
321
- angle = [5, 15, 25][degree - 1]
322
- center = (w // 2, h // 2)
323
- M = cv2.getRotationMatrix2D(center, angle, 1.0)
324
- return cv2.warpAffine(image, M, (w, h), borderValue=(255, 255, 255))
325
-
326
- elif perturbation_type == "scaling":
327
- scale = [0.9, 0.8, 0.7][degree - 1]
328
- new_w, new_h = int(w * scale), int(h * scale)
329
- resized = cv2.resize(image, (new_w, new_h))
330
- canvas = np.full((h, w, 3), 255, dtype=np.uint8)
331
- y_offset = (h - new_h) // 2
332
- x_offset = (w - new_w) // 2
333
- canvas[y_offset:y_offset+new_h, x_offset:x_offset+new_w] = resized
334
- return canvas
335
-
336
- elif perturbation_type == "perspective":
337
- offset = [10, 20, 40][degree - 1]
338
- pts1 = np.float32([[0, 0], [w, 0], [0, h], [w, h]])
339
- pts2 = np.float32([
340
- [offset, 0],
341
- [w - offset, offset],
342
- [0, h - offset],
343
- [w - offset, h]
344
- ])
345
- M = cv2.getPerspectiveTransform(pts1, pts2)
346
- return cv2.warpPerspective(image, M, (w, h), borderValue=(255, 255, 255))
347
-
348
- else:
349
- return image
350
-
351
-
352
- # ============================================================================
353
- # API Routes
354
- # ============================================================================
355
-
356
- @app.on_event("startup")
357
- async def startup_event():
358
- """Initialize model on startup"""
359
- try:
360
- load_model()
361
- except Exception as e:
362
- print(f"⚠️ Startup error: {e}")
363
-
364
-
365
- @app.get("/api/health")
366
- async def health_check():
367
- """Health check endpoint"""
368
- return {
369
- "status": "ok",
370
- "model_loaded": model_state["loaded"],
371
- "device": model_state["device"],
372
- "model_type": model_state["model_type"]
373
- }
374
-
375
-
376
- @app.get("/api/model-info")
377
- async def model_info():
378
- """Get model information"""
379
- return {
380
- "name": "RoDLA Lightweight",
381
- "version": "1.0.0",
382
- "type": "Document Layout Analysis",
383
- "loaded": model_state["loaded"],
384
- "device": model_state["device"],
385
- "framework": "PyTorch (Pure)",
386
- "classes": LightweightDetector.DOCUMENT_CLASSES,
387
- "supported_perturbations": ["blur", "noise", "rotation", "scaling", "perspective"]
388
- }
389
-
390
-
391
- @app.post("/api/detect")
392
- async def detect(file: UploadFile = File(...), threshold: float = 0.3):
393
- """Detect document layout in image"""
394
- start_time = datetime.now()
395
-
396
- try:
397
- if not model_state["loaded"]:
398
- raise HTTPException(status_code=500, detail="Model not loaded")
399
-
400
- # Read image
401
- contents = await file.read()
402
- image = Image.open(BytesIO(contents)).convert('RGB')
403
- image_np = np.array(image)
404
-
405
- # Run detection
406
- detections = model_state["model"].detect(image_np, score_threshold=threshold)
407
-
408
- # Build response
409
- class_distribution = {}
410
- for det in detections:
411
- class_name = det["class_name"]
412
- class_distribution[class_name] = class_distribution.get(class_name, 0) + 1
413
-
414
- processing_time = (datetime.now() - start_time).total_seconds() * 1000
415
-
416
- return {
417
- "success": True,
418
- "message": "Detection completed",
419
- "image_width": image_np.shape[1],
420
- "image_height": image_np.shape[0],
421
- "num_detections": len(detections),
422
- "detections": detections,
423
- "class_distribution": class_distribution,
424
- "processing_time_ms": processing_time
425
- }
426
-
427
- except Exception as e:
428
- print(f"❌ Detection error: {e}")
429
- return {
430
- "success": False,
431
- "message": str(e),
432
- "image_width": 0,
433
- "image_height": 0,
434
- "num_detections": 0,
435
- "detections": [],
436
- "class_distribution": {},
437
- "processing_time_ms": 0
438
- }
439
-
440
-
441
- @app.get("/api/perturbations/info")
442
- async def perturbation_info():
443
- """Get information about available perturbations"""
444
- return {
445
- "total_perturbations": 12,
446
- "categories": {
447
- "blur": {
448
- "types": ["defocus", "vibration"],
449
- "description": "Blur effects simulating optical issues"
450
- },
451
- "noise": {
452
- "types": ["speckle", "texture"],
453
- "description": "Noise patterns and texture artifacts"
454
- },
455
- "content": {
456
- "types": ["watermark", "background"],
457
- "description": "Content additions like watermarks and backgrounds"
458
- },
459
- "inconsistency": {
460
- "types": ["ink_holdout", "ink_bleeding", "illumination"],
461
- "description": "Print quality issues and lighting variations"
462
- },
463
- "spatial": {
464
- "types": ["rotation", "keystoning", "warping"],
465
- "description": "Geometric transformations"
466
- }
467
- },
468
- "all_types": [
469
- "defocus", "vibration", "speckle", "texture",
470
- "watermark", "background", "ink_holdout", "ink_bleeding",
471
- "illumination", "rotation", "keystoning", "warping"
472
- ],
473
- "degree_levels": {
474
- 1: "Mild - Subtle effect",
475
- 2: "Moderate - Noticeable effect",
476
- 3: "Severe - Strong effect"
477
- }
478
- }
479
-
480
-
481
- @app.post("/api/generate-perturbations")
482
- async def generate_perturbations(file: UploadFile = File(...)):
483
- """Generate perturbed versions of image with all 12 types Γ— 3 degrees"""
484
-
485
- try:
486
- # Read image
487
- contents = await file.read()
488
- image = Image.open(BytesIO(contents)).convert('RGB')
489
- image_np = np.array(image)
490
-
491
- # Convert RGB to BGR for OpenCV
492
- image_bgr = cv2.cvtColor(image_np, cv2.COLOR_RGB2BGR)
493
-
494
- perturbations = {}
495
-
496
- # Original
497
- perturbations["original"] = {
498
- "original": encode_image_to_base64(image_np)
499
- }
500
-
501
- # All 12 perturbation types
502
- all_types = [
503
- "defocus", "vibration", "speckle", "texture",
504
- "watermark", "background", "ink_holdout", "ink_bleeding",
505
- "illumination", "rotation", "keystoning", "warping"
506
- ]
507
-
508
- for ptype in all_types:
509
- perturbations[ptype] = {}
510
- for degree in [1, 2, 3]:
511
- try:
512
- perturbed = apply_perturbation(image_bgr.copy(), ptype, degree)
513
- # Convert back to RGB for display
514
- if len(perturbed.shape) == 3 and perturbed.shape[2] == 3:
515
- perturbed_rgb = cv2.cvtColor(perturbed, cv2.COLOR_BGR2RGB)
516
- else:
517
- perturbed_rgb = perturbed
518
- perturbations[ptype][f"degree_{degree}"] = encode_image_to_base64(perturbed_rgb)
519
- except Exception as e:
520
- print(f"⚠️ Warning: Failed to apply {ptype} degree {degree}: {e}")
521
- # Use original as fallback
522
- perturbations[ptype][f"degree_{degree}"] = encode_image_to_base64(image_np)
523
-
524
- return {
525
- "success": True,
526
- "message": "Perturbations generated (12 types Γ— 3 levels)",
527
- "perturbations": perturbations,
528
- "grid_info": {
529
- "total_perturbations": 12,
530
- "degree_levels": 3,
531
- "total_images": 13 # 1 original + 12 types
532
- }
533
- }
534
-
535
- except Exception as e:
536
- print(f"❌ Perturbation error: {e}")
537
- import traceback
538
- traceback.print_exc()
539
- return {
540
- "success": False,
541
- "message": str(e),
542
- "perturbations": {}
543
- }
544
-
545
-
546
- @app.post("/api/detect-with-perturbation")
547
- async def detect_with_perturbation(
548
- file: UploadFile = File(...),
549
- perturbation_type: str = "blur",
550
- threshold: float = 0.3
551
- ):
552
- """Apply perturbation and detect"""
553
-
554
- try:
555
- # Read image
556
- contents = await file.read()
557
- image = Image.open(BytesIO(contents)).convert('RGB')
558
- image_np = np.array(image)
559
-
560
- # Apply perturbation
561
- if perturbation_type == "blur":
562
- perturbed = apply_perturbation(image_np, "blur", kernel_size=15)
563
- elif perturbation_type == "noise":
564
- perturbed = apply_perturbation(image_np, "noise", std=25)
565
- elif perturbation_type == "rotation":
566
- perturbed = apply_perturbation(image_np, "rotation", angle=15)
567
- elif perturbation_type == "scaling":
568
- perturbed = apply_perturbation(image_np, "scaling", scale=0.85)
569
- elif perturbation_type == "perspective":
570
- perturbed = apply_perturbation(image_np, "perspective", offset=20)
571
- else:
572
- perturbed = image_np
573
-
574
- # Run detection
575
- detections = model_state["model"].detect(perturbed, score_threshold=threshold)
576
-
577
- class_distribution = {}
578
- for det in detections:
579
- class_name = det["class_name"]
580
- class_distribution[class_name] = class_distribution.get(class_name, 0) + 1
581
-
582
- return {
583
- "success": True,
584
- "message": "Detection with perturbation completed",
585
- "perturbation_type": perturbation_type,
586
- "image_width": perturbed.shape[1],
587
- "image_height": perturbed.shape[0],
588
- "num_detections": len(detections),
589
- "detections": detections,
590
- "class_distribution": class_distribution
591
- }
592
-
593
- except Exception as e:
594
- print(f"❌ Detection with perturbation error: {e}")
595
- return {
596
- "success": False,
597
- "message": str(e),
598
- "perturbation_type": perturbation_type,
599
- "num_detections": 0,
600
- "detections": []
601
- }
602
-
603
-
604
- # ============================================================================
605
- # Main
606
- # ============================================================================
607
-
608
- if __name__ == "__main__":
609
- print("\n" + "πŸ”·"*30)
610
- print("πŸ”· RoDLA Lightweight Backend Starting...")
611
- print("πŸ”·"*30)
612
-
613
- uvicorn.run(
614
- app,
615
- host="0.0.0.0",
616
- port=Config.API_PORT,
617
- log_level="info"
618
- )
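
Unlike the demo backend above, which returns absolute `x1/y1/x2/y2` pixel boxes, the lite detector reports boxes normalized to the image size. A small sketch of converting them back to pixel coordinates (the helper name is illustrative, not part of the backend):

```python
# Convert a normalized bbox from backend_lite's /api/detect response into pixel coordinates.
# The lite detector returns bbox as {"x", "y", "width", "height"} fractions of the image size.
from typing import Dict, Tuple

def bbox_to_pixels(bbox: Dict[str, float], image_width: int, image_height: int) -> Tuple[int, int, int, int]:
    """Return (x1, y1, x2, y2) in pixels for drawing or cropping."""
    x1 = int(bbox["x"] * image_width)
    y1 = int(bbox["y"] * image_height)
    x2 = int((bbox["x"] + bbox["width"]) * image_width)
    y2 = int((bbox["y"] + bbox["height"]) * image_height)
    return x1, y1, x2, y2

# Usage with the fields returned by /api/detect:
#   x1, y1, x2, y2 = bbox_to_pixels(det["bbox"], result["image_width"], result["image_height"])
```
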
deployment/backend/{backend_two.py → backend_old.py} RENAMED
File without changes
deployment/backend/perturbations/spatial.py CHANGED
@@ -1,41 +1,49 @@
1
  import os.path
2
- from detectron2.data.transforms import RotationTransform
3
- from detectron2.data.detection_utils import transform_instance_annotations
4
  import numpy as np
5
- from detectron2.data.datasets import register_coco_instances
6
  from copy import deepcopy
7
  import os
8
  import cv2
9
- from detectron2.data.datasets.coco import convert_to_coco_json, convert_to_coco_dict
10
- from detectron2.data import MetadataCatalog, DatasetCatalog
11
  import imgaug.augmenters as iaa
12
  from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage
13
  from imgaug.augmentables.polys import Polygon, PolygonsOnImage
14
 
 
 
 
 
 
 
 
 
 
 
 
15
 
16
  def apply_rotation(image, degree, annos=None):
17
  if degree == 0:
18
- return image
 
19
  angle_low_list = [0, 5, 10]
20
  angle_high_list = [5, 10, 15]
21
  angle_high = angle_high_list[degree - 1]
22
  angle_low = angle_low_list[degree - 1]
23
  h, w = image.shape[:2]
 
24
  if angle_low == 0:
25
  rotation = np.random.choice(np.arange(-angle_high, angle_high+1))
26
  else:
27
  rotation = np.random.choice(np.concatenate([np.arange(-angle_high, -angle_low+1), np.arange(angle_low, angle_high+1)]))
28
- rotation_transform = RotationTransform(h, w, rotation)
29
- rotated_image = rotation_transform.apply_image(image)
 
 
 
 
30
  if annos is None:
31
  return rotated_image
32
- rotated_annos = []
33
- for anno in annos:
34
- rotated_anno = transform_instance_annotations(anno, rotation_transform, (h, w))
35
- for i, seg in enumerate(rotated_anno["segmentation"]):
36
- rotated_anno["segmentation"][i] = seg.tolist()
37
- rotated_annos.append(rotated_anno)
38
- return rotated_image, rotated_annos
39
 
40
 
41
  def apply_warping(image, degree, annos=None):
 
1
  import os.path
 
 
2
  import numpy as np
 
3
  from copy import deepcopy
4
  import os
5
  import cv2
 
 
6
  import imgaug.augmenters as iaa
7
  from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage
8
  from imgaug.augmentables.polys import Polygon, PolygonsOnImage
9
 
10
+ # detectron2 imports are only used for annotation transformation (optional)
11
+ try:
12
+ from detectron2.data.transforms import RotationTransform
13
+ from detectron2.data.detection_utils import transform_instance_annotations
14
+ from detectron2.data.datasets import register_coco_instances
15
+ from detectron2.data.datasets.coco import convert_to_coco_json, convert_to_coco_dict
16
+ from detectron2.data import MetadataCatalog, DatasetCatalog
17
+ HAS_DETECTRON2 = True
18
+ except ImportError:
19
+ HAS_DETECTRON2 = False
20
+
21
 
22
  def apply_rotation(image, degree, annos=None):
23
  if degree == 0:
24
+ return image if annos is None else (image, annos)
25
+
26
  angle_low_list = [0, 5, 10]
27
  angle_high_list = [5, 10, 15]
28
  angle_high = angle_high_list[degree - 1]
29
  angle_low = angle_low_list[degree - 1]
30
  h, w = image.shape[:2]
31
+
32
  if angle_low == 0:
33
  rotation = np.random.choice(np.arange(-angle_high, angle_high+1))
34
  else:
35
  rotation = np.random.choice(np.concatenate([np.arange(-angle_high, -angle_low+1), np.arange(angle_low, angle_high+1)]))
36
+
37
+ # Use OpenCV for rotation instead of detectron2
38
+ center = (w // 2, h // 2)
39
+ rotation_matrix = cv2.getRotationMatrix2D(center, rotation, 1.0)
40
+ rotated_image = cv2.warpAffine(image, rotation_matrix, (w, h), borderValue=(255, 255, 255))
41
+
42
  if annos is None:
43
  return rotated_image
44
+
45
+ # For annotations, return original since we don't have detectron2
46
+ return rotated_image, annos
 
 
 
 
47
 
48
 
49
  def apply_warping(image, degree, annos=None):
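
The change above swaps detectron2's `RotationTransform` for a plain OpenCV affine warp, so the module now imports without detectron2 installed (annotations are passed through unchanged in that case). A minimal usage sketch, assuming the backend directory is on the Python path:

```python
# Usage sketch for the updated apply_rotation in perturbations/spatial.py.
# Without detectron2, only the image is rotated; annotations are returned as-is.
import cv2
from perturbations.spatial import apply_rotation

image = cv2.imread("sample_page.png")      # hypothetical document image
rotated = apply_rotation(image, degree=2)  # degree 2 samples a 5-10 degree rotation, either sign
cv2.imwrite("rotated_page.png", rotated)
```
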
deployment/backend/perturbations_simple.py ADDED
@@ -0,0 +1,516 @@
1
+ """
2
+ Perturbation Application Module - Using Common Libraries
3
+ Applies 12 document degradation perturbations using PIL, OpenCV, NumPy, and SciPy
4
+ """
5
+
6
+ import cv2
7
+ import numpy as np
8
+ from PIL import Image, ImageDraw, ImageFilter, ImageOps
9
+ from typing import Optional, Tuple, List, Dict
10
+ from scipy import ndimage
11
+ from scipy.ndimage import gaussian_filter
12
+ import random
13
+
14
+
15
+ def encode_to_rgb(image: np.ndarray) -> np.ndarray:
16
+ """Ensure image is in RGB format"""
17
+ if len(image.shape) == 2: # Grayscale
18
+ return cv2.cvtColor(image, cv2.COLOR_GRAY2RGB)
19
+ elif image.shape[2] == 4: # RGBA
20
+ return cv2.cvtColor(image, cv2.COLOR_RGBA2RGB)
21
+ return image
22
+
23
+
24
+ # ============================================================================
25
+ # BLUR PERTURBATIONS
26
+ # ============================================================================
27
+
28
+ def apply_defocus(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
29
+ """
30
+ Apply defocus blur (Gaussian blur simulating out-of-focus camera)
31
+ degree: 1 (mild), 2 (moderate), 3 (severe)
32
+ """
33
+ if degree == 0:
34
+ return image, True, "No defocus"
35
+
36
+ try:
37
+ image = encode_to_rgb(image)
38
+
39
+ # Kernel sizes for different degrees
40
+ kernel_sizes = {1: 3, 2: 7, 3: 15}
41
+ kernel_size = kernel_sizes.get(degree, 15)
42
+
43
+ # Apply Gaussian blur
44
+ blurred = cv2.GaussianBlur(image, (kernel_size, kernel_size), 0)
45
+
46
+ return blurred, True, f"Defocus applied (kernel={kernel_size})"
47
+ except Exception as e:
48
+ return image, False, f"Defocus error: {str(e)}"
49
+
50
+
51
+ def apply_vibration(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
52
+ """
53
+ Apply motion blur (vibration/camera shake effect)
54
+ degree: 1 (mild), 2 (moderate), 3 (severe)
55
+ """
56
+ if degree == 0:
57
+ return image, True, "No vibration"
58
+
59
+ try:
60
+ image = encode_to_rgb(image)
61
+ h, w = image.shape[:2]
62
+
63
+ # Motion blur kernel sizes
64
+ kernel_sizes = {1: 5, 2: 15, 3: 25}
65
+ kernel_size = kernel_sizes.get(degree, 25)
66
+
67
+ # Create motion blur kernel
68
+ kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
69
+ kernel = kernel / kernel.sum()
70
+
71
+ # Apply motion blur
72
+ blurred = cv2.filter2D(image, -1, kernel)
73
+
74
+ return blurred, True, f"Vibration applied (kernel={kernel_size})"
75
+ except Exception as e:
76
+ return image, False, f"Vibration error: {str(e)}"
77
+
78
+
79
+ # ============================================================================
80
+ # NOISE PERTURBATIONS
81
+ # ============================================================================
82
+
83
+ def apply_speckle(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
84
+ """
85
+ Apply speckle noise (multiplicative noise)
86
+ degree: 1 (mild), 2 (moderate), 3 (severe)
87
+ """
88
+ if degree == 0:
89
+ return image, True, "No speckle"
90
+
91
+ try:
92
+ image = encode_to_rgb(image)
93
+ image_float = image.astype(np.float32) / 255.0
94
+
95
+ # Noise intensity
96
+ noise_levels = {1: 0.1, 2: 0.25, 3: 0.5}
97
+ noise_level = noise_levels.get(degree, 0.5)
98
+
99
+ # Generate speckle noise
100
+ speckle = np.random.normal(1, noise_level, image_float.shape)
101
+ noisy = image_float * speckle
102
+
103
+ # Clip values
104
+ noisy = np.clip(noisy, 0, 1)
105
+ noisy = (noisy * 255).astype(np.uint8)
106
+
107
+ return noisy, True, f"Speckle applied (intensity={noise_level})"
108
+ except Exception as e:
109
+ return image, False, f"Speckle error: {str(e)}"
110
+
111
+
112
+ def apply_texture(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
113
+ """
114
+ Apply texture/grain noise (additive Gaussian noise)
115
+ degree: 1 (mild), 2 (moderate), 3 (severe)
116
+ """
117
+ if degree == 0:
118
+ return image, True, "No texture"
119
+
120
+ try:
121
+ image = encode_to_rgb(image)
122
+ image_float = image.astype(np.float32)
123
+
124
+ # Noise levels
125
+ noise_levels = {1: 10, 2: 25, 3: 50}
126
+ noise_level = noise_levels.get(degree, 50)
127
+
128
+ # Add Gaussian noise
129
+ noise = np.random.normal(0, noise_level, image_float.shape)
130
+ noisy = image_float + noise
131
+
132
+ # Clip values
133
+ noisy = np.clip(noisy, 0, 255).astype(np.uint8)
134
+
135
+ return noisy, True, f"Texture applied (std={noise_level})"
136
+ except Exception as e:
137
+ return image, False, f"Texture error: {str(e)}"
138
+
139
+
140
+ # ============================================================================
141
+ # CONTENT PERTURBATIONS
142
+ # ============================================================================
143
+
144
+ def apply_watermark(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
145
+ """
146
+ Add watermark text overlay
147
+ degree: 1 (subtle), 2 (noticeable), 3 (heavy)
148
+ """
149
+ if degree == 0:
150
+ return image, True, "No watermark"
151
+
152
+ try:
153
+ image = encode_to_rgb(image)
154
+ h, w = image.shape[:2]
155
+
156
+ # Convert to PIL for text drawing
157
+ pil_image = Image.fromarray(image)
158
+ draw = ImageDraw.Draw(pil_image, 'RGBA')
159
+
160
+ # Watermark parameters by degree
161
+ watermark_text = "WATERMARK" * degree
162
+ fontsize_list = {1: max(10, h // 20), 2: max(15, h // 15), 3: max(20, h // 10)}
163
+ fontsize = fontsize_list.get(degree, 20)
164
+
165
+ alpha_list = {1: 64, 2: 128, 3: 200}
166
+ alpha = alpha_list.get(degree, 200)
167
+
168
+ # Draw watermark multiple times
169
+ num_watermarks = {1: 1, 2: 3, 3: 5}.get(degree, 5)
170
+
171
+ for i in range(num_watermarks):
172
+ x = (w // (num_watermarks + 1)) * (i + 1)
173
+ y = h // 2
174
+ color = (255, 0, 0, alpha)
175
+ draw.text((x, y), watermark_text, fill=color)
176
+
177
+ return np.array(pil_image), True, f"Watermark applied (degree={degree})"
178
+ except Exception as e:
179
+ return image, False, f"Watermark error: {str(e)}"
180
+
181
+
182
+ def apply_background(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
183
+ """
184
+ Add background patterns/textures
185
+ degree: 1 (subtle), 2 (noticeable), 3 (heavy)
186
+ """
187
+ if degree == 0:
188
+ return image, True, "No background"
189
+
190
+ try:
191
+ image = encode_to_rgb(image)
192
+ h, w = image.shape[:2]
193
+
194
+ # Create background pattern
195
+ pattern_intensity = {1: 0.1, 2: 0.2, 3: 0.35}.get(degree, 0.35)
196
+
197
+ # Generate random pattern
198
+ pattern = np.random.randint(0, 100, (h, w, 3), dtype=np.uint8)
199
+ pattern = cv2.GaussianBlur(pattern, (21, 21), 0)
200
+
201
+ # Blend with original image
202
+ result = cv2.addWeighted(image, 1.0, pattern, pattern_intensity, 0)
203
+
204
+ return result.astype(np.uint8), True, f"Background applied (intensity={pattern_intensity})"
205
+ except Exception as e:
206
+ return image, False, f"Background error: {str(e)}"
207
+
208
+
209
+ # ============================================================================
210
+ # INCONSISTENCY PERTURBATIONS
211
+ # ============================================================================
212
+
213
+ def apply_ink_holdout(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
214
+ """
215
+ Apply ink holdout (missing ink/text drop-out)
216
+ degree: 1 (few gaps), 2 (some gaps), 3 (many gaps)
217
+ """
218
+ if degree == 0:
219
+ return image, True, "No ink holdout"
220
+
221
+ try:
222
+ image = encode_to_rgb(image)
223
+ h, w = image.shape[:2]
224
+
225
+ # Create white mask to simulate missing ink
226
+ num_dropouts = {1: 3, 2: 8, 3: 15}.get(degree, 15)
227
+
228
+ result = image.copy()
229
+
230
+ for _ in range(num_dropouts):
231
+ # Random position and size
232
+ x = np.random.randint(0, w - 20)
233
+ y = np.random.randint(0, h - 20)
234
+ size = np.random.randint(10, 40)
235
+
236
+ # Create white rectangle (simulating ink dropout)
237
+ result[y:y+size, x:x+size] = [255, 255, 255]
238
+
239
+ return result, True, f"Ink holdout applied (dropouts={num_dropouts})"
240
+ except Exception as e:
241
+ return image, False, f"Ink holdout error: {str(e)}"
242
+
243
+
244
+ def apply_ink_bleeding(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
245
+ """
246
+ Apply ink bleeding effect (ink spread/bleed)
247
+ degree: 1 (mild), 2 (moderate), 3 (severe)
248
+ """
249
+ if degree == 0:
250
+ return image, True, "No ink bleeding"
251
+
252
+ try:
253
+ image = encode_to_rgb(image)
254
+
255
+ # Convert to grayscale for processing
256
+ gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
257
+
258
+ # Dilate dark regions (simulating ink spread)
259
+ kernel_sizes = {1: 3, 2: 5, 3: 7}
260
+ kernel_size = kernel_sizes.get(degree, 7)
261
+ kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
262
+
263
+ # Dilate to spread ink
264
+ dilated = cv2.dilate(gray, kernel, iterations=degree)
265
+
266
+ # Blend back with original
267
+ result = image.copy().astype(np.float32)
268
+ result[:,:,0] = cv2.addWeighted(image[:,:,0], 0.7, dilated, 0.3, 0)
269
+ result[:,:,1] = cv2.addWeighted(image[:,:,1], 0.7, dilated, 0.3, 0)
270
+ result[:,:,2] = cv2.addWeighted(image[:,:,2], 0.7, dilated, 0.3, 0)
271
+
272
+ return np.clip(result, 0, 255).astype(np.uint8), True, f"Ink bleeding applied (degree={degree})"
273
+ except Exception as e:
274
+ return image, False, f"Ink bleeding error: {str(e)}"
275
+
276
+
277
+ def apply_illumination(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
278
+ """
279
+ Apply illumination variations (uneven lighting)
280
+ degree: 1 (subtle), 2 (moderate), 3 (severe)
281
+ """
282
+ if degree == 0:
283
+ return image, True, "No illumination"
284
+
285
+ try:
286
+ image = encode_to_rgb(image)
287
+ h, w = image.shape[:2]
288
+
289
+ # Create illumination pattern
290
+ intensity = {1: 0.15, 2: 0.3, 3: 0.5}.get(degree, 0.5)
291
+
292
+ # Create gradient-like illumination from corners
293
+ x = np.linspace(-1, 1, w)
294
+ y = np.linspace(-1, 1, h)
295
+ X, Y = np.meshgrid(x, y)
296
+
297
+ # Create vignette effect
298
+ illumination = 1 - intensity * (np.sqrt(X**2 + Y**2) / np.sqrt(2))
299
+ illumination = np.clip(illumination, 0, 1)
300
+
301
+ # Apply to each channel
302
+ result = image.astype(np.float32)
303
+ for c in range(3):
304
+ result[:,:,c] = result[:,:,c] * illumination
305
+
306
+ return np.clip(result, 0, 255).astype(np.uint8), True, f"Illumination applied (intensity={intensity})"
307
+ except Exception as e:
308
+ return image, False, f"Illumination error: {str(e)}"
309
+
310
+
311
+ # ============================================================================
312
+ # SPATIAL PERTURBATIONS
313
+ # ============================================================================
314
+
315
+ def apply_rotation(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
316
+ """
317
+ Apply rotation
318
+ degree: 1 (±5°), 2 (±10°), 3 (±15°)
319
+ """
320
+ if degree == 0:
321
+ return image, True, "No rotation"
322
+
323
+ try:
324
+ image = encode_to_rgb(image)
325
+ h, w = image.shape[:2]
326
+
327
+ # Angle ranges by degree
328
+ angle_ranges = {1: 5, 2: 10, 3: 15}
329
+ max_angle = angle_ranges.get(degree, 15)
330
+
331
+ # Random angle
332
+ angle = np.random.uniform(-max_angle, max_angle)
333
+
334
+ # Rotation matrix
335
+ center = (w // 2, h // 2)
336
+ rotation_matrix = cv2.getRotationMatrix2D(center, angle, 1.0)
337
+
338
+ # Apply rotation with white padding
339
+ rotated = cv2.warpAffine(image, rotation_matrix, (w, h), borderValue=(255, 255, 255))
340
+
341
+ return rotated, True, f"Rotation applied (angle={angle:.1f}°)"
342
+ except Exception as e:
343
+ return image, False, f"Rotation error: {str(e)}"
344
+
345
+
346
+ def apply_keystoning(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
347
+ """
348
+ Apply keystoning effect (perspective distortion)
349
+ degree: 1 (subtle), 2 (moderate), 3 (severe)
350
+ """
351
+ if degree == 0:
352
+ return image, True, "No keystoning"
353
+
354
+ try:
355
+ image = encode_to_rgb(image)
356
+ h, w = image.shape[:2]
357
+
358
+ # Distortion amount
359
+ distortion = {1: w * 0.05, 2: w * 0.1, 3: w * 0.15}.get(degree, w * 0.15)
360
+
361
+ # Source corners
362
+ src_points = np.float32([
363
+ [0, 0],
364
+ [w - 1, 0],
365
+ [0, h - 1],
366
+ [w - 1, h - 1]
367
+ ])
368
+
369
+ # Destination corners (with perspective distortion)
370
+ dst_points = np.float32([
371
+ [distortion, 0],
372
+ [w - 1 - distortion * 0.5, 0],
373
+ [0, h - 1],
374
+ [w - 1, h - 1]
375
+ ])
376
+
377
+ # Get perspective transform
378
+ matrix = cv2.getPerspectiveTransform(src_points, dst_points)
379
+ warped = cv2.warpPerspective(image, matrix, (w, h), borderValue=(255, 255, 255))
380
+
381
+ return warped, True, f"Keystoning applied (distortion={distortion:.1f})"
382
+ except Exception as e:
383
+ return image, False, f"Keystoning error: {str(e)}"
384
+
385
+
386
+ def apply_warping(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
387
+ """
388
+ Apply elastic deformation (warping)
389
+ degree: 1 (mild), 2 (moderate), 3 (severe)
390
+ """
391
+ if degree == 0:
392
+ return image, True, "No warping"
393
+
394
+ try:
395
+ image = encode_to_rgb(image)
396
+ h, w = image.shape[:2]
397
+
398
+ # Warping parameters
399
+ alpha_values = {1: 15, 2: 30, 3: 60}
400
+ sigma_values = {1: 3, 2: 5, 3: 8}
401
+ alpha = alpha_values.get(degree, 60)
402
+ sigma = sigma_values.get(degree, 8)
403
+
404
+ # Generate random displacement field
405
+ dx = np.random.randn(h, w) * sigma
406
+ dy = np.random.randn(h, w) * sigma
407
+
408
+ # Smooth displacement field
409
+ dx = gaussian_filter(dx, sigma=sigma) * alpha
410
+ dy = gaussian_filter(dy, sigma=sigma) * alpha
411
+
412
+ # Create coordinate grids
413
+ x, y = np.meshgrid(np.arange(w), np.arange(h))
414
+
415
+ # Apply displacement
416
+ x_warped = np.clip(x + dx, 0, w - 1).astype(np.float32)
417
+ y_warped = np.clip(y + dy, 0, h - 1).astype(np.float32)
418
+
419
+ # Remap image
420
+ warped = cv2.remap(image, x_warped, y_warped, cv2.INTER_LINEAR, borderValue=(255, 255, 255))
421
+
422
+ return warped, True, f"Warping applied (alpha={alpha}, sigma={sigma})"
423
+ except Exception as e:
424
+ return image, False, f"Warping error: {str(e)}"
425
+
426
+
427
+ # ============================================================================
428
+ # MAIN PERTURBATION APPLICATION
429
+ # ============================================================================
430
+
431
+ PERTURBATION_FUNCTIONS = {
432
+ # Blur
433
+ "defocus": apply_defocus,
434
+ "vibration": apply_vibration,
435
+ # Noise
436
+ "speckle": apply_speckle,
437
+ "texture": apply_texture,
438
+ # Content
439
+ "watermark": apply_watermark,
440
+ "background": apply_background,
441
+ # Inconsistency
442
+ "ink_holdout": apply_ink_holdout,
443
+ "ink_bleeding": apply_ink_bleeding,
444
+ "illumination": apply_illumination,
445
+ # Spatial
446
+ "rotation": apply_rotation,
447
+ "keystoning": apply_keystoning,
448
+ "warping": apply_warping,
449
+ }
450
+
451
+
452
+ def apply_perturbation(
453
+ image: np.ndarray,
454
+ perturbation_type: str,
455
+ degree: int = 1
456
+ ) -> Tuple[np.ndarray, bool, str]:
457
+ """
458
+ Apply a single perturbation to an image
459
+
460
+ Args:
461
+ image: Input image as numpy array (BGR or RGB)
462
+ perturbation_type: Type of perturbation (see PERTURBATION_FUNCTIONS)
463
+ degree: Severity level (0=none, 1=mild, 2=moderate, 3=severe)
464
+
465
+ Returns:
466
+ Tuple of (result_image, success, message)
467
+ """
468
+ if perturbation_type not in PERTURBATION_FUNCTIONS:
469
+ return image, False, f"Unknown perturbation type: {perturbation_type}"
470
+
471
+ if degree < 0 or degree > 3:
472
+ return image, False, f"Invalid degree: {degree} (must be 0-3)"
473
+
474
+ func = PERTURBATION_FUNCTIONS[perturbation_type]
475
+ return func(image, degree)
476
+
477
+
478
+ def apply_multiple_perturbations(
479
+ image: np.ndarray,
480
+ perturbations: List[Tuple[str, int]]
481
+ ) -> Tuple[np.ndarray, bool, str]:
482
+ """
483
+ Apply multiple perturbations in sequence
484
+
485
+ Args:
486
+ image: Input image
487
+ perturbations: List of (type, degree) tuples
488
+
489
+ Returns:
490
+ Tuple of (result_image, success, message)
491
+ """
492
+ result = image.copy()
493
+ messages = []
494
+
495
+ for ptype, degree in perturbations:
496
+ result, success, msg = apply_perturbation(result, ptype, degree)
497
+ messages.append(msg)
498
+ if not success:
499
+ return image, False, f"Failed: {msg}"
500
+
501
+ return result, True, " | ".join(messages)
502
+
503
+
504
+ def get_perturbation_info() -> Dict:
505
+ """Get information about all available perturbations"""
506
+ return {
507
+ "total_perturbations": len(PERTURBATION_FUNCTIONS),
508
+ "types": list(PERTURBATION_FUNCTIONS.keys()),
509
+ "categories": {
510
+ "blur": ["defocus", "vibration"],
511
+ "noise": ["speckle", "texture"],
512
+ "content": ["watermark", "background"],
513
+ "inconsistency": ["ink_holdout", "ink_bleeding", "illumination"],
514
+ "spatial": ["rotation", "keystoning", "warping"]
515
+ }
516
+ }
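For quick reference, here is a minimal usage sketch of the perturbation helpers added above. The import name `perturbations` is an assumption (use whatever filename this module has under `deployment/backend/`); the function names, signatures, and perturbation type strings mirror the code in this diff.

```python
# Minimal usage sketch (assumed import name: perturbations -- adjust to the
# actual filename of this module under deployment/backend/).
import cv2
from perturbations import (
    apply_perturbation,
    apply_multiple_perturbations,
    get_perturbation_info,
)

image = cv2.imread("sample_document.png")  # document image as a numpy array

# Single perturbation at moderate severity (degree 2)
rotated, ok, msg = apply_perturbation(image, "rotation", degree=2)
print(ok, msg)

# Chain several perturbations in sequence
combined, ok, msg = apply_multiple_perturbations(
    image, [("defocus", 1), ("speckle", 2), ("keystoning", 1)]
)
print(ok, msg)

# Inspect the available perturbation categories
print(get_perturbation_info()["categories"])
```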
deployment/backend/register_dino.py ADDED
@@ -0,0 +1,68 @@
1
+ """
2
+ Register DINO detector with MMDET if not already registered
3
+ This allows loading RoDLA models without requiring DCNv3 compilation
4
+ """
5
+
6
+ import sys
7
+ from pathlib import Path
8
+
9
+ def register_dino():
10
+ """Register DINO with MMDET model registry"""
11
+ try:
12
+ from mmdet.models.builder import DETECTORS, BACKBONES, NECKS, HEADS
13
+
14
+ # Check if already registered
15
+ if 'DINO' in DETECTORS.module_dict:
16
+ print("✅ DINO already registered in MMDET")
17
+ return True
18
+
19
+ print("⏳ Registering DINO detector...")
20
+
21
+ # Try to import and register custom models
22
+ # Use absolute path from /home/admin/CV/rodla-academic
23
+ REPO_ROOT = Path("/home/admin/CV/rodla-academic")
24
+ sys.path.insert(0, str(REPO_ROOT / "model"))
25
+ sys.path.insert(0, str(REPO_ROOT / "model" / "ops_dcnv3"))
26
+
27
+ try:
28
+ import mmdet_custom
29
+ if 'DINO' in DETECTORS.module_dict:
30
+ print("✅ DINO registered successfully from mmdet_custom")
31
+ return True
32
+ else:
33
+ print("⚠️ DINO not found in mmdet_custom registry")
34
+ return False
35
+ except ModuleNotFoundError as e:
36
+ if "DCNv3" in str(e):
37
+ print(f"⚠️ Cannot register DINO: DCNv3 module not available")
38
+ print(f" Error: {e}")
39
+ return False
40
+ else:
41
+ print(f"❌ Error importing mmdet_custom: {e}")
42
+ return False
43
+
44
+ except Exception as e:
45
+ print(f"❌ Error registering DINO: {e}")
46
+ return False
47
+
48
+
49
+ def try_load_with_dino_registration(config_path: str, checkpoint_path: str, device: str = "cpu"):
50
+ """Try to load a DINO model, registering it if necessary"""
51
+ from mmdet.apis import init_detector
52
+
53
+ # Try registering DINO first
54
+ dino_registered = register_dino()
55
+
56
+ if not dino_registered:
57
+ print("⚠️ DINO could not be registered")
58
+ print(" Will attempt to load anyway...")
59
+
60
+ # Try to load the model
61
+ try:
62
+ print(f"⏳ Loading model from {checkpoint_path}...")
63
+ model = init_detector(config_path, checkpoint_path, device=device)
64
+ print("✅ Model loaded successfully!")
65
+ return model
66
+ except Exception as e:
67
+ print(f"❌ Failed to load model: {e}")
68
+ return None
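A minimal sketch of how this helper might be called from the backend; the config and checkpoint paths below are placeholders (assumptions), not files referenced in this commit.

```python
# Minimal usage sketch for register_dino.py.
# config_path / checkpoint_path are placeholders; use "cpu" instead of
# "cuda:0" on machines without a GPU.
from register_dino import try_load_with_dino_registration

model = try_load_with_dino_registration(
    config_path="path/to/rodla_config.py",           # placeholder
    checkpoint_path="path/to/rodla_checkpoint.pth",  # placeholder
    device="cuda:0",
)

if model is None:
    # The backend can fall back to heuristic layout detection in this case.
    print("Model unavailable; using fallback detection")
```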
frontend/index.html CHANGED
@@ -106,12 +106,18 @@
106
 
107
  <!-- Action Buttons -->
108
  <section class="section button-section">
109
- <button id="analyzeBtn" class="btn btn-primary" disabled>
110
  [ANALYZE DOCUMENT]
111
  </button>
112
  <button id="resetBtn" class="btn btn-secondary">
113
  [CLEAR ALL]
114
  </button>
 
 
 
 
 
 
115
  </section>
116
 
117
  <!-- Status Section -->
 
106
 
107
  <!-- Action Buttons -->
108
  <section class="section button-section">
109
+ <button id="analyzeBtn" class="btn btn-primary" disabled title="(1) Upload image, (2) Make sure STANDARD mode is selected">
110
  [ANALYZE DOCUMENT]
111
  </button>
112
  <button id="resetBtn" class="btn btn-secondary">
113
  [CLEAR ALL]
114
  </button>
115
+ <p id="modeHint" class="mode-hint" style="display: none; color: #00FF00; margin-top: 10px; font-size: 12px;">
116
+ >>> Use [GENERATE PERTURBATIONS] button above to analyze with perturbations
117
+ </p>
118
+ <p id="standardModeHint" class="mode-hint" style="color: #00FF00; margin-top: 5px; font-size: 12px;">
119
+ >>> STANDARD MODE: Upload an image and click [ANALYZE DOCUMENT] to detect layout
120
+ </p>
121
  </section>
122
 
123
  <!-- Status Section -->
frontend/script.js CHANGED
@@ -56,12 +56,30 @@ function setupEventListeners() {
56
  btn.classList.add('active');
57
  currentMode = btn.dataset.mode;
58
 
59
- // Toggle perturbation options
60
  const pertOptions = document.getElementById('perturbationOptions');
 
 
 
 
61
  if (currentMode === 'perturbation') {
 
62
  pertOptions.style.display = 'block';
 
 
 
 
 
 
63
  } else {
 
64
  pertOptions.style.display = 'none';
 
 
 
 
 
 
65
  }
66
  });
67
  });
@@ -98,7 +116,12 @@ function handleFileSelect(file) {
98
 
99
  currentFile = file;
100
  showPreview(file);
101
- document.getElementById('analyzeBtn').disabled = false;
 
 
 
 
 
102
  }
103
 
104
  function showPreview(file) {
@@ -121,39 +144,6 @@ function showPreview(file) {
121
  // ANALYSIS
122
  // ============================================
123
 
124
- async function handleAnalysis() {
125
- if (!currentFile) {
126
- showError('Please select an image first.');
127
- return;
128
- }
129
-
130
- const analysisType = currentMode === 'standard' ? 'Standard Detection' : 'Perturbation Analysis';
131
- updateStatus(`> INITIATING ${analysisType.toUpperCase()}...`);
132
- showStatus();
133
- hideError();
134
-
135
- try {
136
- const startTime = Date.now();
137
- const results = await runAnalysis();
138
- const processingTime = Date.now() - startTime;
139
-
140
- lastResults = {
141
- ...results,
142
- processingTime: processingTime,
143
- timestamp: new Date().toISOString(),
144
- mode: currentMode,
145
- fileName: currentFile.name
146
- };
147
-
148
- displayResults(results, processingTime);
149
- hideStatus();
150
- } catch (error) {
151
- console.error('[ERROR]', error);
152
- showError(`Analysis failed: ${error.message}`);
153
- hideStatus();
154
- }
155
- }
156
-
157
  async function handleAnalysis() {
158
  if (!currentFile) {
159
  showError('Please select an image first.');
@@ -178,8 +168,12 @@ async function handleAnalysis() {
178
 
179
  const processingTime = Date.now() - startTime;
180
 
 
 
 
181
  lastResults = {
182
  ...results,
 
183
  processingTime: processingTime,
184
  timestamp: new Date().toISOString(),
185
  mode: currentMode,
@@ -202,36 +196,72 @@ async function runAnalysis() {
202
  const threshold = parseFloat(document.getElementById('confidenceThreshold').value);
203
  formData.append('score_threshold', threshold);
204
 
205
- if (currentMode === 'perturbation') {
206
- // Get selected perturbation types
207
- const perturbationTypes = [];
208
- document.querySelectorAll('.checkbox-label input[type="checkbox"]:checked').forEach(checkbox => {
209
- perturbationTypes.push(checkbox.value);
210
- });
 
 
 
 
211
 
212
- if (perturbationTypes.length === 0) {
213
- throw new Error('Please select at least one perturbation type.');
214
- }
 
 
215
 
216
- formData.append('perturbation_types', perturbationTypes.join(','));
 
217
 
218
- updateStatus('> APPLYING PERTURBATIONS...');
219
- return await fetch(`${API_BASE_URL}/detect-with-perturbation`, {
220
- method: 'POST',
221
- body: formData
222
- }).then(r => {
223
- if (!r.ok) throw new Error(`API Error: ${r.status}`);
224
- return r.json();
225
- });
226
- } else {
227
- updateStatus('> RUNNING STANDARD DETECTION...');
228
- return await fetch(`${API_BASE_URL}/detect`, {
 
 
 
 
 
 
229
  method: 'POST',
230
  body: formData
231
- }).then(r => {
232
- if (!r.ok) throw new Error(`API Error: ${r.status}`);
233
- return r.json();
234
  });
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
235
  }
236
  }
237
 
@@ -291,16 +321,27 @@ function displayPerturbations(results) {
291
  }
292
 
293
  let html = `<div style="font-size: 0.9em; color: #00FFFF; margin-bottom: 15px; padding: 10px; border: 1px dashed #00FFFF;">
294
- TOTAL: 12 Perturbation Types × 3 Degree Levels (1=Mild, 2=Moderate, 3=Severe)
295
  </div>`;
296
 
 
 
 
297
  // Add original
 
 
 
 
 
298
  html += `
299
  <div class="perturbation-grid-section">
300
  <div class="perturbation-type-label">[ORIGINAL IMAGE]</div>
301
  <div style="padding: 10px;">
302
  <img src="data:image/png;base64,${results.perturbations.original.original}"
303
- alt="Original" class="perturbation-preview-image" style="width: 200px; height: auto;">
 
 
 
304
  </div>
305
  </div>
306
  `;
@@ -337,13 +378,24 @@ function displayPerturbations(results) {
337
  const degreeLabel = ['MILD', 'MODERATE', 'SEVERE'][degree - 1];
338
 
339
  if (results.perturbations[ptype][degreeKey]) {
 
 
 
 
 
 
340
  html += `
341
  <div style="text-align: center;">
342
  <div style="color: #00FFFF; font-size: 0.8em; margin-bottom: 5px;">DEG ${degree}: ${degreeLabel}</div>
343
  <img src="data:image/png;base64,${results.perturbations[ptype][degreeKey]}"
344
  alt="${ptype} degree ${degree}"
345
  class="perturbation-preview-image"
346
- style="width: 150px; height: auto; border: 1px solid #008080; padding: 2px;">
 
 
 
 
 
347
  </div>
348
  `;
349
  }
@@ -357,6 +409,33 @@ function displayPerturbations(results) {
357
  });
358
 
359
  container.innerHTML = html;
  section.style.display = 'block';
361
  section.scrollIntoView({ behavior: 'smooth' });
362
  }
@@ -376,11 +455,17 @@ function displayResults(results, processingTime) {
376
 
377
  document.getElementById('detectionCount').textContent = detections.length;
378
  document.getElementById('avgConfidence').textContent = `${avgConfidence}%`;
379
- document.getElementById('processingTime').textContent = `${processingTime}ms`;
380
 
381
- // Display image
382
- if (results.annotated_image) {
383
- document.getElementById('resultImage').src = `data:image/png;base64,${results.annotated_image}`;
 
 
 
 
 
 
384
  }
385
 
386
  // Class distribution
@@ -390,13 +475,114 @@ function displayResults(results, processingTime) {
390
  displayDetectionsTable(detections);
391
 
392
  // Metrics
393
- displayMetrics(results.metrics || {});
394
 
395
  // Show results section
396
  document.getElementById('resultsSection').style.display = 'block';
397
  document.getElementById('resultsSection').scrollIntoView({ behavior: 'smooth' });
398
  }
399
400
  function displayClassDistribution(distribution) {
401
  const chart = document.getElementById('classChart');
402
 
@@ -429,30 +615,44 @@ function displayDetectionsTable(detections) {
429
  const tbody = document.getElementById('detectionsTableBody');
430
 
431
  if (detections.length === 0) {
432
- tbody.innerHTML = '<tr><td colspan="4" class="no-data">NO DETECTIONS</td></tr>';
433
  return;
434
  }
435
 
436
  let html = '';
437
  detections.slice(0, 50).forEach((det, idx) => {
438
- const box = det.box || {};
439
- const x1 = box.x1 ? box.x1.toFixed(0) : '?';
440
- const y1 = box.y1 ? box.y1.toFixed(0) : '?';
441
- const x2 = box.x2 ? box.x2.toFixed(0) : '?';
442
- const y2 = box.y2 ? box.y2.toFixed(0) : '?';
 
 
 
 
 
 
 
 
 
 
 
 
 
 
443
 
444
  html += `
445
  <tr>
446
  <td>${idx + 1}</td>
447
- <td>${det.class || 'Unknown'}</td>
448
- <td>${(det.confidence * 100).toFixed(1)}%</td>
449
- <td>[${x1},${y1},${x2},${y2}]</td>
450
  </tr>
451
  `;
452
  });
453
 
454
  if (detections.length > 50) {
455
- html += `<tr><td colspan="4" class="no-data">... and ${detections.length - 50} more</td></tr>`;
456
  }
457
 
458
  tbody.innerHTML = html;
@@ -658,5 +858,76 @@ async function checkBackendStatus() {
658
  // UTILITY FUNCTIONS
659
  // ============================================
660
661
  console.log('[RODLA] Frontend loaded successfully. Ready for analysis.');
662
  console.log('[RODLA] Demo mode available if backend is unavailable.');
 
56
  btn.classList.add('active');
57
  currentMode = btn.dataset.mode;
58
 
59
+ // Toggle perturbation options and hint
60
  const pertOptions = document.getElementById('perturbationOptions');
61
+ const modeHint = document.getElementById('modeHint');
62
+ const standardModeHint = document.getElementById('standardModeHint');
63
+ const analyzeBtn = document.getElementById('analyzeBtn');
64
+
65
  if (currentMode === 'perturbation') {
66
+ // PERTURBATION MODE - allow analysis of original or perturbation images
67
  pertOptions.style.display = 'block';
68
+ modeHint.style.display = 'block';
69
+ standardModeHint.style.display = 'none';
70
+ analyzeBtn.style.opacity = currentFile ? '1' : '0.5';
71
+ analyzeBtn.style.cursor = currentFile ? 'pointer' : 'not-allowed';
72
+ analyzeBtn.disabled = !currentFile;
73
+ analyzeBtn.title = 'Click to generate perturbations, then click on any image to analyze it';
74
  } else {
75
+ // STANDARD MODE
76
  pertOptions.style.display = 'none';
77
+ modeHint.style.display = 'none';
78
+ standardModeHint.style.display = 'block';
79
+ analyzeBtn.style.opacity = currentFile ? '1' : '0.5';
80
+ analyzeBtn.style.cursor = currentFile ? 'pointer' : 'not-allowed';
81
+ analyzeBtn.disabled = !currentFile;
82
+ analyzeBtn.title = 'Click to analyze the document layout';
83
  }
84
  });
85
  });
 
116
 
117
  currentFile = file;
118
  showPreview(file);
119
+
120
+ // Enable analyze button only if in standard mode
121
+ const analyzeBtn = document.getElementById('analyzeBtn');
122
+ if (currentMode === 'standard') {
123
+ analyzeBtn.disabled = false;
124
+ }
125
  }
126
 
127
  function showPreview(file) {
 
144
  // ANALYSIS
145
  // ============================================
146
147
  async function handleAnalysis() {
148
  if (!currentFile) {
149
  showError('Please select an image first.');
 
168
 
169
  const processingTime = Date.now() - startTime;
170
 
171
+ // Read original image as base64 for annotation
172
+ const originalImageBase64 = await readFileAsBase64(currentFile);
173
+
174
  lastResults = {
175
  ...results,
176
+ original_image: originalImageBase64,
177
  processingTime: processingTime,
178
  timestamp: new Date().toISOString(),
179
  mode: currentMode,
 
196
  const threshold = parseFloat(document.getElementById('confidenceThreshold').value);
197
  formData.append('score_threshold', threshold);
198
 
199
+ // Only standard detection mode
200
+ updateStatus('> RUNNING STANDARD DETECTION...');
201
+ return await fetch(`${API_BASE_URL}/detect`, {
202
+ method: 'POST',
203
+ body: formData
204
+ }).then(r => {
205
+ if (!r.ok) throw new Error(`API Error: ${r.status}`);
206
+ return r.json();
207
+ });
208
+ }
209
 
210
+ async function analyzePerturbationImage(imageBase64, perturbationType, degree) {
211
+ // Analyze a specific perturbation image
212
+ updateStatus(`> ANALYZING ${perturbationType.toUpperCase()} (DEGREE ${degree})...`);
213
+ showStatus();
214
+ hideError();
215
 
216
+ try {
217
+ const startTime = Date.now();
218
 
219
+ // Convert base64 to blob and create file
220
+ const binaryString = atob(imageBase64);
221
+ const bytes = new Uint8Array(binaryString.length);
222
+ for (let i = 0; i < binaryString.length; i++) {
223
+ bytes[i] = binaryString.charCodeAt(i);
224
+ }
225
+ const blob = new Blob([bytes], { type: 'image/png' });
226
+ const file = new File([blob], `${perturbationType}_degree_${degree}.png`, { type: 'image/png' });
227
+
228
+ // Create form data
229
+ const formData = new FormData();
230
+ formData.append('file', file);
231
+ const threshold = parseFloat(document.getElementById('confidenceThreshold').value);
232
+ formData.append('score_threshold', threshold);
233
+
234
+ // Send to backend
235
+ const response = await fetch(`${API_BASE_URL}/detect`, {
236
  method: 'POST',
237
  body: formData
 
 
 
238
  });
239
+
240
+ if (!response.ok) {
241
+ throw new Error(`API Error: ${response.status}`);
242
+ }
243
+
244
+ const results = await response.json();
245
+ const processingTime = Date.now() - startTime;
246
+
247
+ // Store results with perturbation info
248
+ lastResults = {
249
+ ...results,
250
+ original_image: imageBase64,
251
+ processingTime: processingTime,
252
+ timestamp: new Date().toISOString(),
253
+ mode: 'perturbation',
254
+ perturbation_type: perturbationType,
255
+ perturbation_degree: degree,
256
+ fileName: `${perturbationType}_degree_${degree}.png`
257
+ };
258
+
259
+ displayResults(results, processingTime);
260
+ hideStatus();
261
+ } catch (error) {
262
+ console.error('[ERROR]', error);
263
+ showError(`Perturbation analysis failed: ${error.message}`);
264
+ hideStatus();
265
  }
266
  }
267
 
 
321
  }
322
 
323
  let html = `<div style="font-size: 0.9em; color: #00FFFF; margin-bottom: 15px; padding: 10px; border: 1px dashed #00FFFF;">
324
+ TOTAL: 12 Perturbation Types × 3 Degree Levels (1=Mild, 2=Moderate, 3=Severe) - CLICK ON ANY IMAGE TO ANALYZE
325
  </div>`;
326
 
327
+ // Store all perturbation images for clickable analysis
328
+ const perturbationImages = [];
329
+
330
  // Add original
331
+ perturbationImages.push({
332
+ name: 'original',
333
+ image: results.perturbations.original.original
334
+ });
335
+
336
  html += `
337
  <div class="perturbation-grid-section">
338
  <div class="perturbation-type-label">[ORIGINAL IMAGE]</div>
339
  <div style="padding: 10px;">
340
  <img src="data:image/png;base64,${results.perturbations.original.original}"
341
+ alt="Original" class="perturbation-preview-image"
342
+ data-perturbation="original" data-degree="0"
343
+ style="width: 200px; height: auto; cursor: pointer; border: 2px solid transparent; transition: all 0.2s;"
344
+ title="Click to analyze this image">
345
  </div>
346
  </div>
347
  `;
 
378
  const degreeLabel = ['MILD', 'MODERATE', 'SEVERE'][degree - 1];
379
 
380
  if (results.perturbations[ptype][degreeKey]) {
381
+ perturbationImages.push({
382
+ name: ptype,
383
+ degree: degree,
384
+ image: results.perturbations[ptype][degreeKey]
385
+ });
386
+
387
  html += `
388
  <div style="text-align: center;">
389
  <div style="color: #00FFFF; font-size: 0.8em; margin-bottom: 5px;">DEG ${degree}: ${degreeLabel}</div>
390
  <img src="data:image/png;base64,${results.perturbations[ptype][degreeKey]}"
391
  alt="${ptype} degree ${degree}"
392
  class="perturbation-preview-image"
393
+ data-perturbation="${ptype}"
394
+ data-degree="${degree}"
395
+ style="width: 150px; height: auto; border: 2px solid #008080; padding: 2px; cursor: pointer; transition: all 0.2s;"
396
+ title="Click to analyze this perturbation"
397
+ onmouseover="this.style.borderColor='#00FF00'; this.style.boxShadow='0 0 10px #00FF00';"
398
+ onmouseout="this.style.borderColor='#008080'; this.style.boxShadow='none';">
399
  </div>
400
  `;
401
  }
 
409
  });
410
 
411
  container.innerHTML = html;
412
+
413
+ // Add click handlers to perturbation images
414
+ const perturbationImgs = container.querySelectorAll('[data-perturbation]');
415
+ perturbationImgs.forEach(img => {
416
+ img.addEventListener('click', async function() {
417
+ const perturbationType = this.dataset.perturbation;
418
+ const degree = this.dataset.degree;
419
+
420
+ // Find the image data
421
+ let imageBase64 = null;
422
+ if (perturbationType === 'original') {
423
+ imageBase64 = results.perturbations.original.original;
424
+ } else {
425
+ const degreeKey = `degree_${degree}`;
426
+ imageBase64 = results.perturbations[perturbationType][degreeKey];
427
+ }
428
+
429
+ if (!imageBase64) {
430
+ showError('Failed to load image for analysis');
431
+ return;
432
+ }
433
+
434
+ // Convert base64 to File object and analyze
435
+ await analyzePerturbationImage(imageBase64, perturbationType, degree);
436
+ });
437
+ });
438
+
439
  section.style.display = 'block';
440
  section.scrollIntoView({ behavior: 'smooth' });
441
  }
 
455
 
456
  document.getElementById('detectionCount').textContent = detections.length;
457
  document.getElementById('avgConfidence').textContent = `${avgConfidence}%`;
458
+ document.getElementById('processingTime').textContent = `${processingTime.toFixed(0)}ms`;
459
 
460
+ // Draw annotated image with bounding boxes
461
+ if (lastResults && lastResults.original_image) {
462
+ drawAnnotatedImage(lastResults.original_image, detections, results.image_width, results.image_height);
463
+ } else {
464
+ // Fallback: try to use previewImage
465
+ const previewImg = document.getElementById('previewImage');
466
+ if (previewImg && previewImg.src) {
467
+ drawAnnotatedImageFromSrc(previewImg.src, detections, results.image_width, results.image_height);
468
+ }
469
  }
470
 
471
  // Class distribution
 
475
  displayDetectionsTable(detections);
476
 
477
  // Metrics
478
+ displayMetrics(results, processingTime);
479
 
480
  // Show results section
481
  document.getElementById('resultsSection').style.display = 'block';
482
  document.getElementById('resultsSection').scrollIntoView({ behavior: 'smooth' });
483
  }
484
 
485
+ function drawAnnotatedImage(imageBase64, detections, imgWidth, imgHeight) {
486
+ // Draw bounding boxes on image and display
487
+ const canvas = document.createElement('canvas');
488
+ const ctx = canvas.getContext('2d');
489
+
490
+ // Load image
491
+ const img = new Image();
492
+ img.onload = () => {
493
+ canvas.width = img.width;
494
+ canvas.height = img.height;
495
+ ctx.drawImage(img, 0, 0);
496
+
497
+ // Draw bounding boxes
498
+ detections.forEach((det, idx) => {
499
+ const bbox = det.bbox || {};
500
+
501
+ // Convert normalized coordinates to pixel coordinates
502
+ const x = bbox.x * img.width;
503
+ const y = bbox.y * img.height;
504
+ const w = bbox.width * img.width;
505
+ const h = bbox.height * img.height;
506
+
507
+ // Draw box
508
+ ctx.strokeStyle = '#00FF00';
509
+ ctx.lineWidth = 2;
510
+ ctx.strokeRect(x, y, w, h);
511
+
512
+ // Draw label
513
+ const label = `${det.class_name || 'Unknown'} (${(det.confidence * 100).toFixed(1)}%)`;
514
+ const fontSize = Math.max(12, Math.min(18, Math.floor(img.height / 30)));
515
+ ctx.font = `bold ${fontSize}px monospace`;
516
+ ctx.fillStyle = '#000000';
517
+ ctx.fillRect(x, y - fontSize - 5, ctx.measureText(label).width + 10, fontSize + 5);
518
+ ctx.fillStyle = '#00FF00';
519
+ ctx.fillText(label, x + 5, y - 5);
520
+ });
521
+
522
+ // Display canvas as image
523
+ const resultImage = document.getElementById('resultImage');
524
+ resultImage.src = canvas.toDataURL('image/png');
525
+ resultImage.style.display = 'block';
526
+ };
527
+
528
+ img.src = `data:image/png;base64,${imageBase64}`;
529
+ }
530
+
531
+ function drawAnnotatedImageFromSrc(imageSrc, detections, imgWidth, imgHeight) {
532
+ // Draw bounding boxes on image from data URL
533
+ const canvas = document.createElement('canvas');
534
+ const ctx = canvas.getContext('2d');
535
+
536
+ const img = new Image();
537
+ img.onload = () => {
538
+ canvas.width = img.width;
539
+ canvas.height = img.height;
540
+ ctx.drawImage(img, 0, 0);
541
+
542
+ // Draw bounding boxes with colors based on class
543
+ const colors = ['#00FF00', '#00FFFF', '#FF00FF', '#FFFF00', '#FF6600', '#00FF99'];
544
+
545
+ detections.forEach((det, idx) => {
546
+ const bbox = det.bbox || {};
547
+
548
+ // Convert normalized coordinates to pixel coordinates
549
+ const x = bbox.x * img.width;
550
+ const y = bbox.y * img.height;
551
+ const w = bbox.width * img.width;
552
+ const h = bbox.height * img.height;
553
+
554
+ // Select color
555
+ const color = colors[idx % colors.length];
556
+
557
+ // Draw box
558
+ ctx.strokeStyle = color;
559
+ ctx.lineWidth = 2;
560
+ ctx.strokeRect(x, y, w, h);
561
+
562
+ // Draw label background
563
+ const label = `${idx + 1}. ${det.class_name || 'Unknown'} (${(det.confidence * 100).toFixed(1)}%)`;
564
+ const fontSize = 14;
565
+ ctx.font = `bold ${fontSize}px monospace`;
566
+ const textWidth = ctx.measureText(label).width;
567
+
568
+ ctx.fillStyle = 'rgba(0, 0, 0, 0.7)';
569
+ ctx.fillRect(x, y - fontSize - 8, textWidth + 8, fontSize + 6);
570
+ ctx.fillStyle = color;
571
+ ctx.fillText(label, x + 4, y - 4);
572
+ });
573
+
574
+ // Display canvas as image
575
+ const resultImage = document.getElementById('resultImage');
576
+ resultImage.src = canvas.toDataURL('image/png');
577
+ resultImage.style.display = 'block';
578
+ resultImage.style.maxWidth = '100%';
579
+ resultImage.style.height = 'auto';
580
+ resultImage.style.border = '2px solid #00FF00';
581
+ };
582
+
583
+ img.src = imageSrc;
584
+ }
585
+
586
  function displayClassDistribution(distribution) {
587
  const chart = document.getElementById('classChart');
588
 
 
615
  const tbody = document.getElementById('detectionsTableBody');
616
 
617
  if (detections.length === 0) {
618
+ tbody.innerHTML = '<tr><td colspan="5" class="no-data">NO DETECTIONS</td></tr>';
619
  return;
620
  }
621
 
622
  let html = '';
623
  detections.slice(0, 50).forEach((det, idx) => {
624
+ // Handle different bbox formats
625
+ const bbox = det.bbox || det.box || {};
626
+
627
+ // Convert normalized coordinates to pixel coordinates
628
+ let x = '?', y = '?', w = '?', h = '?';
629
+ if (bbox.x !== undefined && bbox.y !== undefined && bbox.width !== undefined && bbox.height !== undefined) {
630
+ x = bbox.x.toFixed(3);
631
+ y = bbox.y.toFixed(3);
632
+ w = bbox.width.toFixed(3);
633
+ h = bbox.height.toFixed(3);
634
+ } else if (bbox.x1 !== undefined && bbox.y1 !== undefined && bbox.x2 !== undefined && bbox.y2 !== undefined) {
635
+ x = bbox.x1.toFixed(0);
636
+ y = bbox.y1.toFixed(0);
637
+ w = (bbox.x2 - bbox.x1).toFixed(0);
638
+ h = (bbox.y2 - bbox.y1).toFixed(0);
639
+ }
640
+
641
+ const className = det.class_name || det.class || 'Unknown';
642
+ const confidence = det.confidence ? (det.confidence * 100).toFixed(1) : '0.0';
643
 
644
  html += `
645
  <tr>
646
  <td>${idx + 1}</td>
647
+ <td>${className}</td>
648
+ <td>${confidence}%</td>
649
+ <td title="x: ${x}, y: ${y}, w: ${w}, h: ${h}">[${x.substring(0,5)}, ${y.substring(0,5)}, ${w.substring(0,5)}, ${h.substring(0,5)}]</td>
650
  </tr>
651
  `;
652
  });
653
 
654
  if (detections.length > 50) {
655
+ html += `<tr><td colspan="5" class="no-data">... and ${detections.length - 50} more</td></tr>`;
656
  }
657
 
658
  tbody.innerHTML = html;
 
858
  // UTILITY FUNCTIONS
859
  // ============================================
860
 
861
+ function readFileAsBase64(file) {
862
+ return new Promise((resolve, reject) => {
863
+ const reader = new FileReader();
864
+ reader.onload = () => {
865
+ const result = reader.result;
866
+ // Extract base64 data without the data:image/png;base64, prefix
867
+ const base64 = result.split(',')[1];
868
+ resolve(base64);
869
+ };
870
+ reader.onerror = reject;
871
+ reader.readAsDataURL(file);
872
+ });
873
+ }
874
+
875
+ function displayMetrics(results, processingTime) {
876
+ const metricsDiv = document.getElementById('metricsBox');
877
+ if (!metricsDiv) return;
878
+
879
+ const detections = results.detections || [];
880
+ const confidences = detections.map(d => d.confidence || 0);
881
+ const avgConfidence = confidences.length > 0
882
+ ? (confidences.reduce((a, b) => a + b) / confidences.length * 100).toFixed(1)
883
+ : 0;
884
+ const maxConfidence = confidences.length > 0
885
+ ? (Math.max(...confidences) * 100).toFixed(1)
886
+ : 0;
887
+ const minConfidence = confidences.length > 0
888
+ ? (Math.min(...confidences) * 100).toFixed(1)
889
+ : 0;
890
+
891
+ // Determine detection mode
892
+ let detectionMode = 'HEURISTIC (CPU Fallback)';
893
+ let modelType = 'Heuristic Layout Detection';
894
+
895
+ if (results.detection_mode === 'mmdet') {
896
+ detectionMode = 'MMDET Neural Network';
897
+ modelType = 'DINO (InternImage-XL)';
898
+ }
899
+
900
+ const metricsHTML = `
901
+ <div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(200px, 1fr)); gap: 12px;">
902
+ <div style="background: #1a1a1a; border: 2px solid #00FF00; border-radius: 4px; padding: 12px;">
903
+ <div style="color: #00FFFF; font-size: 12px; font-weight: bold;">DETECTION MODE</div>
904
+ <div style="color: #00FF00; font-size: 14px; margin-top: 4px;">${detectionMode}</div>
905
+ </div>
906
+ <div style="background: #1a1a1a; border: 2px solid #00FF00; border-radius: 4px; padding: 12px;">
907
+ <div style="color: #00FFFF; font-size: 12px; font-weight: bold;">MODEL TYPE</div>
908
+ <div style="color: #00FF00; font-size: 14px; margin-top: 4px;">${modelType}</div>
909
+ </div>
910
+ <div style="background: #1a1a1a; border: 2px solid #00FF00; border-radius: 4px; padding: 12px;">
911
+ <div style="color: #00FFFF; font-size: 12px; font-weight: bold;">PROCESSING TIME</div>
912
+ <div style="color: #00FF00; font-size: 14px; margin-top: 4px;">${processingTime.toFixed(0)}ms</div>
913
+ </div>
914
+ <div style="background: #1a1a1a; border: 2px solid #00FF00; border-radius: 4px; padding: 12px;">
915
+ <div style="color: #00FFFF; font-size: 12px; font-weight: bold;">AVG CONFIDENCE</div>
916
+ <div style="color: #00FF00; font-size: 14px; margin-top: 4px;">${avgConfidence}%</div>
917
+ </div>
918
+ <div style="background: #1a1a1a; border: 2px solid #00FF00; border-radius: 4px; padding: 12px;">
919
+ <div style="color: #00FFFF; font-size: 12px; font-weight: bold;">MAX CONFIDENCE</div>
920
+ <div style="color: #00FF00; font-size: 14px; margin-top: 4px;">${maxConfidence}%</div>
921
+ </div>
922
+ <div style="background: #1a1a1a; border: 2px solid #00FF00; border-radius: 4px; padding: 12px;">
923
+ <div style="color: #00FFFF; font-size: 12px; font-weight: bold;">MIN CONFIDENCE</div>
924
+ <div style="color: #00FF00; font-size: 14px; margin-top: 4px;">${minConfidence}%</div>
925
+ </div>
926
+ </div>
927
+ `;
928
+
929
+ metricsDiv.innerHTML = metricsHTML;
930
+ }
931
+
932
  console.log('[RODLA] Frontend loaded successfully. Ready for analysis.');
933
  console.log('[RODLA] Demo mode available if backend is unavailable.');
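The `/detect` endpoint that `script.js` posts to can also be exercised directly. A minimal sketch with `requests`; the host/port follow the startup scripts, and the field names (`file`, `score_threshold`) and response keys (`detections`, `class_name`/`class`, `confidence`) mirror the frontend code above.

```python
# Minimal sketch of calling the backend the same way the frontend does.
import requests

API_BASE_URL = "http://localhost:8000"  # default backend port from the startup script

with open("sample_document.png", "rb") as f:
    resp = requests.post(
        f"{API_BASE_URL}/detect",
        files={"file": ("sample_document.png", f, "image/png")},
        data={"score_threshold": 0.3},
        timeout=120,
    )
resp.raise_for_status()
result = resp.json()

for det in result.get("detections", []):
    name = det.get("class_name") or det.get("class") or "Unknown"
    print(name, det.get("confidence"))
```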
setup.sh ADDED
@@ -0,0 +1,59 @@
1
+ #!/bin/bash
2
+
3
+ # Exit immediately if a command exits with a non-zero status
4
+ set -e
5
+
6
+ # --- Configuration ---
7
+ ENV_NAME="RoDLA"
8
+ ENV_PATH="./$ENV_NAME"
9
+
10
+ # URLs for PyTorch/Detectron2 wheels
11
+ TORCH_VERSION="1.11.0+cu113"
12
+ TORCH_URL="https://download.pytorch.org/whl/cu113/torch_stable.html"
13
+
14
+ DETECTRON2_VERSION="cu113/torch1.11"
15
+ DETECTRON2_URL="https://dl.fbaipublicfiles.com/detectron2/wheels/$DETECTRON2_VERSION/index.html"
16
+
17
+ DCNV3_URL="https://github.com/OpenGVLab/InternImage/releases/download/whl_files/DCNv3-1.0+cu113torch1.11.0-cp37-cp37m-linux_x86_64.whl"
18
+
19
+ # Check if the environment exists and activate it
20
+ if [ ! -d "$ENV_PATH" ]; then
21
+ echo "❌ Error: Virtual environment '$ENV_NAME' not found at '$ENV_PATH'."
22
+ echo "Please ensure you have created the environment using 'python3.7 -m venv $ENV_NAME' first."
23
+ exit 1
24
+ fi
25
+
26
+ echo "--- 🛠️ Activating Virtual Environment: $ENV_NAME ---"
27
+ # "Activate" the target environment by prepending its bin directory to PATH;
28
+ # this is more reliable in a script than 'source', which only affects the current shell session.
29
+ export PATH="$ENV_PATH/bin:$PATH"
30
+
31
+ # Check that the environment's python is now the one found on PATH
32
+ if ! command -v python | grep -q "$ENV_PATH"; then
33
+ echo "❌ Failed to set environment path. Aborting."
34
+ exit 1
35
+ fi
36
+
37
+ echo "--- 🗑️ Uninstalling Old PyTorch Packages (if present) ---"
38
+ # Use the environment's pip (now in $PATH)
39
+ pip uninstall torch torchvision torchaudio -y || true
40
+
41
+ echo "--- 📦 Installing PyTorch 1.11.0+cu113 and Core Dependencies ---"
42
+ # Note: We are using the correct PyTorch 1.11.0 versions that match the DCNv3 wheel.
43
+ pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 -f "$TORCH_URL"
44
+
45
+ echo "--- 📦 Installing OpenMMLab and Other Benchmarking Dependencies ---"
46
+ pip install -U openmim
47
+ # Ensure the full path to python is used for detectron2 (though it should be the venv python now)
48
+ python -m pip install detectron2 -f "$DETECTRON2_URL"
49
+ mim install mmcv-full==1.5.0
50
+ pip install timm==0.6.11 mmdet==2.28.1
51
+ pip install Pillow==9.5.0
52
+ pip install opencv-python termcolor yacs pyyaml scipy
53
+
54
+ echo "--- 🚀 Installing Compatible DCNv3 Wheel ---"
55
+ pip install "$DCNV3_URL"
56
+
57
+ echo "--- ✅ Setup Complete ---"
58
+ echo "The $ENV_NAME environment is configured. To use it, run:"
59
+ echo "source $ENV_PATH/bin/activate"
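After `setup.sh` finishes, a quick sanity check of the pinned stack can be run from the activated environment. A minimal sketch; the expected versions mirror the pins above, and the `DCNv3` import name is an assumption based on the wheel name and the error handling in `register_dino.py`.

```python
# Quick environment sanity check after running setup.sh.
import torch
import mmcv
import mmdet

print("torch:", torch.__version__)      # expected: 1.11.0+cu113
print("mmcv-full:", mmcv.__version__)   # expected: 1.5.0
print("mmdet:", mmdet.__version__)      # expected: 2.28.1
print("CUDA available:", torch.cuda.is_available())

try:
    import DCNv3  # import name assumed from the wheel / register_dino.py error message
    print("DCNv3 wheel import OK")
except ImportError as exc:
    print("DCNv3 missing:", exc)
```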
start.sh DELETED
@@ -1,143 +0,0 @@
1
- #!/bin/bash
2
- # RoDLA Complete Startup Script
3
- # Starts both frontend and backend services
4
-
5
- set -e
6
-
7
- # Colors
8
- RED='\033[0;31m'
9
- GREEN='\033[0;32m'
10
- YELLOW='\033[1;33m'
11
- BLUE='\033[0;34m'
12
- NC='\033[0m' # No Color
13
-
14
- # Header
15
- echo -e "${BLUE}╔════════════════════════════════════════════════════════════╗${NC}"
16
- echo -e "${BLUE}║          RoDLA DOCUMENT LAYOUT ANALYSIS - 90s Edition          ║${NC}"
17
- echo -e "${BLUE}║              Startup Script (Frontend + Backend)               ║${NC}"
18
- echo -e "${BLUE}╚════════════════════════════════════════════════════════════╝${NC}"
19
- echo ""
20
-
21
- # Get script directory
22
- SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
23
- cd "$SCRIPT_DIR"
24
-
25
- # Check if required directories exist
26
- if [ ! -d "deployment/backend" ]; then
27
- echo -e "${RED}ERROR: deployment/backend directory not found${NC}"
28
- exit 1
29
- fi
30
-
31
- if [ ! -d "frontend" ]; then
32
- echo -e "${RED}ERROR: frontend directory not found${NC}"
33
- exit 1
34
- fi
35
-
36
- # Check if Python is available
37
- if ! command -v python3 &> /dev/null; then
38
- echo -e "${RED}ERROR: Python 3 is not installed${NC}"
39
- exit 1
40
- fi
41
-
42
- echo -e "${GREEN}✓ System check passed${NC}"
43
- echo ""
44
-
45
- # Function to handle Ctrl+C
46
- cleanup() {
47
- echo ""
48
- echo -e "${YELLOW}Shutting down RoDLA...${NC}"
49
- kill $BACKEND_PID 2>/dev/null || true
50
- kill $FRONTEND_PID 2>/dev/null || true
51
- echo -e "${GREEN}✓ Services stopped${NC}"
52
- exit 0
53
- }
54
-
55
- # Set trap for Ctrl+C
56
- trap cleanup SIGINT
57
-
58
- # Check ports
59
- check_port() {
60
- if lsof -Pi :$1 -sTCP:LISTEN -t >/dev/null 2>&1 ; then
61
- return 0
62
- else
63
- return 1
64
- fi
65
- }
66
-
67
- # Start Backend
68
- echo -e "${BLUE}[1/2] Starting Backend API (port 8000)...${NC}"
69
-
70
- if check_port 8000; then
71
- echo -e "${YELLOW}⚠ Port 8000 is already in use${NC}"
72
- read -p "Continue anyway? (y/n) " -n 1 -r
73
- echo
74
- if [[ ! $REPLY =~ ^[Yy]$ ]]; then
75
- exit 1
76
- fi
77
- fi
78
-
79
- cd "$SCRIPT_DIR/deployment/backend"
80
- python3 backend.py > /tmp/rodla_backend.log 2>&1 &
81
- BACKEND_PID=$!
82
- echo -e "${GREEN}✓ Backend started (PID: $BACKEND_PID)${NC}"
83
- sleep 2
84
-
85
- # Check if backend started successfully
86
- if ! kill -0 $BACKEND_PID 2>/dev/null; then
87
- echo -e "${RED}✗ Backend failed to start${NC}"
88
- echo -e "${RED}Check logs: cat /tmp/rodla_backend.log${NC}"
89
- exit 1
90
- fi
91
-
92
- # Start Frontend
93
- echo -e "${BLUE}[2/2] Starting Frontend Server (port 8080)...${NC}"
94
-
95
- if check_port 8080; then
96
- echo -e "${YELLOW}⚠ Port 8080 is already in use${NC}"
97
- read -p "Continue anyway? (y/n) " -n 1 -r
98
- echo
99
- if [[ ! $REPLY =~ ^[Yy]$ ]]; then
100
- kill $BACKEND_PID
101
- exit 1
102
- fi
103
- fi
104
-
105
- cd "$SCRIPT_DIR/frontend"
106
- python3 server.py > /tmp/rodla_frontend.log 2>&1 &
107
- FRONTEND_PID=$!
108
- echo -e "${GREEN}✓ Frontend started (PID: $FRONTEND_PID)${NC}"
109
- sleep 1
110
-
111
- # Summary
112
- echo ""
113
- echo -e "${BLUE}════════════════════════════════════════════════════════════${NC}"
114
- echo -e "${GREEN}✓ RoDLA System is Ready!${NC}"
115
- echo -e "${BLUE}════════════════════════════════════════════════════════════${NC}"
116
- echo ""
117
- echo -e "${YELLOW}Access Points:${NC}"
118
- echo -e " 🌐 Frontend: ${BLUE}http://localhost:8080${NC}"
119
- echo -e " 🔌 Backend: ${BLUE}http://localhost:8000${NC}"
120
- echo -e " 📚 API Docs: ${BLUE}http://localhost:8000/docs${NC}"
121
- echo ""
122
- echo -e "${YELLOW}Services:${NC}"
123
- echo -e " Backend PID: $BACKEND_PID"
124
- echo -e " Frontend PID: $FRONTEND_PID"
125
- echo ""
126
- echo -e "${YELLOW}Logs:${NC}"
127
- echo -e " Backend: ${BLUE}tail -f /tmp/rodla_backend.log${NC}"
128
- echo -e " Frontend: ${BLUE}tail -f /tmp/rodla_frontend.log${NC}"
129
- echo ""
130
- echo -e "${YELLOW}Usage:${NC}"
131
- echo -e " 1. Open ${BLUE}http://localhost:8080${NC} in your browser"
132
- echo -e " 2. Upload a document image"
133
- echo -e " 3. Select analysis mode (Standard or Perturbation)"
134
- echo -e " 4. Click [ANALYZE DOCUMENT]"
135
- echo -e " 5. Download results"
136
- echo ""
137
- echo -e "${YELLOW}Exit:${NC}"
138
- echo -e " Press ${BLUE}Ctrl+C${NC} to stop all services"
139
- echo ""
140
-
141
- # Keep running
142
- wait
143
-