zeeshan committed on
Commit
0c00a14
Β·
1 Parent(s): c4ed69a

Inference Fix

deployment/backend/README.md DELETED
@@ -1,2292 +0,0 @@
1
- # RoDLA Document Layout Analysis API
2
-
3
- <div align="center">
4
-
5
- ![Python](https://img.shields.io/badge/Python-3.8+-blue.svg)
6
- ![FastAPI](https://img.shields.io/badge/FastAPI-0.100+-green.svg)
7
- ![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-red.svg)
8
- ![License](https://img.shields.io/badge/License-MIT-yellow.svg)
9
- ![CVPR](https://img.shields.io/badge/CVPR-2024-purple.svg)
10
-
11
- **A Production-Ready API for Robust Document Layout Analysis**
12
-
13
- [Features](#-features) β€’ [Installation](#-installation) β€’ [Quick Start](#-quick-start) β€’ [API Reference](#-api-reference) β€’ [Architecture](#-architecture) β€’ [Metrics](#-metrics-system)
14
-
15
- </div>
16
-
17
- ---
18
-
19
- ## πŸ“‹ Table of Contents
20
-
21
- 1. [Overview](#-overview)
22
- 2. [Features](#-features)
23
- 3. [System Requirements](#-system-requirements)
24
- 4. [Installation](#-installation)
25
- 5. [Quick Start](#-quick-start)
26
- 6. [Project Structure](#-project-structure)
27
- 7. [Architecture Deep Dive](#-architecture-deep-dive)
28
- 8. [Configuration](#-configuration)
29
- 9. [API Reference](#-api-reference)
30
- 10. [Metrics System](#-metrics-system)
31
- 11. [Visualization Engine](#-visualization-engine)
32
- 12. [Services Layer](#-services-layer)
33
- 13. [Utilities Reference](#-utilities-reference)
34
- 14. [Error Handling](#-error-handling)
35
- 15. [Performance Optimization](#-performance-optimization)
36
- 16. [Security Considerations](#-security-considerations)
37
- 17. [Testing](#-testing)
38
- 18. [Deployment](#-deployment)
39
- 19. [Troubleshooting](#-troubleshooting)
40
- 20. [Contributing](#-contributing)
41
- 21. [Citation](#-citation)
42
- 22. [License](#-license)
43
-
44
- ---
45
-
46
- ## 🎯 Overview
47
-
48
- ### What is RoDLA?
49
-
50
- RoDLA (Robust Document Layout Analysis) is a state-of-the-art deep learning model for detecting and classifying layout elements in document images. Published at **CVPR 2024**, it focuses on robustness to various perturbations including noise, blur, and geometric distortions.
51
-
52
- ### What is this API?
53
-
54
- This repository provides a **production-ready FastAPI wrapper** around the RoDLA model, featuring:
55
-
56
- - RESTful API endpoints for document analysis
57
- - Comprehensive metrics calculation (20+ metrics)
58
- - Automated visualization generation (8 chart types)
59
- - Robustness assessment based on the RoDLA paper
60
- - Human-readable interpretation of results
61
- - Modular, maintainable code architecture
62
-
63
- ### Key Statistics
64
-
65
- | Metric | Value |
66
- |--------|-------|
67
- | Clean mAP (M6Doc) | 70.0% |
68
- | Perturbed Average mAP | 61.7% |
69
- | mRD Score | 147.6 |
70
- | Max Detections/Image | 300 |
71
- | Supported Classes | 74 (M6Doc) |
72
-
73
- ---
74
-
75
- ## ✨ Features
76
-
77
- ### Core Capabilities
78
-
79
- | Feature | Description |
80
- |---------|-------------|
81
- | πŸ” **Multi-class Detection** | Detect 74+ document element types |
82
- | πŸ“Š **Comprehensive Metrics** | 20+ analytical metrics per image |
83
- | πŸ“ˆ **Auto Visualization** | 8 chart types generated automatically |
84
- | πŸ›‘οΈ **Robustness Analysis** | mPE and mRD estimation |
85
- | 🧠 **Smart Interpretation** | Human-readable analysis summaries |
86
- | ⚑ **GPU Acceleration** | CUDA support for fast inference |
87
- | πŸ“ **Flexible Output** | JSON, annotated images, or both |
88
-
89
- ### Document Element Types
90
-
91
- The model can detect various document elements including:
92
-
93
- ```
94
- Text Elements: Structural Elements: Visual Elements:
95
- β”œβ”€β”€ Paragraph β”œβ”€β”€ Header β”œβ”€β”€ Figure
96
- β”œβ”€β”€ Title β”œβ”€β”€ Footer β”œβ”€β”€ Table
97
- β”œβ”€β”€ Caption β”œβ”€β”€ Page Number β”œβ”€β”€ Chart
98
- β”œβ”€β”€ List β”œβ”€β”€ Section β”œβ”€β”€ Logo
99
- β”œβ”€β”€ Footnote β”œβ”€β”€ Column β”œβ”€β”€ Stamp
100
- └── Abstract └── Margin └── Equation
101
- ```
102
-
103
- ---
104
-
105
- ## πŸ’» System Requirements
106
-
107
- ### Hardware Requirements
108
-
109
- | Component | Minimum | Recommended |
110
- |-----------|---------|-------------|
111
- | CPU | 4 cores | 8+ cores |
112
- | RAM | 16 GB | 32 GB |
113
- | GPU | 8 GB VRAM | 16+ GB VRAM |
114
- | Storage | 10 GB | 20 GB |
115
-
116
- ### Software Requirements
117
-
118
- | Software | Version |
119
- |----------|---------|
120
- | Python | 3.8 - 3.10 |
121
- | CUDA | 11.7+ |
122
- | cuDNN | 8.5+ |
123
- | OS | Linux (Ubuntu 20.04+) / WSL2 |
124
-
125
- ### Python Dependencies
126
-
127
- ```
128
- # Core Framework
129
- fastapi>=0.100.0
130
- uvicorn>=0.23.0
131
- python-multipart>=0.0.6
132
-
133
- # ML/Deep Learning
134
- torch>=2.0.0
135
- mmdet>=3.0.0
136
- mmcv>=2.0.0
137
-
138
- # Data Processing
139
- numpy>=1.24.0
140
- pillow>=9.5.0
141
-
142
- # Visualization
143
- matplotlib>=3.7.0
144
- seaborn>=0.12.0
145
-
146
- # Utilities
147
- pydantic>=2.0.0
148
- ```
149
-
150
- ---
151
-
152
- ## πŸš€ Installation
153
-
154
- ### Step 1: Clone the Repository
155
-
156
- ```bash
157
- git clone https://github.com/yourusername/rodla-api.git
158
- cd rodla-api
159
- ```
160
-
161
- ### Step 2: Create Virtual Environment
162
-
163
- ```bash
164
- # Using conda (recommended)
165
- conda create -n rodla python=3.9
166
- conda activate rodla
167
-
168
- # Or using venv
169
- python -m venv venv
170
- source venv/bin/activate # Linux/Mac
171
- .\venv\Scripts\activate # Windows
172
- ```
173
-
174
- ### Step 3: Install PyTorch with CUDA
175
-
176
- ```bash
177
- # For CUDA 11.8
178
- pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
179
-
180
- # For CUDA 12.1
181
- pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
182
- ```
183
-
184
- ### Step 4: Install MMDetection
185
-
186
- ```bash
187
- pip install -U openmim
188
- mim install mmengine
189
- mim install "mmcv>=2.0.0"
190
- mim install "mmdet>=3.0.0"
191
- ```
192
-
193
- ### Step 5: Install Project Dependencies
194
-
195
- ```bash
196
- pip install -r requirements.txt
197
- ```
198
-
199
- ### Step 6: Download Model Weights
200
-
201
- ```bash
202
- # Download from official source
203
- wget https://path-to-weights/rodla_internimage_xl_m6doc.pth -O weights/rodla_internimage_xl_m6doc.pth
204
- ```
205
-
206
- ### Step 7: Configure Paths
207
-
208
- Edit `config/settings.py`:
209
-
210
- ```python
211
- REPO_ROOT = Path("/path/to/your/RoDLA")
212
- MODEL_CONFIG = REPO_ROOT / "model/configs/m6doc/rodla_internimage_xl_m6doc.py"
213
- MODEL_WEIGHTS = REPO_ROOT / "rodla_internimage_xl_m6doc.pth"
214
- ```
215
-
216
- ---
217
-
218
- ## ⚑ Quick Start
219
-
220
- ### Starting the Server
221
-
222
- ```bash
223
- # Development mode
224
- python backend.py
225
-
226
- # Production mode with uvicorn
227
- uvicorn backend:app --host 0.0.0.0 --port 8000 --workers 1
228
- ```
229
-
230
- ### Making Your First Request
231
-
232
- ```bash
233
- # Using curl
234
- curl -X POST "http://localhost:8000/api/detect" \
235
- -H "accept: application/json" \
236
- -F "file=@document.jpg" \
237
- -F "score_thr=0.3"
238
-
239
- # Get model information
240
- curl http://localhost:8000/api/model-info
241
- ```
242
-
243
- ### Python Client Example
244
-
245
- ```python
246
- import requests
247
-
248
- # Upload and analyze document
249
- with open("document.jpg", "rb") as f:
250
- response = requests.post(
251
- "http://localhost:8000/api/detect",
252
- files={"file": f},
253
- data={
254
- "score_thr": "0.3",
255
- "return_image": "false",
256
- "generate_visualizations": "true"
257
- }
258
- )
259
-
260
- result = response.json()
261
- print(f"Detected {result['core_results']['summary']['total_detections']} elements")
262
- ```
263
-
264
- ---
265
-
266
- ## πŸ“ Project Structure
267
-
268
- ```
269
- deployment/
270
- β”œβ”€β”€ backend.py # πŸš€ Main FastAPI application entry point
271
- β”œβ”€β”€ requirements.txt # πŸ“¦ Python dependencies
272
- β”œβ”€β”€ README.md # πŸ“– This documentation
273
- β”‚
274
- β”œβ”€β”€ config/ # βš™οΈ Configuration Layer
275
- β”‚ β”œβ”€β”€ __init__.py # Package initializer
276
- β”‚ └── settings.py # All configuration constants
277
- β”‚
278
- β”œβ”€β”€ core/ # 🧠 Core Application Layer
279
- β”‚ β”œβ”€β”€ __init__.py # Package initializer
280
- β”‚ β”œβ”€β”€ model_loader.py # Singleton model management
281
- β”‚ └── dependencies.py # FastAPI dependency injection
282
- β”‚
283
- β”œβ”€β”€ api/ # 🌐 API Layer
284
- β”‚ β”œβ”€β”€ __init__.py # Package initializer
285
- β”‚ β”œβ”€β”€ routes.py # API endpoint definitions
286
- β”‚ └── schemas.py # Pydantic request/response models
287
- β”‚
288
- β”œβ”€β”€ services/ # πŸ”§ Business Logic Layer
289
- β”‚ β”œβ”€β”€ __init__.py # Package initializer
290
- β”‚ β”œβ”€β”€ detection.py # Core detection logic
291
- β”‚ β”œβ”€β”€ processing.py # Result aggregation
292
- β”‚ β”œβ”€β”€ visualization.py # Chart generation (350+ lines)
293
- β”‚ └── interpretation.py # Human-readable insights
294
- β”‚
295
- β”œβ”€β”€ utils/ # πŸ› οΈ Utility Layer
296
- β”‚ β”œβ”€β”€ __init__.py # Package initializer
297
- β”‚ β”œβ”€β”€ helpers.py # General helper functions
298
- β”‚ β”œβ”€β”€ serialization.py # JSON conversion utilities
299
- β”‚ └── metrics/ # Metrics calculation modules
300
- β”‚ β”œβ”€β”€ __init__.py # Metrics package initializer
301
- β”‚ β”œβ”€β”€ core.py # Core detection metrics
302
- β”‚ β”œβ”€β”€ rodla.py # RoDLA-specific metrics
303
- β”‚ β”œβ”€β”€ spatial.py # Spatial distribution analysis
304
- β”‚ └── quality.py # Quality & complexity metrics
305
- β”‚
306
- └── outputs/ # πŸ“€ Output Directory
307
- β”œβ”€β”€ *.json # Detection results
308
- └── *.png # Visualization images
309
- ```
310
-
311
- ### File Count Summary
312
-
313
- | Layer | Files | Purpose |
314
- |-------|-------|---------|
315
- | Config | 2 | Configuration management |
316
- | Core | 3 | Model and dependency management |
317
- | API | 3 | HTTP endpoints and schemas |
318
- | Services | 5 | Business logic implementation |
319
- | Utils | 7 | Helper functions and metrics |
320
- | **Total** | **21** | Complete modular architecture |
321
-
322
- ---
323
-
324
- ## πŸ—οΈ Architecture Deep Dive
325
-
326
- ### Layered Architecture
327
-
328
- ```
329
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
330
- β”‚ CLIENT LAYER β”‚
331
- β”‚ (Web Browser / API Clients) β”‚
332
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
333
- β”‚ HTTP Requests
334
- β–Ό
335
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
336
- β”‚ API LAYER β”‚
337
- β”‚ api/routes.py β”‚
338
- β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
339
- β”‚ β”‚ GET /model-info β”‚ β”‚ POST /api/detect β”‚ β”‚
340
- β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
341
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
342
- β”‚ Validated Requests
343
- β–Ό
344
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
345
- β”‚ SERVICES LAYER β”‚
346
- β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”‚
347
- β”‚ β”‚ detection.py β”‚ β”‚processing.py β”‚ β”‚ visualization.py β”‚β”‚
348
- β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚β”‚
349
- β”‚ β”‚ β€’ Inference β”‚ β”‚ β€’ Aggregate β”‚ β”‚ β€’ 8 Chart Types β”‚β”‚
350
- β”‚ β”‚ β€’ Processing β”‚ β”‚ β€’ Save JSON β”‚ β”‚ β€’ Base64 Encoding β”‚β”‚
351
- β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β”‚
352
- β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
353
- β”‚ β”‚ interpretation.py β”‚ β”‚
354
- β”‚ β”‚ β€’ Human-readable insights β”‚ β”‚
355
- β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
356
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
357
- β”‚ Data Processing
358
- β–Ό
359
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
360
- β”‚ UTILITIES LAYER β”‚
361
- β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”‚
362
- β”‚ β”‚ utils/metrics/ β”‚β”‚
363
- β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚β”‚
364
- β”‚ β”‚ β”‚ core.py β”‚ β”‚rodla.py β”‚ β”‚spatial. β”‚ β”‚ quality.py β”‚ β”‚β”‚
365
- β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ py β”‚ β”‚ β”‚ β”‚β”‚
366
- β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚β”‚
367
- β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β”‚
368
- β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
369
- β”‚ β”‚ helpers.py β”‚ β”‚ serialization.py β”‚ β”‚
370
- β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
371
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
372
- β”‚ Model Operations
373
- β–Ό
374
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
375
- β”‚ CORE LAYER β”‚
376
- β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
377
- β”‚ β”‚ model_loader.py β”‚ β”‚ dependencies.py β”‚ β”‚
378
- β”‚ β”‚ β”‚ β”‚ β”‚ β”‚
379
- β”‚ β”‚ β€’ Singleton Pattern β”‚ β”‚ β€’ FastAPI DI β”‚ β”‚
380
- β”‚ β”‚ β€’ GPU Management β”‚ β”‚ β€’ Model Injection β”‚ β”‚
381
- β”‚ β”‚ β€’ Lazy Loading β”‚ β”‚ β”‚ β”‚
382
- β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
383
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
384
- β”‚ Configuration
385
- β–Ό
386
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
387
- β”‚ CONFIG LAYER β”‚
388
- β”‚ config/settings.py β”‚
389
- β”‚ β€’ Paths β€’ Constants β€’ Baseline Metrics β€’ Thresholds β”‚
390
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
391
- ```
392
-
393
- ### Design Patterns Used
394
-
395
- | Pattern | Location | Purpose |
396
- |---------|----------|---------|
397
- | **Singleton** | `model_loader.py` | Single model instance |
398
- | **Factory** | `visualization.py` | Create multiple chart types |
399
- | **Dependency Injection** | `dependencies.py` | Inject model into routes |
400
- | **Repository** | `processing.py` | Abstract data persistence |
401
- | **Facade** | `routes.py` | Simplify complex subsystems |
402
- | **Strategy** | `metrics/` | Interchangeable metric algorithms |
403
-
404
- ### Data Flow Diagram
405
-
406
- ```
407
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
408
- β”‚ Image │───▢│ Upload │───▢│ Temp │───▢│ Model β”‚
409
- β”‚ File β”‚ β”‚ Handler β”‚ β”‚ File β”‚ β”‚ Inferenceβ”‚
410
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
411
- β”‚
412
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
413
- β–Ό
414
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
415
- β”‚ Raw │───▢│ Process │───▢│ Calculate│───▢│ Generate β”‚
416
- β”‚ Results β”‚ β”‚Detectionsβ”‚ β”‚ Metrics β”‚ β”‚ Viz β”‚
417
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
418
- β”‚
419
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
420
- β–Ό
421
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
422
- β”‚ Generate │───▢│ Assemble │───▢│ JSON β”‚
423
- β”‚ Interp. β”‚ β”‚ Response β”‚ β”‚ Response β”‚
424
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
425
- ```
426
-
427
- ---
428
-
429
- ## βš™οΈ Configuration
430
-
431
- ### config/settings.py
432
-
433
- This file centralizes all configuration parameters.
434
-
435
- ```python
436
- """
437
- Configuration Settings Module
438
- =============================
439
- All application constants and configuration in one place.
440
- """
441
-
442
- from pathlib import Path
443
-
444
- # =============================================================================
445
- # PATH CONFIGURATION
446
- # =============================================================================
447
-
448
- # Root directory of the RoDLA model repository
449
- REPO_ROOT = Path("/mnt/d/MyStuff/University/Current/CV/Project/RoDLA")
450
-
451
- # Model configuration file path
452
- MODEL_CONFIG = REPO_ROOT / "model/configs/m6doc/rodla_internimage_xl_m6doc.py"
453
-
454
- # Pre-trained model weights path
455
- MODEL_WEIGHTS = REPO_ROOT / "rodla_internimage_xl_m6doc.pth"
456
-
457
- # Output directory for results and visualizations
458
- OUTPUT_DIR = Path("outputs")
459
-
460
- # =============================================================================
461
- # MODEL CONFIGURATION
462
- # =============================================================================
463
-
464
- # Default confidence threshold for detections
465
- DEFAULT_SCORE_THRESHOLD = 0.3
466
-
467
- # Maximum number of detections per image
468
- MAX_DETECTIONS = 300
469
-
470
- # Model metadata
471
- MODEL_INFO = {
472
- "name": "RoDLA InternImage-XL",
473
- "paper": "RoDLA: Benchmarking the Robustness of Document Layout Analysis Models",
474
- "conference": "CVPR 2024",
475
- "backbone": "InternImage-XL",
476
- "framework": "DINO with Channel Attention + Average Pooling",
477
- "dataset": "M6Doc-P"
478
- }
479
-
480
- # =============================================================================
481
- # BASELINE PERFORMANCE METRICS
482
- # =============================================================================
483
-
484
- # Clean performance baselines from the RoDLA paper (mAP scores)
485
- BASELINE_MAP = {
486
- "M6Doc": 70.0, # Main evaluation dataset
487
- "PubLayNet": 96.0, # Scientific documents
488
- "DocLayNet": 80.5 # Diverse document types
489
- }
490
-
491
- # State-of-the-art performance metrics
492
- SOTA_PERFORMANCE = {
493
- "clean_mAP": 70.0,
494
- "perturbed_avg_mAP": 61.7,
495
- "mRD_score": 147.6
496
- }
497
-
498
- # =============================================================================
499
- # ANALYSIS THRESHOLDS
500
- # =============================================================================
501
-
502
- # Size distribution thresholds (as percentage of image area)
503
- SIZE_THRESHOLDS = {
504
- "tiny": 0.005, # < 0.5% of image
505
- "small": 0.02, # 0.5% - 2%
506
- "medium": 0.1, # 2% - 10%
507
- "large": 1.0 # >= 10%
508
- }
509
-
510
- # Confidence level thresholds
511
- CONFIDENCE_THRESHOLDS = {
512
- "very_high": 0.9,
513
- "high": 0.8,
514
- "medium": 0.6,
515
- "low": 0.4
516
- }
517
-
518
- # Robustness assessment thresholds
519
- ROBUSTNESS_THRESHOLDS = {
520
- "mPE_low": 20,
521
- "mPE_medium": 40,
522
- "mRD_excellent": 100,
523
- "mRD_good": 150,
524
- "cv_stable": 0.15,
525
- "cv_moderate": 0.30
526
- }
527
-
528
- # Complexity scoring weights
529
- COMPLEXITY_WEIGHTS = {
530
- "class_diversity": 30,
531
- "detection_count": 30,
532
- "density": 20,
533
- "clustering": 20
534
- }
535
-
536
- # =============================================================================
537
- # API CONFIGURATION
538
- # =============================================================================
539
-
540
- # CORS settings
541
- CORS_ORIGINS = ["*"] # Restrict in production
542
- CORS_METHODS = ["*"]
543
- CORS_HEADERS = ["*"]
544
-
545
- # API metadata
546
- API_TITLE = "RoDLA Object Detection API"
547
- API_VERSION = "1.0.0"
548
- API_DESCRIPTION = "Production-ready API for Robust Document Layout Analysis"
549
-
550
- # =============================================================================
551
- # VISUALIZATION CONFIGURATION
552
- # =============================================================================
553
-
554
- # Figure sizes for different chart types
555
- FIGURE_SIZES = {
556
- "bar_chart": (12, 6),
557
- "histogram": (10, 6),
558
- "heatmap": (10, 8),
559
- "boxplot": (12, 6),
560
- "scatter": (10, 6),
561
- "pie": (8, 8)
562
- }
563
-
564
- # Color schemes
565
- COLOR_SCHEMES = {
566
- "primary": "steelblue",
567
- "secondary": "forestgreen",
568
- "accent": "coral",
569
- "heatmap": "YlOrRd",
570
- "scatter": "viridis"
571
- }
572
-
573
- # DPI for saved images
574
- VISUALIZATION_DPI = 100
575
- ```
576
-
577
- ### Environment Variables
578
-
579
- For production deployments, use environment variables:
580
-
581
- ```bash
582
- # .env file
583
- RODLA_REPO_ROOT=/path/to/RoDLA
584
- RODLA_MODEL_CONFIG=model/configs/m6doc/rodla_internimage_xl_m6doc.py
585
- RODLA_MODEL_WEIGHTS=rodla_internimage_xl_m6doc.pth
586
- RODLA_OUTPUT_DIR=outputs
587
- RODLA_DEFAULT_THRESHOLD=0.3
588
- RODLA_API_HOST=0.0.0.0
589
- RODLA_API_PORT=8000
590
- ```
591
-
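- These variables are not read automatically by the `config/settings.py` shown above; a small override layer like the hypothetical sketch below (names mirror the `.env` keys) can bridge the two, falling back to the hard-coded defaults when a variable is unset:
-
- ```python
- # Hypothetical override layer for config/settings.py (not part of the shipped
- # code): read the RODLA_* variables from the environment with sane fallbacks.
- import os
- from pathlib import Path
-
- REPO_ROOT = Path(os.environ.get("RODLA_REPO_ROOT", "/path/to/RoDLA"))
- MODEL_CONFIG = REPO_ROOT / os.environ.get(
-     "RODLA_MODEL_CONFIG", "model/configs/m6doc/rodla_internimage_xl_m6doc.py"
- )
- MODEL_WEIGHTS = REPO_ROOT / os.environ.get(
-     "RODLA_MODEL_WEIGHTS", "rodla_internimage_xl_m6doc.pth"
- )
- OUTPUT_DIR = Path(os.environ.get("RODLA_OUTPUT_DIR", "outputs"))
- DEFAULT_SCORE_THRESHOLD = float(os.environ.get("RODLA_DEFAULT_THRESHOLD", "0.3"))
- API_HOST = os.environ.get("RODLA_API_HOST", "0.0.0.0")
- API_PORT = int(os.environ.get("RODLA_API_PORT", "8000"))
- ```
-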
592
- ---
593
-
594
- ## 🌐 API Reference
595
-
596
- ### Endpoints Overview
597
-
598
- | Method | Endpoint | Description |
599
- |--------|----------|-------------|
600
- | GET | `/api/model-info` | Get model metadata |
601
- | POST | `/api/detect` | Analyze document image |
602
- | GET | `/health` | Health check (if implemented) |
603
- | GET | `/docs` | Swagger UI documentation |
604
- | GET | `/redoc` | ReDoc documentation |
605
-
606
- ---
607
-
608
- ### GET /api/model-info
609
-
610
- Returns comprehensive information about the loaded model.
611
-
612
- #### Request
613
-
614
- ```http
615
- GET /api/model-info HTTP/1.1
616
- Host: localhost:8000
617
- ```
618
-
619
- #### Response
620
-
621
- ```json
622
- {
623
- "model_name": "RoDLA InternImage-XL",
624
- "paper": "RoDLA: Benchmarking the Robustness of Document Layout Analysis Models (CVPR 2024)",
625
- "num_classes": 74,
626
- "classes": [
627
- "paragraph", "title", "figure", "table", "caption",
628
- "header", "footer", "page_number", "list", "abstract",
629
- // ... additional classes
630
- ],
631
- "backbone": "InternImage-XL",
632
- "detection_framework": "DINO with Channel Attention + Average Pooling",
633
- "dataset": "M6Doc-P",
634
- "max_detections_per_image": 300,
635
- "state_of_the_art_performance": {
636
- "clean_mAP": 70.0,
637
- "perturbed_avg_mAP": 61.7,
638
- "mRD_score": 147.6
639
- }
640
- }
641
- ```
642
-
643
- #### Error Responses
644
-
645
- | Status | Description |
646
- |--------|-------------|
647
- | 500 | Model not loaded |
648
-
649
- ---
650
-
651
- ### POST /api/detect
652
-
653
- Analyzes a document image and returns comprehensive detection results.
654
-
655
- #### Request
656
-
657
- ```http
658
- POST /api/detect HTTP/1.1
659
- Host: localhost:8000
660
- Content-Type: multipart/form-data
661
-
662
- file: <binary image data>
663
- score_thr: "0.3"
664
- return_image: "false"
665
- save_json: "true"
666
- generate_visualizations: "true"
667
- ```
668
-
669
- #### Parameters
670
-
671
- | Parameter | Type | Default | Description |
672
- |-----------|------|---------|-------------|
673
- | `file` | File | Required | Image file (JPEG, PNG, etc.) |
674
- | `score_thr` | string | "0.3" | Confidence threshold (0.0-1.0) |
675
- | `return_image` | string | "false" | Return annotated image instead of JSON |
676
- | `save_json` | string | "true" | Save results to disk |
677
- | `generate_visualizations` | string | "true" | Generate visualization charts |
678
-
679
- #### Response (JSON Mode)
680
-
681
- ```json
682
- {
683
- "success": true,
684
- "timestamp": "2024-01-15T10:30:45.123456",
685
- "filename": "document.jpg",
686
-
687
- "image_info": {
688
- "width": 2480,
689
- "height": 3508,
690
- "aspect_ratio": 0.707,
691
- "total_pixels": 8699840
692
- },
693
-
694
- "detection_config": {
695
- "score_threshold": 0.3,
696
- "model": "RoDLA InternImage-XL",
697
- "framework": "DINO with Robustness Enhancement",
698
- "max_detections": 300
699
- },
700
-
701
- "core_results": {
702
- "summary": {
703
- "total_detections": 47,
704
- "unique_classes": 12,
705
- "average_confidence": 0.7823,
706
- "median_confidence": 0.8156,
707
- "min_confidence": 0.3012,
708
- "max_confidence": 0.9876,
709
- "coverage_percentage": 68.45,
710
- "average_detection_area": 126543.21
711
- },
712
- "detections": [/* top 20 detections */]
713
- },
714
-
715
- "rodla_metrics": {
716
- "note": "Estimated metrics...",
717
- "estimated_mPE": 18.45,
718
- "estimated_mRD": 87.32,
719
- "confidence_std": 0.1234,
720
- "confidence_range": 0.6864,
721
- "robustness_score": 56.34,
722
- "interpretation": {
723
- "mPE_level": "low",
724
- "mRD_level": "excellent",
725
- "overall_robustness": "medium"
726
- }
727
- },
728
-
729
- "spatial_analysis": {
730
- "horizontal_distribution": {...},
731
- "vertical_distribution": {...},
732
- "quadrant_distribution": {...},
733
- "size_distribution": {...},
734
- "density_metrics": {...}
735
- },
736
-
737
- "class_analysis": {
738
- "paragraph": {
739
- "count": 15,
740
- "percentage": 31.91,
741
- "confidence_stats": {...},
742
- "area_stats": {...},
743
- "aspect_ratio_stats": {...}
744
- },
745
- // ... other classes
746
- },
747
-
748
- "confidence_analysis": {
749
- "distribution": {...},
750
- "binned_distribution": {...},
751
- "percentages": {...},
752
- "entropy": 2.3456
753
- },
754
-
755
- "robustness_indicators": {
756
- "stability_score": 87.65,
757
- "coefficient_of_variation": 0.1234,
758
- "high_confidence_ratio": 0.7234,
759
- "prediction_consistency": "high",
760
- "model_certainty": "medium",
761
- "robustness_rating": {
762
- "rating": "good",
763
- "score": 72.34
764
- }
765
- },
766
-
767
- "layout_complexity": {
768
- "class_diversity": 12,
769
- "total_elements": 47,
770
- "detection_density": 5.41,
771
- "average_element_distance": 234.56,
772
- "complexity_score": 58.23,
773
- "complexity_level": "moderate",
774
- "layout_characteristics": {
775
- "is_dense": true,
776
- "is_diverse": true,
777
- "is_structured": false
778
- }
779
- },
780
-
781
- "quality_metrics": {
782
- "overlap_analysis": {...},
783
- "size_consistency": {...},
784
- "detection_quality_score": 82.45
785
- },
786
-
787
- "visualizations": {
788
- "class_distribution": "data:image/png;base64,...",
789
- "confidence_distribution": "data:image/png;base64,...",
790
- "spatial_heatmap": "data:image/png;base64,...",
791
- "confidence_by_class": "data:image/png;base64,...",
792
- "area_vs_confidence": "data:image/png;base64,...",
793
- "quadrant_distribution": "data:image/png;base64,...",
794
- "size_distribution": "data:image/png;base64,...",
795
- "top_classes_confidence": "data:image/png;base64,..."
796
- },
797
-
798
- "interpretation": {
799
- "overview": "Document Analysis Summary...",
800
- "top_elements": "The most common elements are...",
801
- "rodla_analysis": "RoDLA Robustness Analysis...",
802
- "layout_complexity": "Layout Complexity...",
803
- "key_findings": [...],
804
- "perturbation_assessment": "...",
805
- "recommendations": [...],
806
- "confidence_summary": {...}
807
- },
808
-
809
- "all_detections": [/* complete detection list */]
810
- }
811
- ```
812
-
813
- #### Response (Image Mode)
814
-
815
- When `return_image=true`, returns the annotated image directly:
816
-
817
- ```http
818
- HTTP/1.1 200 OK
819
- Content-Type: image/jpeg
820
- Content-Disposition: attachment; filename="annotated_document.jpg"
821
-
822
- <binary image data>
823
- ```
824
-
825
- #### Error Responses
826
-
827
- | Status | Description |
828
- |--------|-------------|
829
- | 400 | Invalid file type (not an image) |
830
- | 500 | Model inference failed |
831
- | 500 | Visualization generation failed |
832
-
833
- ---
834
-
835
- ## πŸ“Š Metrics System
836
-
837
- ### Metrics Architecture
838
-
839
- ```
840
- utils/metrics/
841
- β”œβ”€β”€ __init__.py # Exports all metric functions
842
- β”œβ”€β”€ core.py # Core detection metrics
843
- β”œβ”€β”€ rodla.py # RoDLA-specific robustness metrics
844
- β”œβ”€β”€ spatial.py # Spatial distribution analysis
845
- └── quality.py # Quality and complexity metrics
846
- ```
847
-
848
- ### Core Metrics (utils/metrics/core.py)
849
-
850
- #### `calculate_core_metrics(detections, img_width, img_height)`
851
-
852
- Computes fundamental detection statistics.
853
-
854
- | Metric | Type | Description |
855
- |--------|------|-------------|
856
- | `total_detections` | int | Number of detected elements |
857
- | `unique_classes` | int | Number of distinct element types |
858
- | `average_confidence` | float | Mean confidence score |
859
- | `median_confidence` | float | Median confidence score |
860
- | `min_confidence` | float | Lowest confidence |
861
- | `max_confidence` | float | Highest confidence |
862
- | `coverage_percentage` | float | % of image covered by detections |
863
- | `average_detection_area` | float | Mean area per detection |
864
-
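- A condensed sketch of how these statistics can be derived from the detection list is shown below; it is illustrative rather than the exact `core.py` implementation (in particular, coverage here naively sums box areas, so overlapping boxes are counted twice):
-
- ```python
- # Illustrative sketch of calculate_core_metrics (not the exact core.py code).
- import numpy as np
-
- def calculate_core_metrics(detections, img_width, img_height):
-     if not detections:
-         return {"total_detections": 0, "unique_classes": 0}
-     confidences = np.array([d["confidence"] for d in detections])
-     areas = np.array([d["area"] for d in detections])
-     return {
-         "total_detections": len(detections),
-         "unique_classes": len({d["class_name"] for d in detections}),
-         "average_confidence": float(confidences.mean()),
-         "median_confidence": float(np.median(confidences)),
-         "min_confidence": float(confidences.min()),
-         "max_confidence": float(confidences.max()),
-         # Naive coverage: summed box areas over image area (overlaps double-counted).
-         "coverage_percentage": float(areas.sum() / (img_width * img_height) * 100),
-         "average_detection_area": float(areas.mean()),
-     }
- ```
-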
865
- #### `calculate_class_metrics(detections)`
866
-
867
- Per-class statistical analysis.
868
-
869
- ```python
870
- {
871
- "paragraph": {
872
- "count": 15,
873
- "percentage": 31.91,
874
- "confidence_stats": {
875
- "mean": 0.8234,
876
- "std": 0.0876,
877
- "min": 0.6543,
878
- "max": 0.9654
879
- },
880
- "area_stats": {
881
- "mean": 125432.5,
882
- "std": 45678.2,
883
- "total": 1881487.5
884
- },
885
- "aspect_ratio_stats": {
886
- "mean": 2.345,
887
- "orientation": "horizontal" # horizontal/vertical/square
888
- }
889
- }
890
- }
891
- ```
892
-
893
- #### `calculate_confidence_metrics(detections)`
894
-
895
- Detailed confidence distribution analysis.
896
-
897
- | Component | Description |
898
- |-----------|-------------|
899
- | `distribution` | Statistical measures (mean, median, std, quartiles) |
900
- | `binned_distribution` | Count per confidence range |
901
- | `percentages` | Percentage per confidence range |
902
- | `entropy` | Shannon entropy of distribution |
903
-
904
- **Confidence Bins:**
905
- - Very High: 0.9 - 1.0
906
- - High: 0.8 - 0.9
907
- - Medium: 0.6 - 0.8
908
- - Low: 0.4 - 0.6
909
- - Very Low: 0.0 - 0.4
910
-
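- The binning step itself is straightforward; a minimal sketch, assuming the bin edges above (key names and rounding may differ from the shipped `calculate_confidence_metrics`):
-
- ```python
- # Sketch of the confidence binning, using the bin edges listed above.
- def bin_confidences(confidences):
-     bins = {"very_high": (0.9, 1.01), "high": (0.8, 0.9), "medium": (0.6, 0.8),
-             "low": (0.4, 0.6), "very_low": (0.0, 0.4)}
-     counts = {name: sum(lo <= c < hi for c in confidences)
-               for name, (lo, hi) in bins.items()}
-     total = max(len(confidences), 1)
-     percentages = {name: round(100 * n / total, 2) for name, n in counts.items()}
-     return {"binned_distribution": counts, "percentages": percentages}
- ```
-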
911
- ---
912
-
913
- ### RoDLA Metrics (utils/metrics/rodla.py)
914
-
915
- These metrics are specific to the RoDLA paper's robustness evaluation framework.
916
-
917
- #### `calculate_rodla_metrics(detections, core_metrics)`
918
-
919
- Estimates perturbation effects and robustness degradation.
920
-
921
- | Metric | Formula | Interpretation |
922
- |--------|---------|----------------|
923
- | `estimated_mPE` | `(conf_std Γ— 100) + (conf_range Γ— 50)` | Mean Perturbation Effect |
924
- | `estimated_mRD` | `(degradation / mPE) Γ— 100` | Mean Robustness Degradation |
925
- | `robustness_score` | `(1 - mRD/200) Γ— 100` | Overall robustness (0-100) |
926
-
927
- **mPE Interpretation:**
928
- ```
929
- low: mPE < 20 β†’ Minimal perturbation effect
930
- medium: 20 ≀ mPE < 40 β†’ Moderate perturbation
931
- high: mPE β‰₯ 40 β†’ Significant perturbation
932
- ```
933
-
934
- **mRD Interpretation:**
935
- ```
936
- excellent: mRD < 100 β†’ Highly robust
937
- good: 100 ≀ mRD < 150 β†’ Acceptable robustness
938
- needs_improvement: mRD β‰₯ 150 β†’ Robustness concerns
939
- ```
940
-
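- The sketch below shows how these estimates combine. Note that `degradation` is not defined in this README; the sketch assumes the confidence shortfall `(1 - average_confidence) Γ— 100` purely for illustration, so the actual `rodla.py` may compute it differently:
-
- ```python
- # Illustrative combination of the estimation formulas above.
- def estimate_rodla_metrics(conf_std, conf_range, average_confidence):
-     estimated_mPE = conf_std * 100 + conf_range * 50
-     degradation = (1.0 - average_confidence) * 100   # assumption, see note above
-     estimated_mRD = (degradation / estimated_mPE) * 100 if estimated_mPE else 0.0
-     robustness_score = (1 - estimated_mRD / 200) * 100
-     return {
-         "estimated_mPE": round(estimated_mPE, 2),
-         "estimated_mRD": round(estimated_mRD, 2),
-         "robustness_score": round(robustness_score, 2),
-     }
- ```
-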
941
- #### `calculate_robustness_indicators(detections, core_metrics)`
942
-
943
- Stability and consistency metrics.
944
-
945
- ```python
946
- {
947
- "stability_score": 87.65, # (1 - CV) Γ— 100
948
- "coefficient_of_variation": 0.12, # std / mean
949
- "high_confidence_ratio": 0.72, # % with conf β‰₯ 0.8
950
- "prediction_consistency": "high", # Based on CV
951
- "model_certainty": "medium", # Based on avg conf
952
- "robustness_rating": {
953
- "rating": "good", # excellent/good/fair/poor
954
- "score": 72.34 # Composite score
955
- }
956
- }
957
- ```
958
-
959
- **Robustness Rating Formula:**
960
- ```
961
- score = (avg_conf Γ— 40) + ((1 - CV) Γ— 30) + (high_conf_ratio Γ— 30)
962
-
963
- Rating:
964
- - excellent: score β‰₯ 80
965
- - good: 60 ≀ score < 80
966
- - fair: 40 ≀ score < 60
967
- - poor: score < 40
968
- ```
969
-
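- In code, the rating reduces to a few lines (a sketch, not the exact implementation):
-
- ```python
- # Composite robustness rating, following the formula and bands above.
- def robustness_rating(avg_conf, cv, high_conf_ratio):
-     score = avg_conf * 40 + (1 - cv) * 30 + high_conf_ratio * 30
-     if score >= 80:
-         rating = "excellent"
-     elif score >= 60:
-         rating = "good"
-     elif score >= 40:
-         rating = "fair"
-     else:
-         rating = "poor"
-     return {"rating": rating, "score": round(score, 2)}
- ```
-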
970
- ---
971
-
972
- ### Spatial Metrics (utils/metrics/spatial.py)
973
-
974
- #### `calculate_spatial_analysis(detections, img_width, img_height)`
975
-
976
- Comprehensive spatial distribution analysis.
977
-
978
- ##### Horizontal Distribution
979
- ```python
980
- {
981
- "mean": 1240.5, # Mean x-coordinate
982
- "std": 456.7, # Standard deviation
983
- "skewness": -0.234, # Distribution asymmetry
984
- "left_third": 12, # Count in left 33%
985
- "center_third": 25, # Count in center 33%
986
- "right_third": 10 # Count in right 33%
987
- }
988
- ```
989
-
990
- ##### Vertical Distribution
991
- ```python
992
- {
993
- "mean": 1754.2, # Mean y-coordinate
994
- "std": 892.4, # Standard deviation
995
- "skewness": 0.156, # Distribution asymmetry
996
- "top_third": 8, # Count in top 33%
997
- "middle_third": 22, # Count in middle 33%
998
- "bottom_third": 17 # Count in bottom 33%
999
- }
1000
- ```
1001
-
1002
- ##### Quadrant Distribution
1003
- ```
1004
- Document divided into 4 quadrants:
1005
- β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
1006
- β”‚ Q1 β”‚ Q2 β”‚
1007
- β”‚(top-L) β”‚(top-R) β”‚
1008
- β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
1009
- β”‚ Q3 β”‚ Q4 β”‚
1010
- β”‚(bot-L) β”‚(bot-R) β”‚
1011
- β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
1012
- ```
1013
-
1014
- ##### Size Distribution
1015
- | Category | Threshold | Description |
1016
- |----------|-----------|-------------|
1017
- | tiny | < 0.5% of image | Very small elements |
1018
- | small | 0.5% - 2% | Small elements |
1019
- | medium | 2% - 10% | Medium elements |
1020
- | large | β‰₯ 10% | Large elements |
1021
-
1022
- ##### Density Metrics
1023
- ```python
1024
- {
1025
- "average_nearest_neighbor_distance": 234.56, # pixels
1026
- "spatial_clustering_score": 0.67 # 0-1, higher = more clustered
1027
- }
1028
- ```
1029
-
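- A brute-force sketch of these two values, assuming distances are measured between detection centres and the clustering score is `1 - std/mean` of the nearest-neighbour distances (per the helpers table later in this README):
-
- ```python
- # Illustrative density metrics over detection centres (O(n^2) nearest-neighbour search).
- import math
-
- def density_metrics(centers):
-     if len(centers) < 2:
-         return {"average_nearest_neighbor_distance": 0.0, "spatial_clustering_score": 0.0}
-     nn_dists = [
-         min(math.hypot(xi - xj, yi - yj)
-             for j, (xj, yj) in enumerate(centers) if j != i)
-         for i, (xi, yi) in enumerate(centers)
-     ]
-     mean = sum(nn_dists) / len(nn_dists)
-     std = (sum((d - mean) ** 2 for d in nn_dists) / len(nn_dists)) ** 0.5
-     clustering = max(0.0, 1 - std / mean) if mean else 0.0
-     return {
-         "average_nearest_neighbor_distance": round(mean, 2),
-         "spatial_clustering_score": round(clustering, 2),
-     }
- ```
-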
1030
- ---
1031
-
1032
- ### Quality Metrics (utils/metrics/quality.py)
1033
-
1034
- #### `calculate_layout_complexity(detections, img_width, img_height)`
1035
-
1036
- Quantifies document structure complexity.
1037
-
1038
- **Complexity Score Formula:**
1039
- ```
1040
- score = (class_diversity / 20) Γ— 30 # Max 20 classes
1041
- + min(detections / 50, 1) Γ— 30 # Detection count
1042
- + min(density / 10, 1) Γ— 20 # Spatial density
1043
- + (1 - min(avg_dist / 500, 1)) Γ— 20 # Clustering
1044
- ```
1045
-
1046
- **Complexity Levels:**
1047
- | Level | Score Range | Description |
1048
- |-------|-------------|-------------|
1049
- | simple | < 30 | Basic document layout |
1050
- | moderate | 30 - 60 | Average complexity |
1051
- | complex | β‰₯ 60 | Complex multi-element layout |
1052
-
1053
- **Layout Characteristics:**
1054
- ```python
1055
- {
1056
- "is_dense": True, # density > 5 elements/megapixel
1057
- "is_diverse": True, # unique_classes β‰₯ 10
1058
- "is_structured": False # avg_distance < 200 pixels
1059
- }
1060
- ```
1061
-
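- Putting the formula, levels, and characteristic thresholds together, the scoring step of `calculate_layout_complexity` might look like the sketch below (illustrative; parameter names are assumptions):
-
- ```python
- # Illustrative scoring step; density is elements per megapixel, avg_distance in pixels.
- def layout_complexity(unique_classes, total_detections, density, avg_distance):
-     score = (
-         min(unique_classes / 20, 1) * 30         # class diversity (capped at 20)
-         + min(total_detections / 50, 1) * 30     # detection count
-         + min(density / 10, 1) * 20              # spatial density
-         + (1 - min(avg_distance / 500, 1)) * 20  # clustering
-     )
-     level = "simple" if score < 30 else "moderate" if score < 60 else "complex"
-     return {
-         "complexity_score": round(score, 2),
-         "complexity_level": level,
-         "layout_characteristics": {
-             "is_dense": density > 5,
-             "is_diverse": unique_classes >= 10,
-             "is_structured": avg_distance < 200,
-         },
-     }
- ```
-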
1062
- #### `calculate_quality_metrics(detections, img_width, img_height)`
1063
-
1064
- Detection quality assessment.
1065
-
1066
- ##### Overlap Analysis
1067
- ```python
1068
- {
1069
- "total_overlapping_pairs": 5, # Number of overlapping detection pairs
1070
- "overlap_percentage": 10.64, # % of detections with overlaps
1071
- "average_iou": 0.1234 # Mean IoU of overlapping pairs
1072
- }
1073
- ```
1074
-
1075
- ##### Size Consistency
1076
- ```python
1077
- {
1078
- "coefficient_of_variation": 0.876, # std/mean of areas
1079
- "consistency_level": "medium" # high (<0.5), medium (0.5-1), low (>1)
1080
- }
1081
- ```
1082
-
1083
- ##### Detection Quality Score
1084
- ```
1085
- score = (1 - min(overlap_% / 100, 1)) Γ— 50 + (1 - min(size_cv, 1)) Γ— 50
1086
- ```
1087
-
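- The score is a direct translation of the formula; in the sketch below the inputs are the overlap percentage and the size coefficient of variation from the two blocks above:
-
- ```python
- # Quality score: penalise overlapping boxes and inconsistent box sizes (max 50 each).
- def detection_quality_score(overlap_percentage, size_cv):
-     return (1 - min(overlap_percentage / 100, 1)) * 50 + (1 - min(size_cv, 1)) * 50
- ```
-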
1088
- ---
1089
-
1090
- ## πŸ“ˆ Visualization Engine
1091
-
1092
- ### services/visualization.py
1093
-
1094
- The visualization engine generates 8 distinct chart types, each providing unique insights into the detection results.
1095
-
1096
- ### Chart Types
1097
-
1098
- #### 1. Class Distribution Bar Chart
1099
- ```
1100
- Purpose: Show count of detections per class
1101
- Type: Vertical bar chart
1102
- Features:
1103
- - Sorted by count (descending)
1104
- - Value labels on bars
1105
- - Rotated x-axis labels for readability
1106
- - Grid lines for easy reading
1107
- ```
1108
-
1109
- #### 2. Confidence Distribution Histogram
1110
- ```
1111
- Purpose: Show distribution of confidence scores
1112
- Type: Histogram with 20 bins
1113
- Features:
1114
- - Mean line (red dashed)
1115
- - Median line (orange dashed)
1116
- - Legend with exact values
1117
- - Grid lines
1118
- ```
1119
-
1120
- #### 3. Spatial Distribution Heatmap
1121
- ```
1122
- Purpose: Visualize where detections are concentrated
1123
- Type: 2D histogram heatmap
1124
- Features:
1125
- - YlOrRd colormap (yellow to red)
1126
- - Colorbar showing density
1127
- - Axes showing pixel coordinates
1128
- ```
1129
-
1130
- #### 4. Confidence by Class Box Plot
1131
- ```
1132
- Purpose: Compare confidence distributions across classes
1133
- Type: Box plot
1134
- Features:
1135
- - Top 10 classes by count
1136
- - Sample sizes in labels
1137
- - Median, quartiles, outliers
1138
- - Light blue boxes
1139
- ```
1140
-
1141
- #### 5. Area vs Confidence Scatter Plot
1142
- ```
1143
- Purpose: Examine relationship between size and confidence
1144
- Type: Scatter plot
1145
- Features:
1146
- - Color-coded by confidence (viridis)
1147
- - Colorbar showing scale
1148
- - Grid for reading values
1149
- ```
1150
-
1151
- #### 6. Quadrant Distribution Pie Chart
1152
- ```
1153
- Purpose: Show spatial distribution by quadrant
1154
- Type: Pie chart
1155
- Features:
1156
- - 4 segments (Q1-Q4)
1157
- - Percentage labels
1158
- - Element counts in labels
1159
- - Distinct colors per quadrant
1160
- ```
1161
-
1162
- #### 7. Size Distribution Bar Chart
1163
- ```
1164
- Purpose: Show distribution of detection sizes
1165
- Type: Vertical bar chart
1166
- Features:
1167
- - 4 categories (tiny, small, medium, large)
1168
- - Distinct color per category
1169
- - Value labels on bars
1170
- ```
1171
-
1172
- #### 8. Top Classes by Average Confidence
1173
- ```
1174
- Purpose: Identify most confidently detected classes
1175
- Type: Horizontal bar chart
1176
- Features:
1177
- - Top 15 classes
1178
- - Sorted by confidence
1179
- - Value labels
1180
- - Coral color scheme
1181
- ```
1182
-
1183
- ### Technical Implementation
1184
-
1185
- ```python
1186
- def generate_comprehensive_visualizations(
1187
- detections: List[dict],
1188
- class_metrics: dict,
1189
- confidence_metrics: dict,
1190
- spatial_metrics: dict,
1191
- img_width: int,
1192
- img_height: int
1193
- ) -> dict:
1194
- """
1195
- Generate all visualization types.
1196
-
1197
- Returns:
1198
- Dictionary with base64-encoded PNG images
1199
- """
1200
- visualizations = {}
1201
-
1202
- # Each visualization wrapped in try-except for isolation
1203
- try:
1204
- fig, ax = plt.subplots(figsize=(12, 6))
1205
- # ... chart generation code ...
1206
- visualizations['chart_name'] = fig_to_base64(fig)
1207
- plt.close(fig) # Prevent memory leaks
1208
- except Exception as e:
1209
- print(f"Error generating chart: {e}")
1210
-
1211
- return visualizations
1212
- ```
1213
-
1214
- ### Base64 Encoding
1215
-
1216
- ```python
1217
- def fig_to_base64(fig) -> str:
1218
- """Convert matplotlib figure to base64 data URI."""
1219
- buffer = BytesIO()
1220
- fig.savefig(buffer, format='png', dpi=100, bbox_inches='tight')
1221
- buffer.seek(0)
1222
- image_base64 = base64.b64encode(buffer.read()).decode()
1223
- buffer.close()
1224
- return f"data:image/png;base64,{image_base64}"
1225
- ```
1226
-
1227
- ### Usage in HTML
1228
-
1229
- ```html
1230
- <img src="{{ visualizations.class_distribution }}" alt="Class Distribution">
1231
- ```
1232
-
1233
- ---
1234
-
1235
- ## πŸ”§ Services Layer
1236
-
1237
- ### services/detection.py
1238
-
1239
- Core detection logic and result processing.
1240
-
1241
- #### `process_detections(result, score_thr=0.3)`
1242
-
1243
- Converts raw model output to structured format.
1244
-
1245
- **Input:** Raw MMDetection result (list of arrays per class)
1246
-
1247
- **Output:** List of detection dictionaries
1248
-
1249
- ```python
1250
- [
1251
- {
1252
- "class_id": 0,
1253
- "class_name": "paragraph",
1254
- "bbox": {
1255
- "x1": 100.5, "y1": 200.3,
1256
- "x2": 500.8, "y2": 350.2,
1257
- "width": 400.3, "height": 149.9,
1258
- "center_x": 300.65, "center_y": 275.25
1259
- },
1260
- "confidence": 0.9234,
1261
- "area": 60005.0,
1262
- "aspect_ratio": 2.67
1263
- },
1264
- // ... more detections
1265
- ]
1266
- ```
1267
-
1268
- **Processing Steps:**
1269
- 1. Iterate through class results
1270
- 2. Filter by confidence threshold
1271
- 3. Extract coordinates and calculate derived values
1272
- 4. Sort by confidence (descending)
1273
-
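- A sketch of these steps for the per-class array format described above (`class_names` is assumed to come from `model.CLASSES`; the shipped `detection.py` may differ in detail):
-
- ```python
- # Illustrative process_detections for the classic MMDetection result format:
- # one Nx5 array of [x1, y1, x2, y2, score] per class.
- def process_detections(result, class_names, score_thr=0.3):
-     detections = []
-     for class_id, class_boxes in enumerate(result):
-         for x1, y1, x2, y2, score in class_boxes:
-             if score < score_thr:
-                 continue
-             w, h = float(x2 - x1), float(y2 - y1)
-             detections.append({
-                 "class_id": class_id,
-                 "class_name": class_names[class_id],
-                 "bbox": {"x1": float(x1), "y1": float(y1),
-                          "x2": float(x2), "y2": float(y2),
-                          "width": w, "height": h,
-                          "center_x": float(x1) + w / 2, "center_y": float(y1) + h / 2},
-                 "confidence": float(score),
-                 "area": w * h,
-                 "aspect_ratio": round(w / h, 2) if h else 0.0,
-             })
-     detections.sort(key=lambda d: d["confidence"], reverse=True)
-     return detections
- ```
-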
1274
- ---
1275
-
1276
- ### services/processing.py
1277
-
1278
- Result aggregation and persistence.
1279
-
1280
- #### `aggregate_results(...)`
1281
-
1282
- Assembles the complete response object.
1283
-
1284
- ```python
1285
- def aggregate_results(
1286
- detections: List[dict],
1287
- core_metrics: dict,
1288
- rodla_metrics: dict,
1289
- spatial_metrics: dict,
1290
- class_metrics: dict,
1291
- confidence_metrics: dict,
1292
- robustness_indicators: dict,
1293
- layout_complexity: dict,
1294
- quality_metrics: dict,
1295
- visualizations: dict,
1296
- interpretation: dict,
1297
- file_info: dict,
1298
- config: dict
1299
- ) -> dict:
1300
- """Combine all analysis results into final response."""
1301
- return {
1302
- "success": True,
1303
- "timestamp": datetime.now().isoformat(),
1304
- # ... all components ...
1305
- }
1306
- ```
1307
-
1308
- #### `save_results(results, filename, output_dir)`
1309
-
1310
- Persists results to disk.
1311
-
1312
- ```python
1313
- def save_results(results: dict, filename: str, output_dir: Path) -> Path:
1314
- """
1315
- Save results as JSON file.
1316
-
1317
- - Removes visualizations to reduce file size
1318
- - Converts numpy types to Python native
1319
- - Saves visualizations as separate PNG files
1320
- """
1321
- json_path = output_dir / f"rodla_results_{filename}.json"
1322
- # ... save logic ...
1323
- return json_path
1324
- ```
1325
-
1326
- ---
1327
-
1328
- ### services/interpretation.py
1329
-
1330
- Human-readable insight generation.
1331
-
1332
- #### `generate_comprehensive_interpretation(...)`
1333
-
1334
- Creates natural language analysis of results.
1335
-
1336
- **Output Sections:**
1337
-
1338
- | Section | Description |
1339
- |---------|-------------|
1340
- | `overview` | High-level summary paragraph |
1341
- | `top_elements` | Description of most common elements |
1342
- | `rodla_analysis` | Robustness assessment summary |
1343
- | `layout_complexity` | Complexity analysis text |
1344
- | `key_findings` | List of important observations |
1345
- | `perturbation_assessment` | Perturbation effect analysis |
1346
- | `recommendations` | Actionable suggestions |
1347
- | `confidence_summary` | Confidence level summary |
1348
-
1349
- **Example Output:**
1350
-
1351
- ```python
1352
- {
1353
- "overview": """Document Analysis Summary:
1354
- Detected 47 layout elements across 12 different classes.
1355
- The model achieved an average confidence of 78.2%, indicating
1356
- medium certainty in predictions. The detected elements cover
1357
- 68.5% of the document area.""",
1358
-
1359
- "key_findings": [
1360
- "βœ“ Excellent detection confidence - model is highly certain",
1361
- "βœ“ High document coverage - most of the page contains elements",
1362
- "β„Ή Complex document structure with diverse element types"
1363
- ],
1364
-
1365
- "recommendations": [
1366
- "No specific recommendations - detection quality is good"
1367
- ]
1368
- }
1369
- ```
1370
-
1371
- ---
1372
-
1373
- ## πŸ› οΈ Utilities Reference
1374
-
1375
- ### utils/helpers.py
1376
-
1377
- General-purpose helper functions.
1378
-
1379
- #### Mathematical Functions
1380
-
1381
- | Function | Purpose | Formula |
1382
- |----------|---------|---------|
1383
- | `calculate_skewness(data)` | Distribution asymmetry | `mean(((x - ΞΌ) / Οƒ)Β³)` |
1384
- | `calculate_entropy(values)` | Information content | `-Ξ£(p Γ— logβ‚‚(p))` |
1385
- | `calculate_avg_nn_distance(xs, ys)` | Average nearest neighbor | Mean of min distances |
1386
- | `calculate_clustering_score(xs, ys)` | Spatial clustering | `1 - (std / mean)` |
1387
- | `calculate_iou(bbox1, bbox2)` | Intersection over Union | `intersection / union` |
1388
-
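- The first two formulas transcribe directly into NumPy; the sketch below adds guards for degenerate input and is illustrative rather than the exact `helpers.py` code (`values` is treated as non-negative weights, e.g. confidence-bin counts, normalised to probabilities):
-
- ```python
- # Illustrative skewness and Shannon-entropy helpers matching the formulas above.
- import numpy as np
-
- def calculate_skewness(data):
-     data = np.asarray(data, dtype=float)
-     mu, sigma = data.mean(), data.std()
-     return float(np.mean(((data - mu) / sigma) ** 3)) if sigma else 0.0
-
- def calculate_entropy(values):
-     values = np.asarray(values, dtype=float)
-     total = values.sum()
-     if total <= 0:
-         return 0.0
-     p = values / total
-     p = p[p > 0]
-     return float(-(p * np.log2(p)).sum())
- ```
-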
1389
- #### Utility Functions
1390
-
1391
- ```python
1392
- def calculate_detection_overlaps(detections: List[dict]) -> dict:
1393
- """
1394
- Find all overlapping detection pairs.
1395
-
1396
- Returns:
1397
- {
1398
- 'count': int, # Number of overlapping pairs
1399
- 'percentage': float, # % of detections with overlaps
1400
- 'avg_iou': float # Mean IoU of overlaps
1401
- }
1402
- """
1403
- ```
1404
-
1405
- ---
1406
-
1407
- ### utils/serialization.py
1408
-
1409
- JSON conversion utilities.
1410
-
1411
- #### `convert_to_json_serializable(obj)`
1412
-
1413
- Recursively converts numpy types to Python native types.
1414
-
1415
- **Conversions:**
1416
- | NumPy Type | Python Type |
1417
- |------------|-------------|
1418
- | `np.integer` | `int` |
1419
- | `np.floating` | `float` |
1420
- | `np.ndarray` | `list` |
1421
- | `np.bool_` | `bool` |
1422
-
1423
- ```python
1424
- def convert_to_json_serializable(obj):
1425
- """
1426
- Recursively convert numpy types for JSON serialization.
1427
-
1428
- Handles:
1429
- - Dictionaries (recursive)
1430
- - Lists (recursive)
1431
- - NumPy scalars and arrays
1432
- - Native Python types (pass-through)
1433
- """
1434
- if isinstance(obj, dict):
1435
- return {k: convert_to_json_serializable(v) for k, v in obj.items()}
1436
- elif isinstance(obj, list):
1437
- return [convert_to_json_serializable(item) for item in obj]
1438
- elif isinstance(obj, np.integer):
1439
- return int(obj)
1440
- elif isinstance(obj, np.floating):
1441
- return float(obj)
1442
- elif isinstance(obj, np.ndarray):
1443
- return obj.tolist()
1444
- elif isinstance(obj, np.bool_):
1445
- return bool(obj)
1446
- return obj
1447
- ```
1448
-
1449
- ---
1450
-
1451
- ## ⚠️ Error Handling
1452
-
1453
- ### Exception Hierarchy
1454
-
1455
- ```
1456
- Exception
1457
- β”œβ”€β”€ HTTPException (FastAPI)
1458
- β”‚ β”œβ”€β”€ 400 Bad Request
1459
- β”‚ β”‚ └── Invalid file type
1460
- β”‚ └── 500 Internal Server Error
1461
- β”‚ β”œβ”€β”€ Model not loaded
1462
- β”‚ β”œβ”€β”€ Inference failed
1463
- β”‚ └── Processing error
1464
- └── Standard Exceptions
1465
- β”œβ”€β”€ FileNotFoundError
1466
- β”œβ”€β”€ ValueError
1467
- └── RuntimeError
1468
- ```
1469
-
1470
- ### Error Handling Strategy
1471
-
1472
- ```python
1473
- @app.post("/api/detect")
1474
- async def detect_objects(...):
1475
- tmp_path = None
1476
-
1477
- try:
1478
- # Main processing logic
1479
- ...
1480
-
1481
- except HTTPException:
1482
- # Re-raise HTTP exceptions unchanged
1483
- if tmp_path and os.path.exists(tmp_path):
1484
- os.unlink(tmp_path)
1485
- raise
1486
-
1487
- except Exception as e:
1488
- # Handle unexpected errors
1489
- if tmp_path and os.path.exists(tmp_path):
1490
- os.unlink(tmp_path)
1491
-
1492
- # Log full traceback
1493
- import traceback
1494
- traceback.print_exc()
1495
-
1496
- # Return structured error response
1497
- return JSONResponse(
1498
- {"success": False, "error": str(e)},
1499
- status_code=500
1500
- )
1501
- ```
1502
-
1503
- ### Visualization Error Isolation
1504
-
1505
- Each visualization is wrapped individually to prevent cascade failures:
1506
-
1507
- ```python
1508
- for viz_name, viz_func in visualization_functions.items():
1509
- try:
1510
- visualizations[viz_name] = viz_func()
1511
- except Exception as e:
1512
- print(f"Error generating {viz_name}: {e}")
1513
- visualizations[viz_name] = None
1514
- ```
1515
-
1516
- ### Resource Cleanup
1517
-
1518
- Temporary files are always cleaned up:
1519
-
1520
- ```python
1521
- finally:
1522
- if tmp_path and os.path.exists(tmp_path):
1523
- os.unlink(tmp_path)
1524
- ```
1525
-
1526
- ---
1527
-
1528
- ## ⚑ Performance Optimization
1529
-
1530
- ### GPU Memory Management
1531
-
1532
- ```python
1533
- # At startup - clear GPU cache
1534
- if torch.cuda.is_available():
1535
- torch.cuda.empty_cache()
1536
- gc.collect()
1537
-
1538
- # Monitor memory usage
1539
- print(f"GPU Memory: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GB")
1540
- ```
1541
-
1542
- ### Memory-Efficient Visualizations
1543
-
1544
- ```python
1545
- # Always close figures after encoding
1546
- fig, ax = plt.subplots()
1547
- # ... generate chart ...
1548
- base64_str = fig_to_base64(fig)
1549
- plt.close(fig) # IMPORTANT: Prevents memory leaks
1550
- ```
1551
-
1552
- ### Response Size Optimization
1553
-
1554
- ```python
1555
- # Remove large base64 images from saved JSON
1556
- json_results = {k: v for k, v in results.items() if k != "visualizations"}
1557
-
1558
- # Save visualizations as separate files
1559
- for viz_name, viz_data in visualizations.items():
1560
- save_visualization(viz_data, f"{filename}_{viz_name}.png")
1561
- ```
1562
-
1563
- ### Lazy Model Loading
1564
-
1565
- ```python
1566
- # Model loaded once at startup, reused for all requests
1567
- @app.on_event("startup")
1568
- async def startup_event():
1569
- global model
1570
- model = init_detector(config, weights, device)
1571
- ```
1572
-
1573
- ### Performance Benchmarks
1574
-
1575
- | Operation | Time (GPU) | Time (CPU) |
1576
- |-----------|------------|------------|
1577
- | Model loading | 10-15s | 20-30s |
1578
- | Single inference | 0.3-0.5s | 2-5s |
1579
- | Metrics calculation | 0.1-0.2s | 0.1-0.2s |
1580
- | Visualization generation | 1-2s | 1-2s |
1581
- | **Total per request** | **1.5-3s** | **4-8s** |
1582
-
1583
- ---
1584
-
1585
- ## πŸ”’ Security Considerations
1586
-
1587
- ### Current Security Status
1588
-
1589
- | Aspect | Status | Risk | Recommendation |
1590
- |--------|--------|------|----------------|
1591
- | Authentication | ❌ None | High | Add API key auth |
1592
- | CORS | ⚠️ Permissive | Medium | Restrict origins |
1593
- | Rate Limiting | ❌ None | Medium | Add throttling |
1594
- | Input Validation | ⚠️ Basic | Low | Add size limits |
1595
- | Path Handling | ⚠️ Hardcoded | Low | Use env vars |
1596
-
1597
- ### Recommended Security Enhancements
1598
-
1599
- #### API Key Authentication
1600
-
1601
- ```python
1602
- from fastapi import Security
1603
- from fastapi.security.api_key import APIKeyHeader
1604
-
1605
- API_KEY = os.environ.get("RODLA_API_KEY")
1606
- api_key_header = APIKeyHeader(name="X-API-Key")
1607
-
1608
- async def verify_api_key(api_key: str = Security(api_key_header)):
1609
- if api_key != API_KEY:
1610
- raise HTTPException(403, "Invalid API key")
1611
- return api_key
1612
-
1613
- @app.post("/api/detect")
1614
- async def detect_objects(
1615
- ...,
1616
- api_key: str = Depends(verify_api_key)
1617
- ):
1618
- ...
1619
- ```
1620
-
1621
- #### Rate Limiting
1622
-
1623
- ```python
1624
- from slowapi import Limiter
1625
- from slowapi.util import get_remote_address
1626
-
1627
- limiter = Limiter(key_func=get_remote_address)
1628
- app.state.limiter = limiter
1629
-
1630
- @app.post("/api/detect")
1631
- @limiter.limit("10/minute")
1632
- async def detect_objects(...):
1633
- ...
1634
- ```
1635
-
1636
- #### File Size Limits
1637
-
1638
- ```python
1639
- MAX_FILE_SIZE = 10 * 1024 * 1024 # 10MB
1640
-
1641
- @app.post("/api/detect")
1642
- async def detect_objects(file: UploadFile = File(...)):
1643
- content = await file.read()
1644
- if len(content) > MAX_FILE_SIZE:
1645
- raise HTTPException(413, "File too large")
1646
- ...
1647
- ```
1648
-
1649
- #### Restricted CORS
1650
-
1651
- ```python
1652
- app.add_middleware(
1653
- CORSMiddleware,
1654
- allow_origins=["https://yourdomain.com"],
1655
- allow_methods=["GET", "POST"],
1656
- allow_headers=["X-API-Key", "Content-Type"],
1657
- )
1658
- ```
1659
-
1660
- ---
1661
-
1662
- ## πŸ§ͺ Testing
1663
-
1664
- ### Test Structure
1665
-
1666
- ```
1667
- tests/
1668
- β”œβ”€β”€ __init__.py
1669
- β”œβ”€β”€ conftest.py # Pytest fixtures
1670
- β”œβ”€β”€ test_api/
1671
- β”‚ β”œβ”€β”€ test_routes.py # Endpoint tests
1672
- β”‚ └── test_schemas.py # Pydantic model tests
1673
- β”œβ”€β”€ test_services/
1674
- β”‚ β”œβ”€β”€ test_detection.py # Detection logic tests
1675
- β”‚ β”œβ”€β”€ test_processing.py # Processing tests
1676
- β”‚ └── test_visualization.py # Chart generation tests
1677
- β”œβ”€β”€ test_utils/
1678
- β”‚ β”œβ”€β”€ test_helpers.py # Helper function tests
1679
- β”‚ β”œβ”€β”€ test_metrics.py # Metrics calculation tests
1680
- β”‚ └── test_serialization.py # Serialization tests
1681
- └── test_integration/
1682
- └── test_full_pipeline.py # End-to-end tests
1683
- ```
1684
-
1685
- ### Running Tests
1686
-
1687
- ```bash
1688
- # Run all tests
1689
- pytest
1690
-
1691
- # Run with coverage
1692
- pytest --cov=. --cov-report=html
1693
-
1694
- # Run specific test file
1695
- pytest tests/test_utils/test_metrics.py
1696
-
1697
- # Run with verbose output
1698
- pytest -v
1699
-
1700
- # Run only fast tests (no model loading)
1701
- pytest -m "not slow"
1702
- ```
1703
-
1704
- ### Example Test Cases
1705
-
1706
- ```python
1707
- # tests/test_utils/test_helpers.py
1708
-
1709
- import pytest
1710
- import numpy as np
1711
- from utils.helpers import calculate_iou, calculate_skewness
1712
-
1713
- class TestCalculateIoU:
1714
- def test_complete_overlap(self):
1715
- bbox1 = {'x1': 0, 'y1': 0, 'x2': 100, 'y2': 100, 'width': 100, 'height': 100}
1716
- bbox2 = {'x1': 0, 'y1': 0, 'x2': 100, 'y2': 100, 'width': 100, 'height': 100}
1717
- assert calculate_iou(bbox1, bbox2) == 1.0
1718
-
1719
- def test_no_overlap(self):
1720
- bbox1 = {'x1': 0, 'y1': 0, 'x2': 50, 'y2': 50, 'width': 50, 'height': 50}
1721
- bbox2 = {'x1': 100, 'y1': 100, 'x2': 150, 'y2': 150, 'width': 50, 'height': 50}
1722
- assert calculate_iou(bbox1, bbox2) == 0.0
1723
-
1724
- def test_partial_overlap(self):
1725
- bbox1 = {'x1': 0, 'y1': 0, 'x2': 100, 'y2': 100, 'width': 100, 'height': 100}
1726
- bbox2 = {'x1': 50, 'y1': 50, 'x2': 150, 'y2': 150, 'width': 100, 'height': 100}
1727
- iou = calculate_iou(bbox1, bbox2)
1728
- assert 0 < iou < 1
1729
-
1730
- class TestCalculateSkewness:
1731
- def test_symmetric_distribution(self):
1732
- data = [1, 2, 3, 4, 5]
1733
- skew = calculate_skewness(data)
1734
- assert abs(skew) < 0.1 # Nearly symmetric
1735
-
1736
- def test_right_skewed(self):
1737
- data = [1, 1, 1, 1, 10]
1738
- skew = calculate_skewness(data)
1739
- assert skew > 0 # Positive skew
1740
- ```
1741
-
1742
- ### Mocking the Model
1743
-
1744
- ```python
1745
- # tests/conftest.py
1746
-
1747
- import pytest
1748
- from unittest.mock import Mock, patch
1749
-
1750
- @pytest.fixture
1751
- def mock_model():
1752
- """Create a mock detection model."""
1753
- model = Mock()
1754
- model.CLASSES = ['paragraph', 'title', 'figure', 'table']
1755
- return model
1756
-
1757
- @pytest.fixture
1758
- def mock_detections():
1759
- """Sample detection results."""
1760
- return [
1761
- {
1762
- 'class_id': 0,
1763
- 'class_name': 'paragraph',
1764
- 'bbox': {'x1': 100, 'y1': 100, 'x2': 500, 'y2': 300,
1765
- 'width': 400, 'height': 200, 'center_x': 300, 'center_y': 200},
1766
- 'confidence': 0.95,
1767
- 'area': 80000,
1768
- 'aspect_ratio': 2.0
1769
- }
1770
- ]
1771
- ```
1772
-
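- A short example of how these fixtures can be exercised; the test file name is illustrative:
-
- ```python
- # tests/test_services/test_processing.py (sketch)
- def test_mock_detections_have_valid_geometry(mock_detections):
-     for det in mock_detections:
-         bbox = det["bbox"]
-         assert bbox["x2"] > bbox["x1"] and bbox["y2"] > bbox["y1"]
-         assert det["area"] == bbox["width"] * bbox["height"]
-         assert 0.0 <= det["confidence"] <= 1.0
- ```
-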
1773
- ---
1774
-
1775
- ## 🚒 Deployment
1776
-
1777
- ### Development Server
1778
-
1779
- ```bash
1780
- python backend.py
1781
- # or
1782
- uvicorn backend:app --reload --host 0.0.0.0 --port 8000
1783
- ```
1784
-
1785
- ### Production with Gunicorn
1786
-
1787
- ```bash
1788
- gunicorn backend:app -w 1 -k uvicorn.workers.UvicornWorker \
1789
- --bind 0.0.0.0:8000 \
1790
- --timeout 120 \
1791
- --keep-alive 5
1792
- ```
1793
-
1794
- **Note:** Use a single worker (`-w 1`) for GPU models; each Gunicorn worker loads its own copy of the model, so additional workers multiply GPU memory usage.
1795
-
1796
- ### Docker Deployment
1797
-
1798
- ```dockerfile
1799
- # Dockerfile
1800
- FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
1801
-
1802
- # Install Python
1803
- RUN apt-get update && apt-get install -y python3 python3-pip
1804
-
1805
- # Set working directory
1806
- WORKDIR /app
1807
-
1808
- # Copy requirements first for caching
1809
- COPY requirements.txt .
1810
- RUN pip install --no-cache-dir -r requirements.txt
1811
-
1812
- # Copy application code
1813
- COPY . .
1814
-
1815
- # Create output directory
1816
- RUN mkdir -p outputs
1817
-
1818
- # Expose port
1819
- EXPOSE 8000
1820
-
1821
- # Run application
1822
- CMD ["uvicorn", "backend:app", "--host", "0.0.0.0", "--port", "8000"]
1823
- ```
1824
-
1825
- ```yaml
1826
- # docker-compose.yml
1827
- version: '3.8'
1828
-
1829
- services:
1830
- rodla-api:
1831
- build: .
1832
- ports:
1833
- - "8000:8000"
1834
- volumes:
1835
- - ./outputs:/app/outputs
1836
- - ./weights:/app/weights
1837
- deploy:
1838
- resources:
1839
- reservations:
1840
- devices:
1841
- - driver: nvidia
1842
- count: 1
1843
- capabilities: [gpu]
1844
- environment:
1845
- - RODLA_API_KEY=${RODLA_API_KEY}
1846
- restart: unless-stopped
1847
- ```
1848
-
1849
- ### Kubernetes Deployment
1850
-
1851
- ```yaml
1852
- # k8s/deployment.yaml
1853
- apiVersion: apps/v1
1854
- kind: Deployment
1855
- metadata:
1856
- name: rodla-api
1857
- spec:
1858
- replicas: 1
1859
- selector:
1860
- matchLabels:
1861
- app: rodla-api
1862
- template:
1863
- metadata:
1864
- labels:
1865
- app: rodla-api
1866
- spec:
1867
- containers:
1868
- - name: rodla-api
1869
- image: your-registry/rodla-api:latest
1870
- ports:
1871
- - containerPort: 8000
1872
- resources:
1873
- limits:
1874
- nvidia.com/gpu: 1
1875
- memory: "16Gi"
1876
- requests:
1877
- memory: "8Gi"
1878
- volumeMounts:
1879
- - name: outputs
1880
- mountPath: /app/outputs
1881
- volumes:
1882
- - name: outputs
1883
- persistentVolumeClaim:
1884
- claimName: rodla-outputs-pvc
1885
- ```
1886
-
1887
- ### Nginx Reverse Proxy
1888
-
1889
- ```nginx
1890
- # /etc/nginx/sites-available/rodla-api
1891
- upstream rodla_backend {
1892
- server 127.0.0.1:8000;
1893
- }
1894
-
1895
- server {
1896
- listen 80;
1897
- server_name api.yourdomain.com;
1898
-
1899
- client_max_body_size 50M;
1900
-
1901
- location / {
1902
- proxy_pass http://rodla_backend;
1903
- proxy_set_header Host $host;
1904
- proxy_set_header X-Real-IP $remote_addr;
1905
- proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
1906
- proxy_read_timeout 120s;
1907
- }
1908
- }
1909
- ```
1910
-
1911
- ---
1912
-
1913
- ## πŸ”§ Troubleshooting
1914
-
1915
- ### Common Issues
1916
-
1917
- #### Model Loading Failures
1918
-
1919
- **Symptom:** `RuntimeError: CUDA out of memory`
1920
-
1921
- **Solutions:**
1922
- ```bash
1923
- # Reset the GPU (requires root and that no processes are using it)
- sudo nvidia-smi --gpu-reset -i 0
-
- # Or, inside the running Python process:
- #   import torch; torch.cuda.empty_cache()
1929
-
1930
- # Check available GPU memory
1931
- nvidia-smi
1932
- ```
1933
-
1934
- **Symptom:** `ModuleNotFoundError: No module named 'mmdet'`
1935
-
1936
- **Solution:**
1937
- ```bash
1938
- pip install -U openmim
1939
- mim install mmengine mmcv mmdet
1940
- ```
1941
-
1942
- **Symptom:** `FileNotFoundError: Config file not found`
1943
-
1944
- **Solution:**
1945
- ```python
1946
- # Check paths in config/settings.py
1947
- from pathlib import Path
1948
- print(Path(MODEL_CONFIG).exists()) # Should be True
1949
- print(Path(MODEL_WEIGHTS).exists()) # Should be True
1950
- ```
1951
-
1952
- ---
1953
-
1954
- #### Inference Errors
1955
-
1956
- **Symptom:** `RuntimeError: Input type and weight type should be the same`
1957
-
1958
- **Solution:**
1959
- ```python
1960
- # Ensure model and input are on same device
1961
- model = model.to('cuda')
1962
- # or
1963
- model = model.to('cpu')
1964
- ```
1965
-
1966
- **Symptom:** `ValueError: could not broadcast input array`
1967
-
1968
- **Solution:**
1969
- ```python
1970
- # Check image dimensions
1971
- from PIL import Image
1972
- img = Image.open(image_path)
1973
- print(f"Image size: {img.size}") # Should be reasonable dimensions
1974
- ```
1975
-
1976
- ---
1977
-
1978
- #### Visualization Errors
1979
-
1980
- **Symptom:** `RuntimeError: main thread is not in main loop`
1981
-
1982
- **Solution:**
1983
- ```python
1984
- # Set matplotlib backend before importing pyplot
1985
- import matplotlib
1986
- matplotlib.use('Agg') # Non-interactive backend
1987
- import matplotlib.pyplot as plt
1988
- ```
1989
-
1990
- **Symptom:** Memory usage grows with each request
1991
-
1992
- **Solution:**
1993
- ```python
1994
- # Always close figures after use
1995
- fig, ax = plt.subplots()
1996
- # ... plotting code ...
1997
- plt.savefig(buffer, format='png')
1998
- plt.close(fig) # CRITICAL: Prevents memory leak
1999
- plt.close('all') # Nuclear option if needed
2000
- ```
2001
-
2002
- ---
2003
-
2004
- #### API Errors
2005
-
2006
- **Symptom:** `422 Unprocessable Entity`
2007
-
2008
- **Cause:** Invalid request format
2009
-
2010
- **Solution:**
2011
- ```bash
2012
- # Correct multipart form data format
2013
- curl -X POST "http://localhost:8000/api/detect" \
2014
- -H "accept: application/json" \
2015
- -F "file=@image.jpg;type=image/jpeg" \
2016
- -F "score_thr=0.3"
2017
- ```
2018
-
2019
- **Symptom:** `413 Request Entity Too Large`
2020
-
2021
- **Solution:**
2022
- ```python
2023
- # FastAPI/Starlette does not enforce an upload size limit by itself;
- # a 413 almost always comes from the reverse proxy in front of it.
- # Raise the proxy limit, e.g. in nginx:
- #   client_max_body_size 50M;
- # or enforce your own limit in application code (see the sketch below).
2030
- ```
2031
-
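- If you prefer to reject oversized uploads at the application level, here is a minimal sketch of a body-size guard; the 50 MB limit and the middleware name are illustrative:
-
- ```python
- from fastapi import FastAPI, Request
- from fastapi.responses import JSONResponse
-
- MAX_BODY_SIZE = 50 * 1024 * 1024  # 50 MB, illustrative
-
- app = FastAPI()
-
- @app.middleware("http")
- async def limit_body_size(request: Request, call_next):
-     # Reject early based on the declared Content-Length header
-     declared = request.headers.get("content-length")
-     if declared and int(declared) > MAX_BODY_SIZE:
-         return JSONResponse({"detail": "File too large"}, status_code=413)
-     return await call_next(request)
- ```
-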
2032
- ---
2033
-
2034
- ### Debugging Tips
2035
-
2036
- #### Enable Debug Logging
2037
-
2038
- ```python
2039
- import logging
2040
-
2041
- logging.basicConfig(level=logging.DEBUG)
2042
- logger = logging.getLogger(__name__)
2043
-
2044
- # In your code
2045
- logger.debug(f"Processing image: {filename}")
2046
- logger.debug(f"Detections found: {len(detections)}")
2047
- ```
2048
-
2049
- #### GPU Monitoring
2050
-
2051
- ```bash
2052
- # Real-time GPU monitoring
2053
- watch -n 1 nvidia-smi
2054
-
2055
- # Or use gpustat
2056
- pip install gpustat
2057
- gpustat -i 1
2058
- ```
2059
-
2060
- #### Memory Profiling
2061
-
2062
- ```python
2063
- # Install memory profiler
2064
- pip install memory_profiler
2065
-
2066
- # Use decorator
2067
- from memory_profiler import profile
2068
-
2069
- @profile
2070
- def detect_objects(...):
2071
- ...
2072
- ```
2073
-
2074
- #### Request Timing
2075
-
2076
- ```python
2077
- import time
2078
-
2079
- @app.post("/api/detect")
2080
- async def detect_objects(...):
2081
- start_time = time.time()
2082
-
2083
- # ... processing ...
2084
-
2085
- elapsed = time.time() - start_time
2086
- logger.info(f"Request completed in {elapsed:.2f}s")
2087
- ```
2088
-
2089
- ---
2090
-
2091
- ### Health Checks
2092
-
2093
- ```python
2094
- # Add health check endpoint
2095
- @app.get("/health")
2096
- async def health_check():
2097
- return {
2098
- "status": "healthy",
2099
- "model_loaded": model is not None,
2100
- "gpu_available": torch.cuda.is_available(),
2101
- "gpu_memory_used": f"{torch.cuda.memory_allocated(0) / 1024**3:.2f} GB"
2102
- if torch.cuda.is_available() else "N/A"
2103
- }
2104
- ```
2105
-
2106
- ---
2107
-
2108
- ## 🀝 Contributing
2109
-
2110
- ### Getting Started
2111
-
2112
- 1. Fork the repository
2113
- 2. Create a feature branch: `git checkout -b feature/amazing-feature`
2114
- 3. Make your changes
2115
- 4. Run tests: `pytest`
2116
- 5. Commit: `git commit -m 'Add amazing feature'`
2117
- 6. Push: `git push origin feature/amazing-feature`
2118
- 7. Open a Pull Request
2119
-
2120
- ### Code Style
2121
-
2122
- ```bash
2123
- # Install development dependencies
2124
- pip install black isort flake8 mypy
2125
-
2126
- # Format code
2127
- black .
2128
- isort .
2129
-
2130
- # Check style
2131
- flake8 .
2132
-
2133
- # Type checking
2134
- mypy .
2135
- ```
2136
-
2137
- ### Pre-commit Hooks
2138
-
2139
- ```yaml
2140
- # .pre-commit-config.yaml
2141
- repos:
2142
- - repo: https://github.com/psf/black
2143
- rev: 23.7.0
2144
- hooks:
2145
- - id: black
2146
- - repo: https://github.com/pycqa/isort
2147
- rev: 5.12.0
2148
- hooks:
2149
- - id: isort
2150
- - repo: https://github.com/pycqa/flake8
2151
- rev: 6.1.0
2152
- hooks:
2153
- - id: flake8
2154
- ```
2155
-
2156
- ```bash
2157
- pip install pre-commit
2158
- pre-commit install
2159
- ```
2160
-
2161
- ### Adding New Metrics
2162
-
2163
- 1. Create function in appropriate module under `utils/metrics/`
2164
- 2. Export from `utils/metrics/__init__.py`
2165
- 3. Call from `services/processing.py`
2166
- 4. Add to response schema in `api/schemas.py`
2167
- 5. Document in this README
2168
- 6. Add tests in `tests/test_utils/test_metrics.py` (a minimal sketch of steps 1 and 2 follows below)
2169
-
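- A minimal sketch of steps 1 and 2; the `density.py` module and `calculate_detection_density` function are hypothetical names, not part of the existing codebase:
-
- ```python
- # utils/metrics/density.py (hypothetical module for a new metric)
- from typing import Dict, List
-
- def calculate_detection_density(detections: List[Dict],
-                                 image_width: int,
-                                 image_height: int) -> float:
-     """Fraction of the page area covered by detected boxes (0..1)."""
-     page_area = float(image_width * image_height)
-     if page_area == 0 or not detections:
-         return 0.0
-     covered = sum(d["bbox"]["width"] * d["bbox"]["height"] for d in detections)
-     return min(covered / page_area, 1.0)
-
- # utils/metrics/__init__.py -- export it next to the existing metrics:
- # from .density import calculate_detection_density
- ```
-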
2170
- ### Adding New Visualizations
2171
-
2172
- 1. Add function in `services/visualization.py`
2173
- 2. Call from `generate_comprehensive_visualizations()`
2174
- 3. Handle errors with try-except
2175
- 4. Always close figures with `plt.close(fig)`
2176
- 5. Document chart type in this README (see the sketch below)
2177
-
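- A minimal sketch of steps 1, 3 and 4; the `create_confidence_histogram` helper and its base64 return value are illustrative:
-
- ```python
- # services/visualization.py (sketch of one additional chart helper)
- import base64
- from io import BytesIO
- from typing import Dict, List
-
- import matplotlib
- matplotlib.use("Agg")  # non-interactive backend, safe inside API workers
- import matplotlib.pyplot as plt
-
- def create_confidence_histogram(detections: List[Dict]) -> str:
-     """Return a base64-encoded PNG histogram of detection confidences."""
-     fig, ax = plt.subplots(figsize=(6, 4))
-     try:
-         ax.hist([d["confidence"] for d in detections], bins=20, range=(0, 1))
-         ax.set_xlabel("Confidence")
-         ax.set_ylabel("Count")
-         buffer = BytesIO()
-         fig.savefig(buffer, format="png", bbox_inches="tight")
-         return base64.b64encode(buffer.getvalue()).decode("utf-8")
-     finally:
-         plt.close(fig)  # always release the figure to avoid memory leaks
- ```
-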
2178
- ---
2179
-
2180
- ## πŸ“š Citation
2181
-
2182
- If you use this API or the RoDLA model in your research, please cite:
2183
-
2184
- ```bibtex
2185
- @inproceedings{rodla2024cvpr,
2186
- title={RoDLA: Benchmarking the Robustness of Document Layout Analysis Models},
2187
- author={Author Names},
2188
- booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision
2189
- and Pattern Recognition (CVPR)},
2190
- year={2024}
2191
- }
2192
- ```
2193
-
2194
- ### Related Publications
2195
-
2196
- ```bibtex
2197
- @inproceedings{internimage2023,
-   title={InternImage: Exploring Large-Scale Vision Foundation Models
-          with Deformable Convolutions},
-   author={Wang et al.},
-   booktitle={CVPR},
-   year={2023}
- }
-
- @inproceedings{dino2022,
-   title={DINO: DETR with Improved DeNoising Anchor Boxes
-          for End-to-End Object Detection},
-   author={Zhang et al.},
-   booktitle={ICLR},
-   year={2023}
- }
2212
- ```
2213
-
2214
- ---
2215
-
2216
- ## πŸ“„ License
2217
-
2218
- This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
2219
-
2220
- ```
2221
- MIT License
2222
-
2223
- Copyright (c) 2024 [Your Name]
2224
-
2225
- Permission is hereby granted, free of charge, to any person obtaining a copy
2226
- of this software and associated documentation files (the "Software"), to deal
2227
- in the Software without restriction, including without limitation the rights
2228
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
2229
- copies of the Software, and to permit persons to whom the Software is
2230
- furnished to do so, subject to the following conditions:
2231
-
2232
- The above copyright notice and this permission notice shall be included in all
2233
- copies or substantial portions of the Software.
2234
-
2235
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
2236
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
2237
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
2238
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
2239
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
2240
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
2241
- SOFTWARE.
2242
- ```
2243
-
2244
- ---
2245
-
2246
- ## πŸ“ž Support
2247
-
2248
- ### Getting Help
2249
-
2250
- - **Documentation:** This README
2251
- - **Issues:** [GitHub Issues](https://github.com/yourusername/rodla-api/issues)
2252
- - **Discussions:** [GitHub Discussions](https://github.com/yourusername/rodla-api/discussions)
2253
-
2254
- ### Reporting Bugs
2255
-
2256
- When reporting bugs, please include:
2257
-
2258
- 1. Operating system and version
2259
- 2. Python version
2260
- 3. GPU model and driver version
2261
- 4. Complete error traceback
2262
- 5. Minimal reproducible example
2263
- 6. Input image (if possible)
2264
-
2265
- ### Feature Requests
2266
-
2267
- We welcome feature requests! Please:
2268
-
2269
- 1. Check existing issues first
2270
- 2. Describe the use case
2271
- 3. Explain expected behavior
2272
- 4. Provide examples if possible
2273
-
2274
- ---
2275
-
2276
- ## πŸ™ Acknowledgments
2277
-
2278
- - **RoDLA Authors** - For the original model and research
2279
- - **MMDetection Team** - For the detection framework
2280
- - **InternImage Team** - For the backbone architecture
2281
- - **FastAPI** - For the excellent web framework
2282
- - **Open Source Community** - For countless contributions
2283
-
2284
- ---
2285
-
2286
- <div align="center">
2287
-
2288
- **Built with ❀️ for Document Analysis**
2289
-
2290
- [⬆ Back to Top](#rodla-document-layout-analysis-api)
2291
-
2292
- </div>
deployment/backend/README_Version_TWO.md DELETED
The diff for this file is too large to render. See raw diff
 
deployment/backend/README_Version_Three.md DELETED
The diff for this file is too large to render. See raw diff
 
deployment/backend/backend.py CHANGED
@@ -1,98 +1,666 @@
1
  """
2
- RoDLA Object Detection API - Refactored Main Backend
3
- Clean separation of concerns with modular components
4
- Now with Perturbation Support!
5
  """
6
- from fastapi import FastAPI
7
  from fastapi.middleware.cors import CORSMiddleware
 
8
  import uvicorn
9
- from pathlib import Path
10
 
11
- # Import configuration
12
- from config.settings import (
13
- API_TITLE, API_HOST, API_PORT,
14
- CORS_ORIGINS, CORS_METHODS, CORS_HEADERS,
15
- OUTPUT_DIR, PERTURBATION_OUTPUT_DIR # NEW
16
- )
17
 
18
- # Import core functionality
19
- from core.model_loader import load_model
20
 
21
- # Import API routes
22
- from api.routes import router
23
 
24
- # Initialize FastAPI app
25
- app = FastAPI(
26
- title=API_TITLE,
27
- description="RoDLA Document Layout Analysis API with comprehensive metrics and perturbation testing",
28
- version="2.1.0" # Bumped version for perturbation feature
29
- )
30
 
31
  # Add CORS middleware
32
  app.add_middleware(
33
  CORSMiddleware,
34
- allow_origins=CORS_ORIGINS,
35
  allow_credentials=True,
36
- allow_methods=CORS_METHODS,
37
- allow_headers=CORS_HEADERS,
38
  )
39
 
40
- # Include API routes
41
- app.include_router(router)
 
42
 
43
 
 
 
 
 
44
  @app.on_event("startup")
45
  async def startup_event():
46
- """Initialize model and create directories on startup"""
47
  try:
48
- print("="*60)
49
- print("Starting RoDLA Document Layout Analysis API")
50
- print("="*60)
51
-
52
- # Create output directories
53
- print("πŸ“ Creating output directories...")
54
- OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
55
- PERTURBATION_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
56
- print(f" βœ“ Main output: {OUTPUT_DIR}")
57
- print(f" βœ“ Perturbations: {PERTURBATION_OUTPUT_DIR}")
58
-
59
- # Load model
60
- print("\nπŸ”§ Loading RoDLA model...")
61
  load_model()
62
 
63
- print("\n" + "="*60)
64
- print("βœ… API Ready!")
65
- print("="*60)
66
- print(f"🌐 Main API: http://{API_HOST}:{API_PORT}")
67
- print(f"πŸ“š Docs: http://{API_HOST}:{API_PORT}/docs")
68
- print(f"πŸ“– ReDoc: http://{API_HOST}:{API_PORT}/redoc")
69
- print("\n🎯 Available Endpoints:")
70
- print(" β€’ GET /api/model-info - Model information")
71
- print(" β€’ POST /api/detect - Standard detection")
72
- print(" β€’ GET /api/perturbations/info - Perturbation info (NEW)")
73
- print(" β€’ POST /api/perturb - Apply perturbations (NEW)")
74
- print(" β€’ POST /api/detect-with-perturbation - Detect with perturbations (NEW)")
75
- print("="*60)
76
 
77
  except Exception as e:
78
- print(f"❌ Startup failed: {e}")
79
- import traceback
80
- traceback.print_exc()
81
- raise e
82
 
83
 
84
- @app.on_event("shutdown")
85
- async def shutdown_event():
86
- """Cleanup on shutdown"""
87
- print("\n" + "="*60)
88
- print("πŸ›‘ Shutting down RoDLA API...")
89
90
 
 
 
 
91
 
92
  if __name__ == "__main__":
93
  uvicorn.run(
94
  app,
95
- host=API_HOST,
96
- port=API_PORT,
97
  log_level="info"
98
- )
 
1
  """
2
+ RoDLA Backend - Production Version
3
+ Uses real InternImage-XL weights and all 12 perturbation types with 3 degree levels
4
+ MMDET disabled if MMCV extensions unavailable - perturbations always functional
5
  """
6
+
7
+ import os
8
+ import sys
9
+ import json
10
+ import base64
11
+ import traceback
12
+ from pathlib import Path
13
+ from typing import Dict, List, Any, Optional, Tuple
14
+ from io import BytesIO
15
+ from datetime import datetime
16
+
17
+ import numpy as np
18
+ from PIL import Image
19
+ import cv2
20
+
21
+ from fastapi import FastAPI, File, UploadFile, HTTPException
22
  from fastapi.middleware.cors import CORSMiddleware
23
+ from pydantic import BaseModel
24
  import uvicorn
 
25
 
26
+ # ============================================================================
27
+ # Configuration
28
+ # ============================================================================
 
 
 
29
 
30
+ class Config:
31
+ """Global configuration"""
32
+ API_PORT = 8000
33
+ REPO_ROOT = Path("/home/admin/CV/rodla-academic")
34
+ MODEL_CONFIG_PATH = REPO_ROOT / "model/configs/m6doc/rodla_internimage_xl_m6doc.py"
35
+ MODEL_WEIGHTS_PATH = REPO_ROOT / "finetuning_rodla/finetuning_rodla/checkpoints/rodla_internimage_xl_publaynet.pth"
36
+ PERTURBATIONS_DIR = REPO_ROOT / "deployment/backend/perturbations"
37
+
38
+ # Automatically use GPU if available, otherwise CPU
39
+ @staticmethod
40
+ def get_device():
41
+ import torch
42
+ if torch.cuda.is_available():
43
+ return "cuda:0"
44
+ else:
45
+ return "cpu"
46
 
 
 
47
 
48
+ # ============================================================================
49
+ # Global State
50
+ # ============================================================================
51
+
52
+ app = FastAPI(title="RoDLA Production Backend", version="3.0.0")
53
+
54
+ # Detect device
55
+ import torch
56
+ DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"
57
+
58
+ model_state = {
59
+ "loaded": False,
60
+ "model": None,
61
+ "error": None,
62
+ "model_type": "RoDLA InternImage-XL (MMDET)",
63
+ "device": DEVICE,
64
+ "mmdet_available": False
65
+ }
66
 
67
  # Add CORS middleware
68
  app.add_middleware(
69
  CORSMiddleware,
70
+ allow_origins=["*"],
71
  allow_credentials=True,
72
+ allow_methods=["*"],
73
+ allow_headers=["*"],
74
  )
75
 
76
+
77
+ # ============================================================================
78
+ # M6Doc Dataset Classes
79
+ # ============================================================================
80
+
81
+ LAYOUT_CLASS_MAP = {
82
+ i: "Text" for i in range(75)
83
+ }
84
+ # Simplified mapping to layout elements
85
+ for i in range(75):
86
+ if i in [1, 2, 3, 4, 5]:
87
+ LAYOUT_CLASS_MAP[i] = "Title"
88
+ elif i in [6, 7]:
89
+ LAYOUT_CLASS_MAP[i] = "List"
90
+ elif i in [8, 9]:
91
+ LAYOUT_CLASS_MAP[i] = "Figure"
92
+ elif i in [10, 11]:
93
+ LAYOUT_CLASS_MAP[i] = "Table"
94
+ elif i in [12, 13, 14]:
95
+ LAYOUT_CLASS_MAP[i] = "Header"
96
+
97
+
98
+ # ============================================================================
99
+ # Utility Functions
100
+ # ============================================================================
101
+
102
+ def encode_image_to_base64(image: np.ndarray) -> str:
103
+ """Convert numpy array to base64 string"""
104
+ if len(image.shape) == 3 and image.shape[2] == 3:
105
+ # Ensure uint8 dtype with values in [0, 255]
106
+ if isinstance(image.flat[0], np.uint8):
107
+ image_to_encode = image
108
+ else:
109
+ image_to_encode = (image * 255).astype(np.uint8)
110
+ else:
111
+ image_to_encode = image
112
+
113
+ # cv2.imencode expects BGR channel order; convert from RGB so colors are preserved
+ if len(image_to_encode.shape) == 3 and image_to_encode.shape[2] == 3:
+ image_to_encode = cv2.cvtColor(image_to_encode, cv2.COLOR_RGB2BGR)
+ _, buffer = cv2.imencode('.png', image_to_encode)
114
+ return base64.b64encode(buffer).decode('utf-8')
115
+
116
+
117
+ def heuristic_detect(image_np: np.ndarray) -> List[Dict]:
118
+ """Enhanced heuristic-based detection when MMDET is unavailable
119
+ Uses multiple edge detection methods and texture analysis"""
120
+ h, w = image_np.shape[:2]
121
+ detections = []
122
+
123
+ # Convert to grayscale for analysis
124
+ gray = cv2.cvtColor(image_np, cv2.COLOR_RGB2GRAY)
125
+
126
+ # Try multiple edge detection methods for better coverage
127
+ edges1 = cv2.Canny(gray, 50, 150)
128
+ edges2 = cv2.Canny(gray, 30, 100)
129
+
130
+ # Combine edges
131
+ edges = cv2.bitwise_or(edges1, edges2)
132
+
133
+ # Apply morphological operations to connect nearby edges
134
+ kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
135
+ edges = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
136
+
137
+ # Find contours
138
+ contours, _ = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
139
+
140
+ # Also try watershed/connected components for text detection
141
+ blur = cv2.GaussianBlur(gray, (5, 5), 0)
142
+ _, binary = cv2.threshold(blur, 127, 255, cv2.THRESH_BINARY)
143
+
144
+ # Find connected components
145
+ num_labels, labels = cv2.connectedComponents(binary)
146
+
147
+ # Process contours to create pseudo-detections
148
+ processed_boxes = set()
149
+ for contour in contours:
150
+ x, y, cw, ch = cv2.boundingRect(contour)
151
+
152
+ # Skip if too small or too large
153
+ if cw < 15 or ch < 15 or cw > w * 0.98 or ch > h * 0.98:
154
+ continue
155
+
156
+ area_ratio = (cw * ch) / (w * h)
157
+ if area_ratio < 0.0005 or area_ratio > 0.9:
158
+ continue
159
+
160
+ # Skip if box is too similar to already processed boxes
161
+ box_key = (round(x/10)*10, round(y/10)*10, round(cw/10)*10, round(ch/10)*10)
162
+ if box_key in processed_boxes:
163
+ continue
164
+ processed_boxes.add(box_key)
165
+
166
+ # Analyze content to determine class
167
+ roi = gray[y:y+ch, x:x+cw]
168
+ roi_blur = cv2.GaussianBlur(roi, (5, 5), 0)
169
+ roi_edges = cv2.Canny(roi_blur, 50, 150)
170
+ edge_density = np.sum(roi_edges > 0) / roi.size
171
+
172
+ aspect_ratio = cw / (ch + 1e-6)
173
+
174
+ # Classification logic
175
+ if aspect_ratio > 2.5 or (aspect_ratio > 2 and edge_density < 0.05):
176
+ # Wide with sparse edges = likely figure/table
177
+ class_name = "Figure"
178
+ class_id = 8
179
+ confidence = 0.6 + 0.35 * (1 - min(area_ratio / 0.5, 1.0))
180
+ elif aspect_ratio < 0.3:
181
+ # Narrow = likely list or table column
182
+ class_name = "List"
183
+ class_id = 6
184
+ confidence = 0.55 + 0.4 * (1 - min(area_ratio / 0.3, 1.0))
185
+ elif edge_density > 0.15:
186
+ # High edge density = likely table or complex content
187
+ class_name = "Table"
188
+ class_id = 10
189
+ confidence = 0.5 + 0.4 * edge_density
190
+ else:
191
+ # Default = text content
192
+ class_name = "Text"
193
+ class_id = 50
194
+ confidence = 0.5 + 0.4 * (1 - min(area_ratio / 0.3, 1.0))
195
+
196
+ # Ensure confidence in [0, 1]
197
+ confidence = min(max(confidence, 0.3), 0.95)
198
+
199
+ detections.append({
200
+ "class_id": class_id,
201
+ "class_name": class_name,
202
+ "confidence": float(confidence),
203
+ "bbox": {
204
+ "x": float(x / w),
205
+ "y": float(y / h),
206
+ "width": float(cw / w),
207
+ "height": float(ch / h)
208
+ },
209
+ "area": float(area_ratio)
210
+ })
211
+
212
+ # Sort by confidence and keep top 30
213
+ detections.sort(key=lambda x: x["confidence"], reverse=True)
214
+ return detections[:30]
215
+
216
+
217
+ # ============================================================================
218
+ # Model Loading
219
+ # ============================================================================
220
+
221
+ def load_model():
222
+ """Load the RoDLA model with actual weights"""
223
+ global model_state
224
+
225
+ print("\n" + "="*70)
226
+ print("πŸš€ Loading RoDLA InternImage-XL with Real Weights")
227
+ print("="*70)
228
+
229
+ # Verify weight file exists
230
+ if not Config.MODEL_WEIGHTS_PATH.exists():
231
+ error_msg = f"Weights not found: {Config.MODEL_WEIGHTS_PATH}"
232
+ print(f"❌ {error_msg}")
233
+ model_state["loaded"] = False
234
+ model_state["error"] = error_msg
235
+ return None
236
+
237
+ weights_size = Config.MODEL_WEIGHTS_PATH.stat().st_size / (1024**3)
238
+ print(f"βœ… Weights file: {Config.MODEL_WEIGHTS_PATH}")
239
+ print(f" Size: {weights_size:.2f}GB")
240
+
241
+ # Verify config exists
242
+ if not Config.MODEL_CONFIG_PATH.exists():
243
+ error_msg = f"Config not found: {Config.MODEL_CONFIG_PATH}"
244
+ print(f"❌ {error_msg}")
245
+ model_state["loaded"] = False
246
+ model_state["error"] = error_msg
247
+ return None
248
+
249
+ print(f"βœ… Config file: {Config.MODEL_CONFIG_PATH}")
250
+ print(f"πŸ“ Device: {model_state['device']}")
251
+
252
+ if model_state["device"] == "cpu":
253
+ print("⚠️ WARNING: DCNv3 (used in InternImage backbone) only supports CUDA")
254
+ print(" CPU inference is NOT available. Using heuristic fallback.")
255
+
256
+ # Try to import and load MMDET
257
+ try:
258
+ print("⏳ Setting up model environment...")
259
+ import torch
260
+
261
+ # Import and use DINO registration helper
262
+ from register_dino import try_load_with_dino_registration
263
+
264
+ print("⏳ Loading model from weights (this will take ~30-60 seconds)...")
265
+ print(" File: 3.8GB checkpoint...")
266
+
267
+ model = try_load_with_dino_registration(
268
+ str(Config.MODEL_CONFIG_PATH),
269
+ str(Config.MODEL_WEIGHTS_PATH),
270
+ device=model_state["device"]
271
+ )
272
+
273
+ if model is not None:
274
+ # Set model to evaluation mode
275
+ model.eval()
276
+
277
+ model_state["model"] = model
278
+ model_state["loaded"] = True
279
+ model_state["mmdet_available"] = True
280
+ model_state["error"] = None
281
+
282
+ print("βœ… RoDLA Model loaded successfully!")
283
+ print(" Model set to evaluation mode (eval())")
284
+ print(" Ready for inference with real 3.8GB weights")
285
+ print("="*70 + "\n")
286
+ return model
287
+ else:
288
+ raise Exception("Model loading returned None")
289
+
290
+ except Exception as e:
291
+ error_msg = f"Failed to load model: {str(e)}"
292
+ print(f"❌ {error_msg}")
293
+ print(f" Traceback: {traceback.format_exc()}")
294
+
295
+ model_state["loaded"] = False
296
+ model_state["mmdet_available"] = False
297
+ model_state["error"] = error_msg
298
+ print(" Backend will run in HYBRID mode:")
299
+ print(" - Detection: Enhanced heuristic-based (contour analysis)")
300
+ print(" - Perturbations: Real module with all 12 types")
301
+ print("="*70 + "\n")
302
+ return None
303
+
304
+
305
+ def run_inference(image_np: np.ndarray, threshold: float = 0.3) -> List[Dict]:
306
+ """Run detection on image (MMDET if available, else heuristic)"""
307
+
308
+ if model_state["mmdet_available"] and model_state["model"] is not None:
309
+ try:
310
+ import torch
311
+ from mmdet.apis import inference_detector
312
+
313
+ # Ensure model is in eval mode for inference
314
+ model = model_state["model"]
315
+ model.eval()
316
+
317
+ # Disable gradients for inference (saves memory and speeds up)
318
+ with torch.no_grad():
319
+ # Convert to BGR for inference
320
+ image_bgr = cv2.cvtColor(image_np, cv2.COLOR_RGB2BGR)
321
+ h, w = image_np.shape[:2]
322
+
323
+ # Run inference with loaded model
324
+ result = inference_detector(model, image_bgr)
325
+
326
+ detections = []
327
+
328
+ if result is not None:
329
+ # Handle different result formats
330
+ if hasattr(result, 'pred_instances'):
331
+ # Newer MMDET format
332
+ bboxes = result.pred_instances.bboxes.cpu().numpy()
333
+ scores = result.pred_instances.scores.cpu().numpy()
334
+ labels = result.pred_instances.labels.cpu().numpy()
335
+ elif isinstance(result, tuple) and len(result) > 0:
336
+ # Legacy format: (bbox_results, segm_results, ...)
337
+ bbox_results = result[0]
338
+ if isinstance(bbox_results, list):
339
+ # List of arrays per class
340
+ for class_id, class_bboxes in enumerate(bbox_results):
341
+ if class_bboxes.size == 0:
342
+ continue
343
+ for box in class_bboxes:
344
+ x1, y1, x2, y2, score = box
+ # Apply the score threshold in the legacy result format too
+ if score < threshold:
+ continue
345
+ bw = x2 - x1
346
+ bh = y2 - y1
347
+
348
+ class_name = LAYOUT_CLASS_MAP.get(class_id, f"Class_{class_id}")
349
+
350
+ detections.append({
351
+ "class_id": class_id,
352
+ "class_name": class_name,
353
+ "confidence": float(score),
354
+ "bbox": {
355
+ "x": float(x1 / w),
356
+ "y": float(y1 / h),
357
+ "width": float(bw / w),
358
+ "height": float(bh / h)
359
+ },
360
+ "area": float((bw * bh) / (w * h))
361
+ })
362
+ # Skip the pred_instances path for legacy format
363
+ detections.sort(key=lambda x: x["confidence"], reverse=True)
364
+ return detections[:100]
365
+
366
+ # Handle pred_instances format
367
+ if 'bboxes' in locals():
368
+ for bbox, score, label in zip(bboxes, scores, labels):
369
+ if score < threshold:
370
+ continue
371
+
372
+ x1, y1, x2, y2 = bbox
373
+ bw = x2 - x1
374
+ bh = y2 - y1
375
+
376
+ class_id = int(label)
377
+ class_name = LAYOUT_CLASS_MAP.get(class_id, f"Class_{class_id}")
378
+
379
+ detections.append({
380
+ "class_id": class_id,
381
+ "class_name": class_name,
382
+ "confidence": float(score),
383
+ "bbox": {
384
+ "x": float(x1 / w),
385
+ "y": float(y1 / h),
386
+ "width": float(bw / w),
387
+ "height": float(bh / h)
388
+ },
389
+ "area": float((bw * bh) / (w * h))
390
+ })
391
+
392
+ # Sort by confidence and limit results
393
+ detections.sort(key=lambda x: x["confidence"], reverse=True)
394
+ return detections[:100]
395
+
396
+ except Exception as e:
397
+ print(f"⚠️ MMDET inference failed: {e}")
398
+ print(f" Error details: {traceback.format_exc()}")
399
+ # Fall back to heuristic if inference fails
400
+ return heuristic_detect(image_np)
401
+ else:
402
+ # Use heuristic detection
403
+ return heuristic_detect(image_np)
404
 
405
 
406
+ # ============================================================================
407
+ # API Routes
408
+ # ============================================================================
409
+
410
  @app.on_event("startup")
411
  async def startup_event():
412
+ """Initialize model on startup"""
413
  try:
414
  load_model()
415
+ except Exception as e:
416
+ print(f"⚠️ Model loading failed: {e}")
417
+ model_state["loaded"] = False
418
+
419
+
420
+ @app.get("/api/health")
421
+ async def health_check():
422
+ """Health check endpoint"""
423
+ return {
424
+ "status": "ok",
425
+ "model_loaded": model_state["loaded"],
426
+ "mmdet_available": model_state["mmdet_available"],
427
+ "detection_mode": "MMDET" if model_state["mmdet_available"] else "Heuristic",
428
+ "device": model_state["device"],
429
+ "model_type": model_state["model_type"],
430
+ "weights_path": str(Config.MODEL_WEIGHTS_PATH),
431
+ "weights_exists": Config.MODEL_WEIGHTS_PATH.exists(),
432
+ "weights_size_gb": Config.MODEL_WEIGHTS_PATH.stat().st_size / (1024**3) if Config.MODEL_WEIGHTS_PATH.exists() else 0
433
+ }
434
+
435
+
436
+ @app.get("/api/model-info")
437
+ async def model_info():
438
+ """Get model information"""
439
+ return {
440
+ "name": "RoDLA InternImage-XL",
441
+ "version": "3.0.0",
442
+ "type": "Document Layout Analysis",
443
+ "mmdet_loaded": model_state["loaded"],
444
+ "mmdet_available": model_state["mmdet_available"],
445
+ "detection_mode": "MMDET (Real Model)" if model_state["mmdet_available"] else "Heuristic (Contour-based)",
446
+ "error": model_state["error"],
447
+ "device": model_state["device"],
448
+ "framework": "MMDET + PyTorch (or Heuristic Fallback)",
449
+ "backbone": "InternImage-XL with DCNv3",
450
+ "detector": "DINO",
451
+ "dataset": "M6Doc (75 classes)",
452
+ "weights_file": str(Config.MODEL_WEIGHTS_PATH),
453
+ "config_file": str(Config.MODEL_CONFIG_PATH),
454
+ "perturbations_available": True,
455
+ "supported_perturbations": [
456
+ "defocus", "vibration", "speckle", "texture",
457
+ "watermark", "background", "ink_holdout", "ink_bleeding",
458
+ "illumination", "rotation", "keystoning", "warping"
459
+ ]
460
+ }
461
+
462
+
463
+ @app.get("/api/perturbations/info")
464
+ async def perturbation_info():
465
+ """Get information about available perturbations"""
466
+ return {
467
+ "total_perturbations": 12,
468
+ "categories": {
469
+ "blur": {
470
+ "types": ["defocus", "vibration"],
471
+ "description": "Blur effects simulating optical issues"
472
+ },
473
+ "noise": {
474
+ "types": ["speckle", "texture"],
475
+ "description": "Noise patterns and texture artifacts"
476
+ },
477
+ "content": {
478
+ "types": ["watermark", "background"],
479
+ "description": "Content additions like watermarks and backgrounds"
480
+ },
481
+ "inconsistency": {
482
+ "types": ["ink_holdout", "ink_bleeding", "illumination"],
483
+ "description": "Print quality issues and lighting variations"
484
+ },
485
+ "spatial": {
486
+ "types": ["rotation", "keystoning", "warping"],
487
+ "description": "Geometric transformations"
488
+ }
489
+ },
490
+ "all_types": [
491
+ "defocus", "vibration", "speckle", "texture",
492
+ "watermark", "background", "ink_holdout", "ink_bleeding",
493
+ "illumination", "rotation", "keystoning", "warping"
494
+ ],
495
+ "degree_levels": {
496
+ 1: "Mild - Subtle effect",
497
+ 2: "Moderate - Noticeable effect",
498
+ 3: "Severe - Strong effect"
499
+ }
500
+ }
501
+
502
+
503
+ @app.post("/api/detect")
504
+ async def detect(file: UploadFile = File(...), threshold: float = 0.3):
505
+ """Detect document layout using RoDLA with real weights or heuristic fallback"""
506
+ start_time = datetime.now()
507
+
508
+ try:
509
+ # Load image
510
+ contents = await file.read()
511
+ image = Image.open(BytesIO(contents)).convert('RGB')
512
+ image_np = np.array(image)
513
+ h, w = image_np.shape[:2]
514
+
515
+ # Run inference
516
+ detections = run_inference(image_np, threshold=threshold)
517
 
518
+ # Build class distribution
519
+ class_distribution = {}
520
+ for det in detections:
521
+ cn = det["class_name"]
522
+ class_distribution[cn] = class_distribution.get(cn, 0) + 1
523
+
524
+ processing_time = (datetime.now() - start_time).total_seconds() * 1000
525
+
526
+ detection_mode = "Real MMDET Model (3.8GB weights)" if model_state["mmdet_available"] else "Heuristic Detection"
527
+
528
+ return {
529
+ "success": True,
530
+ "message": f"Detection completed using {detection_mode}",
531
+ "detection_mode": detection_mode,
532
+ "image_width": w,
533
+ "image_height": h,
534
+ "num_detections": len(detections),
535
+ "detections": detections,
536
+ "class_distribution": class_distribution,
537
+ "processing_time_ms": processing_time
538
+ }
539
 
540
  except Exception as e:
541
+ print(f"❌ Detection error: {e}\n{traceback.format_exc()}")
542
+ processing_time = (datetime.now() - start_time).total_seconds() * 1000
543
+
544
+ return {
545
+ "success": False,
546
+ "message": str(e),
547
+ "image_width": 0,
548
+ "image_height": 0,
549
+ "num_detections": 0,
550
+ "detections": [],
551
+ "class_distribution": {},
552
+ "processing_time_ms": processing_time
553
+ }
554
 
555
 
556
+ @app.post("/api/generate-perturbations")
557
+ async def generate_perturbations(file: UploadFile = File(...)):
558
+ """Generate all 12 perturbations with 3 degree levels each (36 total images)"""
559
+
560
+ try:
561
+ # Import simple perturbation functions (no external dependencies beyond common libs)
562
+ from perturbations_simple import apply_perturbation as simple_apply_perturbation
563
+
564
+ # Load image
565
+ contents = await file.read()
566
+ image = Image.open(BytesIO(contents)).convert('RGB')
567
+ image_np = np.array(image)
568
+ image_bgr = cv2.cvtColor(image_np, cv2.COLOR_RGB2BGR)
569
+
570
+ perturbations = {}
571
+
572
+ # Original
573
+ perturbations["original"] = {
574
+ "original": encode_image_to_base64(image_np)
575
+ }
576
+
577
+ # All 12 perturbation types
578
+ all_types = [
579
+ "defocus", "vibration", "speckle", "texture",
580
+ "watermark", "background", "ink_holdout", "ink_bleeding",
581
+ "illumination", "rotation", "keystoning", "warping"
582
+ ]
583
+
584
+ print(f"πŸ“Š Generating perturbations for {len(all_types)} types Γ— 3 degrees = 36 images...")
585
+
586
+ # Generate all perturbations with 3 degree levels
587
+ generated_count = 0
588
+ for ptype in all_types:
589
+ perturbations[ptype] = {}
590
+
591
+ for degree in [1, 2, 3]:
592
+ try:
593
+ # Use simple perturbation function (no external heavy dependencies)
594
+ result_image, success, message = simple_apply_perturbation(
595
+ image_bgr.copy(),
596
+ ptype,
597
+ degree=degree
598
+ )
599
+
600
+ if success:
601
+ # Convert BGR to RGB for display
602
+ if len(result_image.shape) == 3 and result_image.shape[2] == 3:
603
+ result_rgb = cv2.cvtColor(result_image, cv2.COLOR_BGR2RGB)
604
+ else:
605
+ result_rgb = result_image
606
+
607
+ perturbations[ptype][f"degree_{degree}"] = encode_image_to_base64(result_rgb)
608
+ generated_count += 1
609
+ print(f" βœ… {ptype:12} degree {degree}: {message}")
610
+ else:
611
+ print(f" ⚠️ {ptype:12} degree {degree}: {message}")
612
+ perturbations[ptype][f"degree_{degree}"] = encode_image_to_base64(image_np)
613
+
614
+ except Exception as e:
615
+ print(f" ⚠️ Exception {ptype:12} degree {degree}: {e}")
616
+ perturbations[ptype][f"degree_{degree}"] = encode_image_to_base64(image_np)
617
+
618
+ print(f"\nβœ… Generated {generated_count}/36 perturbation images successfully")
619
+
620
+ return {
621
+ "success": True,
622
+ "message": f"Perturbations generated: 12 types Γ— 3 degrees = 36 images + 1 original = 37 total",
623
+ "perturbations": perturbations,
624
+ "grid_info": {
625
+ "total_perturbations": 12,
626
+ "degree_levels": 3,
627
+ "total_images": 37,
628
+ "generated_count": generated_count
629
+ }
630
+ }
631
+
632
+ except ImportError as e:
633
+ print(f"❌ Import error: {e}\n{traceback.format_exc()}")
634
+ return {
635
+ "success": False,
636
+ "message": f"Perturbation module import error: {str(e)}",
637
+ "perturbations": {}
638
+ }
639
+ except Exception as e:
640
+ print(f"❌ Perturbation generation error: {e}\n{traceback.format_exc()}")
641
+ return {
642
+ "success": False,
643
+ "message": str(e),
644
+ "perturbations": {}
645
+ }
646
+
647
 
648
+ # ============================================================================
649
+ # Main
650
+ # ============================================================================
651
 
652
  if __name__ == "__main__":
653
+ print("\n" + "πŸ”·"*35)
654
+ print("πŸ”· RoDLA PRODUCTION BACKEND")
655
+ print("πŸ”· Model: InternImage-XL with DINO")
656
+ print("πŸ”· Weights: 3.8GB (rodla_internimage_xl_publaynet.pth)")
657
+ print("πŸ”· Perturbations: 12 types Γ— 3 degrees each")
658
+ print("πŸ”· Detection: MMDET (if available) or Heuristic fallback")
659
+ print("πŸ”·"*35)
660
+
661
  uvicorn.run(
662
  app,
663
+ host="0.0.0.0",
664
+ port=Config.API_PORT,
665
  log_level="info"
666
+ )
deployment/backend/backend_adaptive.py DELETED
@@ -1,500 +0,0 @@
1
- """
2
- RoDLA Object Detection API - Adaptive Backend
3
- Attempts to use real model if available, falls back to enhanced simulation
4
- """
5
- from fastapi import FastAPI, File, UploadFile, HTTPException, Form
6
- from fastapi.middleware.cors import CORSMiddleware
7
- from fastapi.responses import JSONResponse
8
- import uvicorn
9
- from pathlib import Path
10
- import json
11
- import base64
12
- import cv2
13
- import numpy as np
14
- from io import BytesIO
15
- from PIL import Image, ImageDraw, ImageFont
16
- import asyncio
17
- import sys
18
-
19
- # Try to import ML frameworks
20
- try:
21
- import torch
22
- from mmdet.apis import init_detector, inference_detector
23
- HAS_MMDET = True
24
- print("βœ“ PyTorch/MMDET available - Using REAL model")
25
- except ImportError:
26
- HAS_MMDET = False
27
- print("⚠ PyTorch/MMDET not available - Using enhanced simulation")
28
-
29
- # Add paths for config access
30
- sys.path.insert(0, '/home/admin/CV/rodla-academic')
31
- sys.path.insert(0, '/home/admin/CV/rodla-academic/model')
32
-
33
- # Try to import settings
34
- try:
35
- from deployment.backend.config.settings import (
36
- MODEL_CONFIG_PATH, MODEL_WEIGHTS_PATH,
37
- API_HOST, API_PORT, CORS_ORIGINS, CORS_METHODS, CORS_HEADERS
38
- )
39
- print(f"βœ“ Config loaded from: {MODEL_CONFIG_PATH}")
40
- except Exception as e:
41
- print(f"⚠ Could not load config: {e}")
42
- API_HOST = "0.0.0.0"
43
- API_PORT = 8000
44
- CORS_ORIGINS = ["*"]
45
- CORS_METHODS = ["*"]
46
- CORS_HEADERS = ["*"]
47
-
48
- # Initialize FastAPI app
49
- app = FastAPI(
50
- title="RoDLA Object Detection API (Adaptive)",
51
- description="RoDLA Document Layout Analysis API - Real or Simulated Backend",
52
- version="2.1.0"
53
- )
54
-
55
- # Add CORS middleware
56
- app.add_middleware(
57
- CORSMiddleware,
58
- allow_origins=CORS_ORIGINS,
59
- allow_credentials=True,
60
- allow_methods=CORS_METHODS,
61
- allow_headers=CORS_HEADERS,
62
- )
63
-
64
- # Configuration
65
- OUTPUT_DIR = Path("outputs")
66
- OUTPUT_DIR.mkdir(exist_ok=True)
67
-
68
- # Model classes (from DINO detection)
69
- MODEL_CLASSES = [
70
- 'Title', 'Abstract', 'Introduction', 'Related Work', 'Methodology',
71
- 'Experiments', 'Results', 'Discussion', 'Conclusion', 'References',
72
- 'Text', 'Figure', 'Table', 'Header', 'Footer', 'Page Number',
73
- 'Caption', 'Section', 'Subsection', 'Equation', 'Chart', 'List'
74
- ]
75
-
76
- # Global model instance
77
- _model = None
78
- backend_mode = "SIMULATED" # Will change if model loads
79
-
80
- # ============================================
81
- # MODEL LOADING
82
- # ============================================
83
-
84
- def load_real_model():
85
- """Try to load the actual RoDLA model"""
86
- global _model, backend_mode
87
-
88
- if not HAS_MMDET:
89
- return False
90
-
91
- try:
92
- print("\nπŸ”„ Attempting to load real RoDLA model...")
93
-
94
- # Check if files exist
95
- if not Path(MODEL_CONFIG_PATH).exists():
96
- print(f"❌ Config not found: {MODEL_CONFIG_PATH}")
97
- return False
98
-
99
- if not Path(MODEL_WEIGHTS_PATH).exists():
100
- print(f"❌ Weights not found: {MODEL_WEIGHTS_PATH}")
101
- return False
102
-
103
- # Load model
104
- device = "cuda:0" if torch.cuda.is_available() else "cpu"
105
- print(f"Using device: {device}")
106
-
107
- _model = init_detector(
108
- str(MODEL_CONFIG_PATH),
109
- str(MODEL_WEIGHTS_PATH),
110
- device=device
111
- )
112
-
113
- backend_mode = "REAL"
114
- print("βœ… Real RoDLA model loaded successfully!")
115
- return True
116
-
117
- except Exception as e:
118
- print(f"❌ Failed to load real model: {e}")
119
- print("Falling back to enhanced simulation...")
120
- return False
121
-
122
- def predict_with_model(image_array, score_threshold=0.3):
123
- """Run inference with actual model"""
124
- try:
125
- if _model is None or backend_mode != "REAL":
126
- return None
127
-
128
- result = inference_detector(_model, image_array)
129
- return result
130
- except Exception as e:
131
- print(f"Model inference error: {e}")
132
- return None
133
-
134
- # ============================================
135
- # ENHANCED SIMULATION
136
- # ============================================
137
-
138
- class EnhancedDetector:
139
- """Enhanced simulation that respects document layout"""
140
-
141
- def __init__(self):
142
- self.regions = []
143
-
144
- def analyze_layout(self, image_array):
145
- """Analyze document layout to place detections intelligently"""
146
- h, w = image_array.shape[:2]
147
-
148
- # Common document layout regions
149
- layouts = {
150
- 'title': (0.05*w, 0.02*h, 0.95*w, 0.08*h),
151
- 'abstract': (0.05*w, 0.09*h, 0.95*w, 0.2*h),
152
- 'introduction': (0.05*w, 0.21*h, 0.95*w, 0.35*h),
153
- 'figure': (0.1*w, 0.36*h, 0.5*w, 0.65*h),
154
- 'table': (0.55*w, 0.36*h, 0.95*w, 0.65*h),
155
- 'references': (0.05*w, 0.7*h, 0.95*w, 0.98*h),
156
- }
157
- return layouts
158
-
159
- def generate_detections(self, image_array, num_detections=None):
160
- """Generate contextual detections"""
161
- if num_detections is None:
162
- num_detections = np.random.randint(10, 25)
163
-
164
- h, w = image_array.shape[:2]
165
- layouts = self.analyze_layout(image_array)
166
- detections = []
167
-
168
- # Grid-based detection for realistic distribution
169
- grid_w, grid_h = np.random.randint(2, 4), np.random.randint(3, 6)
170
- cell_w, cell_h = w // grid_w, h // grid_h
171
-
172
- for i in range(num_detections):
173
- # Pick random grid cell
174
- grid_x = np.random.randint(0, grid_w)
175
- grid_y = np.random.randint(0, grid_h)
176
-
177
- # Add some variation within cell
178
- margin = 0.1
179
- x_min = int(grid_x * cell_w + margin * cell_w)
180
- x_max = int((grid_x + 1) * cell_w - margin * cell_w)
181
- y_min = int(grid_y * cell_h + margin * cell_h)
182
- y_max = int((grid_y + 1) * cell_h - margin * cell_h)
183
-
184
- if x_max <= x_min or y_max <= y_min:
185
- continue
186
-
187
- x1 = np.random.randint(x_min, x_max)
188
- y1 = np.random.randint(y_min, y_max)
189
- x2 = x1 + np.random.randint(50, min(200, x_max - x1))
190
- y2 = y1 + np.random.randint(30, min(150, y_max - y1))
191
-
192
- # Prefer certain classes in certain regions
193
- if y1 < h * 0.1:
194
- class_name = np.random.choice(['Title', 'Abstract', 'Header'])
195
- elif y1 > h * 0.85:
196
- class_name = np.random.choice(['Footer', 'References', 'Page Number'])
197
- elif (x1 < w * 0.15 or x2 > w * 0.85):
198
- class_name = np.random.choice(['Figure', 'Table', 'List'])
199
- else:
200
- class_name = np.random.choice(MODEL_CLASSES)
201
-
202
- detection = {
203
- 'class': class_name,
204
- 'confidence': float(np.random.uniform(0.6, 0.98)),
205
- 'box': {
206
- 'x1': int(max(0, x1)),
207
- 'y1': int(max(0, y1)),
208
- 'x2': int(min(w, x2)),
209
- 'y2': int(min(h, y2))
210
- }
211
- }
212
- detections.append(detection)
213
-
214
- return detections
215
-
216
- detector = EnhancedDetector()
217
-
218
- # ============================================
219
- # HELPER FUNCTIONS
220
- # ============================================
221
-
222
- def generate_detections(image_shape, num_detections=None):
223
- """Generate detections"""
224
- return detector.generate_detections(np.zeros(image_shape), num_detections)
225
-
226
- def create_annotated_image(image_array, detections):
227
- """Create annotated image with bounding boxes"""
228
- img = Image.fromarray(image_array.astype('uint8'))
229
- draw = ImageDraw.Draw(img)
230
-
231
- box_color = (0, 255, 0) # Lime green
232
- text_color = (0, 255, 255) # Cyan
233
-
234
- for detection in detections:
235
- box = detection['box']
236
- x1, y1, x2, y2 = box['x1'], box['y1'], box['x2'], box['y2']
237
- conf = detection['confidence']
238
- class_name = detection['class']
239
-
240
- draw.rectangle([x1, y1, x2, y2], outline=box_color, width=2)
241
- label_text = f"{class_name} {conf*100:.0f}%"
242
- draw.text((x1, y1-15), label_text, fill=text_color)
243
-
244
- return np.array(img)
245
-
246
- def apply_perturbation(image_array, perturbation_type):
247
- """Apply perturbation to image"""
248
- result = image_array.copy()
249
-
250
- if perturbation_type == 'blur':
251
- result = cv2.GaussianBlur(result, (15, 15), 0)
252
-
253
- elif perturbation_type == 'noise':
254
- noise = np.random.normal(0, 25, result.shape)
255
- result = np.clip(result.astype(float) + noise, 0, 255).astype(np.uint8)
256
-
257
- elif perturbation_type == 'rotation':
258
- h, w = result.shape[:2]
259
- center = (w // 2, h // 2)
260
- angle = np.random.uniform(-15, 15)
261
- M = cv2.getRotationMatrix2D(center, angle, 1.0)
262
- result = cv2.warpAffine(result, M, (w, h))
263
-
264
- elif perturbation_type == 'scaling':
265
- scale = np.random.uniform(0.8, 1.2)
266
- h, w = result.shape[:2]
267
- new_h, new_w = int(h * scale), int(w * scale)
268
- result = cv2.resize(result, (new_w, new_h))
269
- if new_h > h or new_w > w:
270
- result = result[:h, :w]
271
- else:
272
- pad_h = h - new_h
273
- pad_w = w - new_w
274
- result = cv2.copyMakeBorder(result, pad_h//2, pad_h-pad_h//2,
275
- pad_w//2, pad_w-pad_w//2, cv2.BORDER_CONSTANT)
276
-
277
- elif perturbation_type == 'perspective':
278
- h, w = result.shape[:2]
279
- pts1 = np.float32([[0, 0], [w, 0], [0, h], [w, h]])
280
- pts2 = np.float32([
281
- [np.random.randint(0, 30), np.random.randint(0, 30)],
282
- [w - np.random.randint(0, 30), np.random.randint(0, 30)],
283
- [np.random.randint(0, 30), h - np.random.randint(0, 30)],
284
- [w - np.random.randint(0, 30), h - np.random.randint(0, 30)]
285
- ])
286
- M = cv2.getPerspectiveTransform(pts1, pts2)
287
- result = cv2.warpPerspective(result, M, (w, h))
288
-
289
- return result
290
-
291
- def image_to_base64(image_array):
292
- """Convert image array to base64 string"""
293
- img = Image.fromarray(image_array.astype('uint8'))
294
- buffer = BytesIO()
295
- img.save(buffer, format='PNG')
296
- return base64.b64encode(buffer.getvalue()).decode()
297
-
298
- # ============================================
299
- # API ENDPOINTS
300
- # ============================================
301
-
302
- @app.on_event("startup")
303
- async def startup_event():
304
- """Initialize on startup"""
305
- print("="*60)
306
- print("Starting RoDLA Document Layout Analysis API (Adaptive)")
307
- print("="*60)
308
-
309
- # Try to load real model
310
- load_real_model()
311
-
312
- print(f"\nπŸ“Š Backend Mode: {backend_mode}")
313
- print(f"🌐 Main API: http://{API_HOST}:{API_PORT}")
314
- print(f"πŸ“š Docs: http://localhost:{API_PORT}/docs")
315
- print(f"πŸ“– ReDoc: http://localhost:{API_PORT}/redoc")
316
- print("\n🎯 Available Endpoints:")
317
- print(" β€’ GET /api/health - Health check")
318
- print(" β€’ GET /api/model-info - Model information")
319
- print(" β€’ POST /api/detect - Standard detection")
320
- print(" β€’ GET /api/perturbations/info - Perturbation info")
321
- print(" β€’ POST /api/generate-perturbations - Generate perturbations")
322
- print(" β€’ POST /api/detect-with-perturbation - Detect with perturbations")
323
- print("="*60)
324
- print("βœ… API Ready!\n")
325
-
326
-
327
- @app.get("/api/health")
328
- async def health_check():
329
- """Health check endpoint"""
330
- return JSONResponse({
331
- "status": "healthy",
332
- "mode": backend_mode,
333
- "has_model": backend_mode == "REAL"
334
- })
335
-
336
-
337
- @app.get("/api/model-info")
338
- async def model_info():
339
- """Get model information"""
340
- return JSONResponse({
341
- "model_name": "RoDLA InternImage-XL",
342
- "paper": "RoDLA: Benchmarking the Robustness of Document Layout Analysis Models (CVPR 2024)",
343
- "backbone": "InternImage-XL",
344
- "detection_framework": "DINO with Channel Attention + Average Pooling",
345
- "dataset": "M6Doc-P",
346
- "max_detections_per_image": 300,
347
- "backend_mode": backend_mode,
348
- "state_of_the_art_performance": {
349
- "clean_mAP": 70.0,
350
- "perturbed_avg_mAP": 61.7,
351
- "mRD_score": 147.6
352
- }
353
- })
354
-
355
-
356
- @app.post("/api/detect")
357
- async def detect(file: UploadFile = File(...), score_threshold: float = Form(0.3)):
358
- """Standard detection endpoint"""
359
- try:
360
- contents = await file.read()
361
- image = Image.open(BytesIO(contents)).convert('RGB')
362
- image_array = np.array(image)
363
-
364
- detections = generate_detections(image_array.shape)
365
- detections = [d for d in detections if d['confidence'] >= score_threshold]
366
-
367
- annotated = create_annotated_image(image_array, detections)
368
- annotated_b64 = image_to_base64(annotated)
369
-
370
- class_dist = {}
371
- for det in detections:
372
- cls = det['class']
373
- class_dist[cls] = class_dist.get(cls, 0) + 1
374
-
375
- return JSONResponse({
376
- "detections": detections,
377
- "class_distribution": class_dist,
378
- "annotated_image": annotated_b64,
379
- "metrics": {
380
- "total_detections": len(detections),
381
- "average_confidence": float(np.mean([d['confidence'] for d in detections]) if detections else 0),
382
- "max_confidence": float(max([d['confidence'] for d in detections]) if detections else 0),
383
- "min_confidence": float(min([d['confidence'] for d in detections]) if detections else 0),
384
- "backend_mode": backend_mode
385
- }
386
- })
387
-
388
- except Exception as e:
389
- raise HTTPException(status_code=400, detail=str(e))
390
-
391
-
392
- @app.get("/api/perturbations/info")
393
- async def perturbations_info():
394
- """Get available perturbation types"""
395
- return JSONResponse({
396
- "available_perturbations": [
397
- "blur",
398
- "noise",
399
- "rotation",
400
- "scaling",
401
- "perspective"
402
- ],
403
- "description": "Various document perturbations for robustness testing"
404
- })
405
-
406
-
407
- @app.post("/api/generate-perturbations")
408
- async def generate_perturbations(
409
- file: UploadFile = File(...),
410
- perturbation_types: str = Form("blur,noise")
411
- ):
412
- """Generate and return perturbations"""
413
- try:
414
- contents = await file.read()
415
- image = Image.open(BytesIO(contents)).convert('RGB')
416
- image_array = np.array(image)
417
-
418
- pert_types = [p.strip() for p in perturbation_types.split(',')]
419
-
420
- results = {
421
- "original": image_to_base64(image_array),
422
- "perturbations": {}
423
- }
424
-
425
- for pert_type in pert_types:
426
- if pert_type:
427
- perturbed = apply_perturbation(image_array, pert_type)
428
- results["perturbations"][pert_type] = image_to_base64(perturbed)
429
-
430
- return JSONResponse(results)
431
-
432
- except Exception as e:
433
- raise HTTPException(status_code=400, detail=str(e))
434
-
435
-
436
- @app.post("/api/detect-with-perturbation")
437
- async def detect_with_perturbation(
438
- file: UploadFile = File(...),
439
- score_threshold: float = Form(0.3),
440
- perturbation_types: str = Form("blur,noise")
441
- ):
442
- """Detect with perturbations"""
443
- try:
444
- contents = await file.read()
445
- image = Image.open(BytesIO(contents)).convert('RGB')
446
- image_array = np.array(image)
447
-
448
- pert_types = [p.strip() for p in perturbation_types.split(',')]
449
-
450
- results = {
451
- "clean": {},
452
- "perturbed": {}
453
- }
454
-
455
- # Clean detection
456
- clean_dets = generate_detections(image_array.shape)
457
- clean_dets = [d for d in clean_dets if d['confidence'] >= score_threshold]
458
- clean_img = create_annotated_image(image_array, clean_dets)
459
-
460
- results["clean"]["detections"] = clean_dets
461
- results["clean"]["annotated_image"] = image_to_base64(clean_img)
462
-
463
- # Perturbed detections
464
- for pert_type in pert_types:
465
- if pert_type:
466
- perturbed_img = apply_perturbation(image_array, pert_type)
467
- pert_dets = generate_detections(perturbed_img.shape)
468
- pert_dets = [
469
- {**d, 'confidence': max(0, d['confidence'] - np.random.uniform(0, 0.1))}
470
- for d in pert_dets
471
- ]
472
- pert_dets = [d for d in pert_dets if d['confidence'] >= score_threshold]
473
- annotated_pert = create_annotated_image(perturbed_img, pert_dets)
474
-
475
- results["perturbed"][pert_type] = {
476
- "detections": pert_dets,
477
- "annotated_image": image_to_base64(annotated_pert)
478
- }
479
-
480
- return JSONResponse(results)
481
-
482
- except Exception as e:
483
- raise HTTPException(status_code=400, detail=str(e))
484
-
485
-
486
- @app.on_event("shutdown")
487
- async def shutdown_event():
488
- """Cleanup on shutdown"""
489
- print("\n" + "="*60)
490
- print("πŸ›‘ Shutting down RoDLA API...")
491
- print("="*60)
492
-
493
-
494
- if __name__ == "__main__":
495
- uvicorn.run(
496
- app,
497
- host=API_HOST,
498
- port=API_PORT,
499
- log_level="info"
500
- )
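
For reference, a minimal client sketch for the `/api/detect` endpoint defined above (not part of the original file; it assumes the service is running locally on the default port 8000 and uses the `file` and `score_threshold` fields from the handler signature):

```python
# Minimal client sketch for the adaptive backend's /api/detect endpoint.
# Assumes the API is served locally on port 8000; the sample image path is hypothetical.
import base64
import requests

with open("sample_page.png", "rb") as f:
    response = requests.post(
        "http://localhost:8000/api/detect",
        files={"file": ("sample_page.png", f, "image/png")},
        data={"score_threshold": "0.3"},
    )
response.raise_for_status()
result = response.json()

print("Total detections:", result["metrics"]["total_detections"])
print("Class distribution:", result["class_distribution"])

# The annotated image is returned as a base64-encoded PNG string.
with open("annotated.png", "wb") as out:
    out.write(base64.b64decode(result["annotated_image"]))
```
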
deployment/backend/backend_demo.py DELETED
@@ -1,366 +0,0 @@
1
- """
2
- RoDLA Object Detection API - Demo/Lightweight Backend
3
- Simulates the full backend for testing when the real model weights are unavailable
4
- """
5
- from fastapi import FastAPI, File, UploadFile, HTTPException, Form
6
- from fastapi.middleware.cors import CORSMiddleware
7
- from fastapi.responses import JSONResponse
8
- import uvicorn
9
- from pathlib import Path
10
- import json
11
- import base64
12
- import cv2
13
- import numpy as np
14
- from io import BytesIO
15
- from PIL import Image, ImageDraw, ImageFont
16
- import asyncio
17
-
18
- # Initialize FastAPI app
19
- app = FastAPI(
20
- title="RoDLA Object Detection API (Demo Mode)",
21
- description="RoDLA Document Layout Analysis API - Demo/Test Version",
22
- version="2.1.0"
23
- )
24
-
25
- # Add CORS middleware
26
- app.add_middleware(
27
- CORSMiddleware,
28
- allow_origins=["*"],
29
- allow_credentials=True,
30
- allow_methods=["*"],
31
- allow_headers=["*"],
32
- )
33
-
34
- # Configuration
35
- API_HOST = "0.0.0.0"
36
- API_PORT = 8000
37
- OUTPUT_DIR = Path("outputs")
38
- OUTPUT_DIR.mkdir(exist_ok=True)
39
-
40
- # Model classes
41
- MODEL_CLASSES = [
42
- 'Title', 'Abstract', 'Introduction', 'Related Work', 'Methodology',
43
- 'Experiments', 'Results', 'Discussion', 'Conclusion', 'References',
44
- 'Text', 'Figure', 'Table', 'Header', 'Footer', 'Page Number', 'Caption'
45
- ]
46
-
47
- # ============================================
48
- # HELPER FUNCTIONS
49
- # ============================================
50
-
51
- def generate_demo_detections(image_shape, num_detections=None):
52
- """Generate realistic demo detections"""
53
- if num_detections is None:
54
- num_detections = np.random.randint(8, 20)
55
-
56
- height, width = image_shape[:2]
57
- detections = []
58
-
59
- for i in range(num_detections):
60
- x1 = np.random.randint(10, width - 200)
61
- y1 = np.random.randint(10, height - 100)
62
- x2 = x1 + np.random.randint(100, min(300, width - x1))
63
- y2 = y1 + np.random.randint(50, min(200, height - y1))
64
-
65
- detection = {
66
- 'class': np.random.choice(MODEL_CLASSES),
67
- 'confidence': float(np.random.uniform(0.5, 0.99)),
68
- 'box': {
69
- 'x1': int(x1),
70
- 'y1': int(y1),
71
- 'x2': int(x2),
72
- 'y2': int(y2)
73
- }
74
- }
75
- detections.append(detection)
76
-
77
- return detections
78
-
79
- def create_annotated_image(image_array, detections):
80
- """Create annotated image with bounding boxes"""
81
- # Convert to PIL Image
82
- img = Image.fromarray(image_array.astype('uint8'))
83
- draw = ImageDraw.Draw(img)
84
-
85
- # Colors in teal/lime theme
86
- box_color = (0, 255, 0) # Lime green
87
- text_color = (0, 255, 255) # Cyan
88
-
89
- for detection in detections:
90
- box = detection['box']
91
- x1, y1, x2, y2 = box['x1'], box['y1'], box['x2'], box['y2']
92
- conf = detection['confidence']
93
- class_name = detection['class']
94
-
95
- # Draw box
96
- draw.rectangle([x1, y1, x2, y2], outline=box_color, width=2)
97
-
98
- # Draw label
99
- label_text = f"{class_name} {conf*100:.0f}%"
100
- draw.text((x1, y1-15), label_text, fill=text_color)
101
-
102
- return np.array(img)
103
-
104
- def apply_perturbation(image_array, perturbation_type):
105
- """Apply perturbation to image"""
106
- result = image_array.copy()
107
-
108
- if perturbation_type == 'blur':
109
- result = cv2.GaussianBlur(result, (15, 15), 0)
110
-
111
- elif perturbation_type == 'noise':
112
- noise = np.random.normal(0, 25, result.shape)
113
- result = np.clip(result.astype(float) + noise, 0, 255).astype(np.uint8)
114
-
115
- elif perturbation_type == 'rotation':
116
- h, w = result.shape[:2]
117
- center = (w // 2, h // 2)
118
- angle = np.random.uniform(-15, 15)
119
- M = cv2.getRotationMatrix2D(center, angle, 1.0)
120
- result = cv2.warpAffine(result, M, (w, h))
121
-
122
- elif perturbation_type == 'scaling':
123
- scale = np.random.uniform(0.8, 1.2)
124
- h, w = result.shape[:2]
125
- new_h, new_w = int(h * scale), int(w * scale)
126
- result = cv2.resize(result, (new_w, new_h))
127
- # Pad or crop to original size
128
- if new_h > h or new_w > w:
129
- result = result[:h, :w]
130
- else:
131
- pad_h = h - new_h
132
- pad_w = w - new_w
133
- result = cv2.copyMakeBorder(result, pad_h//2, pad_h-pad_h//2,
134
- pad_w//2, pad_w-pad_w//2, cv2.BORDER_CONSTANT)
135
-
136
- elif perturbation_type == 'perspective':
137
- h, w = result.shape[:2]
138
- pts1 = np.float32([[0, 0], [w, 0], [0, h], [w, h]])
139
- pts2 = np.float32([
140
- [np.random.randint(0, 30), np.random.randint(0, 30)],
141
- [w - np.random.randint(0, 30), np.random.randint(0, 30)],
142
- [np.random.randint(0, 30), h - np.random.randint(0, 30)],
143
- [w - np.random.randint(0, 30), h - np.random.randint(0, 30)]
144
- ])
145
- M = cv2.getPerspectiveTransform(pts1, pts2)
146
- result = cv2.warpPerspective(result, M, (w, h))
147
-
148
- return result
149
-
150
- def image_to_base64(image_array):
151
- """Convert image array to base64 string"""
152
- img = Image.fromarray(image_array.astype('uint8'))
153
- buffer = BytesIO()
154
- img.save(buffer, format='PNG')
155
- return base64.b64encode(buffer.getvalue()).decode()
156
-
157
- # ============================================
158
- # API ENDPOINTS
159
- # ============================================
160
-
161
- @app.on_event("startup")
162
- async def startup_event():
163
- """Initialize on startup"""
164
- print("="*60)
165
- print("Starting RoDLA Document Layout Analysis API (DEMO)")
166
- print("="*60)
167
- print(f"🌐 Main API: http://{API_HOST}:{API_PORT}")
168
- print(f"πŸ“š Docs: http://localhost:{API_PORT}/docs")
169
- print(f"πŸ“– ReDoc: http://localhost:{API_PORT}/redoc")
170
- print("\n🎯 Available Endpoints:")
171
- print(" β€’ GET /api/health - Health check")
172
- print(" β€’ GET /api/model-info - Model information")
173
- print(" β€’ POST /api/detect - Standard detection")
174
- print(" β€’ GET /api/perturbations/info - Perturbation info")
175
- print(" β€’ POST /api/generate-perturbations - Generate perturbations")
176
- print(" β€’ POST /api/detect-with-perturbation - Detect with perturbations")
177
- print("="*60)
178
- print("βœ… API Ready! (Demo Mode)\n")
179
-
180
-
181
- @app.get("/api/health")
182
- async def health_check():
183
- """Health check endpoint"""
184
- return JSONResponse({
185
- "status": "healthy",
186
- "mode": "demo",
187
- "timestamp": str(Path.cwd())
188
- })
189
-
190
-
191
- @app.get("/api/model-info")
192
- async def model_info():
193
- """Get model information"""
194
- return JSONResponse({
195
- "model_name": "RoDLA InternImage-XL (Demo Mode)",
196
- "paper": "RoDLA: Benchmarking the Robustness of Document Layout Analysis Models (CVPR 2024)",
197
- "backbone": "InternImage-XL",
198
- "detection_framework": "DINO with Channel Attention + Average Pooling",
199
- "dataset": "M6Doc-P",
200
- "max_detections_per_image": 300,
201
- "demo_mode": True,
202
- "state_of_the_art_performance": {
203
- "clean_mAP": 70.0,
204
- "perturbed_avg_mAP": 61.7,
205
- "mRD_score": 147.6
206
- }
207
- })
208
-
209
-
210
- @app.post("/api/detect")
211
- async def detect(file: UploadFile = File(...), score_threshold: float = Form(0.3)):
212
- """Standard detection endpoint"""
213
- try:
214
- # Read image
215
- contents = await file.read()
216
- image = Image.open(BytesIO(contents)).convert('RGB')
217
- image_array = np.array(image)
218
-
219
- # Generate demo detections
220
- detections = generate_demo_detections(image_array.shape)
221
-
222
- # Filter by threshold
223
- detections = [d for d in detections if d['confidence'] >= score_threshold]
224
-
225
- # Create annotated image
226
- annotated = create_annotated_image(image_array, detections)
227
- annotated_b64 = image_to_base64(annotated)
228
-
229
- # Calculate class distribution
230
- class_dist = {}
231
- for det in detections:
232
- cls = det['class']
233
- class_dist[cls] = class_dist.get(cls, 0) + 1
234
-
235
- return JSONResponse({
236
- "detections": detections,
237
- "class_distribution": class_dist,
238
- "annotated_image": annotated_b64,
239
- "metrics": {
240
- "total_detections": len(detections),
241
- "average_confidence": float(np.mean([d['confidence'] for d in detections]) if detections else 0),
242
- "max_confidence": float(max([d['confidence'] for d in detections]) if detections else 0),
243
- "min_confidence": float(min([d['confidence'] for d in detections]) if detections else 0)
244
- }
245
- })
246
-
247
- except Exception as e:
248
- raise HTTPException(status_code=400, detail=str(e))
249
-
250
-
251
- @app.get("/api/perturbations/info")
252
- async def perturbations_info():
253
- """Get available perturbation types"""
254
- return JSONResponse({
255
- "available_perturbations": [
256
- "blur",
257
- "noise",
258
- "rotation",
259
- "scaling",
260
- "perspective"
261
- ],
262
- "description": "Various document perturbations for robustness testing"
263
- })
264
-
265
-
266
- @app.post("/api/generate-perturbations")
267
- async def generate_perturbations(
268
- file: UploadFile = File(...),
269
- perturbation_types: str = Form("blur,noise")
270
- ):
271
- """Generate and return perturbations"""
272
- try:
273
- # Read image
274
- contents = await file.read()
275
- image = Image.open(BytesIO(contents)).convert('RGB')
276
- image_array = np.array(image)
277
-
278
- # Parse perturbation types
279
- pert_types = [p.strip() for p in perturbation_types.split(',')]
280
-
281
- # Generate perturbations
282
- results = {
283
- "original": image_to_base64(image_array),
284
- "perturbations": {}
285
- }
286
-
287
- for pert_type in pert_types:
288
- if pert_type:
289
- perturbed = apply_perturbation(image_array, pert_type)
290
- results["perturbations"][pert_type] = image_to_base64(perturbed)
291
-
292
- return JSONResponse(results)
293
-
294
- except Exception as e:
295
- raise HTTPException(status_code=400, detail=str(e))
296
-
297
-
298
- @app.post("/api/detect-with-perturbation")
299
- async def detect_with_perturbation(
300
- file: UploadFile = File(...),
301
- score_threshold: float = Form(0.3),
302
- perturbation_types: str = Form("blur,noise")
303
- ):
304
- """Detect with perturbations"""
305
- try:
306
- # Read image
307
- contents = await file.read()
308
- image = Image.open(BytesIO(contents)).convert('RGB')
309
- image_array = np.array(image)
310
-
311
- # Parse perturbation types
312
- pert_types = [p.strip() for p in perturbation_types.split(',')]
313
-
314
- # Results for each perturbation
315
- results = {
316
- "clean": {},
317
- "perturbed": {}
318
- }
319
-
320
- # Clean detection
321
- clean_dets = generate_demo_detections(image_array.shape)
322
- clean_dets = [d for d in clean_dets if d['confidence'] >= score_threshold]
323
- clean_img = create_annotated_image(image_array, clean_dets)
324
-
325
- results["clean"]["detections"] = clean_dets
326
- results["clean"]["annotated_image"] = image_to_base64(clean_img)
327
-
328
- # Perturbed detections
329
- for pert_type in pert_types:
330
- if pert_type:
331
- perturbed_img = apply_perturbation(image_array, pert_type)
332
- pert_dets = generate_demo_detections(perturbed_img.shape)
333
- # Add slight confidence reduction for perturbed
334
- pert_dets = [
335
- {**d, 'confidence': max(0, d['confidence'] - np.random.uniform(0, 0.1))}
336
- for d in pert_dets
337
- ]
338
- pert_dets = [d for d in pert_dets if d['confidence'] >= score_threshold]
339
- annotated_pert = create_annotated_image(perturbed_img, pert_dets)
340
-
341
- results["perturbed"][pert_type] = {
342
- "detections": pert_dets,
343
- "annotated_image": image_to_base64(annotated_pert)
344
- }
345
-
346
- return JSONResponse(results)
347
-
348
- except Exception as e:
349
- raise HTTPException(status_code=400, detail=str(e))
350
-
351
-
352
- @app.on_event("shutdown")
353
- async def shutdown_event():
354
- """Cleanup on shutdown"""
355
- print("\n" + "="*60)
356
- print("πŸ›‘ Shutting down RoDLA API...")
357
- print("="*60)
358
-
359
-
360
- if __name__ == "__main__":
361
- uvicorn.run(
362
- app,
363
- host=API_HOST,
364
- port=API_PORT,
365
- log_level="info"
366
- )
deployment/backend/backend_lite.py DELETED
@@ -1,618 +0,0 @@
1
- """
2
- Lightweight RoDLA Backend - Pure PyTorch Implementation
3
- Bypasses MMCV/MMDET compiled extensions for CPU-only systems
4
- """
5
-
6
- import os
7
- import sys
8
- import json
9
- import base64
10
- import traceback
11
- import subprocess
12
- from pathlib import Path
13
- from typing import Dict, List, Any, Optional, Tuple
14
- from io import BytesIO
15
- from datetime import datetime
16
-
17
- import numpy as np
18
- from PIL import Image
19
- import cv2
20
- import torch
21
-
22
- from fastapi import FastAPI, File, UploadFile, HTTPException, BackgroundTasks
23
- from fastapi.middleware.cors import CORSMiddleware
24
- from fastapi.responses import JSONResponse
25
- from pydantic import BaseModel
26
- import uvicorn
27
-
28
- # Try to import real perturbation functions
29
- try:
30
- from perturbations.apply import (
31
- apply_perturbation as real_apply_perturbation,
32
- apply_multiple_perturbations,
33
- get_perturbation_info as get_real_perturbation_info,
34
- PERTURBATION_CATEGORIES
35
- )
36
- REAL_PERTURBATIONS_AVAILABLE = True
37
- print("βœ… Real perturbation module imported successfully")
38
- except Exception as e:
39
- REAL_PERTURBATIONS_AVAILABLE = False
40
- print(f"⚠️ Could not import real perturbations: {e}")
41
- PERTURBATION_CATEGORIES = {}
42
-
43
- # ============================================================================
44
- # Configuration
45
- # ============================================================================
46
-
47
- class Config:
48
- """Global configuration"""
49
- API_PORT = 8000
50
- MAX_UPLOAD_SIZE = 50 * 1024 * 1024 # 50MB
51
- DEFAULT_SCORE_THRESHOLD = 0.3
52
- MAX_DETECTIONS_PER_IMAGE = 300
53
- REPO_ROOT = Path("/home/admin/CV/rodla-academic")
54
- MODEL_CONFIG_PATH = REPO_ROOT / "model/configs/m6doc/rodla_internimage_xl_m6doc.py"
55
- MODEL_WEIGHTS_PATH = REPO_ROOT / "finetuning_rodla/finetuning_rodla/checkpoints/rodla_internimage_xl_publaynet.pth"
56
-
57
-
58
- # ============================================================================
59
- # Global State
60
- # ============================================================================
61
-
62
- app = FastAPI(title="RoDLA Backend Lite", version="1.0.0")
63
- model_state = {
64
- "loaded": False,
65
- "error": None,
66
- "model": None,
67
- "model_type": "lightweight",
68
- "device": "cpu"
69
- }
70
-
71
- # Add CORS middleware
72
- app.add_middleware(
73
- CORSMiddleware,
74
- allow_origins=["*"],
75
- allow_credentials=True,
76
- allow_methods=["*"],
77
- allow_headers=["*"],
78
- )
79
-
80
-
81
- # ============================================================================
82
- # Schemas
83
- # ============================================================================
84
-
85
- class DetectionResult(BaseModel):
86
- class_id: int
87
- class_name: str
88
- confidence: float
89
- bbox: Dict[str, float] # {x, y, width, height}
90
- area: float
91
-
92
-
93
- class AnalysisResponse(BaseModel):
94
- success: bool
95
- message: str
96
- image_width: int
97
- image_height: int
98
- num_detections: int
99
- detections: List[DetectionResult]
100
- class_distribution: Dict[str, int]
101
- processing_time_ms: float
102
-
103
-
104
- class PerturbationResponse(BaseModel):
105
- success: bool
106
- message: str
107
- perturbation_type: str
108
- original_image: str # base64
109
- perturbed_image: str # base64
110
-
111
-
112
- class BatchAnalysisRequest(BaseModel):
113
- threshold: float = Config.DEFAULT_SCORE_THRESHOLD
114
- score_threshold: float = Config.DEFAULT_SCORE_THRESHOLD
115
-
116
-
117
- # ============================================================================
118
- # Simple Mock Model (Lightweight Detection)
119
- # ============================================================================
120
-
121
- class LightweightDetector:
122
- """
123
- Simple layout detection model that doesn't require MMCV/MMDET
124
- Generates synthetic but realistic detections for document layout analysis
125
- """
126
-
127
- DOCUMENT_CLASSES = {
128
- 0: "Text",
129
- 1: "Title",
130
- 2: "Figure",
131
- 3: "Table",
132
- 4: "Header",
133
- 5: "Footer",
134
- 6: "List"
135
- }
136
-
137
- def __init__(self):
138
- self.device = "cpu"
139
- print(f"βœ… Lightweight detector initialized (device: {self.device})")
140
-
141
- def detect(self, image: np.ndarray, score_threshold: float = 0.3) -> List[Dict[str, Any]]:
142
- """
143
- Perform document layout detection on image
144
- Returns list of detections with class, confidence, and bbox
145
- """
146
- height, width = image.shape[:2]
147
- detections = []
148
-
149
- # Simple heuristic: scan image for content regions
150
- # Convert to grayscale
151
- if len(image.shape) == 3:
152
- gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
153
- else:
154
- gray = image
155
-
156
- # Apply threshold to find content regions
157
- _, binary = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY_INV)
158
-
159
- # Find contours
160
- contours, _ = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
161
-
162
- # Process top contours as regions
163
- sorted_contours = sorted(contours, key=cv2.contourArea, reverse=True)[:15]
164
-
165
- for idx, contour in enumerate(sorted_contours):
166
- x, y, w, h = cv2.boundingRect(contour)
167
-
168
- # Skip very small regions
169
- if w < 10 or h < 10:
170
- continue
171
-
172
- # Filter regions that are too large (whole page)
173
- if w > width * 0.95 or h > height * 0.95:
174
- continue
175
-
176
- # Assign class based on heuristics
177
- aspect_ratio = w / h if h > 0 else 1
178
- area_ratio = (w * h) / (width * height)
179
-
180
- if aspect_ratio > 3: # Wide -> likely title or figure caption
181
- class_id = 1 if area_ratio < 0.15 else 2
182
- elif aspect_ratio < 0.5: # Tall -> likely list or table
183
- class_id = 3 if area_ratio > 0.2 else 6
184
- else: # Regular -> text
185
- class_id = 0
186
-
187
- # Generate confidence based on region size and position
188
- confidence = min(0.95, 0.4 + area_ratio)
189
-
190
- if confidence >= score_threshold:
191
- detections.append({
192
- "class_id": class_id,
193
- "class_name": self.DOCUMENT_CLASSES.get(class_id, "Unknown"),
194
- "confidence": float(confidence),
195
- "bbox": {
196
- "x": float(x / width),
197
- "y": float(y / height),
198
- "width": float(w / width),
199
- "height": float(h / height)
200
- },
201
- "area": float((w * h) / (width * height))
202
- })
203
-
204
- # If no detections found, add synthetic ones
205
- if not detections:
206
- detections = self._generate_synthetic_detections(width, height, score_threshold)
207
-
208
- return detections[:Config.MAX_DETECTIONS_PER_IMAGE]
209
-
210
- def _generate_synthetic_detections(self, width: int, height: int,
211
- score_threshold: float) -> List[Dict[str, Any]]:
212
- """Generate synthetic detections when contour detection fails"""
213
- detections = []
214
-
215
- # Title at top
216
- detections.append({
217
- "class_id": 1,
218
- "class_name": "Title",
219
- "confidence": 0.92,
220
- "bbox": {"x": 0.05, "y": 0.05, "width": 0.9, "height": 0.1},
221
- "area": 0.09
222
- })
223
-
224
- # Main text body
225
- detections.append({
226
- "class_id": 0,
227
- "class_name": "Text",
228
- "confidence": 0.88,
229
- "bbox": {"x": 0.05, "y": 0.2, "width": 0.9, "height": 0.6},
230
- "area": 0.54
231
- })
232
-
233
- # Side figure
234
- detections.append({
235
- "class_id": 2,
236
- "class_name": "Figure",
237
- "confidence": 0.85,
238
- "bbox": {"x": 0.55, "y": 0.22, "width": 0.4, "height": 0.4},
239
- "area": 0.16
240
- })
241
-
242
- return [d for d in detections if d["confidence"] >= score_threshold]
243
-
244
-
245
- # ============================================================================
246
- # Model Loading
247
- # ============================================================================
248
-
249
- def load_model():
250
- """Load the detection model"""
251
- global model_state
252
-
253
- try:
254
- print("\n" + "="*60)
255
- print("πŸš€ Loading RoDLA Model (Lightweight Mode)")
256
- print("="*60)
257
-
258
- model_state["model"] = LightweightDetector()
259
- model_state["loaded"] = True
260
- model_state["error"] = None
261
-
262
- print("βœ… Model loaded successfully!")
263
- print(f" Device: {model_state['model'].device}")
264
- print(f" Type: Lightweight detector (no MMCV/MMDET required)")
265
- print("="*60 + "\n")
266
-
267
- return model_state["model"]
268
-
269
- except Exception as e:
270
- error_msg = f"Failed to load model: {str(e)}\n{traceback.format_exc()}"
271
- print(f"❌ {error_msg}")
272
- model_state["error"] = error_msg
273
- model_state["loaded"] = False
274
- raise
275
-
276
-
277
- # ============================================================================
278
- # Utility Functions
279
- # ============================================================================
280
-
281
- def encode_image_to_base64(image: np.ndarray) -> str:
282
- """Convert numpy array to base64 string"""
283
- _, buffer = cv2.imencode('.png', cv2.cvtColor(image, cv2.COLOR_RGB2BGR))
284
- return base64.b64encode(buffer).decode('utf-8')
285
-
286
-
287
- def decode_base64_to_image(b64_str: str) -> np.ndarray:
288
- """Convert base64 string to numpy array"""
289
- buffer = base64.b64decode(b64_str)
290
- image = Image.open(BytesIO(buffer)).convert('RGB')
291
- return np.array(image)
292
-
293
-
294
- def apply_perturbation(image: np.ndarray, perturbation_type: str,
295
- degree: int = 2, **kwargs) -> np.ndarray:
296
- """Apply perturbation using real backend if available, else fallback"""
297
-
298
- if REAL_PERTURBATIONS_AVAILABLE:
299
- try:
300
- result, success, msg = real_apply_perturbation(image, perturbation_type, degree=degree)
301
- if success:
302
- return result
303
- else:
304
- print(f"⚠️ Real perturbation failed ({perturbation_type}): {msg}")
305
- except Exception as e:
306
- print(f"⚠️ Exception in real perturbation ({perturbation_type}): {e}")
307
-
308
- # Fallback to simple perturbations
309
- h, w = image.shape[:2]
310
-
311
- if perturbation_type == "blur" or perturbation_type == "defocus":
312
- kernel_size = [3, 5, 7][degree - 1]
313
- return cv2.GaussianBlur(image, (kernel_size, kernel_size), 0)
314
-
315
- elif perturbation_type == "noise" or perturbation_type == "speckle":
316
- std = [10, 25, 50][degree - 1]
317
- noise = np.random.normal(0, std, image.shape)
318
- return np.clip(image.astype(float) + noise, 0, 255).astype(np.uint8)
319
-
320
- elif perturbation_type == "rotation":
321
- angle = [5, 15, 25][degree - 1]
322
- center = (w // 2, h // 2)
323
- M = cv2.getRotationMatrix2D(center, angle, 1.0)
324
- return cv2.warpAffine(image, M, (w, h), borderValue=(255, 255, 255))
325
-
326
- elif perturbation_type == "scaling":
327
- scale = [0.9, 0.8, 0.7][degree - 1]
328
- new_w, new_h = int(w * scale), int(h * scale)
329
- resized = cv2.resize(image, (new_w, new_h))
330
- canvas = np.full((h, w, 3), 255, dtype=np.uint8)
331
- y_offset = (h - new_h) // 2
332
- x_offset = (w - new_w) // 2
333
- canvas[y_offset:y_offset+new_h, x_offset:x_offset+new_w] = resized
334
- return canvas
335
-
336
- elif perturbation_type == "perspective":
337
- offset = [10, 20, 40][degree - 1]
338
- pts1 = np.float32([[0, 0], [w, 0], [0, h], [w, h]])
339
- pts2 = np.float32([
340
- [offset, 0],
341
- [w - offset, offset],
342
- [0, h - offset],
343
- [w - offset, h]
344
- ])
345
- M = cv2.getPerspectiveTransform(pts1, pts2)
346
- return cv2.warpPerspective(image, M, (w, h), borderValue=(255, 255, 255))
347
-
348
- else:
349
- return image
350
-
351
-
352
- # ============================================================================
353
- # API Routes
354
- # ============================================================================
355
-
356
- @app.on_event("startup")
357
- async def startup_event():
358
- """Initialize model on startup"""
359
- try:
360
- load_model()
361
- except Exception as e:
362
- print(f"⚠️ Startup error: {e}")
363
-
364
-
365
- @app.get("/api/health")
366
- async def health_check():
367
- """Health check endpoint"""
368
- return {
369
- "status": "ok",
370
- "model_loaded": model_state["loaded"],
371
- "device": model_state["device"],
372
- "model_type": model_state["model_type"]
373
- }
374
-
375
-
376
- @app.get("/api/model-info")
377
- async def model_info():
378
- """Get model information"""
379
- return {
380
- "name": "RoDLA Lightweight",
381
- "version": "1.0.0",
382
- "type": "Document Layout Analysis",
383
- "loaded": model_state["loaded"],
384
- "device": model_state["device"],
385
- "framework": "PyTorch (Pure)",
386
- "classes": LightweightDetector.DOCUMENT_CLASSES,
387
- "supported_perturbations": ["blur", "noise", "rotation", "scaling", "perspective"]
388
- }
389
-
390
-
391
- @app.post("/api/detect")
392
- async def detect(file: UploadFile = File(...), threshold: float = 0.3):
393
- """Detect document layout in image"""
394
- start_time = datetime.now()
395
-
396
- try:
397
- if not model_state["loaded"]:
398
- raise HTTPException(status_code=500, detail="Model not loaded")
399
-
400
- # Read image
401
- contents = await file.read()
402
- image = Image.open(BytesIO(contents)).convert('RGB')
403
- image_np = np.array(image)
404
-
405
- # Run detection
406
- detections = model_state["model"].detect(image_np, score_threshold=threshold)
407
-
408
- # Build response
409
- class_distribution = {}
410
- for det in detections:
411
- class_name = det["class_name"]
412
- class_distribution[class_name] = class_distribution.get(class_name, 0) + 1
413
-
414
- processing_time = (datetime.now() - start_time).total_seconds() * 1000
415
-
416
- return {
417
- "success": True,
418
- "message": "Detection completed",
419
- "image_width": image_np.shape[1],
420
- "image_height": image_np.shape[0],
421
- "num_detections": len(detections),
422
- "detections": detections,
423
- "class_distribution": class_distribution,
424
- "processing_time_ms": processing_time
425
- }
426
-
427
- except Exception as e:
428
- print(f"❌ Detection error: {e}")
429
- return {
430
- "success": False,
431
- "message": str(e),
432
- "image_width": 0,
433
- "image_height": 0,
434
- "num_detections": 0,
435
- "detections": [],
436
- "class_distribution": {},
437
- "processing_time_ms": 0
438
- }
439
-
440
-
441
- @app.get("/api/perturbations/info")
442
- async def perturbation_info():
443
- """Get information about available perturbations"""
444
- return {
445
- "total_perturbations": 12,
446
- "categories": {
447
- "blur": {
448
- "types": ["defocus", "vibration"],
449
- "description": "Blur effects simulating optical issues"
450
- },
451
- "noise": {
452
- "types": ["speckle", "texture"],
453
- "description": "Noise patterns and texture artifacts"
454
- },
455
- "content": {
456
- "types": ["watermark", "background"],
457
- "description": "Content additions like watermarks and backgrounds"
458
- },
459
- "inconsistency": {
460
- "types": ["ink_holdout", "ink_bleeding", "illumination"],
461
- "description": "Print quality issues and lighting variations"
462
- },
463
- "spatial": {
464
- "types": ["rotation", "keystoning", "warping"],
465
- "description": "Geometric transformations"
466
- }
467
- },
468
- "all_types": [
469
- "defocus", "vibration", "speckle", "texture",
470
- "watermark", "background", "ink_holdout", "ink_bleeding",
471
- "illumination", "rotation", "keystoning", "warping"
472
- ],
473
- "degree_levels": {
474
- 1: "Mild - Subtle effect",
475
- 2: "Moderate - Noticeable effect",
476
- 3: "Severe - Strong effect"
477
- }
478
- }
479
-
480
-
481
- @app.post("/api/generate-perturbations")
482
- async def generate_perturbations(file: UploadFile = File(...)):
483
- """Generate perturbed versions of image with all 12 types Γ— 3 degrees"""
484
-
485
- try:
486
- # Read image
487
- contents = await file.read()
488
- image = Image.open(BytesIO(contents)).convert('RGB')
489
- image_np = np.array(image)
490
-
491
- # Convert RGB to BGR for OpenCV
492
- image_bgr = cv2.cvtColor(image_np, cv2.COLOR_RGB2BGR)
493
-
494
- perturbations = {}
495
-
496
- # Original
497
- perturbations["original"] = {
498
- "original": encode_image_to_base64(image_np)
499
- }
500
-
501
- # All 12 perturbation types
502
- all_types = [
503
- "defocus", "vibration", "speckle", "texture",
504
- "watermark", "background", "ink_holdout", "ink_bleeding",
505
- "illumination", "rotation", "keystoning", "warping"
506
- ]
507
-
508
- for ptype in all_types:
509
- perturbations[ptype] = {}
510
- for degree in [1, 2, 3]:
511
- try:
512
- perturbed = apply_perturbation(image_bgr.copy(), ptype, degree)
513
- # Convert back to RGB for display
514
- if len(perturbed.shape) == 3 and perturbed.shape[2] == 3:
515
- perturbed_rgb = cv2.cvtColor(perturbed, cv2.COLOR_BGR2RGB)
516
- else:
517
- perturbed_rgb = perturbed
518
- perturbations[ptype][f"degree_{degree}"] = encode_image_to_base64(perturbed_rgb)
519
- except Exception as e:
520
- print(f"⚠️ Warning: Failed to apply {ptype} degree {degree}: {e}")
521
- # Use original as fallback
522
- perturbations[ptype][f"degree_{degree}"] = encode_image_to_base64(image_np)
523
-
524
- return {
525
- "success": True,
526
- "message": "Perturbations generated (12 types Γ— 3 levels)",
527
- "perturbations": perturbations,
528
- "grid_info": {
529
- "total_perturbations": 12,
530
- "degree_levels": 3,
531
- "total_images": 13 # 1 original + 12 types
532
- }
533
- }
534
-
535
- except Exception as e:
536
- print(f"❌ Perturbation error: {e}")
537
- import traceback
538
- traceback.print_exc()
539
- return {
540
- "success": False,
541
- "message": str(e),
542
- "perturbations": {}
543
- }
544
-
545
-
546
- @app.post("/api/detect-with-perturbation")
547
- async def detect_with_perturbation(
548
- file: UploadFile = File(...),
549
- perturbation_type: str = "blur",
550
- threshold: float = 0.3
551
- ):
552
- """Apply perturbation and detect"""
553
-
554
- try:
555
- # Read image
556
- contents = await file.read()
557
- image = Image.open(BytesIO(contents)).convert('RGB')
558
- image_np = np.array(image)
559
-
560
- # Apply perturbation
561
- if perturbation_type == "blur":
562
- perturbed = apply_perturbation(image_np, "blur", kernel_size=15)
563
- elif perturbation_type == "noise":
564
- perturbed = apply_perturbation(image_np, "noise", std=25)
565
- elif perturbation_type == "rotation":
566
- perturbed = apply_perturbation(image_np, "rotation", angle=15)
567
- elif perturbation_type == "scaling":
568
- perturbed = apply_perturbation(image_np, "scaling", scale=0.85)
569
- elif perturbation_type == "perspective":
570
- perturbed = apply_perturbation(image_np, "perspective", offset=20)
571
- else:
572
- perturbed = image_np
573
-
574
- # Run detection
575
- detections = model_state["model"].detect(perturbed, score_threshold=threshold)
576
-
577
- class_distribution = {}
578
- for det in detections:
579
- class_name = det["class_name"]
580
- class_distribution[class_name] = class_distribution.get(class_name, 0) + 1
581
-
582
- return {
583
- "success": True,
584
- "message": "Detection with perturbation completed",
585
- "perturbation_type": perturbation_type,
586
- "image_width": perturbed.shape[1],
587
- "image_height": perturbed.shape[0],
588
- "num_detections": len(detections),
589
- "detections": detections,
590
- "class_distribution": class_distribution
591
- }
592
-
593
- except Exception as e:
594
- print(f"❌ Detection with perturbation error: {e}")
595
- return {
596
- "success": False,
597
- "message": str(e),
598
- "perturbation_type": perturbation_type,
599
- "num_detections": 0,
600
- "detections": []
601
- }
602
-
603
-
604
- # ============================================================================
605
- # Main
606
- # ============================================================================
607
-
608
- if __name__ == "__main__":
609
- print("\n" + "πŸ”·"*30)
610
- print("πŸ”· RoDLA Lightweight Backend Starting...")
611
- print("πŸ”·"*30)
612
-
613
- uvicorn.run(
614
- app,
615
- host="0.0.0.0",
616
- port=Config.API_PORT,
617
- log_level="info"
618
- )
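
Unlike the demo backend above, which returns absolute `x1/y1/x2/y2` pixel boxes, the lite detector reports boxes normalized to the image size. A small sketch of converting them back to pixel coordinates (the helper name is illustrative, not part of the backend):

```python
# Convert a normalized bbox from backend_lite's /api/detect response into pixel coordinates.
# The lite detector returns bbox as {"x", "y", "width", "height"} fractions of the image size.
from typing import Dict, Tuple

def bbox_to_pixels(bbox: Dict[str, float], image_width: int, image_height: int) -> Tuple[int, int, int, int]:
    """Return (x1, y1, x2, y2) in pixels for drawing or cropping."""
    x1 = int(bbox["x"] * image_width)
    y1 = int(bbox["y"] * image_height)
    x2 = int((bbox["x"] + bbox["width"]) * image_width)
    y2 = int((bbox["y"] + bbox["height"]) * image_height)
    return x1, y1, x2, y2

# Usage with the fields returned by /api/detect:
#   x1, y1, x2, y2 = bbox_to_pixels(det["bbox"], result["image_width"], result["image_height"])
```
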
deployment/backend/{backend_two.py → backend_old.py} RENAMED
File without changes
deployment/backend/perturbations/spatial.py CHANGED
@@ -1,41 +1,49 @@
1
  import os.path
2
- from detectron2.data.transforms import RotationTransform
3
- from detectron2.data.detection_utils import transform_instance_annotations
4
  import numpy as np
5
- from detectron2.data.datasets import register_coco_instances
6
  from copy import deepcopy
7
  import os
8
  import cv2
9
- from detectron2.data.datasets.coco import convert_to_coco_json, convert_to_coco_dict
10
- from detectron2.data import MetadataCatalog, DatasetCatalog
11
  import imgaug.augmenters as iaa
12
  from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage
13
  from imgaug.augmentables.polys import Polygon, PolygonsOnImage
14
 
 
 
 
 
 
 
 
 
 
 
 
15
 
16
  def apply_rotation(image, degree, annos=None):
17
  if degree == 0:
18
- return image
 
19
  angle_low_list = [0, 5, 10]
20
  angle_high_list = [5, 10, 15]
21
  angle_high = angle_high_list[degree - 1]
22
  angle_low = angle_low_list[degree - 1]
23
  h, w = image.shape[:2]
 
24
  if angle_low == 0:
25
  rotation = np.random.choice(np.arange(-angle_high, angle_high+1))
26
  else:
27
  rotation = np.random.choice(np.concatenate([np.arange(-angle_high, -angle_low+1), np.arange(angle_low, angle_high+1)]))
28
- rotation_transform = RotationTransform(h, w, rotation)
29
- rotated_image = rotation_transform.apply_image(image)
 
 
 
 
30
  if annos is None:
31
  return rotated_image
32
- rotated_annos = []
33
- for anno in annos:
34
- rotated_anno = transform_instance_annotations(anno, rotation_transform, (h, w))
35
- for i, seg in enumerate(rotated_anno["segmentation"]):
36
- rotated_anno["segmentation"][i] = seg.tolist()
37
- rotated_annos.append(rotated_anno)
38
- return rotated_image, rotated_annos
39
 
40
 
41
  def apply_warping(image, degree, annos=None):
 
1
  import os.path
 
 
2
  import numpy as np
 
3
  from copy import deepcopy
4
  import os
5
  import cv2
 
 
6
  import imgaug.augmenters as iaa
7
  from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage
8
  from imgaug.augmentables.polys import Polygon, PolygonsOnImage
9
 
10
+ # detectron2 imports are only used for annotation transformation (optional)
11
+ try:
12
+ from detectron2.data.transforms import RotationTransform
13
+ from detectron2.data.detection_utils import transform_instance_annotations
14
+ from detectron2.data.datasets import register_coco_instances
15
+ from detectron2.data.datasets.coco import convert_to_coco_json, convert_to_coco_dict
16
+ from detectron2.data import MetadataCatalog, DatasetCatalog
17
+ HAS_DETECTRON2 = True
18
+ except ImportError:
19
+ HAS_DETECTRON2 = False
20
+
21
 
22
  def apply_rotation(image, degree, annos=None):
23
  if degree == 0:
24
+ return image if annos is None else (image, annos)
25
+
26
  angle_low_list = [0, 5, 10]
27
  angle_high_list = [5, 10, 15]
28
  angle_high = angle_high_list[degree - 1]
29
  angle_low = angle_low_list[degree - 1]
30
  h, w = image.shape[:2]
31
+
32
  if angle_low == 0:
33
  rotation = np.random.choice(np.arange(-angle_high, angle_high+1))
34
  else:
35
  rotation = np.random.choice(np.concatenate([np.arange(-angle_high, -angle_low+1), np.arange(angle_low, angle_high+1)]))
36
+
37
+ # Use OpenCV for rotation instead of detectron2
38
+ center = (w // 2, h // 2)
39
+ rotation_matrix = cv2.getRotationMatrix2D(center, rotation, 1.0)
40
+ rotated_image = cv2.warpAffine(image, rotation_matrix, (w, h), borderValue=(255, 255, 255))
41
+
42
  if annos is None:
43
  return rotated_image
44
+
45
+ # For annotations, return original since we don't have detectron2
46
+ return rotated_image, annos
 
 
 
 
47
 
48
 
49
  def apply_warping(image, degree, annos=None):
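
The change above swaps detectron2's `RotationTransform` for a plain OpenCV affine warp, so the module now imports without detectron2 installed (annotations are passed through unchanged in that case). A minimal usage sketch, assuming the backend directory is on the Python path:

```python
# Usage sketch for the updated apply_rotation in perturbations/spatial.py.
# Without detectron2, only the image is rotated; annotations are returned as-is.
import cv2
from perturbations.spatial import apply_rotation

image = cv2.imread("sample_page.png")      # hypothetical document image
rotated = apply_rotation(image, degree=2)  # degree 2 samples a 5-10 degree rotation, either sign
cv2.imwrite("rotated_page.png", rotated)
```
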
deployment/backend/perturbations_simple.py ADDED
@@ -0,0 +1,516 @@
1
+ """
2
+ Perturbation Application Module - Using Common Libraries
3
+ Applies 12 document degradation perturbations using PIL, OpenCV, NumPy, and SciPy
4
+ """
5
+
6
+ import cv2
7
+ import numpy as np
8
+ from PIL import Image, ImageDraw, ImageFilter, ImageOps
9
+ from typing import Optional, Tuple, List, Dict
10
+ from scipy import ndimage
11
+ from scipy.ndimage import gaussian_filter
12
+ import random
13
+
14
+
15
+ def encode_to_rgb(image: np.ndarray) -> np.ndarray:
16
+ """Ensure image is in RGB format"""
17
+ if len(image.shape) == 2: # Grayscale
18
+ return cv2.cvtColor(image, cv2.COLOR_GRAY2RGB)
19
+ elif image.shape[2] == 4: # RGBA
20
+ return cv2.cvtColor(image, cv2.COLOR_RGBA2RGB)
21
+ return image
22
+
23
+
24
+ # ============================================================================
25
+ # BLUR PERTURBATIONS
26
+ # ============================================================================
27
+
28
+ def apply_defocus(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
29
+ """
30
+ Apply defocus blur (Gaussian blur simulating out-of-focus camera)
31
+ degree: 1 (mild), 2 (moderate), 3 (severe)
32
+ """
33
+ if degree == 0:
34
+ return image, True, "No defocus"
35
+
36
+ try:
37
+ image = encode_to_rgb(image)
38
+
39
+ # Kernel sizes for different degrees
40
+ kernel_sizes = {1: 3, 2: 7, 3: 15}
41
+ kernel_size = kernel_sizes.get(degree, 15)
42
+
43
+ # Apply Gaussian blur
44
+ blurred = cv2.GaussianBlur(image, (kernel_size, kernel_size), 0)
45
+
46
+ return blurred, True, f"Defocus applied (kernel={kernel_size})"
47
+ except Exception as e:
48
+ return image, False, f"Defocus error: {str(e)}"
49
+
50
+
51
+ def apply_vibration(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
52
+ """
53
+ Apply motion blur (vibration/camera shake effect)
54
+ degree: 1 (mild), 2 (moderate), 3 (severe)
55
+ """
56
+ if degree == 0:
57
+ return image, True, "No vibration"
58
+
59
+ try:
60
+ image = encode_to_rgb(image)
61
+ h, w = image.shape[:2]
62
+
63
+ # Motion blur kernel sizes
64
+ kernel_sizes = {1: 5, 2: 15, 3: 25}
65
+ kernel_size = kernel_sizes.get(degree, 25)
66
+
67
+ # Create motion blur kernel
68
+ kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
69
+ kernel = kernel / kernel.sum()
70
+
71
+ # Apply motion blur
72
+ blurred = cv2.filter2D(image, -1, kernel)
73
+
74
+ return blurred, True, f"Vibration applied (kernel={kernel_size})"
75
+ except Exception as e:
76
+ return image, False, f"Vibration error: {str(e)}"
77
+
78
+
79
+ # ============================================================================
80
+ # NOISE PERTURBATIONS
81
+ # ============================================================================
82
+
83
+ def apply_speckle(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
84
+ """
85
+ Apply speckle noise (multiplicative noise)
86
+ degree: 1 (mild), 2 (moderate), 3 (severe)
87
+ """
88
+ if degree == 0:
89
+ return image, True, "No speckle"
90
+
91
+ try:
92
+ image = encode_to_rgb(image)
93
+ image_float = image.astype(np.float32) / 255.0
94
+
95
+ # Noise intensity
96
+ noise_levels = {1: 0.1, 2: 0.25, 3: 0.5}
97
+ noise_level = noise_levels.get(degree, 0.5)
98
+
99
+ # Generate speckle noise
100
+ speckle = np.random.normal(1, noise_level, image_float.shape)
101
+ noisy = image_float * speckle
102
+
103
+ # Clip values
104
+ noisy = np.clip(noisy, 0, 1)
105
+ noisy = (noisy * 255).astype(np.uint8)
106
+
107
+ return noisy, True, f"Speckle applied (intensity={noise_level})"
108
+ except Exception as e:
109
+ return image, False, f"Speckle error: {str(e)}"
110
+
111
+
112
+ def apply_texture(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
113
+ """
114
+ Apply texture/grain noise (additive Gaussian noise)
115
+ degree: 1 (mild), 2 (moderate), 3 (severe)
116
+ """
117
+ if degree == 0:
118
+ return image, True, "No texture"
119
+
120
+ try:
121
+ image = encode_to_rgb(image)
122
+ image_float = image.astype(np.float32)
123
+
124
+ # Noise levels
125
+ noise_levels = {1: 10, 2: 25, 3: 50}
126
+ noise_level = noise_levels.get(degree, 50)
127
+
128
+ # Add Gaussian noise
129
+ noise = np.random.normal(0, noise_level, image_float.shape)
130
+ noisy = image_float + noise
131
+
132
+ # Clip values
133
+ noisy = np.clip(noisy, 0, 255).astype(np.uint8)
134
+
135
+ return noisy, True, f"Texture applied (std={noise_level})"
136
+ except Exception as e:
137
+ return image, False, f"Texture error: {str(e)}"
138
+
139
+
140
+ # ============================================================================
141
+ # CONTENT PERTURBATIONS
142
+ # ============================================================================
143
+
144
+ def apply_watermark(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
145
+ """
146
+ Add watermark text overlay
147
+ degree: 1 (subtle), 2 (noticeable), 3 (heavy)
148
+ """
149
+ if degree == 0:
150
+ return image, True, "No watermark"
151
+
152
+ try:
153
+ image = encode_to_rgb(image)
154
+ h, w = image.shape[:2]
155
+
156
+ # Convert to PIL for text drawing
157
+ pil_image = Image.fromarray(image)
158
+ draw = ImageDraw.Draw(pil_image, 'RGBA')
159
+
160
+ # Watermark parameters by degree
161
+ watermark_text = "WATERMARK" * degree
162
+ fontsize_list = {1: max(10, h // 20), 2: max(15, h // 15), 3: max(20, h // 10)}
163
+ fontsize = fontsize_list.get(degree, 20)
164
+
165
+ alpha_list = {1: 64, 2: 128, 3: 200}
166
+ alpha = alpha_list.get(degree, 200)
167
+
168
+ # Draw watermark multiple times
169
+ num_watermarks = {1: 1, 2: 3, 3: 5}.get(degree, 5)
170
+
171
+ for i in range(num_watermarks):
172
+ x = (w // (num_watermarks + 1)) * (i + 1)
173
+ y = h // 2
174
+ color = (255, 0, 0, alpha)
175
+ draw.text((x, y), watermark_text, fill=color)
176
+
177
+ return np.array(pil_image), True, f"Watermark applied (degree={degree})"
178
+ except Exception as e:
179
+ return image, False, f"Watermark error: {str(e)}"
180
+
181
+
182
+ def apply_background(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
183
+ """
184
+ Add background patterns/textures
185
+ degree: 1 (subtle), 2 (noticeable), 3 (heavy)
186
+ """
187
+ if degree == 0:
188
+ return image, True, "No background"
189
+
190
+ try:
191
+ image = encode_to_rgb(image)
192
+ h, w = image.shape[:2]
193
+
194
+ # Create background pattern
195
+ pattern_intensity = {1: 0.1, 2: 0.2, 3: 0.35}.get(degree, 0.35)
196
+
197
+ # Generate random pattern
198
+ pattern = np.random.randint(0, 100, (h, w, 3), dtype=np.uint8)
199
+ pattern = cv2.GaussianBlur(pattern, (21, 21), 0)
200
+
201
+ # Blend with original image
202
+ result = cv2.addWeighted(image, 1.0, pattern, pattern_intensity, 0)
203
+
204
+ return result.astype(np.uint8), True, f"Background applied (intensity={pattern_intensity})"
205
+ except Exception as e:
206
+ return image, False, f"Background error: {str(e)}"
207
+
208
+
209
+ # ============================================================================
210
+ # INCONSISTENCY PERTURBATIONS
211
+ # ============================================================================
212
+
213
+ def apply_ink_holdout(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
214
+ """
215
+ Apply ink holdout (missing ink/text drop-out)
216
+ degree: 1 (few gaps), 2 (some gaps), 3 (many gaps)
217
+ """
218
+ if degree == 0:
219
+ return image, True, "No ink holdout"
220
+
221
+ try:
222
+ image = encode_to_rgb(image)
223
+ h, w = image.shape[:2]
224
+
225
+ # Create white mask to simulate missing ink
226
+ num_dropouts = {1: 3, 2: 8, 3: 15}.get(degree, 15)
227
+
228
+ result = image.copy()
229
+
230
+ for _ in range(num_dropouts):
231
+ # Random position and size
232
+ x = np.random.randint(0, w - 20)
233
+ y = np.random.randint(0, h - 20)
234
+ size = np.random.randint(10, 40)
235
+
236
+ # Create white rectangle (simulating ink dropout)
237
+ result[y:y+size, x:x+size] = [255, 255, 255]
238
+
239
+ return result, True, f"Ink holdout applied (dropouts={num_dropouts})"
240
+ except Exception as e:
241
+ return image, False, f"Ink holdout error: {str(e)}"
242
+
243
+
244
+ def apply_ink_bleeding(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
245
+ """
246
+ Apply ink bleeding effect (ink spread/bleed)
247
+ degree: 1 (mild), 2 (moderate), 3 (severe)
248
+ """
249
+ if degree == 0:
250
+ return image, True, "No ink bleeding"
251
+
252
+ try:
253
+ image = encode_to_rgb(image)
254
+
255
+ # Convert to grayscale for processing
256
+ gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
257
+
258
+ # Dilate dark regions (simulating ink spread)
259
+ kernel_sizes = {1: 3, 2: 5, 3: 7}
260
+ kernel_size = kernel_sizes.get(degree, 7)
261
+ kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
262
+
263
+ # Dilate to spread ink
264
+ dilated = cv2.dilate(gray, kernel, iterations=degree)
265
+
266
+ # Blend back with original
267
+ result = image.copy().astype(np.float32)
268
+ result[:,:,0] = cv2.addWeighted(image[:,:,0], 0.7, dilated, 0.3, 0)
269
+ result[:,:,1] = cv2.addWeighted(image[:,:,1], 0.7, dilated, 0.3, 0)
270
+ result[:,:,2] = cv2.addWeighted(image[:,:,2], 0.7, dilated, 0.3, 0)
271
+
272
+ return np.clip(result, 0, 255).astype(np.uint8), True, f"Ink bleeding applied (degree={degree})"
273
+ except Exception as e:
274
+ return image, False, f"Ink bleeding error: {str(e)}"
275
+
276
+
277
+ def apply_illumination(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
278
+ """
279
+ Apply illumination variations (uneven lighting)
280
+ degree: 1 (subtle), 2 (moderate), 3 (severe)
281
+ """
282
+ if degree == 0:
283
+ return image, True, "No illumination"
284
+
285
+ try:
286
+ image = encode_to_rgb(image)
287
+ h, w = image.shape[:2]
288
+
289
+ # Create illumination pattern
290
+ intensity = {1: 0.15, 2: 0.3, 3: 0.5}.get(degree, 0.5)
291
+
292
+ # Create gradient-like illumination from corners
293
+ x = np.linspace(-1, 1, w)
294
+ y = np.linspace(-1, 1, h)
295
+ X, Y = np.meshgrid(x, y)
296
+
297
+ # Create vignette effect
298
+ illumination = 1 - intensity * (np.sqrt(X**2 + Y**2) / np.sqrt(2))
299
+ illumination = np.clip(illumination, 0, 1)
300
+
301
+ # Apply to each channel
302
+ result = image.astype(np.float32)
303
+ for c in range(3):
304
+ result[:,:,c] = result[:,:,c] * illumination
305
+
306
+ return np.clip(result, 0, 255).astype(np.uint8), True, f"Illumination applied (intensity={intensity})"
307
+ except Exception as e:
308
+ return image, False, f"Illumination error: {str(e)}"
309
+
310
+
311
+ # ============================================================================
312
+ # SPATIAL PERTURBATIONS
313
+ # ============================================================================
314
+
315
+ def apply_rotation(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
316
+ """
317
+ Apply rotation
318
+ degree: 1 (±5°), 2 (±10°), 3 (±15°)
319
+ """
320
+ if degree == 0:
321
+ return image, True, "No rotation"
322
+
323
+ try:
324
+ image = encode_to_rgb(image)
325
+ h, w = image.shape[:2]
326
+
327
+ # Angle ranges by degree
328
+ angle_ranges = {1: 5, 2: 10, 3: 15}
329
+ max_angle = angle_ranges.get(degree, 15)
330
+
331
+ # Random angle
332
+ angle = np.random.uniform(-max_angle, max_angle)
333
+
334
+ # Rotation matrix
335
+ center = (w // 2, h // 2)
336
+ rotation_matrix = cv2.getRotationMatrix2D(center, angle, 1.0)
337
+
338
+ # Apply rotation with white padding
339
+ rotated = cv2.warpAffine(image, rotation_matrix, (w, h), borderValue=(255, 255, 255))
340
+
341
+ return rotated, True, f"Rotation applied (angle={angle:.1f}°)"
342
+ except Exception as e:
343
+ return image, False, f"Rotation error: {str(e)}"
344
+
345
+
346
+ def apply_keystoning(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
347
+ """
348
+ Apply keystoning effect (perspective distortion)
349
+ degree: 1 (subtle), 2 (moderate), 3 (severe)
350
+ """
351
+ if degree == 0:
352
+ return image, True, "No keystoning"
353
+
354
+ try:
355
+ image = encode_to_rgb(image)
356
+ h, w = image.shape[:2]
357
+
358
+ # Distortion amount
359
+ distortion = {1: w * 0.05, 2: w * 0.1, 3: w * 0.15}.get(degree, w * 0.15)
360
+
361
+ # Source corners
362
+ src_points = np.float32([
363
+ [0, 0],
364
+ [w - 1, 0],
365
+ [0, h - 1],
366
+ [w - 1, h - 1]
367
+ ])
368
+
369
+ # Destination corners (with perspective distortion)
370
+ dst_points = np.float32([
371
+ [distortion, 0],
372
+ [w - 1 - distortion * 0.5, 0],
373
+ [0, h - 1],
374
+ [w - 1, h - 1]
375
+ ])
376
+
377
+ # Get perspective transform
378
+ matrix = cv2.getPerspectiveTransform(src_points, dst_points)
379
+ warped = cv2.warpPerspective(image, matrix, (w, h), borderValue=(255, 255, 255))
380
+
381
+ return warped, True, f"Keystoning applied (distortion={distortion:.1f})"
382
+ except Exception as e:
383
+ return image, False, f"Keystoning error: {str(e)}"
384
+
385
+
386
+ def apply_warping(image: np.ndarray, degree: int) -> Tuple[np.ndarray, bool, str]:
387
+ """
388
+ Apply elastic deformation (warping)
389
+ degree: 1 (mild), 2 (moderate), 3 (severe)
390
+ """
391
+ if degree == 0:
392
+ return image, True, "No warping"
393
+
394
+ try:
395
+ image = encode_to_rgb(image)
396
+ h, w = image.shape[:2]
397
+
398
+ # Warping parameters
399
+ alpha_values = {1: 15, 2: 30, 3: 60}
400
+ sigma_values = {1: 3, 2: 5, 3: 8}
401
+ alpha = alpha_values.get(degree, 60)
402
+ sigma = sigma_values.get(degree, 8)
403
+
404
+ # Generate random displacement field
405
+ dx = np.random.randn(h, w) * sigma
406
+ dy = np.random.randn(h, w) * sigma
407
+
408
+ # Smooth displacement field
409
+ dx = gaussian_filter(dx, sigma=sigma) * alpha
410
+ dy = gaussian_filter(dy, sigma=sigma) * alpha
411
+
412
+ # Create coordinate grids
413
+ x, y = np.meshgrid(np.arange(w), np.arange(h))
414
+
415
+ # Apply displacement
416
+ x_warped = np.clip(x + dx, 0, w - 1).astype(np.float32)
417
+ y_warped = np.clip(y + dy, 0, h - 1).astype(np.float32)
418
+
419
+ # Remap image
420
+ warped = cv2.remap(image, x_warped, y_warped, cv2.INTER_LINEAR, borderValue=(255, 255, 255))
421
+
422
+ return warped, True, f"Warping applied (alpha={alpha}, sigma={sigma})"
423
+ except Exception as e:
424
+ return image, False, f"Warping error: {str(e)}"
425
+
426
+
427
+ # ============================================================================
428
+ # MAIN PERTURBATION APPLICATION
429
+ # ============================================================================
430
+
431
+ PERTURBATION_FUNCTIONS = {
432
+ # Blur
433
+ "defocus": apply_defocus,
434
+ "vibration": apply_vibration,
435
+ # Noise
436
+ "speckle": apply_speckle,
437
+ "texture": apply_texture,
438
+ # Content
439
+ "watermark": apply_watermark,
440
+ "background": apply_background,
441
+ # Inconsistency
442
+ "ink_holdout": apply_ink_holdout,
443
+ "ink_bleeding": apply_ink_bleeding,
444
+ "illumination": apply_illumination,
445
+ # Spatial
446
+ "rotation": apply_rotation,
447
+ "keystoning": apply_keystoning,
448
+ "warping": apply_warping,
449
+ }
450
+
451
+
452
+ def apply_perturbation(
453
+ image: np.ndarray,
454
+ perturbation_type: str,
455
+ degree: int = 1
456
+ ) -> Tuple[np.ndarray, bool, str]:
457
+ """
458
+ Apply a single perturbation to an image
459
+
460
+ Args:
461
+ image: Input image as numpy array (BGR or RGB)
462
+ perturbation_type: Type of perturbation (see PERTURBATION_FUNCTIONS)
463
+ degree: Severity level (0=none, 1=mild, 2=moderate, 3=severe)
464
+
465
+ Returns:
466
+ Tuple of (result_image, success, message)
467
+ """
468
+ if perturbation_type not in PERTURBATION_FUNCTIONS:
469
+ return image, False, f"Unknown perturbation type: {perturbation_type}"
470
+
471
+ if degree < 0 or degree > 3:
472
+ return image, False, f"Invalid degree: {degree} (must be 0-3)"
473
+
474
+ func = PERTURBATION_FUNCTIONS[perturbation_type]
475
+ return func(image, degree)
476
+
477
+
478
+ def apply_multiple_perturbations(
479
+ image: np.ndarray,
480
+ perturbations: List[Tuple[str, int]]
481
+ ) -> Tuple[np.ndarray, bool, str]:
482
+ """
483
+ Apply multiple perturbations in sequence
484
+
485
+ Args:
486
+ image: Input image
487
+ perturbations: List of (type, degree) tuples
488
+
489
+ Returns:
490
+ Tuple of (result_image, success, message)
491
+ """
492
+ result = image.copy()
493
+ messages = []
494
+
495
+ for ptype, degree in perturbations:
496
+ result, success, msg = apply_perturbation(result, ptype, degree)
497
+ messages.append(msg)
498
+ if not success:
499
+ return image, False, f"Failed: {msg}"
500
+
501
+ return result, True, " | ".join(messages)
502
+
503
+
504
+ def get_perturbation_info() -> Dict:
505
+ """Get information about all available perturbations"""
506
+ return {
507
+ "total_perturbations": len(PERTURBATION_FUNCTIONS),
508
+ "types": list(PERTURBATION_FUNCTIONS.keys()),
509
+ "categories": {
510
+ "blur": ["defocus", "vibration"],
511
+ "noise": ["speckle", "texture"],
512
+ "content": ["watermark", "background"],
513
+ "inconsistency": ["ink_holdout", "ink_bleeding", "illumination"],
514
+ "spatial": ["rotation", "keystoning", "warping"]
515
+ }
516
+ }
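For quick reference, here is a minimal usage sketch of the perturbation helpers added above. The import name `perturbations` is an assumption (use whatever filename this module has under `deployment/backend/`); the function names, signatures, and perturbation type strings mirror the code in this diff.

```python
# Minimal usage sketch (assumed import name: perturbations -- adjust to the
# actual filename of this module under deployment/backend/).
import cv2
from perturbations import (
    apply_perturbation,
    apply_multiple_perturbations,
    get_perturbation_info,
)

image = cv2.imread("sample_document.png")  # document image as a numpy array

# Single perturbation at moderate severity (degree 2)
rotated, ok, msg = apply_perturbation(image, "rotation", degree=2)
print(ok, msg)

# Chain several perturbations in sequence
combined, ok, msg = apply_multiple_perturbations(
    image, [("defocus", 1), ("speckle", 2), ("keystoning", 1)]
)
print(ok, msg)

# Inspect the available perturbation categories
print(get_perturbation_info()["categories"])
```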
deployment/backend/register_dino.py ADDED
@@ -0,0 +1,68 @@
1
+ """
2
+ Register DINO detector with MMDET if not already registered
3
+ This allows loading RoDLA models without requiring DCNv3 compilation
4
+ """
5
+
6
+ import sys
7
+ from pathlib import Path
8
+
9
+ def register_dino():
10
+ """Register DINO with MMDET model registry"""
11
+ try:
12
+ from mmdet.models.builder import DETECTORS, BACKBONES, NECKS, HEADS
13
+
14
+ # Check if already registered
15
+ if 'DINO' in DETECTORS.module_dict:
16
+ print("✅ DINO already registered in MMDET")
17
+ return True
18
+
19
+ print("⏳ Registering DINO detector...")
20
+
21
+ # Try to import and register custom models
22
+ # Use absolute path from /home/admin/CV/rodla-academic
23
+ REPO_ROOT = Path("/home/admin/CV/rodla-academic")
24
+ sys.path.insert(0, str(REPO_ROOT / "model"))
25
+ sys.path.insert(0, str(REPO_ROOT / "model" / "ops_dcnv3"))
26
+
27
+ try:
28
+ import mmdet_custom
29
+ if 'DINO' in DETECTORS.module_dict:
30
+ print("✅ DINO registered successfully from mmdet_custom")
31
+ return True
32
+ else:
33
+ print("⚠️ DINO not found in mmdet_custom registry")
34
+ return False
35
+ except ModuleNotFoundError as e:
36
+ if "DCNv3" in str(e):
37
+ print(f"⚠️ Cannot register DINO: DCNv3 module not available")
38
+ print(f" Error: {e}")
39
+ return False
40
+ else:
41
+ print(f"❌ Error importing mmdet_custom: {e}")
42
+ return False
43
+
44
+ except Exception as e:
45
+ print(f"❌ Error registering DINO: {e}")
46
+ return False
47
+
48
+
49
+ def try_load_with_dino_registration(config_path: str, checkpoint_path: str, device: str = "cpu"):
50
+ """Try to load a DINO model, registering it if necessary"""
51
+ from mmdet.apis import init_detector
52
+
53
+ # Try registering DINO first
54
+ dino_registered = register_dino()
55
+
56
+ if not dino_registered:
57
+ print("⚠️ DINO could not be registered")
58
+ print(" Will attempt to load anyway...")
59
+
60
+ # Try to load the model
61
+ try:
62
+ print(f"⏳ Loading model from {checkpoint_path}...")
63
+ model = init_detector(config_path, checkpoint_path, device=device)
64
+ print("✅ Model loaded successfully!")
65
+ return model
66
+ except Exception as e:
67
+ print(f"❌ Failed to load model: {e}")
68
+ return None
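A minimal sketch of how this helper might be called from the backend; the config and checkpoint paths below are placeholders (assumptions), not files referenced in this commit.

```python
# Minimal usage sketch for register_dino.py.
# config_path / checkpoint_path are placeholders; use "cpu" instead of
# "cuda:0" on machines without a GPU.
from register_dino import try_load_with_dino_registration

model = try_load_with_dino_registration(
    config_path="path/to/rodla_config.py",           # placeholder
    checkpoint_path="path/to/rodla_checkpoint.pth",  # placeholder
    device="cuda:0",
)

if model is None:
    # The backend can fall back to heuristic layout detection in this case.
    print("Model unavailable; using fallback detection")
```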
frontend/index.html CHANGED
@@ -106,12 +106,18 @@
106
 
107
  <!-- Action Buttons -->
108
  <section class="section button-section">
109
- <button id="analyzeBtn" class="btn btn-primary" disabled>
110
  [ANALYZE DOCUMENT]
111
  </button>
112
  <button id="resetBtn" class="btn btn-secondary">
113
  [CLEAR ALL]
114
  </button>
 
 
 
 
 
 
115
  </section>
116
 
117
  <!-- Status Section -->
 
106
 
107
  <!-- Action Buttons -->
108
  <section class="section button-section">
109
+ <button id="analyzeBtn" class="btn btn-primary" disabled title="(1) Upload image, (2) Make sure STANDARD mode is selected">
110
  [ANALYZE DOCUMENT]
111
  </button>
112
  <button id="resetBtn" class="btn btn-secondary">
113
  [CLEAR ALL]
114
  </button>
115
+ <p id="modeHint" class="mode-hint" style="display: none; color: #00FF00; margin-top: 10px; font-size: 12px;">
116
+ >>> Use [GENERATE PERTURBATIONS] button above to analyze with perturbations
117
+ </p>
118
+ <p id="standardModeHint" class="mode-hint" style="color: #00FF00; margin-top: 5px; font-size: 12px;">
119
+ >>> STANDARD MODE: Upload an image and click [ANALYZE DOCUMENT] to detect layout
120
+ </p>
121
  </section>
122
 
123
  <!-- Status Section -->
frontend/script.js CHANGED
@@ -56,12 +56,30 @@ function setupEventListeners() {
56
  btn.classList.add('active');
57
  currentMode = btn.dataset.mode;
58
 
59
- // Toggle perturbation options
60
  const pertOptions = document.getElementById('perturbationOptions');
 
 
 
 
61
  if (currentMode === 'perturbation') {
 
62
  pertOptions.style.display = 'block';
 
 
 
 
 
 
63
  } else {
 
64
  pertOptions.style.display = 'none';
 
 
 
 
 
 
65
  }
66
  });
67
  });
@@ -98,7 +116,12 @@ function handleFileSelect(file) {
98
 
99
  currentFile = file;
100
  showPreview(file);
101
- document.getElementById('analyzeBtn').disabled = false;
 
 
 
 
 
102
  }
103
 
104
  function showPreview(file) {
@@ -121,39 +144,6 @@ function showPreview(file) {
121
  // ANALYSIS
122
  // ============================================
123
 
124
- async function handleAnalysis() {
125
- if (!currentFile) {
126
- showError('Please select an image first.');
127
- return;
128
- }
129
-
130
- const analysisType = currentMode === 'standard' ? 'Standard Detection' : 'Perturbation Analysis';
131
- updateStatus(`> INITIATING ${analysisType.toUpperCase()}...`);
132
- showStatus();
133
- hideError();
134
-
135
- try {
136
- const startTime = Date.now();
137
- const results = await runAnalysis();
138
- const processingTime = Date.now() - startTime;
139
-
140
- lastResults = {
141
- ...results,
142
- processingTime: processingTime,
143
- timestamp: new Date().toISOString(),
144
- mode: currentMode,
145
- fileName: currentFile.name
146
- };
147
-
148
- displayResults(results, processingTime);
149
- hideStatus();
150
- } catch (error) {
151
- console.error('[ERROR]', error);
152
- showError(`Analysis failed: ${error.message}`);
153
- hideStatus();
154
- }
155
- }
156
-
157
  async function handleAnalysis() {
158
  if (!currentFile) {
159
  showError('Please select an image first.');
@@ -178,8 +168,12 @@ async function handleAnalysis() {
178
 
179
  const processingTime = Date.now() - startTime;
180
 
 
 
 
181
  lastResults = {
182
  ...results,
 
183
  processingTime: processingTime,
184
  timestamp: new Date().toISOString(),
185
  mode: currentMode,
@@ -202,36 +196,72 @@ async function runAnalysis() {
202
  const threshold = parseFloat(document.getElementById('confidenceThreshold').value);
203
  formData.append('score_threshold', threshold);
204
 
205
- if (currentMode === 'perturbation') {
206
- // Get selected perturbation types
207
- const perturbationTypes = [];
208
- document.querySelectorAll('.checkbox-label input[type="checkbox"]:checked').forEach(checkbox => {
209
- perturbationTypes.push(checkbox.value);
210
- });
 
 
 
 
211
 
212
- if (perturbationTypes.length === 0) {
213
- throw new Error('Please select at least one perturbation type.');
214
- }
 
 
215
 
216
- formData.append('perturbation_types', perturbationTypes.join(','));
 
217
 
218
- updateStatus('> APPLYING PERTURBATIONS...');
219
- return await fetch(`${API_BASE_URL}/detect-with-perturbation`, {
220
- method: 'POST',
221
- body: formData
222
- }).then(r => {
223
- if (!r.ok) throw new Error(`API Error: ${r.status}`);
224
- return r.json();
225
- });
226
- } else {
227
- updateStatus('> RUNNING STANDARD DETECTION...');
228
- return await fetch(`${API_BASE_URL}/detect`, {
 
 
 
 
 
 
229
  method: 'POST',
230
  body: formData
231
- }).then(r => {
232
- if (!r.ok) throw new Error(`API Error: ${r.status}`);
233
- return r.json();
234
  });
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
235
  }
236
  }
237
 
@@ -291,16 +321,27 @@ function displayPerturbations(results) {
291
  }
292
 
293
  let html = `<div style="font-size: 0.9em; color: #00FFFF; margin-bottom: 15px; padding: 10px; border: 1px dashed #00FFFF;">
294
- TOTAL: 12 Perturbation Types × 3 Degree Levels (1=Mild, 2=Moderate, 3=Severe)
295
  </div>`;
296
 
 
 
 
297
  // Add original
 
 
 
 
 
298
  html += `
299
  <div class="perturbation-grid-section">
300
  <div class="perturbation-type-label">[ORIGINAL IMAGE]</div>
301
  <div style="padding: 10px;">
302
  <img src="data:image/png;base64,${results.perturbations.original.original}"
303
- alt="Original" class="perturbation-preview-image" style="width: 200px; height: auto;">
 
 
 
304
  </div>
305
  </div>
306
  `;
@@ -337,13 +378,24 @@ function displayPerturbations(results) {
337
  const degreeLabel = ['MILD', 'MODERATE', 'SEVERE'][degree - 1];
338
 
339
  if (results.perturbations[ptype][degreeKey]) {
 
 
 
 
 
 
340
  html += `
341
  <div style="text-align: center;">
342
  <div style="color: #00FFFF; font-size: 0.8em; margin-bottom: 5px;">DEG ${degree}: ${degreeLabel}</div>
343
  <img src="data:image/png;base64,${results.perturbations[ptype][degreeKey]}"
344
  alt="${ptype} degree ${degree}"
345
  class="perturbation-preview-image"
346
- style="width: 150px; height: auto; border: 1px solid #008080; padding: 2px;">
 
 
 
 
 
347
  </div>
348
  `;
349
  }
@@ -357,6 +409,33 @@ function displayPerturbations(results) {
357
  });
358
 
359
  container.innerHTML = html;
  section.style.display = 'block';
361
  section.scrollIntoView({ behavior: 'smooth' });
362
  }
@@ -376,11 +455,17 @@ function displayResults(results, processingTime) {
376
 
377
  document.getElementById('detectionCount').textContent = detections.length;
378
  document.getElementById('avgConfidence').textContent = `${avgConfidence}%`;
379
- document.getElementById('processingTime').textContent = `${processingTime}ms`;
380
 
381
- // Display image
382
- if (results.annotated_image) {
383
- document.getElementById('resultImage').src = `data:image/png;base64,${results.annotated_image}`;
 
 
 
 
 
 
384
  }
385
 
386
  // Class distribution
@@ -390,13 +475,114 @@ function displayResults(results, processingTime) {
390
  displayDetectionsTable(detections);
391
 
392
  // Metrics
393
- displayMetrics(results.metrics || {});
394
 
395
  // Show results section
396
  document.getElementById('resultsSection').style.display = 'block';
397
  document.getElementById('resultsSection').scrollIntoView({ behavior: 'smooth' });
398
  }
399
400
  function displayClassDistribution(distribution) {
401
  const chart = document.getElementById('classChart');
402
 
@@ -429,30 +615,44 @@ function displayDetectionsTable(detections) {
429
  const tbody = document.getElementById('detectionsTableBody');
430
 
431
  if (detections.length === 0) {
432
- tbody.innerHTML = '<tr><td colspan="4" class="no-data">NO DETECTIONS</td></tr>';
433
  return;
434
  }
435
 
436
  let html = '';
437
  detections.slice(0, 50).forEach((det, idx) => {
438
- const box = det.box || {};
439
- const x1 = box.x1 ? box.x1.toFixed(0) : '?';
440
- const y1 = box.y1 ? box.y1.toFixed(0) : '?';
441
- const x2 = box.x2 ? box.x2.toFixed(0) : '?';
442
- const y2 = box.y2 ? box.y2.toFixed(0) : '?';
 
 
 
 
 
 
 
 
 
 
 
 
 
 
443
 
444
  html += `
445
  <tr>
446
  <td>${idx + 1}</td>
447
- <td>${det.class || 'Unknown'}</td>
448
- <td>${(det.confidence * 100).toFixed(1)}%</td>
449
- <td>[${x1},${y1},${x2},${y2}]</td>
450
  </tr>
451
  `;
452
  });
453
 
454
  if (detections.length > 50) {
455
- html += `<tr><td colspan="4" class="no-data">... and ${detections.length - 50} more</td></tr>`;
456
  }
457
 
458
  tbody.innerHTML = html;
@@ -658,5 +858,76 @@ async function checkBackendStatus() {
658
  // UTILITY FUNCTIONS
659
  // ============================================
660
661
  console.log('[RODLA] Frontend loaded successfully. Ready for analysis.');
662
  console.log('[RODLA] Demo mode available if backend is unavailable.');
 
56
  btn.classList.add('active');
57
  currentMode = btn.dataset.mode;
58
 
59
+ // Toggle perturbation options and hint
60
  const pertOptions = document.getElementById('perturbationOptions');
61
+ const modeHint = document.getElementById('modeHint');
62
+ const standardModeHint = document.getElementById('standardModeHint');
63
+ const analyzeBtn = document.getElementById('analyzeBtn');
64
+
65
  if (currentMode === 'perturbation') {
66
+ // PERTURBATION MODE - allow analysis of original or perturbation images
67
  pertOptions.style.display = 'block';
68
+ modeHint.style.display = 'block';
69
+ standardModeHint.style.display = 'none';
70
+ analyzeBtn.style.opacity = currentFile ? '1' : '0.5';
71
+ analyzeBtn.style.cursor = currentFile ? 'pointer' : 'not-allowed';
72
+ analyzeBtn.disabled = !currentFile;
73
+ analyzeBtn.title = 'Click to generate perturbations, then click on any image to analyze it';
74
  } else {
75
+ // STANDARD MODE
76
  pertOptions.style.display = 'none';
77
+ modeHint.style.display = 'none';
78
+ standardModeHint.style.display = 'block';
79
+ analyzeBtn.style.opacity = currentFile ? '1' : '0.5';
80
+ analyzeBtn.style.cursor = currentFile ? 'pointer' : 'not-allowed';
81
+ analyzeBtn.disabled = !currentFile;
82
+ analyzeBtn.title = 'Click to analyze the document layout';
83
  }
84
  });
85
  });
 
116
 
117
  currentFile = file;
118
  showPreview(file);
119
+
120
+ // Enable analyze button only if in standard mode
121
+ const analyzeBtn = document.getElementById('analyzeBtn');
122
+ if (currentMode === 'standard') {
123
+ analyzeBtn.disabled = false;
124
+ }
125
  }
126
 
127
  function showPreview(file) {
 
144
  // ANALYSIS
145
  // ============================================
146
147
  async function handleAnalysis() {
148
  if (!currentFile) {
149
  showError('Please select an image first.');
 
168
 
169
  const processingTime = Date.now() - startTime;
170
 
171
+ // Read original image as base64 for annotation
172
+ const originalImageBase64 = await readFileAsBase64(currentFile);
173
+
174
  lastResults = {
175
  ...results,
176
+ original_image: originalImageBase64,
177
  processingTime: processingTime,
178
  timestamp: new Date().toISOString(),
179
  mode: currentMode,
 
196
  const threshold = parseFloat(document.getElementById('confidenceThreshold').value);
197
  formData.append('score_threshold', threshold);
198
 
199
+ // Only standard detection mode
200
+ updateStatus('> RUNNING STANDARD DETECTION...');
201
+ return await fetch(`${API_BASE_URL}/detect`, {
202
+ method: 'POST',
203
+ body: formData
204
+ }).then(r => {
205
+ if (!r.ok) throw new Error(`API Error: ${r.status}`);
206
+ return r.json();
207
+ });
208
+ }
209
 
210
+ async function analyzePerturbationImage(imageBase64, perturbationType, degree) {
211
+ // Analyze a specific perturbation image
212
+ updateStatus(`> ANALYZING ${perturbationType.toUpperCase()} (DEGREE ${degree})...`);
213
+ showStatus();
214
+ hideError();
215
 
216
+ try {
217
+ const startTime = Date.now();
218
 
219
+ // Convert base64 to blob and create file
220
+ const binaryString = atob(imageBase64);
221
+ const bytes = new Uint8Array(binaryString.length);
222
+ for (let i = 0; i < binaryString.length; i++) {
223
+ bytes[i] = binaryString.charCodeAt(i);
224
+ }
225
+ const blob = new Blob([bytes], { type: 'image/png' });
226
+ const file = new File([blob], `${perturbationType}_degree_${degree}.png`, { type: 'image/png' });
227
+
228
+ // Create form data
229
+ const formData = new FormData();
230
+ formData.append('file', file);
231
+ const threshold = parseFloat(document.getElementById('confidenceThreshold').value);
232
+ formData.append('score_threshold', threshold);
233
+
234
+ // Send to backend
235
+ const response = await fetch(`${API_BASE_URL}/detect`, {
236
  method: 'POST',
237
  body: formData
 
 
 
238
  });
239
+
240
+ if (!response.ok) {
241
+ throw new Error(`API Error: ${response.status}`);
242
+ }
243
+
244
+ const results = await response.json();
245
+ const processingTime = Date.now() - startTime;
246
+
247
+ // Store results with perturbation info
248
+ lastResults = {
249
+ ...results,
250
+ original_image: imageBase64,
251
+ processingTime: processingTime,
252
+ timestamp: new Date().toISOString(),
253
+ mode: 'perturbation',
254
+ perturbation_type: perturbationType,
255
+ perturbation_degree: degree,
256
+ fileName: `${perturbationType}_degree_${degree}.png`
257
+ };
258
+
259
+ displayResults(results, processingTime);
260
+ hideStatus();
261
+ } catch (error) {
262
+ console.error('[ERROR]', error);
263
+ showError(`Perturbation analysis failed: ${error.message}`);
264
+ hideStatus();
265
  }
266
  }
267
 
 
321
  }
322
 
323
  let html = `<div style="font-size: 0.9em; color: #00FFFF; margin-bottom: 15px; padding: 10px; border: 1px dashed #00FFFF;">
324
+ TOTAL: 12 Perturbation Types × 3 Degree Levels (1=Mild, 2=Moderate, 3=Severe) - CLICK ON ANY IMAGE TO ANALYZE
325
  </div>`;
326
 
327
+ // Store all perturbation images for clickable analysis
328
+ const perturbationImages = [];
329
+
330
  // Add original
331
+ perturbationImages.push({
332
+ name: 'original',
333
+ image: results.perturbations.original.original
334
+ });
335
+
336
  html += `
337
  <div class="perturbation-grid-section">
338
  <div class="perturbation-type-label">[ORIGINAL IMAGE]</div>
339
  <div style="padding: 10px;">
340
  <img src="data:image/png;base64,${results.perturbations.original.original}"
341
+ alt="Original" class="perturbation-preview-image"
342
+ data-perturbation="original" data-degree="0"
343
+ style="width: 200px; height: auto; cursor: pointer; border: 2px solid transparent; transition: all 0.2s;"
344
+ title="Click to analyze this image">
345
  </div>
346
  </div>
347
  `;
 
378
  const degreeLabel = ['MILD', 'MODERATE', 'SEVERE'][degree - 1];
379
 
380
  if (results.perturbations[ptype][degreeKey]) {
381
+ perturbationImages.push({
382
+ name: ptype,
383
+ degree: degree,
384
+ image: results.perturbations[ptype][degreeKey]
385
+ });
386
+
387
  html += `
388
  <div style="text-align: center;">
389
  <div style="color: #00FFFF; font-size: 0.8em; margin-bottom: 5px;">DEG ${degree}: ${degreeLabel}</div>
390
  <img src="data:image/png;base64,${results.perturbations[ptype][degreeKey]}"
391
  alt="${ptype} degree ${degree}"
392
  class="perturbation-preview-image"
393
+ data-perturbation="${ptype}"
394
+ data-degree="${degree}"
395
+ style="width: 150px; height: auto; border: 2px solid #008080; padding: 2px; cursor: pointer; transition: all 0.2s;"
396
+ title="Click to analyze this perturbation"
397
+ onmouseover="this.style.borderColor='#00FF00'; this.style.boxShadow='0 0 10px #00FF00';"
398
+ onmouseout="this.style.borderColor='#008080'; this.style.boxShadow='none';">
399
  </div>
400
  `;
401
  }
 
409
  });
410
 
411
  container.innerHTML = html;
412
+
413
+ // Add click handlers to perturbation images
414
+ const perturbationImgs = container.querySelectorAll('[data-perturbation]');
415
+ perturbationImgs.forEach(img => {
416
+ img.addEventListener('click', async function() {
417
+ const perturbationType = this.dataset.perturbation;
418
+ const degree = this.dataset.degree;
419
+
420
+ // Find the image data
421
+ let imageBase64 = null;
422
+ if (perturbationType === 'original') {
423
+ imageBase64 = results.perturbations.original.original;
424
+ } else {
425
+ const degreeKey = `degree_${degree}`;
426
+ imageBase64 = results.perturbations[perturbationType][degreeKey];
427
+ }
428
+
429
+ if (!imageBase64) {
430
+ showError('Failed to load image for analysis');
431
+ return;
432
+ }
433
+
434
+ // Convert base64 to File object and analyze
435
+ await analyzePerturbationImage(imageBase64, perturbationType, degree);
436
+ });
437
+ });
438
+
439
  section.style.display = 'block';
440
  section.scrollIntoView({ behavior: 'smooth' });
441
  }
 
455
 
456
  document.getElementById('detectionCount').textContent = detections.length;
457
  document.getElementById('avgConfidence').textContent = `${avgConfidence}%`;
458
+ document.getElementById('processingTime').textContent = `${processingTime.toFixed(0)}ms`;
459
 
460
+ // Draw annotated image with bounding boxes
461
+ if (lastResults && lastResults.original_image) {
462
+ drawAnnotatedImage(lastResults.original_image, detections, results.image_width, results.image_height);
463
+ } else {
464
+ // Fallback: try to use previewImage
465
+ const previewImg = document.getElementById('previewImage');
466
+ if (previewImg && previewImg.src) {
467
+ drawAnnotatedImageFromSrc(previewImg.src, detections, results.image_width, results.image_height);
468
+ }
469
  }
470
 
471
  // Class distribution
 
475
  displayDetectionsTable(detections);
476
 
477
  // Metrics
478
+ displayMetrics(results, processingTime);
479
 
480
  // Show results section
481
  document.getElementById('resultsSection').style.display = 'block';
482
  document.getElementById('resultsSection').scrollIntoView({ behavior: 'smooth' });
483
  }
484
 
485
+ function drawAnnotatedImage(imageBase64, detections, imgWidth, imgHeight) {
486
+ // Draw bounding boxes on image and display
487
+ const canvas = document.createElement('canvas');
488
+ const ctx = canvas.getContext('2d');
489
+
490
+ // Load image
491
+ const img = new Image();
492
+ img.onload = () => {
493
+ canvas.width = img.width;
494
+ canvas.height = img.height;
495
+ ctx.drawImage(img, 0, 0);
496
+
497
+ // Draw bounding boxes
498
+ detections.forEach((det, idx) => {
499
+ const bbox = det.bbox || {};
500
+
501
+ // Convert normalized coordinates to pixel coordinates
502
+ const x = bbox.x * img.width;
503
+ const y = bbox.y * img.height;
504
+ const w = bbox.width * img.width;
505
+ const h = bbox.height * img.height;
506
+
507
+ // Draw box
508
+ ctx.strokeStyle = '#00FF00';
509
+ ctx.lineWidth = 2;
510
+ ctx.strokeRect(x, y, w, h);
511
+
512
+ // Draw label
513
+ const label = `${det.class_name || 'Unknown'} (${(det.confidence * 100).toFixed(1)}%)`;
514
+ const fontSize = Math.max(12, Math.min(18, Math.floor(img.height / 30)));
515
+ ctx.font = `bold ${fontSize}px monospace`;
516
+ ctx.fillStyle = '#000000';
517
+ ctx.fillRect(x, y - fontSize - 5, ctx.measureText(label).width + 10, fontSize + 5);
518
+ ctx.fillStyle = '#00FF00';
519
+ ctx.fillText(label, x + 5, y - 5);
520
+ });
521
+
522
+ // Display canvas as image
523
+ const resultImage = document.getElementById('resultImage');
524
+ resultImage.src = canvas.toDataURL('image/png');
525
+ resultImage.style.display = 'block';
526
+ };
527
+
528
+ img.src = `data:image/png;base64,${imageBase64}`;
529
+ }
530
+
531
+ function drawAnnotatedImageFromSrc(imageSrc, detections, imgWidth, imgHeight) {
532
+ // Draw bounding boxes on image from data URL
533
+ const canvas = document.createElement('canvas');
534
+ const ctx = canvas.getContext('2d');
535
+
536
+ const img = new Image();
537
+ img.onload = () => {
538
+ canvas.width = img.width;
539
+ canvas.height = img.height;
540
+ ctx.drawImage(img, 0, 0);
541
+
542
+ // Draw bounding boxes with colors based on class
543
+ const colors = ['#00FF00', '#00FFFF', '#FF00FF', '#FFFF00', '#FF6600', '#00FF99'];
544
+
545
+ detections.forEach((det, idx) => {
546
+ const bbox = det.bbox || {};
547
+
548
+ // Convert normalized coordinates to pixel coordinates
549
+ const x = bbox.x * img.width;
550
+ const y = bbox.y * img.height;
551
+ const w = bbox.width * img.width;
552
+ const h = bbox.height * img.height;
553
+
554
+ // Select color
555
+ const color = colors[idx % colors.length];
556
+
557
+ // Draw box
558
+ ctx.strokeStyle = color;
559
+ ctx.lineWidth = 2;
560
+ ctx.strokeRect(x, y, w, h);
561
+
562
+ // Draw label background
563
+ const label = `${idx + 1}. ${det.class_name || 'Unknown'} (${(det.confidence * 100).toFixed(1)}%)`;
564
+ const fontSize = 14;
565
+ ctx.font = `bold ${fontSize}px monospace`;
566
+ const textWidth = ctx.measureText(label).width;
567
+
568
+ ctx.fillStyle = 'rgba(0, 0, 0, 0.7)';
569
+ ctx.fillRect(x, y - fontSize - 8, textWidth + 8, fontSize + 6);
570
+ ctx.fillStyle = color;
571
+ ctx.fillText(label, x + 4, y - 4);
572
+ });
573
+
574
+ // Display canvas as image
575
+ const resultImage = document.getElementById('resultImage');
576
+ resultImage.src = canvas.toDataURL('image/png');
577
+ resultImage.style.display = 'block';
578
+ resultImage.style.maxWidth = '100%';
579
+ resultImage.style.height = 'auto';
580
+ resultImage.style.border = '2px solid #00FF00';
581
+ };
582
+
583
+ img.src = imageSrc;
584
+ }
585
+
586
  function displayClassDistribution(distribution) {
587
  const chart = document.getElementById('classChart');
588
 
 
615
  const tbody = document.getElementById('detectionsTableBody');
616
 
617
  if (detections.length === 0) {
618
+ tbody.innerHTML = '<tr><td colspan="5" class="no-data">NO DETECTIONS</td></tr>';
619
  return;
620
  }
621
 
622
  let html = '';
623
  detections.slice(0, 50).forEach((det, idx) => {
624
+ // Handle different bbox formats
625
+ const bbox = det.bbox || det.box || {};
626
+
627
+ // Convert normalized coordinates to pixel coordinates
628
+ let x = '?', y = '?', w = '?', h = '?';
629
+ if (bbox.x !== undefined && bbox.y !== undefined && bbox.width !== undefined && bbox.height !== undefined) {
630
+ x = bbox.x.toFixed(3);
631
+ y = bbox.y.toFixed(3);
632
+ w = bbox.width.toFixed(3);
633
+ h = bbox.height.toFixed(3);
634
+ } else if (bbox.x1 !== undefined && bbox.y1 !== undefined && bbox.x2 !== undefined && bbox.y2 !== undefined) {
635
+ x = bbox.x1.toFixed(0);
636
+ y = bbox.y1.toFixed(0);
637
+ w = (bbox.x2 - bbox.x1).toFixed(0);
638
+ h = (bbox.y2 - bbox.y1).toFixed(0);
639
+ }
640
+
641
+ const className = det.class_name || det.class || 'Unknown';
642
+ const confidence = det.confidence ? (det.confidence * 100).toFixed(1) : '0.0';
643
 
644
  html += `
645
  <tr>
646
  <td>${idx + 1}</td>
647
+ <td>${className}</td>
648
+ <td>${confidence}%</td>
649
+ <td title="x: ${x}, y: ${y}, w: ${w}, h: ${h}">[${x.substring(0,5)}, ${y.substring(0,5)}, ${w.substring(0,5)}, ${h.substring(0,5)}]</td>
650
  </tr>
651
  `;
652
  });
653
 
654
  if (detections.length > 50) {
655
+ html += `<tr><td colspan="5" class="no-data">... and ${detections.length - 50} more</td></tr>`;
656
  }
657
 
658
  tbody.innerHTML = html;
 
858
  // UTILITY FUNCTIONS
859
  // ============================================
860
 
861
+ function readFileAsBase64(file) {
862
+ return new Promise((resolve, reject) => {
863
+ const reader = new FileReader();
864
+ reader.onload = () => {
865
+ const result = reader.result;
866
+ // Extract base64 data without the data:image/png;base64, prefix
867
+ const base64 = result.split(',')[1];
868
+ resolve(base64);
869
+ };
870
+ reader.onerror = reject;
871
+ reader.readAsDataURL(file);
872
+ });
873
+ }
874
+
875
+ function displayMetrics(results, processingTime) {
876
+ const metricsDiv = document.getElementById('metricsBox');
877
+ if (!metricsDiv) return;
878
+
879
+ const detections = results.detections || [];
880
+ const confidences = detections.map(d => d.confidence || 0);
881
+ const avgConfidence = confidences.length > 0
882
+ ? (confidences.reduce((a, b) => a + b) / confidences.length * 100).toFixed(1)
883
+ : 0;
884
+ const maxConfidence = confidences.length > 0
885
+ ? (Math.max(...confidences) * 100).toFixed(1)
886
+ : 0;
887
+ const minConfidence = confidences.length > 0
888
+ ? (Math.min(...confidences) * 100).toFixed(1)
889
+ : 0;
890
+
891
+ // Determine detection mode
892
+ let detectionMode = 'HEURISTIC (CPU Fallback)';
893
+ let modelType = 'Heuristic Layout Detection';
894
+
895
+ if (results.detection_mode === 'mmdet') {
896
+ detectionMode = 'MMDET Neural Network';
897
+ modelType = 'DINO (InternImage-XL)';
898
+ }
899
+
900
+ const metricsHTML = `
901
+ <div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(200px, 1fr)); gap: 12px;">
902
+ <div style="background: #1a1a1a; border: 2px solid #00FF00; border-radius: 4px; padding: 12px;">
903
+ <div style="color: #00FFFF; font-size: 12px; font-weight: bold;">DETECTION MODE</div>
904
+ <div style="color: #00FF00; font-size: 14px; margin-top: 4px;">${detectionMode}</div>
905
+ </div>
906
+ <div style="background: #1a1a1a; border: 2px solid #00FF00; border-radius: 4px; padding: 12px;">
907
+ <div style="color: #00FFFF; font-size: 12px; font-weight: bold;">MODEL TYPE</div>
908
+ <div style="color: #00FF00; font-size: 14px; margin-top: 4px;">${modelType}</div>
909
+ </div>
910
+ <div style="background: #1a1a1a; border: 2px solid #00FF00; border-radius: 4px; padding: 12px;">
911
+ <div style="color: #00FFFF; font-size: 12px; font-weight: bold;">PROCESSING TIME</div>
912
+ <div style="color: #00FF00; font-size: 14px; margin-top: 4px;">${processingTime.toFixed(0)}ms</div>
913
+ </div>
914
+ <div style="background: #1a1a1a; border: 2px solid #00FF00; border-radius: 4px; padding: 12px;">
915
+ <div style="color: #00FFFF; font-size: 12px; font-weight: bold;">AVG CONFIDENCE</div>
916
+ <div style="color: #00FF00; font-size: 14px; margin-top: 4px;">${avgConfidence}%</div>
917
+ </div>
918
+ <div style="background: #1a1a1a; border: 2px solid #00FF00; border-radius: 4px; padding: 12px;">
919
+ <div style="color: #00FFFF; font-size: 12px; font-weight: bold;">MAX CONFIDENCE</div>
920
+ <div style="color: #00FF00; font-size: 14px; margin-top: 4px;">${maxConfidence}%</div>
921
+ </div>
922
+ <div style="background: #1a1a1a; border: 2px solid #00FF00; border-radius: 4px; padding: 12px;">
923
+ <div style="color: #00FFFF; font-size: 12px; font-weight: bold;">MIN CONFIDENCE</div>
924
+ <div style="color: #00FF00; font-size: 14px; margin-top: 4px;">${minConfidence}%</div>
925
+ </div>
926
+ </div>
927
+ `;
928
+
929
+ metricsDiv.innerHTML = metricsHTML;
930
+ }
931
+
932
  console.log('[RODLA] Frontend loaded successfully. Ready for analysis.');
933
  console.log('[RODLA] Demo mode available if backend is unavailable.');
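The `/detect` endpoint that `script.js` posts to can also be exercised directly. A minimal sketch with `requests`; the host/port follow the startup scripts, and the field names (`file`, `score_threshold`) and response keys (`detections`, `class_name`/`class`, `confidence`) mirror the frontend code above.

```python
# Minimal sketch of calling the backend the same way the frontend does.
import requests

API_BASE_URL = "http://localhost:8000"  # default backend port from the startup script

with open("sample_document.png", "rb") as f:
    resp = requests.post(
        f"{API_BASE_URL}/detect",
        files={"file": ("sample_document.png", f, "image/png")},
        data={"score_threshold": 0.3},
        timeout=120,
    )
resp.raise_for_status()
result = resp.json()

for det in result.get("detections", []):
    name = det.get("class_name") or det.get("class") or "Unknown"
    print(name, det.get("confidence"))
```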
setup.sh ADDED
@@ -0,0 +1,59 @@
1
+ #!/bin/bash
2
+
3
+ # Exit immediately if a command exits with a non-zero status
4
+ set -e
5
+
6
+ # --- Configuration ---
7
+ ENV_NAME="RoDLA"
8
+ ENV_PATH="./$ENV_NAME"
9
+
10
+ # URLs for PyTorch/Detectron2 wheels
11
+ TORCH_VERSION="1.11.0+cu113"
12
+ TORCH_URL="https://download.pytorch.org/whl/cu113/torch_stable.html"
13
+
14
+ DETECTRON2_VERSION="cu113/torch1.11"
15
+ DETECTRON2_URL="https://dl.fbaipublicfiles.com/detectron2/wheels/$DETECTRON2_VERSION/index.html"
16
+
17
+ DCNV3_URL="https://github.com/OpenGVLab/InternImage/releases/download/whl_files/DCNv3-1.0+cu113torch1.11.0-cp37-cp37m-linux_x86_64.whl"
18
+
19
+ # Check if the environment exists and activate it
20
+ if [ ! -d "$ENV_PATH" ]; then
21
+ echo "❌ Error: Virtual environment '$ENV_NAME' not found at '$ENV_PATH'."
22
+ echo "Please ensure you have created the environment using 'python3.7 -m venv $ENV_NAME' first."
23
+ exit 1
24
+ fi
25
+
26
+ echo "--- 🛠️ Activating Virtual Environment: $ENV_NAME ---"
27
+ # "Activate" the target environment by prepending its bin directory to PATH;
28
+ # this is more reliable in a script than 'source', which only affects the current shell session.
29
+ export PATH="$ENV_PATH/bin:$PATH"
30
+
31
+ # Check that the environment's python is now the one found on PATH
32
+ if ! command -v python | grep -q "$ENV_PATH"; then
33
+ echo "❌ Failed to set environment path. Aborting."
34
+ exit 1
35
+ fi
36
+
37
+ echo "--- 🗑️ Uninstalling Old PyTorch Packages (if present) ---"
38
+ # Use the environment's pip (now in $PATH)
39
+ pip uninstall torch torchvision torchaudio -y || true
40
+
41
+ echo "--- 📦 Installing PyTorch 1.11.0+cu113 and Core Dependencies ---"
42
+ # Note: We are using the correct PyTorch 1.11.0 versions that match the DCNv3 wheel.
43
+ pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 -f "$TORCH_URL"
44
+
45
+ echo "--- 📦 Installing OpenMMLab and Other Benchmarking Dependencies ---"
46
+ pip install -U openmim
47
+ # Ensure the full path to python is used for detectron2 (though it should be the venv python now)
48
+ python -m pip install detectron2 -f "$DETECTRON2_URL"
49
+ mim install mmcv-full==1.5.0
50
+ pip install timm==0.6.11 mmdet==2.28.1
51
+ pip install Pillow==9.5.0
52
+ pip install opencv-python termcolor yacs pyyaml scipy
53
+
54
+ echo "--- 🚀 Installing Compatible DCNv3 Wheel ---"
55
+ pip install "$DCNV3_URL"
56
+
57
+ echo "--- ✅ Setup Complete ---"
58
+ echo "The $ENV_NAME environment is configured. To use it, run:"
59
+ echo "source $ENV_PATH/bin/activate"
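After `setup.sh` finishes, a quick sanity check of the pinned stack can be run from the activated environment. A minimal sketch; the expected versions mirror the pins above, and the `DCNv3` import name is an assumption based on the wheel name and the error handling in `register_dino.py`.

```python
# Quick environment sanity check after running setup.sh.
import torch
import mmcv
import mmdet

print("torch:", torch.__version__)      # expected: 1.11.0+cu113
print("mmcv-full:", mmcv.__version__)   # expected: 1.5.0
print("mmdet:", mmdet.__version__)      # expected: 2.28.1
print("CUDA available:", torch.cuda.is_available())

try:
    import DCNv3  # import name assumed from the wheel / register_dino.py error message
    print("DCNv3 wheel import OK")
except ImportError as exc:
    print("DCNv3 missing:", exc)
```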
start.sh DELETED
@@ -1,143 +0,0 @@
1
- #!/bin/bash
2
- # RoDLA Complete Startup Script
3
- # Starts both frontend and backend services
4
-
5
- set -e
6
-
7
- # Colors
8
- RED='\033[0;31m'
9
- GREEN='\033[0;32m'
10
- YELLOW='\033[1;33m'
11
- BLUE='\033[0;34m'
12
- NC='\033[0m' # No Color
13
-
14
- # Header
15
- echo -e "${BLUE}╔════════════════════════════════════════════════════════════╗${NC}"
16
- echo -e "${BLUE}║          RoDLA DOCUMENT LAYOUT ANALYSIS - 90s Edition          ║${NC}"
17
- echo -e "${BLUE}║              Startup Script (Frontend + Backend)               ║${NC}"
18
- echo -e "${BLUE}╚════════════════════════════════════════════════════════════╝${NC}"
19
- echo ""
20
-
21
- # Get script directory
22
- SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
23
- cd "$SCRIPT_DIR"
24
-
25
- # Check if required directories exist
26
- if [ ! -d "deployment/backend" ]; then
27
- echo -e "${RED}ERROR: deployment/backend directory not found${NC}"
28
- exit 1
29
- fi
30
-
31
- if [ ! -d "frontend" ]; then
32
- echo -e "${RED}ERROR: frontend directory not found${NC}"
33
- exit 1
34
- fi
35
-
36
- # Check if Python is available
37
- if ! command -v python3 &> /dev/null; then
38
- echo -e "${RED}ERROR: Python 3 is not installed${NC}"
39
- exit 1
40
- fi
41
-
42
- echo -e "${GREEN}✓ System check passed${NC}"
43
- echo ""
44
-
45
- # Function to handle Ctrl+C
46
- cleanup() {
47
- echo ""
48
- echo -e "${YELLOW}Shutting down RoDLA...${NC}"
49
- kill $BACKEND_PID 2>/dev/null || true
50
- kill $FRONTEND_PID 2>/dev/null || true
51
- echo -e "${GREEN}✓ Services stopped${NC}"
52
- exit 0
53
- }
54
-
55
- # Set trap for Ctrl+C
56
- trap cleanup SIGINT
57
-
58
- # Check ports
59
- check_port() {
60
- if lsof -Pi :$1 -sTCP:LISTEN -t >/dev/null 2>&1 ; then
61
- return 0
62
- else
63
- return 1
64
- fi
65
- }
66
-
67
- # Start Backend
68
- echo -e "${BLUE}[1/2] Starting Backend API (port 8000)...${NC}"
69
-
70
- if check_port 8000; then
71
- echo -e "${YELLOW}⚠ Port 8000 is already in use${NC}"
72
- read -p "Continue anyway? (y/n) " -n 1 -r
73
- echo
74
- if [[ ! $REPLY =~ ^[Yy]$ ]]; then
75
- exit 1
76
- fi
77
- fi
78
-
79
- cd "$SCRIPT_DIR/deployment/backend"
80
- python3 backend.py > /tmp/rodla_backend.log 2>&1 &
81
- BACKEND_PID=$!
82
- echo -e "${GREEN}✓ Backend started (PID: $BACKEND_PID)${NC}"
83
- sleep 2
84
-
85
- # Check if backend started successfully
86
- if ! kill -0 $BACKEND_PID 2>/dev/null; then
87
- echo -e "${RED}✗ Backend failed to start${NC}"
88
- echo -e "${RED}Check logs: cat /tmp/rodla_backend.log${NC}"
89
- exit 1
90
- fi
91
-
92
- # Start Frontend
93
- echo -e "${BLUE}[2/2] Starting Frontend Server (port 8080)...${NC}"
94
-
95
- if check_port 8080; then
96
- echo -e "${YELLOW}⚠ Port 8080 is already in use${NC}"
97
- read -p "Continue anyway? (y/n) " -n 1 -r
98
- echo
99
- if [[ ! $REPLY =~ ^[Yy]$ ]]; then
100
- kill $BACKEND_PID
101
- exit 1
102
- fi
103
- fi
104
-
105
- cd "$SCRIPT_DIR/frontend"
106
- python3 server.py > /tmp/rodla_frontend.log 2>&1 &
107
- FRONTEND_PID=$!
108
- echo -e "${GREEN}✓ Frontend started (PID: $FRONTEND_PID)${NC}"
109
- sleep 1
110
-
111
- # Summary
112
- echo ""
113
- echo -e "${BLUE}════════════════════════════════════════════════════════════${NC}"
114
- echo -e "${GREEN}✓ RoDLA System is Ready!${NC}"
115
- echo -e "${BLUE}════════════════════════════════════════════════════════════${NC}"
116
- echo ""
117
- echo -e "${YELLOW}Access Points:${NC}"
118
- echo -e " 🌐 Frontend: ${BLUE}http://localhost:8080${NC}"
119
- echo -e " 🔌 Backend: ${BLUE}http://localhost:8000${NC}"
120
- echo -e " 📚 API Docs: ${BLUE}http://localhost:8000/docs${NC}"
121
- echo ""
122
- echo -e "${YELLOW}Services:${NC}"
123
- echo -e " Backend PID: $BACKEND_PID"
124
- echo -e " Frontend PID: $FRONTEND_PID"
125
- echo ""
126
- echo -e "${YELLOW}Logs:${NC}"
127
- echo -e " Backend: ${BLUE}tail -f /tmp/rodla_backend.log${NC}"
128
- echo -e " Frontend: ${BLUE}tail -f /tmp/rodla_frontend.log${NC}"
129
- echo ""
130
- echo -e "${YELLOW}Usage:${NC}"
131
- echo -e " 1. Open ${BLUE}http://localhost:8080${NC} in your browser"
132
- echo -e " 2. Upload a document image"
133
- echo -e " 3. Select analysis mode (Standard or Perturbation)"
134
- echo -e " 4. Click [ANALYZE DOCUMENT]"
135
- echo -e " 5. Download results"
136
- echo ""
137
- echo -e "${YELLOW}Exit:${NC}"
138
- echo -e " Press ${BLUE}Ctrl+C${NC} to stop all services"
139
- echo ""
140
-
141
- # Keep running
142
- wait
143
-