ProtonX OCR tool: Table Detector

Model size: only 11 MB

Introduction

This model helps ProtonX customers reduce OCR processing costs. Documents without tables are routed to open-source OCR models such as Dots OCR or DeepSeek OCR. Documents with complex tables are routed to more powerful OCR models such as Gemini OCR, which keeps accuracy high where it matters most.

This model is a binary image classification model designed to determine whether an input document image contains at least one table.

Built on the MobileNetV2 architecture, the model is optimized for document images and scanned PDFs (processed as page images), especially Vietnamese documents. It is intended as a fast pre-filtering step in OCR and document-understanding pipelines.
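The routing described above can be sketched as follows. This is a minimal illustration, assuming a multi-page document is sent to the stronger engine as soon as any of its pages is classified as `table`. The engine identifiers, `route_document`, and the `detect_table` callable are hypothetical and not part of ProtonX's code. Because the classifier flags any table, not only complex ones, this simple rule may also send documents with simple tables to the stronger engine.

```python
from typing import Callable, Iterable

# Hypothetical engine identifiers; the real routing targets are not specified here.
OPEN_SOURCE_OCR = "open_source_ocr"  # e.g. Dots OCR or DeepSeek OCR
STRONG_OCR = "strong_ocr"            # e.g. Gemini OCR


def route_document(pages: Iterable[str],
                   detect_table: Callable[[str], str]) -> str:
    """Pick an OCR engine for a multi-page document.

    `detect_table` is any callable (hypothetical) that returns "table" or
    "no_table" for one page image path, e.g. a wrapper around the classifier.
    The loop stops at the first page that contains a table.
    """
    for page in pages:
        if detect_table(page) == "table":
            return STRONG_OCR
    return OPEN_SOURCE_OCR


if __name__ == "__main__":
    # Stub predictions for illustration: only p2.png contains a table.
    fake_predictions = {"p1.png": "no_table", "p2.png": "table", "p3.png": "no_table"}
    print(route_document(["p1.png", "p2.png", "p3.png"], fake_predictions.get))  # strong_ocr
    print(route_document(["p1.png", "p3.png"], fake_predictions.get))            # open_source_ocr
```

Stopping at the first positive page avoids running the classifier on the rest of a long document once the routing decision is already made.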

Task Definition

Task: Binary image classification
Objective: Detect table presence in an image

Labels

| ID | Label | Meaning |
|----|-------|---------|
| 0 | `no_table` | Image contains no tables |
| 1 | `table` | Image contains one or more tables |

⚠️ The model detects presence, not the number or location of tables.
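For each image, the classifier head produces two scores (logits), one per class ID in the table above. A minimal sketch of turning them into a label name, assuming the logits are ordered `[no_table, table]` as the IDs suggest; `ID2LABEL` and `decode` are illustrative names, not part of a published API:

```python
from typing import Sequence

# Class ID -> label name, as listed in the Labels table.
ID2LABEL = {0: "no_table", 1: "table"}


def decode(logits: Sequence[float]) -> str:
    """Return the label whose logit is largest.

    Softmax is monotonic, so applying it first would not change which
    index wins. On an exact tie, max() keeps the first index (no_table).
    """
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    best = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[best]


print(decode([2.3, -1.1]))  # no_table
print(decode([-0.4, 0.9]))  # table
```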

Training Data

The model was trained on a combination of:

DocLayNet Dataset

- Public document layout dataset
- High-quality annotations
- Diverse document layouts

In-house Labeled Vietnamese Document Dataset

- Scanned PDFs of Vietnamese legal documents
- Mixed-quality OCR inputs
- Real-world layouts:
  - Contracts
  - Administrative forms
  - Reports
  - Tables embedded in text-heavy pages

This combination improves generalization across both clean and noisy document images.
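DocLayNet annotates individual layout elements (boxes labeled Text, Table, Picture, and so on) rather than whole pages, so its annotations must be reduced to a page-level "table present" label before they can train a binary classifier. The card does not say how this was done. A plausible sketch, assuming COCO-style annotation JSON (the format DocLayNet is distributed in) with a category named `Table`, marks a page `1` (`table`) if any of its boxes is a table and `0` (`no_table`) otherwise. `table_presence_labels` is a hypothetical helper.

```python
from typing import Dict


def table_presence_labels(coco: dict, table_category: str = "Table") -> Dict[str, int]:
    """Map each image file name to 1 if any annotation on it is a table, else 0."""
    # IDs of every category whose name matches the table class.
    table_ids = {c["id"] for c in coco["categories"] if c["name"] == table_category}
    # Images that have at least one table box.
    images_with_table = {a["image_id"] for a in coco["annotations"]
                         if a["category_id"] in table_ids}
    # Images without any annotations are labeled 0 (no_table).
    return {img["file_name"]: int(img["id"] in images_with_table)
            for img in coco["images"]}


if __name__ == "__main__":
    # Tiny hand-written example in COCO layout (not real DocLayNet data).
    coco = {
        "categories": [{"id": 1, "name": "Text"}, {"id": 2, "name": "Table"}],
        "images": [{"id": 10, "file_name": "page_a.png"},
                   {"id": 11, "file_name": "page_b.png"}],
        "annotations": [{"image_id": 10, "category_id": 1},
                        {"image_id": 11, "category_id": 1},
                        {"image_id": 11, "category_id": 2}],
    }
    print(table_presence_labels(coco))  # {'page_a.png': 0, 'page_b.png': 1}
```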

Quick Usage

Using torchvision

import torch
import torch.nn as nn
from torchvision import transforms
from torchvision import models as pretrained_models
from PIL import Image
from huggingface_hub import hf_hub_download

# Class ID -> label name, matching the Labels table
ID2LABEL = {0: "no_table", 1: "table"}


class TableDetector:
    def __init__(self, model_name: str, device: str = "cpu"):
        self.device = torch.device(device)
        # Download the weights from the Hugging Face Hub (cached locally after the first call)
        self.model_path = hf_hub_download(repo_id=model_name, filename="model/table_detector.pth")
        self.model = self.load_model(self.model_path)
        self.model.to(self.device)
        self.model.eval()  # inference mode: disables dropout, uses stored batch-norm statistics
        # Resize to 224x224 and convert to a float tensor with values in [0, 1]
        self.transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ])

    def load_model(self, model_path: str):
        # MobileNetV2 backbone with the final linear layer replaced by a 2-class head
        model = pretrained_models.mobilenet_v2(weights=None)
        model.classifier[1] = nn.Linear(in_features=model.classifier[1].in_features, out_features=2)
        state_dict = torch.load(model_path, map_location=self.device, weights_only=True)
        model.load_state_dict(state_dict)
        return model

    def preprocess_image(self, image_path: str):
        image = Image.open(image_path).convert("RGB")
        image = self.transform(image).unsqueeze(0)  # add batch dimension: (1, 3, 224, 224)
        return image.to(self.device)

    def predict(self, image_path: str) -> str:
        image = self.preprocess_image(image_path)
        with torch.no_grad():
            outputs = self.model(image)  # logits, shape (1, 2)
            pred = outputs.argmax(dim=1).item()
        return ID2LABEL[pred]


if __name__ == "__main__":
    model = TableDetector(model_name="protonx-models/table-detector", device="cpu")

    prediction = model.predict("images/document_page_01.png")

    print(prediction)  # "table" or "no_table"
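The example above takes a hard argmax over the two logits. When a missed table costs more than an unnecessary call to the stronger OCR model, one option is to threshold the softmax probability of the `table` class instead. With PyTorch this probability is `torch.softmax(outputs, dim=1)[0, 1].item()`. The dependency-free sketch below shows the same computation; `table_probability`, `decide`, and the 0.3 threshold are illustrative assumptions, not values recommended by ProtonX.

```python
import math
from typing import Sequence


def table_probability(logits: Sequence[float]) -> float:
    """Softmax probability of class 1 (table) from logits ordered [no_table, table]."""
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def decide(logits: Sequence[float], threshold: float = 0.5) -> str:
    """Label a page "table" when P(table) >= threshold.

    With two classes, threshold=0.5 agrees with argmax except on exact ties.
    A lower threshold sends more borderline pages to the stronger OCR engine.
    """
    return "table" if table_probability(logits) >= threshold else "no_table"


logits = [0.2, -0.3]  # hypothetical model output
print(round(table_probability(logits), 3))          # 0.378
print(decide(logits), decide(logits, threshold=0.3))  # no_table table
```

A lower threshold trades extra calls to the more expensive engine for fewer missed tables. A suitable value would have to be chosen on held-out pages from the target document mix.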

Using the ProtonX library

from protonx import ProtonX

client = ProtonX(
    mode="offline"
)
prediction = client.ocr.detect_table(image_path="images/document_page_01.png")

print(prediction)

Acknowledgments

Thanks to:

DocLayNet
