Chickenscratch OCR

A TrOCR model fine-tuned to recognize one person's handwriting.

Model Details

  • Base Model: microsoft/trocr-base-handwritten
  • Training Data: ~1,300 labeled handwritten lines from reMarkable tablet notes
  • Performance: character error rate (CER) 24%, word error rate (WER) 51%
  • Training: 15 epochs on Apple M4 Max
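
For readers unfamiliar with the metrics above, here is a minimal sketch of how CER and WER are conventionally defined: the edit distance between prediction and reference, divided by the reference length in characters or words. This card does not say which tool or evaluation set produced the reported numbers, so the function names below (edit_distance, cer, wer) are illustrative and not taken from the chickenscratch-ocr pipeline.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

A single wrong character makes the whole word count as an error, which is why WER is usually well above CER for the same predictions.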

Usage

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

processor = TrOCRProcessor.from_pretrained("moonstripe/chickenscratch-ocr")
model = VisionEncoderDecoderModel.from_pretrained("moonstripe/chickenscratch-ocr")

# Load a handwritten line image
image = Image.open("line.png").convert("RGB")

# Process and predict
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)

Training

This model was trained using the chickenscratch-ocr pipeline, which provides:

  • Data collection from reMarkable tablet
  • Web UI for labeling handwritten lines
  • Fine-tuning script with Apple Silicon MPS support

Limitations

  • Optimized for a specific person's handwriting
  • Best results on single text lines (not full pages)
  • May struggle with unusual symbols or very messy writing
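
Because the model expects one text line per image, a full page has to be split into lines first. The sketch below shows one simple heuristic for that, a horizontal projection profile that groups consecutive rows containing dark pixels. It is an assumption for illustration, not part of the chickenscratch-ocr pipeline; the function name find_line_bands and its thresholds are hypothetical, and the approach tends to fail on slanted or overlapping lines.

```python
def find_line_bands(rows, ink_threshold=128, min_ink_pixels=1, min_height=2):
    """Return (top, bottom) row ranges, bottom exclusive, that contain ink.

    rows is a 2D list of grayscale values, 0 (black) to 255 (white).
    A row counts as inked when at least min_ink_pixels values are darker
    than ink_threshold. Bands shorter than min_height rows are dropped
    as likely noise.
    """
    bands = []
    start = None  # top row of the band currently being scanned, if any
    for y, row in enumerate(rows):
        has_ink = sum(1 for v in row if v < ink_threshold) >= min_ink_pixels
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            if y - start >= min_height:
                bands.append((start, y))
            start = None
    # Close a band that runs to the bottom edge of the page.
    if start is not None and len(rows) - start >= min_height:
        bands.append((start, len(rows)))
    return bands
```

With Pillow, the rows could be built from a grayscale copy of the page (image.convert("L")), and each band cropped with image.crop((0, top, image.width, bottom)) before being passed to the processor as in the usage example.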