Chickenscratch OCR
Fine-tuned TrOCR model for personal handwriting recognition.
Model Details
- Base Model: microsoft/trocr-base-handwritten
- Training Data: ~1,300 labeled handwritten lines from reMarkable tablet notes
- Performance: character error rate (CER) 24%, word error rate (WER) 51%
- Training: 15 epochs on Apple M4 Max
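For readers unfamiliar with the two metrics above, here is a minimal sketch of how CER and WER are conventionally defined: edit distance between prediction and ground truth, divided by the length of the ground truth, counted in characters or in words. This card does not say which tool or evaluation split produced the reported numbers, so the function names `edit_distance`, `cer`, and `wer` are illustrative assumptions, not the project's evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference item
                curr[j - 1] + 1,     # insert a hypothesis item
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical example: one wrong character in a three-word line.
print(cer("buy milk today", "buy silk today"))  # 1 edit over 14 characters
print(wer("buy milk today", "buy silk today"))  # 1 edit over 3 words
```

A single wrong character makes the whole word count as an error, which is why WER is typically much higher than CER on the same predictions.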
Usage
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
processor = TrOCRProcessor.from_pretrained("moonstripe/chickenscratch-ocr")
model = VisionEncoderDecoderModel.from_pretrained("moonstripe/chickenscratch-ocr")
# Load a handwritten line image
image = Image.open("line.png").convert("RGB")
# Process and predict
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
Training
This model was trained using the chickenscratch-ocr pipeline, which provides:
- Data collection from a reMarkable tablet
- Web UI for labeling handwritten lines
- Fine-tuning script with Apple Silicon MPS support
Limitations
- Fine-tuned on one specific person's handwriting, so accuracy may not transfer to other writers
- Best results on single text lines (not full pages)
- May struggle with unusual symbols or very messy writing
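Because the model works best on single lines, a full page has to be cut into line images before recognition. The sketch below shows one simple way to do that, a horizontal projection profile: count dark pixels per row and treat each run of inked rows as one line. This is an assumption for illustration, not the method used by the chickenscratch-ocr pipeline; the function name `split_into_lines`, its thresholds, and the file name `page.png` are hypothetical, and it assumes dark ink on a light background with no overlap between lines.

```python
def split_into_lines(rows, ink_threshold=128, min_ink_pixels=1, min_height=2):
    """Return (top, bottom) row ranges that contain ink; bottom is exclusive.

    rows: a grayscale page as a sequence of rows, each a sequence of
    0-255 values (dark ink on a light background is assumed).
    """
    bands = []
    start = None
    for y, row in enumerate(rows):
        ink = sum(1 for value in row if value < ink_threshold)
        if ink >= min_ink_pixels:
            if start is None:
                start = y  # first inked row of a new band
        elif start is not None:
            if y - start >= min_height:  # drop bands thinner than min_height
                bands.append((start, y))
            start = None
    # Close a band that runs to the bottom edge of the page.
    if start is not None and len(rows) - start >= min_height:
        bands.append((start, len(rows)))
    return bands


# Synthetic 10x5 page: white (255) everywhere, ink (0) on rows 2-4 and 7-8.
page_rows = [[255] * 5 for _ in range(10)]
for y in (2, 3, 4, 7, 8):
    page_rows[y][1:4] = [0, 0, 0]
print(split_into_lines(page_rows))  # [(2, 5), (7, 9)]

# With Pillow, a real page could be converted to rows like this:
#   page = Image.open("page.png").convert("L")
#   width, height = page.size
#   data = page.tobytes()
#   rows = [data[y * width:(y + 1) * width] for y in range(height)]
#   crops = [page.crop((0, top, width, bottom)).convert("RGB")
#            for top, bottom in split_into_lines(rows)]
```

Each resulting crop can then be passed to the processor in place of `line.png` in the usage example. Slanted or tightly spaced handwriting will likely need a more robust segmentation method.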