FaceGuard – ViT (20 CelebA IDs)

A Vision Transformer (ViT-Base) fine-tuned for identity classification on a 20-identity subset of the CelebA dataset.
This model predicts anonymized celeb_id integers (not celebrity names).
It powers the demo Space: https://huggingface.co/spaces/hudaakram/FaceGuard-demo


Model Details

Model Description

  • Architecture: google/vit-base-patch16-224 (ViT-Base pretrained on ImageNet-21k, fine-tuned on ImageNet-1k)
  • Fine-tuned for: 20-class identity classification (CelebA celeb_ids)
  • Input: RGB image (face crop), resized and normalized to 224×224
  • Output: Probability distribution over 20 anonymized IDs
  • Parameters: ~86M
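
These details can be verified directly from the published checkpoint. A minimal sketch, assuming only the model id from this card and the transformers library:

from transformers import AutoConfig, AutoImageProcessor

model_id = "hudaakram/FaceGuard-20ID-ViT"

# The config should report a ViT backbone with a 20-label classification head.
config = AutoConfig.from_pretrained(model_id)
print(config.model_type, config.num_labels)    # expected: vit 20
print(config.image_size, config.patch_size)    # expected: 224 16

# The image processor stores the resize/normalization settings applied at inference time.
processor = AutoImageProcessor.from_pretrained(model_id)
print(processor.size, processor.image_mean, processor.image_std)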

Sources

  • Demo Space: https://huggingface.co/spaces/hudaakram/FaceGuard-demo
  • Base model: google/vit-base-patch16-224

Uses

Direct Use

  • Research demo for identity classification with anonymized CelebA IDs
  • Educational example of fine-tuning ViT for image classification

Downstream Use

  • As a starting point for transfer learning to other small identity classification tasks
  • As an educational reference for hackathons, workshops, or courses
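
As a rough illustration of the transfer-learning use case, the checkpoint can be re-headed for a new identity set. This is a hedged sketch, not part of the released model; the 5-class target below is a made-up placeholder:

from transformers import ViTForImageClassification, AutoImageProcessor

model_id = "hudaakram/FaceGuard-20ID-ViT"

# Drop the 20-class head and initialize a fresh one for a hypothetical 5-identity task.
model = ViTForImageClassification.from_pretrained(
    model_id,
    num_labels=5,
    ignore_mismatched_sizes=True,  # lets from_pretrained discard the mismatched classifier weights
)
processor = AutoImageProcessor.from_pretrained(model_id)
# Fine-tune `model` on the new dataset as usual (e.g. with a Trainer setup like the one sketched under Training Procedure).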

Out-of-Scope Use

  • ❌ Production face recognition / surveillance
  • ❌ Identifying real celebrity names (dataset only provides integer IDs)
  • ❌ Any high-stakes application involving privacy or personal data

Bias, Risks, and Limitations

  • Bias: CelebA contains celebrity faces, which are not representative of all demographics.
  • Limitations: Trained on only 20 identities (~600 images total) → limited generalization.
  • Privacy: CelebA IDs are anonymized integers, not real names. The model is not capable of returning actual identities.

Recommendation: Use strictly for research/educational purposes.


How to Get Started

Use the code below to get started with the model.

from transformers import ViTForImageClassification, AutoImageProcessor
from PIL import Image
import torch

model_id = "hudaakram/FaceGuard-20ID-ViT"
processor = AutoImageProcessor.from_pretrained(model_id)
model = ViTForImageClassification.from_pretrained(model_id)
model.eval()

# Load a face crop and apply the checkpoint's preprocessing (resize to 224x224, ImageNet mean/std).
img = Image.open("face.jpg").convert("RGB")
inputs = processor(images=img, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)[0]

# id2label maps class index (0-19) -> anonymized CelebA celeb_id string.
id2label = {int(k): v for k, v in model.config.id2label.items()}
top5 = probs.topk(5)
for score, idx in zip(top5.values, top5.indices):
    print(f"Label {idx.item()} (celeb_id {id2label[idx.item()]}): {score.item():.3f}")

Training Details

Training Data

  • Dataset: CelebA (top 20 identities by frequency)
  • Splits: Stratified 80% train / 10% validation / 10% test
  • Sizes: Train 501, Val 60, Test 77
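
The exact subsetting/splitting script is not part of this card; the following is only a rough sketch of how such a top-20, stratified split could be built. identity_df is a hypothetical DataFrame constructed from CelebA's identity annotations:

import pandas as pd
from sklearn.model_selection import train_test_split

# identity_df: hypothetical DataFrame with columns ["image_path", "celeb_id"] from CelebA's identity file.
top20 = identity_df["celeb_id"].value_counts().nlargest(20).index
subset = identity_df[identity_df["celeb_id"].isin(top20)]

# Stratified ~80/10/10 split by identity (seed 42 assumed here, matching the training seed).
train_df, rest_df = train_test_split(subset, test_size=0.2, stratify=subset["celeb_id"], random_state=42)
val_df, test_df = train_test_split(rest_df, test_size=0.5, stratify=rest_df["celeb_id"], random_state=42)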

Training Procedure

  • Seed: 42
  • Epochs: 4
  • Batch size: 16
  • Learning rate: 5e-5
  • Optimizer: AdamW
  • Weight decay: 0.01
  • Precision: FP16 on GPU (Colab)
  • Head resized: from 1000 classes → 20 classes
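
For reference, these settings map roughly onto the following Hugging Face Trainer configuration. This is a hedged sketch rather than the original training script; train_ds and val_ds are placeholders for the processed CelebA splits described in this card:

from transformers import ViTForImageClassification, TrainingArguments, Trainer

# Re-head the ImageNet checkpoint for 20 identities (1000 -> 20 classes).
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=20,
    ignore_mismatched_sizes=True,
)

args = TrainingArguments(
    output_dir="faceguard-vit",
    seed=42,
    num_train_epochs=4,
    per_device_train_batch_size=16,
    learning_rate=5e-5,
    weight_decay=0.01,   # AdamW is the Trainer's default optimizer
    fp16=True,           # mixed precision on the Colab T4
)

# train_ds / val_ds: placeholder datasets yielding {"pixel_values", "labels"} after preprocessing.
trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=val_ds)
trainer.train()
print(trainer.evaluate())  # validation loss after training (add compute_metrics for accuracy)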

Preprocessing

  • Images resized + center-cropped to 224×224
  • Normalized to ImageNet mean/std
  • Labels mapped from CelebA celeb_id → contiguous 0–19
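
The celeb_id-to-index mapping can be reconstructed along these lines (a small sketch; raw_train is a placeholder for the filtered CelebA subset, not an identifier from this card):

# Map the 20 raw CelebA celeb_ids to contiguous class indices 0-19.
celeb_ids = sorted({ex["celeb_id"] for ex in raw_train})   # raw_train: placeholder for the filtered subset
label2id = {cid: i for i, cid in enumerate(celeb_ids)}
id2label = {i: str(cid) for cid, i in label2id.items()}

def encode(example):
    example["labels"] = label2id[example["celeb_id"]]
    return example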

Training Hyperparameters

  • Training regime: fp16 mixed precision on GPU
  • Total epochs: 4 (~3 minutes each on Colab T4)

Speeds, Sizes, Times

  • Checkpoint size: ~343 MB
  • Throughput: ~10 samples/sec (Colab T4)

Evaluation

  • Validation Accuracy: ~0.93
  • Test Accuracy: ~0.83
  • Macro AUC: (see ROC below)
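
The accuracy and macro AUC above can be recomputed from test-set predictions with scikit-learn. A sketch, assuming y_true (shape [N], integer labels) and probs (shape [N, 20], softmax outputs) have been collected by running the model over the test split:

import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# y_true and probs are placeholders (numpy arrays) gathered from the test split.
acc = accuracy_score(y_true, np.argmax(probs, axis=-1))
macro_auc = roc_auc_score(y_true, probs, multi_class="ovr", average="macro")
print(f"accuracy={acc:.3f}  macro AUC (one-vs-rest)={macro_auc:.3f}")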

Train/Val/Test Split Summary

| Split | #Images | #Classes | Min/Class | Median/Class | Max/Class |
|-------|---------|----------|-----------|--------------|-----------|
| Train | 501     | 20       | 24        | 24           | 28        |
| Val   | 60      | 20       | 3         | 3            | 3         |
| Test  | 77      | 20       | 3         | 4            | 4         |

Results

Confusion Matrix (normalized): (figure)

ROC Curves (one-vs-rest): (figure)


Environmental Impact

  • Hardware: Google Colab T4 GPU
  • Training time: ~12 minutes total (4 epochs)
  • Carbon emissions: negligible (short fine-tuning run)

Technical Specifications

Model Architecture and Objective

  • Vision Transformer (ViT-Base, patch16, 224×224)
  • Objective: Cross-entropy classification across 20 labels

Compute Infrastructure

  • Hardware: Google Colab T4 GPU
  • Framework: PyTorch + Hugging Face Transformers

Citation

CelebA Dataset:
Z. Liu, P. Luo, X. Wang, and X. Tang. Deep Learning Face Attributes in the Wild. ICCV 2015.

ViT:
A. Dosovitskiy et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR 2021.


Model Card Authors

Hackathon submission by Huda Akram
