FaceGuard – ViT (20 CelebA IDs)

A Vision Transformer (ViT-Base) fine-tuned for identity classification on a 20-identity subset of the CelebA dataset.
This model predicts anonymized celeb_id integers (not celebrity names).
It powers the demo Space: https://huggingface.co/spaces/hudaakram/FaceGuard-demo


Model Details

Model Description

  • Architecture: google/vit-base-patch16-224 (ViT-Base pretrained on ImageNet-21k, fine-tuned on ImageNet-1k)
  • Fine-tuned for: 20-class identity classification (CelebA celeb_ids)
  • Input: RGB image (face crop), resized and normalized to 224×224
  • Output: Probability distribution over 20 anonymized IDs
  • Parameters: ~86M
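
These details can be verified directly from the published checkpoint. A minimal sketch, assuming only the model id from this card and the transformers library:

from transformers import AutoConfig, AutoImageProcessor

model_id = "hudaakram/FaceGuard-20ID-ViT"

# The config should report a ViT backbone with a 20-label classification head.
config = AutoConfig.from_pretrained(model_id)
print(config.model_type, config.num_labels)    # expected: vit 20
print(config.image_size, config.patch_size)    # expected: 224 16

# The image processor stores the resize/normalization settings applied at inference time.
processor = AutoImageProcessor.from_pretrained(model_id)
print(processor.size, processor.image_mean, processor.image_std)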

Sources

  • Demo Space: https://huggingface.co/spaces/hudaakram/FaceGuard-demo
  • Base model: google/vit-base-patch16-224

Uses

Direct Use

  • Research demo for identity classification with anonymized CelebA IDs
  • Educational example of fine-tuning ViT for image classification

Downstream Use

  • As a starting point for transfer learning to other small identity classification tasks
  • As an educational reference for hackathons, workshops, or courses
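
As a rough illustration of the transfer-learning use case, the checkpoint can be re-headed for a new identity set. This is a hedged sketch, not part of the released model; the 5-class target below is a made-up placeholder:

from transformers import ViTForImageClassification, AutoImageProcessor

model_id = "hudaakram/FaceGuard-20ID-ViT"

# Drop the 20-class head and initialize a fresh one for a hypothetical 5-identity task.
model = ViTForImageClassification.from_pretrained(
    model_id,
    num_labels=5,
    ignore_mismatched_sizes=True,  # lets from_pretrained discard the mismatched classifier weights
)
processor = AutoImageProcessor.from_pretrained(model_id)
# Fine-tune `model` on the new dataset as usual (e.g. with a Trainer setup like the one sketched under Training Procedure).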

Out-of-Scope Use

  • ❌ Production face recognition / surveillance
  • ❌ Identifying real celebrity names (dataset only provides integer IDs)
  • ❌ Any high-stakes application involving privacy or personal data

Bias, Risks, and Limitations

  • Bias: CelebA contains celebrity faces, which are not representative of all demographics.
  • Limitations: Trained on only 20 identities (~600 images total) → limited generalization.
  • Privacy: CelebA IDs are anonymized integers, not real names. The model is not capable of returning actual identities.

Recommendation: Use strictly for research/educational purposes.


How to Get Started

Use the code below to get started with the model.

from transformers import ViTForImageClassification, AutoImageProcessor
from PIL import Image
import torch

model_id = "hudaakram/FaceGuard-20ID-ViT"
processor = AutoImageProcessor.from_pretrained(model_id)
model = ViTForImageClassification.from_pretrained(model_id)
model.eval()

# Load a face crop and apply the checkpoint's preprocessing (resize to 224x224, ImageNet mean/std).
img = Image.open("face.jpg").convert("RGB")
inputs = processor(images=img, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)[0]

# id2label maps class index (0-19) -> anonymized CelebA celeb_id string.
id2label = {int(k): v for k, v in model.config.id2label.items()}
top5 = probs.topk(5)
for score, idx in zip(top5.values, top5.indices):
    print(f"Label {idx.item()} (celeb_id {id2label[idx.item()]}): {score.item():.3f}")

Training Details

Training Data

  • Dataset: CelebA (top 20 identities by frequency)
  • Splits: Stratified 80% train / 10% validation / 10% test
  • Sizes: Train 501, Val 60, Test 77
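
The exact subsetting/splitting script is not part of this card; the following is only a rough sketch of how such a top-20, stratified split could be built. identity_df is a hypothetical DataFrame constructed from CelebA's identity annotations:

import pandas as pd
from sklearn.model_selection import train_test_split

# identity_df: hypothetical DataFrame with columns ["image_path", "celeb_id"] from CelebA's identity file.
top20 = identity_df["celeb_id"].value_counts().nlargest(20).index
subset = identity_df[identity_df["celeb_id"].isin(top20)]

# Stratified ~80/10/10 split by identity (seed 42 assumed here, matching the training seed).
train_df, rest_df = train_test_split(subset, test_size=0.2, stratify=subset["celeb_id"], random_state=42)
val_df, test_df = train_test_split(rest_df, test_size=0.5, stratify=rest_df["celeb_id"], random_state=42)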

Training Procedure

  • Seed: 42
  • Epochs: 4
  • Batch size: 16
  • Learning rate: 5e-5
  • Optimizer: AdamW
  • Weight decay: 0.01
  • Precision: FP16 on GPU (Colab)
  • Head resized: from 1000 classes → 20 classes
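
For reference, these settings map roughly onto the following Hugging Face Trainer configuration. This is a hedged sketch rather than the original training script; train_ds and val_ds are placeholders for the processed CelebA splits described in this card:

from transformers import ViTForImageClassification, TrainingArguments, Trainer

# Re-head the ImageNet checkpoint for 20 identities (1000 -> 20 classes).
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=20,
    ignore_mismatched_sizes=True,
)

args = TrainingArguments(
    output_dir="faceguard-vit",
    seed=42,
    num_train_epochs=4,
    per_device_train_batch_size=16,
    learning_rate=5e-5,
    weight_decay=0.01,   # AdamW is the Trainer's default optimizer
    fp16=True,           # mixed precision on the Colab T4
)

# train_ds / val_ds: placeholder datasets yielding {"pixel_values", "labels"} after preprocessing.
trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=val_ds)
trainer.train()
print(trainer.evaluate())  # validation loss after training (add compute_metrics for accuracy)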

Preprocessing

  • Images resized + center-cropped to 224×224
  • Normalized to ImageNet mean/std
  • Labels mapped from CelebA celeb_id → contiguous 0–19
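
The celeb_id-to-index mapping can be reconstructed along these lines (a small sketch; raw_train is a placeholder for the filtered CelebA subset, not an identifier from this card):

# Map the 20 raw CelebA celeb_ids to contiguous class indices 0-19.
celeb_ids = sorted({ex["celeb_id"] for ex in raw_train})   # raw_train: placeholder for the filtered subset
label2id = {cid: i for i, cid in enumerate(celeb_ids)}
id2label = {i: str(cid) for cid, i in label2id.items()}

def encode(example):
    example["labels"] = label2id[example["celeb_id"]]
    return example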

Training Hyperparameters

  • Training regime: fp16 mixed precision on GPU
  • Total epochs: 4 (~3 minutes each on Colab T4)

Speeds, Sizes, Times

  • Checkpoint size: ~343 MB
  • Throughput: ~10 samples/sec (Colab T4)

Evaluation

  • Validation Accuracy: ~0.93
  • Test Accuracy: ~0.83
  • Macro AUC: (see ROC below)
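
The accuracy and macro AUC above can be recomputed from test-set predictions with scikit-learn. A sketch, assuming y_true (shape [N], integer labels) and probs (shape [N, 20], softmax outputs) have been collected by running the model over the test split:

import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# y_true and probs are placeholders (numpy arrays) gathered from the test split.
acc = accuracy_score(y_true, np.argmax(probs, axis=-1))
macro_auc = roc_auc_score(y_true, probs, multi_class="ovr", average="macro")
print(f"accuracy={acc:.3f}  macro AUC (one-vs-rest)={macro_auc:.3f}")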

Train/Val/Test Split Summary

| Split | #Images | #Classes | Min/Class | Median/Class | Max/Class |
|-------|---------|----------|-----------|--------------|-----------|
| Train | 501     | 20       | 24        | 24           | 28        |
| Val   | 60      | 20       | 3         | 3            | 3         |
| Test  | 77      | 20       | 3         | 4            | 4         |

Results

Confusion Matrix (normalized): (figure)

ROC Curves (one-vs-rest): (figure)


Environmental Impact

  • Hardware: Google Colab T4 GPU
  • Training time: ~12 minutes total (4 epochs)
  • Carbon emissions: negligible (short fine-tuning run)

Technical Specifications

Model Architecture and Objective

  • Vision Transformer (ViT-Base, patch16, 224×224)
  • Objective: Cross-entropy classification across 20 labels

Compute Infrastructure

  • Hardware: Google Colab T4 GPU
  • Framework: PyTorch + Hugging Face Transformers

Citation

CelebA Dataset:
Z. Liu, P. Luo, X. Wang, and X. Tang. Deep Learning Face Attributes in the Wild. ICCV 2015.

ViT:
A. Dosovitskiy et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR 2021.


Model Card Authors

Hackathon submission by Huda Akram
