# Model Card for harpertoken/harpertokenGPT2

A GPT-2 small model trained from scratch on the WikiText-2-raw-v1 dataset for text generation.
## Model Details

### Model Description

This is a GPT-2 small model (117M parameters) trained from random initialization on the WikiText-2-raw-v1 dataset. It can generate coherent text continuations.
- Developed by: Niladri Das
- Model type: GPT-2
- Language(s) (NLP): English
- License: Apache-2.0
### Model Sources

- Repository: https://github.com/bniladridas/models

## Uses

### Direct Use

Use for text generation tasks, such as completing sentences or generating stories.
### Out-of-Scope Use

Not suitable for tasks requiring factual accuracy, safety-critical applications, or languages other than English.

## Bias, Risks, and Limitations
Trained on WikiText, which may contain biases from the source data. Model may generate inappropriate or biased content.
### Recommendations
Use with caution; implement content filters for production use.
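As an illustration of the content-filter recommendation, here is a minimal post-generation blocklist sketch; the term list is a placeholder, and production deployments would typically rely on a dedicated moderation model or service instead.

```python
# Naive illustration of a post-generation filter; BLOCKLIST is a placeholder.
BLOCKLIST = {"example_blocked_term"}

def is_safe(text: str) -> bool:
    """Return True if the generated text contains no blocklisted terms."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)

# Apply to generated outputs before showing them to users.
print(is_safe("The quick brown fox jumps over the lazy dog"))  # True
```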
## How to Get Started with the Model

```python
from transformers import pipeline

# Load the text-generation pipeline backed by this model.
generator = pipeline("text-generation", model="harpertoken/harpertokenGPT2")
print(generator("The quick brown fox"))
```
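The pipeline forwards standard generation arguments to `model.generate()`, so decoding can be tuned directly; the values below are illustrative, not settings recommended for this model.

```python
# Sample several continuations with nucleus sampling; parameter values are illustrative.
outputs = generator(
    "The quick brown fox",
    max_new_tokens=50,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
    num_return_sequences=3,
)
for out in outputs:
    print(out["generated_text"])
```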
## Training Details

### Training Data

WikiText-2-raw-v1 dataset, a collection of Wikipedia articles.
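For reference, the corpus can be loaded with the `datasets` library; the sketch below uses the standard Hub configuration name for the raw WikiText-2 variant.

```python
from datasets import load_dataset

# WikiText-2, raw (untokenized) variant, with train/validation/test splits.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
print(dataset)                       # DatasetDict of the three splits
print(dataset["train"][10]["text"])  # one raw Wikipedia passage (may be empty)
```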
### Training Procedure

Trained from scratch using PyTorch and the Hugging Face Transformers library; a minimal sketch of a comparable run follows the hyperparameters below.

#### Training Hyperparameters

- Epochs: 3
- Batch size: 1
- Learning rate: 5e-5
- Max sequence length: 512 tokens
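The training script itself is not included in this card, so the following is only a sketch of how a from-scratch run with the hyperparameters above could be set up using the Trainer API; it is an assumption about the setup, not the authors' exact script.

```python
from datasets import load_dataset
from transformers import (
    GPT2Config,
    GPT2LMHeadModel,
    GPT2TokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Reuse the standard GPT-2 BPE tokenizer; the model weights themselves start random.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Randomly initialized GPT-2 small (default GPT2Config), not from_pretrained.
model = GPT2LMHeadModel(GPT2Config())

dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])
# WikiText contains many empty lines; drop examples with no tokens.
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 0)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="harpertokenGPT2",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    learning_rate=5e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```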
## Evaluation

Evaluated qualitatively by inspecting the coherence of generated text.

### Results

The model generates plausible text continuations.
## Environmental Impact

- Hardware Type: CPU/MPS
- Hours used: ~0.17 (about 10 minutes)
- Carbon Emitted: Minimal (local training)
## Technical Specifications

### Model Architecture and Objective

GPT-2 decoder-only transformer for causal language modeling.
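To make the objective concrete, the snippet below computes the causal language modeling loss for a short prompt using the Hub id from the quickstart above; passing the input ids as labels triggers the standard shifted next-token cross-entropy.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("harpertoken/harpertokenGPT2")
model = AutoModelForCausalLM.from_pretrained("harpertoken/harpertokenGPT2")

# Causal LM objective: predict each token from all tokens to its left.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, labels=inputs["input_ids"])

print(out.loss)          # average next-token cross-entropy
print(out.logits.shape)  # (batch_size, sequence_length, vocab_size)
```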
Compute Infrastructure
- Hardware: Mac with MPS
- Software: PyTorch, Transformers