# Model Card for harpertoken/harpertokenGPT2

A GPT-2 small model trained from scratch on the WikiText-2-raw-v1 dataset for text generation.
## Model Details

### Model Description

This is a GPT-2 small model (117M parameters) trained from random initialization on the WikiText-2-raw-v1 dataset. It can generate coherent text continuations.
- Developed by: Niladri Das
- Model type: GPT-2
- Language(s) (NLP): English
- License: Apache-2.0
### Model Sources

- Repository: https://github.com/bniladridas/models

## Uses

### Direct Use

Use for text generation tasks, such as completing sentences or generating stories.
### Out-of-Scope Use

Not suitable for tasks requiring factual accuracy, safety-critical applications, or languages other than English.

## Bias, Risks, and Limitations
Trained on WikiText, which may contain biases from the source data. Model may generate inappropriate or biased content.
### Recommendations
Use with caution; implement content filters for production use.
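As an illustration of the content-filter recommendation, here is a minimal post-generation blocklist sketch; the term list is a placeholder, and production deployments would typically rely on a dedicated moderation model or service instead.

```python
# Naive illustration of a post-generation filter; BLOCKLIST is a placeholder.
BLOCKLIST = {"example_blocked_term"}

def is_safe(text: str) -> bool:
    """Return True if the generated text contains no blocklisted terms."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)

# Apply to generated outputs before showing them to users.
print(is_safe("The quick brown fox jumps over the lazy dog"))  # True
```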
## How to Get Started with the Model

```python
from transformers import pipeline

# Load the text-generation pipeline backed by this model.
generator = pipeline("text-generation", model="harpertoken/harpertokenGPT2")
print(generator("The quick brown fox"))
```
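The pipeline forwards standard generation arguments to `model.generate()`, so decoding can be tuned directly; the values below are illustrative, not settings recommended for this model.

```python
# Sample several continuations with nucleus sampling; parameter values are illustrative.
outputs = generator(
    "The quick brown fox",
    max_new_tokens=50,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
    num_return_sequences=3,
)
for out in outputs:
    print(out["generated_text"])
```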
## Training Details

### Training Data

WikiText-2-raw-v1 dataset, a collection of Wikipedia articles.
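For reference, the corpus can be loaded with the `datasets` library; the sketch below uses the standard Hub configuration name for the raw WikiText-2 variant.

```python
from datasets import load_dataset

# WikiText-2, raw (untokenized) variant, with train/validation/test splits.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
print(dataset)                       # DatasetDict of the three splits
print(dataset["train"][10]["text"])  # one raw Wikipedia passage (may be empty)
```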
### Training Procedure

Trained from scratch using PyTorch and the Hugging Face Transformers library; a minimal sketch of a comparable run follows the hyperparameters below.

#### Training Hyperparameters

- Epochs: 3
- Batch size: 1
- Learning rate: 5e-5
- Max sequence length: 512 tokens
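The training script itself is not included in this card, so the following is only a sketch of how a from-scratch run with the hyperparameters above could be set up using the Trainer API; it is an assumption about the setup, not the authors' exact script.

```python
from datasets import load_dataset
from transformers import (
    GPT2Config,
    GPT2LMHeadModel,
    GPT2TokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Reuse the standard GPT-2 BPE tokenizer; the model weights themselves start random.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Randomly initialized GPT-2 small (default GPT2Config), not from_pretrained.
model = GPT2LMHeadModel(GPT2Config())

dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])
# WikiText contains many empty lines; drop examples with no tokens.
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 0)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="harpertokenGPT2",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    learning_rate=5e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```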
## Evaluation

Evaluated qualitatively by inspecting the coherence of generated text.

### Results

The model generates plausible text continuations.
## Environmental Impact

- Hardware Type: CPU/MPS
- Hours used: ~0.17 (about 10 minutes)
- Carbon Emitted: Minimal (local training)
## Technical Specifications

### Model Architecture and Objective

GPT-2 decoder-only transformer for causal language modeling.
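To make the objective concrete, the snippet below computes the causal language modeling loss for a short prompt using the Hub id from the quickstart above; passing the input ids as labels triggers the standard shifted next-token cross-entropy.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("harpertoken/harpertokenGPT2")
model = AutoModelForCausalLM.from_pretrained("harpertoken/harpertokenGPT2")

# Causal LM objective: predict each token from all tokens to its left.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, labels=inputs["input_ids"])

print(out.loss)          # average next-token cross-entropy
print(out.logits.shape)  # (batch_size, sequence_length, vocab_size)
```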
Compute Infrastructure
- Hardware: Mac with MPS
- Software: PyTorch, Transformers