# CodeLlama Fine-Tuning for RTL Code Generation

This repository contains scripts, datasets, and documentation for fine-tuning the CodeLlama-7B-Instruct model for Verilog/SystemVerilog RTL code generation.
## Overview
This project fine-tunes CodeLlama-7B-Instruct to generate synthesizable Verilog/SystemVerilog code for hardware design tasks, specifically focusing on FIFO implementations.
## Features
- **CodeLlama-7B-Instruct Fine-tuning** with LoRA
- **Chat Template Format** support (see the chat-format sketch after this list)
- **Dataset Processing** and validation scripts
- **Training Scripts** with checkpoint resume capability
- **Inference Scripts** for testing fine-tuned models
- **Comprehensive Documentation** and guides
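
Training data in `datasets/split_chat_format/` uses CodeLlama's instruct chat format. A minimal sketch of rendering a prompt with `tokenizer.apply_chat_template` (the example prompt is illustrative, not taken from the actual dataset):

```python
# Minimal sketch: render a user turn into CodeLlama's "[INST] ... [/INST]"
# chat format. The example prompt below is illustrative only.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")

messages = [
    {"role": "user",
     "content": "Write a synchronous FIFO in Verilog with parameterizable depth."},
]

# apply_chat_template expands the messages into the prompt string the model
# was instruction-tuned on; tokenize=False returns plain text for inspection.
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)  # "<s>[INST] Write a synchronous FIFO ... [/INST]"
```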
## Repository Structure
```
codellama-migration/
├── datasets/                  # Training datasets
│   ├── raw/                   # Original datasets
│   ├── processed/             # Processed and formatted datasets
│   ├── split/                 # Train/val/test splits (original format)
│   └── split_chat_format/     # Train/val/test splits (chat format)
├── scripts/
│   ├── training/              # Training scripts
│   ├── inference/             # Inference scripts
│   └── dataset_split.py       # Dataset splitting utility
├── Documentation/             # All .md documentation files
└── Scripts/                   # Utility scripts
```
## Quick Start
### Prerequisites
- Python 3.8+
- CUDA-capable GPU (recommended)
- HuggingFace transformers library
- PyTorch
### Installation

```bash
pip install transformers torch peft accelerate bitsandbytes
```
### Training

```bash
bash start_training_chat_format.sh
```
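
The shell script wraps a standard PEFT fine-tuning run. A rough sketch of the core setup, assuming JSONL training data with a `text` field; the paths, hyperparameters, and field names below are assumptions, not the script's actual values:

```python
# Rough sketch of the LoRA fine-tuning loop the shell script wraps.
# Paths, hyperparameters, and the "text" field are assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "models/base-models/CodeLlama-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token

model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
model = get_peft_model(model, LoraConfig(r=48, lora_alpha=96, task_type="CAUSAL_LM"))

dataset = load_dataset("json", data_files="datasets/split_chat_format/train.jsonl")["train"]
dataset = dataset.map(lambda s: tokenizer(s["text"], truncation=True, max_length=2048),
                      remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="training-outputs/codellama-fifo-v2-chat",
                           per_device_train_batch_size=1, gradient_accumulation_steps=8,
                           num_train_epochs=3, learning_rate=2e-4, save_strategy="epoch"),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
# Pass resume_from_checkpoint=True to resume from the latest saved checkpoint.
trainer.train()
```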
### Inference

```bash
python3 scripts/inference/inference_codellama.py \
    --mode local \
    --model-path training-outputs/codellama-fifo-v2-chat \
    --base-model-path models/base-models/CodeLlama-7B-Instruct \
    --prompt "Your prompt here"
```
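
Under the hood, local inference loads the base model and attaches the LoRA adapter. A minimal sketch of that flow (generation settings and the example prompt are illustrative):

```python
# Minimal sketch of local inference: load the base model, attach the LoRA
# adapter, and generate. Generation settings below are illustrative.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = "models/base-models/CodeLlama-7B-Instruct"
adapter_path = "training-outputs/codellama-fifo-v2-chat"

tokenizer = AutoTokenizer.from_pretrained(base_path)
model = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16,
                                             device_map="auto")
model = PeftModel.from_pretrained(model, adapter_path)  # attach fine-tuned LoRA weights

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a parameterizable synchronous FIFO in Verilog."}],
    tokenize=False)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```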
## Dataset
The dataset contains 94 samples of FIFO implementations in Verilog. It is split roughly 75/10/15 (a splitting sketch follows the list):

- Training: 70 samples (~75%)
- Validation: 9 samples (~10%)
- Test: 15 samples (~15%)
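
A minimal sketch of how `scripts/dataset_split.py` could produce this split (the actual script may differ; input/output paths and the JSONL format are assumptions):

```python
# Minimal sketch of a reproducible 75/10/15 split; with 94 samples this
# yields exactly 70/9/15. File paths are assumptions.
import json
import random

random.seed(42)  # fixed seed so the split is reproducible
with open("datasets/processed/fifo_dataset.jsonl") as f:
    samples = [json.loads(line) for line in f]

random.shuffle(samples)
n_train = int(0.75 * len(samples))
n_val = int(0.10 * len(samples))
splits = {
    "train": samples[:n_train],
    "val": samples[n_train:n_train + n_val],
    "test": samples[n_train + n_val:],
}

for name, subset in splits.items():
    with open(f"datasets/split/{name}.jsonl", "w") as f:
        for sample in subset:
            f.write(json.dumps(sample) + "\n")
```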
## Documentation
- `MIGRATION_PROGRESS.md` - Overall migration tracking
- `TRAINING_COMPLETE.md` - Training completion details
- `COMPARISON_REPORT.md` - Comparison of expected vs. generated output
- `FILE_INVENTORY.md` - Complete file listing
## Model Information

- **Base Model:** CodeLlama-7B-Instruct
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
- **LoRA Rank:** 48
- **LoRA Alpha:** 96
- **Trainable Parameters:** ~120M (3.31% of total)
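
Reproducing this configuration with PEFT might look like the sketch below; `target_modules` and `lora_dropout` are assumptions, since the exact values are not recorded in this README:

```python
# Sketch of the LoRA configuration above; target_modules and lora_dropout
# are assumptions not documented in this README.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("models/base-models/CodeLlama-7B-Instruct")
model = get_peft_model(model, LoraConfig(
    r=48,                  # LoRA rank
    lora_alpha=96,         # scaling factor (alpha / rank = 2.0)
    lora_dropout=0.05,     # assumed value
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed target set
    task_type="CAUSAL_LM",
))
model.print_trainable_parameters()  # prints trainable vs. total parameter counts
```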
## License
This project is for internal use by Elinnos Systems Pvt Limited.
## Contributors
Elinnos Systems Pvt Limited
## Links
- Organization: https://huggingface.co/Elinnos
- Base Model: https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf
**Note:** Model weights are not included in this repository. Fine-tuned models are stored separately.