Gemma 2B Code Generation (Fine-tuned)

Google's Gemma 2B model fine-tuned for code generation with QLoRA on the CodeAlpaca dataset.

Performance

Evaluated on 100 test samples from CodeAlpaca (72 Python, 28 other languages):

Metric               Baseline   Fine-tuned   Improvement
BLEU Score           11.00      16.83        +53% ✅
Syntax Correctness   81%        76%          -5%

Key Achievement: a 53% relative improvement in code similarity (BLEU), indicating that the fine-tuned model generates code closer to the reference solutions.

Head-to-Head Comparison (Python tasks only):

  • Both models pass: 45/72 (62.5%)
  • Both models fail: 25/72 (34.7%)
  • Baseline wins: 0
  • Fine-tuned wins: 2 ✅

Model Details

  • Base Model: google/gemma-2-2b-it
  • Training Method: QLoRA (4-bit quantization with LoRA adapters)
  • Dataset: CodeAlpaca-20k (18,000 training examples)
  • Checkpoint: Step 2000 (~1.8 epochs, selected for best BLEU score)
  • Training Platform: Google Colab (T4 GPU, free tier)
  • Training Cost: $0

Training Configuration

  • LoRA Rank: 16
  • LoRA Alpha: 32
  • LoRA Dropout: 0.05
  • Target Modules: q_proj, v_proj
  • Quantization: 4-bit NF4
  • Learning Rate: 2e-4
  • Batch Size: 16 (effective)
  • Optimizer: Paged AdamW 8-bit
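
A training setup along these lines can be reconstructed with peft, bitsandbytes, and the HF Trainer. The sketch below mirrors the hyperparameters listed above; the per-device batch size / gradient-accumulation split, the output directory name, and the remaining Trainer arguments are not documented here and are assumptions.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization for the QLoRA base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter matching the configuration listed above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Assumption: 4 x 4 gradient accumulation for the effective batch size of 16
training_args = TrainingArguments(
    output_dir="gemma-2b-code-alpaca",
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    fp16=True,
)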

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load the base model in 4-bit NF4, matching the quantization used during training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    device_map="auto",
    quantization_config=bnb_config,
)

# Load fine-tuned adapter
model = PeftModel.from_pretrained(base_model, "nvhuynh16/gemma-2b-code-alpaca-best")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

# Generate code
instruction = "Write a function to check if a number is prime"
prompt = f"""### Instruction:
{instruction}

### Input:


### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(code.split("### Response:")[-1].strip())
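
The snippet above samples (do_sample=True), so outputs vary run to run; the Limitations section attributes the small syntax-correctness drop to this variance. For reproducible side-by-side comparisons, greedy decoding can be used instead (a usage suggestion, not part of the published setup):

# Greedy decoding: deterministic output at the cost of diversity
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)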

Example Output

Instruction: Write a function to check if a number is prime

Generated Code:

def is_prime(num):
    if num <= 1:
        return False
    for i in range(2, num):
        if num % i == 0:
            return False
    return True
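
As a minimal, hypothetical sanity check (separate from the published evaluation), a generated snippet can be parsed and exercised before use:

import ast

generated = """
def is_prime(num):
    if num <= 1:
        return False
    for i in range(2, num):
        if num % i == 0:
            return False
    return True
"""

ast.parse(generated)    # raises SyntaxError if the model produced invalid Python
scope = {}
exec(generated, scope)  # only execute generated code in a sandboxed environment
assert scope["is_prime"](13)
assert not scope["is_prime"](12)

Note that the model reproduces plain trial division: correct, but O(n) per call, whereas checking divisors only up to √num would suffice.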

Supported Languages

Primarily trained on:

  • Python (majority)
  • SQL
  • JavaScript
  • Java
  • HTML/CSS

Limitations

  • Syntax correctness is 5 percentage points lower than the base model's (76% vs. 81%), attributed to sampling variance
  • Best for algorithmic/utility functions
  • May require prompt engineering for optimal results
  • Not optimized for framework-specific code (Django, FastAPI, etc.)

Evaluation Details

The model was evaluated on a held-out test set of 100 examples from CodeAlpaca:

  • 72 Python tasks: evaluated with Python's ast parser for syntax validation
  • 28 non-Python tasks (SQL, JavaScript, Java, HTML): validated via language detection

BLEU scores were calculated using SacreBLEU with smoothing to measure code similarity to reference implementations.
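
The exact evaluation scripts are not published here; a minimal sketch of both checks, assuming sacrebleu's default exponential smoothing and hypothetical example data, might look like:

import ast
import sacrebleu

def is_valid_python(code: str) -> bool:
    # Syntax validation as used for the 72 Python tasks
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

# Corpus BLEU between generations and references (hypothetical examples)
predictions = [
    "def add(a, b):\n    return a + b",
    "def square(x):\n    return x * x",
]
references = [[
    "def add(x, y):\n    return x + y",
    "def square(n):\n    return n ** 2",
]]  # sacrebleu expects a list of reference streams
bleu = sacrebleu.corpus_bleu(predictions, references, smooth_method="exp")
print(f"BLEU: {bleu.score:.2f}")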

License

This model is based on Gemma 2B and follows the Gemma License.

Citation

@misc{gemma-2b-code-alpaca-best,
  title={Gemma 2B Code Generation - Fine-tuned},
  author={nvhuynh16},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/nvhuynh16/gemma-2b-code-alpaca-best}}
}

Acknowledgments

  • Google for the Gemma model
  • HuggingFace for the transformers and PEFT libraries
  • CodeAlpaca dataset creators
  • Google Colab for free GPU access