# Qwen2.5-1.5B Math LoRA Collection
This directory aggregates all LoRA checkpoints produced by the `train_lora` pipeline. Every subfolder corresponds to one math dataset and contains 10 independent 100-shot LoRA runs (groups 00–09) trained on Qwen2.5-1.5B-Instruct with identical hyperparameters. The adapters here are the source of truth for downstream evaluation (`../评估体系`) and for the `parameter_generator` project, which learns to map prompts to LoRA weights.
If you are new to the project, this document explains where the data comes from, how the LoRAs are produced, and how you can reuse them for inference, evaluation, or further training.
## Provenance
- Base model: `Qwen2.5-1.5B-Instruct`
- Datasets: sampled from `../../prepare/data/math/*.json`. Each JSON is a list of `{prompt, response, system?}` records. `dataset_sampler.py` draws 10 disjoint groups of 100 samples (unless the dataset has <1,000 examples, in which case sampling with replacement keeps the group size fixed) using a deterministic seed derived from the dataset name.
- Training recipe (from `config/default.yaml`):
  - sequence length 4,096; LoRA `r=64`, `alpha=128`, `dropout=0.05`, target modules = `{q,k,v,o,gate,up,down}_proj`
  - 12 epochs / max 1,800 steps, learning rate `1e-4`, batch size per device `2`, gradient accumulation `16`, BF16 training, gradient checkpointing on, weight decay `0.01`, warmup ratio `0.03`, checkpoints saved every 300 steps (keeping at most 6) plus a final adapter export
  - tokenizers are cloned from the base model (pad token defaults to EOS if missing)
- Monitoring & reproducibility:
  - Trainer logs (loss, LR, throughput) are in `../logs/<dataset>/group_xx/`.
  - Slurm stdout/err for each shard live in `../logs/slurm/`.
  - `metadata.json` captures the git commit (if `GIT_COMMIT` was set), timestamps, seeds, and the effective batch size so any experiment can be repeated exactly.
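For reference, the recipe above maps roughly onto the following PEFT / Hugging Face settings. This is a sketch reconstructed from the values listed here, not a copy of `config/default.yaml` (the pipeline builds its own objects; argument names below are the standard `peft`/`transformers` ones):

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA hyperparameters as documented above.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Trainer settings matching the recipe; max_steps caps the 12 epochs at 1,800 steps.
training_args = TrainingArguments(
    output_dir="outputs/Math_QA/group_00",   # illustrative path
    num_train_epochs=12,
    max_steps=1800,
    learning_rate=1e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    bf16=True,
    gradient_checkpointing=True,
    weight_decay=0.01,
    warmup_ratio=0.03,
    save_steps=300,
    save_total_limit=6,
)
```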
## End-to-end data flow
- Raw JSON data comes from `../../prepare/data/math`. Each file is a list of dict objects with keys:

  ```json
  {
    "prompt": "...question...",
    "response": "...reference answer...",
    "system": "optional system message"
  }
  ```

- `python -m train_lora.dataset_sampler --config config/default.yaml` reads every dataset, filters out `GSM8K_test.json`, and deterministically samples 10×100 items per dataset. The samples plus metadata (indices, seeds, timestamps) are written to `../prompt_groups/<dataset>/group_xx.json`.
- `python -m train_lora.run_tasks --run` (or the Slurm array) iterates dataset/group pairs, loads the corresponding prompt group, and performs LoRA fine-tuning with the Hugging Face `Trainer`.
- After training finishes, the following artifacts land in `outputs/<dataset>/group_xx/`:
  - a ready-to-use LoRA adapter (`adapter/`)
  - intermediate checkpoints for analysis/resume
  - tokenizers and metadata
- The evaluation stacks (`../评估体系`, `../parameter_generator/评估`) and the LoRA parameter generator both consume these directories directly.
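To make the record format concrete, here is a small sketch that reads one sampled group and assembles chat-style messages from its records. The `samples` wrapper key is an assumption; check the JSON actually written by `dataset_sampler.py` and adjust accordingly:

```python
import json

# One sampled 100-shot group (schema: a list of {prompt, response, system?} records,
# possibly wrapped together with sampling metadata).
with open("prompt_groups/Math_QA/group_00.json", encoding="utf-8") as f:
    group = json.load(f)

# Assumption: if the file is a dict, the records live under a "samples" key.
records = group["samples"] if isinstance(group, dict) else group

for rec in records[:3]:
    messages = []
    if rec.get("system"):
        messages.append({"role": "system", "content": rec["system"]})
    messages.append({"role": "user", "content": rec["prompt"]})
    messages.append({"role": "assistant", "content": rec["response"]})
    print(messages)
```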
## Directory layout
```
outputs/
├── Competition_Math/
├── GSM8K_train/
├── MATH/
├── Math-IIO-68K-Mini/
├── Math-Plus/
├── Math_QA/
├── Mu-Math/
└── ToT-Math-V1/
```
Each dataset directory contains `group_00` … `group_09`. Inside every group:
| Item | Description |
|---|---|
| `adapter/` | Final LoRA export (`adapter_model.safetensors`, `adapter_config.json`, tokenizer + chat template snapshots, and HF `training_args.bin`). This is the folder you will load for inference. |
| `checkpoints/checkpoint-xxxx/` | Intermediate Trainer checkpoints saved every 300 steps (300–1,800). They include optimizer, scheduler, RNG state, and tokenizer copies for resuming or studying training dynamics. |
| `tokenizer/` | Standalone tokenizer snapshot identical to the one used during training; useful if you need a self-contained deployment without referencing the base model directory. |
| `prompt_group.json` | The exact 100-shot dataset used for this training run (a copy of `prompt_groups/<dataset>/group_xx.json`). Contains metadata such as sampled indices, original source file, and timestamp. |
| `metadata.json` | Provenance record with training loss, Trainer metrics, LoRA config, effective batch size/world size, timestamps, git commit (if exported), and file paths. |
| `metadata.json` -> `trainer_state` | Full training log history (per-step metrics). Disable via `metadata.save_training_state: false` if you want lighter metadata. |
Tip: Use `metadata.json` to find the latest checkpoint, to confirm which base model/tokenizer were used, or to drive automated uploads/evaluations.
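For example, a small sketch that pulls a few of these fields and locates the newest checkpoint on disk. Field names follow the "File reference (metadata.json)" table below; treat any missing key as possible:

```python
import json
from pathlib import Path

run_dir = Path("outputs/Math_QA/group_00")
meta = json.loads((run_dir / "metadata.json").read_text(encoding="utf-8"))

# Key provenance fields (see the field reference table further down).
print("dataset:", meta.get("dataset_name"), "group:", meta.get("group_index"))
print("train loss:", meta.get("train_loss"))
print("effective batch size:", meta.get("effective_batch_size"))

# Find the latest intermediate checkpoint (directories are named checkpoint-<step>).
ckpt_root = Path(meta.get("checkpoint_root", run_dir / "checkpoints"))
checkpoints = sorted(ckpt_root.glob("checkpoint-*"), key=lambda p: int(p.name.split("-")[-1]))
print("latest checkpoint:", checkpoints[-1] if checkpoints else None)
```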
## Dataset overview
| Dataset dir | Source file (relative to `prepare/data/math`) | Notes |
|---|---|---|
| `Competition_Math` | `Competition_Math.json` | 100-shot groups drawn from Competition Math practice problems. |
| `GSM8K_train` | `GSM8K_train.json` | Standard GSM8K train split, excluding the public test set (`GSM8K_test.json` was filtered out). |
| `MATH` | `MATH.json` | High-school & olympiad math benchmark. |
| `Math-IIO-68K-Mini` | `Math-IIO-68K-Mini.json` | Mini version of the Math IIO dataset. |
| `Math-Plus` | `Math-Plus.json` | Composed of challenging math word problems. |
| `Math_QA` | `Math_QA.json` | Multiple-choice MathQA dataset reformatted to open-ended QA. |
| `Mu-Math` | `Mu-Math.json` | MuSR-style math reasoning set. |
| `ToT-Math-V1` | `ToT-Math-V1.json` | Tree-of-Thought flavored math prompts. |
All datasets follow the same JSON schema, so swapping between them only changes topical coverage.
## How to navigate a single group
```
Math_QA/
└── group_00/
    ├── adapter/
    │   ├── adapter_config.json
    │   ├── adapter_model.safetensors
    │   ├── tokenizer/…          (extra copies of merges, vocab, chat_template.jinja)
    │   └── training_args.bin
    ├── checkpoints/
    │   ├── checkpoint-300/
    │   ├── checkpoint-600/
    │   └── …
    ├── tokenizer/               # same as base tokenizer but pinned to this run
    ├── prompt_group.json        # 100-shot data
    └── metadata.json
```
When inspecting or sharing a run, the minimum file set is `adapter/` + `prompt_group.json` + `metadata.json`. Everything else speeds up resuming or auditing.
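A quick way to copy exactly that minimal set into a separate folder (paths are illustrative; the destination layout is just a suggestion):

```python
import shutil
from pathlib import Path

src = Path("outputs/Math_QA/group_00")
dst = Path("share/Math_QA/group_00")
dst.mkdir(parents=True, exist_ok=True)

# Minimal, self-describing subset: the adapter plus its training data and provenance record.
shutil.copytree(src / "adapter", dst / "adapter", dirs_exist_ok=True)
shutil.copy2(src / "prompt_group.json", dst / "prompt_group.json")
shutil.copy2(src / "metadata.json", dst / "metadata.json")
```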
## Using the adapters
### 0. Environment prerequisites
- Python ≥ 3.10, `transformers >= 4.37`, `peft >= 0.8`, `accelerate`, `safetensors`, `torch` (GPU build).
- The base model directory must be accessible; otherwise download `Qwen2.5-1.5B-Instruct` from Hugging Face and update the `base_model` path.
- Optional: set `HF_HOME` / `TRANSFORMERS_CACHE` to avoid repeated downloads.
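A quick sanity check of the environment before loading anything (purely illustrative):

```python
# Confirm the core libraries import and a GPU is visible.
import torch
import transformers
import peft

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("peft:", peft.__version__)
```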
### 0.5. Reproduce the training pipeline (optional)
If someone wants to regenerate any adapter from scratch:
```bash
cd train_lora
python -m train_lora.dataset_sampler --overwrite   # regenerates prompt groups
python -m train_lora.train_single --dataset Math_QA --group 0
# or run the full queue
python -m train_lora.run_tasks --run
```
These commands will rebuild `prompt_groups/` and `outputs/` with exactly the same seeds and configuration documented above. Slurm users should submit `sbatch run_lora_multinode.sh`.
### 1. Load adapter with PEFT
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model = "Qwen2.5-1.5B-Instruct"              # local path to the base model (or HF id)
adapter_dir = "outputs/Math_QA/group_00/adapter"

# Load the tokenizer from the adapter so the chat template matches the one used in training.
tokenizer = AutoTokenizer.from_pretrained(adapter_dir, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_dir)

prompt = "Solve 3x + 7 = 22."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
Notes:

- Loading the tokenizer from `adapter/` ensures an identical chat template and additional tokens (if any). You can also point to the base tokenizer path if you prefer.
- For batch inference or deployment, call `model.merge_and_unload()` if you need a single combined set of weights (at the cost of losing LoRA toggling); see the sketch below.
- If you want maximal throughput on a single GPU, also call `model.half()` or `model.to(torch.bfloat16)` depending on your hardware; the adapters were trained with BF16, so keeping BF16 is the safest choice.
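A minimal sketch of the merge-and-save path mentioned above, assuming `model` and `tokenizer` come from the loading example (the output directory is illustrative):

```python
# Fold the LoRA deltas into the base weights and export a standalone model.
merged = model.merge_and_unload()
merged.save_pretrained("merged/Math_QA_group_00")
tokenizer.save_pretrained("merged/Math_QA_group_00")
```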
### 2. Resume or continue training
```bash
python -m train_lora.train_single \
  --dataset Math_QA \
  --group 0 \
  --group-file outputs/Math_QA/group_00/prompt_group.json
```
Set `--group-file` to reuse the same 100 samples, and initialize the `Trainer` from `checkpoints/checkpoint-XXXX` via `TrainingArguments.resume_from_checkpoint`. This lets you reproduce a group exactly or extend its training.
To resume manually:
```python
trainer.train(resume_from_checkpoint="outputs/Math_QA/group_00/checkpoints/checkpoint-1500")
```
### 3. Evaluate with Math-Verify
The evaluation stacks in `../评估体系` and `../parameter_generator/评估` expect this directory layout. Example:
```bash
cd 评估体系
python scripts/run_all_evals.py \
  --config configs/eval_config.yaml \
  --datasets Math_QA \
  --groups 0 1
```
### 4. Packaging for distribution
- Upload only `adapter/` and `metadata.json` when sharing publicly (e.g., on Hugging Face) to avoid huge checkpoint directories.
- Keep `prompt_group.json` if you want consumers to understand the training data or to regenerate LoRA weights with the same samples.
- When exporting, include a README snippet that references this document so downstream users know the provenance.
- Suggested Hugging Face layout:

  ```
  Math_QA/
    group_00/
      adapter/
      prompt_group.json
      metadata.json
      README.md   (copy sections describing provenance + usage)
  ```
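If you prefer to script the upload, something along these lines works with `huggingface_hub` (a sketch; the repo id is a placeholder and you must be authenticated, e.g. via `huggingface-cli login`):

```python
from huggingface_hub import HfApi

api = HfApi()
run_dir = "outputs/Math_QA/group_00"

# Hypothetical target repo; replace with your own namespace.
repo_id = "your-org/qwen2.5-1.5b-math-qa-lora-group00"
api.create_repo(repo_id, repo_type="model", exist_ok=True)

# Upload only the lightweight, shareable pieces.
api.upload_folder(folder_path=f"{run_dir}/adapter", repo_id=repo_id, path_in_repo="adapter")
api.upload_file(path_or_fileobj=f"{run_dir}/metadata.json", path_in_repo="metadata.json", repo_id=repo_id)
api.upload_file(path_or_fileobj=f"{run_dir}/prompt_group.json", path_in_repo="prompt_group.json", repo_id=repo_id)
```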
## File reference (`metadata.json`)
Key fields you may want to automate against:
| Field | Meaning |
|---|---|
| `dataset_name`, `group_index` | Identify the run. |
| `prompt_group_file` | Absolute path back to the sampled dataset. |
| `checkpoint_root` | Where all intermediate checkpoints live. |
| `train_loss`, `metrics` | Final loss and Trainer metrics dict. |
| `trainer_state` | Full log history (can be large; disable via `metadata.save_training_state`). |
| `training_args` | Exact HF `TrainingArguments` snapshot. |
| `lora_config` | Copy of the LoRA hyperparameters used. |
| `effective_batch_size` | `world_size × per_device_batch_size × grad_accum`; useful for scaling comparisons. |
| `git_commit` | Populated if the `GIT_COMMIT` env var was set before training. |
| `metrics.train_runtime`, `metrics.train_samples_per_second` | Throughput stats. |
| `generated_at` | UTC timestamp when the metadata was written. |
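As an example of automating against these fields, here is a sketch that sweeps every run in the collection and prints its final training loss (it assumes the field names in the table above and skips nothing, so missing fields simply print as `None`):

```python
import json
from pathlib import Path

rows = []
for meta_path in sorted(Path("outputs").glob("*/group_*/metadata.json")):
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    rows.append((meta.get("dataset_name"), meta.get("group_index"), meta.get("train_loss")))

for dataset, group, loss in rows:
    print(f"{dataset!s:>20}  group {group!s:>2}  train_loss={loss}")
```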
## Best practices
- Always match the precision used when loading the base model to the precision the adapters were trained with; these adapters were trained in BF16.
- If you edit files inside this directory, keep the structure intact; other scripts rely on relative paths (`adapter`, `tokenizer`, `metadata.json`).
- Before deploying a new LoRA, verify it with the evaluation suite, and consider merging multiple groups (e.g., ensembling or checkpoint averaging) only after confirming stability; see the sketch after this list.
- Use `prompt_group.json` and `metadata.json` as documentation when presenting results; they already include seeds, sample indices, and environment details.
- If you build new LoRAs with different configs (e.g., higher rank, more steps), add a sibling directory (e.g., `outputs_v2/`) or annotate the README so collaborators know which adapters correspond to which experiment.
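For the checkpoint-averaging idea mentioned above, a minimal sketch that averages `adapter_model.safetensors` across a few groups of the same dataset. This is not something the pipeline does for you; it assumes the groups share an identical LoRA config (they do within one dataset here), and the result should be validated with the evaluation suite before use:

```python
import shutil
from pathlib import Path

import torch
from safetensors.torch import load_file, save_file

# Any subset of groups from the same dataset (paths are illustrative).
group_dirs = [Path(f"outputs/Math_QA/group_{i:02d}/adapter") for i in range(3)]

# Average the LoRA weight tensors element-wise across the selected groups.
state_dicts = [load_file(str(d / "adapter_model.safetensors")) for d in group_dirs]
averaged = {
    key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0).to(state_dicts[0][key].dtype)
    for key in state_dicts[0]
}

# Reuse the config/tokenizer from the first group and write the averaged weights next to them.
out_dir = Path("outputs/Math_QA/group_avg/adapter")
shutil.copytree(group_dirs[0], out_dir, dirs_exist_ok=True)
save_file(averaged, str(out_dir / "adapter_model.safetensors"))
```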
Happy finetuning! If you extend this collection (new datasets, extra groups, or different hyperparameters), add another section here describing the changes so downstream consumers stay informed.