---
license: apache-2.0
tags:
- code
---

# Fine-tuned Qwen2.5-Coder-7B for Function Writing

## Model Description

This model is a fine-tuned version of Qwen2.5-Coder-7B, optimized specifically for function writing tasks. The base model is part of the Qwen2.5-Coder family, which was trained on 5.5 trillion tokens spanning source code, text-code grounding data, and synthetic data.

### Base Model Details

* **Type**: Causal Language Model
* **Architecture**: Transformer with RoPE, SwiGLU, RMSNorm, and attention QKV bias
* **Parameters**: 7.61B (6.53B non-embedding)
* **Layers**: 28
* **Attention Heads**: 28 for Q and 4 for KV
* **Context Length**: up to 131,072 tokens

## Fine-tuning Specifications

The model was fine-tuned using LoRA (Low-Rank Adaptation) with the configuration below; an illustrative code sketch follows each list.

### Training Parameters

* **Training Data**: 30,000 examples
* **Batch Size**: 1 per device
* **Gradient Accumulation Steps**: 24 (effective batch size of 24 per device)
* **Learning Rate**: 1e-6
* **Number of Epochs**: 2
* **Warmup Ratio**: 0.05
* **Maximum Sequence Length**: 4,096 tokens
* **Weight Decay**: 0.01
* **Maximum Gradient Norm**: 0.5
* **Learning Rate Scheduler**: Cosine

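A minimal sketch of how these hyperparameters might map onto `transformers.TrainingArguments`; the output directory is a placeholder, and the 4,096-token maximum sequence length would be applied at tokenization time rather than here:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen-coder-function-writing",  # placeholder, not the actual run's path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=24,  # effective batch size of 24 per device
    learning_rate=1e-6,
    num_train_epochs=2,
    warmup_ratio=0.05,
    weight_decay=0.01,
    max_grad_norm=0.5,
    lr_scheduler_type="cosine",
    bf16=True,                       # BF16 mixed precision (see LoRA Configuration)
    gradient_checkpointing=True,     # see Training Infrastructure
)
```
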
### LoRA Configuration

* **Rank (r)**: 32
* **Alpha**: 32
* **Dropout**: 0.05
* **Target Modules**: q_proj, v_proj, o_proj, gate_proj, up_proj
* **Training Mode**: BF16 mixed precision
* **RS-LoRA**: Enabled

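The same settings expressed as a `peft.LoraConfig`, as a sketch; RS-LoRA rescales the LoRA update by alpha/sqrt(r) instead of alpha/r, which tends to stabilize training at higher ranks:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "o_proj", "gate_proj", "up_proj"],
    use_rslora=True,  # rank-stabilized LoRA scaling
    task_type="CAUSAL_LM",
)
```
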
### Training Infrastructure

* **Quantization**: 4-bit quantization (NF4)
* **Attention Implementation**: Flash Attention 2
* **Memory Optimization**: Gradient checkpointing enabled

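A sketch of loading the base model under this setup (QLoRA-style 4-bit NF4 quantization with Flash Attention 2); it assumes a CUDA GPU and that the `bitsandbytes` and `flash-attn` packages are installed:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 4-bit quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # matches the BF16 training mode
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
)
model.gradient_checkpointing_enable()  # memory optimization
```
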
## Usage

This model is optimized for function writing tasks and can be loaded with the Hugging Face Transformers library. A basic example:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "path_to_your_model",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
    "path_to_your_model",
    trust_remote_code=True
)

# Generate text
input_text = "Write a function that..."
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=500)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```

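Note that `generate` returns the prompt tokens followed by the completion, so `response` above includes the prompt text. To decode only the newly generated portion:

```python
# Slice off the prompt tokens before decoding
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
completion = tokenizer.decode(new_tokens, skip_special_tokens=True)
```
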
## Limitations

* The model is specifically fine-tuned for function writing tasks and may not perform optimally for general code generation or other tasks
* The maximum context length during fine-tuning was limited to 4,096 tokens
* While the base model supports contexts up to 128K tokens, behavior beyond 4,096 tokens has not been validated for this fine-tune

## License

This model inherits the Apache 2.0 license from its base model, Qwen2.5-Coder-7B.

## Citation

If you use this model, please cite the Qwen2.5-Coder technical report and acknowledge the fine-tuning work:

```bibtex
@article{hui2024qwen2,
  title={Qwen2.5-Coder Technical Report},
  author={Hui, Binyuan and Yang, Jian and Cui, Zeyu and Yang, Jiaxi and Liu, Dayiheng and Zhang, Lei and Liu, Tianyu and Zhang, Jiajun and Yu, Bowen and Dang, Kai and others},
  journal={arXiv preprint arXiv:2409.12186},
  year={2024}
}
```