| |
|
| | --- |
| | license: bigcode-openrail-m |
| | datasets: |
| | - bigcode/guanaco-commits |
| | metrics: |
| | - code_eval |
| | library_name: peft |
| | tags: |
| | - code |
| | --- |
| | # Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models |
| | <p align="center" width="100%"> |
| | <a ><img src="https://github.com/bigcode-project/astraios/blob/main/visuals/banner.png?raw=true" alt="Astraios" style="width: 20%; min-width: 300px; display: block; margin: auto;"></a> |
| | </p> |
| |
|
| | # Table of Contents |
| |
|
| | 1. [Model Summary](#model-summary) |
| | 2. [Use](#use) |
| | 3. [Training](#training) |
| | 4. [Citation](#citation) |
| |
|
| | # Model Summary |
| |
|
| | > Astraios-AdapterP is an instruction-tuned model with 15.5B parameters, created by fine-tuning StarCoderBase on CommitPackFT & OASST as described in the Astraios paper. |
| |
|
| | - **Repository:** [bigcode-project/astraios](https://github.com/bigcode-project/astraios) |
| | - **Paper:** [Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models]() |
| | - **Languages:** 80+ Programming languages |
| | - **✨Astraios:** |
| | <table> |
| | <tr> |
| | <th>Data</th> |
| | <td><a href=https://huggingface.co/datasets/bigcode/guanaco-commits>CommitPackFT+OASST</a></td> |
| | <td>Filtered version of CommitPack and OASST for high-quality commit messages that resemble instructions</td> |
| | </tr> |
| | <tr> |
| | <th>Model</th> |
| | <td><a href=https://huggingface.co/collections/bigcode/astraios-1b-6576ff1b8e449026ae327c1c>Astraios-1B</a></td> |
| | <td>Collection of StarCoderBase-1B (1B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods</td> |
| | </tr> |
| | <tr> |
| | <th></th> |
| | <td><a href=https://huggingface.co/collections/bigcode/astraios-3b-6577127317ee44ff547252d3>Astraios-3B</a></td> |
| | <td>Collection of StarCoderBase-3B (3B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods</td> |
| | </tr> |
| | <tr> |
| | <th></th> |
| | <td><a href=https://huggingface.co/collections/starpeft/starcoderbase-7b-650c1f028b45cfec8e72c265>Astraios-7B</a></td> |
| | <td>Collection of StarCoderBase-7B (7B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods</td> |
| | </tr> |
| | <tr> |
| | <th></th> |
| | <td><a href=https://huggingface.co/collections/bigcode/astraios-16b-65788b7476b6de79781054cc>Astraios-16B</a></td> |
| | <td>Collection of StarCoderBase-16B (16B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods</td> |
| | </tr> |
| | <tr> |
| | <th>Evaluation</th> |
| | <td><a href=https://huggingface.co/datasets/code_x_glue_cc_clone_detection_big_clone_bench>BigCloneBench</a></td> |
| | <td>Dataset for clone detection; we use 2,000 samples for evaluation</td> |
| | </tr> |
| | <tr> |
| | <th></th> |
| | <td><a href=https://huggingface.co/datasets/code_x_glue_cc_defect_detection>Devign</a></td> |
| | <td>Dataset for defect detection; we use 2,000 samples for evaluation</td> |
| | </tr> |
| | <tr> |
| | <th></th> |
| | <td><a href=https://huggingface.co/datasets/bigcode/humanevalpack>HumanEvalPack</a></td> |
| | <td>Extension of OpenAI's HumanEval to cover 3 scenarios across 6 languages</td> |
| | </tr> |
| | <tr> |
| | <th></th> |
| | <td><a href=https://huggingface.co/datasets/RaymondLi/perturbed_humaneval>ReCode</a></td> |
| | <td>Dataset for the robustness of code generation, covering 4 variants</td> |
| | </tr> |
| | <tr> |
| | <th></th> |
| | <td><a href=https://huggingface.co/datasets/moyix/asleep_keyboard>Asleep At The Keyboard</a></td> |
| | <td>Datasets for security of code generation; we use DoW for evaluation</td> |
| | </tr> |
| | </table> |
| |
|
| |
|
| | # Use |
| |
|
| | ## Intended use |
| |
|
| | The model follows instructions provided in the input. You should always preface your input with "Question: " and finish it with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort. |
| |
|
| | Answer:" |
| |
|
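The prompt format described above can be wrapped in a small helper. This is a minimal sketch, not code from the Astraios repository: `build_prompt` is a hypothetical name, and the single blank line between the question and `Answer:` is an assumption taken from the example above.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the "Question: ... Answer:" format.

    Assumption: the question and the "Answer:" marker are separated by
    one blank line, as in the example in this model card.
    """
    return f"Question: {instruction.strip()}\n\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("Please write a function in Python that performs bubble sort."))
```

Building the prompt in one place keeps the prefix and suffix consistent across calls, which matters because the model was tuned on this exact layout.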
| | **Feel free to share your generations in the Community tab!** |
| |
|
| | ## Generation |
| | ```python |
| | # pip install -q transformers |
| | # pip install -e git+https://github.com/bigcode-project/astraios#subdirectory=peft |
| | from peft import PeftModel |
| | from transformers import AutoModelForCausalLM, AutoTokenizer |
| | |
| | peft_checkpoint = "bigcode/astraios-adapterp" |
| | checkpoint = "bigcode/starcoderbase" |
| | device = "cuda"  # use "cpu" to run without a GPU |
| | |
| | tokenizer = AutoTokenizer.from_pretrained(checkpoint) |
| | # Load the base model once, then attach the adapter weights on top of it. |
| | model = AutoModelForCausalLM.from_pretrained(checkpoint) |
| | model = PeftModel.from_pretrained(model, peft_checkpoint).to(device) |
| | |
| | # The prompt starts with "Question: " and ends with "Answer:". |
| | prompt = "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:" |
| | inputs = tokenizer.encode(prompt, return_tensors="pt").to(device) |
| | # Pass input_ids by keyword; some PEFT versions reject positional arguments here. |
| | outputs = model.generate(input_ids=inputs) |
| | print(tokenizer.decode(outputs[0])) |
| | ``` |
| |
|
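The decoded output of the generation snippet contains the prompt followed by the model's answer. A minimal sketch of post-processing that keeps only the answer is shown below. `extract_answer` is a hypothetical helper, not part of the Astraios code, and the default end-of-text string `<|endoftext|>` is an assumption about the tokenizer; passing `tokenizer.eos_token` explicitly avoids relying on it.

```python
def extract_answer(decoded: str, eos_token: str = "<|endoftext|>") -> str:
    """Return the text after the first "Answer:" marker.

    Assumption: `eos_token` matches the tokenizer's end-of-text token.
    If the marker is missing, the whole string is treated as the answer.
    """
    _, marker, answer = decoded.partition("Answer:")
    if not marker:
        answer = decoded
    # Drop the end-of-text token and anything after it, if present.
    answer = answer.split(eos_token, 1)[0]
    return answer.strip()


if __name__ == "__main__":
    sample = "Question: Say hi.\n\nAnswer: print('hi')<|endoftext|>"
    print(extract_answer(sample))
```

With the snippet above, this would be called as `extract_answer(tokenizer.decode(outputs[0]), tokenizer.eos_token)`.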
| | # Training |
| |
|
| | ## Model |
| |
|
| | - **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective |
| | - **Steps:** 250k pretraining & 200 instruction tuning |
| | - **Precision:** fp32 |
| |
|
| | ## Hardware |
| |
|
| | - **Pretraining:** |
| | - **GPUs:** 512 Tesla A100 |
| | - **Training time:** 24 days |
| | - **Instruction tuning:** |
| | - **GPUs:** 8 Tesla A100 |
| |
|
| | ## Software |
| |
|
| | - **Orchestration:** [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training) |
| | - **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch) |
| |
|
| | # Citation |
| |
|
| | ```bibtex |
| | ``` |
| |
|