Add Sentence Transformers support (#3)

Opened by tomaarsen (HF Staff)
- 1_Pooling/config.json +9 -0
- README.md +19 -0
- config_sentence_transformers.json +7 -0
- modules.json +14 -0
1_Pooling/config.json (added)

```json
{
  "word_embedding_dimension": 2048,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false
}
```
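With `pooling_mode_mean_tokens` set to true and every other mode false, the Pooling module averages the encoder's per-token vectors into one 2048-dimensional embedding. Padding positions are skipped via the attention mask. This explains why the README's Transformers example reports a per-token tensor of shape `[13, 2048]`, while its Sentence Transformers example reports a single 2048-dimensional vector.

The sketch below reproduces that masked mean in plain Python on toy 2-dimensional vectors. The helpers `enabled_modes` and `mean_pool` are hypothetical. The real module works on batched PyTorch tensors rather than lists.

```python
import json
from typing import List, Sequence

# Copy of the pooling config added in this PR.
POOLING_CONFIG = json.loads("""
{
  "word_embedding_dimension": 2048,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false
}
""")


def enabled_modes(config: dict) -> List[str]:
    """List the pooling modes switched on in a Pooling config (hypothetical helper)."""
    return [key for key, value in config.items()
            if key.startswith("pooling_mode_") and value]


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1; padding is skipped."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("expected one attention-mask entry per token")
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    kept = 0
    for vector, mask in zip(token_embeddings, attention_mask):
        if mask:
            for i, value in enumerate(vector):
                totals[i] += value
            kept += 1
    # Avoid dividing by zero if every position is padding.
    kept = max(kept, 1)
    return [total / kept for total in totals]


print(enabled_modes(POOLING_CONFIG))
# Toy 2-dimensional "token embeddings"; the third row is padding and is ignored.
print(mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0]))
```

Only the mean mode is enabled here. The other pooling flags are included in the config so that the module's settings are stated explicitly.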
README.md (changed)

````diff
@@ -3,6 +3,8 @@ license: apache-2.0
 datasets:
 - bigcode/the-stack-dedup
 library_name: transformers
+tags:
+- sentence-transformers
 language:
 - code
 ---
@@ -24,6 +26,8 @@ This checkpoint is first trained on code data via masked language modeling (MLM)
 ### How to use
 This checkpoint consists of an encoder (1.3B model), which can be used to extract code embeddings of 2048 dimension. It can be easily loaded using the AutoModel functionality and employs the Starcoder tokenizer (https://arxiv.org/pdf/2305.06161.pdf).
 
+### Transformers
+
 ```
 from transformers import AutoModel, AutoTokenizer
 
@@ -39,6 +43,21 @@ print(f'Dimension of the embedding: {embedding[0].size()}')
 # Dimension of the embedding: torch.Size([13, 2048])
 ```
 
+### Sentence Transformers
+
+```
+from sentence_transformers import SentenceTransformer
+
+checkpoint = "codesage/codesage-large"
+device = "cuda" # for GPU usage or "cpu" for CPU usage
+
+model = SentenceTransformer(checkpoint, device=device, trust_remote_code=True)
+
+embedding = model.encode("def print_hello_world():\tprint('Hello World!')")
+print(f'Dimension of the embedding: {embedding.size}')
+# Dimension of the embedding: 2048
+```
+
 ### BibTeX entry and citation info
 ```
 @inproceedings{
````
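Once snippets are embedded with `model.encode`, they can be compared numerically. The sketch below assumes cosine similarity. This is a common choice for embedding models, but the README does not prescribe a metric.

The functions `cosine_similarity` and `rank_snippets` are hypothetical, and the toy 3-dimensional vectors stand in for real 2048-dimensional embeddings. With the real model, passing a list of strings to `model.encode` returns a NumPy array with one row per string. Sentence Transformers also provides `sentence_transformers.util.cos_sim` for this computation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_snippets(query: Sequence[float],
                  snippets: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (snippet, score) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vector)) for name, vector in snippets.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for CodeSage embeddings.
query = [1.0, 0.0, 1.0]
snippets = {
    "def add(a, b): return a + b": [0.9, 0.1, 1.1],
    "print('Hello World!')": [0.0, 1.0, 0.0],
}
for name, score in rank_snippets(query, snippets):
    print(f"{score:.3f}  {name}")
```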
config_sentence_transformers.json (added)

```json
{
  "__version__": {
    "sentence_transformers": "2.4.0.dev0",
    "transformers": "4.37.0",
    "pytorch": "2.1.0+cu121"
  }
}
```
modules.json (added)

```json
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  }
]
```
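`modules.json` tells Sentence Transformers which modules to chain and where each one's files live. The first is a `Transformer` module loaded from the repository root (empty `path`). It is followed by the `Pooling` module configured in `1_Pooling/config.json`.

The sketch below only parses the file to show that order. `module_pipeline` is a hypothetical helper, not part of the library, which additionally imports and instantiates each listed class. The sketch assumes modules run in the order they are listed; in this file that order agrees with `idx`.

```python
import json
from typing import List, Tuple

# Copy of the modules.json added in this PR.
MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"}
]
"""


def module_pipeline(raw: str) -> List[Tuple[str, str]]:
    """Return (class name, directory) pairs in listed order (hypothetical helper)."""
    pipeline = []
    for entry in json.loads(raw):
        # Keep only the class name from the fully qualified type path.
        class_name = entry["type"].rsplit(".", 1)[-1]
        # An empty path means the module's files sit at the repository root.
        directory = entry["path"] or "."
        pipeline.append((class_name, directory))
    return pipeline


for class_name, directory in module_pipeline(MODULES_JSON):
    print(f"{class_name:<12} <- {directory}")
```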