How to use datatab/Yugo60-GPT with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="datatab/Yugo60-GPT")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("datatab/Yugo60-GPT")
model = AutoModelForCausalLM.from_pretrained("datatab/Yugo60-GPT")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

How to use datatab/Yugo60-GPT with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "datatab/Yugo60-GPT"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "datatab/Yugo60-GPT",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Use Docker:
docker model run hf.co/datatab/Yugo60-GPT
How to use datatab/Yugo60-GPT with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "datatab/Yugo60-GPT" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "datatab/Yugo60-GPT",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Or start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "datatab/Yugo60-GPT" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "datatab/Yugo60-GPT",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'

How to use datatab/Yugo60-GPT with Docker Model Runner:
docker model run hf.co/datatab/Yugo60-GPT
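
The curl calls above can also be made from Python. The following is a minimal sketch using only the standard library; it assumes one of the servers shown above is already running, and the helper names `build_chat_request` and `chat` are hypothetical, not part of vLLM or SGLang. The response parsing assumes the usual OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000", model="datatab/Yugo60-GPT"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the first choice's text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the vLLM server from above, `chat("What is the capital of France?")` would query port 8000; for the SGLang server, pass `base_url="http://localhost:30000"`.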
Results obtained through the Serbian LLM evaluation, released by Aleksa Gordić: serbian-llm-eval
- Evaluation was conducted on a 4-bit version of the model due to hardware resource constraints.
| MODEL | ARC-E | ARC-C | Hellaswag | BoolQ | Winogrande | OpenbookQA | PiQA |
|---|---|---|---|---|---|---|---|
| *Yugo55-GPT-v4-4bit | 51.41 | 36.00 | 57.51 | 80.92 | 65.75 | 34.70 | 70.54 |
| Yugo55A-GPT | 51.52 | 37.78 | 57.52 | 84.40 | 65.43 | 35.60 | 69.43 |
| Yugo60-GPT | tbd | tbd | tbd | tbd | tbd | tbd | tbd |
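
The note above says the evaluation ran on a 4-bit version of the model, but it does not say how that version was produced. As a rough sketch only, one common way to load a model with 4-bit weights is through `BitsAndBytesConfig` in Transformers. The NF4 and double-quantization settings below are assumptions for illustration, not the settings used for the reported scores, and the helper names are hypothetical.

```python
def four_bit_settings():
    """Quantization settings as plain data (assumed values, not taken from the model card)."""
    return {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": True,
    }


def load_model_4bit(model_id="datatab/Yugo60-GPT"):
    """Load the model with 4-bit weights; needs a CUDA GPU plus bitsandbytes and accelerate."""
    # Imported lazily so the settings above can be inspected without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.float16,
        **four_bit_settings(),
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=quant_config, device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, tokenizer
```

Calling `model, tokenizer = load_model_4bit()` on a GPU machine would download the checkpoint and quantize it at load time.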
!pip -q install git+https://github.com/huggingface/transformers
!pip install -q datasets loralib sentencepiece
!pip -q install bitsandbytes accelerate
from IPython.display import HTML, display
def set_css():
    # Wrap long output lines in notebook cells instead of scrolling horizontally.
    display(HTML('''
    <style>
    pre {
        white-space: pre-wrap;
    }
    </style>
    '''))

get_ipython().events.register('pre_run_cell', set_css)
import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
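# device_map="auto" (provided by accelerate) places the weights on the GPU when one is available.
model = AutoModelForCausalLM.from_pretrained(
    "datatab/Yugo60-GPT", torch_dtype="auto", device_map="auto"
)
# The tokenizer has no weights, so it takes no torch_dtype argument.
tokenizer = AutoTokenizer.from_pretrained("datatab/Yugo60-GPT")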
from typing import Optional
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
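# Default system prompt (Serbian). In English: "Below is an instruction that describes a task,
# paired with an input that provides additional context. Write a response that appropriately
# completes the request."
DEFAULT_SYSTEM_PROMPT = "Ispod je uputstvo koje opisuje zadatak, upareno sa unosom koji pruža dodatni kontekst. Napišite odgovor koji na odgovarajući način kompletira zahtev."

def generate(user_content: str, system_content: Optional[str] = None) -> str:
    # Fall back to the default system prompt when the caller does not supply one.
    if not system_content:
        system_content = DEFAULT_SYSTEM_PROMPT
    messages = [
        {"role": "system", "content": system_content},
        {"role": "user", "content": user_content},
    ]
    # return_dict=True yields input_ids and attention_mask, passed to generate() as keywords.
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    # Stream tokens to stdout as they are produced, skipping the prompt.
    text_streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    output = model.generate(
        **inputs,
        streamer=text_streamer,
        max_new_tokens=2048,
        temperature=0.1,
        repetition_penalty=1.11,
        top_p=0.92,
        top_k=1000,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
        do_sample=True,
    )
    # The decoded text includes the prompt as well as the model's answer.
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    return generated_text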
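# "List all the planets of the solar system and tell me which planet is the largest"
generate("Nabroj mi sve planete suncevog sistema i reci mi koja je najveca planeta")
# "What is the difference between a llama, a vicuña, and an alpaca?"
generate("Koja je razlika između lame, vikune i alpake?")
# "Write a short email to Sam Altman giving reasons for open-sourcing GPT-4"
generate("Napišite kratku e-poruku Semu Altmanu dajući razloge za GPT-4 otvorenog koda")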
Base model
mlabonne/Monarch-7B