Instructions to use rishiraj/smol-7b with libraries, inference providers, notebooks, and local apps.

Libraries

How to use rishiraj/smol-7b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="rishiraj/smol-7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so the output is visible when run as a script
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("rishiraj/smol-7b")
model = AutoModelForCausalLM.from_pretrained("rishiraj/smol-7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use rishiraj/smol-7b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "rishiraj/smol-7b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rishiraj/smol-7b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
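
The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started by `vllm serve` is listening on localhost:8000 as in the curl call above, and that the response follows the OpenAI chat-completions shape. The helper names `build_chat_request` and `ask` are illustrative, not part of vLLM.

```python
import json
import urllib.request

# Assumed address: the one used in the curl example above.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt, model="rishiraj/smol-7b", base_url=BASE_URL):
    """Build the same POST request as the curl call, without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-compatible servers put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

Once the server is up, `ask("What is the capital of France?")` should return the model's answer as a string. Passing `base_url="http://localhost:30000/v1"` would target a server on port 30000 instead.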

Use Docker

docker model run hf.co/rishiraj/smol-7b

SGLang

How to use rishiraj/smol-7b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "rishiraj/smol-7b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rishiraj/smol-7b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "rishiraj/smol-7b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rishiraj/smol-7b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use rishiraj/smol-7b with Docker Model Runner:
```
docker model run hf.co/rishiraj/smol-7b
```

Smol 7B

This model is a version of openchat/openchat_3.5 fine-tuned on the open-source dataset HuggingFaceH4/no_robots, using the recipes published in The Alignment Handbook.

Model date

rishiraj/smol-7b was trained between 1st and 3rd December, 2023.

Evaluation

The model achieves the following results on the Open LLM Leaderboard. At the time of release, smol-7b was the highest-ranked 7B chat model on the MMLU benchmark.

Model	Average	ARC	HellaSwag	MMLU	TruthfulQA	Winogrande	GSM8K
rishiraj/smol-7b	67.11	63.74	84.77	65	46.17	80.66	62.32
argilla/notus-7b-v1	63.49	64.59	84.83	63.04	54.35	79.56	34.57
Intel/neural-chat-7b-v3-1	61.59	66.21	83.64	62.37	59.65	78.14	19.56
HuggingFaceH4/zephyr-7b-beta	61.59	62.46	84.35	60.7	57.83	77.11	27.07
Qwen/Qwen-7B	59.19	51.37	78.47	59.84	47.79	72.69	44.96
microsoft/Orca-2-7b	54.55	54.1	76.19	56.37	52.45	73.48	14.71
01-ai/Yi-6B	54.08	55.55	76.57	64.11	41.96	74.19	12.13
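
The Average column appears to be the unweighted mean of the six benchmark scores. That is an inference from the numbers rather than something the card states; the sketch below checks it for two rows copied from the table. The function name `leaderboard_average` is illustrative.

```python
# Scores copied from the table, in column order:
# ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K
scores = {
    "rishiraj/smol-7b": [63.74, 84.77, 65, 46.17, 80.66, 62.32],
    "argilla/notus-7b-v1": [64.59, 84.83, 63.04, 54.35, 79.56, 34.57],
}


def leaderboard_average(benchmark_scores):
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(sum(benchmark_scores) / len(benchmark_scores), 2)


for model_name, row in scores.items():
    print(model_name, leaderboard_average(row))
```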

Inference procedure

Here's how you can run the model using the pipeline() function from 🤗 Transformers:

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="rishiraj/smol-7b", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate"
    },
    {
        "role": "user",
        "content": "How many helicopters can a human eat in one sitting?"
    }
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
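
When the pipeline is called with a string prompt, `generated_text` contains the prompt followed by the completion by default. A minimal sketch for keeping only the reply is shown here; `strip_prompt` is a hypothetical helper, and the strings in the example are placeholders rather than real model output.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the newly generated part of a pipeline result."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    # If the prompt is not a prefix, leave the text untouched.
    return generated_text


# Placeholder strings for illustration only.
example_prompt = "<formatted chat prompt>"
example_output = example_prompt + " Arr, matey!"
print(strip_prompt(example_output, example_prompt))
```

With the variables from the snippet above, this would be called as `strip_prompt(outputs[0]["generated_text"], prompt)`. Passing `return_full_text=False` to the pipeline call is an alternative that should have the same effect.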

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 4
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 128
total_train_batch_size: 512
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
num_epochs: 1
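
The batch-size figures above can be related with a short calculation. This sketch assumes the usual convention that the total batch size equals the per-device batch size times the gradient accumulation steps times the number of processes; the card does not state the number of devices, so the value computed here is only what the reported numbers imply under that assumption.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 4
gradient_accumulation_steps = 128
total_train_batch_size = 512


def effective_batch_size(per_device, accumulation_steps, num_processes):
    """Examples consumed per optimizer step under the assumed convention."""
    return per_device * accumulation_steps * num_processes


# Number of processes implied by the reported totals (an inference, not a reported value).
implied_processes = total_train_batch_size // (
    train_batch_size * gradient_accumulation_steps
)
print(implied_processes)
print(effective_batch_size(train_batch_size, gradient_accumulation_steps, implied_processes))
```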

Training results

Training Loss	Epoch	Step	Validation Loss
2.0569	0.16	3	2.0409

Framework versions

Transformers 4.35.2
PyTorch 2.1.1+cu121
Datasets 2.14.6
Tokenizers 0.14.1

Citation Information

@misc{rishiraj2023smol,
  author = {Rishiraj Acharya},
  title = {Smol 7B},
  year = {2023},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/rishiraj/smol-7b}}
}


Safetensors

Model size: 7B params
Tensor type: BF16

Model tree for rishiraj/smol-7b

Base model: openchat/openchat_3.5
Finetuned (28): this model
Merges: 2 models
Quantizations: 3 models
