Instructions to use NewstaR/Koss-7B-chat with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use NewstaR/Koss-7B-chat with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="NewstaR/Koss-7B-chat")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("NewstaR/Koss-7B-chat")
model = AutoModelForCausalLM.from_pretrained("NewstaR/Koss-7B-chat")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use NewstaR/Koss-7B-chat with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "NewstaR/Koss-7B-chat"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "NewstaR/Koss-7B-chat",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker
```shell
docker model run hf.co/NewstaR/Koss-7B-chat
```
- SGLang
How to use NewstaR/Koss-7B-chat with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "NewstaR/Koss-7B-chat" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "NewstaR/Koss-7B-chat",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "NewstaR/Koss-7B-chat" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "NewstaR/Koss-7B-chat",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use NewstaR/Koss-7B-chat with Docker Model Runner:
```shell
docker model run hf.co/NewstaR/Koss-7B-chat
```
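
The curl calls to the OpenAI-compatible `/v1/completions` endpoint can also be made from Python. The following is a minimal sketch using only the standard library. It assumes a server is already running locally (port 8000 for vLLM, port 30000 for SGLang). The helper names `build_completion_request` and `complete` are illustrative and not part of either project.

```python
import json
import urllib.request

MODEL_ID = "NewstaR/Koss-7B-chat"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible completion responses carry the text in choices[0].text.
    return body["choices"][0]["text"]
```

With a vLLM server running, `complete("Once upon a time,")` would return the continuation. For SGLang, pass `base_url="http://localhost:30000"`.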
Koss-7B
Training Time: 1.85h
| Model | Average ⬆️ | ARC | HellaSwag | MMLU | TruthfulQA |
|---|---|---|---|---|---|
| NewstaR/Koss-7B-chat 📑 | 55.79 | 53.67 | 78.79 | 46.72 | 43.97 |
Koss-7B is the smallest variant in the Koss series of neural network models developed by Kaleido AI for natural language processing. With 7 billion parameters, it retains much of the architecture and capabilities of the larger Koss models but requires less computation to run.
Koss-7B is intended for general NLP applications including text classification, language generation, question answering, translation, and dialogue. Its small size makes it suitable for applications with constraints on memory, compute, latency, or carbon emissions.
Factors:
- Koss-7B should not be used for tasks requiring very specialized knowledge or skills, since its limited parameter count reduces its expertise in niche domains. For best performance, fine-tune on in-domain data.
- As with all AI systems, Koss-7B's behavior depends on its training data. It may exhibit biases inherited from non-diverse data. Audit the data and apply mitigation strategies to avoid unfair impacts.
- Koss-7B is not a creative agent. Its outputs will be limited to recombinations of patterns in its training data. Do not ascribe human-like agency or consciousness.
Recommended Prompt Template:
<s>[INST] {prompt} [/INST] {response} </s>
or
<s>[INST] {prompt} [/INST]
The model will start its response after the [/INST] tag.
Example:
<s>[INST] Why did the chicken cross the road? [/INST] To get to the other side! </s>
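
As a minimal sketch, the template above can be applied with two small string helpers. The function names are hypothetical, and the code only formats text. Some tokenizers add the beginning-of-sequence token `<s>` on their own, so check the tokenizer's behavior to avoid a duplicated `<s>`.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a user message in the [INST] template, leaving the response open."""
    return f"<s>[INST] {user_message.strip()} [/INST]"


def build_training_example(user_message: str, response: str) -> str:
    """Format a complete prompt/response pair, closed with </s>."""
    return f"{build_prompt(user_message)} {response.strip()} </s>"


print(build_prompt("Why did the chicken cross the road?"))
# <s>[INST] Why did the chicken cross the road? [/INST]
```

The open form from `build_prompt` is the one to send for inference. The closed form matches the full example shown above.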
Loss
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. (all 7 benchmarks below) | 44.98 |
| ARC (25-shot) | 53.67 |
| HellaSwag (10-shot) | 78.79 |
| MMLU (5-shot) | 46.72 |
| TruthfulQA (0-shot) | 43.97 |
| Winogrande (5-shot) | 71.74 |
| GSM8K (5-shot) | 7.35 |
| DROP (3-shot) | 12.62 |
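
The page reports two different averages: 55.79 in the first table and 44.98 here. The arithmetic sketch below shows that both follow from the listed scores, assuming the first is the mean of ARC, HellaSwag, MMLU, and TruthfulQA and the second is the mean of all seven benchmarks. `Decimal` is used so that the rounding to two places is exact.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the tables on this page.
scores = {
    "ARC": Decimal("53.67"),
    "HellaSwag": Decimal("78.79"),
    "MMLU": Decimal("46.72"),
    "TruthfulQA": Decimal("43.97"),
    "Winogrande": Decimal("71.74"),
    "GSM8K": Decimal("7.35"),
    "DROP": Decimal("12.62"),
}


def mean(names):
    """Mean of the named benchmark scores, rounded half-up to two decimals."""
    total = sum((scores[name] for name in names), Decimal("0"))
    return (total / len(names)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


four_benchmarks = ["ARC", "HellaSwag", "MMLU", "TruthfulQA"]
print(mean(four_benchmarks))  # 55.79
print(mean(list(scores)))     # 44.98
```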
