---
license: apache-2.0
---
|
|
# Download the llamafile
- Download the llamafile from https://huggingface.co/avilum/llamafile-python-openai-template/blob/main/TinyLlama-1.1B.llamafile
- Use the download button.
|
|
# Run the server
```shell
chmod +x TinyLlama-1.1B.llamafile

./TinyLlama-1.1B.llamafile --server --host 0.0.0.0 --port 1234
```
|
|
# Use the LLM with the OpenAI SDK
```python
from openai import OpenAI

# Point the client at the local llamafile server; the server ignores the
# API key, but the SDK requires a non-empty value
client = OpenAI(base_url="http://127.0.0.1:1234/v1", api_key="test")

# Prompt
prompt = "Hi, tell me something new about AppSec"

# Send API request to llamafile server
stream = client.chat.completions.create(
    model="avi-llmsky",
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)

# Print the streamed response as it arrives
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
```
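For reference, the SDK call above is just an HTTP POST to the server's OpenAI-compatible `/v1/chat/completions` route. A minimal sketch of the request body it sends, built with the standard library only so it can be inspected without a running server (llamafile serves a single embedded model, so the `"model"` field is effectively a label):

```python
import json

# Endpoint the OpenAI SDK targets when base_url is the llamafile server
url = "http://127.0.0.1:1234/v1/chat/completions"

# Same parameters as the client.chat.completions.create(...) call above
payload = {
    "model": "avi-llmsky",
    "messages": [{"role": "user", "content": "Hi, tell me something new about AppSec"}],
    "stream": True,
}

body = json.dumps(payload)
print(body)
```

POSTing `body` to `url` with a `Content-Type: application/json` header (e.g. via `curl -d` or `urllib.request`) yields the same token stream the SDK consumes.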