Prerequisites #
Our guides are written with the expectation that the following requirements are met:
- Server: a VPS or dedicated server (e.g., Vultr or Contabo)
- Ubuntu 22.04 installed and configured
- Docker/Docker Compose and Nginx installed
- SSH client with SFTP capabilities
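You can quickly confirm these basics are in place by printing the OS and tool versions; depending on how Compose was installed, the command may be docker compose version or docker-compose --version:
lsb_release -d
docker --version
docker compose version
nginx -v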
Install Ollama #
Installing Ollama is really just a one-liner:
curl -fsSL https://ollama.com/install.sh | sh
Verify the installation:
ollama
Output:
Usage:
ollama [flags]
ollama [command]
Available Commands:
serve Start ollama
...
Set up Ollama to start with the server #
Add a user for Ollama:
sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama
Create a systemd service file:
sudo nano /etc/systemd/system/ollama.service
Hint: save with Ctrl+O, exit with Ctrl+X
Write the following into the file:
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
[Install]
WantedBy=default.target
Reload systemd and enable the service so it starts with the server:
sudo systemctl daemon-reload
sudo systemctl enable ollama
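Enabling the unit only makes it start on the next boot. To bring Ollama up right away and confirm it is running, you can also start the service and check its status:
sudo systemctl start ollama
systemctl status ollama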
Running a model #
You can download and start any model in interactive playground mode, as simply as:
ollama run llama3.1:8b
Leave the playground chat:
/bye
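If you only want to download a model for later API use, or see which models are already on the server, the Ollama CLI covers that as well:
ollama pull llama3.1:8b   # download without opening the chat
ollama list               # show locally available models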
Use Ollama as an API #
The best way to use Ollama is via its HTTP API, which listens on port 11434 by default.
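Before putting Nginx in front of it, you can sanity-check the API locally with curl. This sketch calls the /api/generate endpoint with the llama3.1:8b model pulled earlier:
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'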
Open the Nginx config with the following command:
sudo nano /etc/nginx/nginx.conf
Hint: you can save with Ctrl+O and exit with Ctrl+X (on Mac: Command+O / Command+X).
Use the following Nginx conf:
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 768;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    client_max_body_size 10M;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    gzip on;
    gzip_disable "msie6";

    server {
        listen 80;
        server_name yourdomain.com;

        location / {
            proxy_set_header Host $host;
            proxy_pass http://127.0.0.1:11434;
            proxy_redirect off;
        }
    }
}
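After saving, validate the configuration and reload Nginx so the changes take effect:
sudo nginx -t
sudo systemctl reload nginx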
Access the Ollama API from JavaScript #
Here is a simple example of how you can integrate the Ollama API into a JavaScript application using the ollama package:
npm install ollama
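If you are starting from an empty directory, you may first want to initialize a project and enable ES modules so the import syntax below works (adjust to your own setup):
npm init -y
npm pkg set type=module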
import { Ollama } from "ollama";

// Include the protocol so requests go through Nginx on port 80
const ollama = new Ollama({ host: "http://yourdomain.com" });

const start = async () => {
  const response = await ollama.chat({
    model: "llama3.1",
    messages: [
      {
        role: "system",
        content: `You are an artificial intelligence assistant; answer questions to the best of your knowledge.`,
      },
      {
        role: "user",
        content: `Why is the sky blue?`,
      },
    ],
    stream: false,
    options: {
      num_keep: 5500, // Number of prompt tokens to keep when the context overflows
      // seed: 42, // Fixed random seed: the same prompt always produces the same answer
      num_predict: 5500, // Maximum number of tokens to generate
      num_ctx: 15000, // Context window size in tokens
      top_k: 20, // Sample only from the 20 most likely tokens
      top_p: 0.8, // Nucleus sampling threshold
      tfs_z: 0.5, // Tail-free sampling
      typical_p: 0.9, // Locally typical sampling (1.0 disables it)
      repeat_last_n: 33, // How far back to look when penalizing repetition
      repeat_penalty: 1.3, // Penalty applied to repeated tokens
      use_mlock: true, // Lock the model in memory to avoid swapping
      num_thread: 15, // Number of CPU threads to use for generation
    },
  });
  console.log(response);
  console.log(response.message.content);
};

start().catch(console.error);
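Assuming the snippet is saved as chat.js (the filename is just an example) inside the project initialized above, you can run it with Node.js:
node chat.js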