Prerequisites #
Our guides are written with the expectation that the following requirements are met:
- Server: a VPS or dedicated server (e.g., Vultr or Contabo)
- Ubuntu 22.04 installed and configured
- Docker/Docker Compose and Nginx installed
- SSH client with SFTP capabilities
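You can quickly confirm these basics are in place by printing the OS and tool versions; depending on how Compose was installed, the command may be docker compose version or docker-compose --version:
lsb_release -d
docker --version
docker compose version
nginx -v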
Install Ollama #
Installing Ollama is really just a one-liner:
curl -fsSL https://ollama.com/install.sh | sh
Verify the installation:
ollama
Output:
Usage:
ollama [flags]
ollama [command]
Available Commands:
serve Start ollama
...
Set up Ollama to start with the server #
Add a user for Ollama:
sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama
Create a systemd service file:
sudo nano /etc/systemd/system/ollama.service
Hint: save with Ctrl+O, exit with Ctrl+X
Write the following into the file:
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
[Install]
WantedBy=default.target
Reload systemd and enable the service so it starts with the server:
sudo systemctl daemon-reload
sudo systemctl enable ollama
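Enabling the unit only makes it start on the next boot. To bring Ollama up right away and confirm it is running, you can also start the service and check its status:
sudo systemctl start ollama
systemctl status ollama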
Running a model #
You can download and start any model in interactive playground mode, as simply as:
ollama run llama3.1:8b
Leave the playground chat:
/bye
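If you only want to download a model for later API use, or see which models are already on the server, the Ollama CLI covers that as well:
ollama pull llama3.1:8b   # download without opening the chat
ollama list               # show locally available models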
Use Ollama as an API #
The best way to use Ollama is via its HTTP API, which listens on port 11434 by default.
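Before putting Nginx in front of it, you can sanity-check the API locally with curl. This sketch calls the /api/generate endpoint with the llama3.1:8b model pulled earlier:
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'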
Open the Nginx config with the following command:
sudo nano /etc/nginx/nginx.conf
Hint: you can save with Ctrl+O and exit with Ctrl+X (on Mac: Command+O / Command+X).
Use the following Nginx conf:
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 768;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    client_max_body_size 10M;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    gzip on;
    gzip_disable "msie6";

    server {
        listen 80;
        server_name yourdomain.com;

        location / {
            proxy_set_header Host $host;
            proxy_pass http://127.0.0.1:11434;
            proxy_redirect off;
        }
    }
}
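After saving, validate the configuration and reload Nginx so the changes take effect:
sudo nginx -t
sudo systemctl reload nginx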
Access the Ollama API from JavaScript #
Here is a simple example of how you can integrate the Ollama API into a JavaScript application using the ollama package:
npm install ollama
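If you are starting from an empty directory, you may first want to initialize a project and enable ES modules so the import syntax below works (adjust to your own setup):
npm init -y
npm pkg set type=module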
import { Ollama } from "ollama";

// Include the protocol so requests go through Nginx on port 80
const ollama = new Ollama({ host: "http://yourdomain.com" });

const start = async () => {
  const response = await ollama.chat({
    model: "llama3.1",
    messages: [
      {
        role: "system",
        content: `You are an artificial intelligence assistant; answer questions to the best of your knowledge.`,
      },
      {
        role: "user",
        content: `Why is the sky blue?`,
      },
    ],
    stream: false,
    options: {
      num_keep: 5500, // Number of prompt tokens to keep when the context overflows
      // seed: 42, // Fixed random seed: the same prompt always produces the same answer
      num_predict: 5500, // Maximum number of tokens to generate
      num_ctx: 15000, // Context window size in tokens
      top_k: 20, // Sample only from the 20 most likely tokens
      top_p: 0.8, // Nucleus sampling threshold
      tfs_z: 0.5, // Tail-free sampling
      typical_p: 0.9, // Locally typical sampling (1.0 disables it)
      repeat_last_n: 33, // How far back to look when penalizing repetition
      repeat_penalty: 1.3, // Penalty applied to repeated tokens
      use_mlock: true, // Lock the model in memory to avoid swapping
      num_thread: 15, // Number of CPU threads to use for generation
    },
  });
  console.log(response);
  console.log(response.message.content);
};

start().catch(console.error);
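Assuming the snippet is saved as chat.js (the filename is just an example) inside the project initialized above, you can run it with Node.js:
node chat.js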