Quick Start Guide: Matrix42 LLM for On-Premise Deployment
Overview
This document provides installation and configuration instructions for the Matrix42 Large Language Model (LLM) for on-premise deployment.
Prerequisites
To configure the Matrix42 LLM, you must meet the following requirements:
- Docker installed on your system.
- For GPU acceleration, an NVIDIA GPU with:
  - the NVIDIA driver installed on the host system;
  - the NVIDIA Container Toolkit installed (see Installing the NVIDIA Container Toolkit).
Installation
Use the Pre-Built Image (Recommended)
The container image is distributed as a split tar archive. To install, do the following:
```bash
# 1. Combine and extract the split archive
cat localgenai_1.0.2.000.tar localgenai_1.0.2.001.tar localgenai_1.0.2.002.tar localgenai_1.0.2.003.tar > localgenai_1.0.2.tar
# 2. Load the image into Docker
docker load -i localgenai_1.0.2.tar
```
Run the Server
GPU Version
```bash
# Start with GPU acceleration
docker run --gpus all -p 8010:8010 -p 8011:8011 -e API_KEY=your-secret-key localgenai:1.0.2
```
The server will be available at:
- http://localhost:8010 - Legacy API (AiCore 1.0.x compatibility)
- http://localhost:8011 - New API (AiCore 1.1.x)
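For repeatable deployments, the run command above can also be captured in a Compose file. The following is a sketch, not shipped configuration: the service name and file layout are assumptions, while the image, ports, and API_KEY variable come from the run command above.

```yaml
services:
  localgenai:
    image: localgenai:1.0.2
    ports:
      - "8010:8010"
      - "8011:8011"
    environment:
      - API_KEY=your-secret-key
    # GPU reservation equivalent to "--gpus all" (requires the
    # NVIDIA Container Toolkit, as noted in Prerequisites).
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Start the service with docker compose up -d from the directory containing the file.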
API Endpoints
All API requests require authentication using the `Authorization: Bearer` header with your API key.
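To avoid pasting the key into every command in the examples below, it can first be kept in an environment variable. This is a shell convention, not something the API requires; the placeholder key is the same one used throughout this guide.

```shell
# Store the API key once in the current shell session...
export API_KEY="your-secret-key"
# ...and build the Bearer header from it for reuse with curl.
AUTH_HEADER="Authorization: Bearer ${API_KEY}"
echo "$AUTH_HEADER"   # prints: Authorization: Bearer your-secret-key
```

Requests can then pass `-H "$AUTH_HEADER"` instead of spelling the header out each time.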
Port 8011 - New API (AiCore 1.1.x)
Endpoints are organized by service prefix: /llm, /embeddings, /reranker.
| Endpoint | Description | Auth. Required |
|---|---|---|
| /llm/v1/chat/completions | Chat completions (OpenAI-compatible) | Yes |
| /llm/v1/models | List LLM models | Yes |
| /llm/tokenize | Tokenize text | Yes |
| /llm/detokenize | Detokenize tokens | Yes |
| /llm/health | LLM service health | No |
| /embeddings/v1/embeddings | Generate text embeddings | Yes |
| /embeddings/v1/models | List embedding models | Yes |
| /embeddings/tokenize | Tokenize text | Yes |
| /embeddings/detokenize | Detokenize tokens | Yes |
| /embeddings/health | Embeddings service health | No |
| /reranker/v1/reranking | Rerank documents by relevance | Yes |
| /reranker/v1/models | List reranker models | Yes |
| /reranker/tokenize | Tokenize text | Yes |
| /reranker/detokenize | Detokenize tokens | Yes |
| /reranker/health | Reranker service health | No |
| /status | Global service health status | No |
| /health | Simple health check | No |
Port 8010 - Legacy API (AiCore 1.0.x)
For backward compatibility with AiCore 1.0.x clients.
| Endpoint | Description | Auth. Required |
|---|---|---|
| /v1/chat/completions | Chat completions (OpenAI-compatible) | Yes |
| /v1/models | List available models | Yes |
| /v1/embeddings | Generate text embeddings | Yes |
| /v1/reranking | Rerank documents by relevance | Yes |
| /tokenize | Tokenize text (completions model) | Yes |
| /tokenize-embeddings | Tokenize text (embeddings model) | Yes |
| /detokenize-embeddings | Detokenize tokens (embeddings model) | Yes |
| /status | Service health status | No |
| /health | Simple health check | No |
Usage Examples (New API - Port 8011)
Chat Completions
```bash
curl http://localhost:8011/llm/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```
List LLM Models
```bash
curl http://localhost:8011/llm/v1/models \
-H "Authorization: Bearer your-secret-key"
```
Embeddings
```bash
curl http://localhost:8011/embeddings/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"input": "Your text here"}'
```
List Embedding Models
```bash
curl http://localhost:8011/embeddings/v1/models \
-H "Authorization: Bearer your-secret-key"
```
Reranking
```bash
curl http://localhost:8011/reranker/v1/reranking \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"query": "search query", "documents": ["doc1", "doc2"]}'
```
List Reranker Models
```bash
curl http://localhost:8011/reranker/v1/models \
-H "Authorization: Bearer your-secret-key"
```
Tokenize (LLM)
```bash
curl http://localhost:8011/llm/tokenize \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"content": "Text to tokenize"}'
```
Detokenize (LLM)
```bash
curl http://localhost:8011/llm/detokenize \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"tokens": [1874,311,1464,4476,553]}'
```
Tokenize (Embeddings)
```bash
curl http://localhost:8011/embeddings/tokenize \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"content": "Text to tokenize"}'
```
Detokenize (Embeddings)
```bash
curl http://localhost:8011/embeddings/detokenize \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"tokens": [24129,47,47,1098,20650]}'
```
Tokenize (Reranker)
```bash
curl http://localhost:8011/reranker/tokenize \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"content": "Text to tokenize"}'
```
Detokenize (Reranker)
```bash
curl http://localhost:8011/reranker/detokenize \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"tokens": [24129,47,47,1098,20650]}'
```
Usage Examples (Legacy API - Port 8010)
Chat Completions
```bash
curl http://localhost:8010/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```
List Models
```bash
curl http://localhost:8010/v1/models \
-H "Authorization: Bearer your-secret-key"
```
Embeddings
```bash
curl http://localhost:8010/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"input": "Your text here"}'
```
Reranking
```bash
curl http://localhost:8010/v1/reranking \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"query": "search query", "documents": ["doc1", "doc2"]}'
```
Tokenize (Completions Model)
```bash
curl http://localhost:8010/tokenize \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"content": "Text to tokenize"}'
```
Tokenize (Embeddings Model)
```bash
curl http://localhost:8010/tokenize-embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"content": "Text to tokenize"}'
```
Detokenize (Embeddings Model)
```bash
curl http://localhost:8010/detokenize-embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"tokens": [24129,47,47,1098,20650]}'
```
Checking the Server Status
Monitor the health of all services (no authentication required):
```bash
# New API (port 8011)
curl http://localhost:8011/status
# Legacy API (port 8010)
curl http://localhost:8010/status
```
Returns 200 OK when all services are healthy, or 503 if any service is down.
Simple health check:
```bash
curl http://localhost:8011/health
```
Security Notes
- Always set a custom API key using -e API_KEY=your-secret-key.
- The default API key (m42llm) is not secure for production use.
- Generate a secure key: openssl rand -hex 32
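A concrete sketch of the last point: generating a key and handing it to the container. The run command mirrors the GPU example earlier in this guide.

```shell
# Generate a random 64-character hex API key.
API_KEY="$(openssl rand -hex 32)"
echo "${#API_KEY}"   # prints: 64
# The generated key can then be passed to the container, e.g.:
# docker run --gpus all -p 8010:8010 -p 8011:8011 -e API_KEY="$API_KEY" localgenai:1.0.2
```

Remember to distribute the same key to any clients that call the API.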
Common Issues
Container Does Not Start
Check logs: docker logs <container-id>
Services Not Responding
Check status: docker exec <container-id> supervisorctl status