Quick Start Guide: Matrix42 LLM for On-Premise Deployment
Overview
This document provides installation and configuration instructions for the Matrix42 Large Language Model (LLM) for on-premise deployment.
Prerequisites
To configure the Matrix42 LLM, you must meet the following requirements:
- Docker installed on your system.
- For GPU acceleration, an NVIDIA GPU with:
  - the NVIDIA driver installed on the host system;
  - the NVIDIA Container Toolkit installed (see Installing the NVIDIA Container Toolkit).
Installation
Use the Pre-Built Image (Recommended)
The container image is distributed as a split tar archive. To install, do the following:
```bash
# 1. Combine and extract the split archive
cat localgenai_1.0.2.000.tar localgenai_1.0.2.001.tar localgenai_1.0.2.002.tar localgenai_1.0.2.003.tar > localgenai_1.0.2.tar
# 2. Load the image into Docker
docker load -i localgenai_1.0.2.tar
```
Run the Server
GPU Version
```bash
# Start with GPU acceleration
docker run --gpus all -p 8010:8010 -p 8011:8011 -e API_KEY=your-secret-key localgenai:1.0.2
```
The server will be available at:
- http://localhost:8010 - Legacy API (AiCore 1.0.x compatibility)
- http://localhost:8011 - New API (AiCore 1.1.x)
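For repeatable deployments, the run command above can also be captured in a Compose file. The following is a sketch, not shipped configuration: the service name and file layout are assumptions, while the image, ports, and API_KEY variable come from the run command above.

```yaml
services:
  localgenai:
    image: localgenai:1.0.2
    ports:
      - "8010:8010"
      - "8011:8011"
    environment:
      - API_KEY=your-secret-key
    # GPU reservation equivalent to "--gpus all" (requires the
    # NVIDIA Container Toolkit, as noted in Prerequisites).
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Start the service with docker compose up -d from the directory containing the file.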
API Endpoints
All API requests require authentication using the `Authorization: Bearer` header with your API key.
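To avoid pasting the key into every command in the examples below, it can first be kept in an environment variable. This is a shell convention, not something the API requires; the placeholder key is the same one used throughout this guide.

```shell
# Store the API key once in the current shell session...
export API_KEY="your-secret-key"
# ...and build the Bearer header from it for reuse with curl.
AUTH_HEADER="Authorization: Bearer ${API_KEY}"
echo "$AUTH_HEADER"   # prints: Authorization: Bearer your-secret-key
```

Requests can then pass `-H "$AUTH_HEADER"` instead of spelling the header out each time.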
Port 8011 - New API (AiCore 1.1.x)
Endpoints are organized by service prefix: /llm, /embeddings, /reranker.
| Endpoint | Description | Auth. Required |
|---|---|---|
| /llm/v1/chat/completions | Chat completions (OpenAI-compatible) | Yes |
| /llm/v1/models | List LLM models | Yes |
| /llm/tokenize | Tokenize text | Yes |
| /llm/detokenize | Detokenize tokens | Yes |
| /llm/health | LLM service health | No |
| /embeddings/v1/embeddings | Generate text embeddings | Yes |
| /embeddings/v1/models | List embedding models | Yes |
| /embeddings/tokenize | Tokenize text | Yes |
| /embeddings/detokenize | Detokenize tokens | Yes |
| /embeddings/health | Embeddings service health | No |
| /reranker/v1/reranking | Rerank documents by relevance | Yes |
| /reranker/v1/models | List reranker models | Yes |
| /reranker/tokenize | Tokenize text | Yes |
| /reranker/detokenize | Detokenize tokens | Yes |
| /reranker/health | Reranker service health | No |
| /status | Global service health status | No |
| /health | Simple health check | No |
Port 8010 - Legacy API (AiCore 1.0.x)
For backward compatibility with AiCore 1.0.x clients.
| Endpoint | Description | Auth. Required |
|---|---|---|
| /v1/chat/completions | Chat completions (OpenAI-compatible) | Yes |
| /v1/models | List available models | Yes |
| /v1/embeddings | Generate text embeddings | Yes |
| /v1/reranking | Rerank documents by relevance | Yes |
| /tokenize | Tokenize text (completions model) | Yes |
| /tokenize-embeddings | Tokenize text (embeddings model) | Yes |
| /detokenize-embeddings | Detokenize tokens (embeddings model) | Yes |
| /status | Service health status | No |
| /health | Simple health check | No |
Usage Examples (New API - Port 8011)
Chat Completions
```bash
curl http://localhost:8011/llm/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```
List LLM Models
```bash
curl http://localhost:8011/llm/v1/models \
-H "Authorization: Bearer your-secret-key"
```
Embeddings
```bash
curl http://localhost:8011/embeddings/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"input": "Your text here"}'
```
List Embedding Models
```bash
curl http://localhost:8011/embeddings/v1/models \
-H "Authorization: Bearer your-secret-key"
```
Reranking
```bash
curl http://localhost:8011/reranker/v1/reranking \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"query": "search query", "documents": ["doc1", "doc2"]}'
```
List Reranker Models
```bash
curl http://localhost:8011/reranker/v1/models \
-H "Authorization: Bearer your-secret-key"
```
Tokenize (LLM)
```bash
curl http://localhost:8011/llm/tokenize \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"content": "Text to tokenize"}'
```
Detokenize (LLM)
```bash
curl http://localhost:8011/llm/detokenize \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"tokens": [1874,311,1464,4476,553]}'
```
Tokenize (Embeddings)
```bash
curl http://localhost:8011/embeddings/tokenize \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"content": "Text to tokenize"}'
```
Detokenize (Embeddings)
```bash
curl http://localhost:8011/embeddings/detokenize \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"tokens": [24129,47,47,1098,20650]}'
```
Tokenize (Reranker)
```bash
curl http://localhost:8011/reranker/tokenize \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"content": "Text to tokenize"}'
```
Detokenize (Reranker)
```bash
curl http://localhost:8011/reranker/detokenize \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"tokens": [24129,47,47,1098,20650]}'
```
Usage Examples (Legacy API - Port 8010)
Chat Completions
```bash
curl http://localhost:8010/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```
List Models
```bash
curl http://localhost:8010/v1/models \
-H "Authorization: Bearer your-secret-key"
```
Embeddings
```bash
curl http://localhost:8010/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"input": "Your text here"}'
```
Reranking
```bash
curl http://localhost:8010/v1/reranking \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"query": "search query", "documents": ["doc1", "doc2"]}'
```
Tokenize (Completions Model)
```bash
curl http://localhost:8010/tokenize \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"content": "Text to tokenize"}'
```
Tokenize (Embeddings Model)
```bash
curl http://localhost:8010/tokenize-embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"content": "Text to tokenize"}'
```
Detokenize (Embeddings Model)
```bash
curl http://localhost:8010/detokenize-embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{"tokens": [24129,47,47,1098,20650]}'
```
Checking the Server Status
Monitor the health of all services (no authentication required):
```bash
# New API (port 8011)
curl http://localhost:8011/status
# Legacy API (port 8010)
curl http://localhost:8010/status
```
Returns 200 OK when all services are healthy, or 503 if any service is down.
Simple health check:
```bash
curl http://localhost:8011/health
```
Security Notes
- Always set a custom API key using -e API_KEY=your-secret-key.
- The default API key (m42llm) is not secure for production use.
- Generate a secure key: openssl rand -hex 32
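A concrete sketch of the last point: generating a key and handing it to the container. The run command mirrors the GPU example earlier in this guide.

```shell
# Generate a random 64-character hex API key.
API_KEY="$(openssl rand -hex 32)"
echo "${#API_KEY}"   # prints: 64
# The generated key can then be passed to the container, e.g.:
# docker run --gpus all -p 8010:8010 -p 8011:8011 -e API_KEY="$API_KEY" localgenai:1.0.2
```

Remember to distribute the same key to any clients that call the API.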
Common Issues
Container Does Not Start
Check logs: docker logs <container-id>
Services Not Responding
Check status: docker exec <container-id> supervisorctl status