In today's digital world, Large Language Models (LLMs) play an increasingly important role. However, many companies face the challenge of keeping sensitive data secure while still using powerful AI models. One solution is to run the models locally. In this article, we show you how to set up a local LLM on a Linux server using Ollama. This lets you take advantage of powerful language models without sending sensitive data to the cloud.
Why local LLMs?
Many companies prefer local LLMs to maintain control over their data. Cloud-based solutions such as Microsoft Azure or AWS offer immense computing power, but data sovereignty often remains a critical point. Local installations make it possible to process highly sensitive information in-house while still exploiting the power of modern language models.

To install Ollama on a Linux server, you need:
- Linux distribution: Any current distribution works; we use Ubuntu Server 24.04 LTS.
- Nvidia graphics card: A powerful card such as the Nvidia RTX A5000 provides the computing power that LLMs need.
- Docker: To run Open WebUI as a Docker container.
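Before you start, it can be useful to confirm these prerequisites from the shell. The following checks are a minimal sketch and assume an Ubuntu-based system:

lsb_release -a            # show the installed distribution and version
lspci | grep -i nvidia    # confirm that an Nvidia GPU is detected
free -h                   # show available RAM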
Step 1: Install Ollama
Ollama enables the management and use of local LLMs. Installation is straightforward:
curl -fsSL https://ollama.com/install.sh | sh
After installation, you should restart the server to ensure that all kernel components are loaded correctly.
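After the reboot, you can check that Ollama is up and running. The Linux installer sets up a systemd service, so the following commands should work (a quick sketch; exact output depends on your version):

ollama --version               # print the installed Ollama version
systemctl status ollama        # the Ollama service should be active (running)
curl http://127.0.0.1:11434    # the local API answers with "Ollama is running"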
Step 2: Check the Nvidia graphics card
Use nvidia-smi to monitor the status of your graphics card:
nvidia-smi -l 1
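The -l 1 flag refreshes the output every second, so you can watch GPU utilization and VRAM usage while a model is running. If nvidia-smi is not found, the Nvidia driver is probably missing; one common way to install it on Ubuntu is shown below (an assumption about your setup, adjust to your distribution):

sudo ubuntu-drivers autoinstall    # install the recommended Nvidia driver on Ubuntu
sudo reboot                        # reboot so the driver is loaded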
Step 3: Install Docker
Docker is required to run Open WebUI. Installation instructions can be found in the official Docker documentation (docs.docker.com/engine/install/).
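For a quick test setup, Docker also provides a convenience script; the commands below are one possible route, not the only one:

curl -fsSL https://get.docker.com -o get-docker.sh    # download the Docker convenience script
sudo sh get-docker.sh                                 # install Docker Engine
sudo usermod -aG docker $USER                         # optional: run docker without sudo (re-login required)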
Step 4: Start Open WebUI
The following command starts Open WebUI in host network mode and points it at the local Ollama API, so the web interface becomes available on port 8080 of the server:

docker run -d --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:main
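Once the container has started, you can verify that it is running and watch its startup logs:

docker ps --filter name=open-webui    # the container should be listed as running
docker logs -f open-webui             # follow the startup logs (Ctrl+C to stop)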
Step 5: Access the user interface
Open a browser and go to your server's IP address on port 8080, for example:

http://192.168.0.5:8080
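The address above is only an example; if you are unsure of your server's IP, you can look it up directly on the machine:

hostname -I    # list the server's IP addresses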
Step 6: Install and use models
Models are downloaded directly through Ollama, for example Meta's Llama 3:

ollama pull llama3
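Once a model has been pulled, it can be used from the command line or selected in Open WebUI's model list. A few everyday commands (model names are examples):

ollama list          # show all locally installed models
ollama run llama3    # start an interactive chat with the model in the terminal
ollama rm llama3     # remove a model you no longer need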
Tips for hardware optimization
Example configurations
Entry-level configuration (for Llama 7B and simple applications)
- CPU: AMD Ryzen 9 or Intel i9
- GPU: NVIDIA RTX 3060 with 12 GB VRAM
- RAM: 32 GB
- Storage: 1 TB NVMe SSD
Advanced configuration (for Llama 13B to 30B)
- CPU: AMD Threadripper or Intel Xeon
- GPU: NVIDIA RTX 3090 or A6000 with at least 24 GB VRAM
- RAM: 64-128 GB
- Storage: 2 TB NVMe SSD
High-end configuration (for Llama 65B and demanding applications)
- CPU: Dual AMD EPYC or Intel Xeon
- GPU: NVIDIA A100 or H100 (40 GB or more VRAM) or a cluster of multiple GPUs
- RAM: 128 GB or more
- Storage: 4 TB NVMe SSD
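Whichever tier you choose, it is worth checking how much VRAM a model actually uses on your hardware before committing to it. One simple approach uses nvidia-smi as shown above, plus Ollama's own status command (available in recent Ollama versions):

nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 5    # log VRAM usage every 5 seconds
ollama ps                                                            # show loaded models and whether they run on GPU or CPU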
Conclusion
With Ollama, you can efficiently run local LLMs on your Linux server while retaining full control over your data. This solution is particularly suitable for companies that process sensitive information and still want to take advantage of modern language models.