Apertus Basics and Development
Built as Switzerland's open-source LLM, v1.0 launched in September 2025 and v1.1 Mini followed in June 2026. We'll walk through the background and the overall shape of the model family.
Apertus Development Timeline
What "Fully Open" Means, and Who Built It
Apertus (Latin for "open") is a large language model developed jointly by EPFL (École polytechnique fédérale de Lausanne), ETH Zurich, and the Swiss National Supercomputing Centre (CSCS). Switzerland's largest telecom operator Swisscom participates as a strategic partner.
The point worth noting is that Apertus publishes not only the model weights but also the training data composition, preprocessing pipeline, and training procedure in full. Meta's Llama and Mistral AI's models open the weights but keep the training data details private. The official site's banner — "Open weights, open data, open science" — captures this, and the transparency gap is what underpins the EU AI Act alignment.
Open weights, open data, open science — the project's stated guiding principle, displayed at the top of the site
Model Sizes and Technical Architecture
Apertus splits broadly into two generations. The v1.0 base models are 8B (8 billion parameters) and 70B (70 billion parameters). v1.1 Mini is a family of lightweight models distilled from the 8B teacher; the official site refers to "16 small language models" (a demonstration of distillation and quantization techniques as much as a standalone product family).
Looking at the v1.1-4B-Instruct specs, the architecture is a dense Transformer decoder combined with grouped-query attention (GQA): 24 layers, model dim 3,072, MLP dim 16,384, attention heads 24 (query) / 8 (key-value), and the xIELU activation function. Parameter counts are 3.8B for compute and 4.6B for storage.
| Item | 8B / 70B (v1.0) | 4B (v1.1 Mini) |
|---|---|---|
| Architecture | Dense Transformer + GQA | Dense Transformer + GQA |
| Context length | See official site | 4,096 tokens |
| Training tokens | 15T | 1.7T (distillation) |
| License | Apache 2.0 | Apache 2.0 |
| Training infrastructure | Alps supercomputer | 64 GH200 GPUs |
Dense transformer decoder, grouped-query attention, 24 layers, model dim 3072, MLP dim 16384, attention heads 24/8 (Q/KV), xIELU activation, 3.8B compute params / 4.6B storage params — from the Model Architecture section
Multilingual Design Covering 1,000+ Languages
The most striking design choice in Apertus is that the training data covers more than 1,000 languages. The Hugging Face model card is more specific, listing "1,811 languages." Coverage spans both major European languages and low-resource ones.
Training used 15 trillion tokens. The compute backbone is the Alps supercomputer in Lugano, Switzerland. According to the model card, the v1.1 Mini 4B model was distilled on 64 GH200 GPUs with roughly 2.0×10²² FLOPs of total compute.
languages: 1811 ... num_tokens: 15000000000000 ... Hardware: 64 GH200 GPUs. Total FLOPs: ~2.0E22 — from the Training section of the model card
This multilingual breadth is not just about coverage for its own sake. Switzerland is a country with four official languages — German, French, Italian, and Romansh — and routinely processes administrative documents across them. Apertus was designed from a context where multilingual processing is a real, operational need, not an add-on.
Why Apertus Stands Out, and How It Differs from Competitors
What sets Apertus apart from other open-source LLMs is not benchmark scores but a deliberate choice to differentiate on "design principles for trust and transparency."
Open-source LLM Transparency Comparison
Conceptual view of disclosure scope (compiled from official information)
EU AI Act Compliance and Data Sovereignty
The clearest design principle of Apertus is alignment with the EU AI Act. The official site labels three measures under "Built to meet EU AI Act requirements": respecting opt-outs, removing personally identifiable information (PII), and preventing memorization (the model memorizing training data verbatim).
With the EU AI Act phasing in through 2026, any model that cannot explain "what data it was trained on and how" carries legal risk in the European market. Apertus meets that requirement from the design stage, which lowers the deployment barrier for European government bodies and regulated industries.
Built to meet EU AI Act requirements: the model respects opt-outs, removes PII, prevents memorization — official statement on EU AI Act alignment
US moves on AI model export controls have also pushed sovereign AI up the agenda. The visible risk of depending on US-origin models has made the policy case for an indigenous European AI foundation more concrete.
Differences from Major Open-source LLMs
Placing Apertus side by side with other open-source LLMs makes its positioning clearer.
| Axis | Apertus | Llama 3 (Meta) | Mistral Large |
|---|---|---|---|
| Training data disclosure | Composition and preprocessing fully published | Not disclosed | Not disclosed |
| License | Apache 2.0 | Llama Community License | Apache 2.0 |
| Multilingual support | 1,000+ languages | English-optimized | European languages focus |
| EU AI Act alignment | Explicit | Not stated | Not stated |
| Commercial use | Unrestricted | Contract required above a certain scale | Unrestricted |
| Data residency guarantee | Switzerland + user-controlled | None | EU-hosted option available |
On raw benchmark scores, Llama 3's 70B and Mistral Large come out ahead. But on the axes of "can you audit the data provenance," "are you exposed to specific countries' export controls," and "can you operate without sending data outside your jurisdiction," Apertus has a clear edge.
The kind of general high-accuracy work that frontier models like Claude and ChatGPT handle well, and the data sovereignty / transparency / multilingual axis that Apertus owns, are not really competing — they cover different ground.
Real-world Government Deployment
Apertus is already running in production. The official site's news feed reports that in March 2026, the Swiss Canton of Ticino adopted Apertus for translating administrative documents across multiple languages. As an Italian-speaking canton that needs to translate federal-level German and French documents for residents, Ticino chose Apertus specifically because the data does not have to leave the country.
Apertus for Ticino (Mar 17) - Fine-tuned model powers in-house AI translation — from the news headline
In settings where "data cannot leave a specific jurisdiction" is a hard constraint — government being the canonical case — sovereign AI is moving from "option" to "prerequisite." Apertus models are freely downloadable from the swiss-ai organization on Hugging Face, which puts municipalities and research institutes with in-house engineering in a position to validate and deploy on their own.
How to Deploy Apertus, and Its Current Limitations
Here we look at the paths for actually trying Apertus and the performance constraints to understand before deploying.
Three Paths to Try Apertus
Download the model and run inference on your own server or local machine. Requires the technical knowledge to build out the environment.
API access with the guarantee that data stays in Switzerland. Aimed at enterprises and government.
A public inference service offered by the Swiss AI initiative. Aimed at research institutions.
Deploying from Hugging Face
For individuals and small teams who want a quick try, downloading the model from Hugging Face is the shortest path. From the swiss-ai organization page, pick a size and format (Base / Instruct) appropriate to your use case.
For a local machine, the v1.1 Mini 4B model is the realistic choice. Around 16GB of VRAM is enough to run the quantized variant. GGUF formats for Ollama and llama.cpp are also distributed, so anyone with local-LLM experience can drop it into an existing inference setup.
For serious operation of the 8B or 70B models, you will need a cloud GPU instance or an on-premises inference server. That configuration suits translation and long-document summarization.
Benchmark Performance and Matching Workloads
The official benchmark results for v1.1-4B-Instruct (from the Hugging Face model card) show a multilingual evaluation average of 0.473.
| Benchmark | Score | What it measures |
|---|---|---|
| MMLU | 0.504 | Overall knowledge and reasoning |
| TruthfulQA | 0.506 | Factual accuracy of answers |
| ARC | 0.332 | Scientific reasoning |
| Instruction Following | 0.550 | Adherence to instructions |
| LogiQA | 0.296 | Logical reasoning |
Multilingual benchmark results — MMLU: 0.504, TruthfulQA: 0.506, ARC: 0.332, IF: 0.550, LogiQA: 0.296, Average: 0.473 — from the Evaluation Results section
These are standard numbers for a model in the 4B-parameter class — comparable to lightweight variants of Gemini at similar size or Phi-3 Mini — and they sit clearly below frontier models.
Choosing Between Apertus and Frontier Models
The question to ask before adopting Apertus is not "is the performance enough" but "does this workload require data sovereignty." On performance alone, frontier models beat Apertus on almost every task. However, if any of the following apply, Apertus becomes the rational choice:
- Regulatory requirements that prevent data from leaving a specific jurisdiction (Switzerland / the EU)
- Obligations to audit and explain training data provenance (EU AI Act high-risk classification)
- A desire to diversify away from dependence on US-origin models
- Required accuracy for multilingual processing, especially minority European languages
- A need for commercial use without license fees (avoiding API metering)
Conversely, for English-centric high-accuracy reasoning, code generation, or long-form creative writing, leaving the work to a frontier model like Claude or Grok is more practical. Rather than forcing one model to cover everything, allocating between local and external models based on task character and data constraints is the most efficient design in practice.
Understand Apertus, Then Decide How It Fits Your Setup
Apertus is not a model aiming to be the world's best on raw performance. It is a model that delivers practical-grade output while guaranteeing transparency and data sovereignty. The gap to frontier models is real, but EU AI Act alignment, data residency guarantees, and full training-data transparency give it a distinct value proposition.
If your workload falls into "data cannot leave the country," "we need to explain where the training data came from," or "we process multilingual administrative documents," the first step is to download the 4B model from the swiss-ai organization on Hugging Face and validate it locally.
When you pass web pages or documents to the model, converting them to Markdown first preserves the heading hierarchy and table structure, which improves how accurately the model reads them.
Apertus Official Sources
This article is built from the following primary sources. For the most accurate, up-to-date information, please refer to the primary sources directly.