sakutto
Generative AI· v1.1

What Is Apertus? Switzerland's Sovereign AI Model — Specs, Multilingual Support, and How to Deploy It

ApertusSovereign AIOpen-source LLM

Apertus Basics and Development

Built as Switzerland's open-source LLM, v1.0 launched in September 2025 and v1.1 Mini followed in June 2026. We'll walk through the background and the overall shape of the model family.

Apertus Development Timeline

Sep 2025 v1.0 (8B / 70B) released under Apache 2.0
Mar 2026 Canton of Ticino adopts Apertus for administrative translation
Jun 2026 v1.1 Mini released (lightweight distilled family)

What "Fully Open" Means, and Who Built It

Apertus (Latin for "open") is a large language model developed jointly by EPFL (École polytechnique fédérale de Lausanne), ETH Zurich, and the Swiss National Supercomputing Centre (CSCS). Switzerland's largest telecom operator Swisscom participates as a strategic partner.

The point worth noting is that Apertus publishes not only the model weights but also the training data composition, preprocessing pipeline, and training procedure in full. Meta's Llama and Mistral AI's models open the weights but keep the training data details private. The official site's banner — "Open weights, open data, open science" — captures this, and the transparency gap is what underpins the EU AI Act alignment.

View official source →
Open weights, open data, open science — the project's stated guiding principle, displayed at the top of the site

Model Sizes and Technical Architecture

Apertus splits broadly into two generations. The v1.0 base models are 8B (8 billion parameters) and 70B (70 billion parameters). v1.1 Mini is a family of lightweight models distilled from the 8B teacher; the official site refers to "16 small language models" (a demonstration of distillation and quantization techniques as much as a standalone product family).

Looking at the v1.1-4B-Instruct specs, the architecture is a dense Transformer decoder combined with grouped-query attention (GQA): 24 layers, model dim 3,072, MLP dim 16,384, attention heads 24 (query) / 8 (key-value), and the xIELU activation function. Parameter counts are 3.8B for compute and 4.6B for storage.

Item8B / 70B (v1.0)4B (v1.1 Mini)
ArchitectureDense Transformer + GQADense Transformer + GQA
Context lengthSee official site4,096 tokens
Training tokens15T1.7T (distillation)
LicenseApache 2.0Apache 2.0
Training infrastructureAlps supercomputer64 GH200 GPUs
View official source →
Dense transformer decoder, grouped-query attention, 24 layers, model dim 3072, MLP dim 16384, attention heads 24/8 (Q/KV), xIELU activation, 3.8B compute params / 4.6B storage params — from the Model Architecture section

Multilingual Design Covering 1,000+ Languages

The most striking design choice in Apertus is that the training data covers more than 1,000 languages. The Hugging Face model card is more specific, listing "1,811 languages." Coverage spans both major European languages and low-resource ones.

Training used 15 trillion tokens. The compute backbone is the Alps supercomputer in Lugano, Switzerland. According to the model card, the v1.1 Mini 4B model was distilled on 64 GH200 GPUs with roughly 2.0×10²² FLOPs of total compute.

View official source →
languages: 1811 ... num_tokens: 15000000000000 ... Hardware: 64 GH200 GPUs. Total FLOPs: ~2.0E22 — from the Training section of the model card

This multilingual breadth is not just about coverage for its own sake. Switzerland is a country with four official languages — German, French, Italian, and Romansh — and routinely processes administrative documents across them. Apertus was designed from a context where multilingual processing is a real, operational need, not an add-on.

Why Apertus Stands Out, and How It Differs from Competitors

What sets Apertus apart from other open-source LLMs is not benchmark scores but a deliberate choice to differentiate on "design principles for trust and transparency."

Open-source LLM Transparency Comparison

Apertus
Weights + data composition + preprocessing + procedure + opt-outs
Llama 3
Weights + partial technical paper
Mistral
Weights + API

Conceptual view of disclosure scope (compiled from official information)

EU AI Act Compliance and Data Sovereignty

The clearest design principle of Apertus is alignment with the EU AI Act. The official site labels three measures under "Built to meet EU AI Act requirements": respecting opt-outs, removing personally identifiable information (PII), and preventing memorization (the model memorizing training data verbatim).

With the EU AI Act phasing in through 2026, any model that cannot explain "what data it was trained on and how" carries legal risk in the European market. Apertus meets that requirement from the design stage, which lowers the deployment barrier for European government bodies and regulated industries.

View official source →
Built to meet EU AI Act requirements: the model respects opt-outs, removes PII, prevents memorization — official statement on EU AI Act alignment

US moves on AI model export controls have also pushed sovereign AI up the agenda. The visible risk of depending on US-origin models has made the policy case for an indigenous European AI foundation more concrete.

Differences from Major Open-source LLMs

Placing Apertus side by side with other open-source LLMs makes its positioning clearer.

AxisApertusLlama 3 (Meta)Mistral Large
Training data disclosureComposition and preprocessing fully publishedNot disclosedNot disclosed
LicenseApache 2.0Llama Community LicenseApache 2.0
Multilingual support1,000+ languagesEnglish-optimizedEuropean languages focus
EU AI Act alignmentExplicitNot statedNot stated
Commercial useUnrestrictedContract required above a certain scaleUnrestricted
Data residency guaranteeSwitzerland + user-controlledNoneEU-hosted option available

On raw benchmark scores, Llama 3's 70B and Mistral Large come out ahead. But on the axes of "can you audit the data provenance," "are you exposed to specific countries' export controls," and "can you operate without sending data outside your jurisdiction," Apertus has a clear edge.

The kind of general high-accuracy work that frontier models like Claude and ChatGPT handle well, and the data sovereignty / transparency / multilingual axis that Apertus owns, are not really competing — they cover different ground.

Real-world Government Deployment

Apertus is already running in production. The official site's news feed reports that in March 2026, the Swiss Canton of Ticino adopted Apertus for translating administrative documents across multiple languages. As an Italian-speaking canton that needs to translate federal-level German and French documents for residents, Ticino chose Apertus specifically because the data does not have to leave the country.

View official source →
Apertus for Ticino (Mar 17) - Fine-tuned model powers in-house AI translation — from the news headline

In settings where "data cannot leave a specific jurisdiction" is a hard constraint — government being the canonical case — sovereign AI is moving from "option" to "prerequisite." Apertus models are freely downloadable from the swiss-ai organization on Hugging Face, which puts municipalities and research institutes with in-house engineering in a position to validate and deploy on their own.

How to Deploy Apertus, and Its Current Limitations

Here we look at the paths for actually trying Apertus and the performance constraints to understand before deploying.

Three Paths to Try Apertus

Path 1: Hugging Face (free, self-hosted)

Download the model and run inference on your own server or local machine. Requires the technical knowledge to build out the environment.

Path 2: Swisscom sovereign platform (paid, managed)

API access with the guarantee that data stays in Switzerland. Aimed at enterprises and government.

Path 3: Public AI Inference Utility (research / evaluation)

A public inference service offered by the Swiss AI initiative. Aimed at research institutions.

Deploying from Hugging Face

For individuals and small teams who want a quick try, downloading the model from Hugging Face is the shortest path. From the swiss-ai organization page, pick a size and format (Base / Instruct) appropriate to your use case.

For a local machine, the v1.1 Mini 4B model is the realistic choice. Around 16GB of VRAM is enough to run the quantized variant. GGUF formats for Ollama and llama.cpp are also distributed, so anyone with local-LLM experience can drop it into an existing inference setup.

For serious operation of the 8B or 70B models, you will need a cloud GPU instance or an on-premises inference server. That configuration suits translation and long-document summarization.

Benchmark Performance and Matching Workloads

The official benchmark results for v1.1-4B-Instruct (from the Hugging Face model card) show a multilingual evaluation average of 0.473.

BenchmarkScoreWhat it measures
MMLU0.504Overall knowledge and reasoning
TruthfulQA0.506Factual accuracy of answers
ARC0.332Scientific reasoning
Instruction Following0.550Adherence to instructions
LogiQA0.296Logical reasoning
View official source →
Multilingual benchmark results — MMLU: 0.504, TruthfulQA: 0.506, ARC: 0.332, IF: 0.550, LogiQA: 0.296, Average: 0.473 — from the Evaluation Results section

These are standard numbers for a model in the 4B-parameter class — comparable to lightweight variants of Gemini at similar size or Phi-3 Mini — and they sit clearly below frontier models.

Choosing Between Apertus and Frontier Models

The question to ask before adopting Apertus is not "is the performance enough" but "does this workload require data sovereignty." On performance alone, frontier models beat Apertus on almost every task. However, if any of the following apply, Apertus becomes the rational choice:

  • Regulatory requirements that prevent data from leaving a specific jurisdiction (Switzerland / the EU)
  • Obligations to audit and explain training data provenance (EU AI Act high-risk classification)
  • A desire to diversify away from dependence on US-origin models
  • Required accuracy for multilingual processing, especially minority European languages
  • A need for commercial use without license fees (avoiding API metering)

Conversely, for English-centric high-accuracy reasoning, code generation, or long-form creative writing, leaving the work to a frontier model like Claude or Grok is more practical. Rather than forcing one model to cover everything, allocating between local and external models based on task character and data constraints is the most efficient design in practice.

Understand Apertus, Then Decide How It Fits Your Setup

Apertus is not a model aiming to be the world's best on raw performance. It is a model that delivers practical-grade output while guaranteeing transparency and data sovereignty. The gap to frontier models is real, but EU AI Act alignment, data residency guarantees, and full training-data transparency give it a distinct value proposition.

If your workload falls into "data cannot leave the country," "we need to explain where the training data came from," or "we process multilingual administrative documents," the first step is to download the 4B model from the swiss-ai organization on Hugging Face and validate it locally.

When you pass web pages or documents to the model, converting them to Markdown first preserves the heading hierarchy and table structure, which improves how accurately the model reads them.

Free ToolURL to Markdown ConverterConvert any public web page URL to Markdown. Preserves headings, tables, lists, and links — perfect for LLM and RAG preprocessing, research notes, and archiving web articles.Try it now →

Apertus Official Sources

This article is built from the following primary sources. For the most accurate, up-to-date information, please refer to the primary sources directly.

APERTVS.ai (official site)View official source →
Hugging Face — swiss-ai OrganizationView official source →

FAQ

Q. Is Apertus free to use?
Yes. It is released under the Apache 2.0 license and can be downloaded for free from Hugging Face. Commercial use is unrestricted, and there are no license fees or API charges — you can deploy it on your own servers or cloud without paying for the model itself. If you use it through Swisscom's sovereign platform you pay for the infrastructure, but not for the model license.
Open weights, open data, open science APERTVS.ai official site
Q. Which languages does Apertus support?
The official site states that it supports more than 1,000 languages. The Hugging Face model card is more specific, recording 1,811 languages used in training. Coverage spans both major European languages and low-resource languages, and it is particularly well suited for Switzerland's official languages: German, French, Italian, and Romansh.
languages: 1811 Hugging Face model card (swiss-ai/Apertus-v1.1-4B-Instruct)
Q. How is Apertus different from Llama or Mistral?
The decisive difference is that Apertus fully publishes its training data composition and preprocessing pipeline. Llama is open-weight but keeps training data details private, and Mistral is similar. Apertus is transparent down to PII removal and opt-out handling in the training pipeline, which is the basis for its EU AI Act transparency claim.
Open weights, open data, open science APERTVS.ai official site
Q. What is Apertus' context window?
According to the Hugging Face model card, the context length of v1.1-4B-Instruct is 4,096 tokens. The 8B and 70B models support longer contexts. For long-document processing or translation, the larger models are a better fit.
max_position_embeddings: 4096 Hugging Face model card (swiss-ai/Apertus-v1.1-4B-Instruct)
Q. Are there restrictions on commercial use?
Apache 2.0 places no restrictions on commercial use. Embedding in your own product, serving via your own API, and integrating into internal systems are all permitted. You must include the license notice, but no royalties or usage fees apply.
license: apache-2.0 Hugging Face model card (swiss-ai/Apertus-v1.1-4B-Instruct)
Q. How much compute went into training Apertus?
Training runs on the Alps supercomputer operated by the Swiss National Supercomputing Centre (CSCS). The v1.1-4B-Instruct model card states that the 4B distillation used 64 GH200 GPUs and roughly 2.0×10²² FLOPs of total compute.
Hardware: 64 GH200 GPUs. Total FLOPs: ~2.0E22 Hugging Face model card (swiss-ai/Apertus-v1.1-4B-Instruct)
Q. What use cases is Apertus best suited for?
Tasks that demand multilingual processing are its strong suit. The official site reports adoption by the Canton of Ticino for translating administrative documents. It is a good fit for government, healthcare, and financial organizations that need to process data without sending it outside their jurisdiction.
Apertus for Ticino (Mar 17) - Fine-tuned model powers in-house AI translation APERTVS.ai news feed
Q. What is Apertus v1.1 Mini?
It is a family of lightweight models released in June 2026. The 8B model was used as a teacher to distill multiple smaller sizes. The 4B model is trained with a loss of 90% KL-divergence from the teacher and 10% label cross-entropy. The design targets inference on edge devices and resource-constrained environments.
Distillation loss: 90% KL-divergence from teacher / 10% label cross-entropy Hugging Face model card (swiss-ai/Apertus-v1.1-4B-Instruct)
Q. Can Apertus replace GPT-4 or Claude?
Not as a wholesale replacement for frontier models (GPT-4, Claude, Gemini) at this point. The 4B model's MMLU score is 0.504, well below the 0.85–0.90 band of frontier models. Apertus' strengths lie not on the peak-performance axis but on data sovereignty, transparency, and multilingual coverage. The practical approach is to use it for tasks that match those strengths.
MMLU: 0.504 Hugging Face model card (swiss-ai/Apertus-v1.1-4B-Instruct)
Q. How does Apertus relate to the EU AI Act?
Transparency requirements from the EU AI Act are built into the design from the outset. The official site explicitly commits to respecting opt-outs, removing personally identifiable information (PII), and preventing memorization.
Built to meet EU AI Act requirements: the model respects opt-outs, removes PII, prevents memorization APERTVS.ai official site

Related Tools

Related Tool Categories

Articles