Can local LLMs fully replace Claude or GPT?

Today they are well suited to self-contained, routine tasks such as code cleanup, test generation, and proofreading that do not need fresh information. Checking the latest library specs, making large design decisions, and work that demands strict accuracy still favor top-tier cloud models. Rather than a full replacement, splitting work by task is the realistic approach.

Is it safe to let a local LLM read my company's documents?

A local LLM processes everything on your own machine, so customer data and source code can be handled without being sent outside. The ability to process internal documents that cannot be sent to the cloud is especially valuable in confidential fields such as law, medicine, and public administration.

Local LLMs Reach Practical Coding: A 2022 Mac Hits ~75% of Frontier Cloud AI

Do you need an expensive PC to run a local LLM (local AI)?

A machine with around 64GB of memory is a reasonable target, but you do not need one to get started. Before buying new hardware, the safe first step is to load quantized versions of "Qwen 2.5 Coder" or "Gemma 4" via "Ollama" on your current PC and test how far your routine tasks get.

Why Local LLMs (Local AI) Have Reached a Practical Stage

In her blog post "Running local models is good now," Vicki Boykis says the machine she uses is an "M2 Mac" released back in 2022, with 64GB of memory. On this far-from-latest device, she reports reaching about 75% of the accuracy and speed of frontier cloud AI. Work that was impossible six months ago, she says, now finishes entirely on a laptop at hand.

What stands out is this jump in half a year. Until recently, the common view of local LLMs was that they run, but slowly and not very intelligently. Yet Boykis notes that the double-checking she used to do, re-verifying cloud API output with a separate model, has become largely unnecessary as local accuracy improved. In other words, there are now more situations where you can trust the answer your own machine returns and use it directly.

Reaching ~75% accuracy	A 2022 "M2 Mac" (64GB memory) achieves accuracy and speed close to frontier cloud AI.
Models usable in practice	Quantized models such as "Gemma 4," "Qwen 2.5 Coder," and OpenAI's open-weight "GPT-OSS-20B" run via "Ollama," "LM Studio," and "llama.cpp."
Close on benchmarks	"Qwen 3.6" on a single GPU (around 18GB VRAM) scores roughly 77% on "SWE-bench," with "Kimi K2.6" and "Devstral" approaching paid models.
Cost reversal	A home server with four "RTX 3090" cards reportedly runs for about $6/month equivalent—cheaper than cloud metered billing even including electricity.

The Models and Coding Tasks Local LLMs Handle in Practice

The models Boykis uses day to day are all quantized: Google's "Gemma 4" family (gemma-4-26b, gemma-4-12b-qat), Alibaba's "Qwen 3 MoE" and "Qwen 2.5 Coder," and OpenAI's open-weight "GPT-OSS-20B." For inference she switches between "LM Studio," "Ollama," and "llama.cpp." Her tasks include refactoring Python scripts into five or six modules, fixing lint around type hints, generating unit tests, scaffolding a recommendation system, and building an app that extracts trending arXiv topics. All of this is self-contained work that needs no fresh information from the web.
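To see why quantization matters on a 64GB machine, a rough back-of-the-envelope estimate helps. The sketch below uses the standard approximation that weight memory is parameters times bits per weight divided by 8. The parameter counts are read off the model names and the bit widths are illustrative assumptions, not figures from Boykis's post. The estimate covers weights only and ignores the KV cache and runtime overhead.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Assumption: "26b" and "20B" in the names mean roughly 26 and 20 billion parameters.
    models = [("gemma-4-26b", 26.0), ("GPT-OSS-20B", 20.0)]
    # Assumption: 16-bit as the unquantized baseline, 8-bit and 4-bit as common quantized widths.
    for name, params in models:
        for bits in (16, 8, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{name} at {bits}-bit: about {gb:.1f} GB of weights")
```

Under these assumptions, a model in the 20B to 26B range drops from tens of gigabytes at 16-bit to low double digits at 4-bit, which is what leaves headroom for context and other applications on a 64GB laptop.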

This trend goes beyond one person's impression. Benchmark roundups show "Qwen 3.6" on a single GPU (around 18GB VRAM) scoring roughly 77% on "SWE-bench," a test of real bug-fix tasks, with open models such as "Kimi K2.6" and "Devstral" closing in on paid cloud models. Combining "OpenCode" and "Ollama," developers report reproducing a "Claude Code"-style terminal coding agent locally, with no cloud dependency.

Open-Weight Releases and a Shifting Cloud Landscape Behind the Local LLM Boom

The backdrop is a wave of major players releasing high-performance models as open weights. With OpenAI's "GPT-OSS," Google's "Gemma 4," and Alibaba's "Qwen," performance that used to live inside proprietary services is now downloadable by anyone. As more capable models are released for free, the local option becomes more realistic.

The market numbers are shifting too. Sensor Tower reports that ChatGPT's user share dropped below 50% for the first time, with Gemini and Claude catching up. On the same day, June 17, 2026, Anthropic published an analysis of who uses its coding AI "Claude Code" and for what. Together these suggest that AI usage is moving from experiments by a handful of power users toward everyday work on the ground—and that the single-cloud-giant picture is quietly starting to wobble.

The Hacker News Reaction: Local vs. Cloud Is an "Economic Choice"

The 510 comments in the Hacker News discussion are far from uniform praise. Skeptical voices ("frontier models still win in the end") sit alongside testimonials ("I can't go back to the cloud")—the familiar mix of hype and doubt that accompanies any new technology.

One post that drew particular attention described running a roughly $6/month-equivalent AI setup on a home server with four NVIDIA "RTX 3090" cards, with figures showing it comes out cheaper than cloud metered billing even after electricity. Through exchanges like this, a calm view is spreading: local versus cloud is not a matter of ideology or taste, but an economic choice decided by workload and cost.
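The "economic choice" framing can be made concrete with a small comparison. The following is a minimal sketch, not a reconstruction of the figures in that post: every input (hardware price, lifetime, power draw, electricity rate, token volume, API price) is a hypothetical placeholder you would replace with your own numbers.

```python
def local_monthly_cost(hardware_cost: float, lifetime_months: int,
                       avg_watts: float, hours_per_day: float,
                       price_per_kwh: float) -> float:
    """Amortized hardware cost plus electricity, per 30-day month."""
    amortized = hardware_cost / lifetime_months
    kwh_per_month = avg_watts / 1000 * hours_per_day * 30
    return amortized + kwh_per_month * price_per_kwh


def cloud_monthly_cost(tokens_per_month: float,
                       price_per_million_tokens: float) -> float:
    """Metered API billing for the same workload."""
    return tokens_per_month / 1e6 * price_per_million_tokens


def cheaper_option(local_cost: float, cloud_cost: float) -> str:
    """Return which side is cheaper for this workload."""
    return "local" if local_cost < cloud_cost else "cloud"


if __name__ == "__main__":
    # All values below are hypothetical placeholders, not data from the article.
    local = local_monthly_cost(hardware_cost=2400, lifetime_months=24,
                               avg_watts=500, hours_per_day=2, price_per_kwh=0.2)
    cloud = cloud_monthly_cost(tokens_per_month=50e6, price_per_million_tokens=3.0)
    print(f"local: {local:.2f}/month, cloud: {cloud:.2f}/month")
    print("cheaper:", cheaper_option(local, cloud))
```

The result flips depending on token volume and on whether the hardware is already paid for, which is the point of treating the decision as a workload and cost question.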

What Local LLMs Can and Cannot Do

That said, the "75%" figure needs to be read carefully. Boykis herself states plainly that local models are "not yet suited to production software development." Where local LLMs excel is self-contained work that needs no fresh information: cleaning up code, generating tests, proofreading text, and drafting skeletons. By contrast, checking the latest library specifications, making large-scale design decisions, and work demanding strict accuracy still favor top-tier cloud models.

In other words, what is happening is not a replacement of the cloud but a division of labor. Sending the roughly 80% of everyday routine work to local models and reserving the remaining 20% of hard problems for paid APIs is becoming the realistic middle ground. There is no need to push everything onto local hardware, nor to entrust everything to the cloud; allocating work between your own machine and outside services by the nature of the task is what balances cost and quality.
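One way to picture this division of labor is a small routing rule. The sketch below is illustrative only: the task labels are hypothetical names for the categories described above, and two choices are assumptions of this example rather than anything from the source, namely that confidential input always stays local and that unrecognized tasks default to the cloud.

```python
from enum import Enum


class Backend(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


# Hypothetical labels for the self-contained tasks the article assigns to local models.
LOCAL_TASKS = {"code_cleanup", "test_generation", "proofreading", "skeleton_draft"}

# Hypothetical labels for the tasks the article says still favor top-tier cloud models.
CLOUD_TASKS = {"latest_library_specs", "large_design_decision", "strict_accuracy"}


def route(task_kind: str, contains_confidential_data: bool = False) -> Backend:
    """Pick a backend by the nature of the task."""
    # Assumption: data that must not leave the machine overrides everything else.
    if contains_confidential_data:
        return Backend.LOCAL
    if task_kind in LOCAL_TASKS:
        return Backend.LOCAL
    # Known hard tasks go to the cloud; unknown tasks also default there (assumption).
    return Backend.CLOUD


if __name__ == "__main__":
    for kind in ("test_generation", "latest_library_specs", "something_else"):
        print(kind, "->", route(kind).value)
```

In practice the rule would be tuned against your own results, for example by moving a task category to the cloud set once local output for it proves unreliable.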

Preparing to Try Local LLMs as an Individual Developer or Small Business

For individual developers and small businesses, this shift matters. Monthly API charges and per-token billing drop to zero, replaced by a one-time hardware investment whose cost is predictable. Because customer data and source code are processed locally without leaving the machine, the risk of leaks through a third party drops, and provider rate limits no longer apply. Not being at the mercy of a provider's access restrictions, price hikes, or service shutdowns is another advantage. The ability to handle internal documents that cannot be sent to the cloud matters most in fields like law, medicine, and public administration.

Of course, hurdles remain: the upfront investment in a machine with around 64GB of memory, plus the know-how for model selection, quantization, and inference engines. Before buying new hardware, the safe first step is to load quantized versions of "Qwen 2.5 Coder" or "Gemma 4" via "Ollama" on your current PC and test how far your routine tasks get. Also, when feeding long documents or web pages to a local LLM, converting them to Markdown first preserves heading hierarchy and table structure and improves how accurately the model reads them.
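As a starting point for that first test, here is a minimal sketch that sends one prompt to a locally running Ollama server over its HTTP API, using only the Python standard library. It assumes Ollama is listening on its default port 11434 and that a model has already been pulled. The model tag in the usage note is an assumption, so substitute whatever tag you actually installed.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(raw_body: bytes) -> str:
    """Pull the generated text out of a non-streaming JSON reply."""
    return json.loads(raw_body.decode("utf-8"))["response"]


def ask_local_model(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt to the local server and return the model's answer."""
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_text(response.read())
```

With the server running, a call such as `ask_local_model("qwen2.5-coder:7b", "Write pytest tests for a function that reverses a string.")` should return the generated text. Running a handful of your real routine tasks this way gives a quick read on whether your current PC is enough.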

Official Sources for This Local LLM News

This article is based on the following primary sources (a personal blog and discussion on a social news site). Always check the primary sources for the latest, accurate information.

Official source: Vicki Boykis, "Running local models is good now"