Did Japan's Sakana Fugu really beat Claude and GPT-5.5?

Sakana AI claims that on its own benchmarks, Fugu Ultra matches or beats Claude and GPT-5.5 on coding, reasoning, and science tests, with figures like 73.7 percent on SWE-Bench Pro and 95.5 on GPQA-Diamond. The important caveat is that every one of those numbers was measured by Sakana itself and has not yet been reproduced by any independent lab. Early hands-on testing, including by the Wharton professor Ethan Mollick within a day of launch, found a clear gap between the benchmark scores and ordinary real-world use, with coding tasks running slowly and output that was acceptable but did not match Claude Fable 5 in practice. So the honest answer is that Fugu posted very strong self-reported scores, but "beat Claude" is a vendor claim awaiting independent confirmation, not an established fact. Treat it as a promising result to watch rather than a settled ranking.

What exactly is Sakana Fugu, and how is it different from Claude or ChatGPT?

Fugu is not a single AI model. It is a multi-model orchestration system that presents itself as one. When you send it a request, a conductor model reads the task, decides which specialist models should handle each part, and assigns them roles that Sakana calls Thinker, Worker, and Verifier, so one engine plans, another executes, and a third checks the result before it reaches you. Claude and ChatGPT, by contrast, are single large models trained to do everything themselves. The practical difference is that Fugu is a switchboard routing your work to whatever it judges best for each piece, which in theory gives better fit and resilience against any one vendor failing, but in practice adds a layer that can introduce latency and complexity. The orchestration approach is arguably the most interesting thing about the launch, because routing each task to the right model is a direction the whole industry is moving.

Can I use Sakana Fugu in Europe?

Not at launch. Sakana AI has not made Fugu available in European Union or EEA member states, and says GDPR compliance is still in progress. European users cannot access the Fugu API or its subscription plans right now. For a business in the EU or EEA, that settles the question for the moment: there is nothing to evaluate until Sakana opens regional access and clarifies how it handles personal data. It also signals something about maturity, because tools that serve European businesses well generally treat data-protection compliance as a launch requirement rather than a later addition. If you are a European business, the sensible move is to wait until EU availability and a clear GDPR posture exist before spending any time assessing it, and to keep using your current region-available tools in the meantime.

What does the Fugu launch mean for the AI market overall?

It signals that the frontier of AI capability is no longer guarded only by the companies that can afford to train the biggest models. A small Tokyo lab founded by a handful of researchers can now credibly claim a seat at the same table as firms spending billions, by being clever about how it combines existing models through orchestration rather than by building a larger one from scratch. Whether or not Fugu's specific numbers hold up, that shift matters, because it points to system design, routing, and orchestration becoming as important as raw model size. For a small business owner, this is quietly good news even if you never use Fugu, because more credible players means more competition, and more competition tends to mean better tools at lower prices and less dependence on any single vendor. The healthiest version of the AI market for buyers is the one where many strong options push each other, and Fugu is a vote for that version.

Sakana Fugu vs Claude & GPT-5.5: What It Means for SMBs

Q: How much does Sakana Fugu cost?

For users outside the EU, Sakana offers Fugu on three subscription tiers: Standard at 20 dollars a month, which includes both Fugu and Fugu Ultra, Pro at 100 dollars a month for roughly ten times the usage, and Max at 200 dollars a month for heavy workloads. At launch there is an offer of a free second month for anyone who subscribes before the end of July 2026. For developers there is also pay-as-you-go pricing, with Fugu Ultra billed at 5 dollars per million input tokens and 30 dollars per million output tokens. Access is through an OpenAI-compatible API, so you can point an existing client or coding tool at the Fugu endpoint with an API key and start sending requests without rewriting anything. That low switching cost is deliberate, and it is the responsible way to test Fugu if you are eligible: try it on one low-stakes task, measure it against your current tool, and switch back in seconds if it underperforms.

Q: Should my small business switch to Sakana Fugu?

For the overwhelming majority of small businesses, no, not yet. Nothing about Fugu's launch is a reason to cancel a tool that is already working for you. The models you likely use today, Claude, ChatGPT, or Gemini, are proven, supported, available in your region, and more than capable enough that the real bottleneck in your business is task selection and integration, not raw model power. The narrow exception is a technically comfortable team outside the EU, running heavy coding or analysis workloads, that already builds on an OpenAI-compatible API and has a non-critical workflow to experiment on, where the one-endpoint switch and 20-dollar entry price make a controlled test cheap. Even then, prove it on something that does not matter before trusting it with something that does. For everyone else, watching while your current tools keep working costs nothing, and waiting for independent benchmarks and EU availability is the smart play.

A founder I know forwarded me a headline at seven in the morning. "Japan just beat Claude." Three exclamation marks. She runs a small logistics business, she pays for one AI tool, and she wanted to know if she had bought the wrong one. It is a fair question, and it is the question thousands of small business owners asked that week, because the headline was everywhere and the truth underneath it was nowhere.

I run an AI automation agency out of Denmark, so the first thing I did was try to use the thing. That is when the story got more interesting than the headline. The model that supposedly beat Claude is not a model. The benchmarks that supposedly proved it came entirely from the company selling it. And when I went to sign up, I could not, because it does not run in Europe yet. None of that was in the headline my friend read.

So here is the version with the asterisks left in, written for the person who has a business to run and one or two AI subscriptions to justify, not for the leaderboard crowd. What Sakana AI shipped is genuinely clever and worth understanding. It is also not a reason to cancel anything you are currently paying for, and the gap between those two statements is the whole article.

That gap, between what a launch claims and what it changes for your Tuesday, is where most AI news lives in 2026.

The five-second answer

Sakana AI's Fugu and Fugu Ultra, released June 22, 2026, are a multi-model orchestration system that routes each request to several specialist models and combines the result. Sakana's own benchmarks claim Fugu Ultra matches or beats Claude and GPT-5.5 on coding, reasoning, and science, but the numbers are self-reported, not independently reproduced, and early hands-on testing found a gap between the scores and real use. It is also unavailable in the EU and EEA right now. For most small businesses it is worth watching, not switching to.

What Sakana actually shipped

Fugu is not a single large language model. It is an orchestration system that presents itself as one. When you send it a request, a conductor model reads the task, decides which specialist models should handle it, and assigns them roles before assembling a final answer. Sakana AI, the Tokyo lab founded by former Google researchers, released it on June 22, 2026, in two versions: standard Fugu and the higher-effort Fugu Ultra (Sakana AI, Fugu release).

The mechanism is the interesting part. Where Anthropic or OpenAI train one enormous model to do everything, Fugu's conductor, which Sakana says is built on its own ICLR 2026 research, breaks a task into parts and hands them to different engines. It assigns what Sakana calls Thinker, Worker, and Verifier roles dynamically, so one model plans, another executes, and a third checks the result before it reaches you. The pitch is that a well-conducted ensemble can beat any single instrument, even a very large one (DataCamp, Sakana Fugu explained).

For a business, the practical translation is simple. You do not pick a model and live with its weaknesses. You send the work to a switchboard that is supposed to route each piece to whatever handles it best, and route around any single vendor having a bad day or going down. That is a genuinely different shape for an AI product, and it is why the launch got attention beyond the usual benchmark noise. Whether the switchboard actually delivers what the scores promise is the next, harder question.

The benchmark claims, and the asterisk on all of them

The numbers are eye-catching, and every one of them carries the same asterisk: Sakana measured them itself. On SWE-Bench Pro, a software-engineering benchmark, Sakana reports Fugu Ultra scoring 73.7 percent, ahead of Claude Opus 4.8 at 69.2, GPT-5.5 at 58.6, and Gemini 3.1 Pro at 54.2. On the graduate-level science benchmark GPQA-Diamond it reports 95.5, above Claude Fable 5. On LiveCodeBench it reports 93.2, again above Fable 5 (Business Standard, June 2026).

Sakana ran the comparison against Claude Mythos Preview, Anthropic's restricted frontier model, across six benchmarks spanning coding, reasoning, and scientific problem-solving, and claimed parity or better on most. Taken at face value, that would put a small Japanese lab at the frontier alongside companies spending billions on training. The phrase "at face value" is doing a lot of work, because no independent lab has yet reproduced these results. Every figure in the launch is a vendor claim until someone outside Sakana confirms it (Verdent, reading the Fugu benchmarks).

The first real-world test was not kind. Within a day of launch, independent reviewers including the Wharton professor Ethan Mollick, who runs some of the most-watched public AI experiments, reported a clear gap between the benchmark scores and ordinary use. Coding tasks that should have been quick took around thirty minutes, and the output was, in his word, "fine," but did not match Claude Fable 5 in practice. Benchmark dominance and a good Tuesday are not the same thing, and Fugu's opening week showed the daylight between them (Geeky Gadgets, Fugu vs Mythos).

There is also a structural problem with the comparison itself. Fugu is an orchestration system that can call several models, and it is being benchmarked against single models. Some researchers argue that is not a fair fight, that a system coordinating multiple engines should outscore any one of them, and that the interesting question is cost and latency, not whether the ensemble wins (paddo.dev, a multi-agent system sold as a model).

Not sure which AI tools are actually worth paying for? The €49 audit gives you a straight answer for your business

Why a system, not a model, matters to you

The orchestration design is the part of this launch a small business should actually care about, more than any single score. The idea that you stop betting on one model and instead route each task to whatever handles it best is the direction the whole industry is drifting, and Fugu is the loudest example so far. It is the same instinct behind picking the right tool for a job rather than owning one very expensive hammer.

The upside is resilience and fit. If your reporting works best on one engine and your customer replies on another, an orchestration layer in theory gives you both without you managing two subscriptions and two integrations. It also means no single vendor outage takes you offline, because the conductor can route around a model that is slow or down. For a business that has felt the pain of a tool it depends on having a bad week, that independence has real appeal.

The cost is in the two words "in theory." Orchestration adds a layer, and layers add latency, complexity, and a new place for things to go wrong. Fugu's own early reviews flagged slowness on tasks a single model does instantly, which is exactly the tax you would expect from routing every request through a conductor and a verifier. The lesson for a small business is not "orchestration is the future so adopt it now." It is that the model you use matters less every month, and the system that picks the model matters more. That shift is worth understanding even if Fugu itself is not the tool you end up using. We made a version of this point comparing the new work agents in our Copilot Cowork, Codex, and Claude Cowork breakdown, and Fugu is the same trend viewed from the model side.

The catch for European businesses

If your business is in the European Union or the EEA, the entire debate is academic right now, because you cannot use Fugu. Sakana AI has not made it available in EU or EEA member states, and states that GDPR compliance is still in progress. There is no API access and no subscription for European users at launch (Sakana Fugu guide, regional availability).

This matters more than it sounds, and not only because it blocks access. A frontier-claiming AI product that ships without EU availability is telling you something about its priorities and its maturity. The companies that serve European small businesses well treat GDPR as a launch requirement, not a later patch. When a tool arrives with the data-protection work unfinished, a European business is right to wait until that work is done rather than be an early test case for someone else's compliance roadmap.

I sat with this directly. As a Denmark-based agency, the first useful thing I can tell a European client about Fugu is that the decision is already made for them: there is nothing to evaluate until Sakana opens EU access and clarifies how it handles personal data. That is not a knock on the technology. It is the practical reality that determines whether any of the benchmark drama touches your business this quarter, and for European readers, it does not.

What it costs and how you would actually try it

For businesses outside the EU, the pricing is straightforward and aimed squarely at teams that want to experiment without a big commitment. Sakana offers Fugu on three subscription tiers: Standard at 20 dollars a month, which includes both Fugu and Fugu Ultra, Pro at 100 dollars a month for roughly ten times the usage, and Max at 200 dollars a month for heavy workloads. There is a launch offer of a free second month for anyone who subscribes before the end of July 2026 (Sakana Fugu pricing).

For developers and automation builders there is also pay-as-you-go pricing, with Fugu Ultra billed at 5 dollars per million input tokens and 30 dollars per million output tokens, the same headline shape as the top tier of several rivals. The access path is deliberately frictionless: Sakana exposes an OpenAI-compatible API, so you point your existing client or coding tool at the Fugu endpoint with an API key from the Sakana console and start sending requests, with no rewrite required (DigitalApplied, Fugu orchestration).

That low switching cost is the genuinely smart part of the go-to-market, and it is worth noting even if you do not buy. Because Fugu mimics the OpenAI API, a business already building on OpenAI could test Fugu on a non-critical workflow by changing one endpoint, measure the speed and quality against what it has, and switch back in seconds if it disappoints. If you are outside the EU and curious, that is the responsible way to try it: one low-stakes task, measured honestly, never your customer-facing flow on day one. It is the same discipline that separates AI wins from write-offs generally, which we laid out with the numbers in our piece on the real ROI of AI agents.

Should a small business care yet?

For the overwhelming majority of small businesses, the honest answer is: watch it, do not switch to it. Nothing about Fugu's launch should make you cancel a tool that is working for you. The models you already use, Claude, ChatGPT, Gemini, are proven, supported, available in your region, and good enough that the bottleneck in your business is almost never the raw capability of the model. It is whether you have pointed it at the right task and built it into your workflow.

There is a narrow group for whom Fugu is worth a real look now: technically comfortable teams outside the EU, running heavy coding or analysis workloads, who already build on an OpenAI-compatible API and have a non-critical workflow to test on. For them the one-endpoint switch and the 20-dollar entry price make a controlled experiment cheap, and the orchestration approach might genuinely fit a multi-step workload better than a single model. Even then, the rule holds: prove it on something that does not matter before you trust it with something that does.

For everyone else, the move is to let the dust settle. Wait for independent benchmarks from labs that do not sell Fugu. Wait for EU availability and a clear GDPR posture if you are European. Wait to see whether the early reports of slowness were launch-week roughness or a structural cost of the orchestration design. None of that waiting costs you anything, because your current tools keep working while you watch. The businesses that lose with AI are not the ones who adopt a month late. They are the ones who chase every headline and never get one tool working properly.

The bigger signal

Step back from the scoreboard and the launch means something regardless of whether the numbers hold. A small lab in Tokyo, founded by a handful of researchers, can now credibly claim a seat at the same table as companies spending billions, by being clever about how it combines models rather than by training a bigger one. That is new. It says the frontier is no longer guarded only by scale, and that the next advantage may come from orchestration, routing, and system design as much as from raw model size.

For a small business owner, that is quietly good news, even if you never touch Fugu. More credible players means more competition, and more competition means better tools at lower prices and less dependence on any single American giant. The version of the AI market where one or two companies hold all the leverage is worse for you than the version where a Japanese orchestration system, a French open model, and the incumbents all push each other. Fugu is a vote for the second version.

So I wrote my friend back. No, you did not buy the wrong tool. Keep the one that works, ignore the exclamation marks, and let me know when you actually have ten minutes to make the tool you already pay for earn its keep. That is the whole game. The headline said Japan beat Claude. The truth is that the AI you can buy is getting better and cheaper from more directions at once, and the businesses that win are the ones calmly putting today's good-enough tools to work while everyone else refreshes the leaderboard.

Want to put the AI you already pay for to work instead of chasing the next launch? Start with the €49 audit

Japan's Sakana Fugu says it beats Claude and GPT-5.5, here is what that actually means for your business

What Sakana actually shipped

The benchmark claims, and the asterisk on all of them

Why a system, not a model, matters to you

The catch for European businesses

What it costs and how you would actually try it

Should a small business care yet?

The bigger signal

Sources

Common questions.

Want this in your business?

Japan's Sakana Fugu says it beats Claude and GPT-5.5, here is what that actually means for your business

What Sakana actually shipped

The benchmark claims, and the asterisk on all of them

Why a system, not a model, matters to you

The catch for European businesses

What it costs and how you would actually try it

Should a small business care yet?

The bigger signal

Sources

Common questions.

Want this in your business?

How we actually do this.

Task & Workflow Automation

Customer Support AI

Business Intelligence

Keep reading.

OpenAI's GPT-5.6 Sol, Terra, and Luna, explained for a small business that just wants to know which one to use

The EU AI Act August 2026 deadline reaches US small businesses, here is who is actually in scope

The AI hiring lawsuits every small business needs to understand before the next time they hire

Book yourAI audit

Book your
AI audit