THE HYPERSCALER HANGOVER
Why AWS, Azure & GCP are falling behind the AI curve—and why the best architectures are already built somewhere else.
Let me say something that might be unpopular in certain circles: the hyperscalers are losing the plot.
Not losing their revenue. Not losing their customers. Not even losing the narrative — yet. But when it comes to actually keeping pace with where AI infrastructure needs to go, AWS, Azure, and Google Cloud are running 6 to 24 months behind the frontier. And the gap is growing.
I've spent the better part of the last year helping organisations navigate their data and AI architectures. What I keep seeing is a painful pattern: teams waiting on managed services that don't exist yet, paying hyperscaler GPU tax on hardware that's two generations behind what the neoclouds are shipping, and shoehorning agentic workloads into platforms built for a different era of computing.
We need to talk about the exotic options. Not as curiosities. As your default stance.
The Numbers Tell the Story
Hyperscaler capital expenditure is forecast to exceed $600 billion in 2026 — a 36% jump year over year. That's a staggering commitment. But here's what's buried in the fine print: AI-related cloud services generated only around $25 billion in revenue in 2025. That's roughly 10 cents of return for every dollar spent. Meanwhile, investors are starting to blink. The average stock price correlation across the major hyperscalers has fallen from 80% to just 20% since mid-2024, as markets began distinguishing between those translating capex into earnings and those burning capital hoping the model holds.
"Hyperscalers are spending $400 billion a year on AI infrastructure to chase a $37 billion market. Someone ran the numbers — to justify current capex, AI needs $2 trillion in annual revenue by decade's end. Best-case forecasts say $1.2 trillion."
The uncomfortable reality: the hyperscalers are building for a world they've bet on, not the world we're living in right now. And that creates a window. A wide one.
The Lock-In Trap Is Real and It's Getting Worse
Every hyperscaler says they want interoperability while building for lock-in. Their agent frameworks integrate beautifully — with their own storage, their own models, their own observability tooling. AWS Bedrock connects neatly to S3, DynamoDB, and their managed Kafka. Azure AI Foundry prefers to stay within Fabric. Google Vertex won't play nicely unless you're already in BigQuery.
This isn't conspiracy. It's commercial reality. An agent that moves fluidly across all vendor systems commoditises them. They don't want that. But we do.
The 2026 data engineering and AI architecture challenge isn't about picking one cloud and going deep. It's about stitching together the best specialised services — and doing it without ceding control. That requires a fundamentally different mindset. Treat your hyperscaler like a commodity utility provider for compute and storage. Be very deliberate about what you allow to become a managed dependency.
Go Exotic: NanoVMs and the Unikernel Renaissance
Here's a technology that deserves far more attention in the data engineering and AI community: unikernels. Specifically, NanoVMs and the Nanos unikernel runtime.
The premise is simple but the implications are profound. A unikernel is your application compiled down to a single-purpose, minimal virtual machine. No Linux kernel. No shell. No package manager. No SSH daemon sitting idle. Just your application and the exact OS primitives it needs, sealed into an image that boots in milliseconds.
The performance numbers are not subtle. Unikernels boot two orders of magnitude faster than Docker and run software up to 200% faster on GCP and up to 300% faster on AWS. NanoVMs can provision hundreds to thousands of VMs on the same hardware that would run a fraction of that with containers. For inference workloads — where cold start latency kills user experience and density directly determines unit economics — this is genuinely transformative.
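The density argument is easy to make concrete with back-of-the-envelope arithmetic. All inputs below are illustrative assumptions, not vendor benchmarks — the point is that per-instance footprint, not raw compute, often sets the ceiling on how many isolated workloads a host can carry:

```python
# Back-of-the-envelope density math for inference hosting.
# All figures are illustrative assumptions, not NanoVMs benchmarks.

def instances_per_host(host_ram_gb: float,
                       per_instance_ram_gb: float,
                       overhead_gb: float) -> int:
    """How many isolated instances fit on one host at a given footprint."""
    return int((host_ram_gb - overhead_gb) // per_instance_ram_gb)

HOST_RAM_GB = 256

# A container image typically drags along a full userland; a unikernel
# image carries only the application and the primitives it links against.
container = instances_per_host(HOST_RAM_GB, per_instance_ram_gb=2.0, overhead_gb=8)
unikernel = instances_per_host(HOST_RAM_GB, per_instance_ram_gb=0.25, overhead_gb=4)

print(container, unikernel)  # the density gap flows straight into cost per request
```

Under these (hypothetical) footprints, the same host carries roughly eight times as many unikernel instances as containers — which is exactly why density is a unit-economics question, not a benchmarking curiosity.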
The security story is equally compelling. By design, a unikernel runs exactly one program. That eliminates the shell, the package manager, and the entire toolkit attackers depend on once they get a foothold. No SSH. No bash. No lateral movement. You're not patching against vulnerabilities; you're removing the attack surface entirely.
"We don't scan for hacked systems — we remove the tools hackers use to stop it in the first place." — NanoVMs
NanoVMs Inception now lets you run these unikernel workloads on plain EC2 instances — not bare metal, not specialised instances — at up to 4x the speed of emulated alternatives with zero cold starts. And with their Firecracker-based PaaS pattern, you can build your own multi-tenant inference layer on top of commodity infrastructure, on your terms.
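Firecracker's control plane is a small REST API served over a unix socket, which is what makes the DIY PaaS pattern tractable. A minimal sketch of the ordered calls one microVM needs (the kernel and rootfs paths and sizes here are illustrative; actually sending the requests to `/run/firecracker.sock` is left out):

```python
import json

def microvm_boot_plan(kernel: str, rootfs: str,
                      vcpus: int = 1, mem_mib: int = 256) -> list:
    """Build the ordered (method, path, body) calls Firecracker expects on
    its API socket to configure and start one microVM. An HTTP client bound
    to the unix socket would issue these in sequence."""
    return [
        ("PUT", "/machine-config", {"vcpu_count": vcpus,
                                    "mem_size_mib": mem_mib}),
        ("PUT", "/boot-source", {"kernel_image_path": kernel,
                                 "boot_args": "console=ttyS0 reboot=k panic=1"}),
        ("PUT", "/drives/rootfs", {"drive_id": "rootfs",
                                   "path_on_host": rootfs,
                                   "is_root_device": True,
                                   "is_read_only": False}),
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]

plan = microvm_boot_plan("vmlinux.bin", "inference-worker.ext4")
for method, path, body in plan:
    print(method, path, json.dumps(body))
```

A multi-tenant inference layer is then "just" a pool manager that stamps out these plans per tenant — the hard parts (image builds, networking, scheduling) sit above this API, not inside it.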
For agentic AI workloads that need tight isolation, predictable performance, and aggressive density, this is not the future. It's available today.
Zero Trust is the Only Trust Model That Survives Agents
Let's talk about something that becomes urgent the moment you deploy autonomous agents: your network trust model is almost certainly wrong for this world.
Traditional perimeter security assumes that what's inside your network is friendly. That model was already strained by cloud and remote work. Agentic AI — where systems autonomously execute tasks, call external APIs, process untrusted content, and maintain persistent memory — shatters it entirely.
OpenClaw, the open-source autonomous agent platform that accumulated over 200,000 GitHub stars in months, is a vivid demonstration of both the opportunity and the risk. Agents that can autonomously manage email, run shell commands, deploy code, and interact across messaging platforms are extraordinarily powerful. They're also a nightmare if you haven't designed your network and access architecture around the assumption that any process could be compromised or manipulated.
The right architecture for an agentic world is zero trust at every layer:
- Identity. Every agent identity is attested and short-lived
- Least privilege. Access to tools, APIs, and data is scoped to the minimum required for each task
- Auditability. All agent actions are logged, auditable, and reversible where possible
- Content isolation. Ingested content — emails, web pages, third-party APIs — is treated as potentially adversarial
- Secrets hygiene. Credentials are managed in dedicated vaults, never embedded in agent prompts or memory
This isn't optional hardening. It's table stakes. The moment you give an AI agent network access and real-world tools, your security posture must be built around the assumption it will be targeted. Build for that from day one.
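Those principles translate directly into code. Here's a minimal sketch — all class and tool names are hypothetical — of a single choke point that enforces short-lived credentials, a per-task tool allowlist, and an append-only audit trail:

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class AgentCredential:
    """Short-lived, task-scoped identity minted for one agent run."""
    agent_id: str
    allowed_tools: frozenset   # least privilege: an explicit allowlist
    expires_at: float          # short-lived by construction
    token: str = field(default_factory=lambda: uuid.uuid4().hex)

class ToolGateway:
    """Every tool call passes through one gateway that checks expiry,
    checks scope, and records permitted actions in an audit trail."""
    def __init__(self, tools: dict):
        self.tools = tools     # tool name -> callable
        self.audit_log = []    # append-only record of executed actions

    def call(self, cred: AgentCredential, tool: str, *args):
        if time.time() > cred.expires_at:
            raise PermissionError("credential expired")
        if tool not in cred.allowed_tools:
            raise PermissionError(f"{tool} not in scope for {cred.agent_id}")
        self.audit_log.append((cred.agent_id, tool, args))
        return self.tools[tool](*args)

gateway = ToolGateway({"search": lambda q: f"results for {q}",
                       "send_email": lambda to, body: "sent"})
cred = AgentCredential("agent-7", frozenset({"search"}),
                       expires_at=time.time() + 300)
print(gateway.call(cred, "search", "flight prices"))
# gateway.call(cred, "send_email", ...) raises PermissionError: out of scope
```

The design choice that matters is the choke point: the agent never holds the tools directly, so scope, expiry, and audit are enforced in one place rather than scattered across prompts.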
GPU-First Infrastructure: The Neocloud Advantage
When you need GPU compute for AI workloads, the hyperscalers should be your last call, not your first.
The neoclouds — CoreWeave, Lambda Labs, Nebius, Voltage Park, Crusoe — were built from scratch for AI. They're not retrofitting GPU instances into VM-based architectures designed for web apps. They're Kubernetes-native or bare-metal-first, running NVIDIA InfiniBand fabrics at 400Gbps per port, with direct GPU-to-network memory access that makes hyperscaler networking look like it's moving data through amber.
The performance gap is measurable. InfiniBand can speed up multi-node training by 2-3x compared to Ethernet-based alternatives. For LLM training at scale, this isn't a feature — it's a requirement. CoreWeave's architecture eliminates the hypervisor overhead that traditional clouds impose, giving you bare-metal throughput with container orchestration flexibility. Lambda Labs ships 1-Click Clusters where every GPU node comes with Quantum-2 InfiniBand and GPUDirect RDMA preconfigured and ready to train.
Beyond performance, the commercial model is cleaner. Where AWS charges egress on every byte that leaves the building, CoreWeave launched a Zero Egress Migration program specifically to draw teams away from hyperscaler data gravity. The 'Hotel California' model — where the real cost of cloud is how expensive it is to leave — is being directly challenged.
"The era of general-purpose cloud dominance is ending for AI teams. The 40% average GPU utilisation problem on hyperscalers is a major hidden cost requiring workload-aware orchestration that they simply don't offer."
For inference workloads, the calculus is similar. GPU-specialised providers offer purpose-built environments without the layers of abstraction that hyperscalers impose. RunPod's serverless GPU platform, for example, handles cold starts and autoscaling for inference without the overhead of managed Kubernetes clusters that the big three require you to instrument and maintain yourself.
Agentic Frameworks: Don't Wait for Your Cloud Vendor
The agentic era is here now. Not in 18 months. Not after your cloud provider ships their managed agent service. Right now.
OpenClaw's trajectory is a useful signal. From a prototype in an hour to 200,000+ GitHub stars in months — faster growth than Docker, Kubernetes, or React ever saw. It's not a polished enterprise product. It's a working autonomous agent that connects to messaging platforms you already use and actually executes tasks. Email management. Code deployment. Calendar automation. Complex workflow orchestration. From your phone, via Telegram.
Yes, there are security issues to address before enterprise deployment. But OpenClaw is important not because you should ship it to production tomorrow, but because it demonstrates what's architecturally possible right now with open-source tooling and commodity hardware. One developer had their agent negotiate $4,200 off a car purchase over email while they slept. Another had it file a legal rebuttal to an insurance denial without prompting.
Meanwhile, we're waiting for AWS to productionise their agent orchestration service. We're on a waitlist for Azure's managed MCP endpoints. We're reading 2027 roadmap blogs from Google.
The community isn't waiting. The frameworks that matter in 2026 are being built in the open, fast, and they don't need your cloud vendor's blessing.
Messaging Protocols for the Agentic World
If you're building agent infrastructure, the protocol layer deserves serious attention. The hyperscalers are each proposing their own interoperability standards while quietly ensuring they work best within their own ecosystem. MCP, A2A, ACP — the alphabet soup of multi-vendor protocols will achieve "growing ecosystem momentum" in press releases long before any of them achieve production deployments at scale.
The more practical question is: what communication patterns actually suit autonomous agent workloads?
Agents need persistent, stateful communication — not the stateless request-response model that REST APIs were built for. They need message queues that can handle variable latency, retry logic, and backpressure from dependent tool calls. They need event streams that support audit trails and replay. They benefit from protocols that treat agent identity as first-class, not a header you bolt on.
NATS, with its built-in JetStream persistence and subject-based routing, is genuinely well-suited for multi-agent communication architectures. Kafka remains the standard for high-throughput event streaming with the audit trail guarantees that compliance requires. For local-first agent frameworks like OpenClaw, WebSocket-based control planes with persistent session state are proving remarkably effective.
The point isn't which specific protocol wins. It's that you should be designing your agent communication layer with intentionality — not defaulting to whatever your managed cloud service exposes as its integration surface.
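To make those communication patterns concrete, here's a toy, stdlib-only sketch — not a NATS or Kafka client, and it simplifies NATS's token-wise wildcards to glob matching — of the two properties the section calls for: subject-based routing and a persistent, replayable event log:

```python
from fnmatch import fnmatch

class AgentBus:
    """Toy in-process bus: subject-based routing (NATS-style, simplified
    to glob patterns) plus an append-only log that supports the audit and
    replay guarantees systems like Kafka or JetStream provide."""
    def __init__(self):
        self.log = []           # append-only: every event, in order
        self.subscribers = []   # (subject pattern, handler) pairs

    def subscribe(self, pattern: str, handler):
        # Subjects like "agent.*.result" route by pattern, not endpoint.
        self.subscribers.append((pattern, handler))

    def publish(self, subject: str, payload):
        self.log.append((subject, payload))   # persist before delivery
        for pattern, handler in self.subscribers:
            if fnmatch(subject, pattern):
                handler(subject, payload)

    def replay(self, pattern: str = "*"):
        """Re-deliver history so a late-joining agent can catch up."""
        return [(s, p) for s, p in self.log if fnmatch(s, pattern)]

bus = AgentBus()
seen = []
bus.subscribe("agent.*.result", lambda s, p: seen.append(p))
bus.publish("agent.email.result", {"status": "sent"})
bus.publish("agent.email.request", {"to": "x"})
print(seen, bus.replay("agent.email.*"))
```

The shape is the point: agents address subjects rather than each other, and the log — not the consumer — is the source of truth, which is what makes audit and replay cheap.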
The Architecture Principle for 2026
Architectures in the agentic AI era are fundamentally about composition. You're not building monolithic applications deployed on one cloud. You're stitching together:
- Agent layer. Open-source agent runtimes (OpenClaw, AutoGen, LangGraph) running on your own infrastructure
- Compute layer. Purpose-built GPU infrastructure (CoreWeave, Lambda Labs) for training and heavy inference
- Isolation layer. Unikernel-based microservice isolation (NanoVMs) for high-density, high-security service execution
- Security layer. Zero trust networking as the connective tissue between every component
- Messaging layer. Protocol-first messaging (NATS, Kafka, WebSockets) designed for agent communication patterns
- Data layer. Data plane choices (ClickHouse, DuckDB, Iceberg) that don't create hyperscaler gravity
Each layer should be independently replaceable. No single vendor should own more than one critical path. Egress should never be a meaningful cost. Vendor lock-in should be a conscious choice you make for specific reasons, not the default outcome of following the path of least resistance.
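Replaceability can be enforced in code rather than asserted on a slide. A hypothetical sketch (class names are illustrative, not real SDKs) using structural typing, so a layer swaps out without touching its consumers:

```python
from typing import Protocol

class ComputeLayer(Protocol):
    """Anything that can run a job qualifies -- no vendor base class."""
    def run(self, job: str) -> str: ...

# Two interchangeable compute backends. Neither knows about the other,
# and the application never imports a vendor SDK directly.
class NeocloudGPU:
    def run(self, job: str) -> str:
        return f"ran {job} on dedicated GPU fabric"

class LocalDev:
    def run(self, job: str) -> str:
        return f"ran {job} locally"

class Stack:
    """Composes layers behind protocols, so each layer stays
    independently replaceable."""
    def __init__(self, compute: ComputeLayer):
        self.compute = compute

    def train(self, job: str) -> str:
        return self.compute.run(job)

print(Stack(NeocloudGPU()).train("llm-finetune"))
print(Stack(LocalDev()).train("llm-finetune"))
```

The same pattern applies to the messaging, isolation, and data layers: depend on the narrow interface you defined, and vendor lock-in becomes a deliberate swap rather than an accumulated default.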
The teams building the best AI products right now are not asking their cloud vendor for permission. They're composing infrastructure from the best available primitives, moving fast, and keeping optionality open.
Stop Waiting for the Hyperscalers to Catch Up
The hyperscalers will catch up eventually. They always do. They'll productionise managed agent services, build GPU-native inference platforms, and figure out their unikernel story. In 18 to 24 months, there'll be re:Invent keynotes about all of it.
But the window between where the frontier is today and where the big clouds will be is exactly where competitive advantage gets built. The teams that are moving right now — with NanoVMs, with GPU neoclouds, with zero trust networking, with open-source agent frameworks — are building capability that compounds while the rest of the market waits for a managed service that ships with guardrails and a 30-day free trial.
Treat the exotic options as your defaults. Treat hyperscaler managed services as conveniences you reach for only when the abstraction genuinely earns it. Design every layer of your stack for composability and escape velocity.
The future of AI infrastructure is not a single cloud. It's a constellation of specialised services, stitched together by teams who were unwilling to wait.
Originally published on LinkedIn · March 2026
Written by Peter Hanssens
Data Engineer, founder, and community leader. Building scalable data platforms.