Four Strategic Shifts on a Single Day in AI
On April 2, 2026, Google released its most capable open-weight models under Apache 2.0, Alibaba launched a closed-source pivot that reversed its open-model strategy, OpenAI bought a live-streaming tech show for the low hundreds of millions, and a startup got a GPU price index onto Bloomberg Terminal. Each story signals a strategic shift. Together, they outline the direction of a maturing AI industry where tokens are the unit of value, compute is the scarce commodity, and controlling the narrative matters as much as controlling the model weights.
Key Takeaways
- Google released Gemma 4 under Apache 2.0, ranking #3 among open models on Arena AI with an Elo of 1,452, while Alibaba closed Qwen 3.6-Plus behind an API priced at roughly one-tenth of competitors' rates.
- Ornn put its GPU price index live on Bloomberg Terminal and is building derivatives infrastructure for compute, as H100 rental prices surged roughly 40% in five months.
- OpenAI acquired TBPN for a reported low hundreds of millions, signaling that controlling AI narrative infrastructure is now a strategic priority alongside model development.
- These four events on a single day reflect a deeper convergence: tokens becoming commodities, compute becoming a scarce tradable asset, and distribution becoming a competitive moat.
Gemma 4 and Qwen 3.6 arrive on the same day with opposite strategies
Google DeepMind released Gemma 4 on April 2, 2026, calling it "byte for byte, the most capable open models" ever shipped. The family includes four variants spanning the full inference spectrum, from phones to data centers. The E2B (5.1B total, 2.3B effective parameters) and E4B (~4B effective) are edge models with 128K context windows and native audio input, small enough to run on a Raspberry Pi or smartphone with 5GB of RAM. The 26B A4B is a mixture-of-experts model with 25.2B total parameters but only 3.8B active per token, making it nearly as fast as a 4B model while performing well above its parameter class. The 31B dense model runs all 31 billion parameters and targets single- or dual-GPU server deployments with a 256K context window.
The license is the most notable detail. Gemma 4 ships under Apache 2.0, replacing Google's previous restrictive Gemma license with the most permissive open-source terms available. This means full commercial use, redistribution, and modification with no restrictions, a direct competitive move against Meta's Llama license and Alibaba's increasingly closed posture. The choice suggests Google sees ecosystem adoption as more valuable than retaining licensing leverage over fine-tuned derivatives.
On benchmarks, the results are notable for models this size. The 31B dense model ranks #3 among all open models on Arena AI's text leaderboard with an Elo of 1,452, narrowly beating Qwen 3.5-397B-A17B's score of 1,449 despite being roughly 12x smaller in total parameters. The 26B MoE sits at #6 with an Elo of 1,441. On static benchmarks, the 31B scores 85.2% on MMLU Pro, 89.2% on AIME 2026, 85.7% on GPQA Diamond, and 80.0% on LiveCodeBench v6. Google describes it as the "#1 ranked U.S. open model" on the leaderboard. The comparison with Claude Sonnet 4.6 is less direct since Sonnet operates in the proprietary tier, but the Artificial Analysis Intelligence Index places Claude Sonnet 4.6 at 52 versus Qwen 3.5-397B at 45, suggesting Gemma 4's Arena Elo performance doesn't yet close the gap with frontier proprietary models on all dimensions.
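A three-point Elo gap (1,452 vs. 1,449) is practically a coin flip. Arena-style leaderboards are built on the standard Elo model, under which the expected head-to-head win rate follows a logistic curve; a minimal sketch using the scores above:

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected score of model A vs. model B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Gemma 4 31B (1,452) vs. Qwen 3.5-397B (1,449): near-even odds.
p = elo_win_probability(1452, 1449)
print(f"{p:.4f}")  # ~0.5043
```

In other words, the leaderboard ordering is real but razor-thin; the more striking number is the roughly 12x parameter gap between the two models.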
The edge AI capabilities are worth noting separately. The E2B model runs on a phone with 4-bit quantization, supports 128K context with near-zero latency, handles 140+ languages natively with audio input, and will serve as the foundation for Gemini Nano 4 on Android via AICore. Google claims up to 4x faster inference and 60% less battery consumption versus prior generations. With 400 million cumulative Gemma downloads and 100,000+ community variants since February 2024, the ecosystem has grown considerably. A llama.cpp demo showed the 26B model generating 300 tokens per second on an M2 Ultra, fast enough for real-time interactive use on consumer hardware.
Why did Alibaba close its best model behind an API?
Qwen 3.6-Plus launched on the same day with capabilities that directly challenge Western frontier models, yet its strategic posture is the opposite of Gemma 4's. The model features a native 1-million-token context window (roughly 2,000 pages of text), up to 65,536 output tokens, and always-on chain-of-thought reasoning. Its agentic coding capabilities are its main differentiator: autonomous task decomposition for repository-level work, iterative test-modify-debug loops, and visual coding from UI screenshots and wireframes. On SWE-bench Verified it scores 78.8, within striking distance of Claude Opus 4.6's 80.9, and on Terminal-Bench 2.0 it posts 61.6, beating Opus's 59.3.
The pricing significantly undercuts competitors. On Alibaba Cloud's Bailian platform, input tokens cost 2 yuan (~$0.29) per million for requests up to 256K tokens, with output at 12 yuan per million. For comparison, Claude Sonnet 4.6 charges $3.00 per million input tokens, roughly 10x more expensive. This positions Qwen 3.6-Plus as the budget frontier model, particularly appealing for high-volume agentic workloads where token consumption scales rapidly.
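At high volumes, the per-million-token gap compounds quickly. A rough sketch of monthly input-token spend for a hypothetical agentic workload, using the list prices quoted above (the 500M tokens/day figure is an illustrative assumption, not from either vendor):

```python
QWEN_INPUT_USD = 0.29    # ~2 yuan per million input tokens, per the Bailian pricing above
CLAUDE_INPUT_USD = 3.00  # Claude Sonnet 4.6 list price per million input tokens

def monthly_input_cost(tokens_per_day_m: float, price_per_m: float, days: int = 30) -> float:
    """Input-token spend for a workload consuming tokens_per_day_m million tokens/day."""
    return tokens_per_day_m * price_per_m * days

# Hypothetical agent pipeline burning 500M input tokens per day:
qwen = monthly_input_cost(500, QWEN_INPUT_USD)      # ~$4,350/month
claude = monthly_input_cost(500, CLAUDE_INPUT_USD)  # $45,000/month
print(round(qwen), round(claude), round(claude / qwen, 1))  # ratio ~10.3x
```

For a workload at that scale, the choice of provider is a five-figure monthly line item, which is exactly the segment Alibaba appears to be targeting.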
The key change is that Qwen 3.6-Plus is closed-source, with no downloadable weights. This departs from Alibaba's open-weight tradition that produced 113,000+ community variations on Hugging Face under Apache 2.0. Alibaba says it will continue open-sourcing "selected Qwen 3.6 models in developer-friendly sizes," but the strategic direction is clear: the flagship goes behind an API paywall, aligned with Alibaba's push for $100 billion in cloud revenue within five years.
Team departures and organizational restructuring at Alibaba Cloud
The Qwen 3.6-Plus launch came alongside significant organizational changes. Lin Junyang, the technical lead of Qwen and Alibaba's youngest P10 (born 1993), posted "me stepping down. bye my beloved qwen" on X on March 3, 2026. Alibaba CEO Eddie Wu confirmed the resignation via internal memo on March 5. Yu Bowen, head of post-training, departed the same day, and Hui Binyuan, head of Qwen Code, had already left for Meta in January 2026. Bloomberg reported Alibaba shares fell as much as 5.3% on the news.
The departures were reportedly triggered by an organizational restructuring that dismantled the vertically integrated R&D model Lin had championed. On March 16, Alibaba established the Alibaba Token Hub (ATH), consolidating five AI units under CEO Eddie Wu's direct leadership. Zhou Hao, a former Google researcher who worked on Gemini 3.0, was hired as Lin's replacement.
The user growth numbers, meanwhile, are notable. Qwen's monthly active users grew from roughly 31 million in January 2026 to 203 million in February, a 554% increase in a single month, fueled by Alibaba's Lunar New Year marketing campaign and the Qwen 3.5 release. The app hit 30 million daily active users at peak. Qwen now ranks #3 globally in AI app usage behind ChatGPT and ByteDance's Doubao. These numbers represent third-party estimates from AICPB tracking, and Alibaba's internal figures may differ, but the trajectory is directionally consistent across sources.
The emerging financial infrastructure for tokenized compute
The idea that compute is becoming a commodity with its own financial markets moved from theory to practice on April 2, 2026, when a startup called Ornn (headquartered in New York, founded by four MIT alumni) announced that its Ornn Compute Price Index (OCPI) was live on Bloomberg Terminal under the ticker ORNNH100. The index tracks actual executed transaction prices for Nvidia H100, A100, H200, B200, and RTX-class GPUs, normalized across hardware configuration, provider, and geographic location. Over 10 data partners contribute pricing data from parsed invoices, not rate cards or surveys, and more than 400 data center operators, investors, and AI companies access the platform.
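Ornn has not published its index methodology, so as a rough illustration of what "normalized across hardware configuration, provider, and geographic location" could mean in practice, here is a hypothetical volume-weighted index over executed transactions. All field names and the configuration-adjustment factor are invented for this sketch:

```python
from dataclasses import dataclass

@dataclass
class Trade:
    price_per_gpu_hour: float  # executed transaction price, USD
    gpu_hours: float           # transaction volume
    config_factor: float       # hypothetical adjustment to a reference config (1.0 = reference)

def volume_weighted_index(trades: list[Trade]) -> float:
    """Volume-weighted average of configuration-normalized executed prices."""
    total_hours = sum(t.gpu_hours for t in trades)
    normalized = sum(t.price_per_gpu_hour / t.config_factor * t.gpu_hours for t in trades)
    return normalized / total_hours

trades = [
    Trade(2.40, 10_000, 1.0),  # reference-configuration cluster
    Trade(2.10, 5_000, 0.9),   # leaner configuration, normalized upward
]
print(round(volume_weighted_index(trades), 4))  # ~2.3778
```

The design point the article emphasizes, pricing from parsed invoices rather than rate cards, matters here: an index over list prices would systematically overstate what compute actually clears at.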
Ornn's ambition extends well beyond a price index. The company executed its first-ever compute swap in December 2025 and is building toward a full derivatives exchange for GPU compute. Its product lineup includes cash-settled futures contracts, Asian-style settlement mechanisms (averaging volume-weighted prices over contract periods, since GPU-hours cannot be stored like physical commodities), and Residual Value Swaps that guarantee resale prices for GPU hardware. Contracts are live on Kalshi and Robinhood, and Architect Financial Technologies has signed on to list exchange-traded futures. Ornn operates under a CFTC de minimis exemption allowing up to $8 billion in notional swap volume while it pursues a full Designated Contract Market license.
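The Asian-style settlement described above averages prices over the contract window rather than settling on a single day's print, which suits a commodity that cannot be stored or delivered. A minimal sketch of a cash-settled payoff under that convention (contract sizing and the example prices are assumptions, not Ornn's published terms):

```python
def asian_settlement_price(daily_vwaps: list[float]) -> float:
    """Settlement = arithmetic mean of daily volume-weighted prices over the period."""
    return sum(daily_vwaps) / len(daily_vwaps)

def futures_pnl(entry_price: float, settlement: float, gpu_hours: float, long: bool = True) -> float:
    """Cash-settled P&L for a futures position sized in GPU-hours."""
    sign = 1 if long else -1
    return sign * (settlement - entry_price) * gpu_hours

# A buyer locks in $2.00/GPU-hour; daily VWAPs drift higher over the period.
vwaps = [2.10, 2.20, 2.30, 2.40]
settle = asian_settlement_price(vwaps)              # ~2.25
pnl = futures_pnl(2.00, settle, 100_000)            # ~+$25,000
print(round(settle, 2), round(pnl))
```

Averaging over the window also makes the settlement harder to manipulate with a single large trade, the usual rationale for Asian-style contracts in thin markets.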
Co-founder and CEO Kush Bavaria (former Link Ventures investor, MIT CSAIL machine-learning researcher) and CTO Wayne Nelms (former Susquehanna options trader and Google engineer) identified the gap while consulting for private-equity firms lending to data centers. Those institutional clients could not hedge GPU infrastructure risk: no benchmark, no derivatives, no way to manage exposure. As physicist and Ornn advisor Dr. Alex Wissner-Gross framed it: "Oil got a futures market in 1983. Natural gas got one in 1990. Electricity got one in 1996. Every critical commodity follows the same arc."
Why is GPU compute being financialized?
The urgency behind compute financialization is driven by genuine, documented scarcity. SemiAnalysis data shows H100 1-year rental pricing surged roughly 40% from $1.70/hour per GPU in October 2025 to $2.35/hour by March 2026. On-demand GPU rental capacity is sold out across all GPU types. All new Blackwell capacity coming online is booked through August-September 2026, and lead times for data-center GPUs run 36 to 52 weeks. HBM (high-bandwidth memory) is out of stock through 2026. Of 16 GW of data center capacity planned for 2026 globally, only 5 GW is under construction. McKinsey projects roughly $7 trillion in total data-center investment through 2030, with Alphabet, Amazon, Meta, and Microsoft staking a combined $650 billion in 2026 alone. Roughly half of that $7 trillion must come from debt markets, infrastructure funds, and sovereign wealth, entities that historically refuse to invest in any asset class without hedging tools.
Anthropic's Claude rate-limiting crisis of late March 2026 illustrated the problem in practice. After the QuitGPT movement, triggered by OpenAI's Pentagon contract, drove ChatGPT uninstalls up 295% in a single day, millions of users migrated to Claude. Anthropic's user base grew to 18.9 million professional users, and its annualized revenue reached $19 billion (up from $9 billion at end of 2025). The infrastructure could not keep pace. Max subscribers paying $200/month reported usage meters jumping from 21% to 100% on a single prompt. Anthropic acknowledged on March 31 that "people are hitting usage limits in Claude Code way faster than expected." An Implicator analysis estimated that a moderate Pro subscriber generates roughly $58.50/month in inference costs against a $20 subscription, while Max users running Opus burn approximately $570/month against $200, a structural unsustainability in flat-rate pricing. One Reddit user captured the sentiment: "Out of 30 days I get to use Claude 12." A developer called it "paying for a gym membership where they lock the squat rack during rush hour."
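The structural problem in the Implicator estimate is simple gross-margin arithmetic; a sketch using the figures quoted above:

```python
def subscription_margin(monthly_fee: float, inference_cost: float) -> float:
    """Monthly gross margin per subscriber (negative = provider loses money)."""
    return monthly_fee - inference_cost

# Figures from the Implicator estimate cited above.
pro = subscription_margin(20.0, 58.50)        # Pro: -$38.50 per user per month
max_opus = subscription_margin(200.0, 570.0)  # Max running Opus: -$370 per user per month
print(pro, max_opus)
```

At those unit economics, every additional heavy user deepens the loss, which is why the pressure surfaces as rate limits rather than as price changes.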
Nvidia as a "token factory" builder: the BEP Research thesis
The intellectual framework connecting compute scarcity to business model transformation comes most clearly from Ben Pouladian, a Los Angeles-based tech investor and former co-founder of Deco Lighting who publishes AI infrastructure analysis at BEP Research on Substack. A long-term Nvidia investor since 2016, Pouladian attended GTC 2026 in San Jose (March 16-19) and conducted exclusive interviews with three Nvidia executives.
His central thesis, laid out in a piece titled "The Token Explosion: Why GTC 2026 Was Really About Building the World's Largest Token Factory," argues that Nvidia has evolved from selling chips to building the infrastructure that produces tokens at industrial scale. Jensen Huang's own language at GTC reinforced this: "Every CEO in the world will study their business from now on in the way I'm about to describe, because this is your token factory; this is your AI factory; these are your revenues." Nvidia's internal KPIs, surfaced through its Mission Control 3.0 platform, now measure "token production per GPU, rack, and watt", a framing that treats data centers not as compute hubs but as factories whose output is measured in throughput, latency, and revenue per watt.
Pouladian highlights what he calls "the Reasoning Tax", the counterintuitive finding that smarter models don't save infrastructure but devour it. Agentic AI workloads require 12,000 GPUs paired with 400,000 CPU cores, a 33-to-1 CPU-to-GPU ratio. Nvidia's upcoming Vera Rubin platform promises a 10x reduction in inference token cost versus Blackwell and 4x fewer GPUs to train trillion-parameter models. Nvidia expects $1 trillion or more in revenue from Blackwell and Rubin chips through end of 2027.
The concept that tokens are the "atomic unit" of the AI economy is now mainstream. Deloitte wrote in January 2026: "A token is the fundamental unit of AI work. Tokens are the true unit of value. In the AI economy, they are the currency that translates opaque infrastructure decisions into tangible economic terms." All major providers now price in dollars per million tokens, creating a universal denomination: GPT-5.4 at $2.50/$15.00 input/output, Claude Sonnet 4.6 at $3.00/$15.00, Gemini 3.1 Pro at $2.00/$12.00, versus DeepSeek V3.2 at $0.28/$0.42 and Qwen 3.6-Plus at approximately $0.29 input. The cost of processing a million tokens fell from $180 to $0.75 in 18 months during 2023-24, and the gap between cheapest and most expensive models now exceeds 1,000x. Microsoft processed 50 trillion tokens in March 2026 alone, up 5x year-over-year.
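Because every provider denominates in dollars per million tokens, blended cost for any workload reduces to one formula. A sketch using the list prices quoted above (the 10M-input/2M-output mix is an arbitrary illustration):

```python
def blended_cost(input_m: float, output_m: float, price_in: float, price_out: float) -> float:
    """Dollar cost for input_m million input tokens and output_m million output tokens."""
    return input_m * price_in + output_m * price_out

# List prices quoted above: (USD per million input tokens, USD per million output tokens).
providers = {
    "GPT-5.4": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "DeepSeek V3.2": (0.28, 0.42),
}

# Cost of a 10M-input / 2M-output workload at each provider:
for name, (p_in, p_out) in providers.items():
    print(name, round(blended_cost(10, 2, p_in, p_out), 2))
```

The same workload spans roughly $3.64 to $60 across these four providers alone, and the article's claim of a 1,000x spread covers the full market, from the cheapest small open models to the most expensive frontier reasoning tiers.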
OpenAI buys a tech talk show
On the same day, OpenAI announced its acquisition of TBPN, the Technology Business Programming Network, a daily three-hour live tech talk show that launched in October 2024 and was previously known as "Technology Brothers." The show is hosted by John Coogan (co-founder of Soylent, co-founder of Lucy Nicotine, former entrepreneur-in-residence at Founders Fund) and Jordi Hays (founder of Party Round/Capital, fintech founder, long-time acquaintance of Sam Altman), with Dylan Abruscato (ex-Postmates, ex-HQ Trivia) serving as president since September 2025.
The Financial Times reported a deal value in the "low hundreds of millions of dollars," making it one of the largest podcast acquisitions in history, comparable to Joe Rogan's $250 million Spotify renewal in 2024. Financial terms were not officially disclosed by either party. The Wall Street Journal reported TBPN generated roughly $5 million in advertising revenue in 2025 and was on track to exceed $30 million in 2026, significant growth for a bootstrapped, 11-person operation with zero outside investors. The show averages about 70,000 viewers per daily episode across YouTube, X, and LinkedIn, with roughly 58,000 YouTube subscribers. Major sponsors include Ramp, Plaid, Google Gemini, Cisco, Shopify, MongoDB, CrowdStrike, and Figma, plus a partnership with the New York Stock Exchange.
TBPN will sit within OpenAI's Strategy organization, reporting to Chris Lehane, the Chief Global Affairs Officer. The announcement came via a memo from Fidji Simo, OpenAI's CEO of AGI Deployment, who wrote: "The standard communications playbook just doesn't apply to us." Editorial independence guarantees were explicitly stated in the deal terms. Simo's memo promised TBPN will "continue to run their programming, choose their guests, and make their own editorial decisions." Dylan Abruscato confirmed "full control over all its editorial decisions and branding." Sam Altman posted: "I don't expect them to go any easier on us, am sure I'll do my part to help enable that with occasional stupid decisions."
Skepticism was immediate. Martin Peers of The Information wrote: "OpenAI's promise of editorial independence for TBPN is irrelevant. Can you imagine TBPN doing a hard-hitting piece on OpenAI? It's not in the show's DNA." Jessica Lessin framed it as: Elon Musk "has X," and now Sam Altman "has TBPN." Mike Isaac of the New York Times called it "a marketing expense." The deal arrived the same week OpenAI announced a $122 billion funding round valuing the company at $852 billion, making TBPN, as CNBC's Daniel Newman noted, "a fairly small bet for a lot of attention."
The acquisition fits a broader pattern
The TBPN deal is not an isolated event but part of a growing pattern of tech companies and billionaires acquiring or building media properties. Andreessen Horowitz created a dedicated "New Media" team in mid-2025, acqui-hiring Erik Torenberg's podcast network Turpentine and launching an 8-week New Media Fellowship in January 2026 to train storytellers who can be embedded inside portfolio companies. Their thesis is explicit: in a world where capital is a commodity, attention is the new competitive advantage.
The Ellison family's Paramount Skydance won the bidding war for Warner Bros. Discovery in late February 2026 with a roughly $111 billion offer, potentially placing CBS, CNN, HBO, Discovery, and dozens of other properties under a single tech-adjacent owner. David Ellison appointed Bari Weiss, whose Substack outlet The Free Press was acquired for $150 million, as editor-in-chief of CBS News. JPMorgan CEO Jamie Dimon told Axios the same week as the TBPN deal that he wants to start a media business, calling media "the great influencer." Elon Musk already controls X, acquired by xAI in April 2025 for $45 billion.
Tech-media-telecom M&A totaled $826.5 billion in 2025, up 86.5% over 2024, and PwC projects $80+ billion in media M&A deal value in 2026 alone. The pattern is consistent: technology leaders are bypassing traditional media by building or buying their own distribution channels. Chris Lehane himself drew the parallel to Microsoft-NBC launching MSNBC and Westinghouse owning CBS. The cautionary tales, however, are equally instructive. Jeff Bezos has laid off scores of Washington Post journalists, Patrick Soon-Shiong did the same at the LA Times, and Marc Benioff appears more engaged with AI than with Time magazine.
What do these four events reveal about the AI industry?
These four developments landing on the same day reflect a single underlying dynamic: the AI industry is financializing and institutionalizing. Google open-sources under Apache 2.0 to win the ecosystem; Alibaba closes its flagship to win cloud revenue. Ornn builds derivatives for an asset class that didn't have a price index two years ago. OpenAI buys a media property because controlling the narrative around an $852 billion valuation is itself a strategic imperative.
The unifying thread is tokens as commodity. Compute scarcity drives rate limiting at Anthropic. Rate limiting drives user frustration. User frustration drives demand for hedging instruments. Hedging instruments drive the creation of financial infrastructure at Ornn. Financial infrastructure creates price transparency. Price transparency enables Nvidia to reframe its entire business as a "token factory" builder measured in output per watt. And the companies generating and consuming those tokens, at scales of 50 trillion per month at Microsoft alone, now need to manage public narrative as carefully as they manage GPU allocation. These are not parallel developments. They are the same story, viewed from different points along the value chain of intelligence production.