Grok 3’s Massive Bet on Compute – Will It Pay Off?

Welcome to Prompt & Circumstance – your weekly deep dive into the evolving world of Generative AI and its impact on marketing and communications.

If you’ve made it this far, I’ll assume you’re getting value from this newsletter – and that’s exactly why I have a small favor to ask. Share it with five colleagues who would find it useful. They’ll thank you later 😊.

Grok 3’s Massive Bet on Compute – Will It Pay Off?

We’re only two months into 2025, and I already have a serious case of narrative whiplash.

Back in January, DeepSeek’s R1 sent shockwaves through the industry, raising a fundamental question – are the billions being poured into AI hardware – GPUs, data centers, and compute power – actually worth it?

Fast forward to last week, and xAI’s Grok 3 drops – seemingly proving the opposite. Trained at an unprecedented scale with 200,000 GPUs, it reinforces the idea that compute still matters and that scaling remains a primary driver of AI performance.

For marketing and communications leaders, it’s fair to ask – what do all these new models actually mean? The debate over Gen AI development isn’t just academic — it directly impacts the tools you adopt, the content you produce, and even the environmental footprint of AI itself.

Let’s get into it.

I Tested Grok 3 — Here’s How It Stacks Up

On Monday of last week, xAI, Elon Musk’s artificial intelligence company, launched its newest model. Early benchmarks suggest that Grok 3 is a very good model: it scored 1402 on the Chatbot Arena Leaderboard, making it the leading Generative AI model on the market. ChatGPT-4o, by comparison, scores 1265.

In my testing, Grok 3 proved to be both lightning-fast and impressively thorough. It not only delivered detailed, well-reasoned responses but also demonstrated a strong grasp of narrative structure and content flow – critical factors for marketing and communications use cases.

Grok 3 actively performed real-time internet searches for each of my prompts. As a result, its responses are not only contextually rich but also current, minimizing the risk of outdated information or hallucinations. This marks a notable improvement over ChatGPT-4o.

Grok 3’s biggest strength is its ability to grasp intent more effectively than ChatGPT-4o. It interprets nuance with greater precision, follows multi-step instructions more accurately, and maintains coherence across longer responses. Another standout feature is its significantly larger context window – up to 1 million tokens, compared to ChatGPT-4o’s 128,000 tokens.

Together, these features position Grok 3 as a powerful tool for marketing and communications users who need AI that can maintain context over long conversations, generate structured and insightful content, and adapt to complex strategic tasks with greater accuracy and depth.

Grok 3 – Where AI Meets Attitude?

The biggest challenge I had with Grok 3 wasn’t its performance; it was its tone and style.

Grok 3’s responses feel sharper, more opinionated, and at times downright acerbic. Even when I prompted it with alternative styles, it didn’t seem able to adapt, often coming back with over-the-top, hyperbolic sentences and paragraphs.

This tone isn’t accidental. xAI positioned Grok 3 as an “edgy” alternative to mainstream Generative AI models. While other AI models tend to lean toward a neutral, polished, and corporate-friendly tone, Grok 3 embraces a more casual, conversational, and sometimes irreverent style.

The result? When it comes to content generation, expect to do some heavy editing to align it with your corporate tone and style.

That said, this less restrained version could lead to better idea generation and creativity. It is more likely to surface unconventional angles, break away from formulaic structures, and generate fresh, thought-provoking concepts – something more polished AI models tend to suppress.

Big Brain and Deep Search Capabilities

In addition to its multimodal capabilities – which allow it to process and generate content across different formats like text and images – Grok 3 comes equipped with Advanced Reasoning and Deep Search, two features that are quickly becoming standard across all major AI models.

  • "Big Brain Mode"/Think – Built on Chain-of-Thought (CoT) reasoning, Grok 3’s Big Brain Mode breaks down complex tasks into logical steps, improving accuracy and depth in problem-solving. CoT works by decomposing a prompt into multiple sub-steps, reasoning through each one before delivering a final response. OpenAI pioneered this approach with o1 and over the past several months, Google, DeepSeek, and others have followed suit. With more compute power allocated to reasoning tasks, AI’s ability to think critically continues to improve.

  • Deep Search – An emerging agentic capability that enables the AI to turn a prompt into a research question, actively search the internet, and synthesize a comprehensive report of insights. Google, OpenAI, DeepSeek, and Perplexity have introduced similar capabilities in recent weeks, signaling a broader shift toward AI-powered research automation. Deep Search accelerates traditionally time-consuming research tasks – whether that’s analyzing competitor strategies, identifying emerging trends, or gathering real-time market intelligence.
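For readers who like to peek under the hood, the Chain-of-Thought pattern described above is easy to sketch in code. This is a minimal, illustrative Python example – the function names are hypothetical stand-ins, not any vendor’s actual API – showing the two halves of the pattern: wrapping a task so the model reasons through numbered sub-steps first, then extracting the final answer from the step-by-step response.

```python
# Minimal sketch of Chain-of-Thought-style prompting.
# Note: there is no real model call here; the "response" is canned so the
# example is self-contained. In practice you would send the wrapped prompt
# to whatever chat-completion API you use.

def build_cot_prompt(task: str) -> str:
    """Wrap a task so the model reasons through numbered sub-steps first."""
    return (
        f"Task: {task}\n"
        "First, break the task into numbered sub-steps and reason through "
        "each one. Then give your final answer on a line starting with "
        "'Answer:'."
    )

def extract_answer(response: str) -> str:
    """Pull the final answer out of a step-by-step response."""
    for line in response.splitlines():
        if line.startswith("Answer:"):
            return line.removeprefix("Answer:").strip()
    return response.strip()  # fall back to the full text

# Canned response standing in for a model reply:
canned = "1. Count Q1 leads.\n2. Count Q2 leads.\nAnswer: Q2 grew 18%."
print(extract_answer(canned))  # prints: Q2 grew 18%.
```

The key idea is simply that the intermediate steps are generated before the answer, which is what gives CoT models their accuracy gains on multi-step problems.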

Grok 3 and X Data: Competitive Edge or Misinformation Nightmare?

One of the most distinctive aspects of Grok 3’s training data is its heavy reliance on X (formerly Twitter). That could be a problematic choice for enterprise users of Generative AI.

X is one of the largest sources of public, human-to-human interaction online. Training on this kind of fast-moving, real-world conversation helps models surface timely, diverse perspectives from global discussions.

However, it presents a disinformation challenge: X is notorious for spreading unverified claims at scale. X data isn’t neutral – since Musk’s acquisition, the platform has been criticized for political bias, and there have already been a few controversies over how Grok 3 handles certain prompts.

Colossus – The Engineering Feat Behind Grok 3

Grok 3 shattered expectations by successfully training on 200,000 GPUs. xAI scaled its training infrastructure by transforming an abandoned Electrolux factory in Memphis into “Colossus,” the largest GPU facility in existence. To run that many GPUs concurrently, xAI developed custom networking solutions, used Tesla Megapacks to manage power loads, and implemented a liquid-cooling system – engineering feats that now put it significantly ahead of competitors in large-scale AI model development.

The moat in Generative AI may be the ability to effectively apply compute power at extreme scale. Until now, networking bottlenecks, power constraints, and hardware synchronization made training with this many GPUs nearly impossible. xAI just proved otherwise.

Compute Maximalism Versus Efficiency-First

With Grok 3 proving that massive compute investments can still deliver gains, the AI race is now headed in two opposing directions:

  1. The Compute-Maximalist Approach (Scaling to AGI?) – xAI, Meta, and OpenAI continue to push the boundaries of compute, leveraging massive GPU infrastructure to build ever-more powerful and capable models.

  2. The Efficiency-First Approach (Smarter models with less compute) – DeepSeek, Google, and Mistral are focusing on architectural optimization and novel AI techniques to achieve leading performance while using fewer resources.

Both approaches have merit, but the question remains – will sheer computational power be the dominant driver of AI progress, or will more efficient techniques ultimately close the gap?

OpenAI and Google’s New Models

While xAI doubled down on brute-force scaling, OpenAI and Google released new models.

At the end of January, OpenAI released its newest reasoning model – o3-mini. Building on the company’s CoT models, o3-mini is designed to allocate more deliberation time to step-by-step logical reasoning, enabling more accurate and well-structured responses to complex problems. The o3 family has demonstrated impressive results in testing: the full o3 model achieved a breakthrough score of 87.5% on the ARC-AGI benchmark, surpassing previous models and even matching human performance levels.

With this update, ChatGPT now offers three chain-of-thought models to choose from – o1, o3-mini, and o3-mini-high. Check out our article on reasoning models and when to use them here.

Google's Gemini 2.0 Pro

A week after o3-mini’s launch, Google introduced their latest flagship model – Gemini 2.0 Pro. With the launch of Gemini 2.0, Google now offers a suite of new model variants. These include:

  • Gemini 2.0 Flash – Similar to ChatGPT-4o, this is the core Gemini model for day-to-day tasks.

  • Gemini 2.0 Flash Thinking – Google’s first CoT reasoning model, similar to o3-mini, it's designed to "think" more thoroughly, allowing it to handle more complex inquiries and provide more detailed responses.

  • Gemini 2.0 Flash Thinking with Apps – This version takes things a step further by integrating the ability to interact with other Google apps like YouTube, Maps, and Search.

  • Gemini 2.0 Pro – The most powerful and capable variant, designed for highly complex tasks. It excels in areas like coding, in-depth analysis, and handling vast amounts of data.

Does Bigger Mean Better for AI in Marketing and Comms?

As the battle between compute-maximalist and efficiency-first approaches rages, the question is: what are the implications for marketing and comms leaders? Will businesses prioritize cutting-edge models like Grok 3, or opt for more cost-effective, fine-tuned alternatives that fit their brand and operational needs? Some thoughts:

  • Don’t Get Locked Into a Single AI Provider - The rapid pace of innovation means OpenAI, Google, xAI, and others will continue to leapfrog each other with new releases. If your AI strategy is built around a single model, you risk being locked into pricing changes, policy shifts, or a model that becomes outdated. Adopt a diversified AI stack, integrating multiple models.

  • Customization Will Drive Competitive Advantage - Not all AI models are created equal. Generalist models (Grok 3, GPT-4o, Gemini) are great for broad, high-level content creation, research, and ideation. Specialized, fine-tuned models are better for brand-aligned content, industry-specific tasks, and high-accuracy outputs.

  • Focus on Use Cases, Not Just Tools - Instead of asking “Which AI model should we use?”, marketing and comms leaders should be asking, “What are the biggest inefficiencies in our workflow, and how can AI fix them?”

  • Build AI Into the Process, Not Just the Output - The companies winning with AI aren’t just using Gen AI to create content – they’re embedding AI into their entire workflow. AI tools shouldn’t be add-ons – they should be deeply integrated into content planning, review, optimization, and distribution.
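If your team has engineering support, the “diversified AI stack” idea above can be made concrete with a thin routing layer that maps tasks to models rather than hard-wiring one vendor. The sketch below is purely illustrative – the class and the registered “models” are hypothetical stand-ins, with simple lambdas in place of real API clients – but it shows the design: code depends on task names, so swapping a provider means changing one registration, not your whole workflow.

```python
# Illustrative provider-agnostic routing layer (all names hypothetical).
# Each "model" is just a function from prompt to response, so any vendor's
# client can be wrapped and registered without touching downstream code.
from typing import Callable

ModelFn = Callable[[str], str]

class ModelRouter:
    def __init__(self) -> None:
        self._models: dict[str, ModelFn] = {}

    def register(self, task: str, model: ModelFn) -> None:
        """Assign a model to a task type (e.g. 'ideation', 'brand_copy')."""
        self._models[task] = model

    def run(self, task: str, prompt: str) -> str:
        """Route the prompt to whichever model handles this task."""
        return self._models[task](prompt)

router = ModelRouter()
# Lambdas stand in for real API calls to different providers:
router.register("ideation", lambda p: f"[creative model] {p}")
router.register("brand_copy", lambda p: f"[fine-tuned model] {p}")

print(router.run("ideation", "Name three campaign angles."))
# prints: [creative model] Name three campaign angles.
```

The point isn’t the code itself – it’s that a small abstraction like this keeps you free to leapfrog between OpenAI, Google, xAI, and others as the models keep changing.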

That’s all for this week, folks!

If you haven’t had a chance to test out Grok 3, give it a go – it’s currently available for free.

Next week, we’ll dive into one of the most polarizing topics in AI - bias, ethics, and “woke AI.” Stay tuned.