Why the AI SOC race is really a data race

TL;DR

Databricks acquired an AI SOC platform and framed it as a data deal that builds out a “security lakehouse.” The signal for security leaders: value in AI for cybersecurity is accruing to the data and context layer, not the model.
Teams evaluating agentic SOC vendors often lead with model choice. It barely moves the needle when every platform plugs into the same frontier models.
A “security lakehouse” aggregates more telemetry for detection after alerts fire. That only covers one part of the overall story. Reasoning about exposure before the alert needs organizational context, calibrated to your crown jewels and policies.
The infrastructure that makes a security AI agent genuinely accurate takes heavy resources to build, which is why Databricks chose to buy it rather than build it.

When Databricks reportedly paid $1.4 billion for an AI SOC platform and described it as a data deal, it settled an argument the security industry had circled for two years.

The race to build autonomous security agents was never really about the agents. The value sits in the data and context layer underneath them.

For a security leader evaluating AI solutions, that reframes the entire vendor conversation. A platform earns its place by understanding your environment well enough to tell you what genuinely matters, not by running the cleverest model. The reason comes down to the substrate.

Follow the acquisition, find the real value layer

When a data infrastructure company spends $1.4 billion on a security platform and then talks about the deal like it’s a data play, pay attention to the vocabulary.

On June 16, 2026, Databricks announced its intent to acquire Panther Labs, an AI SOC platform, and framed it around the “security lakehouse,” not around detection. Databricks didn’t pay for Panther’s triage agents or its detection logic. It paid for the substrate underneath them: 100+ deeply parsed integrations across cloud infrastructure, identity providers, endpoints, and SaaS, the kind of coverage that takes legacy SIEMs months of mapping to approximate.

Panther had already placed the same bet one layer down. In October 2025, it bought a telemetry pipeline company, specifically for normalization and no-code pipeline management. Databricks CEO Ali Ghodsi said the quiet part out loud: “Legacy SIEM was never designed for AI.” The Panther deal is Databricks’ third security acquisition in a row, following purchases of solutions for data governance and for classification. Every deal targets the same layer.

When the company that holds the data trust of 70% of the Fortune 500 decides the prize in security AI is the data foundation, that tells you where the moat actually sits. The question is what that means for the platforms already pitching you, and the answer starts with the model.

Model selection is a commodity decision

Every AI SOC vendor pitching you right now has an agent. In reality, most of them run on one of the frontier generic AI models (Claude, GPT, or Gemini). When every platform plugs into the same models, the model stops being a real differentiator, and you’re left comparing wrappers around the same brain.

Gartner’s 2026 Hype Cycle for Security Operations research indicated that most AI failures trace back to weak data and context foundations rather than a ‘bad’ model. The pattern shows up in the spend data too: Gartner’s April 2026 survey of 353 data and AI leaders found that organizations with successful AI initiatives invest up to four times more in data foundations than those with poor outcomes.

We put a number on it with our cross-platform benchmark for ISPM. Testing four frontier models across 50 cross-vendor identity tasks on a live eight-platform environment, adding a cross-vendor relational map raised answer correctness by 34% and cut exploratory queries by roughly 70%. The gain held across every model tested. Accuracy moved when the context layer improved, regardless of which model ran underneath.

So the question worth asking in an evaluation has little to do with which model runs under the hood. What matters is how much the model already understands about your specific environment the moment it starts working. The security lakehouse is the market’s answer to that, and it’s where the story gets complicated.

Beyond the reactive “security lakehouse”

The security lakehouse deserves credit for the problem it solves. Centralizing messy telemetry in one governed place, making it queryable by AI agents without legacy SIEM’s punishing ingestion bill, is a genuine step forward.

The question worth pressing is what the agent does with all that data, and when in the workflow it shows up.

A detection agent working from a lakehouse reasons about threats after the alert fires. More telemetry gives it richer material for that investigation, which is of real value. The workflow stays reactive, though. The gap that logging more data never closes sits upstream: knowing which exposures are actually dangerous before anyone acts on them, weighted by what your organization specifically cares about.

Knowing which exposure matters takes more than telemetry breadth. It takes organizational context: which assets carry real blast radius, which identities touch crown-jewel systems, what your last pentest surfaced, and which policy exceptions you’ve already signed off on. A lakehouse holds more logs. A relational graph understands what those logs mean for you.
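To make that concrete, here is a minimal sketch of how organizational context could reorder a list of exposures. Everything in it is an assumption made for illustration: the Exposure fields, the multipliers, and the rule that a signed-off exception drops to zero are hypothetical, and this is not a description of how any vendor's product scores risk.

```python
from dataclasses import dataclass


@dataclass
class Exposure:
    name: str
    severity: float            # raw severity from a scanner, 0-10 (assumed scale)
    touches_crown_jewel: bool  # taken from crown-jewel designations
    seen_in_pentest: bool      # taken from pentest history
    accepted_exception: bool   # taken from signed-off policy exceptions


def contextual_priority(e: Exposure) -> float:
    """Scale raw severity by organizational context (illustrative weights)."""
    if e.accepted_exception:
        return 0.0  # the organization has already accepted this risk
    weight = 1.0
    if e.touches_crown_jewel:
        weight *= 2.0
    if e.seen_in_pentest:
        weight *= 1.5
    return e.severity * weight


def rank(exposures):
    """Return exposures ordered from highest to lowest contextual priority."""
    return sorted(exposures, key=contextual_priority, reverse=True)


exposures = [
    Exposure("public test bucket", 9.0, False, False, False),
    Exposure("stale admin key on billing DB", 5.0, True, True, False),
    Exposure("legacy TLS on internal wiki", 10.0, False, False, True),
]
ranked = [e.name for e in rank(exposures)]
```

In this toy data the mid-severity finding ranks first because it touches a crown-jewel system and appeared in a pentest, while the highest raw severity ranks last because it is an accepted exception. Telemetry alone would have ordered the three the other way around.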

Think of a lakehouse as a well-organized library: every book catalogued, every shelf searchable, more volumes arriving daily. A Security Graph is the librarian who has already read your thesis, knows your research gaps, and has the three books that close your argument already pulled for you.

Both matter, and they operate at different points in the same workflow. Conflating them is how a team ends up buying a better detection layer when the actual gap was upstream, before the alert ever existed. So the harder question becomes what it really takes to build that upstream context.

Why “context aware” is the hardest claim to verify

Upstream context doesn’t come from logging more data. It comes from infrastructure most platforms underestimate, which is exactly why an AI SOC agent can sit on a rich lakehouse and still reason only about alerts that have already fired.

By the time a vendor drops ‘context-aware’ in a pitch to you, the phrase has lost most of its signal. So it helps to look at what a real organizational context layer actually requires.

Building the kind of context that works upstream takes four things most security vendors underestimate. First, graph normalization: resolving the same entity across systems that each name and model it differently. Second, asset modeling: building one shared ontology across a sprawling, heterogeneous environment. Third, deterministic ML risk scoring that runs before any language model touches the data. Fourth, organizational knowledge ingestion: pulling in internal policies, pentest history, crown-jewel designations, and past approval calls, so the system grasps what your environment looks like and what it means for your organization specifically.
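The first of those requirements, resolving one entity across systems that name it differently, can be sketched in a few lines. This is a deliberately simple illustration that assumes every platform exposes an email address to join on. The Record type, the platform labels, and the identifiers are hypothetical, and real entity resolution has to cope with missing or conflicting attributes that this sketch ignores.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str    # hypothetical platform label, e.g. "okta"
    local_id: str  # identifier as that platform names it
    email: str     # attribute assumed to be shared across platforms


def canonical_key(email: str) -> str:
    """Lowercase the address and drop any +tag so variants of one mailbox match."""
    local, _, domain = email.strip().lower().partition("@")
    return f"{local.split('+', 1)[0]}@{domain}"


def resolve_entities(records):
    """Group per-platform records that appear to describe the same identity."""
    groups = defaultdict(list)
    for rec in records:
        groups[canonical_key(rec.email)].append(rec)
    return dict(groups)


records = [
    Record("okta", "00u1a", "Dana.Lee@example.com"),
    Record("aws", "AIDAEXAMPLE", "dana.lee+admin@example.com"),
    Record("github", "slopez", "sam.lopez@example.com"),
]
entities = resolve_entities(records)
```

Here the two differently formatted addresses collapse into one identity that spans two platforms, which is the precondition for asking any cross-vendor question about that person.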

None of that is lightweight. It’s a heavy infrastructure problem that has to be solved before AI deployment produces reliable answers. It’s also the clearest read on the Databricks news: when buying the substrate beats building it, the substrate is the hard part. The agent was never the bottleneck.

So this is what “context” means in real terms: the relational, business-aware foundation underneath the model. It closes the gap between detecting threats and understanding exposure. For security teams, though, it’s only part of the story, because knowing what matters is the setup. Acting on it fast is what comes next.

Key takeaways

The direction of acquisition is the tell: data infrastructure companies are buying detection platforms for their substrate, not the other way around.
Model selection is close to a commodity decision. The quality of relational context around the agent moves accuracy far more, by 34% in Sola’s cross-model benchmark.
A security lakehouse improves detection after alerts fire. Cross-domain exposure reasoning works upstream, before the alert exists.
Real organizational context (graph normalization, asset modeling, ML risk scoring, and knowledge ingestion) takes a lot of time and effort to build.

Frequently asked questions

Why did Databricks acquire an AI SOC company instead of building one?

Building the data foundation underneath an AI agent (graph normalization, entity resolution, organizational knowledge ingestion) takes roughly two years before it produces reliable results. Databricks bought Panther to own that substrate immediately rather than spend years rebuilding it. The acquisition direction confirms where the durable value sits.

What’s the difference between a security lakehouse and an exposure management platform?

A security lakehouse centralizes telemetry so detection agents can investigate alerts cost-effectively after they fire. An exposure management platform (like Sola’s) works upstream, reasoning across cloud, identity, SaaS, and endpoint to flag which exposures carry real blast radius before anyone acts. The two don’t compete; they operate at different points in the security workflow.

How does Sola’s Security Graph differ from what a SIEM aggregates?

A SIEM aggregates and correlates logs to surface events. Sola’s Security Graph maps how entities in one platform relate to entities in another, across vendors, so an AI agent understands that an identity in Okta connects to a crown-jewel system in AWS. It models relationships, not only records.
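As a rough illustration of modeling relationships rather than records, the sketch below walks a small cross-vendor graph to find how an identity reaches a crown-jewel asset. The edge data, node names, and the paths_to_crown_jewels helper are all hypothetical. It is a generic breadth-first search, not Sola's implementation.

```python
from collections import deque

# Hypothetical cross-vendor edges: (platform, entity) -> entities it can reach.
EDGES = {
    ("okta", "user:dana"): [("okta", "group:platform-admins")],
    ("okta", "group:platform-admins"): [("aws", "role:prod-admin")],
    ("aws", "role:prod-admin"): [("aws", "s3:customer-records")],
    ("okta", "user:sam"): [("github", "team:docs")],
}
CROWN_JEWELS = {("aws", "s3:customer-records")}


def paths_to_crown_jewels(start):
    """Breadth-first search for paths from start to any crown-jewel node."""
    queue = deque([[start]])
    seen = {start}
    found = []
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in CROWN_JEWELS:
            found.append(path)
            continue  # no need to search past a crown jewel
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return found


dana_paths = paths_to_crown_jewels(("okta", "user:dana"))
sam_paths = paths_to_crown_jewels(("okta", "user:sam"))
```

A log store holds each of those four facts as separate rows. The graph is what lets an agent answer "can this Okta user reach customer records in AWS?" in one traversal instead of a chain of exploratory queries.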

What should security teams ask AI companies instead of asking about the model?

“How does it get all the context needed?” That’s the framing analysts increasingly use. Ask the vendor to show what the platform knows about your actual environment (your crown jewels, identities, and blast radius) rather than running a polished demo. Sola calibrates every signal to how your organization operates.
