As the scale of enterprise AI operations continues to grow, having access to data is no longer enough. Enterprises now must have reliable, consistent and accurate access to data.
That’s a realm where distributed SQL database vendors play a key role, providing a replicated database platform that can be highly resilient and available. The latest update from Cockroach Labs is all about enabling vector search and agentic AI at distributed SQL scale. CockroachDB 25.2 is out today, promising a 41% efficiency gain, an AI-optimized vector index for distributed SQL scale, and core database improvements to both operations and security.
CockroachDB is one of many distributed SQL options on the market today, alongside Yugabyte, Amazon Aurora DSQL and Google AlloyDB. Since its founding a decade ago, Cockroach Labs has aimed to differentiate CockroachDB from rivals by making it more resilient. In fact, the name comes from the idea that a cockroach is really hard to kill, and that idea remains relevant in the AI era.
“Certainly people are interested in AI, but the reasons people chose Cockroach five years ago, two years ago or even this year seems to be pretty consistent, they need this database to survive,” Spencer Kimball, co-founder and CEO of Cockroach Labs, told VentureBeat. “AI in our context, is AI mixed with the operational capabilities that Cockroach brings…so to the extent that AI is becoming more important, it’s how does my AI survive, it needs to be just as mission critical as the actual metadata.”
The distributed vector indexing problem facing enterprise AI
Vector-capable databases, which AI systems use for training as well as for retrieval-augmented generation (RAG), are commonplace in 2025.
Kimball argued that today’s vector databases work well on single nodes but tend to struggle in larger deployments with multiple geographically dispersed nodes, which is exactly the setting distributed SQL is built for. CockroachDB’s approach tackles this hard problem of distributed vector indexing. The company’s new C-SPANN vector index is based on SPANN, an algorithm from Microsoft Research, and is designed to handle billions of vectors across a distributed, disk-based system.
Understanding the technical architecture reveals why this poses such a complex challenge. Vector indexing in CockroachDB isn’t a separate table; it’s an index type applied to columns within existing tables. Without an index, vector similarity searches perform brute-force linear scans through all data. This works fine for small datasets but becomes prohibitively slow as tables grow.
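For readers who want to see what an unindexed search involves, the Python sketch below scans every row and ranks vectors by cosine distance. It is an illustration only, not CockroachDB code: the in-memory list of `(id, vector)` pairs, the cosine metric and the function names are assumptions chosen for clarity.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_knn(rows: list[tuple[int, list[float]]], query: list[float], k: int) -> list[int]:
    """Compare the query against every row and return the ids of the k nearest.

    The work grows linearly with the number of rows, which is why an
    unindexed similarity search slows down as the table grows.
    """
    scored = [(cosine_distance(vec, query), row_id) for row_id, vec in rows]
    scored.sort()
    return [row_id for _, row_id in scored[:k]]


# Hypothetical table contents: (row id, embedding) pairs.
rows = [
    (1, [1.0, 0.0, 0.0]),
    (2, [0.9, 0.1, 0.0]),
    (3, [0.0, 1.0, 0.0]),
    (4, [0.0, 0.0, 1.0]),
]
print(brute_force_knn(rows, [1.0, 0.05, 0.0], k=2))  # [1, 2]
```

Because every query touches every row, doubling the table roughly doubles the work, which is the scaling problem a vector index is meant to avoid.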
The Cockroach Labs engineering team had to solve several problems at once: delivering uniform efficiency at massive scale, keeping indexes self-balancing and maintaining accuracy while the underlying data changes rapidly.
Kimball explained that C-SPANN addresses this by organizing vectors in a very high-dimensional space into a hierarchy of partitions. This hierarchical structure enables efficient similarity searches even across billions of vectors.
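The general idea of searching a partition hierarchy can be sketched in a few lines of Python. This is a simplified, in-memory illustration under stated assumptions, not C-SPANN itself: the `Partition` and `search` names, Euclidean distance, the fixed two-level tree and the `beam` width are all hypothetical, and a real index must also handle disk storage, distribution across nodes and rebalancing as data changes.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Partition:
    centroid: list[float]
    # Interior partitions have children; leaf partitions hold (id, vector) pairs.
    children: list["Partition"] = field(default_factory=list)
    vectors: list[tuple[int, list[float]]] = field(default_factory=list)


def search(root: Partition, query: list[float], k: int, beam: int = 2) -> list[int]:
    """Descend the hierarchy, keeping only the `beam` partitions whose centroids
    are closest to the query at each level, then rank vectors in the surviving leaves."""
    frontier = [root]
    while any(p.children for p in frontier):
        # Expand interior partitions; leaves already in the frontier are kept as-is.
        candidates = [c for p in frontier for c in (p.children or [p])]
        candidates.sort(key=lambda c: math.dist(c.centroid, query))
        frontier = candidates[:beam]
    hits = sorted((math.dist(vec, query), vid) for leaf in frontier for vid, vec in leaf.vectors)
    return [vid for _, vid in hits[:k]]


# A tiny two-level hierarchy over 2-D vectors (real embeddings have many more dimensions).
leaf_a = Partition([0.0, 0.0], vectors=[(1, [0.1, 0.0]), (2, [0.0, 0.2])])
leaf_b = Partition([0.0, 5.0], vectors=[(3, [0.0, 5.1])])
leaf_c = Partition([10.0, 0.0], vectors=[(4, [10.0, 0.1]), (5, [9.8, 0.0])])
leaf_d = Partition([10.0, 10.0], vectors=[(6, [10.0, 10.0])])
root = Partition(
    [5.0, 5.0],
    children=[
        Partition([0.0, 2.5], children=[leaf_a, leaf_b]),
        Partition([10.0, 5.0], children=[leaf_c, leaf_d]),
    ],
)

print(search(root, [9.9, 0.0], k=2))  # [5, 4]
```

Only the partitions whose centroids are closest to the query are expanded, so most vectors are never compared. The trade-off is that the result is approximate: a true nearest neighbor sitting in a pruned partition can be missed, which is why the beam width matters.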
Security enhancements address AI compliance challenges
AI applications handle increasingly sensitive data. CockroachDB 25.2 introduces enhanced security features, including row-level security and configurable cipher suites.
These capabilities help address regulatory requirements, such as the EU’s Digital Operational Resilience Act (DORA) and the NIS2 Directive, that many enterprises struggle to meet.
Cockroach Labs’ research shows 79% of technology leaders report being unprepared for new regulations. Meanwhile, 93% cite concerns over the financial impact of outages averaging over $222,000 annually.
“Security is something that is significantly increasing and I think that the big thing about security to realize is that like many things, it’s impacted dramatically by this AI stuff,” Kimball observed.
Operational big data for agentic AI set to drive massive growth
The coming wave of AI-driven workloads creates what Kimball terms “operational big data”—a fundamentally different challenge from traditional big data analytics.
While conventional big data focuses on batch processing large datasets for insights, operational big data demands real-time performance at massive scale for mission-critical applications.
“When you really think about the implications of agentic AI, it’s just a lot more activity hitting APIs and ultimately causing throughput requirements for the underlying databases,” Kimball explained.
The distinction matters enormously. Traditional data systems can tolerate latency and eventual consistency because they support analytical workloads. Operational big data powers live applications where milliseconds matter and consistency can’t be compromised.
AI agents drive this shift by operating at machine speed rather than human pace. Current database traffic comes primarily from humans with predictable usage patterns. Kimball emphasized that AI agents will multiply this activity exponentially.
Performance breakthrough targets AI workload economics
Better economics and efficiency are needed to cope with the growing scale of data access.
Cockroach Labs claims that CockroachDB 25.2 provides a 41% efficiency improvement. Two key optimizations in the release aimed at improving overall database efficiency are generic query plans and buffered writes.
Buffered writes address a specific problem with queries generated by object-relational mapping (ORM) tools, which tend to be “chatty,” reading and writing data across distributed nodes inefficiently. The feature keeps writes in the local SQL coordinator, eliminating unnecessary network round trips.
“What buffered writes do is that they keep all of the writes that you’re planning to do in the local SQL coordinator,” Kimball explained. “So then if you read from something that you’ve just written, it doesn’t have to go back out to the network.”
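A toy model makes the round-trip savings concrete. The Python sketch below is hypothetical: `RemoteKV` stands in for the distributed storage layer and simply counts round trips, and `BufferedTxn` plays the role of a coordinator that holds writes locally until commit. It ignores locking, conflict detection and everything else a real transactional database must do.

```python
class RemoteKV:
    """Stand-in for the distributed storage layer; counts network round trips."""

    def __init__(self) -> None:
        self.data: dict[str, object] = {}
        self.round_trips = 0

    def get(self, key: str) -> object:
        self.round_trips += 1
        return self.data.get(key)

    def put_batch(self, writes: dict[str, object]) -> None:
        self.round_trips += 1  # one round trip flushes every buffered write
        self.data.update(writes)


class BufferedTxn:
    """Hypothetical coordinator-side transaction that buffers writes until commit."""

    def __init__(self, store: RemoteKV) -> None:
        self.store = store
        self.buffer: dict[str, object] = {}

    def write(self, key: str, value: object) -> None:
        self.buffer[key] = value  # stays local; no network traffic yet

    def read(self, key: str) -> object:
        if key in self.buffer:  # read-your-own-writes served from the local buffer
            return self.buffer[key]
        return self.store.get(key)

    def commit(self) -> None:
        if self.buffer:
            self.store.put_batch(self.buffer)
        self.buffer = {}


store = RemoteKV()
txn = BufferedTxn(store)
txn.write("user:1", {"name": "Ada"})
txn.write("order:7", {"user": 1, "total": 42})
txn.write("user:1", {"name": "Ada L."})  # overwrite within the same transaction
print(txn.read("user:1"))  # {'name': 'Ada L.'}
txn.commit()
print(store.round_trips)  # 1
```

In an unbuffered version of this toy model, the three writes and the read-back would cost four round trips; here the read is served locally and the writes are flushed together at commit, so the transaction makes a single round trip.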
Generic query plans solve a fundamental inefficiency in high-volume applications. Most enterprise applications use a limited set of transaction types that get executed millions of times with different parameters. Instead of repeatedly replanning identical query structures, CockroachDB now caches and reuses these plans.
Implementing generic query plans in distributed systems presents unique challenges that single-node databases don’t face. CockroachDB must ensure that cached plans remain optimal across geographically distributed nodes with varying latencies.
“In distributed SQL, the generic query plans, they’re kind of a slightly heavier lift, because now you’re talking about a potentially geo-distributed set of nodes with different latencies,” Kimball explained. “You have to be careful with the generic query plan that you don’t use something that’s suboptimal because you’ve sort of conflated like, oh well, this looks the same.”
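The caching idea itself is simple. The Python sketch below uses hypothetical names and is not CockroachDB’s optimizer: it keys plans by the parameterized statement text, so repeated executions with different parameter values reuse one plan. The `_optimize` method is a placeholder for an expensive planning pass.

```python
from typing import Any


class GenericPlanCache:
    """Hypothetical plan cache keyed by the statement shape, not its parameters."""

    def __init__(self) -> None:
        self._plans: dict[str, str] = {}
        self.plans_built = 0

    def _optimize(self, sql: str) -> str:
        # Placeholder for an expensive optimizer pass; returns an opaque plan label.
        self.plans_built += 1
        return f"plan<{sql}>"

    def execute(self, sql: str, params: tuple[Any, ...]) -> tuple[str, tuple[Any, ...]]:
        # Statements with the same placeholder text share one cached plan,
        # so only the parameter values differ between executions.
        plan = self._plans.get(sql)
        if plan is None:
            plan = self._optimize(sql)
            self._plans[sql] = plan
        return plan, params


cache = GenericPlanCache()
for account_id in range(1000):
    cache.execute("SELECT balance FROM accounts WHERE id = $1", (account_id,))
cache.execute("UPDATE accounts SET balance = $2 WHERE id = $1", (7, 100))
print(cache.plans_built)  # 2
```

What the sketch leaves out is the hard part Kimball describes: deciding when two statements that look the same should still get different plans, for example because the data they touch lives in regions with very different latencies.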
What this means for enterprises planning AI and data infrastructure
Enterprise data leaders face immediate decisions as agentic AI threatens to overwhelm current database infrastructure.
The shift from human-driven to AI-driven workloads will create operational big data challenges that many organizations aren’t prepared for. Preparing now for the inevitable growth in data traffic from agentic AI is imperative. For enterprises leading in AI adoption, it makes sense to invest now in a distributed database architecture that can handle both traditional SQL and vector operations at scale.
CockroachDB 25.2 offers one potential option, raising the performance and efficiency of distributed SQL to meet the data challenges of agentic AI. Fundamentally, it’s about having the technology in place to scale both vector and traditional data retrieval.