Polars vs. Pandas — An Independent Speed Comparison

Overview

  1. Introduction — Purpose and Reasons
  2. Datasets, Tasks, and Settings
  3. Results
  4. Conclusions
  5. Wrapping Up

Introduction — Purpose and Reasons

Speed is important when dealing with large amounts of data. If you are handling data in a cloud data warehouse or similar, then the speed of execution for your data ingestion and processing affects the following:

  • Cloud costs: This is probably the biggest factor. More compute time equals more costs in most billing models. In billing models based on a preallocated amount of resources, you could have chosen a lower service level if your ingestion and processing were faster.
  • Data timeliness: If you have a real-time stream that takes 5 minutes to process data, then your users will have a lag of at least 5 minutes when viewing the data through e.g. a Power BI report. This difference can be a lot in certain situations. Even for batch jobs, data timeliness is important. If you are running a batch job every hour, it is a lot better if it takes 2 minutes rather than 20 minutes.
  • Feedback loop: If your batch job takes only a minute to run, then you get a very quick feedback loop. This probably makes your job more enjoyable. In addition, it enables you to find logical mistakes more quickly.

As you’ve probably understood from the title, I am going to provide a speed comparison between the two Python libraries Polars and Pandas. If you know anything about Pandas and Polars from before, then you know that Polars is the (relatively) new kid on the block that claims to be much faster than Pandas. You probably also know that Polars is implemented in Rust, a trend shared by many other modern Python tools such as uv and Ruff.

There are two distinct reasons that I want to do a speed comparison test between Polars and Pandas:

Reason 1 — Investigating Claims

Polars makes the following claim on its website: “Compared to pandas, it (Polars) can achieve more than 30x performance gains.”

The website links to the benchmarks behind this claim. It’s commendable that they have made their speed tests open source. But if you are writing the comparison tests for both your own tool and a competitor’s tool, then there might be a slight conflict of interest. I’m not saying that they are purposefully overselling the speed of Polars, but rather that they might have unconsciously selected for favorable comparisons.

Hence the first reason to do a speed comparison test is simply to see whether this supports the claims presented by Polars or not.

Reason 2 — Greater granularity

Another reason for doing a speed comparison test between Polars and Pandas is to make it slightly more transparent where the performance gains might be.

This might be already clear if you’re an expert on both libraries. However, speed tests between Polars and Pandas are mostly of interest to those considering switching up their tool. In that case, you might not yet have played around much with Polars because you are unsure if it is worth it.

Hence the second reason to do a speed comparison is simply to see where the speed gains are located.

I want to test both libraries on different tasks within both data ingestion and data processing. I also want to consider datasets that are both small and large. I will stick to common tasks within data engineering, rather than esoteric tasks that one seldom uses.

What I will not do

  • I will not give a tutorial on either Pandas or Polars. If you want to learn Pandas or Polars, then a good place to start is their documentation.
  • I will not cover other common data processing libraries. This might be disappointing to a fan of PySpark, but having a distributed compute model makes comparisons a bit more difficult. You might find that PySpark is quicker than Polars on tasks that are very easy to parallelize, but slower on other tasks where keeping all the data in memory reduces data transfer times.
  • I will not provide full reproducibility. Since this is, in humble words, only a blog post, I will only explain the datasets, tasks, and system settings that I have used. I will not host a complete running environment with the datasets and bundle everything neatly. This is not a precise scientific experiment, but rather a guide that only cares about rough estimations.

Finally, before we start, I want to say that I like both Polars and Pandas as tools. I’m not financially or otherwise compensated by either of them, and I don’t have any incentive other than being curious about their performance ☺️

Datasets, Tasks, and Settings

Let’s first describe the datasets that I will be considering, the tasks that the libraries will perform, and the system settings that I will be running them on.

Datasets

At most companies, you will need to work with both small and (relatively) large datasets. In my opinion, a good data processing tool can tackle both ends of the spectrum. Small datasets challenge the start-up time of tasks, while larger datasets challenge scalability. I will consider two datasets, both of which can be found on Kaggle:

  • A small dataset in CSV format: It is no secret that CSV files are everywhere! Often they are quite small, coming from Excel files or database dumps. What better example of this than the classical iris dataset (licensed with CC0 1.0 Universal License) with 5 columns and 150 rows. The iris version I linked to on Kaggle has 6 columns, but the classical one does not have a running index column. So remove this column if you want precisely the same dataset as I have. The iris dataset is certainly small data by any measure.
  • A large dataset in Parquet format: The Parquet format is super useful for large data as it has built-in column-wise compression (along with many other benefits). I will use the Transaction dataset (licensed with Apache License 2.0) representing financial transactions. The dataset has 24 columns and 7,483,766 rows. It is close to 3 GB in the CSV format found on Kaggle. I used Pandas and PyArrow to convert it to a Parquet file. The final result is only 905 MB due to the compression of the Parquet file format. This is at the low end of what people call big data, but it will suffice for us.

Tasks

I will do a speed comparison on five different tasks. The first two are I/O tasks, while the last three are common tasks in data processing. Specifically, the tasks are:

  1. Reading data: I will read both files using the respective methods read_csv() and read_parquet() from the two libraries. I will not use any optional arguments as I want to compare their default behavior.
  2. Writing data: I will write both files back to identical copies as new files using the respective methods to_csv() and to_parquet() for Pandas and write_csv() and write_parquet() for Polars. I will not use any optional arguments as I want to compare their default behavior.
  3. Computing Numeric Expressions: For the iris dataset I will compute the expression SepalLengthCm ** 2 + SepalWidthCm as a new column in a copy of the DataFrame. For the transactions dataset, I will simply compute the expression (amount + 10) ** 2 as a new column in a copy of the DataFrame. I will use the standard way to transform columns in Pandas, while in Polars I will use the standard functions all(), col(), and alias() to make an equivalent transformation.
  4. Filters: For the iris dataset, I will select the rows corresponding to the criteria SepalLengthCm >= 5.0 and SepalWidthCm <= 4.0. For the transactions dataset, I will select the rows corresponding to the categorical criteria merchant_category == 'Restaurant'. I will use the standard filtering method based on Boolean expressions in each library. In pandas, this is syntax such as df_new = df[df['col'] < 5], while in Polars this is given similarly by the filter() function along with the col() function. I will use the and-operator & for both libraries to combine the two numeric conditions for the iris dataset.
  5. Group By: For the iris dataset, I will group by the Species column and calculate the mean values for each species of the four columns SepalLengthCm, SepalWidthCm, PetalLengthCm, and PetalWidthCm. For the transactions dataset, I will group by the column merchant_category and count the number of instances in each of the classes within merchant_category. Naturally, I will use the groupby() function in Pandas and the group_by() function in Polars in obvious ways.

Settings

  • System Settings: I’m running all the tasks locally with 16 GB RAM and an Intel Core i5-10400F CPU with 6 cores (12 logical cores through hyperthreading). So it’s not state-of-the-art by any means, but good enough for simple benchmarking.
  • Python: I’m running Python 3.12. This is not the most current stable version (which is Python 3.13), but I think this is a good thing. Commonly the latest supported Python version in cloud data warehouses is one or two versions behind.
  • Polars & Pandas: I’m using Polars version 1.21 and Pandas 2.2.3. These are roughly the newest stable releases of both packages.
  • Timeit: I’m using the standard timeit module in Python and finding the median of 10 runs.

Especially interesting will be how Polars can take advantage of the 12 logical cores through multithreading. There are ways to make Pandas take advantage of multiple processors, but I want to compare Polars and Pandas out of the box without any external modification. After all, this is probably how they are running in most companies around the world.
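The timing setup from the settings above (the timeit module, median of 10 runs) could be sketched roughly as follows. The helper median_runtime and the workload function are hypothetical names used for illustration; the workload is a stand-in for a real read, write, or transform call.

```python
import statistics
import timeit


def median_runtime(func, runs=10):
    """Return the median wall-clock time in seconds of `runs` single calls."""
    # number=1 times each call on its own; repeat=runs gives one sample per run.
    samples = timeit.repeat(func, number=1, repeat=runs)
    return statistics.median(samples)


# Hypothetical workload standing in for a library call under test.
def workload():
    return sum(i * i for i in range(10_000))


median_seconds = median_runtime(workload)
print(f"median over 10 runs: {median_seconds * 1000:.3f} ms")
```

Taking the median rather than the mean makes the result less sensitive to a single slow run, for example the first run before the file is in the operating system cache.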

Results

Here I will write down the results for each of the five tasks and make some minor comments. In the next section I will try to summarize the main points into a conclusion and point out a disadvantage that Polars has in this comparison.

Task 1 — Reading data

The median run time over 10 runs for the reading task was as follows:

# Iris Dataset
Pandas: 0.79 milliseconds
Polars: 0.31 milliseconds

# Transactions Dataset
Pandas: 14.14 seconds
Polars: 1.25 seconds

For reading the iris dataset, Polars was roughly 2.5x faster than Pandas. For the transactions dataset, the difference is even starker: Polars was roughly 11x faster than Pandas. Polars is thus much faster than Pandas for reading both small and large files, and in my tests the gap was larger for the large file.

Task 2 — Writing data

The median run time over 10 runs for the writing task was as follows:

# Iris Dataset
Pandas: 1.06 milliseconds
Polars: 0.60 milliseconds

# Transactions Dataset
Pandas: 20.55 seconds
Polars: 10.39 seconds

For writing the iris dataset, Polars was around 75% faster than Pandas. For the transactions dataset, Polars was roughly 2x as fast as Pandas. Again we see that Polars is faster than Pandas, but the difference here is smaller than for reading files. Still, a difference of close to 2x in performance is a massive difference.

Task 3 — Computing Numeric Expressions

The median run time over 10 runs for the computing numeric expressions task was as follows:

# Iris Dataset
Pandas: 0.35 milliseconds
Polars: 0.15 milliseconds

# Transactions Dataset
Pandas: 54.58 milliseconds
Polars: 14.92 milliseconds

For computing the numeric expressions, Polars beats Pandas by a factor of roughly 2.3x for the iris dataset, and roughly 3.7x for the transactions dataset. This is a pretty massive difference. It should be noted that computing numeric expressions is fast in both libraries, even for the large transactions dataset.

Task 4 — Filters

The median run time over 10 runs for the filters task was as follows:

# Iris Dataset
Pandas: 0.40 milliseconds
Polars: 0.15 milliseconds

# Transactions Dataset
Pandas: 0.70 seconds
Polars: 0.07 seconds

For filters, Polars is 2.6x faster on the iris dataset and 10x as fast on the transactions dataset. This is probably the most surprising improvement for me since I suspected that the speed improvements for filtering tasks would not be this massive.

Task 5 — Group By

The median run time over 10 runs for the group by task was as follows:

# Iris Dataset
Pandas: 0.54 milliseconds
Polars: 0.18 milliseconds

# Transactions Dataset
Pandas: 334 milliseconds 
Polars: 126 milliseconds

For the group-by task, there is a 3x speed improvement for Polars in the case of the iris dataset. For the transactions dataset, there is a 2.6x improvement of Polars over Pandas.
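To make the ratios quoted in the five tasks easy to check, the reported medians can be collected and divided in a few lines. The numbers below are copied from the results above and converted to seconds; the dictionary layout is just one possible way to organize them.

```python
# Median run times reported above, in seconds: (pandas, polars).
medians = {
    ("read", "iris"): (0.79e-3, 0.31e-3),
    ("read", "transactions"): (14.14, 1.25),
    ("write", "iris"): (1.06e-3, 0.60e-3),
    ("write", "transactions"): (20.55, 10.39),
    ("expression", "iris"): (0.35e-3, 0.15e-3),
    ("expression", "transactions"): (54.58e-3, 14.92e-3),
    ("filter", "iris"): (0.40e-3, 0.15e-3),
    ("filter", "transactions"): (0.70, 0.07),
    ("group by", "iris"): (0.54e-3, 0.18e-3),
    ("group by", "transactions"): (334e-3, 126e-3),
}

# Speedup factor: how many times longer Pandas took than Polars.
speedups = {
    key: pandas_t / polars_t for key, (pandas_t, polars_t) in medians.items()
}

for (task, dataset), factor in speedups.items():
    print(f"{task:<10} {dataset:<12} {factor:5.1f}x")
```

The smallest factor is about 1.8x (writing the iris dataset) and the largest is about 11.3x (reading the transactions dataset).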

Conclusions

Before highlighting each point below, I want to point out that Polars is at somewhat of a disadvantage throughout my comparisons. In practice, multiple data transformations are often performed one after another. For this, Polars has a lazy API that optimizes the whole chain of operations before executing it. Since I have considered single ingestions and transformations, this advantage of Polars is hidden. How much this would help in practical situations is not clear, but it would probably make the difference in performance even bigger.

Data Ingestion

Polars is significantly faster than Pandas for both reading and writing data. The difference is largest in reading data, where we had a massive 11x difference in performance for the transactions dataset. On all measurements, Polars performs significantly better than Pandas.

Data Processing

Polars is significantly faster than Pandas for common data processing tasks. The difference was starkest for filters, but you can at least expect a 2–3x difference in performance across the board.

Final Verdict

Polars consistently performs faster than Pandas on all tasks with both small and large data. The improvements are very significant, ranging from a roughly 2x improvement to a whopping 11x improvement. When it comes to reading large Parquet files or performing filter statements, Polars is leaps and bounds ahead of Pandas.

However, nowhere here is Polars remotely close to performing 30x better than Pandas, as Polars’ benchmarking suggests. I would argue that the tasks that I have presented are standard tasks performed on realistic hardware infrastructure. So I think that my conclusions give us some room to question whether the claims put forward by Polars give a realistic picture of the improvements that you can expect.

Nevertheless, I have no doubt that Polars is significantly faster than Pandas. Working with Polars is not more complicated than working with Pandas. So for your next data engineering project where the data fits in memory, I would strongly suggest that you opt for Polars rather than Pandas.

Wrapping Up

I hope this blog post gave you a different perspective on the speed difference between Polars and Pandas. Please comment if you have a different experience with the performance difference between Polars and Pandas than what I have presented.

If you are interested in AI, Data Science, or data engineering, please follow me or connect on LinkedIn.


Read More »

HPE Aruba unveils raft of new switches for data center, campus modernization

And in large-scale enterprise environments embracing collapsed-core designs, the switch acts as a high-performance aggregation layer. It consolidates services, simplifies network architecture, and enforces security policies natively, reducing complexity and operational cost, Gray said. In addition, the switch offers the agility and security required at colocation facilities and edge sites. Its integrated Layer 4 stateful security and automation-ready platform enable rapid deployment while maintaining robust control and visibility over distributed infrastructure, Gray said. The CX 10040 significantly expands the capacity it can provide and the roles it can serve for enterprise customers, according to one industry analyst. “From the enterprise side, this expands on the feature set and capabilities of the original 10000, giving customers the ability to run additional services directly in the network,” said Alan Weckel, co-founder and analyst with The 650 Group. “It helps drive a lower TCO and provide a more secure network.”  Aimed as a VMware alternative Gray noted that HPE Aruba is combining its recently announced Morpheus VM Essentials plug-in package, which offers a hypervisor-based package aimed at hybrid cloud virtualization environments, with the CX 10040 to deliver a meaningful alternative to Broadcom’s VMware package. “If customers want to get out of the business of having to buy VM cloud or Cloud Foundation stuff and all of that, they can replace the distributed firewall, microsegmentation and lots of the capabilities found in the old VMware NSX [networking software] and the CX 10k, and Morpheus can easily replace that functionality [such as VM orchestration, automation and policy management],” Gray said. The 650 Group’s Weckel weighed in on the idea of the CX 10040 as a VMware alternative:

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs). In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple to devote $200 billion between them to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skill labor shortage

Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it’s been a regular as a non-tech company showing off technology at the big tech trade show in Las Vegas and is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, and the agricultural workforce continues to shrink. (This is my hint to the anti-immigration crowd.) John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences its own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year. 1. Agents: the next generation of automation AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

Read More »

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks. Going all-in on red teaming pays practical, competitive dividends It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find. 
What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Read More »