
Why Data Scientists Should Care about Containers — and Stand Out with This Knowledge


“I train models, analyze data and create dashboards — why should I care about Containers?”

Many people who are new to the world of data science ask themselves this question. But imagine you have trained a model that runs perfectly on your laptop. However, error messages keep popping up in the cloud when others access it — for example because they are using different library versions.

This is where containers come into play: They allow us to make machine learning models, data pipelines and development environments stable, portable and scalable — regardless of where they are executed.

Let’s take a closer look.

Table of Contents
1 — Containers vs. Virtual Machines: Why containers are more flexible than VMs
2 — Containers & Data Science: Do I really need Containers? And 4 reasons why the answer is yes.
3 — First Practice, then Theory: Container creation even without much prior knowledge
4 — Your 101 Cheatsheet: The most important Docker commands & concepts at a glance
Final Thoughts: Key takeaways as a data scientist
Where Can You Continue Learning?

1 — Containers vs. Virtual Machines: Why containers are more flexible than VMs

Containers are lightweight, isolated environments. They contain applications with all their dependencies. They also share the kernel of the host operating system, making them fast, portable and resource-efficient.

I have written extensively about virtual machines (VMs) and virtualization in ‘Virtualization & Containers for Data Science Newbies’. The key point is that VMs simulate complete computers: each has its own operating system with its own kernel and runs on a hypervisor. This means that they require more resources, but also offer greater isolation.

Both containers and VMs are virtualization technologies.

Both make it possible to run applications in an isolated environment.

But these two descriptions already reveal the three most important differences:

  • Architecture: While each VM has its own operating system (OS) and runs on a hypervisor, containers share the kernel of the host operating system. However, containers still run in isolation from each other. A hypervisor is the software or firmware layer that manages VMs and abstracts the operating system of the VMs from the physical hardware. This makes it possible to run multiple VMs on a single physical server.
  • Resource consumption: As each VM contains a complete OS, it requires a lot of memory and CPU. Containers, on the other hand, are more lightweight because they share the host OS.
  • Portability: You have to customize a VM for different environments because it requires its own operating system with specific drivers and configurations that depend on the underlying hardware. A container, on the other hand, can be created once and runs anywhere a container runtime is available (Linux, Windows, cloud, on-premise). Container runtime is the software that creates, starts and manages containers — the best-known example is Docker.
Created by the author

You can experiment faster with Docker — whether you’re testing a new ML model or setting up a data pipeline. You can package everything in a container and run it immediately. And you don’t have any “It works on my machine” problems. Your container runs the same everywhere — so you can simply share it.

2 — Containers & Data Science: Do I really need Containers? And 4 reasons why the answer is yes.

As a data scientist, your main task is to analyze, process and model data to gain valuable insights and predictions, which in turn are important for management.

Of course, you don’t need to have the same in-depth knowledge of containers, Docker or Kubernetes as a DevOps Engineer or a Site Reliability Engineer (SRE). Nevertheless, it is worth having container knowledge at a basic level — because these are 4 situations where you will come into contact with it sooner or later:

Model deployment

You are training a model. You not only want to use it locally but also make it available to others. To do this, you can pack it into a container and make it available via a REST API.

Let’s look at a concrete example: Your trained model runs in a Docker container with FastAPI or Flask. The server receives the requests, processes the data and returns ML predictions in real-time.
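To make this concrete, here is a minimal sketch of such a service, assuming a scikit-learn model saved as model.pkl (the file name, the flat feature vector and the /predict endpoint are illustrative choices, not part of the tutorial below):

# app.py – a minimal FastAPI service that serves predictions from a pickled model
# (model.pkl and the request format are assumptions for illustration)
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the trained model once at startup
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

class PredictionRequest(BaseModel):
    features: list[float]  # one flat feature vector per request

@app.post("/predict")
def predict(request: PredictionRequest):
    # scikit-learn estimators expect a 2D array: one row per sample
    prediction = model.predict([request.features])
    return {"prediction": prediction.tolist()}

Inside the container, such a server would typically be started with uvicorn app:app --host 0.0.0.0 --port 8000, and the port published with docker run -p 8000:8000.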

Reproducibility and easier collaboration

ML models and pipelines require specific libraries. For example, if you want to use a deep learning model like a Transformer, you need TensorFlow or PyTorch. If you want to train and evaluate classic machine learning models, you need Scikit-Learn, NumPy and Pandas. A Docker container now ensures that your code runs with exactly the same dependencies on every computer, server or in the cloud. You can also deploy a Jupyter Notebook environment as a container so that other people can access it and use exactly the same packages and settings.
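In practice, this usually means pinning exact versions in a requirements file that the Dockerfile installs; the package versions below are placeholders for illustration:

# requirements.txt – pinned versions so every build installs exactly the same stack
numpy==1.26.4
pandas==2.2.2
scikit-learn==1.4.2

A Dockerfile then copies this file and runs pip install -r requirements.txt during the build, so every image built from it contains exactly these versions.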

Cloud integration

Containers include all packages, dependencies and configurations that an application requires. They therefore run uniformly on local computers, servers or cloud environments. This means you don’t have to reconfigure the environment.

For example, you write a data pipeline script. This works locally for you. As soon as you deploy it as a container, you can be sure that it will run in exactly the same way on AWS, Azure, GCP or the IBM Cloud.
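The workflow behind this is build once, push to a registry, pull and run anywhere; the image name and registry URL below are placeholders:

# Build the pipeline image once on your machine
docker build -t my-pipeline:1.0 .

# Tag and push it to a registry that the cloud environment can reach
docker tag my-pipeline:1.0 registry.example.com/my-pipeline:1.0
docker push registry.example.com/my-pipeline:1.0

# On any VM or cloud service with a container runtime: pull and run the same image
docker pull registry.example.com/my-pipeline:1.0
docker run registry.example.com/my-pipeline:1.0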

Scaling with Kubernetes

Kubernetes helps you to orchestrate containers. But more on that below. If you now get a lot of requests for your ML model, you can scale it automatically with Kubernetes. This means that more instances of the container are started.
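To give a flavour of what that looks like, these are the kinds of kubectl commands involved, assuming the model API already runs as a Kubernetes Deployment named my-model-api (a hypothetical name):

# Manually scale the deployment to 5 container instances (replicas)
kubectl scale deployment my-model-api --replicas=5

# Or let Kubernetes scale automatically based on CPU load
kubectl autoscale deployment my-model-api --min=2 --max=10 --cpu-percent=80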

3 — First Practice, then Theory: Container creation even without much prior knowledge

Let’s take a look at an example that anyone can run through with minimal time — even if you haven’t heard much about Docker and containers. It took me 30 minutes.

We’ll set up a Jupyter Notebook inside a Docker container, creating a portable, reproducible Data Science environment. Once it’s up and running, we can easily share it with others and ensure that everyone works with the exact same setup.

0. Install Docker Desktop and create a project directory

To be able to use containers, we need Docker Desktop, which we download from the official website.

Now we create a new folder for the project. You can do this directly in the desired location, or via the terminal — on Windows, press Windows + R and type cmd to open the command prompt.

We use the following command:

Screenshot taken by the author
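The command in the screenshot simply creates the project folder; since the folder is called jupyter-docker in the build step further below, it presumably boils down to:

mkdir jupyter-docker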

1. Create a Dockerfile

Now we open VS Code or another editor and create a new file with the name ‘Dockerfile’. We save this file without an extension in the same directory — Docker looks for a file named exactly ‘Dockerfile’ by default, which is why it doesn’t need one.

We add the following code to this file:

# Use the official Jupyter notebook image with SciPy
FROM jupyter/scipy-notebook:latest  

# Set the working directory inside the container
WORKDIR /home/jovyan/work  

# Copy all local files into the container
COPY . .

# Start Jupyter Notebook without token
CMD ["start-notebook.sh", "--NotebookApp.token=''"]

We have thus defined a container environment for Jupyter Notebook that is based on the official Jupyter SciPy Notebook image.

First, with FROM we define which base image the container is built on. jupyter/scipy-notebook:latest is a preconfigured Jupyter Notebook image and contains libraries such as NumPy, SciPy, Matplotlib and Pandas. Alternatively, we could also use a different image here.

With WORKDIR we set the working directory within the container. /home/jovyan/work is the default path used by Jupyter. User jovyan is the default user in Jupyter Docker images. Another directory could also be selected — but this directory is best practice for Jupyter containers.

With COPY . . we copy all files from the local directory — in this case the Dockerfile, which is located in the jupyter-docker directory — to the working directory /home/jovyan/work in the container.

With CMD ["start-notebook.sh", "--NotebookApp.token=''"] we specify the default start command for the container: the start script for Jupyter Notebook, with the token disabled — this allows us to access the notebook directly via the browser.

2. Create the Docker image

Next, we will build the Docker image. Make sure Docker Desktop, which we installed earlier, is running. We now go back to the terminal and use the following commands:

cd jupyter-docker
docker build -t my-jupyter .

With cd jupyter-docker we navigate to the folder we created earlier. With docker build we create a Docker image from the Dockerfile. With -t my-jupyter we give the image a name. The dot at the end is the build context: it tells Docker to use the current directory, and the Dockerfile in it, as the basis for the build. Note the space between the image name and the dot.

The Docker image is the template for the container. It contains everything the application needs: the operating system base (e.g. Ubuntu), the runtime and tools (e.g. Python, Jupyter), dependencies such as Pandas and NumPy, the application code and the startup commands. When we “build” a Docker image, Docker reads the Dockerfile and executes the steps that we have defined there. The container can then be started from this template (Docker image).

We can now watch the Docker image being built in the terminal.

Screenshot taken by the author

We use docker images to check whether the image now exists:

docker images

If my-jupyter appears in the output, the creation was successful, and we see the data for the created Docker image:

Screenshot taken by the author

3. Start Jupyter container

Next, we want to start the container and use this command to do so:

docker run -p 8888:8888 my-jupyter

We start a container with docker run, followed by the name of the image we want to run (my-jupyter). With -p 8888:8888 we connect the local port (8888) with the port inside the container (8888) — the port Jupyter runs on.

Alternatively, you can also perform this step in Docker Desktop:

Screenshot taken by the author

4. Open Jupyter Notebook & create a test notebook

Now we open the URL http://localhost:8888 in the browser. You should now see the Jupyter Notebook interface.

Here we will now create a Python 3 notebook and insert the following Python code into it.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.plot(x, y)
plt.title("Sine Wave")
plt.show()

Running the code will display the sine curve:

Screenshot taken by the author

5. Terminate the container

At the end, we stop the container, either with ‘CTRL + C’ in the terminal or in Docker Desktop.

With docker ps we can check in the terminal whether containers are still running and with docker ps -a we can display the container that has just been terminated:

Screenshot taken by the author

6. Share your Docker image

If you now want to upload your Docker image to a registry, you can do this with the following commands. They push your image to Docker Hub (you need a Docker Hub account for this). You can also push it to a private registry such as AWS Elastic Container Registry, Google Container Registry, Azure Container Registry or IBM Cloud Container Registry.

docker login

docker tag my-jupyter your-dockerhub-name/my-jupyter:latest

docker push your-dockerhub-name/my-jupyter:latest

If you then open Docker Hub and go to your repositories in your profile, the image should be visible.

This was a very simple example to get started with Docker. If you want to dive a little deeper, you can deploy a trained ML model with FastAPI via a container.
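As a hedged sketch of what that deeper step could look like, a Dockerfile for such a FastAPI model service might be (the file names, port and uvicorn start command are assumptions, not part of this tutorial):

# Hypothetical Dockerfile for a FastAPI model-serving container
FROM python:3.11-slim

WORKDIR /app

# requirements.txt is assumed to pin fastapi, uvicorn, scikit-learn, etc.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# app.py loads the trained model and exposes a /predict endpoint
COPY . .

EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

Built with docker build -t my-model-api . and started with docker run -p 8000:8000 my-model-api, the model would then answer requests at http://localhost:8000/predict.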

4 — Your 101 Cheatsheet: The most important Docker commands & concepts at a glance

You can actually think of a container like a shipping container. Regardless of whether you load it onto a ship (local computer), a truck (cloud server) or a train (data center) — the content always remains the same.

The most important Docker terms

  • Container: Lightweight, isolated environment for applications that contains all dependencies.
  • Docker: The most popular container platform that allows you to create and manage containers.
  • Docker Image: A read-only template that contains code, dependencies and system libraries.
  • Dockerfile: Text file with commands to create a Docker image.
  • Kubernetes: Orchestration tool to manage many containers automatically.

The basic concepts behind containers

  • Isolation: Each container contains its own processes, libraries and dependencies
  • Portability: Containers run wherever a container runtime is installed.
  • Reproducibility: You can create a container once and it runs exactly the same everywhere.

The most basic Docker commands

docker --version # Check if Docker is installed
docker ps # Show running containers
docker ps -a # Show all containers (including stopped ones)
docker images # List of all available images
docker info # Show system information about the Docker installation

docker run hello-world # Start a test container
docker run -d -p 8080:80 nginx # Start Nginx in the background (-d) with port forwarding
docker run -it ubuntu bash # Start interactive Ubuntu container with bash

docker pull ubuntu # Load an image from Docker Hub
docker build -t my-app . # Build an image from a Dockerfile

Final Thoughts: Key takeaways as a data scientist

👉 With Containers you can solve the “It works on my machine” problem. Containers ensure that ML models, data pipelines, and environments run identically everywhere, independent of OS or dependencies.

👉 Containers are more lightweight and flexible than virtual machines. While VMs come with their own operating system and consume more resources, containers share the host operating system and start faster.

👉 There are three key steps when working with containers: Create a Dockerfile to define the environment, use docker build to create an image, and run it with docker run — optionally pushing it to a registry with docker push.

And then there’s Kubernetes.

A term that comes up a lot in this context: Kubernetes is an orchestration tool that automates container management, ensuring scalability, load balancing and fault recovery. This is particularly useful for microservices and cloud applications.

Before Docker, VMs were the go-to solution (see more in ‘Virtualization & Containers for Data Science Newbies’). VMs offer strong isolation, but require more resources and start more slowly.

Docker was developed in 2013 by Solomon Hykes to solve this problem. Instead of virtualizing entire operating systems, containers run independently of the environment — whether on your laptop, a server or in the cloud. They contain all the necessary dependencies so that they work consistently everywhere.

I simplify tech for curious minds🚀 If you enjoy my tech insights on Python, data science, Data Engineering, machine learning and AI, consider subscribing to my substack.

Where Can You Continue Learning?
