
Asana: February 5 & 6
- Duration: Two consecutive outages, with the second lasting approximately 20 minutes
- Symptoms: Service unavailability and degraded performance
- Cause: A configuration change on February 5 overloaded server logs, causing servers to restart. A second outage with similar characteristics occurred the following day.
- Takeaways: “This pair of outages highlights the complexity of modern systems and how it’s difficult to test for every possible interaction scenario,” ThousandEyes reported. Following the incidents, Asana transitioned to staged configuration rollouts.
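Asana hasn’t published its rollout tooling, but the general shape of a staged configuration rollout is straightforward: push the change to a small slice of the fleet, watch health signals, then widen. A minimal Python sketch, with hypothetical apply_config, check_health, and rollback hooks:

```python
import time

# Hypothetical stage sizes: the share of the fleet that gets the new config at each step.
STAGES = [1, 10, 50, 100]

def staged_rollout(apply_config, check_health, rollback, soak_seconds=300):
    """Push a configuration change in widening stages, verifying health between steps."""
    for percent in STAGES:
        apply_config(percent)           # update config on `percent`% of servers
        time.sleep(soak_seconds)        # let the change soak before judging it
        if not check_health():          # e.g. error rates, restarts, log volume
            rollback()                  # revert everywhere on the first bad signal
            return False
    return True
```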
Slack: February 26
- Duration: Nine hours
- Symptoms: Users could log in and browse channels, but experienced issues sending and receiving messages.
- Cause: A problematic maintenance action on Slack’s database systems directed an overload of heavy traffic at the database.
- Takeaways: “At first glance, everything looked fine at Slack—network connectivity was good, there were no latency issues, and no packet loss,” according to ThousandEyes. Only by combining multiple diagnostic observations could investigators determine the true source was the database system, later confirmed by Slack.
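The diagnostic point here is cross-layer correlation: the network-layer signals looked clean, so investigators had to weigh them against application-layer behavior. A rough Python sketch of that comparison, using Slack’s public api.test method purely as an illustrative probe (the messaging path that actually failed is not exercised here):

```python
import socket
import time
import urllib.request

HOST = "slack.com"
URL = "https://slack.com/api/api.test"   # public no-auth Web API method, used only as a probe

def measure(host, url):
    """Contrast a network-layer signal (TCP handshake time) with an
    application-layer one (status and latency of a full HTTP request)."""
    t0 = time.monotonic()
    socket.create_connection((host, 443), timeout=5).close()
    connect_ms = (time.monotonic() - t0) * 1000

    t1 = time.monotonic()
    with urllib.request.urlopen(url, timeout=10) as resp:
        status = resp.status
    total_ms = (time.monotonic() - t1) * 1000
    return connect_ms, status, total_ms

# A fast handshake plus a slow or failing request points past the network, toward the backend.
print("connect %.0f ms, HTTP %s in %.0f ms" % measure(HOST, URL))
```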
X: March 10
- Duration: Several hours, with varying degrees of service downtime
- Symptoms: The platform appeared “down,” with users experiencing connection failures similar to a distributed denial-of-service (DDoS) attack.
- Cause: Network failures occurred, with significant packet loss and connection errors at the TCP signaling phase. “Connection errors typically indicate a deeper problem at the network layer,” according to ThousandEyes.
- Takeaways: ThousandEyes detected traffic being dropped before sessions could be established. But there were no visible BGP route changes, which would typically occur during DDoS mitigation. “It was a network-level failure, but not what it may have first appeared,” ThousandEyes noted.
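ThousandEyes’ description points to failures during connection setup rather than after a session is established. A simple Python sketch of that kind of handshake-level probe (host, port, and attempt counts are illustrative):

```python
import socket

def tcp_connect_failures(host, port=443, attempts=10, timeout=5):
    """Count TCP handshakes that fail outright; failures here happen before any
    HTTP exchange, which is what a network-layer drop looks like from outside."""
    failures = 0
    for _ in range(attempts):
        try:
            socket.create_connection((host, port), timeout=timeout).close()
        except OSError:          # timeouts, resets, unreachable network
            failures += 1
    return failures

print(tcp_connect_failures("x.com"), "of 10 handshake attempts failed")
```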
Zoom: April 16
- Duration: Approximately two hours
- Symptoms: All Zoom services were unavailable globally.
- Cause: Zoom’s name server (NS) records disappeared from the top-level domain (TLD) nameservers, making the service unreachable despite healthy infrastructure.
- Takeaways: “Although the servers themselves were healthy throughout and were answering correctly when queried directly, the DNS resolvers couldn’t find them because of the missing records,” ThousandEyes reported. The incident highlights how failures in the Domain Name System (DNS) hierarchy above an organization’s own infrastructure can completely knock out services.
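The distinction ThousandEyes draws—servers that answer when asked directly versus a delegation that recursive resolvers can no longer follow—can be checked with a couple of DNS queries. A sketch using the dnspython library (2.x API); the authoritative server IP below is a placeholder, not Zoom’s real nameserver:

```python
import dns.exception
import dns.message
import dns.query
import dns.resolver   # pip install dnspython (2.x API)

DOMAIN = "zoom.us"
AUTH_SERVER_IP = "203.0.113.10"   # placeholder only -- substitute a known authoritative server's IP

def resolver_view(domain):
    """What an ordinary recursive resolver returns; this path broke during the incident."""
    try:
        return [rdata.address for rdata in dns.resolver.resolve(domain, "A")]
    except dns.exception.DNSException as exc:
        return f"resolution failed: {type(exc).__name__}"

def direct_view(domain, server_ip):
    """Ask an authoritative server directly, bypassing the TLD delegation;
    during the outage this path reportedly still answered correctly."""
    try:
        reply = dns.query.udp(dns.message.make_query(domain, "A"), server_ip, timeout=5)
        return [rdata.address for rrset in reply.answer for rdata in rrset]
    except dns.exception.DNSException as exc:
        return f"direct query failed: {type(exc).__name__}"

print("via resolver:", resolver_view(DOMAIN))
print("direct query:", direct_view(DOMAIN, AUTH_SERVER_IP))
```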
- Duration: More than two hours
- Symptoms: The application’s front-end loaded normally, but tracks and videos would not play properly.
- Cause: Backend service issues while network connectivity, DNS, and CDN “all looked healthy.”
- Takeaways: “The vital signs were all good: connectivity, DNS, and CDN all looked healthy,” according to ThousandEyes, which added that the incident illustrated how “server-side failures can quietly cripple core functionality while giving the appearance that everything is working normally.”
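That “looks healthy on the surface” failure mode is why synthetic checks often probe both the front end and the backend APIs it depends on. A minimal sketch with hypothetical URLs (not the affected service’s real endpoints):

```python
import urllib.error
import urllib.request

# Hypothetical endpoints: the public front end vs. a backend API the app depends on.
FRONTEND = "https://www.example-streaming.com/"
BACKEND_API = "https://api.example-streaming.com/v1/playback/health"

def probe(url):
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code                    # the server answered, but with an error status
    except (urllib.error.URLError, TimeoutError):
        return None                        # no usable answer at all

print("front end:", probe(FRONTEND))       # 200 here is the "vital signs look good" picture
print("backend  :", probe(BACKEND_API))    # a 5xx or None here explains tracks failing to play
```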
Google Cloud: June 12
- Duration: More than two and a half hours
- Symptoms: Users couldn’t use Google to authenticate on third-party apps such as Spotify and Fitbit; knock-on consequences impacted Cloudflare services and downstream applications.
- Cause: An invalid automated update disrupted the company’s identity and access management (IAM) system.
- Takeaways: “What you had was a three-tier cascade: Google’s failure led to Cloudflare problems, which affected downstream applications relying on Cloudflare,” ThousandEyes explained, adding that the incident is a “reminder to trace a fault all the way back to source.”
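The “trace a fault all the way back to source” advice amounts to walking a dependency chain until the deepest unhealthy component is found. A toy sketch of that traversal; the dependency map and health data are illustrative stand-ins for what real topology and monitoring data would supply:

```python
# Illustrative dependency map for the cascade described above; names are stand-ins.
DEPENDS_ON = {
    "downstream-app": "cloudflare-service",
    "cloudflare-service": "google-iam",
    "google-iam": None,                    # root of this particular chain
}

def trace_to_source(service, healthy):
    """Walk upstream while each dependency is also unhealthy, ending at the likely source."""
    chain = [service]
    while (upstream := DEPENDS_ON.get(service)) and not healthy.get(upstream, True):
        chain.append(upstream)
        service = upstream
    return chain

healthy = {"downstream-app": False, "cloudflare-service": False, "google-iam": False}
print(" -> ".join(trace_to_source("downstream-app", healthy)))
# downstream-app -> cloudflare-service -> google-iam
```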
- Duration: More than one hour
- Symptoms: Traffic couldn’t reach numerous websites and apps that rely on Cloudflare’s 1.1.1.1 DNS resolver.
- Cause: A configuration error introduced weeks before was triggered by an unrelated change, prompting Cloudflare’s BGP route announcements to vanish from the global internet routing table.
- Takeaways: “With no valid routes, traffic couldn’t reach Cloudflare’s 1.1.1.1 DNS resolver,” ThousandEyes reported, adding that the incident highlights “how flaws in configuration updates don’t always trigger an immediate crisis, instead storing up problems for later.”
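One way to catch this class of failure from the outside is to monitor the resolver itself rather than the sites behind it. A small dnspython sketch that queries 1.1.1.1 directly (the probe name and timeout are arbitrary choices):

```python
import dns.exception
import dns.resolver   # pip install dnspython (2.x API)

def resolver_reachable(resolver_ip="1.1.1.1", probe_name="example.com"):
    """Return True if the public resolver answers a simple query within the time budget."""
    r = dns.resolver.Resolver(configure=False)   # don't fall back to the system resolver
    r.nameservers = [resolver_ip]
    r.lifetime = 3                               # total seconds allowed for the query
    try:
        r.resolve(probe_name, "A")
        return True
    except dns.exception.DNSException:
        return False

print("1.1.1.1 reachable:", resolver_reachable())
```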
- Duration: More than two hours
- Symptoms: The company’s mobile app, website, and ATMs all failed simultaneously.
- Cause: A shared backend dependency failed, affecting all customer touchpoints, ThousandEyes estimated.
- Takeaways: “The fact that three different channels with three different frontend technologies failed all at once eliminates app or UI issues,” ThousandEyes noted, explaining that this incident demonstrated “how a single failure can instantly disable every customer touchpoint—and why it’s vital to check all signals before reaching for remedies.”
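The reasoning in that takeaway—simultaneous failure of independent front ends implicates a shared backend—is easy to encode in a synthetic check. A sketch with hypothetical endpoints standing in for the company’s real channels:

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

# Hypothetical endpoints standing in for each customer channel.
CHANNELS = {
    "mobile-api": "https://api.example-bank.com/health",
    "website": "https://www.example-bank.com/",
    "atm-gateway": "https://atm-gw.example-bank.com/ping",
}

def is_up(url):
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status < 500
    except Exception:                      # any failure counts as "down" for this rough check
        return False

with ThreadPoolExecutor() as pool:
    results = dict(zip(CHANNELS, pool.map(is_up, CHANNELS.values())))

if not any(results.values()):
    print("all channels down at once -- suspect a shared backend dependency:", results)
else:
    print(results)
```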
- Duration: Both incidents lasted several hours
- Symptoms: The first outage affected EMEA region users with slowdowns and failures; the second impacted users worldwide with HTTP 503 errors and connection timeouts.
- Cause: The October 9 incident was caused by software defects that crashed edge sites in the EMEA region; the October 29 outage was triggered by a configuration change.
- Takeaways: “Together, these two outages illustrate an important distinction: infrastructure failures tend to be regional with only certain customers affected, whereas configuration errors typically hit all regions simultaneously,” according to ThousandEyes.
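The regional-versus-global distinction ThousandEyes describes can be approximated by comparing probes from multiple vantage points. A rough Python sketch with made-up probe results:

```python
# Made-up probe results: True means the service answered from that vantage point.
PROBES = {
    "eu-west": False, "eu-north": False,       # EMEA vantage points
    "us-east": True, "us-west": True,
    "ap-southeast": True,
}

def classify(probes):
    """Rough scope classification along the lines ThousandEyes describes."""
    failing = [region for region, ok in probes.items() if not ok]
    if not failing:
        return "healthy"
    if len(failing) == len(probes):
        return "global outage -- configuration or a shared control plane is a prime suspect"
    return f"regional outage affecting {failing} -- local infrastructure is a prime suspect"

print(classify(PROBES))
```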
- Duration: More than 15 hours for some customers
- Symptoms: Long, global service disruptions affected major customers, including Slack, Atlassian, and Snapchat.
- Cause: A failure occurred in the US-EAST-1 region, but global services such as IAM and DynamoDB Global Tables depended on that regional endpoint, so the outage propagated worldwide.
- Takeaways: “The incident highlights how a failure in a single, centralized service can ripple outwards through dependency chains that aren’t always obvious from architecture diagrams,” ThousandEyes noted.
- Duration: Several hours of intermittent, global instability
- Symptoms: Intermittent service disruptions rather than a complete outage
- Cause: A bad configuration file in Cloudflare’s Bot Management system exceeded a hard-coded limit, causing proxies to fail as they loaded the oversized file on staggered five-minute cycles.
- Takeaways: “Because the proxies refreshed configurations on staggered five-minute cycles, we didn’t see a lights-on/lights-off outage, but intermittent, global instability,” ThousandEyes reported, noting that the incident revealed how distributed edge combined with staggered updates can create intermittent issues.
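The root cause—an oversized configuration file tripping a hard-coded limit inside the proxy—is a reminder that config loaders can fail closed instead of crashing. A generic Python sketch (the limit, file format, and function names are illustrative, not Cloudflare’s actual implementation):

```python
import json

MAX_FEATURES = 200        # stand-in for the kind of hard-coded limit described above

class ConfigError(Exception):
    pass

def load_features(path):
    """Load a new feature file, refusing it if it blows past the limit."""
    with open(path) as f:
        features = json.load(f)
    if len(features) > MAX_FEATURES:
        raise ConfigError(f"{len(features)} features exceeds limit of {MAX_FEATURES}")
    return features

def refresh(path, current):
    """Keep serving the last known-good configuration instead of crashing the proxy."""
    try:
        return load_features(path)
    except (ConfigError, json.JSONDecodeError, OSError):
        return current
```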
Lessons learned in 2025
ThousandEyes highlighted several takeaways for network operations teams looking to improve their resilience in 2026:
Look beyond single symptoms, as they can be misleading. The true cause of a disruption often emerges only from a combination of signals. “If the network seems healthy but users are experiencing issues, the problem might be in the backend,” according to ThousandEyes. “Simultaneous failures across channels point to shared dependencies, while intermittent failures could indicate rollout or edge problems.”
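Those heuristics read almost like a decision table. A tiny Python sketch encoding the quoted rules; the inputs would come from real monitoring, and the logic is deliberately naive:

```python
def triage(network_healthy, users_impacted, channels_down, channels_total, intermittent):
    """Turn the quoted heuristics into a first-pass diagnosis."""
    if network_healthy and users_impacted:
        return "look at the backend"
    if channels_total > 1 and channels_down == channels_total:
        return "look for a shared dependency"
    if intermittent:
        return "look at rollouts or the edge"
    return "keep gathering signals"

print(triage(network_healthy=True, users_impacted=True,
             channels_down=1, channels_total=3, intermittent=False))
```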
Focus on rapid detection and response. The complexity of modern systems means it’s unrealistic to prevent every possible issue through testing alone. “Instead, focus on building rapid detection and response capabilities, using techniques such as staged rollouts and clear communication with stakeholders,” ThousandEyes stated.