
MIT Technology Review’s How To series helps you get things done.
Simon Willison has a plan for the end of the world. It’s a USB stick, onto which he has loaded a couple of his favorite open-weight LLMs—models that have been shared publicly by their creators and that can, in principle, be downloaded and run on local hardware. If human civilization should ever collapse, Willison plans to use all the knowledge encoded in their billions of parameters for help. “It’s like having a weird, condensed, faulty version of Wikipedia, so I can help reboot society with the help of my little USB stick,” he says.
But you don’t need to be planning for the end of the world to want to run an LLM on your own device. Willison, who writes a popular blog about local LLMs and software development, has plenty of compatriots: r/LocalLLaMA, a subreddit devoted to running LLMs on your own hardware, has half a million members.
For people who are concerned about privacy, want to break free from the control of the big LLM companies, or just enjoy tinkering, local models offer a compelling alternative to ChatGPT and its web-based peers.
The local LLM world used to have a high barrier to entry: In the early days, it was impossible to run anything useful without investing in pricey GPUs. But researchers have had so much success in shrinking down and speeding up models that anyone with a laptop, or even a smartphone, can now get in on the action. “A couple of years ago, I’d have said personal computers are not powerful enough to run the good models. You need a $50,000 server rack to run them,” Willison says. “And I kept on being proved wrong time and time again.”
Why you might want to download your own LLM
Getting into local models takes a bit more effort than, say, navigating to ChatGPT’s online interface. But the very accessibility of a tool like ChatGPT comes with a cost. “It’s the classic adage: If something’s free, you’re the product,” says Elizabeth Seger, the director of digital policy at Demos, a London-based think tank.
OpenAI, which offers both paid and free tiers, trains its models on users’ chats by default. It’s not too difficult to opt out of this training, and it used to be possible to remove your chat data from OpenAI’s systems entirely—but a recent legal decision in the New York Times’ ongoing lawsuit against OpenAI now requires the company to retain all user conversations with ChatGPT.
Google, which has access to a wealth of data about its users, also trains its models on both free and paid users’ interactions with Gemini, and the only way to opt out of that training is to set your chat history to delete automatically—which means that you also lose access to your previous conversations. In general, Anthropic does not train its models using user conversations, but it will train on conversations that have been “flagged for Trust & Safety review.”
Training may present particular privacy risks because of the ways that models internalize, and often recapitulate, their training data. Many people trust LLMs with deeply personal conversations—but if models are trained on that data, those conversations might not be nearly as private as users think, according to some experts.
“Some of your personal stories may be cooked into some of the models, and eventually be spit out in bits and bytes somewhere to other people,” says Giada Pistilli, principal ethicist at the company Hugging Face, which runs a huge library of freely downloadable LLMs and other AI resources.
For Pistilli, opting for local models as opposed to online chatbots has implications beyond privacy. “Technology means power,” she says. “And so who[ever] owns the technology also owns the power.” States, organizations, and even individuals might be motivated to disrupt the concentration of AI power in the hands of just a few companies by running their own local models.
Breaking away from the big AI companies also means having more control over your LLM experience. Online LLMs are constantly shifting under users’ feet: Back in April, ChatGPT suddenly started sucking up to users far more than it had previously, and just last week Grok started calling itself MechaHitler on X.
Providers tweak their models with little warning, and while those tweaks might sometimes improve model performance, they can also cause undesirable behaviors. Local LLMs may have their quirks, but at least they are consistent. The only person who can change your local model is you.
Of course, any model that can fit on a personal computer is going to be less powerful than the premier online offerings from the major AI companies. But there’s a benefit to working with weaker models—they can inoculate you against the more pernicious limitations of their larger peers. Small models may, for example, hallucinate more frequently and more obviously than Claude, GPT, and Gemini, and seeing those hallucinations can help you build up an awareness of how and when the larger models might also lie.
“Running local models is actually a really good exercise for developing that broader intuition for what these things can do,” Willison says.
How to get started
Local LLMs aren’t just for proficient coders. If you’re comfortable using your computer’s command-line interface, which lets you browse files and run apps by typing commands, Ollama is a great option. Once you’ve installed the software, you can download and run any of the hundreds of models it offers with a single command.
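If you eventually want to script against a model rather than just chat with it at the command line, Ollama also ships an official Python client that talks to the locally running service. The snippet below is a minimal sketch, not part of Willison’s setup: it assumes Ollama is installed and running, that you’ve already pulled a model (Llama 3.2 is used here purely as an example), and that you’ve installed the client with `pip install ollama`.

```python
# Minimal sketch: querying a locally running Ollama model from Python.
# Assumes Ollama is installed and running, a model has been pulled
# (e.g. `ollama pull llama3.2`), and the client is installed via
# `pip install ollama`.
import ollama

response = ollama.chat(
    model="llama3.2",  # any model you've already pulled locally will do
    messages=[
        {"role": "user", "content": "In one sentence, what is an open-weight model?"}
    ],
)

# The whole exchange happens on your own machine; nothing is sent to a remote server.
print(response["message"]["content"])
```

Swap the model name for whatever your hardware can comfortably handle; the pattern stays the same.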
If you don’t want to touch anything that even looks like code, you might opt for LM Studio, a user-friendly app that takes a lot of the guesswork out of running local LLMs. You can browse models from Hugging Face right within the app, which provides plenty of information to help you make the right choice. Some popular models are tagged as “Staff Picks,” and every model is labeled according to whether it can be run entirely on your machine’s speedy GPU, needs to be shared between your GPU and slower CPU, or is too big to fit onto your device at all. Once you’ve chosen a model, you can download it, load it up, and start interacting with it using the app’s chat interface.
As you experiment with different models, you’ll start to get a feel for what your machine can handle. According to Willison, every billion model parameters require about one GB of RAM to run, and I found that approximation to be accurate: My own 16 GB laptop managed to run Alibaba’s Qwen3 14B as long as I quit almost every other app. If you run into issues with speed or usability, you can always go smaller—I got reasonable responses from Qwen3 8B as well.
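That rule of thumb is easy to turn into a back-of-the-envelope check before you commit to a multi-gigabyte download. The snippet below is a rough sketch of the arithmetic, not a precise measurement: the one-gigabyte-per-billion-parameters figure assumes the kind of quantized (compressed) model files that tools like Ollama and LM Studio typically serve, and real memory use also depends on how long your conversations run.

```python
# Back-of-the-envelope check: will a model plausibly fit in my laptop's RAM?
# Uses the rough rule of thumb from the article: ~1 GB of RAM per billion
# parameters. Real usage varies with quantization and context length.

LAPTOP_RAM_GB = 16  # adjust to match your own machine

def estimated_ram_gb(billions_of_params: float) -> float:
    """Very rough RAM estimate for a quantized local model."""
    return billions_of_params * 1.0  # ~1 GB per billion parameters

for name, size_b in [("Llama 3.2 1B", 1), ("Qwen3 8B", 8), ("Qwen3 14B", 14)]:
    needed = estimated_ram_gb(size_b)
    verdict = "should fit" if needed <= LAPTOP_RAM_GB else "probably too big"
    print(f"{name}: ~{needed:.0f} GB -> {verdict} on a {LAPTOP_RAM_GB} GB machine")
```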
And if you go really small, you can even run models on your cell phone. My beat-up iPhone 12 was able to run Meta’s Llama 3.2 1B using an app called LLM Farm. It’s not a particularly good model—it very quickly goes off into bizarre tangents and hallucinates constantly—but trying to coax something so chaotic toward usability can be entertaining. If I’m ever on a plane sans Wi-Fi and desperate for a probably false answer to a trivia question, I now know where to look.
Some of the models that I was able to run on my laptop were effective enough that I can imagine using them in my journalistic work. And while I don’t think I’ll depend on phone-based models for anything anytime soon, I really did enjoy playing around with them. “I think most people probably don’t need to do this, and that’s fine,” Willison says. “But for the people who want to do this, it’s so much fun.”