A Comprehensive Guide to LLM Temperature 🔥🌡️

While building my own LLM-based application, I found many prompt engineering guides, but few equivalent guides for determining the temperature setting.

Of course, temperature is a simple numerical value while prompts can get mind-blowingly complex, so it may feel trivial as a product decision. Still, choosing the right temperature can dramatically change the nature of your outputs, and anyone building a production-quality LLM application should choose temperature values with intention.

In this post, we’ll explore what temperature is and the math behind it, potential product implications, and how to choose the right temperature for your LLM application and evaluate it. At the end, I hope that you’ll have a clear course of action to find the right temperature for every LLM use case.

What is temperature?

Temperature is a number that controls the randomness of an LLM’s outputs. Most APIs limit the value to a range such as 0 to 1 (OpenAI’s API accepts 0 to 2) to keep the outputs within semantically coherent bounds.

From OpenAI’s documentation:

“Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.”

Intuitively, it’s like a dial that can adjust how “explorative” or “conservative” the model is when it spits out an answer.
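In practice, it’s just one parameter on the request. Here’s a minimal sketch using OpenAI’s Python client — the model name and prompt are placeholders for illustration, not anything prescribed by a specific application:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

response = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder model
    temperature=0.2,       # low value: focused, near-deterministic output
    messages=[{"role": "user", "content": "Suggest a gift for a pickleball player."}],
)
print(response.choices[0].message.content)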

What do these temperature values mean?

Personally, I find the math behind the temperature field very interesting, so I’ll dive into it. But if you’re already familiar with the innards of LLMs or you’re not interested in them, feel free to skip this section.

You probably know that an LLM generates text by predicting the next token after a given sequence of tokens. In its prediction process, it assigns probabilities to all possible tokens that could come next. For example, if the sequence passed to the LLM is “The giraffe ran over to the…”, it might assign high probabilities to words like “tree” or “fence” and lower probabilities to words like “apartment” or “book”.

But let’s back up a bit. How do these probabilities come to be?

These probabilities usually come from raw scores, known as logits, that are the results of many, many neural network calculations and other machine learning techniques. These logits are gold; they contain all the valuable information about what tokens could be selected next. But the problem with these logits is that they don’t fit the definition of a probability: they can be any number, positive or negative, like 2, or -3.65, or 20. They’re not necessarily between 0 and 1, and they don’t necessarily all add up to 1 like a nice probability distribution.

So, to make these logits usable, we need to use a function to transform them into a clean probability distribution. The function typically used here is called the softmax, and it’s essentially an elegant equation that does two important things:

  1. It turns all the logits into positive numbers.
  2. It scales the logits so they add up to 1.
The softmax formula, where $z_i$ is the logit for token $i$:

$$P(i) = \frac{e^{z_i}}{\sum_j e^{z_j}}$$

The softmax function works by taking each logit, raising e (around 2.718) to the power of that logit, and then dividing by the sum of all these exponentials. So the highest logit will still get the highest numerator, which means it gets the highest probability. But other tokens, even with negative logit values, will still get a chance.
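To make this concrete, here’s the softmax applied to a handful of logits for the giraffe example — a toy sketch where the logit values are invented:

import numpy as np

# invented logits for the sequence "The giraffe ran over to the..."
tokens = ["tree", "fence", "apartment", "book"]
logits = np.array([2.0, 1.5, -1.0, -3.65])

# softmax: exponentiate each logit, then normalize so the values sum to 1
probs = np.exp(logits) / np.sum(np.exp(logits))
for token, p in zip(tokens, probs):
    print(f"{token}: {p:.3f}")
# "tree" (the highest logit) gets the highest probability, but even
# the negative logits keep a small, nonzero chance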

Now here’s where temperature comes in: temperature modifies the logits before the softmax is applied. The formula for softmax with temperature is:

$$P(i) = \frac{e^{z_i / T}}{\sum_j e^{z_j / T}}$$

When the temperature is low (T < 1), dividing the logits by T makes the values larger and more spread out. Exponentiation then makes the highest value much larger than the others, so the probability distribution becomes more uneven. The model has a higher chance of picking the most probable token, resulting in a more deterministic output.

When the temperature is high (T > 1), dividing the logits by T makes all the values smaller and closer together, spreading the probability distribution out more evenly. This means the model is more likely to pick less probable tokens, increasing randomness.
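Continuing the toy sketch from above, you can watch the same invented logits sharpen and flatten as T changes:

import numpy as np

logits = np.array([2.0, 1.5, -1.0, -3.65])

def softmax_with_temperature(logits, T):
    scaled = logits / T                     # the temperature step
    exps = np.exp(scaled - np.max(scaled))  # subtract max for numerical stability
    return exps / exps.sum()

for T in (0.2, 1.0, 2.0):
    print(f"T={T}: {np.round(softmax_with_temperature(logits, T), 3)}")
# T=0.2 piles almost all probability onto the top token;
# T=2.0 flattens the distribution toward uniform

One edge case worth knowing: at T = 0 the division is undefined, so implementations typically special-case it as greedy decoding, always picking the single most probable token.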

How to choose temperature

Of course, the best way to choose a temperature is to play around with it. I believe any temperature, like any prompt, should be substantiated with example runs and evaluated against other possibilities. We’ll discuss that in the next section.

But before we dive into that, I want to highlight that temperature is a crucial product decision, one that can significantly influence user behavior. It may seem rather straightforward to choose: lower for more accuracy-based applications, higher for more creative applications. But there are tradeoffs in both directions with downstream consequences for user trust and usage patterns. Here are some subtleties that come to mind:

  • Low temperatures can make the product feel authoritative. More deterministic outputs can create the illusion of expertise and foster user trust. However, this can also lead to gullible users. If responses are always confident, users might stop critically evaluating the AI’s outputs and just blindly trust them, even if they’re wrong.
  • Low temperatures can reduce decision fatigue. If you see one strong answer instead of many options, you’re more likely to take action without overthinking. This might lead to easier onboarding or lower cognitive load while using the product. Inversely, high temperatures could create more decision fatigue and lead to churn.
  • High temperatures can encourage user engagement. The unpredictability of high temperatures can keep users curious (like variable rewards), leading to longer sessions or increased interactions. Inversely, low temperatures might create stagnant user experiences that bore users.
  • Temperature can affect the way users refine their prompts. When answers are unexpected with high temperatures, users might be driven to clarify their prompts. But with low temperatures, users may be forced to add more detail or expand on their prompts in order to get new answers.

These are broad generalizations, and of course there are many more nuances with every specific application. But in most applications, the temperature can be a powerful variable to adjust in A/B testing, something to consider alongside your prompts.

Evaluating different temperatures

As developers, we’re used to unit testing: defining a set of inputs, running those inputs through a function, and getting a set of expected outputs. We sleep soundly at night when we ensure that our code is doing what we expect it to do and that our logic is satisfying some clear-cut constraints.

The promptfoo package lets you perform the LLM-prompt equivalent of unit testing, but there’s some additional nuance. Because LLM outputs are non-deterministic and often designed to do more creative tasks than strictly logical ones, it can be hard to define what an “expected output” looks like.

Defining your “expected output”

The simplest evaluation tactic is to have a human rate how good they think some output is, according to some rubric. For outputs where you’re looking for a certain “vibe” that you can’t express in words, this will probably be the most effective method.

Another simple evaluation tactic is to use deterministic metrics — these are things like “does the output contain a certain string?” or “is the output valid JSON?” or “does the output satisfy this JavaScript expression?”. If your expected output can be expressed in these ways, promptfoo has your back.

A more interesting, AI-age evaluation tactic is to use LLM-graded checks. These essentially use LLMs to evaluate your LLM-generated outputs, and can be quite effective if used properly. Promptfoo offers these model-graded metrics in multiple forms. The whole list is here, and it contains assertions from “is the output relevant to the original query?” to “compare the different test cases and tell me which one is best!” to “where does this output rank on this rubric I defined?”.

Example

Let’s say I’m creating a consumer-facing application that comes up with creative gift ideas and I want to empirically determine what temperature I should use with my main prompt.

I might want to evaluate metrics like relevance, originality, and feasibility within a certain budget and make sure that I’m picking the right temperature to optimize those factors. If I’m comparing GPT-4o-mini’s performance at temperatures of 0 vs. 1, my test file might start like this:

providers:
  - id: openai:gpt-4o-mini
    label: openai-gpt-4o-mini-lowtemp
    config:
      temperature: 0
  - id: openai:gpt-4o-mini
    label: openai-gpt-4o-mini-hightemp
    config:
      temperature: 1
prompts:
  - "Come up with a one-sentence creative gift idea for a person who is {{persona}}. It should cost under {{budget}}."

tests:
  - description: "Mary - attainable, under budget, original"
    vars:
      persona: "a 40 year old woman who loves natural wine and plays pickleball"
      budget: "$100"
    assert:
      - type: g-eval
        value:
          - "Check if the gift is easily attainable and reasonable"
          - "Check if the gift is likely under $100"
          - "Check if the gift would be considered original by the average American adult"
  - description: "Sean - answer relevance"
    vars:
      persona: "a 25 year old man who rock climbs, goes to raves, and lives in Hayes Valley"
      budget: "$50"
    assert:
      - type: answer-relevance
        threshold: 0.7

I’ll probably want to run the test cases repeatedly to test the effects of temperature changes across multiple same-input runs. In that case, I would use the repeat param like:

promptfoo eval --repeat 3
[Image: promptfoo test results]

Conclusion

Temperature is a simple numerical parameter, but don’t be deceived by its simplicity: it can have far-reaching implications for any LLM application.

Tuning it just right is key to getting the behavior you want — too low, and your model plays it too safe; too high, and it starts spouting unpredictable responses. With tools like promptfoo, you can systematically test different settings and find your Goldilocks zone — not too cold, not too hot, but just right.
