
Mastering the Poisson Distribution: Intuition and Foundations

Unless otherwise noted, all images are by the author. Originally published at https://aalvarezperez.github.io on January 5, 2025.

You’ve probably used the normal distribution one or two times too many. We all have; it’s a true workhorse. But sometimes we run into problems: when predicting or forecasting values, when simulating data from a particular data-generating process, or when we try to visualise model output and explain it intuitively to non-technical stakeholders. Suddenly, things don’t make much sense: can a user really have made -8 clicks on the banner? Or even 4.3 clicks? Neither is how count data behaves.

I’ve found that better encapsulating the data-generating process in my modelling has been key to getting sensible model output. Using the Poisson distribution when it was appropriate has not only helped me convey more meaningful insights to stakeholders, but has also enabled me to produce more accurate error estimates, better inference, and sounder decision-making.

In this post, my aim is to help you get a deep intuitive feel for the Poisson distribution by walking through example applications, and taking a dive into the foundations — the maths. I hope you learn not just how it works, but also why it works, and when to apply the distribution.

If you know of a resource that has helped you grasp the concepts in this blog particularly well, you’re invited to share it in the comments!

Outline

  1. Examples and use cases: Let’s walk through some use cases and sharpen the intuition I just mentioned. Along the way, the relevance of the Poisson Distribution will become clear.
  2. The foundations: Next, let’s break down the equation into its individual components. By studying each part, we’ll uncover why the distribution works the way it does.
  3. The assumptions: Equipped with some formality, it will be easier to understand the assumptions that power the distribution, and at the same time set the boundaries for when it works, and when not.
  4. When real life deviates from the model: Finally, let’s explore the special links that the Poisson distribution has with the Negative Binomial distribution. Understanding these relationships can deepen our understanding, and provide alternatives when the Poisson distribution is not suited for the job.

Example in an online marketplace

I chose to deep dive into the Poisson distribution because it frequently appears in my day-to-day work. Online marketplaces rely on binary user choices from two sides: a seller deciding to list an item and a buyer deciding to make a purchase. These micro-behaviours drive supply and demand, both in the short and long term. A marketplace is born.

Binary choices aggregate into counts — the sum of many such decisions as they occur. Attach a timeframe to this counting process, and you’ll start seeing Poisson distributions everywhere. Let’s explore a concrete example next.

Consider a seller on a platform. In a given month, the seller may or may not list an item for sale (a binary choice). We would only know if she did because then we’d have a measurable count of the event. Nothing stops her from listing another item in the same month. If she does, we count those events. The total could be zero for an inactive seller or, say, 120 for a highly engaged seller.

Over several months, we would observe a varying number of listed items by this seller — sometimes fewer, sometimes more — hovering around an average monthly listing rate. That is essentially a Poisson process. When we get to the assumptions section, you’ll see what we had to assume away to make this example work.

Other examples

Other phenomena that can be modelled with a Poisson distribution include:

  • Sports analytics: The number of goals scored in a match between two teams.
  • Queuing: Customers arriving at a help desk or customer support calls.
  • Insurance: The number of claims made within a given period.

Each of these examples warrants further inspection, but for the remainder of this post, we’ll use the marketplace example to illustrate the inner workings of the distribution.

The mathy bit

… or foundations.

I find that opening up the probability mass function (PMF) of a distribution helps in understanding why things work as they do. The PMF of the Poisson distribution is:

$$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}$$

where λ is the rate parameter and 𝑘 is the realised count of the random variable (𝑘 = 0, 1, 2, 3, … events). Very neat and compact.
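To make the formula tangible, here is a minimal Python sketch that evaluates it term by term. The function name `poisson_pmf` and the value λ = 4 are my own illustrative choices, not something fixed by the distribution.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed straight from the formula."""
    if k < 0:
        return 0.0
    return lam ** k * math.exp(-lam) / math.factorial(k)


lam = 4.0  # hypothetical average number of monthly listings
probs = [poisson_pmf(k, lam) for k in range(60)]

for k in range(9):
    print(f"P(X = {k}) = {probs[k]:.4f}")

# The probabilities over all k sum to 1; 60 terms are plenty for lam = 4.
print(f"total mass over k = 0..59: {sum(probs):.6f}")
```

Note that negative or fractional counts simply have no place in this function, which is exactly the property that the normal distribution lacks.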

Graph: The probability mass function of the Poisson distribution, for a few different lambdas.

Contextualising λ and k: the marketplace example

In the context of our earlier example, a seller listing items on our platform, λ represents the seller’s average monthly listings. As the expected monthly value for this seller, λ orchestrates the number of items she would list in a month. Note that λ is a Greek letter, so read: λ is a parameter that we can estimate from data. 𝑘, on the other hand, holds no information about the seller’s idiosyncratic behaviour. It is simply the count we plug in to ask how probable that number of events is.

The dual role of λ as the mean and variance

When I said that λ orchestrates the number of monthly listings for the seller, I meant it quite literally: λ is both the expected value and the variance of the distribution, for every value of λ. This means that the variance-to-mean ratio (the index of dispersion) is always 1.
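A quick way to convince yourself of this is to compute the mean and variance directly from the PMF. The sketch below does that with a truncated sum; the helper names and the three example rates are hypothetical, and the cut-off `k_max = 150` is an assumption that is only safe for moderate values of λ like these.

```python
import math


def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)


def mean_and_variance(lam, k_max=150):
    """Mean and variance computed from the PMF, truncated at k_max."""
    ks = range(k_max + 1)
    mean = sum(k * poisson_pmf(k, lam) for k in ks)
    var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
    return mean, var


for lam in (0.5, 4.0, 20.0):
    mean, var = mean_and_variance(lam)
    print(f"lam = {lam:5.1f}  mean = {mean:.4f}  variance = {var:.4f}")
```

Whatever rate you pick, both numbers come back equal to λ, up to floating-point error.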

To put this into perspective, the normal distribution requires two parameters — 𝜇 and 𝜎², the average and variance respectively — to fully describe it. The Poisson distribution achieves the same with just one.

Having to estimate only one parameter can be beneficial for parametric inference, as it reduces the variance of the model and increases statistical power. On the other hand, it can be too limiting an assumption. Alternatives like the Negative Binomial distribution can alleviate this limitation. We’ll explore that later.

Breaking down the probability mass function

Now that we know the smallest building blocks, let’s zoom out one step: what are λᵏ, 𝑒^⁻λ, and 𝑘!, and more importantly, what role does each of these components play in the whole?

  • λᵏ is a weight that expresses how likely it is for 𝑘 events to happen, given that the expectation is λ. Note that “likely” here does not mean a probability, yet. It’s merely a signal strength.
  • 𝑘! is a combinatorial correction so that we can say that the order of the events is irrelevant. The events are interchangeable.
  • 𝑒^⁻λ normalises the PMF so that it sums to 1 over all 𝑘. This works because the sum of λᵏ / 𝑘! over 𝑘 = 0, 1, 2, … equals 𝑒^λ, which plays the role of the partition function in exponential-family terms; 𝑒^⁻λ is its reciprocal.

In more detail, λᵏ relates the observed value 𝑘 to the expected value of the random variable, λ. Intuitively, more probability mass lies around the expected value. Hence, if the observed value lies close to the expectation, the probability of occurring is larger than the probability of an observation far removed from the expectation. Before we can cross-check our intuition with the numerical behaviour of λᵏ, we need to consider what 𝑘! does.

Interchangeable events

Had we cared about the order of events, then each set of 𝑘 events could be ordered in 𝑘! ways. But because we don’t, and we deem the events interchangeable, we “divide out” 𝑘! from λᵏ to correct for the overcounting.

Since λᵏ grows exponentially in 𝑘 (for λ > 1), this term on its own always gets larger as 𝑘 grows, holding λ constant. That contradicts our intuition that the probability should peak when 𝑘 is near λ, because the output is larger still when 𝑘 = λ + 1. But now that we know about the interchangeable events assumption, and the overcounting issue, we know that we have to factor in 𝑘! like so: λᵏ 𝑒^⁻λ / 𝑘!, to see the behaviour we expect.

Now let’s check the intuition of the relationship between λ and 𝑘 through λᵏ, corrected for 𝑘!. For the same λ, say λ = 4, we should see λᵏ / 𝑘! be smaller for values of 𝑘 that are far removed from 4 than for values of 𝑘 that lie close to 4 (the factor 𝑒^⁻λ is the same for every 𝑘, so it can be left out of the comparison). Like so: 4²/2! = 8 is smaller than 4⁴/4! ≈ 10.7. This is consistent with the intuition of a higher likelihood of 𝑘 when it’s near the expectation. The image below shows this relationship more generally, where you see that the output is larger as 𝑘 approaches λ.
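The same check can be run for a whole range of 𝑘 at once. This is a small sketch with λ = 4, as in the text; the variable names are mine.

```python
import math

lam = 4

# Unnormalised weights lam^k / k! for k = 0..10.
weights = {k: lam ** k / math.factorial(k) for k in range(11)}
for k, w in weights.items():
    print(f"k = {k:2d}   lam^k / k! = {w:7.3f}")

# Summing the weights over (practically) all k recovers e^lam,
# which is why multiplying by e^(-lam) turns them into probabilities.
total = sum(lam ** k / math.factorial(k) for k in range(100))
print(f"sum of weights = {total:.6f}, e^lam = {math.exp(lam):.6f}")
```

One detail worth noticing in the output: when λ is a whole number, the weights for 𝑘 = λ − 1 and 𝑘 = λ are exactly equal (here 4³/3! = 4⁴/4!), so the peak is shared between 𝑘 = 3 and 𝑘 = 4.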

Graph: The probability mass function without the normalising component e^-lambda.

The assumptions

First, let’s get one thing out of the way: the difference between a Poisson process and the Poisson distribution. The process is a stochastic continuous-time model of points occurring in a given interval: in 1D, a line; in 2D, an area; or in higher dimensions. I dare say that we data scientists most often deal with the one-dimensional case, where the “line” is time and the points are the events of interest.

These are the assumptions of the Poisson process:

  1. The occurrence of one event does not affect the probability of a second event. Think of our seller going on to list another item tomorrow regardless of having done so already today, or five days ago for that matter. The point here is that there is no memory between events.
  2. The average rate at which events occur is independent of any occurrence. In other words, no event that happened (or will happen) alters λ, which remains constant throughout the observed timeframe. In our seller example, this means that listing an item today does not increase or decrease the seller’s motivation or likelihood of listing another item tomorrow.
  3. Two events cannot occur at exactly the same instant. If we were to zoom in to an infinitely granular level on the timescale, no two listings could have been placed simultaneously; they always occur sequentially.

From these assumptions (no memory, constant rate, events happening alone) it follows that 1) the number of events in any interval of length 𝑡 is Poisson-distributed with parameter λ𝑡, the rate multiplied by the interval length, and 2) disjoint intervals are independent. These are two key properties of a Poisson process.
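To see these properties emerge, here is a small simulation sketch. It relies on a standard fact that the post does not spell out: in a homogeneous Poisson process, the waiting times between consecutive events are independent and exponentially distributed with rate λ. The function name, the rate of 4 and the seed are all illustrative assumptions.

```python
import random
from statistics import mean, pvariance


def simulate_counts(rate, n_intervals, seed=0):
    """Count events per unit-length interval of a homogeneous Poisson process.

    Events are placed by drawing exponential waiting times with the given
    rate; each event is then assigned to the unit interval it falls in.
    """
    rng = random.Random(seed)
    counts = [0] * n_intervals
    t = rng.expovariate(rate)
    while t < n_intervals:
        counts[int(t)] += 1
        t += rng.expovariate(rate)
    return counts


counts = simulate_counts(rate=4.0, n_intervals=20_000)
print(f"mean count per interval: {mean(counts):.3f}")
print(f"variance of the counts:  {pvariance(counts):.3f}")
```

With unit-length intervals, λ𝑡 is simply λ, so both the mean and the variance of the simulated counts should land close to 4.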

A Note on the distribution:
The distribution simply describes the probabilities of various counts in an interval. Strictly speaking, one can use the distribution pragmatically whenever the data are nonnegative integers, are unbounded on the right, have mean λ, and are reasonably well described by it. It is convenient, but not required, that the underlying process is a Poisson one, since that is what actually justifies using the distribution.

The marketplace example: Implications

So, can we justify using the Poisson distribution for our marketplace example? Let’s open up the assumptions of a Poisson process and take the test.

Constant λ

  • Why it may fail: The seller has patterned online activity; holidays; promotions; listings are seasonal goods.
  • Consequence: λ is not constant, leading to overdispersion (a variance-to-mean ratio larger than 1) or to temporal patterns.
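The overdispersion that results from a non-constant λ can be made concrete with a toy calculation. Suppose, purely for illustration, that the seller lists at rate 2 in quiet months and rate 6 in busy months, and that both kinds of month are equally common. Pooling all months gives a mixture of two Poisson distributions, whose mean and variance can be computed from the PMFs:

```python
import math


def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Hypothetical rates and their shares of months (made-up numbers).
rates = [2.0, 6.0]
shares = [0.5, 0.5]


def mixture_pmf(k):
    """PMF of the pooled monthly counts across quiet and busy months."""
    return sum(s * poisson_pmf(k, lam) for s, lam in zip(shares, rates))


ks = range(100)
mean = sum(k * mixture_pmf(k) for k in ks)
var = sum((k - mean) ** 2 * mixture_pmf(k) for k in ks)

print(f"mean = {mean:.3f}, variance = {var:.3f}, ratio = {var / mean:.3f}")
```

The pooled counts have mean 4 but variance 8: the usual Poisson variance of 4 plus another 4 contributed by the spread of the rate itself. A single Poisson distribution with λ = 4 would understate the variability by half.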

Independence and memorylessness

  • Why it may fail: The propensity to list again is higher after a successful listing, or conversely, listing once depletes the stock and interferes with the propensity to list again.
  • Consequence: Two events are no longer independent, as the occurrence of one informs the occurrence of the other.

Simultaneous events

  • Why it may fail: Batch-listing, a new feature, was introduced to help the sellers.
  • Consequence: Multiple listings would come online at the same time, clumped together, and they would be counted simultaneously.

Balancing rigour and pragmatism

As Data Scientists on the job, we may feel trapped between rigour and pragmatism. The three steps below should give you a sound foundation to decide on which side to err, when the Poisson distribution falls short:

  1. Pinpoint your goal: is it inference, simulation or prediction, and is it about high-stakes output? List the worst thing that can happen, and the cost of it for the business.
  2. Identify the problem and solution: why does the Poisson distribution not fit, and what can you do about it? List 2-3 solutions, including changing nothing.
  3. Balance gains and costs: will your workaround improve things, or make them worse? And at what cost in interpretability, new assumptions introduced, and resources used? Does it help you achieve your goal?

That said, here are some counters I use when needed.

When real life deviates from your model

Everything described so far pertains to the standard, or homogeneous, Poisson process. But what if reality begs for something different?

In the next section, we’ll cover two extensions of the Poisson distribution for when the constant λ assumption does not hold. These are not mutually exclusive, but neither are they the same:

  1. Time-varying λ: a single seller whose listing rate ramps up before holidays and slows down afterward.
  2. Mixed Poisson distribution: multiple sellers listing items, each with their own λ, can be seen as a mixture of various Poisson processes.

Time-varying λ

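The first extension allows λ to take its own value at each time 𝑡, written λ(𝑡). The PMF then becomes

$$P\big(K(T) = k\big) = \frac{\Lambda(T)^{k}\, e^{-\Lambda(T)}}{k!} \qquad (\text{PMF*})$$

where the number of events 𝐾(𝑇) in an interval 𝑇 follows the Poisson distribution with a rate no longer equal to a fixed λ, but one equal to:

$$\Lambda(T) = \int_{t}^{t+i} \lambda(s)\, ds$$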

More intuitively, integrating over the interval 𝑡 to 𝑡 + 𝑖 gives us a single number: the expected number of events over that interval. The integral differs from one interval to the next, and that is what makes λ change over time. To understand how the integration works, it helped me to think of it like this: if the interval 𝑡 to 𝑡₁ integrates to 3, and 𝑡₁ to 𝑡₂ integrates to 5, then the interval 𝑡 to 𝑡₂ integrates to 8 = 3 + 5. The two expectations add up to the expectation for the entire interval.
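To make the additivity concrete, here is a minimal sketch in Python. The rate function `rate` (a baseline of 4 listings per month with a yearly seasonal swing) and the helper `expected_count` are hypothetical names of my own, and the integral is only approximated numerically with the trapezoidal rule.

```python
import math


def rate(t: float) -> float:
    """Hypothetical listing rate (items per month) with a yearly seasonal swing."""
    return 4.0 + 2.0 * math.sin(2.0 * math.pi * t / 12.0)


def expected_count(start: float, end: float, steps: int = 10_000) -> float:
    """Approximate the integral of rate(t) over [start, end] (trapezoidal rule)."""
    width = (end - start) / steps
    total = 0.5 * (rate(start) + rate(end))
    for i in range(1, steps):
        total += rate(start + i * width)
    return total * width


first = expected_count(0.0, 3.0)   # expected listings in months 0 to 3
second = expected_count(3.0, 6.0)  # expected listings in months 3 to 6
whole = expected_count(0.0, 6.0)   # expected listings in months 0 to 6

print(f"months 0-3: {first:.3f}")
print(f"months 3-6: {second:.3f}")
print(f"months 0-6: {whole:.3f} (sum of the parts: {first + second:.3f})")
```

The expectation for the long interval matches the sum of the two shorter ones, up to numerical error. Each of these numbers is the rate of the Poisson distribution that describes the count in its interval.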

Practical implication 
One may want to model the expected value of the Poisson distribution as a function of time, for instance to capture an overall trend or seasonality. In generative model notation:

Time may be a continuous variable, or an arbitrary function of it.
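As an illustration, the sketch below simulates monthly counts from one possible generative model of this kind. Everything specific in it is my own assumption rather than something fixed by the text: the log link, the coefficients `beta0`, `beta1` and `beta2`, the sine term for seasonality, and the small `sample_poisson` helper, which counts exponential arrivals within one time unit.

```python
import math
import random

rng = random.Random(42)


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) count by counting exponential arrivals in one time unit."""
    count = 0
    elapsed = rng.expovariate(lam)
    while elapsed < 1.0:
        count += 1
        elapsed += rng.expovariate(lam)
    return count


# Hypothetical coefficients: baseline of 4 listings, mild upward trend, yearly seasonality.
beta0, beta1, beta2 = math.log(4.0), 0.01, 0.3

months = list(range(36))
# Assumed model: log(lambda_t) = beta0 + beta1 * t + beta2 * sin(2 * pi * t / 12)
rates = [
    math.exp(beta0 + beta1 * t + beta2 * math.sin(2.0 * math.pi * t / 12.0))
    for t in months
]
# One count per month: K_t ~ Poisson(lambda_t)
counts = [sample_poisson(lam, rng) for lam in rates]

for t in (0, 3, 9):
    print(f"month {t}: rate = {rates[t]:.2f}, simulated count = {counts[t]}")
```

The log link keeps every λ positive no matter which coefficient values are plugged in, which is why it is a common choice for count models.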

Process-varying λ: Mixed Poisson distribution

But then there’s a gotcha. Remember when I said that λ has a dual role as the mean and variance? That still applies here. Looking at the “relaxed” PMF*, the only thing that changes is that λ can vary freely with time. But it’s still the one and only λ that orchestrates both the expected value and the dispersion of the PMF*. More precisely, 𝔼[𝑋] = Var(𝑋) still holds.

There are various reasons why this constraint may not hold in reality. Model misspecification, event interdependence and unaccounted-for heterogeneity could all be at play. I’d like to focus on the last of these, as it justifies the Negative Binomial distribution, one of the topics I promised to open up.

Heterogeneity and overdispersion
Imagine we are not dealing with one seller, but with 10 of them listing at different intensity levels, λᵢ, where 𝑖 = 1, 2, 3, …, 10 indexes the sellers. Then, essentially, we have 10 Poisson processes going on. If we unify the processes and estimate the grand λ, we simplify the mixture away. Meaning, we get a correct estimate of the average across all sellers, but the resulting grand λ is naive and does not know about the original spread of the λᵢ. It still assumes that the variance and mean are equal, as the distribution requires. The pooled data will then be overdispersed relative to the model, which in turn leads to underestimated errors. Ultimately, this inflates the false positive rate and drives poor decision-making. We need a way to embrace the heterogeneity amongst sellers’ λᵢ.
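A quick simulation shows the effect. The 10 seller rates below are hypothetical (they average 4), as is the `sample_poisson` helper. By the law of total variance, the pooled counts should have a variance of roughly the mean rate plus the variance of the rates, rather than the mean alone.

```python
import random
import statistics

rng = random.Random(7)


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) count by counting exponential arrivals in one time unit."""
    count = 0
    elapsed = rng.expovariate(lam)
    while elapsed < 1.0:
        count += 1
        elapsed += rng.expovariate(lam)
    return count


# Hypothetical monthly listing rates for 10 sellers; their average is 4.
seller_rates = [1, 1, 2, 2, 3, 4, 5, 6, 7, 9]
months = 2_000

# Pool every seller's monthly counts, as if they came from a single process.
pooled = [sample_poisson(lam, rng) for lam in seller_rates for _ in range(months)]

pooled_mean = statistics.fmean(pooled)
pooled_var = statistics.variance(pooled)
rate_var = statistics.pvariance(seller_rates)  # spread of the rates across sellers

print(f"pooled mean (the grand lambda): {pooled_mean:.2f}")
print(f"pooled variance:                {pooled_var:.2f}")
print(f"mean + variance of the rates:   {pooled_mean + rate_var:.2f}")
```

A single Poisson distribution fitted to the pooled counts would assume a variance equal to the grand λ, which is well below what the pooled data show.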

Negative binomial: Extending the Poisson distribution
Among the several ways one can look at the Negative Binomial distribution, one is to see it as a mixed Poisson distribution (10 sellers, sound familiar yet?). That means counts from many independent Poisson processes, each with its own rate, are pooled into a single distribution. Mathematically, we first draw λ from a Gamma distribution: λ ~ Γ(r, θ), and then draw the count 𝑋 | λ ~ Poisson(λ).

In one image, it is as if we sampled from many Poisson distributions, one for each seller.

A negative Binomial distribution arises from many Poisson distributions.

The more revealing alias of the Negative Binomial distribution is the Gamma-Poisson mixture distribution, and now we know why: the dictating λ comes from a continuous mixture. That’s what we needed to explain the heterogeneity amongst sellers.

Let’s simulate this scenario to gain more intuition.

Gamma mixture of lambda.

First, we draw λᵢ from a Gamma distribution: λᵢ ~ Γ(r, θ). Intuitively, the Gamma distribution tells us about the variety in the intensity — listing rate — amongst the sellers.

On a practical note, one can encode assumptions about the degree of heterogeneity in this step of the model: how different are the sellers? By varying the level of heterogeneity, one can observe the impact on the final Poisson-like distribution. Running this type of check (a prior predictive check) is common in Bayesian modelling, where the assumptions are set explicitly.

Gamma-Poisson mixture distribution versus homogeneous Poisson distribution. The dashed line reflects λ, which is 4 for both distributions.

In the second step, we plug the obtained λ into the Poisson distribution: 𝑋 | λ ~ Poisson(λ), and obtain a Poisson-like distribution that represents the summed subprocesses. Notably, this unified process has a larger dispersion than expected from a homogeneous Poisson distribution, but it is in line with the Gamma mixture of λ.
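The two steps can be sketched in a few lines of Python. The parameter values are assumptions of mine: I read r as the shape and θ as the scale of the Gamma distribution and set both to 2, so that the mean rate is 4. The `sample_poisson` helper is also my own.

```python
import random
import statistics

rng = random.Random(2025)


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) count by counting exponential arrivals in one time unit."""
    count = 0
    elapsed = rng.expovariate(lam)
    while elapsed < 1.0:
        count += 1
        elapsed += rng.expovariate(lam)
    return count


# Assumed Gamma parameters: shape r = 2 and scale theta = 2, so E[lambda] = r * theta = 4.
shape_r, scale_theta = 2.0, 2.0
n = 20_000

# Step 1: draw one rate per seller from the Gamma distribution.
lambdas = [rng.gammavariate(shape_r, scale_theta) for _ in range(n)]
# Step 2: draw one count per seller from a Poisson with that seller's own rate.
mixed = [sample_poisson(lam, rng) for lam in lambdas]
# Reference: a homogeneous Poisson with the same mean rate.
plain = [sample_poisson(shape_r * scale_theta, rng) for _ in range(n)]

for name, sample in (("Gamma-Poisson mixture", mixed), ("homogeneous Poisson", plain)):
    mean = statistics.fmean(sample)
    var = statistics.variance(sample)
    print(f"{name}: mean = {mean:.2f}, variance = {var:.2f}")
```

Both samples centre on the same mean, but the mixture is noticeably wider. In theory its variance is rθ + rθ², the Poisson variance plus the variance of the Gamma-distributed rates.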

Heterogeneous λ and inference

A practical consequence of introducing flexibility into your assumed distribution is that inference becomes more challenging. More parameters (i.e., the Gamma parameters) need to be estimated. Parameters act as flexible explainers of the data, tending to overfit and explain away variance in your variable. The more parameters you have, the better the explanation may seem, but the model also becomes more susceptible to noise in the data. Higher variance reduces the power to identify a difference in means, if one exists, because — well — it gets lost in the variance.

Countering the loss of power

  1. Confirm whether you indeed need to extend the standard Poisson distribution. If not, fall back to the simplest model that fits adequately. A quick check on overdispersion may suffice for this.
  2. Pin down the estimates of the Gamma mixture distribution parameters using regularising, informative priors (think: Bayes).
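For the quick check in the first step, one simple option is the dispersion index: the sample variance divided by the sample mean. The data and the function name below are hypothetical, and this is a rule of thumb rather than a formal test. Values close to 1 are in line with a Poisson model, while values well above 1 point to overdispersion.

```python
import statistics


def dispersion_index(counts: list[int]) -> float:
    """Sample variance divided by sample mean; close to 1 under a Poisson model."""
    return statistics.variance(counts) / statistics.fmean(counts)


# Hypothetical monthly listing counts for one seller.
monthly_listings = [0, 2, 9, 1, 4, 12, 3, 0, 7, 2]

index = dispersion_index(monthly_listings)
print(f"dispersion index: {index:.2f}")
```

With only a handful of observations the index is noisy, so I would treat it as a prompt for a closer look rather than as a verdict.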

During my research process for writing this blog, I learned a great deal about the connective tissue underlying all of this: how the binomial distribution plays a fundamental role in the processes we’ve discussed. And while I’d love to ramble on about this, I’ll save it for another post, perhaps. In the meantime, feel free to share your understanding in the comments section below 👍.

Conclusion

The Poisson distribution is a simple distribution that can be highly suitable for modelling count data. However, when the assumptions do not hold, one can extend the distribution by allowing the rate parameter to vary as a function of time or other factors, or by assuming subprocesses that collectively make up the count data. This added flexibility can address the limitations, but it comes at a cost: increased flexibility in your modelling raises the variance and, consequently, undermines the statistical power of your model.

If your end goal is inference, you may want to think twice and consider exploring simpler models for the data. Alternatively, switch to the Bayesian paradigm and leverage its built-in solution to regularise estimates: informative priors.

I hope this has given you what you came for — a better intuition about the Poisson distribution. I’d love to hear your thoughts about this in the comments!

Unless otherwise noted, all images are by the author.
Originally published at https://aalvarezperez.github.io on January 5, 2025.

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

Coterra’s net income surges, Kimmeridge calls for leadership change

Coterra Energy Inc. yesterday reported third-quarter 2025 net income of $322 million up sharply from $252 million from the year-earlier quarter. Year-to-date net income was nearly $1.35 billion, a 64% increase from the first 9 months of 2024. For third-quarter 2025, total barrels of oil equivalent (boe), natural gas production, and oil production were all near the high-end of the company’s guidance ranges, beating their respective mid-points by roughly 2.5%. Incurred capital expenditures from drilling, completion, and other fixed asset additions (non-GAAP) totaled $658 million, near the mid-point of Coterra’s guidance range of $625-675 million. The company turned in-line 48 net wells during the quarter. In the Permian, 38 net wells were turned in-line, below guidance of 40-50 net wells. Anadarko and Marcellus turned in-line six and four net wells, respectively, in line with guidance. Total equivalent production averaged 785,000 boe/d, near the high end of guidance (740,000-790,000 boe/d). But private investment firm Kimmeridge, describing itself as a significant Coterra shareholder, today released an open letter to Coterra’s board calling for “decisive action to address the company’s failures of governance and lack of strategic focus following the failed merger of Cabot Oil & Gas and Cimarex Energy,” up to and including a change of leadership. Coterra was created by the 2021 merger of these two companies. “Coterra’s history has been tainted by a boardroom unwilling to acknowledge its own missteps,” said Mark Viviano, managing partner at Kimmeridge. “Coterra now trades at a significant discount to both Permian and gas-focused peers, underscoring the market’s rejection of a merger that prioritized self-preservation over strategic merit. Kimmeridge maintains that Coterra’s path forward hinges on new leadership and a renewed focus on the Delaware basin. 
The Board should immediately appoint a non-executive chair who is independent and unassociated with the merger to restore objectivity

Read More »

Diamondback production and output ‘leveling off’ late this year and into 2026

Van’t Hof told analysts on the conference call that the demand picture looks strong these days and that “supply is the hot debate right now.” In a letter accompanying Diamondback’s third-quarter earnings report, he added that the company’s leaders are more aligned with OPEC’s forecast that oversupply through mid-2026 will be less than 500,000 b/d than they are with the International Energy Agency’s outlook of a nearly 4 million b/d surplus. Diamondback, which produced nearly 504,000 b/d of oil in Q3 from its roughly 750,000 net acres in the Permian basin, is content to hold its production levels steady but still be prepared to either boost or bring down output should market conditions change significantly. “We firmly believe there is no need for incremental oil barrels until there is a proper price signal,” Van’t Hof wrote in his letter. In the 3 months that ended Sept. 30, Diamondback’s total production came in at nearly 943,000 boe/d, up from about 920,000 boe/d in the second quarter. The company’s average price/bbl moved up to $64.60 from $63.23 in the spring but was still 12% below the figure from 2024’s third quarter. Its combined price ticked up slightly to $39.73/boe from $39.61 in Q2. Those data points translated into net income of $1.09 billion on total revenues of more than $3.9 billion. Looking to the current quarter, Van’t Hof and his team are forecasting oil output of 505,000 to 515,000 b/d. (That figure will dip to about 505,000 b/d after the company completes an asset sale to its Viper Energy mineral and royalty subsidiary.) They expect total production to be between 927,000 and 963,000 boe/d. Shares of Diamondback (Ticker: FANG) were down nearly 2% to $138.69 in early-afternoon trading Nov. 4, with broader market indices all down more than 1%. Diamondback stock is

Read More »

Uniper Posts $52B Nine-Month Sales Revenue

Uniper SE has reported EUR 44.83 billion ($51.89 billion) in sales for the first nine months of 2025, down from EUR 48.26 billion for January-September 2024 partly due to a portfolio decrease from asset sales. Net profit adjusted for nonrecurring items for the first three quarters of 2025 was EUR 268 million, compared to EUR 1.32 billion for the same period last year. Earnings per share for January-September 2025 landed at EUR 1.35, down from EUR 1.92, the German power and gas utility reported on its website. Before adjustment, net income was EUR 568 million, down from EUR 841 million year-on-year. Adjusted EBITDA for January-September 2025 totaled EUR 641 million, compared to EUR 2.18 billion for the 2024 comparable period. Adjusted EBIT came at EUR 235 million, compared to EUR 1.72 billion for January-September 2024. Green Generation adjusted EBITDA fell year-over-year from EUR 738 million to EUR 540 million. “The price level in northern Sweden remains lower than in the prior-year period, mainly because of high reservoir levels in the first half of 2025”, offsetting a higher power output in the country, Uniper said. The shutdown of the Oskarshamn 3 nuclear power station from the start of the second quarter of 2025 also “adversely affected earnings” from Sweden, Uniper said. The plant was restarted up November 2, it said. “Earnings at Uniper’s hydropower business in Germany were slightly lower, too. Pumped-storage power plants’ contribution to earnings was smaller, whereas that of run-of-river power plants, which benefited from more favorable market conditions, was larger”, Uniper added. Flexible Generation adjusted EBITDA dropped from EUR 1.06 billion to EUR 459 million. “Adverse factors included a decline in earnings on hedging transactions on the fossil trading margin and a smaller generation portfolio”, Uniper said. “The latter especially reflects the decommissioning of Ratcliffe power plant in

Read More »

Where Did Chevron’s Oil and Gas Production Come From in 3Q?

Chevron Corporation revealed a breakdown of its oil and gas production in the third quarter of this year in its latest results statement, which was posted on the company’s website recently. According to this statement, the company’s net oil equivalent production came in at 4.086 million barrels per day in the third quarter. Chevron’s statement showed that this output was almost evenly distributed across its U.S. upstream segment and its international upstream segment. In the third quarter, Chevron’s net oil equivalent production from its U.S. upstream segment was 2.040 million barrels per day and its net oil equivalent production from its international upstream segment was 2.046 million barrels per day, the statement highlighted. Of the U.S. upstream net oil equivalent output, liquids production made up 1.496 million barrels per day and natural gas production made up 3.265 billion cubic feet per day, according to the statement. The company’s international upstream net oil equivalent production comprised 1.099 million barrels per day of liquids production and 5.674 billion cubic feet per day of natural gas production, the statement revealed. Chevron’s total net oil equivalent production was 3.396 million barrels per day in the second quarter and 3.364 million barrels per day in the third quarter of last year. The company’s U.S. upstream net oil equivalent production came in at 1.695 million barrels per day in the second quarter and 1.605 million barrels per day in the third quarter of last year, the statement highlighted. Chevron’s international upstream net oil equivalent output was 1.701 million barrels per day in the second quarter and 1.759 million barrels per day in the third quarter of 2024, according to the statement. Chevron reported upstream earnings of $3.302 billion in the third quarter in its latest results statement, which showed that the company’s upstream earnings stood at

Read More »

North America Goes Back to Adding Rigs

North America added six rigs week on week, according to Baker Hughes’ latest North America rotary rig count, which was published on November 7. The total U.S. rig count increased by two week on week and the total Canada rig count increased by four during the same period, taking the total North America rig count up to 739, comprising 548 rigs from the U.S. and 191 rigs from Canada, the count outlined. Of the total U.S. rig count of 548, 527 rigs are categorized as land rigs, 19 are categorized as offshore rigs, and two are categorized as inland water rigs. The total U.S. rig count is made up of 414 oil rigs, 128 gas rigs, and six miscellaneous rigs, according to Baker Hughes’ count, which revealed that the U.S. total comprises 478 horizontal rigs, 59 directional rigs, and 11 vertical rigs. Week on week, the U.S. offshore and inland water rig counts remained unchanged, and the country’s land rig count increased by two, Baker Hughes highlighted. The U.S. oil rig count remained unchanged, its gas rig count increased by three, and its miscellaneous rig count dropped by one, week on week, the count showed. The U.S. horizontal and vertical rig counts remained unchanged week on week, while the country’s directional rig count increased by two during the period, the count revealed. A major state variances subcategory included in the rig count showed that, week on week, Louisiana added two rigs, Alaska and California each added one rig, and Texas and Wyoming each dropped one rig. A major state variances subcategory included in the rig count showed that, week on week, the Haynesville basin added one rig and the Cana Woodford, Eagle Ford, and Granite Wash basins each dropped one rig week on week. Canada’s total rig count of 191

Read More »

Oil Rises on Shutdown Hopes

Oil rose as a push to end the US government shutdown buoyed wider markets, with crude traders also looking toward a data-heavy week that will yield insights into whether a long-awaited global surplus is forming. West Texas Intermediate rose around 0.6% to settle above $60 a barrel after two weekly declines, while Brent closed around $64. In the US, the White House expressed support for a bipartisan deal to reopen the US government after its longest-ever shutdown. Markets took the progress as a breakthrough, with tech shares driving the equities rally. Crude has dropped in five of the past six weeks as jitters over surplus supply gained greater traction. The Organization of the Petroleum Exporting Countries and its allies have been loosening output curbs in an apparent effort to gain market share, while drillers from outside the alliance, including the US, have also been adding barrels. OPEC is due to release its monthly analysis on Wednesday, with the International Energy Agency issuing an annual energy outlook the same day, followed by its regular monthly snapshot on Thursday. US sanctions also remain in focus after the Trump administration last month targeted Russia’s Rosneft PJSC and Lukoil PJSC in a bid to raise pressure on the Kremlin to end its war in Ukraine. Governments across Europe and the Middle East are rushing to ensure Lukoil’s sprawling oil operations can keep running after the US sanctions and a quashed bid by energy merchant Gunvor Group for its assets last week. Iraq is said to have transferred operations at Lukoil’s West Qurna 2 field to two state firms in an effort to ensure production continues. Earlier in the day Lukoil declared force majeure, allowing it to exercise the right to skip contractual obligations on the field, according to a person familiar with the matter.

Read More »

Buyer’s guide to AI networking technology

Extreme Networks: AI management over AI hardware Extreme deliberately prioritizes AI-powered network management over building specialized hyperscale AI infrastructure, a pragmatic positioning for a vendor targeting enterprise and mid-market.Named a Leader in IDC MarketScape: Worldwide Enterprise Wireless LAN 2025 (October 2025) for AI-powered automation, flexible deployment options and expertise in high-density environments. The company specializes in challenging wireless environments including stadiums, airports and historic venues (Fenway Park, Lambeau Field, Dubai World Trade Center, Liverpool FC’s Anfield Stadium). Key AI networking hardware 8730 Switch: 32×400GbE QSFP-DD fixed configuration delivering 12.8 Tbps throughput in 2RU for IP fabric spine/leaf designs. Designed for AI and HPC workloads with low latency, robust traffic management and power efficiency. Runs Extreme ONE OS (microservices architecture). Supports integrated application hosting with dedicated CPU for VM-based apps. Available Q3 2025. 7830 Switch: High-density 100G/400G fixed-modular core switch delivering 32×100Gb QSFP28 + 8×400Gb QSFP-DD ports with two VIM expansion slots. VIM modules enable up to 64×100Gb or 24×400Gb total capacity with 12.8 Tbps throughput in 2RU. Powered by Fabric Engine OS. Announced May 2025, available Q3 2025. Wi-Fi 7 access points: AP4020 (indoor) and AP4060 (outdoor with external antenna support, GA September 2025) completing premium Wi-Fi 7 portfolio. Extreme Platform ONE:Generally available Q3 2025 with 265+ customers. Integrates conversational, multimodal and agentic AI with three agents (AI Expert, AI Canvas, Service AI Agent) cutting resolution times 98%. Includes embedded Universal ZTNA and two-tier simplified licensing. ExtremeCloud IQ: Cloud-based network management integrating wireless, wired and SD-WAN with AI/ML capabilities and digital twin support for testing configurations before deployment. 
Extreme Fabric: Native SPB-based Layer 2 fabric with sub-second convergence, automated macro and micro-segmentation and free licensing (no controllers required). Multi-area fabric architecture solves traditional SPB scaling limitations. Analyst Rankings: Market leadership in AI networking Foundry Each of the vendors has its

Read More »

Microsoft’s In-Chip Microfluidics Technology Resets the Limits of AI Cooling

Raising the Thermal Ceiling for AI Hardware As Microsoft positions it, the significance of in-chip microfluidics goes well beyond a novel way to cool silicon. By removing heat at its point of generation, the technology raises the thermal ceiling that constrains today’s most power-dense compute devices. That shift could redefine how next-generation accelerators are designed, packaged, and deployed across hyperscale environments. Impact of this cooling change: Higher-TDP accelerators and tighter packing. Where thermal density has been the limiting factor, in-chip microfluidics could enable denser server sleds—such as NVL- or NVL-like trays—or allow higher per-GPU power budgets without throttling. 3D-stacked and HBM-heavy silicon. Microsoft’s documentation explicitly ties microfluidic cooling to future 3D-stacked and high-bandwidth-memory (HBM) architectures, which would otherwise be heat-limited. By extracting heat inside the package, the approach could unlock new levels of performance and packaging density for advanced AI accelerators. Implications for the AI Data Center If microfluidics can be scaled from prototype to production, its influence will ripple through every layer of the data center, from the silicon package to the white space and plant. The technology touches not only chip design but also rack architecture, thermal planning, and long-term cost models for AI infrastructure. Rack densities, white space topology, and facility thermals Raising thermal efficiency at the chip level has a cascading effect on system design: GPU TDP trajectory. Press materials and analysis around Microsoft’s collaboration with Corintis suggest the feasibility of far higher thermal design power (TDP) envelopes than today’s roughly 1–2 kW per device. Corintis executives have publicly referenced dissipation targets in the 4 kW to 10 kW range, highlighting how in-chip cooling could sustain next-generation GPU power levels without throttling. Rack, ring, and row design. 
By removing much of the heat directly within the package, microfluidics could reduce secondary heat spread into boards and

Read More »

Designing the AI Century: 7×24 Exchange Fall ’25 Charts the New Data Center Industrial Stack

SMRs and the AI Power Gap: Steve Fairfax Separates Promise from Physics If NVIDIA’s Sean Young made the case for AI factories, Steve Fairfax offered a sobering counterweight: even the smartest factories can’t run without power—and not just any power, but constant, high-availability, clean generation at a scale utilities are increasingly struggling to deliver. In his keynote “Small Modular Reactors for Data Centers,” Fairfax, president of Oresme and one of the data center industry’s most seasoned voices on reliability, walked through the long arc from nuclear fusion research to today’s resurgent interest in fission at modular scale. His presentation blended nuclear engineering history with pragmatic counsel for AI-era infrastructure leaders: SMRs are promising, but their road to reality is paved with physics, fuel, and policy—not PowerPoint. From Fusion Research to Data Center Reliability Fairfax began with his own story—a career that bridges nuclear reliability and data center engineering. As a young physicist and electrical engineer at MIT, he helped build the Alcator C-MOD fusion reactor, a 400-megawatt research facility that heated plasma to 100 million degrees with 3 million amps of current. The magnet system alone drew 265,000 amps at 1,400 volts, producing forces measured in millions of pounds. It was an extreme experiment in controlled power, and one that shaped his later philosophy: design for failure, test for truth, and assume nothing lasts forever. When the U.S. cooled on fusion power in the 1990s, Fairfax applied nuclear reliability methods to data center systems—quantifying uptime and redundancy with the same math used for reactor safety. By 1994, he was consulting for hyperscale pioneers still calling 10 MW “monstrous.” Today’s 400 MW campuses, he noted, are beginning to look a lot more like reactors in their energy intensity—and increasingly, in their regulatory scrutiny. Defining the Small Modular Reactor Fairfax defined SMRs

Read More »

Top network and data center events 2025 & 2026

Denise Dubie is a senior editor at Network World with nearly 30 years of experience writing about the tech industry. Her coverage areas include AIOps, cybersecurity, networking careers, network management, observability, SASE, SD-WAN, and how AI transforms enterprise IT. A seasoned journalist and content creator, Denise writes breaking news and in-depth features, and she delivers practical advice for IT professionals while making complex technology accessible to all. Before returning to journalism, she held senior content marketing roles at CA Technologies, Berkshire Grey, and Cisco. Denise is a trusted voice in the world of enterprise IT and networking.

Read More »

Google’s cheaper, faster TPUs are here, while users of other AI processors face a supply crunch

Opportunities for the AI industry LLM vendors such as OpenAI and Anthropic, which still have relatively young code bases and are continuously evolving them, also have much to gain from the arrival of Ironwood for training their models, said Forrester vice president and principal analyst Charlie Dai. In fact, Anthropic has already agreed to procure 1 million TPUs for training and its models and using them for inferencing. Other, smaller vendors using Google’s TPUs for training models include Lightricks and Essential AI. Google has seen a steady increase in demand for its TPUs (which it also uses to run interna services), and is expected to buy $9.8 billion worth of TPUs from Broadcom this year, compared to $6.2 billion and $2.04 billion in 2024 and 2023 respectively, according to Harrowell. “This makes them the second-biggest AI chip program for cloud and enterprise data centers, just tailing Nvidia, with approximately 5% of the market. Nvidia owns about 78% of the market,” Harrowell said. The legacy problem While some analysts were optimistic about the prospects for TPUs in the enterprise, IDC research director Brandon Hoff said enterprises will most likely to stay away from Ironwood or TPUs in general because of their existing code base written for other platforms. “For enterprise customers who are writing their own inferencing, they will be tied into Nvidia’s software platform,” Hoff said, referring to CUDA, the software platform that runs on Nvidia GPUs. CUDA was released to the public in 2007, while the first version of TensorFlow has only been around since 2015.

Read More »

Cisco launches AI infrastructure, AI practitioner certifications

“This new certification focuses on artificial intelligence and machine learning workloads, helping technical professionals become AI-ready and successfully embed AI into their workflows,” said Pat Merat, vice president at Learn with Cisco, in a blog detailing the new AI Infrastructure Specialist certification. “The certification validates a candidate’s comprehensive knowledge in designing, implementing, operating, and troubleshooting AI solutions across Cisco infrastructure.” Separately, the AITECH certification is part of the Cisco AI Infrastructure track, which complements its existing networking, data center, and security certifications. Cisco says the AITECH cert training is intended for network engineers, system administrators, solution architects, and other IT professionals who want to learn how AI impacts enterprise infrastructure. The training curriculum covers topics such as: Utilizing AI for code generation, refactoring, and using modern AI-assisted coding workflows. Using generative AI for exploratory data analysis, data cleaning, transformation, and generating actionable insights. Designing and implementing multi-step AI-assisted workflows and understanding complex agentic systems for automation. Learning AI-powered requirements, evaluating customization approaches, considering deployment strategies, and designing robust AI workflows. Evaluating, fine-tuning, and deploying pre-trained AI models, and implementing Retrieval Augmented Generation (RAG) systems. Monitoring, maintaining, and optimizing AI-powered workflows, ensuring data integrity and security. AITECH certification candidates will learn how to use AI to enhance productivity, automate routine tasks, and support the development of new applications. The training program includes hands-on labs and simulations to demonstrate practical use cases for AI within Cisco and multi-vendor environments.

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs).  In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple would between them devote $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skilled labor shortage

Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it has become a regular at the big tech trade show in Las Vegas as a non-tech company showing off technology, and it is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, and the agricultural workforce continues to shrink. (This is my hint to the anti-immigration crowd.) John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences its own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year. 1. Agents: the next generation of automation AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

Read More »

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks. Going all-in on red teaming pays practical, competitive dividends It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find.
What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Read More »