Stay Ahead, Stay ONMINE

Multilayer Perceptron Explained with a Real-Life Example and Python Code: Sentiment Analysis

This is the first article in a series dedicated to Deep Learning, a group of Machine Learning methods that has its roots dating back to the 1940’s. Deep Learning gained attention in the last decades for its groundbreaking application in areas like image classification, speech recognition, and machine translation. Stay tuned if you’d like to see different Deep Learning algorithms explained with real-life examples and some Python code. This series of articles focuses on Deep Learning algorithms, which have been getting a lot of attention in the last few years, as many of its applications take center stage in our day-to-day life. From self-driving cars to voice assistants, face recognition or the ability to transcribe speech into text. These applications are just the tip of the iceberg. A long path of research and incremental applications has been paved since the early 1940’s. The improvements and widespread applications we’re seeing today are the culmination of the hardware and data availability catching up with computational demands of these complex methods. In traditional Machine Learning anyone who is building a model either has to be an expert in the problem area they are working on, or team up with one. Without this expert knowledge, designing and engineering features becomes an increasingly difficult challenge[1]. The quality of a Machine Learning model depends on the quality of the dataset, but also on how well features encode the patterns in the data. Deep Learning algorithms use Artificial Neural Networks as their main structure. What sets them apart from other algorithms is that they don’t require expert input during the feature design and engineering phase. Neural Networks can learn the characteristics of the data. Deep Learning algorithms take in the dataset and learn its patterns, they learn how to represent the data with features they extract on their own. Then they combine different representations of the dataset, each one identifying a specific pattern or characteristic, into a more abstract, high-level representation of the dataset[1]. This hands-off approach, without much human intervention in feature design and extraction, allows algorithms to adapt much faster to the data at hand[2]. Neural Networks are inspired by, but not necessarily an exact model of, the structure of the brain. There’s a lot we still don’t know about the brain and how it works, but it has been serving as inspiration in many scientific areas due to its ability to develop intelligence. And although there are neural networks that were created with the sole purpose of understanding how brains work, Deep Learning as we know it today is not intended to replicate how the brain works. Instead, Deep Learning focuses on enabling systems that learn multiple levels of pattern composition[1]. And, as with any scientific progress, Deep Learning didn’t start off with the complex structures and widespread applications you see in recent literature. It all started with a basic structure, one that resembles brain’s neuron. In the early 1940’s Warren McCulloch, a neurophysiologist, teamed up with logician Walter Pitts to create a model of how brains work. It was a simple linear model that produced a positive or negative output, given a set of inputs and weights. McCulloch and Pitts neuron model. (Image by author) This model of computation was intentionally called neuron, because it tried to mimic how the core building block of the brain worked. Just like brain neurons receive electrical signals, McCulloch and Pitts’ neuron received inputs and, if these signals were strong enough, passed them on to other neurons. Neuron and it’s different components. (Image Credits) The first application of the neuron replicated a logic gate, where you have one or two binary inputs, and a boolean function that only gets activated given the right inputs and weights. However, this model had a problem. It couldn’t learn like the brain. The only way to get the desired output was if the weights, working as catalyst in the model, were set beforehand. The nervous system is a net of neurons, each having a soma and an axon […] At any instant a neuron has some threshold, which excitation must exceed to initiate an impulse[3]. It was only a decade later that Frank Rosenblatt extended this model, and created an algorithm that could learn the weights in order to generate an output. Building onto McCulloch and Pitt’s neuron, Rosenblatt developed the Perceptron. Although today the Perceptron is widely recognized as an algorithm, it was initially intended as an image recognition machine. It gets its name from performing the human-like function of perception, seeing and recognizing images. In particular, interest has been centered on the idea of a machine which would be capable of conceptualizing inputs impinging directly from the physical environment of light, sound, temperature, etc. — the “phenomenal world” with which we are all familiar — rather than requiring the intervention of a human agent to digest and code the necessary information.[4] Rosenblatt’s perceptron machine relied on a basic unit of computation, the neuron. Just like in previous models, each neuron has a cell that receives a series of pairs of inputs and weights. The major difference in Rosenblatt’s model is that inputs are combined in a weighted sum and, if the weighted sum exceeds a predefined threshold, the neuron fires and produces an output. Perceptrons neuron model (left) and threshold logic (right). (Image by author) Threshold T represents the activation function. If the weighted sum of the inputs is greater than zero the neuron outputs the value 1, otherwise the output value is zero. With this discrete output, controlled by the activation function, the perceptron can be used as a binary classification model, defining a linear decision boundary. It finds the separating hyperplane that minimizes the distance between misclassified points and the decision boundary[6]. Perceptron’s loss function. (Image by author) To minimize this distance, Perceptron uses Stochastic Gradient Descent as the optimization function. If the data is linearly separable, it is guaranteed that Stochastic Gradient Descent will converge in a finite number of steps. The last piece that Perceptron needs is the activation function, the function that determines if the neuron will fire or not. Initial Perceptron models used sigmoid function, and just by looking at its shape, it makes a lot of sense! The sigmoid function maps any real input to a value that is either 0 or 1, and encodes a non-linear function. The neuron can receive negative numbers as input, and it will still be able to produce an output that is either 0 or 1. Sigmoid function (Image by author). But, if you look at Deep Learning papers and algorithms from the last decade, you’ll see the most of them use the Rectified Linear Unit (ReLU) as the neuron’s activation function. ReLU function. (Image by author) The reason why ReLU became more adopted is that it allows better optimization using Stochastic Gradient Descent, more efficient computation and is scale-invariant, meaning, its characteristics are not affected by the scale of the input. Putting it all together The neuron receives inputs and picks an initial set of weights a random. These are combined in weighted sum and then ReLU, the activation function, determines the value of the output. Perceptrons neuron model (left) and activation function (right). (Image by Author) But you might be wondering, Doesn’t Perceptron actually learn the weights? It does! Perceptron uses Stochastic Gradient Descent to find, or you might say learn, the set of weight that minimizes the distance between the misclassified points and the decision boundary. Once Stochastic Gradient Descent converges, the dataset is separated into two regions by a linear hyperplane. Although it was said the Perceptron could represent any circuit and logic, the biggest criticism was that it couldn’t represent the XOR gate, exclusive OR, where the gate only returns 1 if the inputs are different. This was proved almost a decade later by Minsky and Papert, in 1969[5] and highlights the fact that Perceptron, with only one neuron, can’t be applied to non-linear data. The Multilayer Perceptron was developed to tackle this limitation. It is a neural network where the mapping between inputs and output is non-linear. A Multilayer Perceptron has input and output layers, and one or more hidden layers with many neurons stacked together. And while in the Perceptron the neuron must have an activation function that imposes a threshold, like ReLU or sigmoid, neurons in a Multilayer Perceptron can use any arbitrary activation function. Multilayer Perceptron. (Image by author) Multilayer Perceptron falls under the category of feedforward algorithms, because inputs are combined with the initial weights in a weighted sum and subjected to the activation function, just like in the Perceptron. But the difference is that each linear combination is propagated to the next layer. Each layer is feeding the next one with the result of their computation, their internal representation of the data. This goes all the way through the hidden layers to the output layer. But it has more to it. If the algorithm only computed the weighted sums in each neuron, propagated results to the output layer, and stopped there, it wouldn’t be able to learn the weights that minimize the cost function. If the algorithm only computed one iteration, there would be no actual learning. This is where Backpropagation[7] comes into play. Backpropagation is the learning mechanism that allows the Multilayer Perceptron to iteratively adjust the weights in the network, with the goal of minimizing the cost function. There is one hard requirement for backpropagation to work properly. The function that combines inputs and weights in a neuron, for instance the weighted sum, and the threshold function, for instance ReLU, must be differentiable. These functions must have a bounded derivative, because Gradient Descent is typically the optimization function used in MultiLayer Perceptron. In each iteration, after the weighted sums are forwarded through all layers, the gradient of the Mean Squared Error is computed across all input and output pairs. Then, to propagate it back, the weights of the first hidden layer are updated with the value of the gradient. That’s how the weights are propagated back to the starting point of the neural network! One iteration of Gradient Descent. (Image by author) This process keeps going until gradient for each input-output pair has converged, meaning the newly computed gradient hasn’t changed more than a specified convergence threshold, compared to the previous iteration. Let’s see this with a real-world example. Your parents have a cozy bed and breakfast in the countryside with the traditional guestbook in the lobby. Every guest is welcome to write a note before they leave and, so far, very few leave without writing a short note or inspirational quote. Some even leave drawings of Molly, the family dog. Summer season is getting to a close, which means cleaning time, before work starts picking up again for the holidays. In the old storage room, you’ve stumbled upon a box full of guestbooks your parents kept over the years. Your first instinct? Let’s read everything! After reading a few pages, you just had a much better idea. Why not try to understand if guests left a positive or negative message? You’re a Data Scientist, so this is the perfect task for a binary classifier. So you picked a handful of guestbooks at random, to use as training set, transcribed all the messages, gave it a classification of positive or negative sentiment, and then asked your cousins to classify them as well. In Natural Language Processing tasks, some of the text can be ambiguous, so usually you have a corpus of text where the labels were agreed upon by 3 experts, to avoid ties. Sample of guest messages. (Image by author) With the final labels assigned to the entire corpus, you decided to fit the data to a Perceptron, the simplest neural network of all. But before building the model itself, you needed to turn that free text into a format the Machine Learning model could work with. In this case, you represented the text from the guestbooks as a vector using the Term Frequency — Inverse Document Frequency (TF-IDF). This method encodes any kind of text as a statistic of how frequent each word, or term, is in each sentence and the entire document. In Python you used TfidfVectorizer method from ScikitLearn, removing English stop-words and even applying L1 normalization. TfidfVectorizer(stop_words=’english’, lowercase=True, norm=’l1′) On to binary classification with Perceptron! To accomplish this, you used Perceptron completely out-of-the-box, with all the default parameters. Python source code to run Perceptron on a corpus. (Image by author) After vectorizing the corpus and fitting the model and testing on sentences the model has never seen before, you realize the Mean Accuracy of this model is 67%. Mean accuracy of the Perceptron model. (Image by author) That’s not bad for a simple neural network like Perceptron! On average, Perceptron will misclassify roughly 1 in every 3 messages your parents’ guests wrote. Which makes you wonder if perhaps this data is not linearly separable and that you could also achieve a better result with a slightly more complex neural network. Using SckitLearn’s MultiLayer Perceptron, you decided to keep it simple and tweak just a few parameters: Activation function: ReLU, specified with the parameter activation=’relu’ Optimization function: Stochastic Gradient Descent, specified with the parameter solver=’sgd’ Learning rate: Inverse Scaling, specified with the parameter learning_rate=’invscaling’ Number of iterations: 20, specified with the parameter max_iter=20 Python source code to run MultiLayer Perceptron on a corpus. (Image by author) By default, Multilayer Perceptron has three hidden layers, but you want to see how the number of neurons in each layer impacts performance, so you start off with 2 neurons per hidden layer, setting the parameter num_neurons=2. Finally, to see the value of the loss function at each iteration, you also added the parameter verbose=True. Mean accuracy of the Multilayer Perceptron model with 3 hidden layers, each with 2 nodes. (Image by author) In this case, the Multilayer Perceptron has 3 hidden layers with 2 nodes each, performs much worse than a simple Perceptron. It converges relatively fast, in 24 iterations, but the mean accuracy is not good. While the Perceptron misclassified on average 1 in every 3 sentences, this Multilayer Perceptron is kind of the opposite, on average predicts the correct label 1 in every 3 sentences. What about if you added more capacity to the neural network? What happens when each hidden layer has more neurons to learn the patterns of the dataset? Using the same method, you can simply change the num_neurons parameter an set it, for instance, to 5. buildMLPerceptron(train_features, test_features, train_targets, test_targets, num_neurons=5) Adding more neurons to the hidden layers definitely improved Model accuracy! Mean accuracy of the Multilayer Perceptron model with 3 hidden layers, each with 5 nodes. (Image by author) You kept the same neural network structure, 3 hidden layers, but with the increased computational power of the 5 neurons, the model got better at understanding the patterns in the data. It converged much faster and mean accuracy doubled! In the end, for this specific case and dataset, the Multilayer Perceptron performs as well as a simple Perceptron. But it was definitely a great exercise to see how changing the number of neurons in each hidden-layer impacts model performance. It’s not a perfect model, there’s possibly some room for improvement, but the next time a guest leaves a message that your parents are not sure if it’s positive or negative, you can use Perceptron to get a second opinion. The first Deep Learning algorithm was very simple, compared to the current state-of-the-art. Perceptron is a neural network with only one neuron, and can only understand linear relationships between the input and output data provided. However, with Multilayer Perceptron, horizons are expanded and now this neural network can have many layers of neurons, and ready to learn more complex patterns. Hope you’ve enjoyed learning about algorithms! Stay tuned for the next articles in this series, where we continue to explore Deep Learning algorithms. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015) Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. The MIT Press. McCulloch, W.S., Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5, 115–133 (1943) Frank Rosenblatt. The Perceptron, a Perceiving and Recognizing Automaton Project Para. Cornell Aeronautical Laboratory 85, 460–461 (1957) Minsky M. L. and Papert S. A. 1969. Perceptrons. Cambridge, MA: MIT Press. Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani. (2013). An introduction to statistical learning : with applications in R. New York :Springer D. Rumelhart, G. Hinton, and R. Williams. Learning Representations by Back-propagating Errors. Nature 323 (6088): 533–536 (1986).

This is the first article in a series dedicated to Deep Learning, a group of Machine Learning methods that has its roots dating back to the 1940’s. Deep Learning gained attention in the last decades for its groundbreaking application in areas like image classification, speech recognition, and machine translation.

Stay tuned if you’d like to see different Deep Learning algorithms explained with real-life examples and some Python code.


This series of articles focuses on Deep Learning algorithms, which have been getting a lot of attention in the last few years, as many of its applications take center stage in our day-to-day life. From self-driving cars to voice assistants, face recognition or the ability to transcribe speech into text.

These applications are just the tip of the iceberg. A long path of research and incremental applications has been paved since the early 1940’s. The improvements and widespread applications we’re seeing today are the culmination of the hardware and data availability catching up with computational demands of these complex methods.

In traditional Machine Learning anyone who is building a model either has to be an expert in the problem area they are working on, or team up with one. Without this expert knowledge, designing and engineering features becomes an increasingly difficult challenge[1]. The quality of a Machine Learning model depends on the quality of the dataset, but also on how well features encode the patterns in the data.

Deep Learning algorithms use Artificial Neural Networks as their main structure. What sets them apart from other algorithms is that they don’t require expert input during the feature design and engineering phase. Neural Networks can learn the characteristics of the data.

Deep Learning algorithms take in the dataset and learn its patterns, they learn how to represent the data with features they extract on their own. Then they combine different representations of the dataset, each one identifying a specific pattern or characteristic, into a more abstract, high-level representation of the dataset[1]. This hands-off approach, without much human intervention in feature design and extraction, allows algorithms to adapt much faster to the data at hand[2].

Neural Networks are inspired by, but not necessarily an exact model of, the structure of the brain. There’s a lot we still don’t know about the brain and how it works, but it has been serving as inspiration in many scientific areas due to its ability to develop intelligence. And although there are neural networks that were created with the sole purpose of understanding how brains work, Deep Learning as we know it today is not intended to replicate how the brain works. Instead, Deep Learning focuses on enabling systems that learn multiple levels of pattern composition[1].

And, as with any scientific progress, Deep Learning didn’t start off with the complex structures and widespread applications you see in recent literature.

It all started with a basic structure, one that resembles brain’s neuron.

In the early 1940’s Warren McCulloch, a neurophysiologist, teamed up with logician Walter Pitts to create a model of how brains work. It was a simple linear model that produced a positive or negative output, given a set of inputs and weights.

McCulloch and Pitts neuron model. (Image by author)

This model of computation was intentionally called neuron, because it tried to mimic how the core building block of the brain worked. Just like brain neurons receive electrical signals, McCulloch and Pitts’ neuron received inputs and, if these signals were strong enough, passed them on to other neurons.

Neuron and it’s different components. (Image Credits)

The first application of the neuron replicated a logic gate, where you have one or two binary inputs, and a boolean function that only gets activated given the right inputs and weights.

However, this model had a problem. It couldn’t learn like the brain. The only way to get the desired output was if the weights, working as catalyst in the model, were set beforehand.

The nervous system is a net of neurons, each having a soma and an axon […] At any instant a neuron has some threshold, which excitation must exceed to initiate an impulse[3].

It was only a decade later that Frank Rosenblatt extended this model, and created an algorithm that could learn the weights in order to generate an output.

Building onto McCulloch and Pitt’s neuron, Rosenblatt developed the Perceptron.

Although today the Perceptron is widely recognized as an algorithm, it was initially intended as an image recognition machine. It gets its name from performing the human-like function of perception, seeing and recognizing images.

In particular, interest has been centered on the idea of a machine which would be capable of conceptualizing inputs impinging directly from the physical environment of light, sound, temperature, etc. — the “phenomenal world” with which we are all familiar — rather than requiring the intervention of a human agent to digest and code the necessary information.[4]

Rosenblatt’s perceptron machine relied on a basic unit of computation, the neuron. Just like in previous models, each neuron has a cell that receives a series of pairs of inputs and weights.

The major difference in Rosenblatt’s model is that inputs are combined in a weighted sum and, if the weighted sum exceeds a predefined threshold, the neuron fires and produces an output.

Perceptrons neuron model (left) and threshold logic (right). (Image by author)

Threshold represents the activation function. If the weighted sum of the inputs is greater than zero the neuron outputs the value 1, otherwise the output value is zero.

With this discrete output, controlled by the activation function, the perceptron can be used as a binary classification model, defining a linear decision boundary. It finds the separating hyperplane that minimizes the distance between misclassified points and the decision boundary[6].

Perceptron’s loss function. (Image by author)

To minimize this distance, Perceptron uses Stochastic Gradient Descent as the optimization function.

If the data is linearly separable, it is guaranteed that Stochastic Gradient Descent will converge in a finite number of steps.

The last piece that Perceptron needs is the activation function, the function that determines if the neuron will fire or not.

Initial Perceptron models used sigmoid function, and just by looking at its shape, it makes a lot of sense!

The sigmoid function maps any real input to a value that is either 0 or 1, and encodes a non-linear function.

The neuron can receive negative numbers as input, and it will still be able to produce an output that is either 0 or 1.

Sigmoid function (Image by author).

But, if you look at Deep Learning papers and algorithms from the last decade, you’ll see the most of them use the Rectified Linear Unit (ReLU) as the neuron’s activation function.

ReLU function. (Image by author)

The reason why ReLU became more adopted is that it allows better optimization using Stochastic Gradient Descent, more efficient computation and is scale-invariant, meaning, its characteristics are not affected by the scale of the input.

Putting it all together

The neuron receives inputs and picks an initial set of weights a random. These are combined in weighted sum and then ReLU, the activation function, determines the value of the output.

Perceptrons neuron model (left) and activation function (right). (Image by Author)

But you might be wondering, Doesn’t Perceptron actually learn the weights?

It does! Perceptron uses Stochastic Gradient Descent to find, or you might say learn, the set of weight that minimizes the distance between the misclassified points and the decision boundary. Once Stochastic Gradient Descent converges, the dataset is separated into two regions by a linear hyperplane.

Although it was said the Perceptron could represent any circuit and logic, the biggest criticism was that it couldn’t represent the XOR gateexclusive OR, where the gate only returns 1 if the inputs are different.

This was proved almost a decade later by Minsky and Papert, in 1969[5] and highlights the fact that Perceptron, with only one neuron, can’t be applied to non-linear data.

The Multilayer Perceptron was developed to tackle this limitation. It is a neural network where the mapping between inputs and output is non-linear.

A Multilayer Perceptron has input and output layers, and one or more hidden layers with many neurons stacked together. And while in the Perceptron the neuron must have an activation function that imposes a threshold, like ReLU or sigmoid, neurons in a Multilayer Perceptron can use any arbitrary activation function.

Multilayer Perceptron. (Image by author)

Multilayer Perceptron falls under the category of feedforward algorithms, because inputs are combined with the initial weights in a weighted sum and subjected to the activation function, just like in the Perceptron. But the difference is that each linear combination is propagated to the next layer.

Each layer is feeding the next one with the result of their computation, their internal representation of the data. This goes all the way through the hidden layers to the output layer.

But it has more to it.

If the algorithm only computed the weighted sums in each neuron, propagated results to the output layer, and stopped there, it wouldn’t be able to learn the weights that minimize the cost function. If the algorithm only computed one iteration, there would be no actual learning.

This is where Backpropagation[7] comes into play.

Backpropagation is the learning mechanism that allows the Multilayer Perceptron to iteratively adjust the weights in the network, with the goal of minimizing the cost function.

There is one hard requirement for backpropagation to work properly. The function that combines inputs and weights in a neuron, for instance the weighted sum, and the threshold function, for instance ReLU, must be differentiable. These functions must have a bounded derivative, because Gradient Descent is typically the optimization function used in MultiLayer Perceptron.

In each iteration, after the weighted sums are forwarded through all layers, the gradient of the Mean Squared Error is computed across all input and output pairs. Then, to propagate it back, the weights of the first hidden layer are updated with the value of the gradient. That’s how the weights are propagated back to the starting point of the neural network!

One iteration of Gradient Descent. (Image by author)

This process keeps going until gradient for each input-output pair has converged, meaning the newly computed gradient hasn’t changed more than a specified convergence threshold, compared to the previous iteration.

Let’s see this with a real-world example.

Your parents have a cozy bed and breakfast in the countryside with the traditional guestbook in the lobby. Every guest is welcome to write a note before they leave and, so far, very few leave without writing a short note or inspirational quote. Some even leave drawings of Molly, the family dog.

Summer season is getting to a close, which means cleaning time, before work starts picking up again for the holidays. In the old storage room, you’ve stumbled upon a box full of guestbooks your parents kept over the years. Your first instinct? Let’s read everything!

After reading a few pages, you just had a much better idea. Why not try to understand if guests left a positive or negative message?

You’re a Data Scientist, so this is the perfect task for a binary classifier.

So you picked a handful of guestbooks at random, to use as training set, transcribed all the messages, gave it a classification of positive or negative sentiment, and then asked your cousins to classify them as well.

In Natural Language Processing tasks, some of the text can be ambiguous, so usually you have a corpus of text where the labels were agreed upon by 3 experts, to avoid ties.

Sample of guest messages. (Image by author)

With the final labels assigned to the entire corpus, you decided to fit the data to a Perceptron, the simplest neural network of all.

But before building the model itself, you needed to turn that free text into a format the Machine Learning model could work with.

In this case, you represented the text from the guestbooks as a vector using the Term Frequency — Inverse Document Frequency (TF-IDF). This method encodes any kind of text as a statistic of how frequent each word, or term, is in each sentence and the entire document.

In Python you used TfidfVectorizer method from ScikitLearn, removing English stop-words and even applying L1 normalization.

TfidfVectorizer(stop_words='english', lowercase=True, norm='l1')

On to binary classification with Perceptron!

To accomplish this, you used Perceptron completely out-of-the-box, with all the default parameters.

Python source code to run Perceptron on a corpus. (Image by author)

After vectorizing the corpus and fitting the model and testing on sentences the model has never seen before, you realize the Mean Accuracy of this model is 67%.

Mean accuracy of the Perceptron model. (Image by author)

That’s not bad for a simple neural network like Perceptron!

On average, Perceptron will misclassify roughly 1 in every 3 messages your parents’ guests wrote. Which makes you wonder if perhaps this data is not linearly separable and that you could also achieve a better result with a slightly more complex neural network.

Using SckitLearn’s MultiLayer Perceptron, you decided to keep it simple and tweak just a few parameters:

  • Activation function: ReLU, specified with the parameter activation=’relu’
  • Optimization function: Stochastic Gradient Descent, specified with the parameter solver=’sgd’
  • Learning rate: Inverse Scaling, specified with the parameter learning_rate=’invscaling’
  • Number of iterations: 20, specified with the parameter max_iter=20
Python source code to run MultiLayer Perceptron on a corpus. (Image by author)

By default, Multilayer Perceptron has three hidden layers, but you want to see how the number of neurons in each layer impacts performance, so you start off with 2 neurons per hidden layer, setting the parameter num_neurons=2.

Finally, to see the value of the loss function at each iteration, you also added the parameter verbose=True.

Mean accuracy of the Multilayer Perceptron model with 3 hidden layers, each with 2 nodes. (Image by author)

In this case, the Multilayer Perceptron has 3 hidden layers with 2 nodes each, performs much worse than a simple Perceptron.

It converges relatively fast, in 24 iterations, but the mean accuracy is not good.

While the Perceptron misclassified on average 1 in every 3 sentences, this Multilayer Perceptron is kind of the opposite, on average predicts the correct label 1 in every 3 sentences.

What about if you added more capacity to the neural network? What happens when each hidden layer has more neurons to learn the patterns of the dataset?

Using the same method, you can simply change the num_neurons parameter an set it, for instance, to 5.

buildMLPerceptron(train_features, test_features, train_targets, test_targets, num_neurons=5)

Adding more neurons to the hidden layers definitely improved Model accuracy!

Mean accuracy of the Multilayer Perceptron model with 3 hidden layers, each with 5 nodes. (Image by author)

You kept the same neural network structure, 3 hidden layers, but with the increased computational power of the 5 neurons, the model got better at understanding the patterns in the data. It converged much faster and mean accuracy doubled!

In the end, for this specific case and dataset, the Multilayer Perceptron performs as well as a simple Perceptron. But it was definitely a great exercise to see how changing the number of neurons in each hidden-layer impacts model performance.

It’s not a perfect model, there’s possibly some room for improvement, but the next time a guest leaves a message that your parents are not sure if it’s positive or negative, you can use Perceptron to get a second opinion.

The first Deep Learning algorithm was very simple, compared to the current state-of-the-art. Perceptron is a neural network with only one neuron, and can only understand linear relationships between the input and output data provided.

However, with Multilayer Perceptron, horizons are expanded and now this neural network can have many layers of neurons, and ready to learn more complex patterns.

Hope you’ve enjoyed learning about algorithms!

Stay tuned for the next articles in this series, where we continue to explore Deep Learning algorithms.

  1. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015)
  2. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. The MIT Press.
  3. McCulloch, W.S., Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5, 115–133 (1943)
  4. Frank Rosenblatt. The Perceptron, a Perceiving and Recognizing Automaton Project Para. Cornell Aeronautical Laboratory 85, 460–461 (1957)
  5. Minsky M. L. and Papert S. A. 1969. Perceptrons. Cambridge, MA: MIT Press.
  6. Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani. (2013)An introduction to statistical learning : with applications in R. New York :Springer
  7. D. Rumelhart, G. Hinton, and R. Williams. Learning Representations by Back-propagating Errors. Nature 323 (6088): 533–536 (1986).
Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

Ubuntu namespace vulnerability should be addressed quickly: Expert

Thus, “there is little impact of not ‘patching’ the vulnerability,” he said. “Organizations using centralized configuration tools like Ansible may deploy these changes with regularly scheduled maintenance or reboot windows.”  Features supposed to improve security Ironically, last October Ubuntu introduced AppArmor-based features to improve security by reducing the attack surface

Read More »

Google Cloud partners with mLogica to offer mainframe modernization

Other than the partnership with mLogica, Google Cloud also offers a variety of other mainframe migration tools, including Radis and G4 that can be employed to modernize specific applications. Enterprises can also use a combination of migration tools to modernize their mainframe applications. Some of these tools include the Gemini-powered

Read More »

Rubio Warns Venezuela Against Attacking Guyana, Exxon

US Secretary of State Marco Rubio warned Venezuela that any attempt to invade Guyana or threaten Exxon Mobil Corp.’s operations in the country would be a “very bad move.”  Rubio spoke less than a month after a Venezuelan patrol ship entered Guyanese waters and positioned itself near a vessel contracted by Exxon, which is operating the world’s fastest-growing major oil field off the coast of the South American country.  “It would be a very bad day for the Venezuelan regime if they were to attack Guyana or attack Exxon Mobil,” Rubio said in the capital city of Georgetown on Thursday. “Suffice it to say that if that regime were to do something such as that, it would be a very bad move. It would be a big mistake. For them.” Venezuelan leader Nicolas Maduro reopened a border dispute more than a century after it was settled by international arbitration as he sought to galvanize supporters for last year’s presidential election. Maduro’s military and naval arsenal dwarfs Guyana’s, which was one of the continent’s poorest countries prior to Exxon’s 2015 discovery of oil.   Guyana’s President Irfaan Ali has been successful in rallying the international community behind the country’s dispute with Venezuela, with the UK, France and the US pledging support.  “We have a big Navy,” Rubio said. “It can get anywhere in the world.”  Rubio also said the US would bolster ties with Guyana, without getting into specifics. “We have commitments that exist today with Guyana,” he said. “We want to build on those, expand on those.”  Rubio also was scheduled to visit Suriname, which has sought to encourage oil exploration in offshore territory close to the Guyanese discoveries. WHAT DO YOU THINK? Generated by readers, the comments included herein do not reflect the views and opinions of Rigzone. All

Read More »

Oil Slips Despite Weekly Gain

Oil fell on concerns that the Trump administration’s tariff onslaught will reduce energy demand. West Texas Intermediate slid 0.8% to settle above $69 a barrel, retreating along with equity markets. Crude still notched its third straight weekly advance amid waning expectations of a near-term oversupply. The US is planning to impose tariffs on auto imports and so-called reciprocal levies next week, widening the global trade war. Oil traders face an uncertain outlook as they grapple with President Donald Trump’s policies and an OPEC+ plan to revive idled output. WTI futures have been rangebound for the past eight months, trading in a band of about $15 between the high $60s and low $80s. “US stocks are struggling, and longer-term demand fears are on the minds of most traders as tariffs begin to kick in on cars not manufactured in the US,” said Dennis Kissler, senior vice president for trading at BOK Financial Securities.   Earlier this week, Vitol’s chief executive officer said while there are some threats to supply, it’s generally adequate for the next couple of years. Meanwhile, Venezuela is boosting oil exports to China as the Trump administration deploys sanctions and secondary tariffs to squeeze the Latin American nation. Oil Prices: WTI for May delivery fell 0.8% to settle at $69.36 a barrel in New York. Futures gained 1.6% for the week. Brent for May settlement dipped 0.5% to settle at $73.63 a barrel. What do you think? We’d love to hear from you, join the conversation on the Rigzone Energy Network. The Rigzone Energy Network is a new social experience created for you and all energy professionals to Speak Up about our industry, share knowledge, connect with peers and industry insiders and engage in a professional community that will empower your career in energy. MORE FROM THIS AUTHOR

Read More »

Solong North Sea disaster ship pulls into Aberdeen

The Solong, a burned-out container ship badly damaged in a collision with a US oil tanker, has finally reached Aberdeen Friday morning. It arrived at South Harbour for “safe berthing” following days of intense salvage operations. The Portuguese-flagged vessel was towed to the Port of Aberdeen after crashing into the anchored Stena Immaculate off the East Yorkshire coast on March 10, triggering an explosion and fires. It has been the focus of ongoing salvage efforts after enduring extensive damage and a week-long fire. The Solong was accompanied by another vessel equipped with counter-pollution measures to prevent further environmental damage on its pas The Solong, a burned-out container ship badly damaged in a collision with a US oil tanker, has finally reached Aberdeen Friday morning. It arrived at South Harbour for “safe berthing” following days of intense salvage operations. The Portuguese-flagged vessel was towed to the Port of Aberdeen after crashing into the anchored Stena Immaculate off the East Yorkshire coast on March 10, triggering an explosion and fires. It has been the focus of ongoing salvage efforts after enduring extensive damage and a week-long fire. The Solong was accompanied by another vessel equipped with counter-pollution measures to prevent further environmental damage on its passage to Aberdeen. Solong sailor presumed dead The crash resulted in a tragic loss: one sailor from the Solong, 38-year-old Filipino national Mark Angelo Pernia, remains missing and is presumed dead. In total, rescuers saved 36 crew members from both ships. Meanwhile, the Solong’s captain, 59-year-old Vladimir Motin from St. Petersburg, Russia, has been arrested and charged with gross negligence manslaughter. © DC ThomsonCrew on board the burnt out Solong container ship being tugged into Aberdeen’s south harbour. Image: Kenny Elrick/DC Thomson © DC ThomsonImage: Kenny Elrick/DC Thomson © DC ThomsonImage: Kenny Elrick/DC Thomson. Drone / DJI

Read More »

USA Crude Oil Inventories Down 3.3MM Barrels WoW

U.S. commercial crude oil inventories, excluding those in the Strategic Petroleum Reserve (SPR), decreased by 3.3 million barrels from the week ending March 14 to the week ending March 21, the U.S. Energy Information Administration (EIA) highlighted in its latest weekly petroleum status report. This report was published on March 26 and included data for the week ending March 21. The EIA report showed that crude oil stocks, not including the SPR, stood at 433.6 million barrels on March 21, 437.0 million barrels on March 14, and 448.2 million barrels on March 22, 2024. Crude oil in the SPR stood at 396.1 million barrels on March 21, 395.9 million barrels on March 14, and 363.1 million barrels on March 22, 2024, the report outlined. The EIA report highlighted that data may not add up to totals due to independent rounding. Total petroleum stocks – including crude oil, total motor gasoline, fuel ethanol, kerosene type jet fuel, distillate fuel oil, residual fuel oil, propane/propylene, and other oils – stood at 1.600 billion barrels on March 21, the report showed. Total petroleum stocks were up 3.5 million barrels week on week and up 19.9 million barrels year on year, the report revealed. “At 433.6 million barrels, U.S. crude oil inventories are about five percent below the five year average for this time of year,” the EIA said in its latest weekly petroleum status report. “Total motor gasoline inventories decreased by 1.4 million barrels from last week and are two percent above the five year average for this time of year. Finished gasoline inventories increased and blending components inventories decreased last week,” it added. “Distillate fuel inventories decreased by 0.4 million barrels last week and are about seven percent below the five year average for this time of year. Propane/propylene inventories decreased by

Read More »

Schneider Electric to invest $700M in US manufacturing

Dive Brief: Automation manufacturer Schneider Electric plans to invest $700 million in its U.S. operations through 2027, the company announced Tuesday. The money will go toward facility upgrades, expansions and openings across eight sites in Texas, Tennessee, Ohio, North Carolina, Massachusetts and Missouri. Schneider expects to create more than 1,000 jobs.  The move marks Schneider’s largest-ever investment in the U.S., as the company aims to meet rising demand across its data center, utilities, manufacturing and energy infrastructure segments. Dive Insight: Schneider’s announcement is part of a larger $1 billion investment the company is making in the U.S. this decade.  Artificial intelligence-driven demand for data centers and electrical infrastructure is driving the need for heightened spending on electrical grid-related needs. Data center electricity demand could double by 2030 — consuming up to 9% of the country’s electricity generation, according to a May 2024 study by the Electric Power Research Institute. “We stand at an inflection point for the technology and industrial sectors in the U.S., driven by incredible AI growth and unprecedented energy demand,” Aamir Paul, president of North America Operations for Schneider Electric, said in a statement.  Schneider has been pushing a localization strategy in recent months, with a goal to locally source and produce roughly 90% of sales in each region. That push could help the company weather the Trump administration’s tariffs on Mexico, where Schneider has much of its North American production.  CFO Hilary Maxson said on a recent earnings call that the company is watching for any reciprocal tariffs that may impact their operations. If the United States-Mexico-Canada Agreement remains in place, Maxson said the impact to Schneider would likely be “immaterial.” If the trade deal and free trade zones are repealed, however, the CFO added that the hit to the company could be greater.  “We’re really

Read More »

Utilities should develop data center tariffs to protect consumers, decarbonize: SWEEP

With data center electricity demand on the rise across the U.S., utilities should develop specialized tariffs to protect consumers and keep their grids green when these large load customers interconnect, the Southwest Energy Efficiency Project said Thursday in a report. “While AI offers the potential for significant economic and social benefits, there are growing concerns with the rapid increases in electricity demand from data centers and how they will impact the power sector and state and utility climate goals,” SWEEP said. Data centers today account for about 4.5% of U.S. electricity consumption, according to the analysis. But in its most recent report to Congress, the Lawrence Berkeley National Lab projected data centers could account for up to 12% of U.S. electricity use by 2028, SWEEP said. EPRI recently surveyed 25 utilities nationally and found almost half have received requests for new data center facilities with loads larger than 1,000 MW. And “almost half of the utilities surveyed have received data center requests that exceed 50% of their current system peak demand,” SWEEP said. The potential load additions “pose two types of threats to state greenhouse [gas] emission reduction goals,” SWEEP said: Utilities could add or utilize more fossil-fuel based generation, and they could struggle to add sufficient renewables to meet demand from the electrification of vehicles, buildings and industry, the report warned. To address these risks, SWEEP recommends that utilities ensure new data center customers — and other new industrial or commercial customers with demands over 50 MW, or combined demands from several facilities of more than 100 MW — “commit to providing sufficient revenue, over a contract period such as 12 years, to cover the generation and transmission investments made on their behalf.” Utilities should also “propose and attempt to get approval for tariffs that require new large data center customers, and other new

Read More »

Airtel connects India with 100Tbps submarine cable

“Businesses are becoming increasingly global and digital-first, with industries such as financial services, data centers, and social media platforms relying heavily on real-time, uninterrupted data flow,” Sinha added. The 2Africa Pearls submarine cable system spans 45,000 kilometers, involving a consortium of global telecommunications leaders including Bayobab, China Mobile International, Meta, Orange, Telecom Egypt, Vodafone Group, and WIOCC. Alcatel Submarine Networks is responsible for the cable’s manufacturing and installation, the statement added. This cable system is part of a broader global effort to enhance international digital connectivity. Unlike traditional telecommunications infrastructure, the 2Africa Pearls project represents a collaborative approach to solving complex global communication challenges. “The 100 Tbps capacity of the 2Africa Pearls cable significantly surpasses most existing submarine cable systems, positioning India as a key hub for high-speed connectivity between Africa, Europe, and Asia,” said Prabhu Ram, VP for Industry Research Group at CyberMedia Research. According to Sinha, Airtel’s infrastructure now spans “over 400,000 route kilometers across 34+ cables, connecting 50 countries across five continents. This expansive infrastructure ensures businesses and individuals stay seamlessly connected, wherever they are.” Gogia further emphasizes the broader implications, noting, “What also stands out is the partnership behind this — Airtel working with Meta and center3 signals a broader shift. India is no longer just a consumer of global connectivity. We’re finally shaping the routes, not just using them.”

Read More »

Former Arista COO launches NextHop AI for customized networking infrastructure

Sadana argued that unlike traditional networking where an IT person can just plug a cable into a port and it works, AI networking requires intricate, custom solutions. The core challenge is creating highly optimized, efficient networking infrastructure that can support massive AI compute clusters with minimal inefficiencies. How NextHop is looking to change the game for hyperscale networking NextHop AI is working directly alongside its hyperscaler customers to develop and build customized networking solutions. “We are here to build the most efficient AI networking solutions that are out there,” Sadana said. More specifically, Sadana said that NextHop is looking to help hyperscalers in several ways including: Compressing product development cycles: “Companies that are doing things on their own can compress their product development cycle by six to 12 months when they partner with us,” he said. Exploring multiple technological alternatives: Sadana noted that hyperscalers might try and build on their own and will often only be able to explore one or two alternative approaches. With NextHop, Sadana said his company will enable them to explore four to six different alternatives. Achieving incremental efficiency gains: At the massive cloud scale that hyperscalers operate, even an incremental one percent improvement can have an oversized outcome. “You have to make AI clusters as efficient as possible for the world to use all the AI applications at the right cost structure, at the right economics, for this to be successful,” Sadana said. “So we are participating by making that infrastructure layer a lot more efficient for cloud customers, or the hyperscalers, which, in turn, of course, gives the benefits to all of these software companies trying to run AI applications in these cloud companies.” Technical innovations: Beyond traditional networking In terms of what the company is actually building now, NextHop is developing specialized network switches

Read More »

Microsoft abandons data center projects as OpenAI considers its own, hinting at a market shift

A potential ‘oversupply position’ In a new research note, TD Cowan analysts reportedly said that Microsoft has walked away from new data center projects in the US and Europe, purportedly due to an oversupply of compute clusters that power AI. This follows reports from TD Cowen in February that Microsoft had “cancelled leases in the US totaling a couple of hundred megawatts” of data center capacity. The researchers noted that the company’s pullback was a sign of it “potentially being in an oversupply position,” with demand forecasts lowered. OpenAI, for its part, has reportedly discussed purchasing billions of dollars’ worth of data storage hardware and software to increase its computing power and decrease its reliance on hyperscalers. This fits with its planned Stargate Project, a $500 billion, US President Donald Trump-endorsed initiative to build out its AI infrastructure in the US over the next four years. Based on the easing of exclusivity between the two companies, analysts say these moves aren’t surprising. “When looking at storage in the cloud — especially as it relates to use in AI — it is incredibly expensive,” said Matt Kimball, VP and principal analyst for data center compute and storage at Moor Insights & Strategy. “Those expenses climb even higher as the volume of storage and movement of data grows,” he pointed out. “It is only smart for any business to perform a cost analysis of whether storage is better managed in the cloud or on-prem, and moving forward in a direction that delivers the best performance, best security, and best operational efficiency at the lowest cost.”

Read More »

PEAK:AIO adds power, density to AI storage server

There is also the fact that many people working with AI are not IT professionals, such as professors, biochemists, scientists, doctors, clinicians, and they don’t have a traditional enterprise department or a data center. “It’s run by people that wouldn’t really know, nor want to know, what storage is,” he said. While the new AI Data Server is a Dell design, PEAK:AIO has worked with Lenovo, Supermicro, and HPE as well as Dell over the past four years, offering to convert their off the shelf storage servers into hyper fast, very AI-specific, cheap, specific storage servers that work with all the protocols at Nvidia, like NVLink, along with NFS and NVMe over Fabric. It also greatly increased storage capacity by going with 61TB drives from Solidigm. SSDs from the major server vendors typically maxed out at 15TB, according to the vendor. PEAK:AIO competes with VAST, WekaIO, NetApp, Pure Storage and many others in the growing AI workload storage arena. PEAK:AIO’s AI Data Server is available now.

Read More »

SoftBank to buy Ampere for $6.5B, fueling Arm-based server market competition

SoftBank’s announcement suggests Ampere will collaborate with other SBG companies, potentially creating a powerful ecosystem of Arm-based computing solutions. This collaboration could extend to SoftBank’s numerous portfolio companies, including Korean/Japanese web giant LY Corp, ByteDance (TikTok’s parent company), and various AI startups. If SoftBank successfully steers its portfolio companies toward Ampere processors, it could accelerate the shift away from x86 architecture in data centers worldwide. Questions remain about Arm’s server strategy The acquisition, however, raises questions about how SoftBank will balance its investments in both Arm and Ampere, given their potentially competing server CPU strategies. Arm’s recent move to design and sell its own server processors to Meta signaled a major strategic shift that already put it in direct competition with its own customers, including Qualcomm and Nvidia. “In technology licensing where an entity is both provider and competitor, boundaries are typically well-defined without special preferences beyond potential first-mover advantages,” Kawoosa explained. “Arm will likely continue making independent licensing decisions that serve its broader interests rather than favoring Ampere, as the company can’t risk alienating its established high-volume customers.” Industry analysts speculate that SoftBank might position Arm to focus on custom designs for hyperscale customers while allowing Ampere to dominate the market for more standardized server processors. Alternatively, the two companies could be merged or realigned to present a unified strategy against incumbents Intel and AMD. “While Arm currently dominates processor architecture, particularly for energy-efficient designs, the landscape isn’t static,” Kawoosa added. “The semiconductor industry is approaching a potential inflection point, and we may witness fundamental disruptions in the next 3-5 years — similar to how OpenAI transformed the AI landscape. SoftBank appears to be maximizing its Arm investments while preparing for this coming paradigm shift in processor architecture.”

Read More »

Nvidia, xAI and two energy giants join genAI infrastructure initiative

The new AIP members will “further strengthen the partnership’s technology leadership as the platform seeks to invest in new and expanded AI infrastructure. Nvidia will also continue in its role as a technical advisor to AIP, leveraging its expertise in accelerated computing and AI factories to inform the deployment of next-generation AI data center infrastructure,” the group’s statement said. “Additionally, GE Vernova and NextEra Energy have agreed to collaborate with AIP to accelerate the scaling of critical and diverse energy solutions for AI data centers. GE Vernova will also work with AIP and its partners on supply chain planning and in delivering innovative and high efficiency energy solutions.” The group claimed, without offering any specifics, that it “has attracted significant capital and partner interest since its inception in September 2024, highlighting the growing demand for AI-ready data centers and power solutions.” The statement said the group will try to raise “$30 billion in capital from investors, asset owners, and corporations, which in turn will mobilize up to $100 billion in total investment potential when including debt financing.” Forrester’s Nguyen also noted that the influence of two of the new members — xAI, owned by Elon Musk, along with Nvidia — could easily help with fundraising. Musk “with his connections, he does not make small quiet moves,” Nguyen said. “As for Nvidia, they are the face of AI. Everything they do attracts attention.” Info-Tech’s Bickley said that the astronomical dollars involved in genAI investments is mind-boggling. And yet even more investment is needed — a lot more.

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs).  In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple would between them devote $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skill labor shortage

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it’s been a regular as a non-tech company showing off technology at the big tech trade show in Las Vegas and is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually; and the agricultural work force continues to shrink. (This is my hint to the anti-immigration crowd). John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences their own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% percent of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More 2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year. 1. Agents: the next generation of automation AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

Read More »

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks. Going all-in on red teaming pays practical, competitive dividends It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find. What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Read More »