Happy birthday Laks uncle! I’ve written up a short bootcamp explaining some of the fundamental concepts in AI development. We’ll introduce some of the key terms that are floating around, take a look at some of the older approaches to machine learning, and discuss the new model architectures that have made the current state of AI possible. Hopefully, this will provide some useful background on AI theory and practice. Let’s get started!
Today, it seems like everyone is talking about how AI will change the world. AI companies are already upending cybersecurity (see Project Glasswing) and promising to cure cancer. But what exactly is artificial intelligence?
Let’s define our terms.
In this bootcamp, we will use the term artificial intelligence to refer to any machine-based system that can perform complex tasks usually associated with human intelligence, whether that intelligence shows up as reasoning, decision-making, communication, or something else.
You may also be familiar with the terms machine learning (ML) and deep learning. These are very closely related to artificial intelligence, but they are not exactly the same. Strictly speaking, machine learning (designing algorithms to learn patterns instead of applying hard-coded rules) is one approach to artificial intelligence, and deep learning (learning patterns via large neural networks) is one approach to machine learning. Over the last decade, however, deep learning has solidified itself as the dominant approach to artificial intelligence. In this short bootcamp, the distinction between the terms won’t matter too much, but it’s good to be familiar with the definitions.
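To make the distinction between hard-coded rules and learned patterns concrete, here is a toy illustration (the task, data, and function names are entirely made up for this example): a rule whose threshold a human picks by hand, versus a “learned” rule whose threshold is derived from labeled examples.

```python
# Hypothetical task: classify a number as "large" or "small".

# Hard-coded rule: a human chooses the threshold (5) up front.
def rule_based(x):
    return "large" if x > 5 else "small"

# Machine learning approach: infer the threshold from labeled examples.
examples = [(1, "small"), (2, "small"), (3, "small"),
            (7, "large"), (8, "large"), (9, "large")]

def learn_threshold(labeled_examples):
    smalls = [x for x, label in labeled_examples if label == "small"]
    larges = [x for x, label in labeled_examples if label == "large"]
    # Place the boundary at the midpoint between the two classes.
    return (max(smalls) + min(larges)) / 2

threshold = learn_threshold(examples)

def learned(x):
    return "large" if x > threshold else "small"
```

Deep learning follows the same recipe, except the “threshold” is replaced by millions or billions of neural network parameters, all adjusted automatically from data.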
The concept of “deep learning” has existed since the 1960s, when Alexey Ivakhnenko and V.G. Lapa published the first deep learning algorithm capable of training arbitrarily deep neural networks. So, one natural question to ask is: “Why now?” That is, why did it take so long for neural networks to become useful?
The answer is twofold. First, our computers are much better than they were in the 1960s. Over the past sixty years, electrical engineers have designed specialized circuits (GPUs, originally built for rendering graphics) that excel at matrix multiplication and other parallelizable operations, which turned out to be exactly what machine learning needs. Second, and equally important, the internet now provides almost unimaginable amounts of data. Modern datasets are highly complex, and classical statistical techniques often fail to capture their rich structure. Before we dive straight into modern machine learning, though, we’ll first discuss the most popular tool from classical statistics: linear regression. This will give us a better understanding of how AI developed into what it is today.
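As a preview of that discussion, here is a minimal least-squares linear regression in Python (the data is synthetic, generated just for this sketch: noisy samples from the line y = 2x + 1):

```python
import numpy as np

# Synthetic data: y is roughly 2*x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Closed-form least squares: stack a column of ones so the model
# can learn an intercept, then solve for the best-fit coefficients.
X = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(X, y, rcond=None)[0]

print(slope, intercept)  # should land close to the true values 2 and 1
```

The “learning” here is just solving a linear algebra problem, and that simplicity is both linear regression’s strength and its limitation, as we’ll see next.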