Brandon John Grenier

Chapter 1: An Introduction to Artificial Intelligence and Machine Learning

What is Artificial Intelligence?

First coined in 1956 by John McCarthy, Artificial Intelligence (AI) broadly describes computer systems that have the ability to perform tasks that mimic human intelligence. Creating computer systems that can successfully reason, plan, represent knowledge, learn, process natural language and perceive are all major goals of active AI research. Artificial Intelligence can be categorized into two different types, or forms of intelligence:

General AI, also known as Strong AI or Full AI, is the canonical form of intelligence you’ll likely think about when you hear the term AI – this is your Skynet and Terminators. General AI would exhibit all of the characteristics of human intelligence, but is incredibly difficult to create and does not exist in any meaningful or material form today. The creation of General AI systems is the ultimate goal for many AI researchers, some of whom reserve the term General AI for machines that are capable of experiencing consciousness.

Applied AI, also known as Weak AI or Narrow AI, is the most prevalent form of AI today. Unlike General AI, Applied AI exhibits very limited and very specific characteristics of human intelligence, but can be incredibly effective when applied to very specific types of problems. Concrete examples of Applied AI include facial recognition systems, personal digital assistants, spam filters and autonomous vehicles.

What is Machine Learning?

Arthur Samuel popularized the term Machine Learning (ML) in 1959, describing a field of computer science that gives computers the ability to learn without being explicitly programmed. The field focuses on the design and construction of algorithms that can both learn from and make predictions on data, and it is one approach towards achieving AI. A more formal definition of Machine Learning was produced by Tom Mitchell in 1997:

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Simply put, if a computer program can improve its performance on a task by using previous experience, you can say that the program has learned from that experience.
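To make Mitchell’s definition concrete, the sketch below maps T, P and E onto something you can run. It is a toy under stated assumptions: the hidden rule (label 1 when x >= 7), the `ThresholdLearner` class and the example values are all hypothetical, chosen only so that the effect of experience is easy to see.

```python
# Hypothetical setup illustrating Mitchell's definition:
#   Task T:        label a number x as 0 or 1
#   Performance P: accuracy on a fixed, held-out test set
#   Experience E:  the labelled examples the learner has been shown


def true_label(x: float) -> int:
    """Hidden rule the learner has to discover (assumed for this sketch)."""
    return 1 if x >= 7 else 0


class ThresholdLearner:
    """Learns a single decision threshold from labelled examples."""

    def __init__(self) -> None:
        self.threshold = 0.0  # arbitrary guess before any experience

    def learn(self, examples: list[tuple[float, int]]) -> None:
        negatives = [x for x, y in examples if y == 0]
        positives = [x for x, y in examples if y == 1]
        if negatives and positives:
            # Split the difference between the largest 0 and the smallest 1 seen so far.
            self.threshold = (max(negatives) + min(positives)) / 2

    def predict(self, x: float) -> int:
        return 1 if x >= self.threshold else 0


TEST_XS = [i / 2 for i in range(20)]  # 0.0, 0.5, ..., 9.5
EXPERIENCE = [(x, true_label(x)) for x in (0, 9, 2, 8, 5, 7.5, 6.5, 7)]


def accuracy_after(n: int) -> float:
    """Performance P on task T after the learner has seen n examples of experience E."""
    learner = ThresholdLearner()
    learner.learn(EXPERIENCE[:n])
    correct = sum(learner.predict(x) == true_label(x) for x in TEST_XS)
    return correct / len(TEST_XS)


for n in (1, 2, 4, 6, 8):
    print(f"after {n} examples: accuracy {accuracy_after(n):.2f}")
```

Running it, accuracy rises from 0.30 after a single example to 1.00 after all eight: the program’s performance on T, as measured by P, improves with experience E.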

The ability of software to learn and improve from experience is significantly different from how traditional software is written. ML enables the development of new types of applications that would be impractical or impossible to build if they were programmed explicitly using traditional software engineering approaches.

Machine Learning Algorithmic Strategies

There are generally considered to be three fundamental algorithmic strategies, or algorithm “families” in machine learning. In this section we’ll provide an overview of how these strategies work, along with some insight into the problems they intend to solve.

Supervised Learning: Algorithms that learn to recognize relationships

Supervised learning is a machine learning strategy that uses existing pairs of input and output data with an expectation or hypothesis that some form of relationship exists between the input and output. A supervised learning algorithm trained with a set of input and output pairs discovers relationships between the inputs and outputs, and subsequently makes predictions for novel inputs that it has not been exposed to before.

Spam filtering is a practical application of supervised learning – in this example, a data pair would consist of an email as an input and an assessment (i.e. spam or legitimate) as an output.

A spam filtering algorithm needs to be trained with sets of emails that have already been assessed and assigned as either spam or legitimate. Data pairs that contain a defined or assigned output are referred to as labelled data. It is the use of labelled data that characterizes supervised learning, and differentiates it from unsupervised and reinforcement learning. In fact, supervised learning algorithms must be trained with labelled data in order to act as useful predictive tools.
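A minimal sketch of a supervised spam filter, assuming a multinomial naive Bayes classifier (one common choice, not the only one) and a tiny hypothetical set of labelled emails. Training only counts how often each word appears under each label; prediction picks the label whose word counts best explain a new email.

```python
import math
from collections import Counter

# Hypothetical labelled data: (email text, label) pairs.
TRAINING_DATA = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("cheap pills win big", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("lunch with the project team", "legitimate"),
    ("monday project status report", "legitimate"),
]


def train(pairs):
    """Count words per label and how often each label occurs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in pairs:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def predict(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocabulary = model
    total_emails = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_emails in label_counts.items():
        score = math.log(n_emails / total_emails)  # prior P(label)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one (Laplace) smoothing so an unseen word doesn't zero out the probability.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_DATA)
print(predict(model, "free prize money"))           # spam
print(predict(model, "project meeting on monday"))  # legitimate
```

With only six training emails this is illustrative rather than useful; real filters are trained on far larger labelled datasets.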

Supervised learning is effective at solving problems where you cannot meaningfully or accurately express the relationship(s) in your data with a discrete set of rules or heuristics. While you might quickly decide whether a new email in your inbox is spam or not, it’s difficult to develop traditional software with an explicit and concrete set of rules to achieve the same outcome. Modern supervised machine learning algorithms offer a practical algorithmic solution to these problem spaces.

Unsupervised Learning: Algorithms that learn to discover patterns

Unsupervised learning is an effective strategy for solving discovery problems; these algorithms attempt to discover novel patterns or structures in data that would be impractical or even impossible to find using traditional programming techniques.

The discovery of patterns is primarily used to group, cluster or segment data. For example, unsupervised learning can be used to organize a large collection of images. Unsupervised learning algorithms won’t make predictions like “this is an image of a cat”, “this is an image of a dog” or “this is an image of a person”. Instead, the algorithm will predict the most likely group an image belongs to, based on the patterns discovered in the image. The resulting clusters or groups of images that come out of an unsupervised learning algorithm are unlabeled; it would be up to a person to label the clusters as images of dogs, cats and people.

Reinforcement Learning: Adaptive algorithms that learn from their environment

Reinforcement learning is a modern machine learning strategy that makes use of goal-oriented algorithms which learn how to accomplish an objective over many steps. For example, reinforcement learning algorithms are well suited to maximize the points won in a game over many moves. These algorithms are penalized when they make the wrong decisions and rewarded when they make the right ones, which is why these types of algorithms are called reinforcement learning algorithms.

Reinforcement learning algorithms are not provided with training sets ahead of time. In the absence of training data, these algorithms learn from experience by collecting data from their environment through trial-and-error; they observe and dynamically adapt to the environment around them.

In this respect, reinforcement learning algorithms share a data-driven learning approach with supervised learning algorithms. The difference is one of timing: supervised learning algorithms learn from data ahead of time, whereas reinforcement learning algorithms learn from data as it becomes available in their environment.
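The trial-and-error loop can be sketched with tabular Q-learning, one classic reinforcement learning algorithm. Everything here is a hypothetical toy: a six-cell corridor, a +1 reward for reaching the last cell, a small -0.01 penalty per step, and the `alpha`, `gamma` and `epsilon` values. The agent is given no training set; it learns only from the rewards and penalties it collects while acting.

```python
import random

# Hypothetical environment: a corridor of cells 0..5. The agent starts in cell 0;
# reaching cell 5 ends the episode with reward +1, and every other move costs -0.01.
N_STATES = 6
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # step left, step right


def step(state, action):
    """Apply an action to the environment and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)  # walls at both ends
    if next_state == GOAL:
        return next_state, 1.0, True
    return next_state, -0.01, False


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: estimate the long-term value of each (state, action) pair."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: usually take the best-known action, occasionally explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # The reward (or penalty) nudges the estimate for the action just taken.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)
```

After training, the greedy policy should step right (+1) in every cell, because that is the shortest route to the reward; the penalty per step is what discourages wandering.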

The ability of software to dynamically learn from and adapt to changes in an environment makes reinforcement learning algorithms well suited for advanced problem spaces like autonomous vehicle design, robotics and supply chain optimization.

Machine Learning Predictive Models

Each machine learning algorithmic strategy uses one or more concrete predictive models. A predictive model is produced by the statistical analysis of historical data, and is used to predict or identify future behavior.

Supervised learning makes use of classification and regression models, while unsupervised learning makes use of clustering and dimensionality reduction models. This section provides a brief overview of each of these predictive models and the problems they aim to solve; you’ll learn more about these models as well as the algorithms that underpin them in upcoming chapters.

Classification: Predict a discrete value

Classification models work by predicting discrete values from well-defined, finite sets of possible outcomes. In other words, classification algorithms predict the best group for an object, given all of the possible groups an algorithm is allowed to choose from. For example, a spam filtering algorithm predicts which group an email best belongs to: a spam email group, or a legitimate email group. You can say that the goal of a spam filtering algorithm is to classify emails.
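As a sketch of a classifier, the k-nearest-neighbours model below assigns a new email to whichever class is most common among the k most similar labelled emails. The two numeric features (links and exclamation marks per email) and the data are hypothetical; the point is that the answer is always drawn from the finite set of classes.

```python
import math
from collections import Counter

# Hypothetical labelled examples: (links_in_email, exclamation_marks) -> label
EXAMPLES = [
    ((8, 5), "spam"),
    ((6, 7), "spam"),
    ((9, 3), "spam"),
    ((0, 0), "legitimate"),
    ((1, 1), "legitimate"),
    ((2, 0), "legitimate"),
]
CLASSES = {"spam", "legitimate"}  # the finite set of possible outcomes


def classify(point, examples=EXAMPLES, k=3):
    """k-nearest neighbours: majority label among the k closest labelled examples."""
    nearest = sorted(examples, key=lambda ex: math.dist(point, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


for email in [(7, 6), (1, 0), (4, 3)]:
    label = classify(email)
    assert label in CLASSES  # a classifier can only answer with one of the known classes
    print(email, "->", label)
```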

Regression: Predict a continuous value

Regression models work by predicting continuous values. A continuous value is a real number that can take any value within a range, such as a price, a distance or a temperature. Regression algorithms will often predict quantities, such as amounts and sizes. For example, regression models may predict the price a house will sell for, or the price a stock will rise or fall to.
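For contrast with classification, a regression model outputs a number rather than a class. The sketch below fits a straight line by ordinary least squares (one of the simplest regression models) to hypothetical house sizes in square metres and sale prices in thousands, then predicts a price for an unseen size.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: house size (square metres) -> sale price (thousands).
sizes = [50, 80, 100, 120, 150]
prices = [160, 230, 310, 350, 460]

slope, intercept = fit_line(sizes, prices)


def predict_price(size):
    """The prediction is a continuous value, not one of a fixed set of classes."""
    return slope * size + intercept


print(f"price = {slope:.1f} * size + {intercept:.1f}")  # price = 3.0 * size + 2.0
print(f"90 m2 house: {predict_price(90):.1f}")          # 90 m2 house: 272.0
```

The fitted line predicts 272.0 (thousand) for a 90 square metre house, a size that never appeared in the training data.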

Clustering: Discover new patterns and relationships in your data

Clustering assigns objects to groups so that objects in the same group are similar to each other, while objects in different groups are not. Clustering aims to discover the hidden patterns in, and structure of, your data. Each object is described by a set of features, or characteristics.
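A common clustering algorithm is k-means, which alternates between assigning each object to its nearest centroid and moving each centroid to the mean of its members. The sketch below assumes two numeric features per object, hypothetical data, and a deterministic initialisation using the first k points (library implementations typically use randomised or k-means++ initialisation instead).

```python
import math


def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, k, max_iterations=100):
    """Group points into k clusters; returns one cluster id (0..k-1) per point."""
    # Deterministic initialisation with the first k points (a simplification for this sketch).
    centroids = list(points[:k])
    assignments = None
    for _ in range(max_iterations):
        new_assignments = [nearest_centroid(p, centroids) for p in points]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments


# Hypothetical unlabelled data: each object is described by two features.
points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
print(k_means(points, k=2))  # [0, 0, 1, 1, 0, 1]
```

The output is only a cluster id per object: the algorithm finds the two groups but cannot say what they represent, so naming them is left to a person.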

Dimensionality Reduction: Reduce the number of variables in your data

Dimensionality reduction is fairly popular and has a number of practical applications within the domain of machine learning itself. Dimensionality reduction models reduce the number of features, variables or dimensions under consideration in statistical ML models. Feature reduction can be achieved by feature selection, where a subset of the existing features is selected, or by feature extraction, where new features are created by combining existing features.
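The two routes to feature reduction can be sketched side by side. The feature selection step keeps only columns whose variance is above a cut-off, and the feature extraction step uses principal component analysis (PCA, computed here by power iteration) to combine the remaining columns into a single new feature. The house data, the `min_variance` cut-off and the iteration count are assumptions for illustration.

```python
import math


def column_variances(rows):
    """Population variance of each column (feature)."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    return [sum((v - m) ** 2 for v in col) / n for col, m in zip(columns, means)]


def select_features(rows, min_variance=1e-9):
    """Feature selection: keep the existing columns whose variance exceeds min_variance."""
    keep = [j for j, var in enumerate(column_variances(rows)) if var > min_variance]
    return [[row[j] for j in keep] for row in rows], keep


def extract_one_feature(rows, iterations=100):
    """Feature extraction: project rows onto their first principal component (PCA)."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centred = [[v - m for v, m in zip(row, means)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centred) / n for j in range(d)] for i in range(d)]
    # Power iteration: repeatedly multiplying by the covariance matrix converges to
    # the eigenvector with the largest eigenvalue, i.e. the direction of greatest variance.
    direction = [1.0] * d
    for _ in range(iterations):
        product = [sum(cov[i][j] * direction[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        direction = [x / norm for x in product]
    # Each row is replaced by one new feature: its coordinate along that direction.
    return [sum(v * w for v, w in zip(row, direction)) for row in centred]


# Hypothetical houses: [size_m2, rooms, listing_year]
houses = [[50, 2, 2020], [80, 3, 2020], [100, 4, 2020], [120, 4, 2020], [150, 5, 2020]]

selected, kept_columns = select_features(houses)
print(kept_columns)  # [0, 1]: listing_year never varies, so it is dropped
print([round(x, 1) for x in extract_one_feature(selected)])
```

Selection reduces three features to two by discarding one outright; extraction then compresses size and room count into a single combined value per house.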