class: middle, center, title-slide
Lecture 0: Artificial Intelligence
Prof. Gilles Louppe
[email protected]
class: middle, center
background-color: #343541
???
The elephant in the room. ChatGPT.
In November 2022, OpenAI released ChatGPT, a chatbot interface to GPT-3.5, a neural network trained on a very large corpus of text.
For the first time, the public got access through a simple web interface to a model that can generate text that is indistinguishable from human-written text. You can ask any question to the model, and it will answer, however complex or bizarre the question is.
For instance, in the video, I am asking ChatGPT to prepare a 1-day itinerary for a trip to Liège, Belgium, and it is answering me with a detailed itinerary, including the names of the places to visit, general information about the city, etc. I can also instruct ChatGPT to pretend that it knows all the hidden secrets of the city and revise its itinerary accordingly.
I am quite sure many of you have already used ChatGPT more or less seriously, so there is no need to introduce it further. However, I believe we can all agree that this is a very impressive technology and that it marks a milestone in the history of AI.
class: middle, center
.grid[ .kol-1-2[
One simple idea:
.bold[Guess the next word]
] .kol-1-2[.center.width-70[]] ]
???
Despite its impressive performance and its apparent complexity, the underlying principle of ChatGPT is actually very simple.
The model is trained to guess the next word in a sentence. That's it.
It is the same principle that is used in your phone to autocomplete your messages, except that the model is much larger and has been trained on a much larger corpus of text.
What is interesting about the guess-the-next-word problem is that it is very simple to state, yet very hard to solve. It is simple because it is easy to understand, but it is hard because there are many possible next words and because the local context is not always enough to predict the next one. Sometimes, we need to know more about the world to make a good prediction.
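To make the objective concrete, here is a minimal sketch of the guess-the-next-word idea in Python, using simple word counts over a tiny made-up corpus. This is only an illustration of the training objective, not of how ChatGPT works internally (which relies on a large transformer over subword tokens).

```python
from collections import Counter, defaultdict

# Toy illustration of guess-the-next-word: count which word follows which
# in a tiny corpus, then predict the most frequent continuation.
corpus = [
    "john plays the piano and paul plays the guitar",
    "armstrong played the trumpet",
    "armstrong walked on the moon",
]

follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        follows[current_word][next_word] += 1

def guess_next(word):
    """Return the word most frequently observed after `word` in the corpus."""
    candidates = follows.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(guess_next("armstrong"))  # -> 'played' ('walked' is equally likely here)
print(guess_next("the"))        # -> 'piano' (ties broken by first occurrence)
```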
class: middle
count: false
In the 1960s, Armstrong ____
???
Ambiguous! Louis Armstrong or Neil Armstrong?
Possible completions: played, sang, walked, flew
class: middle
count: false
In the 1960s, Armstrong performed ___
???
Now leaning towards Louis Armstrong.
Possible completions: jazz, music, trumpet solos, ...
class: middle
count: false
In the 1960s, Armstrong performed a moonwalk ___
???
Twist!
Possible completions: on stage, during a concert, in a jazz club, ...
class: middle
count: false
In the 1960s, Armstrong performed a moonwalk on the ___
???
Dramatic shift of context!
Most likely completion: moon
class: middle
count: false
In the 1960s, Armstrong performed a moonwalk on the lunar ___
???
Further narrowing down the context!
Possible completions: surface, landscape, terrain, ...
class: middle
count: false
In the 1960s, Armstrong performed a moonwalk on the lunar surface
and said ___
???
Very specific context!
Possible completions: "That's one small step for man, one giant leap for mankind."
class: middle
This explains why large language models ...
- invent things and cannot cite sources;
- rarely produce exactly the same answer twice;
- cannot count, compute, or reason*;
- can hardly correct their own mistakes once they have been made.
.footnote[*: At least not with a vanilla transformer and a greedy decoding strategy.]
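???
As an illustration of the second point: at generation time the model produces a probability distribution over possible next words, and the answer depends on how we pick from that distribution. A minimal sketch below, with made-up probabilities.

```python
import random

# Why the same question can get different answers: the model outputs a
# probability distribution over next words (made-up numbers below), and the
# decoding strategy decides how to pick from it.
next_word_probs = {"moon": 0.6, "stage": 0.25, "street": 0.15}

# Greedy decoding always takes the most probable word: deterministic.
greedy = max(next_word_probs, key=next_word_probs.get)

# Sampling draws a word according to the probabilities: stochastic, so two
# runs of the same prompt can continue differently.
sampled = random.choices(
    list(next_word_probs), weights=list(next_word_probs.values())
)[0]

print(greedy, sampled)
```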
exclude: true
class: middle
???
Fortunately, each sentence contains many guess-the-next-word problems that can be used to train a transformer. [For instance, the sentence "John plays the piano and Paul plays the ___" contains 8 instances of the problem, one for each word after the first.]
Contrary to what we usually face in deep learning, acquiring data is therefore cheap and easy. There is no need to manually label the data, which is a tedious and expensive process. Instead, collecting data is as simple as downloading a large corpus of text from the internet.
For this reason, big tech companies have been able to train transformers on very large corpora of text, literally large parts of the internet counting hundreds of billions of tokens, which only they can collect.
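A minimal sketch of how a single sentence is turned into training examples, assuming a simple word-level tokenization (real systems use subword tokens) and filling in the blank with "guitar" for the example:

```python
# Every sentence provides many guess-the-next-word training examples:
# each prefix is an input, the word that follows it is the target.
sentence = "John plays the piano and Paul plays the guitar".split()

training_pairs = [(sentence[:i], sentence[i]) for i in range(1, len(sentence))]

for context, target in training_pairs:
    print(" ".join(context), "->", target)
# 9 words yield 8 training examples, matching the count given above.
```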
exclude: true
class: middle, center
???
Alright, so we have a transformer that has been trained on a large corpus of text. How do we use it?
After pre-training, a transformer is not very useful by itself just yet. It is like a dog that has been trained to guess the next word, but that cannot do anything else. To be useful, transformers must be fine-tuned or instructed to answer questions.
To do so, we can use a technique called prompt engineering or in-context learning, which consists in writing a prompt, a short text that contains instructions of a task to solve and some context. The prompt is then fed to the transformer, which is used to complete the prompt. The continuation of the prompt, obtained by guessing the next words, is likely to be consistent with the instructions and to answer the question. That's it!
As an example, the prompt in the slide can be used to instruct a transformer to translate English to French. The prompt contains a description of the task, together with a few examples of translations. The transformer is then used to complete the prompt, which results in a translation.
Similarly, ChatGPT and other chatbots are transformers that are repurposed to answer questions by using prompts that instruct them to answer questions, in the form of a conversation.
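As an illustration of in-context learning, here is a sketch of what such a translation prompt might look like; the examples and the `language_model.generate` call are hypothetical, not the exact prompt shown on the slide.

```python
# Sketch of a few-shot translation prompt: the task description and a couple
# of examples are written directly in the prompt, and the model is asked to
# continue the text.
prompt = """Translate English to French.

English: I love music.
French: J'aime la musique.

English: Where is the train station?
French: Où est la gare ?

English: The weather is nice today.
French:"""

# completion = language_model.generate(prompt)  # hypothetical API call
# A well-trained model is likely to continue with "Il fait beau aujourd'hui."
```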
class: middle, black-slide, center
<iframe width="600" height="450" src="https://www.youtube.com/embed/fWWCdqyYRPI" frameborder="0" allowfullscreen></iframe>Not just text, but also images and sounds.
class: middle
class: middle, center
"With artificial intelligence we are summoning the demon" -- Elon Musk, 2014.
???
Triggers a rich imagination fueled by science-fiction.
class: middle, center
"We're really closer to a smart washing machine than Terminator" -- Fei-Fei Li, Director of Stanford AI Lab, 2017.
???
The reality is quite different...
class: middle, center, black-slide
<iframe frameborder="0" width="600" height="480" src="https://www.dailymotion.com/embed/video/x7kvtfn" allowfullscreen allow="autoplay"></iframe>Yann LeCun, 2018.
class: middle, center, black-slide
<iframe width="600" height="450" src="https://www.youtube.com/embed/DsBGaHywRhs" frameborder="0" allowfullscreen></iframe>Geoffrey Hinton, 2023.
class: middle, center, black-slide
<iframe width="600" height="450" src="https://www.youtube.com/embed/YdaRd_vitLw" frameborder="0" allowfullscreen></iframe>Yann LeCun, 2023.
.center["Artificial intelligence is the science of making machines do things that would require intelligence if done by men." -- Marvin Minsky, 1968.]
???
But what is intelligence anyway?
class: middle
A computer passes the Turing test (aka the Imitation Game) if a human operator, after posing some written questions, cannot tell whether the written responses come from a person or from a computer.
.grid[
.kol-2-3[
.width-80.center[
]
]
.kol-1-3.center[
.width-100.circle[]
.caption[Can machines think?
(Alan Turing, 1950)]
]
]
???
The Turing test is an operational definition of intelligence.
class: middle
An agent would need the following capabilities to pass the Turing test:
- natural language processing
- knowledge representation
- automated reasoning
- machine learning
- computer vision (total Turing test)
- robotics (total Turing test)
Despite being proposed more than 70 years ago, the Turing test is still relevant today.
class: middle
The Turing test tends to focus on human-like errors, linguistic tricks, etc.
However, it seems more important to study the principles underlying intelligence than to replicate an exemplar.
class: middle, center, black-slide
Aeronautics is not defined as the field of making machines
that fly
so exactly like pigeons that they can fool even other pigeons.
class: middle
An ‘AI system’ is a machine-based system that is designed to operate with varying levels of autonomy and that may exhibit adaptiveness after deployment, and that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. -- European AI Act, Article 3, 2024.
.footnote[Regulation (EU) 2024/1689.]
???
Very broad definition!
It encompasses modern deep learning models, but also many other systems.
Is a thermostat an AI system?
- 1943: McCulloch and Pitts: Boolean circuit model of the brain.
- 1950: Turing's "Computing machinery and intelligence".
- 1950s: Early AI programs, including Samuel's checkers program, Newell and Simon's Logic Theorist and Gelernter's Geometry Engine.
- 1956: Dartmouth meeting: "Artificial Intelligence" adopted.
- 1958: Rosenblatt invents the perceptron.
- 1965: Robinson's complete algorithm for logical reasoning.
- 1966-1974: AI discovers computational complexity.
class: middle
.italic[The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.]
class: middle, center, black-slide
<iframe width="600" height="450" src="https://www.youtube.com/embed/aygSMgK3BEM" frameborder="0" allowfullscreen></iframe>
class: middle
- 1969: Neural network research almost disappears after Minsky and Papert's book (1st AI winter).
- 1969-1979: Early development of knowledge-based systems.
- 1980-1988: Expert systems industrial boom.
- 1988-1993: Expert systems industry busts (2nd AI winter).
class: middle
- 1985-1995: The return of neural networks.
- 1988-: Resurgence of probability, focus on uncertainty, general increase in technical depth.
- 1995-2010: Neural networks fade again.
- 2000-: Availability of very large datasets.
- 2010-: Availability of fast commodity hardware (GPUs).
- 2012-: Resurgence of neural networks with deep learning approaches.
- 2017: Attention is all you need (transformers).
- 2022: ChatGPT released to the public.
class: middle
class: middle
.footnote[Credits: François Fleuret, 2023.]
???
When you start a project in artificial intelligence or machine learning, one of the very first steps, and this is something I keep repeating to my students, is to look at the data. Take the raw data and visualize it.
The data I want to start with today is related to blood pressure. We have a dataset of 30 patients, with their age and their blood pressure shown as points on the plot. This is a very simple dataset, and the problem is not very interesting in itself, but it is a good example to illustrate how we can use machine learning to make predictions.
Let's start with a simple question: can we predict the blood pressure of a patient based on his or her age?
In other words, can we write a computer program that, given the age of a patient, will make a guess about his or her blood pressure?
class: middle
count: false
.footnote[Credits: François Fleuret, 2023.]
???
The machine learning approach to this problem is not to hardcode some made-up computer program that would take the age of a patient and return a blood pressure. Instead, the machine learning approach is to write a computer program that learns to make predictions by itself, by looking at the data.
class: middle
count: false
.footnote[Credits: François Fleuret, 2023.]
???
To do so, we need to define a model, a mathematical function that will take the age of a patient as input and return a prediction of his or her blood pressure.
Often, this model is a function with parameters. The parameters are the knobs of the model, and we will tune them to make the best predictions.
The simplest example is the linear model, a function of the form $y = ax + b$, where $a$ and $b$ are the parameters to be learned, the two knobs that we can turn to change the behavior of the model.
To find the best parameter values, we will use the data to train the model. This means that we will show the model the data, let it make predictions, and then adjust its parameters to reduce the error between its predictions and the true blood pressure values.
This process can be described mathematically and implemented in a computer program. This is what we call the training of the model.
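A minimal sketch of this idea in Python, on synthetic data since the blood-pressure dataset from the slides is not reproduced here; for brevity it finds the best $a$ and $b$ with NumPy's closed-form least-squares fit (np.polyfit) instead of the iterative adjustment described above.

```python
import numpy as np

# Synthetic stand-in for the blood-pressure data (30 patients).
rng = np.random.default_rng(0)
age = rng.uniform(20, 80, size=30)                      # input x
pressure = 90 + 0.8 * age + rng.normal(0, 5, size=30)   # target y

# Fit the linear model y = a * x + b by least squares.
a, b = np.polyfit(age, pressure, deg=1)
print(f"a = {a:.2f}, b = {b:.2f}")  # close to the 0.8 and 90 used above

def predict_blood_pressure(x):
    """Predict blood pressure from age with the learned linear model."""
    return a * x + b

print(predict_blood_pressure(40.0))
```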
class: middle
count: false
.footnote[Credits: François Fleuret, 2023.]
class: middle
count: false
.footnote[Credits: François Fleuret, 2023.]
class: middle
count: false
.footnote[Credits: François Fleuret, 2023.]
class: middle
count: false
.footnote[Credits: François Fleuret, 2023.]
class: middle
count: false
.footnote[Credits: François Fleuret, 2023.]
class: middle
count: false
.footnote[Credits: François Fleuret, 2023.]
class: middle
Deep learning .bold[scales up] the statistical and machine learning approaches by
- using larger models known as neural networks,
- training on larger datasets,
- using more compute resources.
.grid[
.kol-3-4.width-70.center[]
.kol-1-4.width-90.center[
]
]
???
[Talk about the slide first.]
Scaling up the statistical and machine learning approaches by brute force in these three dimensions has been key to the success of deep learning.
class: middle
Specialized neural networks can be trained to achieve super-human performance on many complex tasks that were previously thought to be out of reach for machines.
.center[(Top) Scene understanding, pose estimation, geometric reasoning.
(Bottom) Planning, Image captioning, Question answering.]
.footnote[Credits: François Fleuret, 2023.]
???
Following this approach, deep learning has been successful in tasks that were previously considered hard for computers, such as understanding images, speech, or text.
In particular, specialized neural networks can be trained to solve a large variety of problems, from scene understanding to geometric reasoning, from planning to question answering.
While these problems can be perceived as artificial and not really important in their own right, they actually form a set of primitive tasks that are found in many domains of application.
class: middle
Neural networks form .bold[primitives] that can be transferred to many domains.
.grid[ .kol-1-3.center.width-100[] .kol-1-3.center.width-80[] .kol-1-3.center.width-80[] ] .width-100[]
.center[(Top) Analysis of histological slides, denoising of MRI images, nevus detection.
(Bottom) Whole-body hemodynamics reconstruction from PPG signals.]
???
For example, in health and medicine, the same specialized neural networks that are used to annotate scenes can be used to analyze biomedical images, such as histological slides.
Specialized neural networks can also be used to denoise MRI images, to detect nevi, or to reconstruct whole-body hemodynamics from PPG signals, such as those recorded by an Apple Watch.
As a matter of fact, the adoption of AI and deep learning in health and medicine has been growing steadily over the past decade, with many applications in medical imaging, genomics, and many more.
These applications however, are often deeply embedded in the tools used by healthcare professionals, and are not always visible to the public.
class: middle
.grid[
.kol-1-2.center[.width-100[
Vaswani et al., 2017.]] .kol-1-2[.width-100[]] ]
class: middle
A brutal simplicity:
- The more data, the better the model.
- The more parameters, the better the model.
- The more compute, the better the model.
Scaling up further to gigantic models, datasets, and compute resources keeps pushing the boundaries of what is possible, .bold[with no sign of slowing down].
class: middle, center, black-slide
<iframe width="600" height="450" src="https://www.youtube.com/embed/-dWfl7Dhb0o" frameborder="0" allowfullscreen></iframe>Conversational AI assistants (Anthropic, 2024)
class: middle, center, black-slide
<iframe width="600" height="450" src="https://www.youtube.com/embed/o5uvDZ8srHA" frameborder="0" allowfullscreen></iframe>Code assistants (Cursor, 2024)
exclude: true
class: middle, center, black-slide
<iframe width="600" height="450" src="https://www.youtube.com/embed/oYUcl_cqKcs" frameborder="0" allowfullscreen></iframe>Object detection, pose estimation, segmentation (Meta AI, 2023)
class: middle, center, black-slide
<iframe width="600" height="450" src="https://www.youtube.com/embed/hA_-MkU0Nfw" frameborder="0" allowfullscreen></iframe>Autonomous cars (Waymo, 2022)
class: middle, black-slide, center
<iframe width="600" height="450" src="https://www.youtube.com/embed/zrcxLZmOyNA" frameborder="0" allowfullscreen></iframe>Powering the future of clean energy (NVIDIA, 2023)
class: middle, black-slide, center
<iframe width="600" height="450" src="https://www.youtube.com/embed/AbdVsi1VjQY" frameborder="0" allowfullscreen></iframe>How AI is advancing medicine (Google, 2018)
class: middle, center
Deep learning can also .bold[solve problems that no one could solve before].
???
Beyond the basic work that can be automated, the most exciting application of AI, at least for the scientist in me, is that deep learning can also be used to solve problems that no one could solve before. To make discoveries.
I have many examples in mind, but I will only mention a few today, to give you a sense of what is possible. I will focus on health and medicine, but the same is true in many other domains.
class: middle
.grid[ .kol-2-3.center.width-100[] .kol-1-3.center.width-100[] ]
???
The first example is AlphaFold, a neural network based on the transformer architecture that can predict the 3D structure of a protein from its amino acid sequence.
This problem is important because the 3D structure of a protein determines its function, and understanding protein function is key to understanding biology and designing new drugs.
However, determining the 3D structure of a protein experimentally is difficult and expensive, taking up to months just to solve a single structure.
AlphaFold has been a breakthrough in this area, and has been able to predict the 3D structure of proteins with high accuracy, in just a couple of minutes for the longest sequences.
class: middle, black-slide, center
<iframe width="600" height="450" src="https://www.youtube.com/embed/gg7WjuFs8F4" frameborder="0" allowfullscreen></iframe>AI for Science (Deepmind, AlphaFold, 2020)
class: middle
???
A second example is the use of graph neural networks to discover new drugs.
Discovering new drugs is a complex and expensive search problem, where the goal is to find molecules that will bind to a target protein and modulate its function. Unfortunately, this problem is difficult for two reasons:
- first, the search space is huge -- the space of all possible pharmacologically active molecules is estimated to be on the order of $10^{60}$ molecules.
- second, the binding of a molecule to a protein is a complex process that is difficult to model. Laboratory experiments are necessary to evaluate the binding of a molecule to a protein, and these experiments are expensive and time-consuming.
Graph neural networks have been a breakthrough in this area, and have been able to predict the properties of molecules with high accuracy.
In a sense, they can serve as a virtual laboratory that can be used to pre-screen millions of molecules in a matter of hours, thereby reducing the laboratory work to only the most promising candidates.
class: middle
exclude: true
class: middle
Intelligence is not just about pattern recognition, which is what most of these works are based on.
It is about modeling the world:
- explaining and understanding what we see;
- imagining things we could see but haven't yet;
- problem solving and planning actions to make these things real;
- building new models as we learn more about the world.
class: middle
- Lecture 0: Artificial intelligence
- Lecture 1: Intelligent agents
- Lecture 2: Solving problems by searching
- Lecture 3: Adversarial search
- Lecture 4: Quantifying uncertainty
- Lecture 5: Probabilistic reasoning
- Lecture 6: Reasoning over time
- Lecture 7: Machine learning and neural networks
- Lecture 8: Making decisions
- Lecture 9: Reinforcement learning
class: middle, center
class: middle
By the end of this course, you will have built autonomous agents that efficiently make decisions in fully informed, partially observable and adversarial settings. Your agents will draw inferences in uncertain and unknown environments and optimize actions for arbitrary reward structures.
The models and algorithms you will learn in this course apply to a wide variety of artificial intelligence problems and will serve as the foundation for further study in any application area (from engineering and science, to business and medicine) you choose to pursue.
class: middle
.italic[General]
- Understand the landscape of artificial intelligence.
- Be able to write from scratch, debug and run (some) AI algorithms.
.italic[Well-established and state-of-the-art algorithms]
- Good old-fashioned AI: well-established algorithms for intelligent agents and their mathematical foundations.
- Introduction to recent research material ($\leq$ 5 years old).
- Understand some of the open questions and challenges in the field.
.italic[Practical]
- Fun and challenging course projects.
class: middle
This course is designed as an introduction to the many other courses available at ULiège and (broadly) related to AI, including:
- INFO8006: Introduction to Artificial Intelligence $\leftarrow$ you are there
- DATS0001: Foundations of Data Science
- ELEN0062: Introduction to Machine Learning
- INFO8010: Deep Learning
- INFO8004: Advanced Machine Learning
- INFO9023: Machine Learning Systems Design
- INFO8003: Optimal decision making for complex problems
- INFO0948: Introduction to Intelligent Robotics
- INFO9014: Knowledge representation and reasoning
- ELEN0016: Computer vision
???
Mention pre-requisites:
- programming experience
- probability theory
class: end-slide, center
count: false
The end.