class: middle, center, title-slide
Lecture 1: Intelligent agents
Prof. Gilles Louppe
[email protected]
class: middle
.center[Figure: the agent receives percepts from its environment through sensors, and acts on the environment through actuators.]
???
Roomba example
class: middle
- An agent is an entity that perceives its environment through sensors and takes actions through actuators.
- The behavior of the agent is described by its policy, a function
$$\pi : \mathcal{P}^* \to \mathcal{A}$$
that maps percept sequences to actions.
class: middle
Let us consider a 2-cell world with a Pacman agent (a minimal simulation sketch follows below).
- Percepts: location and content, e.g., $(\text{left cell}, \text{no food})$.
- Actions: $\text{go left}$, $\text{go right}$, $\text{eat}$, $\text{do nothing}$.
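As an illustration, here is a minimal Python sketch of this 2-cell world; the state representation and names are ours, not part of the lecture.

```python
# Hypothetical sketch: a state records Pacman's location and the set of
# cells that still contain food.
def step(state, action):
    location, food = state  # e.g., ("left cell", frozenset({"right cell"}))
    if action == "eat":
        food = food - {location}      # eating clears the current cell
    elif action == "go left":
        location = "left cell"
    elif action == "go right":
        location = "right cell"
    # "do nothing" leaves the state unchanged.
    percept = (location, "food" if location in food else "no food")
    return (location, food), percept
```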
???
Take the time to explain that the game is here different from the actual Pacman.
class: middle
The policy of a Pacman agent is a function that maps percept sequences to actions. It can be implemented as a table.
Percept sequence | Action |
---|---|
$[(\text{left cell}, \text{food})]$ | $\text{eat}$ |
$[(\text{left cell}, \text{no food})]$ | $\text{go right}$ |
$[(\text{left cell}, \text{no food}), (\text{right cell}, \text{food})]$ | $\text{eat}$ |
(...) | (...) |
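A hedged sketch of such a table-driven agent program in Python; the table entries and names are illustrative only.

```python
# Hypothetical sketch of a table-driven agent: the policy is an explicit
# lookup table keyed by the percept sequence observed so far.
TABLE = {
    (("left cell", "no food"),): "go right",
    (("left cell", "food"),): "eat",
    (("left cell", "no food"), ("right cell", "food")): "eat",
    # ... one entry for every possible percept sequence.
}

percepts = []  # the percept history, which grows at every step

def table_driven_agent(percept):
    percepts.append(percept)
    return TABLE.get(tuple(percepts), "do nothing")
```

Note that the table needs one entry per percept sequence, so it grows exponentially with the lifetime of the agent.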
class: middle, center, black-slide
What about the actual Pacman?
???
Run the program!
class: middle
What is the optimal agent policy?
How do we even formulate the goal of Pacman?
- 1 point per food dot collected up to time $t$?
- 1 point per food dot collected up to time $t$, minus one point per move?
- penalize when too many food dots are left uncollected?
Can it be implemented in a small and efficient agent program?
- A performance measure evaluates a sequence of environment states caused by the agent's behavior.
- A rational agent is an agent that chooses whichever action maximizes the expected value of the performance measure, given the percept sequence to date (see the sketch below).
.alert[Rationality only concerns .bold[what] decisions are made (not the thought process behind them, human-like or not).]
.footnote[Credits: CS188, UC Berkeley.]
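To make the definition concrete, here is a hedged sketch of action selection by expected performance; `outcomes` and `performance` are hypothetical placeholders for an outcome model and the performance measure.

```python
# Hypothetical sketch: a rational agent picks the action with the highest
# expected value of the performance measure.
def rational_action(actions, outcomes, performance):
    """actions: candidate actions.
    outcomes(a): list of (probability, resulting state) pairs for a.
    performance(s): score of the environment state s."""
    def expected_performance(a):
        return sum(p * performance(s) for p, s in outcomes(a))
    return max(actions, key=expected_performance)
```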
class: middle
In this course, Artificial intelligence = Maximizing expected performance.
.footnote[Credits: CS188, UC Berkeley.]
???
Underline the importance of expected.
class: middle
- Rationality $\neq$ omniscience: percepts may not supply all relevant information.
- Rationality $\neq$ clairvoyance: action outcomes may not be as expected.
- Hence, rational $\neq$ successful.
- However, rationality leads to exploration, learning and autonomy.
The characteristics of the performance measure, environment, action space and percepts dictate the possible approaches for selecting rational actions. Together, they are summarized as the task environment. For example, for an agent playing chess:
- performance measure: win, draw, lose, ...
- environment: chess board, opponent, ...
- actuators: move pieces, ...
- sensors: board state, opponent moves, ...
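As an aside, such a description (often abbreviated PEAS, for Performance, Environment, Actuators, Sensors) can be captured as a simple record; this sketch is ours, not from the lecture.

```python
# Hypothetical sketch: a task environment as a plain record.
from collections import namedtuple

TaskEnvironment = namedtuple(
    "TaskEnvironment",
    ["performance_measure", "environment", "actuators", "sensors"])

chess = TaskEnvironment(
    performance_measure=["win", "draw", "lose"],
    environment=["chess board", "opponent"],
    actuators=["move pieces"],
    sensors=["board state", "opponent moves"])
```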
class: middle
For an autonomous car:
- performance measure: safety, destination, legality, comfort, ...
- environment: streets, highways, traffic, pedestrians, weather, ...
- actuators: steering, accelerator, brake, horn, speaker, display, ...
- sensors: video, accelerometers, gauges, engine sensors, GPS, ...
For a medical diagnosis system:
- performance measure: patient health, cost, time, ...
- environment: patient, hospital, medical records, ...
- actuators: diagnosis, treatment, referral, ...
- sensors: medical records, lab results, ...
Fully observable vs. partially observable
Whether the agent sensors give access to the complete state of the environment, at each point in time.
Deterministic vs. stochastic
Whether the next state of the environment is completely determined by the current state and the action executed by the agent.
Episodic vs. sequential
Whether the agent's experience is divided into atomic independent episodes.
Static vs. dynamic
Whether the environment can change, or the performance measure can change with time.
class: middle
Discrete vs. continuous
Whether the state of the environment, time, the percepts and the actions are discrete or continuous.
Single agent vs. multi-agent
Whether the environment includes several agents that may interact with each other.
Known vs. unknown
Reflects the agent's state of knowledge of the "laws of physics" of the environment.
class: middle
Are the following task environments fully observable? deterministic? episodic? static? discrete? single-agent? known?
- Crossword puzzle
- Chess, with a clock
- Poker
- Backgammon
- Taxi driving
- Medical diagnosis
- Image analysis
- Part-picking robot
- Refinery controller
- The real world
class: middle, center, black-slide
What about Pacman?
Our goal is to design an agent program that implements the agent policy.
Agent programs can be designed and implemented in many ways:
- with tables
- with rules
- with search algorithms
- with learning algorithms
Reflex agents ...
- choose an action based on the current percept (and maybe memory);
- may have memory or a model of the world's current state;
- do not consider the future consequences of their actions.
.center.width-50[![](figures/lec1/reflex-agent-cartoon.png)]
.footnote[Credits: CS188, UC Berkeley.]
class: middle
???
Solution to huge tables: forget about the past!
Compress them using condition-action rules.
class: middle
- Simple reflex agents select actions on the basis of the current percept, ignoring the rest of the percept history.
- They implement condition-action rules that match the current percept to an action (see the sketch below). Rules provide a way to compress the function table.
- They can only work in a Markovian environment, that is, if the correct decision can be made on the basis of the current percept only. In other words, only if the environment is fully observable.
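A minimal sketch of a simple reflex agent for the 2-cell world above; the rules are illustrative assumptions.

```python
# Hypothetical sketch: condition-action rules are matched against the
# current percept only; the percept history is ignored.
RULES = [
    (lambda p: p[1] == "food", "eat"),           # food here? eat it
    (lambda p: p[0] == "left cell", "go right"),
    (lambda p: p[0] == "right cell", "go left"),
]

def simple_reflex_agent(percept):
    for condition, action in RULES:
        if condition(percept):  # the first matching rule wins
            return action
    return "do nothing"
```

Three rules replace the unbounded percept-sequence table above.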
???
Example (autonomous car): if the car in front of you slows down, you should brake. The color and model of the car, the music on the radio, or the weather are all irrelevant.
class: middle
???
Solution: do not actually forget about the past. Remember what you have seen so far by maintaining an internal representation of the world, a belief state.
Then map this state to an action.
class: middle
- Model-based agents handle partial observability of the environment by keeping track of the part of the world they cannot see now.
- The internal state of model-based agents is updated on the basis of a model (sketched below) which determines:
    - how the environment evolves independently of the agent;
    - how the agent's actions affect the world.
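A hedged sketch of a model-based reflex agent; the interface is an assumption of ours, not from the lecture.

```python
# Hypothetical sketch: an internal belief state is maintained across
# time steps and updated with a model of the world.
class ModelBasedAgent:
    def __init__(self, update_state, rules, initial_state=None):
        self.update_state = update_state  # (state, last action, percept) -> state
        self.rules = rules                # maps the internal state to an action
        self.state = initial_state
        self.last_action = None

    def __call__(self, percept):
        # Update the belief state using the model and the new percept...
        self.state = self.update_state(self.state, self.last_action, percept)
        # ... then act on the belief state, as a reflex agent would.
        self.last_action = self.rules(self.state)
        return self.last_action
```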
Planning agents ...
- ask "what if?";
- make decisions based on (hypothesized) consequences of actions;
- must have a model of how the world evolves in response to actions;
- must formulate a goal.
.center.width-50[![](figures/lec1/plan-agent-cartoon.png)]
.footnote[Credits: CS188, UC Berkeley.]
class: middle
???
It is not easy to map a state to an action because goals are not explicit in condition-action rules.
class: middle
- Decision process (sketched below):
    - generate possible sequences of actions;
    - predict the resulting states;
    - assess goals in each.
- A goal-based agent chooses an action that will achieve the goal.
- Goal-based agents are more general than rule-based agents, since goals are rarely explicit in condition-action rules.
- Finding action sequences that achieve goals is difficult. Search and planning are two strategies.
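A minimal sketch of this decision process, assuming a deterministic model of the world; the brute-force enumeration is only for illustration, as the search algorithms of the next lectures do this far more efficiently.

```python
from itertools import product

# Hypothetical sketch: enumerate action sequences, simulate them with a
# model, and keep the first one whose final state satisfies the goal.
def plan(state, actions, model, goal_test, horizon=3):
    """model(state, action) -> next state; goal_test(state) -> bool."""
    for n in range(1, horizon + 1):
        for candidate in product(actions, repeat=n):  # sequences of length n
            s = state
            for a in candidate:
                s = model(s, a)
            if goal_test(s):
                return list(candidate)
    return None  # no sequence of length <= horizon achieves the goal
```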
class: middle
???
Often there are several sequences of actions that achieve a goal. We should pick the best.
class: middle
- Goals alone are often not enough to generate high-quality behavior, as they only provide a binary assessment of performance.
- A utility function scores any given sequence of environment states.
- The utility function is an internalization of the performance measure.
- A rational utility-based agent chooses an action that maximizes the expected utility of its outcomes (see the sketch below).
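A hedged sketch of utility-based action selection; `simulate` and `utility` are hypothetical placeholders for an outcome model and the utility function.

```python
# Hypothetical sketch: actions are compared by the expected utility of
# the state sequences they may lead to.
def utility_based_action(state, actions, simulate, utility):
    """simulate(state, a): list of (probability, state sequence) pairs.
    utility(sequence): score of a sequence of environment states."""
    def expected_utility(a):
        return sum(p * utility(seq) for p, seq in simulate(state, a))
    return max(actions, key=expected_utility)
```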
???
Example (autonomous car): there are many ways to arrive at the destination, but some are quicker or more reliable.
.center.width-80[![](figures/lec1/learning-agent.svg)]
class: middle
- Learning agents are capable of self-improvement. They can become more competent than their initial knowledge alone might allow.
- They can make changes to any of the knowledge components by:
    - learning how the world evolves;
    - learning the consequences of actions;
    - learning the utility of actions through rewards (see the sketch below).
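A minimal sketch of the last component, learning action utilities from rewards; the epsilon-greedy scheme is our illustrative assumption, not the lecture's method.

```python
import random
from collections import defaultdict

# Hypothetical sketch: keep a running average of the rewards observed
# after each action; usually exploit the best estimate, sometimes explore.
class RewardLearner:
    def __init__(self, actions, epsilon=0.1):
        self.actions = list(actions)
        self.epsilon = epsilon  # probability of exploring a random action
        self.estimates = defaultdict(float)
        self.counts = defaultdict(int)

    def choose(self):
        if random.random() < self.epsilon:
            return random.choice(self.actions)  # explore
        return max(self.actions, key=lambda a: self.estimates[a])  # exploit

    def learn(self, action, reward):
        self.counts[action] += 1
        # Incremental update of the running average of rewards.
        self.estimates[action] += (reward - self.estimates[action]) / self.counts[action]
```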
class: middle
- The performance element is the current system for selecting actions and driving.
- The critic observes the world and passes information to the learning element.
    - E.g., the car makes a quick left turn across three lanes of traffic. The critic observes shocking language from the other drivers and reports that the action was bad.
- The learning element tries to modify the performance element to avoid reproducing this situation in the future.
- The problem generator identifies areas of behavior in need of improvement and suggests experiments.
    - E.g., trying out the brakes on different road surfaces in different weather conditions.
- An agent is an entity that perceives and acts in an environment.
- The performance measure evaluates the agent's behavior. Rational agents act so as to maximize the expected value of the performance measure.
- A task environment includes the performance measure, the environment, the actuators and the sensors. Task environments can vary along several significant dimensions.
- The agent program effectively implements the agent policy. Its design is dictated by the task environment.
- Simple reflex agents respond directly to percepts, whereas model-based reflex agents maintain internal state to track the world. Goal-based agents act to achieve goals while utility-based agents try to maximize their expected performance.
- All agents can improve their performance through learning.
???
Next week we will see how to implement our first agent, a goal-based agent based on search algorithms.
class: end-slide, center
count: false
The end.