Tut 11 - AoL
| Term | Definition |
|---|---|
| Learning | Minimising Prediction Error |
| Rats, monkeys, machines | Learning happens when reality does not equal expectation |
| Discrepancy | Prediction error drives updates in knowledge and behaviour |
| Rescorla-Wagner model | Classical conditioning & prediction error |
| RW model - classical conditioning and prediction error | Goal: learn associations between stimuli. Mechanism: update associative strength using prediction error (PE). |
| RW equation | $\Delta V = \alpha \beta (\lambda - \Sigma V)$: $\Delta V$ = change in associative strength; $\alpha$ = salience of the CS (how noticeable it is); $\beta$ = learning rate related to the US; $\lambda$ = maximum associative strength (e.g. 1 if the US is present); $\Sigma V$ = current total associative strength (the expectation). See the RW sketch after this table. |
| Prediction error | Outcomes: positive PE -> associative strength increases; negative PE -> associative strength decreases; zero PE -> no learning |
| PE explains | Acquisition, extinction, blocking, conditioned inhibition |
| PE - acquisition | Learning is rapid when PE is large |
| PE - extinction | CS without US -> Negative PE -> Decline in strength |
| PE - blocking | Prior CS already predicts US -> New CS gains nothing |
| PE - conditioned inhibition | A CS that predicts the absence of the US acquires a negative associative value |
| Dopamine as reward prediction error | VTA dopamine neurons encode PE; dopamine activity matches the RW model |
| Pattern | Unexpected reward -> dopamine burst; predicted reward -> dopamine fires at the cue, not the reward; omitted reward -> dopamine dip; surprise in timing -> dopamine response shifts |
| Implications | Dopamine firing = biological prediction error; supports both Pavlovian and instrumental learning |
| Addiction | Hijacking of the dopamine PE signal -> overlearning of drug cues |
| Temporal difference (TD) learning | From biology to AI |
| TD equation | $V(s_t) \leftarrow V(s_t) + \alpha [r_{t+1} + \gamma V(s_{t+1}) - V(s_t)]$. Key points: sequential learning - updates at every time step (not just at trial end); discount factor $\gamma$ controls how much future rewards matter; can explain the dopamine shift to predictive cues over time. See the TD sketch after this table. |
| TD vs RW | Both use PE for learning; TD handles multi-step learning and timing; RW updates once per trial, TD updates at every time step |
| Q-learning - decision making in agents | Extends TD by adding actions: learns how valuable each action is in a given state (see the Q-learning equation below) |
| Q-learning equation - key concepts | $Q(s,a) \leftarrow Q(s,a) + \alpha [r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$. $Q(s,a)$ = value of taking action $a$ in state $s$; $\max_{a'} Q(s',a')$ = best expected future reward from the next state; learns optimal actions in complex, uncertain environments |
| Practical application | Grid world - the agent learns to reach the goal while avoiding punishment; its policy improves by adjusting Q-values based on prediction errors. See the grid-world sketch after this table. |
| Hull's goal gradient and links to AI | Hull: rats speed up as they near the reward; motivation increases with proximity to the goal |
| Connection to TD/Q-learning | States/actions near the reward have higher value; agents and animals both act more decisively near the goal; applies to consumer behaviour too (e.g. loyalty cards) |
| Rescorla-Wagner | Focus: stimulus-outcome associations; learns from trial-end PE; updates once per trial |
| TD learning | Focus: state values; learns from step-by-step PE; updates at each time step |
| Q-learning | Focus: state-action values; learns from step-by-step PE plus future reward; updates at each time step and action |
| All models | Use prediction error as a learning signal; adjust internal expectations/values to improve future outcomes; connect psychology (RW), neuroscience (dopamine) and AI (TD, Q-learning) |
| Learning | Reducing surprise |
| Dopamine | The brain's prediction error system |
| TD/Q-learning | AI's version of this biological strategy |
| Psychology -> neuroscience -> AI | A shared learning architecture |
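
A minimal Python sketch of the Rescorla-Wagner update from the "RW equation" card, showing acquisition, blocking and extinction. The function name `rw_update`, the values of alpha and beta, and the trial counts are illustrative assumptions, not taken from the tutorial.

```python
# Rescorla-Wagner update: dV = alpha * beta * (lambda - sum_V).
# alpha, beta and the trial structure below are illustrative, not from the tutorial.

def rw_update(V, cs_present, lam, alpha=0.3, beta=1.0):
    """One trial of Rescorla-Wagner learning.

    V          -- dict mapping CS name -> current associative strength
    cs_present -- list of CS names presented on this trial
    lam        -- max associative strength supported by the US (1 if present, 0 if absent)
    """
    total = sum(V[cs] for cs in cs_present)   # current expectation (sum of V)
    pe = lam - total                          # prediction error
    for cs in cs_present:
        V[cs] += alpha * beta * pe            # all present CSs share the same PE
    return pe

V = {"light": 0.0, "tone": 0.0}

# Acquisition: light + US. PE is large early, so learning is rapid; PE shrinks as V(light) approaches lambda.
for _ in range(10):
    rw_update(V, ["light"], lam=1.0)

# Blocking: light+tone compound + US. The light already predicts the US, so PE is near 0 and the tone gains almost nothing.
for _ in range(10):
    rw_update(V, ["light", "tone"], lam=1.0)

# Extinction: light alone, no US. Negative PE, so V(light) declines.
for _ in range(10):
    rw_update(V, ["light"], lam=0.0)

print(V)  # tone stays near 0 (blocking); light has fallen from ~1 (extinction)
```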
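A minimal TD(0) sketch on a short chain of states, showing how step-by-step prediction errors propagate value back to earlier, predictive states - the analogue of dopamine responses shifting from the reward to the cue. The chain length, alpha and gamma are illustrative assumptions.

```python
# TD(0) on a chain of states ending in reward, assuming a fixed forward-moving policy.
n_states = 5                   # states 0..4; reward delivered on leaving state 4
alpha, gamma = 0.1, 0.9
V = [0.0] * n_states

for episode in range(200):
    for s in range(n_states):
        r = 1.0 if s == n_states - 1 else 0.0           # reward only at the final step
        v_next = 0.0 if s == n_states - 1 else V[s + 1]
        td_error = r + gamma * v_next - V[s]            # step-by-step prediction error
        V[s] += alpha * td_error                        # update at every time step

# Value propagates backwards to earlier, predictive states.
print([round(v, 2) for v in V])   # roughly [0.66, 0.73, 0.81, 0.9, 1.0]
```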
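A minimal Q-learning sketch for the "Practical application" card, using a one-dimensional corridor as a stand-in for the grid world: the goal sits at one end, punishment at the other, and the policy improves as Q-values are adjusted by prediction errors. The layout, rewards, epsilon-greedy exploration and hyperparameters are illustrative assumptions, not the tutorial's actual grid world.

```python
# Q-learning with an epsilon-greedy agent on a small corridor world.
import random

n_states = 6                        # states 0..5; 0 = punishment, 5 = goal
actions = [-1, +1]                  # move left or right
alpha, gamma, eps = 0.5, 0.9, 0.1
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}

def step(s, a):
    """Take action a in state s; return (next state, reward, done)."""
    s2 = max(0, min(n_states - 1, s + a))
    if s2 == n_states - 1:
        return s2, 1.0, True        # reached the goal
    if s2 == 0:
        return s2, -1.0, True       # fell into the punishment state
    return s2, 0.0, False

for episode in range(500):
    s, done = 2, False              # start nearer the punishing end
    while not done:
        # epsilon-greedy action selection
        a = random.choice(actions) if random.random() < eps else max(actions, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        best_next = 0.0 if done else max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])   # PE-driven update
        s = s2

# The learned policy moves right (towards the goal) in every interior state.
print({s: max(actions, key=lambda x: Q[(s, x)]) for s in range(1, n_states - 1)})
```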