Tut 11 - AoL Aa

Term: Definition
Learning: Minimising prediction error
Rats, monkeys, machines: Learning happens when reality does not equal expectation
Discrepancy: Prediction error drives updates in knowledge and behaviour
Rescorla-Wagner model: Classical conditioning and prediction error
RW model (CC and prediction error): Goal - learn associations between stimuli; mechanism - update associative strength using prediction error (PE)
RW equation: ΔV = αβ(λ - ΣV), where ΔV is the change in associative strength, α is the salience of the CS (how noticeable it is), β is the learning rate related to the US, λ is the maximum associative strength (e.g. 1 when the US is present) and ΣV is the current total associative strength (the expectation); see the Python sketch after this card list
Prediction error outcomes: Positive PE -> learning increases; negative PE -> learning decreases; zero PE -> no learning
PE explains: Acquisition, extinction, blocking and conditioned inhibition
PE - acquisition: Learning is rapid when PE is large
PE - extinction: CS without US -> negative PE -> decline in strength
PE - blocking: Prior CS already predicts the US -> new CS gains nothing
PE - conditioned inhibition: A CS predicts the absence of the US -> negative value
Dopamine as reward prediction error: VTA dopamine neurons encode PE; dopamine activity matches the RW model
Pattern: Unexpected reward -> dopamine burst; predicted reward -> dopamine fires at the cue, not the reward; omitted reward -> dopamine dip; surprise in timing -> dopamine response shifts
Implications: Dopamine firing = biological prediction error; supports both Pavlovian and instrumental learning
Addiction: Hijacking of dopamine PE -> overlearning of drug cues
Temporal difference (TD) learning: From biology to AI
TD equation key points: Sequential learning - values are updated at every time step (not just at trial end); the discount factor γ sets how much future rewards matter; can explain the dopamine shift to predictive cues over time (standard TD(0) form: V(s_t) <- V(s_t) + α[r_{t+1} + γV(s_{t+1}) - V(s_t)]; see the sketch after this card list)
TD equation - comparison to RW: Both use PE for learning; TD handles multi-step learning and timing; RW updates once per trial, TD updates continuously
Q-learning - decision making in agents: Extends TD by adding actions - learns how valuable each action is in a given state (Q-learning equation on the next card)
Q-learning equation key concepts: Q(s, a) = value of taking action a in state s; max Q over the next state's actions = best expected future reward from the next state; learns optimal actions in complex, uncertain environments (standard form: Q(s,a) <- Q(s,a) + α[r + γ·max_a' Q(s',a') - Q(s,a)])
Practical application: Grid world - agents learn to reach the goal while avoiding punishment; the agent's policy improves by adjusting Q-values based on prediction errors (see the grid-world sketch after this card list)
Hull's goal gradient and links to AI: Hull - rats speed up as they near a reward; motivation increases with proximity
Connection to TD/Q-learning: States/actions near the reward have higher value; agents and animals both act more decisively near the goal; applies to consumer behaviour too (e.g. loyalty cards)
Rescorla-Wagner: Focus is on stimulus-outcome; learns from trial-end PE; updates once per trial
TD learning: Focus on state values; learns from step-by-step PE; updates at each time step
Q-learning: Focus on state-action values; learns from step-by-step PE + future reward; updates at each time step and action
All models: Use prediction error as a signal to learn; adjust internal expectations/values to improve future outcomes; connect psychology (RW), neuroscience (dopamine) and AI (TD, Q)
Learning: Reducing surprise
Dopamine: The brain's prediction error system
TD/Q-learning: AI's version of this biological strategy
Psychology -> neuroscience -> AI: A shared learning architecture
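
The RW update on the "RW equation" card can be written out as a short simulation. The sketch below is a minimal illustration, not part of the original deck: the cue names, salience values and learning rate are made-up assumptions, but the update is the ΔV = αβ(λ - ΣV) rule described above, and running it reproduces the acquisition, blocking and extinction patterns from the cards.

```python
# Minimal Rescorla-Wagner sketch: DeltaV = alpha * beta * (lambda - sum(V)).
# Cue names and parameter values are illustrative assumptions only.

def rw_update(V, present_cues, alpha, beta, lam):
    """Apply one trial of the RW rule to the dict of associative strengths V."""
    total = sum(V[c] for c in present_cues)   # current expectation: summed strength of present cues
    pe = lam - total                          # prediction error: actual outcome minus expectation
    for c in present_cues:
        V[c] += alpha[c] * beta * pe          # every present cue shares the same PE
    return pe

V = {"light": 0.0, "tone": 0.0}
alpha = {"light": 0.3, "tone": 0.3}

# Acquisition: light + US (lam = 1); PE shrinks as expectation grows.
for _ in range(20):
    rw_update(V, ["light"], alpha, beta=0.5, lam=1.0)

# Blocking: light already predicts the US, so the light+tone compound produces little PE
# and the tone gains almost no associative strength.
for _ in range(20):
    rw_update(V, ["light", "tone"], alpha, beta=0.5, lam=1.0)

# Extinction: light without the US (lam = 0) -> negative PE -> the light's strength declines.
for _ in range(20):
    rw_update(V, ["light"], alpha, beta=0.5, lam=0.0)

print(V)  # tone stays near 0 (blocking); light has fallen back towards 0 (extinction)
```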
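
The step-by-step TD(0) update from the "TD equation key points" card can be sketched the same way. The chain of states, the reward placement and the parameter values below are illustrative assumptions; the point is that the per-step prediction error r + γV(s') - V(s) gradually pushes value back from the reward to the earliest predictive cue, mirroring the dopamine shift described above.

```python
# Minimal TD(0) sketch on a 3-step chain (cue -> wait -> reward), showing how value
# "backs up" to the earliest predictive state over repeated trials.
# States, reward placement and parameters are assumptions for illustration only.

alpha, gamma = 0.1, 0.9                  # learning rate and discount factor
states = ["cue", "wait", "reward", "end"]
V = {s: 0.0 for s in states}
reward_on_entering = {"reward": 1.0}     # reward is delivered on reaching the reward state

for episode in range(200):
    for s, s_next in zip(states[:-1], states[1:]):
        r = reward_on_entering.get(s_next, 0.0)
        td_error = r + gamma * V[s_next] - V[s]   # step-by-step PE (the "dopamine" signal)
        V[s] += alpha * td_error                  # update the current state's value

# Early on, the PE is largest at reward delivery; after learning, value (and hence the PE)
# has shifted back towards the predictive cue.
print(V)
```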
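
Finally, a minimal tabular Q-learning sketch of the grid-world idea on the "Practical application" card. The one-dimensional layout, reward values and hyperparameters are assumptions chosen for illustration; the update inside the loop is the standard Q-learning rule with the prediction error r + γ·max_a' Q(s',a') - Q(s,a), and actions are chosen epsilon-greedily so the agent keeps exploring while its policy improves.

```python
# Minimal tabular Q-learning sketch on a 1-D "grid world": states 0..4, goal at state 4 (+1),
# punished cell at state 0 (-1). Layout, rewards and hyperparameters are illustrative assumptions.
import random

n_states, actions = 5, [-1, +1]          # move left (-1) or right (+1)
alpha, gamma, epsilon = 0.5, 0.9, 0.1
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}

def step(s, a):
    """Return (next_state, reward, done) for the assumed grid layout."""
    s2 = max(0, min(n_states - 1, s + a))
    if s2 == n_states - 1:
        return s2, 1.0, True             # reached the goal
    if s2 == 0:
        return s2, -1.0, True            # stepped into the punished cell
    return s2, 0.0, False

for episode in range(500):
    s, done = 2, False                   # start each episode in the middle
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda a_: Q[(s, a_)])
        s2, r, done = step(s, a)
        best_next = 0.0 if done else max(Q[(s2, a_)] for a_ in actions)
        # Q-learning update: PE = r + gamma * max_a' Q(s', a') - Q(s, a)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The learned policy heads right (towards the goal) from every non-terminal state.
print({s: max(actions, key=lambda a_: Q[(s, a_)]) for s in range(1, n_states - 1)})
```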
Created by: brendonpizarro1