383 Final
Term | Definition
---|---
What would it mean for a machine to pass the Turing Test? | That a human cannot distinguish between that machine and another human over a conversation
What is the primary advantage of the rational agent approach for the purpose of science research? | It provides a framework for creating machines whose "intelligence" is measurable and comparable |
Following Moravec's Paradox, which of these tasks would require the most computational power? a. A program that solves tic-tac-toe b. A machine that can pick up objects using eating utensils c. A program that can solve math equations | A machine that can pick up objects using eating utensils
What is Moravec’s Paradox? | Moravec’s Paradox highlights that AI systems easily outperform humans on abstract, logic-based tasks (like chess or arithmetic) yet struggle with “simple” sensorimotor skills (like recognizing objects) that evolution has optimized over millions of years.
What is the difference between Tree-Search Algorithm and Graph-Search Algorithm? | Graph Search algorithms keep track of explored nodes to prevent cycles |
T or F: A* search is always optimal | False |
In what year was the field of AI officially founded? | 1956 |
Which of the following best describes how leading textbooks define the study of AI? | The study of Intelligent Agents |
What are the inputs to a search problem? a. Initial State b. Goal State c. Action cost function d. Transition Model e. All of the above | initial state, goal state, action cost function, transition model (all of the above) |
T or F: Adversarial Search is exactly the same as performing regular search in a multi-agent environment? | False |
What is the name of the computer that famously defeated world champion Garry Kasparov in chess in 1997? | Deep Blue
What are the space and time complexities of BFS? | Space: exponential, O(b^d); Time: exponential, O(b^d), where b is the branching factor and d is the depth of the shallowest solution
What is the main way that graph search algorithms differ? | By how they select the next node for exploration |
Which element of search algorithms is unique to informed search strategies? | Heuristic function |
What is the output of a general search algorithm? | A sequence of actions to reach the goal state
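A minimal sketch of that input/output contract in Python, assuming hypothetical `is_goal` and `successors` callbacks (breadth-first graph search, with an explored set to prevent cycles):

```python
from collections import deque

def bfs_graph_search(initial_state, is_goal, successors):
    """Breadth-first graph search returning a sequence of actions.

    `successors(state)` is assumed to yield (action, next_state) pairs.
    """
    frontier = deque([(initial_state, [])])   # (state, actions taken so far)
    explored = {initial_state}                # explored set prevents cycles
    while frontier:
        state, actions = frontier.popleft()
        if is_goal(state):
            return actions                    # output: sequence of actions
        for action, next_state in successors(state):
            if next_state not in explored:
                explored.add(next_state)
                frontier.append((next_state, actions + [action]))
    return None                               # no solution exists
```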
What is the key difference between a goal-based agent and a utility-based agent? | A goal-based agent selects actions to achieve a specific goal, while a utility-based agent selects actions based on how desirable they are
In what way does online search demonstrate an example of machine learning? | The agent learns the transition model of a problem after completing online search |
Which of the following is an example of a pruning algorithm? | Alpha-beta |
In belief-state search problem formulation, how is the goal test changed? | the goal test checks whether all states in the belief state are goal states
Which of the following tasks would always involve non-deterministic actions? | off-road driving |
Which of the following best describes the Physical Symbol System Hypothesis? | Any system capable of general intelligence must operate on symbols and symbolic manipulation
Which best describes the argument in Rodney Brooks's influential 1990 paper "Elephants Don't Play Chess"? | Elephants don't play chess, but that does not mean they are unintelligent or lack behavior worth studying
In the field of AI, what are the two necessary components of an agent? | Sensors and Actuators |
Is the heuristic function h(n) = 0 admissible, given non-negative edge weights? | True (it never overestimates the true cost to the goal)
What are the four environment assumptions needed for proper execution of "pure" search algorithms? | Deterministic, Observable, Discrete, Known
What conditions must be met in order for a reflex agent to be rational? | A reflex agent is rational if it selects actions that maximize its performance measure based on its current percept and knowledge, given its environment and actions. ex. a thermostat controlling a heating system is rational if it turns on when the temperature is below the set point
what is the critical assumption of the minimax algorithm? If that assumption were not true, would the algorithm still yield the optimal action to take? | - the critical assumption is that your opponent (MIN) will take the optimal move at every step to minimize MAX's score - if this is not true, the algorithm may not yield optimal actions
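A depth-limited minimax sketch with alpha-beta pruning in Python, assuming hypothetical `successors`, `evaluate`, and `is_terminal` callbacks; the cutoffs encode exactly the assumption above that each side plays optimally:

```python
import math

def minimax(state, depth, alpha, beta, maximizing,
            successors, evaluate, is_terminal):
    """Minimax value of `state` with alpha-beta pruning (sketch)."""
    if depth == 0 or is_terminal(state):
        return evaluate(state)
    if maximizing:
        value = -math.inf
        for child in successors(state):
            value = max(value, minimax(child, depth - 1, alpha, beta, False,
                                       successors, evaluate, is_terminal))
            alpha = max(alpha, value)
            if beta <= alpha:   # MIN would never let play reach this branch
                break           # prune the remaining children
        return value
    value = math.inf
    for child in successors(state):
        value = min(value, minimax(child, depth - 1, alpha, beta, True,
                                   successors, evaluate, is_terminal))
        beta = min(beta, value)
        if beta <= alpha:       # MAX would never let play reach this branch
            break
    return value
```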
what is the backpropagation algorithm used for? | to find incremental adjustments to make to all the weights in a neural network
in general, how do we optimize an ML model? | - Minimize a loss function with respect to the parameters of the network - Minimize a cost function with respect to the parameters of the network |
Gradient Descent is a general algorithm to do what? | Find the minimum of a function
Why is a step function not an ideal activation function for a neural network? | It's not differentiable
T or F: Linear regression can be applied only on 2 dimensional data | false |
which best describes a loss function? | a measure of the imperfection of our prediction |
in the context of Machine learning, which best describes a Validation Set? | A partition of data used to tune hyper-parameters of a model before testing
what does alpha describe in the gradient descent update equation? | the learning rate |
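A minimal sketch of the update w ← w − α·∇L(w); the quadratic example function and starting point are made up for illustration:

```python
def gradient_descent(grad, w0, alpha=0.1, steps=100):
    """Repeatedly step against the gradient; alpha is the learning rate."""
    w = w0
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2*(w - 3); minimum is at w = 3.
print(gradient_descent(lambda w: 2 * (w - 3), w0=0.0))  # ~3.0
```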
in the context of machine learning, which statement best describes the purpose of a hypothesis function h(x)? | h(x) is an estimate of the underlying true function f(x) which relates features x to labels y
which best describes the intuition for k-nearest neighbors classification? | the set of k datapoints closest to a novel datapoint vote for its classification
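A small k-NN sketch, assuming Euclidean distance and a made-up toy dataset:

```python
import math
from collections import Counter

def knn_classify(query, data, k=3):
    """The k labeled points closest to `query` vote on its class."""
    by_distance = sorted(data, key=lambda pair: math.dist(query, pair[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

points = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_classify((1, 1), points, k=3))  # "A": two of the 3 nearest are A
```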
which of the following best describes the process of selecting a model for a machine learning problem? | model selection involves experimentation influenced by the problem, data, and evaluation
what is the backpropagation algorithm used to compute? | the influence that changing any weight of a neural network has on the prediction error
what is a valid interpretation of a trained perceptron's output? | a binary linear classifier |
which of the following best describes the intuition behind linear regression? | a hyperplane fit through a set of datapoints by minimizing the residuals between estimated and actual output
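A short illustration of that intuition: fitting a line by ordinary least squares (minimizing squared residuals) with NumPy on made-up data:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 7.1])           # roughly y = 2x + 1 plus noise

X = np.column_stack([x, np.ones_like(x)])    # add a bias column
w, b = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares fit
print(w, b)                                  # close to 2 and 1
```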
which of the following is not a category of ML? a. unsupervised learning b. reinforcement learning c. supervised learning d. harmonic learning | harmonic learning |
T or F: A neural network of sufficient size can in theory approximate any continuous function | true |
which of the following is a commonly used type of artificial neural network? | multi-layer perceptron |
in context of machine learning, which best describes the concept of Ockham's Razor? | given equal performance the least complex model is often preferred |
which describes a likely observation you could make on an overfit model? | it performs poorly on the validation data but well on the training data |
which of the following best describes the intuition behind an SVM? | a hyperplane divides data in such a way that maximizes the margin between the categories
what is the main reason why a step function is not an ideal activation function for a neural network? | it's not differentiable and thus incompatible with gradient-based optimization techniques
T or F: When approaching a ML problem, there's only one correct model to use to create accurate predictions | false |
for a perceptron, which of the following best describes a hard threshold activation? | a linear sum of weighted inputs is taken. if that sum exceeds a set value then the perceptron activates sending a fixed signal to its downstream connections |
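A minimal hard-threshold perceptron sketch; the weights and bias implementing logical AND are chosen by hand for illustration:

```python
def perceptron_output(weights, bias, inputs, threshold=0.0):
    """Fire (output 1) iff the weighted sum of inputs exceeds the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > threshold else 0

# Hand-set weights computing logical AND of two binary inputs.
print(perceptron_output([1.0, 1.0], -1.5, [1, 1]))  # 1
print(perceptron_output([1.0, 1.0], -1.5, [1, 0]))  # 0
```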
Which best describes the "Kernel Trick"? | A trick to find a decision boundary in a different coordinate space |
T or F: Multilayer neural networks can in theory approximate any continuous function | true
T or F: a ML model that is overfit to a dataset will generalize well | false
physical symbol system hypothesis: | the idea that manipulating symbols according to rules is sufficient for general intelligence. - top-down - rule-based |
Which hypothesis underlies “Good Old-Fashioned AI”? | Physical symbol system hypothesis
In connectionist AI systems, “learning” primarily means: | Adjusting neural-network parameters from data |
Connectionist AI: | - bottom-up, inductive approach. - Systems learn rules and patterns directly from data (observations) rather than being explicitly programmed. |
Which approach is top-down and rule-based? A. Symbolic AI B. Connectionist AI C. Reinforcement learning D. Evolutionary algorithms | A. Symbolic AI |
Natural Language Processing (NLP) is primarily concerned with: A. Visual scene understanding B. Physical robot control C. Understanding and generating human language D. Optimizing search algorithms | C. Understanding and generating human language |
Computer Vision: | Enabling agents to "see" and interpret visual information. |
The chain rule in language modeling expresses | joint probability as a product of conditional probabilities |
A bigram model relies on which assumption? | P(wᵢ|w₁…wᵢ₋₁) ≈ P(wᵢ|wᵢ₋₁) - a bigram is two units ex. 'th', or 'the cat' - Edge only from Wi−1 to Wi. |
Which n-gram model ignores word order entirely? | Unigram |
Corpus | Corpus: A collection of text or speech data used for analysis or training |
Why is the full joint distribution P(w₁…wₘ) infeasible to compute directly? | Table size grows as |V|ᵐ (explodes) |
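A toy sketch of the bigram workaround: maximum-likelihood estimates of P(wᵢ|wᵢ₋₁) from pair counts, on a made-up corpus:

```python
from collections import Counter

def bigram_probs(corpus):
    """Estimate P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1})."""
    tokens = corpus.split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {(a, b): c / unigrams[a] for (a, b), c in bigrams.items()}

probs = bigram_probs("the cat sat on the mat the cat ran")
print(probs[("the", "cat")])  # 2/3: "the" occurs 3 times, "the cat" twice
```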
Language identification via n-grams works by: A. Counting parts of speech B. Comparing sequence probability under each language model C. Parsing with a CFG D. Measuring sentence length | Comparing sequence probability under each language model |
In LLMs, the “context window” is: A. Number of GPUs used B. Maximum length of prefix tokens the model can attend to C. Batch size during training D. Size of the model vocabulary | Maximum length of prefix tokens the model can attend to |
Which stage of LLM training uses human-labelled prompt/output pairs to teach formatting? A. Pre-training B. Tokenization C. Instruction tuning D. Inference | Instruction tuning |
RLHF stands for: A. Reinforcement Learning from Human Feedback B. Recurrent Language Hierarchical Framework C. Randomized Learning Hyperparameter Fitting D. Rule-based Language Heuristic Fusion | Reinforcement Learning from Human Feedback
“Chain-of-thought” prompting aims to: A. Force shorter outputs B. Make the model list bullet points C. Have the model articulate its step-by-step reasoning D. Restrict vocabulary usage | Have the model articulate its step-by-step reasoning
Information Extraction | - Extract specific pieces of structured information from unstructured or semi-structured text - An example is extracting product names and their prices from websites |
Regex | A sequence of characters defining a search pattern (ex. format of a price or phone number) |
Risk of LLMs being prompted to perform information extraction: | LLMs may "hallucinate" or invent information that isn't actually present in the text, making them less reliable than regex for tasks requiring high accuracy
Information retrieval | find documents relevant to a user's query from a large collection (corpus) - ex. web search engines (google, bing, etc.)
components of information retrieval | - document collection: The large set of documents to search within - query: The user's expression of their information need - retrieval system: The algorithm/system that processes the query and returns a ranked subset of documents deemed relevant
In TF-IDF scoring, a term gets high weight if it is: A. Frequent in all documents B. Rare in the corpus but frequent in the current document C. Absent from the current document D. Only appears in stop-word list | Rare in the corpus but frequent in the current document: a term scores high if it is frequent in the doc but rare overall. Sum the scores of the query terms
TF-IDF | - TF: How often a term appears in the document. A high score suggests relevance - IDF: How rare a term is across the entire corpus. Rarer terms are considered more informative IDF(t) = log(N/df(t)), [N = total docs, df(t) = # docs containing term t]
PageRank ranks web pages based on: A. Term frequency B. Link structure (“importance” via incoming links) C. Document length D. Keyword density | Scores pages based on link structure ("popularity contest") |
In a TF-IDF scheme, what role does the IDF component play? | It downweights terms that appear in many documents, so common words carry less influence. (ex. "the", "and", etc) |
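A minimal TF-IDF sketch matching the formulas above; the three-document corpus is made up for illustration:

```python
import math

docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "dogs and cats are pets".split(),
]

def tf_idf(term, doc, docs):
    """TF (frequency in this doc) times IDF = log(N / df)."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)   # docs containing the term
    return tf * math.log(len(docs) / df)

print(tf_idf("cat", docs[0], docs))  # "cat" appears in 2 of 3 docs: lower IDF
print(tf_idf("mat", docs[0], docs))  # "mat" is corpus-rare: higher weight
```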
NLP Task: Syntactic Parsing | Goal: Analyze the grammatical structure of a sentence according to a formal grammar - relies on formal grammars, often CFGs, which define rules for how words (terminals) group into constituents (non-terminals) like noun phrases and verb phrases. |
Context-Free Grammar | A type of grammar where rules apply regardless of surrounding context In a plain CFG, if you have two ways to expand NP—say 1. NP → Det Noun (ex. "the cat") 2. NP → Name (ex. Sam) you don’t say which one is preferred; both parses are just “allowed.”
A Probabilistic CFG (PCFG) extends a CFG by | Assigning probabilities to each production rule - instead of treating every grammar rule as equally “possible,” a PCFG lets you say “Rule X is twice as likely as Rule Y.” |
Probabilistic CFG (PCFG) | Assigns probabilities to each grammar rule based on its observed frequency in a corpus
NP → Det Noun (tree) | NP ├── Det → “the” └── Noun → “dog”
Calculate the PCFG probability of "the cat": Rule probabilities: NP → Det Noun 0.6, Noun → "dog" 0.5, Det → “the” 1.0, Noun → “cat” 0.5 | Probability: 0.6 × 1.0 × 0.5 = 0.3 NP ├─ Det → “the” └─ Noun → “cat”
In a parse tree, “terminals” are: A. Non-terminal symbols like NP or VP B. Actual words of the sentence C. Probability values D. Grammar rules | - Actual words of a sentence (leaves of the parse tree) - the set of terminals is the lexicon/vocabulary |
Word embeddings differ from one-hot vectors because they are: A. Sparse and high-dimensional B. Dense and low-dimensional, learned to capture similarity C. Randomly assigned D. Always binary | - word embed: Dense and low-dimensional, learned to capture similarity - one-hot: very sparse and high dimensional, treats words as independent symbols |
Word-count (bag-of-words) example | Vocabulary: "cat" "dog" "mouse" Example sentence: “cat dog cat”, Indexing: cat→0, dog→1, mouse→2 Word-count vector: [2, 1, 0] What you see: "cat" appears twice, "dog" once, "mouse" zero times Drawback: word order is lost ("dog cat cat" gives the same vector)
One-Hot Encoding example | "cat" → [1,0,0] "dog" → [0,1,0] "mouse" → [0,0,1] every word is turned into a vector of length |V| = 3, with a single 1 at its index |
why are word embeddings better than one-hot encoding? | One-hot vectors are sparse and don't capture meaning. Word embeddings are better: each word is mapped to a dense, relatively low-dimensional vector whose values are learned during training. These embeddings often capture semantic relationships between words.
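A side-by-side sketch; the embedding values are invented purely to show that similarity becomes measurable (real embeddings are learned from data):

```python
import numpy as np

vocab = ["cat", "dog", "mouse"]

# One-hot: sparse, |V|-dimensional, every pair of words equally dissimilar.
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}
print(one_hot["cat"] @ one_hot["dog"])   # 0.0: no notion of similarity

# Toy "learned" embeddings: dense, low-dimensional (values made up here).
emb = {"cat": np.array([0.9, 0.1]),
       "dog": np.array([0.8, 0.2]),
       "mouse": np.array([0.2, 0.9])}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(emb["cat"], emb["dog"]))    # high: related words end up close
print(cosine(emb["cat"], emb["mouse"]))  # lower
```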
Softmax activation is used to: A. Normalize outputs into a probability distribution (sum=1) B. Compute the maximum activation only C. Introduce non-linearity by thresholding at 0 D. Pool features spatially | Normalize outputs into a probability distribution (sum=1) |
An RNN’s hidden state ht is updated by combining: A. Previous output only B. Previous hidden state ht₋₁ and current input embedding C. Unrelated random noise D. Future target tokens | Previous hidden state ht₋₁ and current input embedding |
RNNs often struggle to learn long-range dependencies due to: A. Exploding gradients only B. Vanishing gradients only C. Both vanishing and exploding gradients D. Lack of embeddings | Both vanishing and exploding gradients |
Greedy decoding always picks: A. A random next word B. The highest-probability next word C. The least probable next word D. A word based on TF-IDF | - The highest-probability next word - often leads to repetitive and deterministic output
How do RNNs build on fixed-window models? | - RNNs were developed to handle sequential data more effectively - added: a hidden state (hₜ) that is updated at each time step t based on the current input (Eₜ) and the previous hidden state (hₜ₋₁) - allows the network to maintain a summary of the sequence seen so far
Search algorithm components | states, actions, initial state, goal state, transition model, edge costs
temperature | the parameter controlling randomness - higher temp = more randomness (flattens distribution) - lower temp = more greedy (sharpens distribution) |
nucleus sampling (top-p) | - A common advanced sampling method that considers only the most probable words whose cumulative probability exceeds a threshold p - keeps the smallest set of tokens whose cumulative probability ≥ p - balances temperature by sampling over a core set (the nucleus) of the most probable next words
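A sketch combining temperature scaling with nucleus sampling over toy logits; the nucleus is the smallest prefix of probability-sorted tokens whose cumulative probability reaches p:

```python
import numpy as np

def sample_next(logits, temperature=1.0, top_p=0.9, seed=None):
    """Softmax with temperature, then sample from the top-p nucleus."""
    rng = np.random.default_rng(seed)
    probs = np.exp(logits / temperature)     # higher temp flattens,
    probs /= probs.sum()                     # lower temp sharpens
    order = np.argsort(probs)[::-1]          # most probable tokens first
    cumulative = np.cumsum(probs[order])
    nucleus = order[:np.searchsorted(cumulative, top_p) + 1]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize
    return rng.choice(nucleus, p=nucleus_probs)

logits = np.array([2.0, 1.0, 0.5, -1.0])     # made-up scores for 4 tokens
print(sample_next(logits, temperature=0.7, top_p=0.9))
```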
Rule-based chatbots rely on: A. Large neural networks B. Predefined patterns and templates (often regex) C. Probabilistic context-free grammars D. Reinforcement learning | Predefined patterns and templates (often regex) |
Conversational Agents | Agents interacting within a conversational environment using natural language |
two categories of conversational agents | -chatbots: designed for open-ended convo -task-oriented: designed to help user accomplish specific goals (e.g. Siri/Alexa, automated phone system) |
Corpus-based chatbot | retrieves responses from a large database of existing conversations (e.g. movie scripts, twitter data) |
Rule-based chatbot strength and weakness | - precise for inputs they recognize - fail on anything unexpected and require significant manual effort
Corpus-based chatbots strength and weakness | -handle more variety -lack control and can inherit biases or undesirable content from the training data |
Automatic Speech Recognition (ASR) | -Convert user's spoken audio into text ("utterance") -modern ASR uses deep learning (e.g. transformers) |
Natural Language Understanding (NLU) | Analyze the utterance text to determine the user's goal. steps: 1. Domain classification 2. Intent Determination 3. Slot filling |
NLU: Domain Classification | Identify the general topic (e.g. Weather, Music) |
NLU: Intent Determination | Identify the specific action requested within the domain (e.g. GetWeather, PlayMusic, etc.)
NLU: Slot filling | Extract specific parameters (slots) needed to fulfill the user's intent (e.g. Location="Boston", Date="Tomorrow" for GetWeather)
Computer vision | - extract high-level knowledge and understanding from visual data - focuses on enabling machines to interpret and understand information from images and videos |
computer vision relation to agent paradigm | computer vision provides the "perception" component, allowing agents to sense and interpret their visual environment to inform state representation and action selection
Digital pixel colors: Grayscale and Color | Grayscale: one value per pixel --> a single intensity value Color (RGB): typically 3 values per pixel
Haar Cascade classifier | - an algorithm for object detection, particularly face detection - uses rectangular features called Haar-like features which capture basic patterns of intensity differences in faces (e.g. the eye region is typically darker than the upper cheeks)
Convolutional Neural Networks (CNNs) | A CNN scans an image with small, learnable filters to pick out features like edges or textures, then uses pooling to shrink and summarize those feature maps before feeding them into a final classifier. |
What key operations are part of CNNs? | - convolution - pooling
convolution | - Apply small, learnable filters across the input image - Each filter slides over the image, computing dot products to detect patterns - Outputs a feature map showing where each pattern appears - Uses parameter sharing (same filter everywhere) to keep the model compact
Pooling | – Downsample each feature map by summarizing small regions (e.g., taking the maximum in each 2×2 block). – Reduces spatial dimensions and computation. – Introduces slight invariance to shifts or distortions in the input. |
A convolutional layer differs from a fully-connected layer by: A. Using hand-designed filters only B. Applying the same small filter (kernel) across the spatial dimensions (parameter sharing) C. Operating on one pixel at a time | Applying the same small filter (kernel) across the spatial dimensions (parameter sharing) |
Pooling layers (e.g., max-pooling) serve to: A. Increase feature map size B. Reduce spatial dimensions and add invariance to small shifts C. Normalize pixel intensities D. Learn filter weights | Reduce spatial dimensions and add invariance to small shifts |
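A from-scratch NumPy sketch of both operations, using a hand-picked 2×2 filter (real CNN libraries implement these far more efficiently, with learned filters):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: slide the same kernel over every position."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Downsample by taking the max of each size x size block."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(16.0).reshape(4, 4)
edge_filter = np.array([[1.0, -1.0], [1.0, -1.0]])  # responds to vertical edges
print(max_pool(conv2d(image, edge_filter)))
```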
AlexNet (2012): | A deep CNN architecture (8 layers: 5 convolutional, 3 fully connected) developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. |
Why is AlexNet so important? | -Dramatically outperformed all previous approaches (which used traditional computer vision methods) -ignited the modern deep learning era |
Moravec’s paradox states that tasks easy for humans (walking, grasping) are often harder for AI than: A. Simple lookup tables B. High-level abstract reasoning (e.g., chess) C. Calculating arithmetic D. Sorting numbers | High-level abstract reasoning (e.g., chess) |
The “localization” problem in robotics is about determining: | The robot’s current state (position/orientation/joint angles) |
A Kalman filter is used to: A. Generate trajectories B. Estimate the true state by filtering noisy sensor measurements C. Plan collision-free paths D. Optimize control gains offline | Estimate the true state by filtering noisy sensor measurements |
In an MDP for reinforcement learning, the agent aims to learn a policy that maximizes: A. Immediate reward only B. Cumulative (discounted) future rewards C. Total number of states visited D. Size of the action space | Cumulative (discounted) future rewards
PID Controller | tracks the error between the desired and actual states and adjusts its output using proportional, integral, and derivative terms to counter disturbances and accurately reach and maintain the target state. |
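A toy PID loop in Python; the gains and the one-line "plant" model are made up for illustration:

```python
def pid_step(error, prev_error, integral, kp, ki, kd, dt):
    """One PID update: kp*error + ki*integral(error) + kd*d(error)/dt."""
    integral += error * dt
    derivative = (error - prev_error) / dt
    return kp * error + ki * integral + kd * derivative, integral

state, integral, prev_error, dt = 0.0, 0.0, 0.0, 0.1
for _ in range(50):                       # drive the state toward 10.0
    error = 10.0 - state
    control, integral = pid_step(error, prev_error, integral,
                                 kp=0.5, ki=0.1, kd=0.05, dt=dt)
    state += control * dt                 # toy plant: velocity = control
    prev_error = error
print(round(state, 2))                    # approaches the setpoint of 10.0
```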
Kalman filters algorithm | used to estimate the true underlying state by statistically averaging noisy measurements over time, producing a smoother, more reliable signal. |
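A scalar Kalman-filter sketch with assumed process and measurement noise variances, smoothing noisy readings of a constant true value:

```python
import numpy as np

def kalman_1d(measurements, q=1e-4, r=0.5):
    """Estimate a (nearly) constant hidden value from noisy measurements.

    q = process-noise variance, r = measurement-noise variance (assumed known).
    """
    x, p = 0.0, 1.0                 # state estimate and its variance
    for z in measurements:
        p += q                      # predict: uncertainty grows slightly
        k = p / (p + r)             # Kalman gain: how much to trust z
        x += k * (z - x)            # update estimate toward the measurement
        p *= 1 - k                  # uncertainty shrinks after the update
    return x

rng = np.random.default_rng(0)
noisy = 5.0 + rng.normal(0, 0.7, size=100)  # true value 5.0 plus sensor noise
print(kalman_1d(noisy))                     # close to 5.0
```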
Control Theory: | A field dedicated to designing control systems that handle uncertainty and ensure desired behavior despite noise and disturbances. - ex. PID controllers, Reinforcement Learning (RL)
Reinforcement Learning (RL) | Agent learns through trial-and-error interaction with an environment, receiving feedback in the form of rewards or punishments, without explicit data/label pairs like supervised learning. |
Markov Decision Process (MDP) | The standard mathematical framework for RL problems -defined by: States (S), Action (A), Transition Probabilities, Reward, Discount Factor |
MDP Policy (π(s)→a) | A function mapping states to actions. RL aims to find the optimal policy π∗ that maximizes expected discounted future rewards.
Reinforcement Learning (RL) vs Supervised Learning (SL) | In SL, a model learns to map inputs to known outputs by minimizing prediction error on labeled training data. In RL, an agent learns through trial and error by interacting with an environment and optimizing its actions to maximize cumulative reward.
Deep Reinforcement Learning (Deep RL) | combines reinforcement learning with deep neural networks, using networks to approximate policies or value functions so agents can handle very large or continuous state/action spaces. |
How to train Deep RL | agent interacts with the environment using its current neural network policy. The collected experiences (state, action, reward, next state) are used to update the network parameters (θ) via gradient-based optimization, aiming to improve expected rewards. |
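A tabular Q-learning sketch, the simpler cousin of the deep-RL loop above (the Q-table here is what a neural network would approximate); the corridor environment is made up for illustration:

```python
import random

# Toy 1-D corridor: states 0..4, reward only for reaching the goal state 4.
n_states, actions = 5, [-1, +1]             # move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1       # step size, discount, exploration

for _ in range(500):                        # episodes of trial and error
    s = 0
    while s != 4:
        a = (random.choice(actions) if random.random() < epsilon
             else max(actions, key=lambda a: Q[(s, a)]))
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == 4 else 0.0
        best_next = max(Q[(s2, a2)] for a2 in actions)
        # Move Q(s,a) toward the target r + gamma * max_a' Q(s',a').
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

print(max(actions, key=lambda a: Q[(0, a)]))  # learned action at state 0: +1
```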
Adversarial Search | deals with multi-agent environments where other agents are actively trying to prevent the agent from reaching its goal -ex. board games |
A* search time and space complexity | Both exponential |
uniform cost search time and space complexity | both exponential
Uniform Cost Search | search algorithm that finds the path with the lowest cumulative cost
DFS time and space complexity | Time: exponential, O(b^m); Space: linear, O(bm), where m is the maximum depth
BFS vs DFS | BFS: explores level by level DFS: explores as deep as possible branch by branch |
utility-based agents | Extend goal-based agents by adding a utility function to measure how well a state achieves the goal. |
rational agent | defined as one that selects actions expected to maximize its performance measure, given its perceptions and built-in knowledge. |
the perceptron model | mathematical model of a biological neuron, a foundational element of artificial neural networks. - developed by Frank Rosenblatt in 1958
Artificial Neural Network (Multilayer Perceptrons) | -The output of one layer of Perceptrons serves as the input to the next layer. -This interconnected structure allows ANNs to represent much more complex functions than a single Perceptron. |