383 Final
Quiz yourself: try to recall what belongs in each hidden card side (marked with …) before checking.

Q: What would it mean for a machine to pass the Turing Test?
A: …

Q: What is the primary advantage of the rational-agent approach for scientific research?
A: …

Q: …
A: A machine that can pick up objects using eating utensils

Q: …
A: Moravec's Paradox highlights that AI systems easily outperform humans on abstract, logic-based tasks (like chess or arithmetic) yet struggle with "simple" skills (like recognizing objects) that evolution has optimized over millions of years.

Q: What is the difference between the Tree-Search and Graph-Search algorithms?
A: …

Q: T or F: A* search is always optimal.
A: …

Q: In what year was the field of AI officially founded?
A: …

Q: …
A: The study of Intelligent Agents

Q: What are the inputs to a search problem?
a. Initial State
b. Goal State
c. Action cost function
d. Transition Model
e. All of the above
A: …

Q: T or F: Adversarial search is exactly the same as performing regular search in a multi-agent environment.
A: …

Q: What is the name of the computer that famously defeated world champion Garry Kasparov in chess in 1997?
A: …

Q: …
A: Space: exponential; Time: exponential

Q: What is the main way that graph-search algorithms differ?
A: …

Q: …
A: Heuristic function

Q: …
A: A sequence of actions to reach the goal state

Q: …
A: A goal-based agent selects actions to achieve a specific goal, while a utility-based agent selects actions based on how desirable they are.

Q: In what way does online search demonstrate an example of machine learning?
A: …

Q: …
A: Alpha-beta

Q: …
A: The goal test checks whether all states in the belief state are goal states.

Q: Which of the following tasks would always involve non-deterministic actions?
A: …

Q: …
A: Any system capable of general intelligence must operate on symbols and symbolic manipulation.

Q: Which best describes the argument in Rodney Brooks's influential 1990 paper "Elephants Don't Play Chess"?
A: …

Q: In the field of AI, what are the two necessary components of an agent?
A: …

Q: Is the heuristic h(n) = 0 admissible, given non-negative edge weights?
A: …

Q: What are the four environment assumptions needed for proper execution of "pure" search algorithms?
A: …

Q: …
A: A reflex agent is rational if it selects actions that maximize its performance measure based on its current percept and knowledge, given its environment and actions. Ex.: a thermostat controlling a heating system is rational if it turns the heat on when the temperature is below the set point.

Q: …
A: The critical assumption is that your opponent (MIN) will take the optimal move at every step to minimize MAX's score. If this is not true, the algorithm may not yield optimal actions.

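The card above describes minimax's optimal-opponent assumption. A minimal sketch of that idea in Python, assuming a hypothetical game interface with is_terminal, utility, successors, and to_move helpers (none of these names come from the cards):

```python
def minimax(state, game):
    """Return the value of `state`, assuming MAX and MIN both play optimally.

    `game` is a hypothetical interface (an assumption) providing:
      is_terminal(s), utility(s), successors(s) -> iterable of (action, s'),
      to_move(s) -> "MAX" or "MIN".
    """
    if game.is_terminal(state):
        return game.utility(state)
    values = [minimax(s, game) for _, s in game.successors(state)]
    # MAX picks the largest value; the optimal opponent (MIN) picks the smallest.
    return max(values) if game.to_move(state) == "MAX" else min(values)
```
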
Q: …
A: To find incremental adjustments to make to all the weights in a neural network

Q: In general, how do we optimize an ML model?
A: …

Q: Gradient descent is a general algorithm to do what?
A: …

Q: Why is a step function not an ideal activation function for a neural network?
A: …

Q: …
A: False

Q: Which best describes a loss function?
A: …

Q: In the context of machine learning, which best describes a validation set?
A: …

Q: What does alpha describe in the gradient descent update equation?
A: …

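Several of the cards above concern gradient descent and the learning rate alpha. A minimal sketch of the update rule w ← w − α·∇L(w); the quadratic loss in the demo is an illustrative assumption:

```python
def gradient_descent(grad, w, alpha=0.1, steps=100):
    """Repeatedly step opposite the gradient: w <- w - alpha * grad(w).

    alpha is the learning rate (the step size referred to on the card above).
    """
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w

# Illustrative example (assumption): minimize L(w) = (w - 3)^2, gradient 2(w - 3).
w_star = gradient_descent(lambda w: 2 * (w - 3), w=0.0)
print(w_star)  # converges toward 3.0
```
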
Q: …
A: h(x) is an estimate of the underlying true function f(x), which relates features x to labels y.

Q: Which best describes the intuition for k-nearest-neighbors classification?
A: …

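The k-NN answer cell is hidden; as a generic illustration of the usual intuition (label a point by majority vote among its k closest training points), here is a minimal sketch. The toy data and helper names are assumptions:

```python
from collections import Counter

def knn_classify(x, train, k=3):
    """Label `x` by majority vote among the k closest training examples.

    `train` is a list of (point, label) pairs; points are tuples of floats.
    """
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda pl: dist(x, pl[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy data (assumption): two "cat" points near (1,1), one "dog" point far away.
print(knn_classify((0.9, 1.1), [((1, 1), "cat"), ((1, 2), "cat"), ((5, 5), "dog")]))
```
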
Q: …
A: Model selection involves experimentation influenced by the problem, data, and evaluation.

Q: What is the backpropagation algorithm used to compute?
A: …

Q: …
A: A binary linear classifier

Q: Which of the following best describes the intuition behind linear regression?
A: …

Q: …
A: Harmonic learning

Q: …
A: True

Q: …
A: Multi-layer perceptron

Q: In the context of machine learning, which best describes the concept of Ockham's razor?
A: …

Q: Which describes a likely observation you could make on an overfit model?
A: …

Q: …
A: A hyperplane divides the data in such a way that maximizes the margin between the categories.

Q: …
A: It's not differentiable and is thus incompatible with gradient-based optimization techniques.

Q: …
A: False

Q: …
A: A linear sum of weighted inputs is taken; if that sum exceeds a set value, the perceptron activates, sending a fixed signal to its downstream connections.

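The answer above states the perceptron's threshold rule. Below is a minimal sketch of that rule together with the classic perceptron weight update; the update rule and the AND-function demo are standard background assumptions, not taken from the cards:

```python
def perceptron_output(weights, bias, x):
    """Fire (1) if the weighted sum of inputs exceeds the threshold, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def perceptron_train(data, n_inputs, alpha=0.1, epochs=20):
    """Classic perceptron learning rule: nudge weights toward misclassified points."""
    w, b = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for x, y in data:  # y is 0 or 1
            err = y - perceptron_output(w, b, x)
            w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
            b += alpha * err
    return w, b

# Learns the (linearly separable) logical AND function.
w, b = perceptron_train([((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)], n_inputs=2)
print([perceptron_output(w, b, x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 0, 0, 1]
```
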
Q: …
A: A trick to find a decision boundary in a different coordinate space

Q: T or F: Multilayer neural networks can in theory approximate any continuous function.
A: …

Q: …
A: False

Q: Physical symbol system hypothesis:
A: …

Q: …
A: Physical symbol system hypothesis

Q: …
A: Adjusting neural-network parameters from data

Q: Connectionist AI:
A: …

Q: …
A: A. Symbolic AI

Q: …
A: C. Understanding and generating human language

Q: …
A: Enabling agents to "see" and interpret visual information.

Q: The chain rule in language modeling expresses …
A: …

Q: …
A: P(wᵢ | w₁…wᵢ₋₁) ≈ P(wᵢ | wᵢ₋₁)
- A bigram is two units, e.g. "th" or "the cat".
- There is an edge only from wᵢ₋₁ to wᵢ.

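The bigram card approximates the chain rule P(w₁…wₙ) = ∏ᵢ P(wᵢ | w₁…wᵢ₋₁) by conditioning only on the previous word. A minimal count-based sketch; the toy corpus is an assumption and no smoothing is applied:

```python
from collections import Counter

corpus = "the cat sat on the mat . the cat ate .".split()  # toy corpus (assumption)
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def p_bigram(w_prev, w):
    """Maximum-likelihood estimate P(w | w_prev) = count(w_prev, w) / count(w_prev)."""
    return bigrams[(w_prev, w)] / unigrams[w_prev]

# P("the cat sat") ≈ P(cat | the) * P(sat | cat), ignoring the start-of-sentence term.
print(p_bigram("the", "cat") * p_bigram("cat", "sat"))
```
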
Q: …
A: Unigram

Q: …
A: Corpus: a collection of text or speech data used for analysis or training

Q: …
A: Table size grows as |V|ᵐ (explodes)

Q: Language identification via n-grams works by:
A. Counting parts of speech
B. Comparing sequence probability under each language model
C. Parsing with a CFG
D. Measuring sentence length
A: …

Q: …
A: Maximum length of prefix tokens the model can attend to

Q: …
A: Instruction tuning

Q: RLHF stands for:
A. Reinforcement Learning from Human Feedback
B. Recurrent Language Hierarchical Framework
C. Randomized Learning Hyperparameter Fitting
D. Rule-based Language Heuristic Fusion
A: A. Reinforcement Learning from Human Feedback

Q: …
A: Have the model articulate its step-by-step reasoning

Q: …
A: - Extract specific pieces of structured information from unstructured or semi-structured text.
- An example is extracting product names and their prices from websites.

Q: …
A: A sequence of characters defining a search pattern (e.g., the format of a price or phone number)

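A minimal regex sketch of the price example mentioned in the answer above; the pattern and sample text are illustrative assumptions:

```python
import re

text = "Widget A costs $19.99, Widget B is on sale for $4.50."
# Match a dollar sign followed by digits, with an optional two-digit cent part.
prices = re.findall(r"\$\d+(?:\.\d{2})?", text)
print(prices)  # ['$19.99', '$4.50']
```
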
Q: Risk of LLMs being prompted to perform information extraction:
A: …

Q: Information retrieval
A: …

Q: Components of information retrieval
A: …

Q: In TF-IDF scoring, a term gets high weight if it is:
A. Frequent in all documents
B. Rare in the corpus but frequent in the current document
C. Absent from the current document
D. Only appears in the stop-word list
A: …

Q: …
A: - TF: how often a term appears in the document. A high score suggests relevance.
- IDF: how rare a term is across the entire corpus. Rarer terms are considered more informative.
IDF(t) = log(N / df(t)), where N = total number of docs and df(t) = number of docs containing term t.

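A minimal sketch of TF-IDF scoring using the IDF formula from the card; the three-document corpus is an illustrative assumption:

```python
import math

docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "stocks rallied on earnings".split(),
]

def tf_idf(term, doc, docs):
    """TF-IDF = (count of term in doc) * log(N / df(t)), per the card's IDF formula."""
    tf = doc.count(term)
    df = sum(1 for d in docs if term in d)
    return tf * math.log(len(docs) / df) if df else 0.0

# "cat" appears in 2 of 3 docs; "stocks" in only 1, so "stocks" gets more weight.
print(tf_idf("cat", docs[0], docs), tf_idf("stocks", docs[2], docs))
```
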
Q: PageRank ranks web pages based on:
A. Term frequency
B. Link structure ("importance" via incoming links)
C. Document length
D. Keyword density
A: …

Q: In a TF-IDF scheme, what role does the IDF component play?
A: …

Q: NLP Task: Syntactic Parsing
A: …

Q: …
A: A type of grammar where rules apply regardless of surrounding context. In a plain CFG, if you have two ways to expand NP, say
1. NP → Det Noun (e.g., "the cat")
2. NP → Name (e.g., "Sam")
you don't say which one is preferred; both parses are just "allowed."

Q: A Probabilistic CFG (PCFG) extends a CFG by …
A: …

Q: …
A: Assigns probabilities to each grammar rule based on its observed frequency in a corpus

Q: …
A: NP
   ├── Det → "the"
   └── Noun → "dog"

Q: Calculate the PCFG probability of "the cat", given these rules:
NP → Det Noun (0.6)
NP → "dog" (0.5)
Det → "the" (1.0)
Det → "cat" (0.5)
A: …

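The answer cell is hidden, but a PCFG parse probability is the product of the probabilities of the rules used in the parse tree. Assuming the intended parse of "the cat" is NP → Det Noun with Det → "the", and the listed 0.5 rule yielding "cat" (it reads as if it were meant to be a Noun rule), the worked calculation from the listed numbers is:

```latex
P(\text{``the cat''})
  = P(\mathrm{NP} \to \mathrm{Det}\ \mathrm{Noun})
    \cdot P(\mathrm{Det} \to \text{``the''})
    \cdot P(\cdot \to \text{``cat''})
  = 0.6 \times 1.0 \times 0.5
  = 0.3
```
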
Q: In a parse tree, "terminals" are:
A. Non-terminal symbols like NP or VP
B. Actual words of the sentence
C. Probability values
D. Grammar rules
A: …

Q: Word embeddings differ from one-hot vectors because they are:
A. Sparse and high-dimensional
B. Dense and low-dimensional, learned to capture similarity
C. Randomly assigned
D. Always binary
A: …

Q: Word embedding example
A: …

Q: …
A: "cat" → [1, 0, 0]
"dog" → [0, 1, 0]
"mouse" → [0, 0, 1]
Every word is turned into a vector of length |V| = 3, with a single 1 at its index.

Q: …
A: One-hot vectors are sparse and don't capture meaning. Word embeddings are better: each word is mapped to a dense, relatively low-dimensional vector whose values are learned during training. These embeddings often capture semantic relationships between words.

Q: …
A: Normalize outputs into a probability distribution (sum = 1)

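A minimal sketch of that normalization, softmax(zᵢ) = exp(zᵢ) / Σⱼ exp(zⱼ); subtracting the max before exponentiating is a standard numerical-stability detail not stated on the card:

```python
import math

def softmax(z):
    """Exponentiate and normalize so the outputs are positive and sum to 1."""
    m = max(z)                         # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2.0, 1.0, 0.1]))  # roughly [0.659, 0.242, 0.099]; sums to 1
```
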
Q: An RNN's hidden state hₜ is updated by combining:
A. Previous output only
B. Previous hidden state hₜ₋₁ and current input embedding
C. Unrelated random noise
D. Future target tokens
A: …

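A minimal sketch of the vanilla recurrence hₜ = tanh(W_h·hₜ₋₁ + W_x·xₜ + b), combining the previous hidden state with the current input embedding; the tanh nonlinearity, weight shapes, and toy sizes are standard background assumptions, not from the card:

```python
import numpy as np

def rnn_step(h_prev, x, W_h, W_x, b):
    """One recurrence: combine the previous hidden state with the current input."""
    return np.tanh(W_h @ h_prev + W_x @ x + b)

rng = np.random.default_rng(0)
H, E = 4, 3                                   # hidden size, embedding size (toy values)
W_h, W_x, b = rng.normal(size=(H, H)), rng.normal(size=(H, E)), np.zeros(H)
h = np.zeros(H)
for x in rng.normal(size=(5, E)):             # run over a 5-token toy sequence
    h = rnn_step(h, x, W_h, W_x, b)
print(h)
```
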
Q: …
A: Both vanishing and exploding gradients

Q: Greedy decoding always picks:
A. A random next word
B. The highest-probability next word
C. The least probable next word
D. A word based on TF-IDF
A: …

Q: How do RNNs build off of fixed-window models?
A: …

Q: Search algorithm components
A: …

Q: …
A: The parameter controlling randomness.
- Higher temperature = more randomness (flattens the distribution)
- Lower temperature = more greedy (sharpens the distribution)

Q: …
A: - A common advanced sampling method that considers only the most probable words whose cumulative probability exceeds a threshold p.
- Keeps all tokens up to cumulative probability ≥ p.
- Balances temperature by sampling over a core set (the "nucleus") of the most probable next words.

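A minimal sketch combining the two cards above: temperature rescales the logits before the softmax, and nucleus (top-p) sampling keeps the smallest set of most probable words whose cumulative mass reaches p. The toy logits are an assumption:

```python
import math, random

def sample_top_p(logits, p=0.9, temperature=1.0):
    """Temperature-scaled softmax, then sample from the nucleus of mass >= p."""
    scaled = {w: l / temperature for w, l in logits.items()}  # temp<1 sharpens, >1 flattens
    m = max(scaled.values())
    exps = {w: math.exp(s - m) for w, s in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((e / total, w) for w, e in exps.items()), reverse=True)
    nucleus, mass = [], 0.0
    for prob, w in ranked:               # keep the most probable words until mass >= p
        nucleus.append((w, prob))
        mass += prob
        if mass >= p:
            break
    return random.choices([w for w, _ in nucleus], [pr for _, pr in nucleus])[0]

logits = {"cat": 2.0, "dog": 1.5, "mat": 0.2}  # toy next-word scores (assumption)
print(sample_top_p(logits, p=0.9, temperature=0.8))
```
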
Q: …
A: Predefined patterns and templates (often regex)

Q: Conversational Agents
A: …

Q: …
A: - Chatbots: designed for open-ended conversation
- Task-oriented: designed to help the user accomplish specific goals (e.g., Siri/Alexa, automated phone systems)

Q: Corpus-based chatbot
A: …

Q: Rule-based chatbot strength and weakness
A: …

Q: …
A: - Handle more variety
- Lack control and can inherit biases or undesirable content from the training data

Q: Automatic Speech Recognition (ASR)
A: …

Q: Natural Language Understanding (NLU)
A: …

Q: NLU: Domain Classification
A: …

Q: …
A: Identify the specific action requested within the domain (e.g., GetWeather, PlayMusic, etc.)

Q: NLU: Slot filling
A: …

Q: …
A: - Extract high-level knowledge and understanding from visual data
- Focuses on enabling machines to interpret and understand information from images and videos

Q: How does computer vision relate to the agent paradigm?
A: …

Q: Digital pixel colors: Grayscale and Color
A: …

Q: …
A: - An algorithm for object detection, particularly face detection
- Uses rectangular features called Haar-like features, which capture basic patterns of intensity differences in faces (e.g., the eye region is typically darker than the upper cheeks)

Q: Convolutional Neural Networks (CNNs)
A: …

Q: …
A: - Convolution
- Pooling

Q: Convolution
A: …

Q: …
A: - Downsample each feature map by summarizing small regions (e.g., taking the maximum in each 2×2 block).
- Reduces spatial dimensions and computation.
- Introduces slight invariance to shifts or distortions in the input.

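A minimal NumPy sketch of the two CNN operations named above: sliding a small filter over an image (convolution) and taking the max of each 2×2 block (pooling). The 4×4 image and the edge filter are illustrative assumptions:

```python
import numpy as np

def convolve2d(img, kernel):
    """Slide `kernel` over `img` (no padding, stride 1), summing elementwise products."""
    kh, kw = kernel.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(fmap):
    """Downsample by taking the maximum of each non-overlapping 2x2 block."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    return fmap[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.array([[0, 0, 1, 1]] * 4, dtype=float)   # toy image: dark left, bright right
edge = np.array([[-1.0, 1.0]])                    # 1x2 vertical-edge filter (assumption)
fmap = convolve2d(img, edge)                      # responds at the dark/bright boundary
print(max_pool2x2(fmap))
```
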
Q: A convolutional layer differs from a fully-connected layer by:
A. Using hand-designed filters only
B. Applying the same small filter (kernel) across the spatial dimensions (parameter sharing)
C. Operating on one pixel at a time
A: …

Q: …
A: Reduce spatial dimensions and add invariance to small shifts

Q: …
A: A deep CNN architecture (8 layers: 5 convolutional, 3 fully connected) developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton.

Q: Why is AlexNet so important?
A: …

Q: Moravec's paradox states that tasks easy for humans (walking, grasping) are often harder for AI than:
A. Simple lookup tables
B. High-level abstract reasoning (e.g., chess)
C. Calculating arithmetic
D. Sorting numbers
A: …

Q: …
A: The robot's current state (position/orientation/joint angles)

Q: …
A: Estimate the true state by filtering noisy sensor measurements

Q: …
A: Cumulative (discounted) future rewards

Q: …
A: Tracks the error between the desired and actual states and adjusts its output using proportional, integral, and derivative terms to counter disturbances and accurately reach and maintain the target state.

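A minimal sketch of that loop, u = Kp·e + Ki·∫e dt + Kd·de/dt, driving a toy first-order system toward a setpoint; the gains and the plant model are illustrative assumptions:

```python
def pid_controller(kp, ki, kd, dt):
    """Return a stateful step(error) function implementing the P, I, and D terms."""
    integral, prev_err = 0.0, 0.0
    def step(err):
        nonlocal integral, prev_err
        integral += err * dt                    # I: accumulated error
        deriv = (err - prev_err) / dt           # D: rate of change of the error
        prev_err = err
        return kp * err + ki * integral + kd * deriv
    return step

setpoint, state, dt = 1.0, 0.0, 0.1
control = pid_controller(kp=2.0, ki=0.5, kd=0.1, dt=dt)
for _ in range(50):
    state += control(setpoint - state) * dt     # toy plant: state follows the control input
print(round(state, 3))                          # approaches the setpoint, 1.0
```
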
Q: Kalman filter algorithm
A: …

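The answer cell is hidden; as a generic illustration of "estimating the true state by filtering noisy sensor measurements" (an earlier card's phrasing), here is a minimal one-dimensional Kalman filter sketch. The static-state setup and noise variances are illustrative assumptions:

```python
def kalman_1d(measurements, meas_var=1.0, process_var=0.01):
    """Estimate a scalar state: predict (inflate uncertainty), then update on each z."""
    x, p = 0.0, 1e6                  # initial guess with very large uncertainty
    for z in measurements:
        p += process_var             # predict: uncertainty grows between steps
        k = p / (p + meas_var)       # Kalman gain: how much to trust the measurement
        x += k * (z - x)             # update: move the estimate toward the measurement
        p *= 1 - k
    return x

print(kalman_1d([5.2, 4.8, 5.1, 4.9, 5.0]))  # converges near the true value, ~5
```
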
Q: …
A: A field dedicated to designing control systems that handle uncertainty and ensure desired behavior despite noise and disturbances. Ex.: PID controllers, Reinforcement Learning (RL)

Q: …
A: The agent learns through trial-and-error interaction with an environment, receiving feedback in the form of rewards or punishments, without the explicit data/label pairs used in supervised learning.

Q: Markov Decision Process (MDP)
A: …

Q: …
A: A function mapping states to actions. RL aims to find the optimal policy π* that maximizes expected discounted future rewards.

Q: Reinforcement Learning (RL) vs Supervised Learning (SL)
A: …

Q: Deep Reinforcement Learning (Deep RL)
A: …

Q: …
A: The agent interacts with the environment using its current neural-network policy. The collected experiences (state, action, reward, next state) are used to update the network parameters (θ) via gradient-based optimization, aiming to improve expected rewards.

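As a concrete instance of learning from (state, action, reward, next state) experiences, here is a minimal tabular Q-learning sketch; it is tabular rather than deep, and the tiny chain environment is an illustrative assumption:

```python
import random
from collections import defaultdict

# Toy chain environment (assumption): states 0..3, actions -1/+1, reward 1 at state 3.
def env_step(s, a):
    s2 = max(0, min(3, s + a))
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

Q = defaultdict(float)
alpha, gamma, eps = 0.5, 0.9, 0.1
for _ in range(300):
    s = 0
    for _ in range(50):                                   # cap episode length
        if random.random() < eps:
            a = random.choice([-1, 1])                    # explore
        else:
            a = max([-1, 1], key=lambda act: Q[(s, act)]) # exploit
        s2, r, done = env_step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        best_next = 0.0 if done else max(Q[(s2, -1)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2
        if done:
            break
print([round(max(Q[(s, -1)], Q[(s, 1)]), 2) for s in range(4)])  # values rise toward the goal
```
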
Q: …
A: Deals with multi-agent environments where other agents are actively trying to prevent the agent from reaching its goal. Ex.: board games

Q: A* search time and space complexity
A: …

Q: Uniform cost search time and space complexity
A: …

Q: Uniform Cost Search
A: …

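The UCS answer cells are hidden; as a generic illustration of the algorithm itself (always expand the frontier node with the lowest path cost), here is a minimal sketch using a priority queue. The example graph is an assumption:

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """Expand the cheapest frontier node first; return (cost, path) or None."""
    frontier = [(0, start, [start])]          # entries are (path cost g, state, path)
    explored = set()
    while frontier:
        g, s, path = heapq.heappop(frontier)
        if s == goal:
            return g, path
        if s in explored:                     # graph search: skip already-expanded states
            continue
        explored.add(s)
        for nbr, cost in graph.get(s, []):
            heapq.heappush(frontier, (g + cost, nbr, path + [nbr]))
    return None

# Toy weighted graph (assumption): the cheapest A->D route is A->B->D with cost 3.
graph = {"A": [("B", 1), ("C", 4)], "B": [("D", 2)], "C": [("D", 1)]}
print(uniform_cost_search(graph, "A", "D"))   # (3, ['A', 'B', 'D'])
```
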
Q: DFS time and space complexity
A: …

Q: BFS vs DFS
A: …

Q: Utility-based agents
A: …

Q: …
A: Defined as one that selects actions expected to maximize its performance measure, given its perceptions and built-in knowledge.

Q: The perceptron model
A: …

Q: …
A: - The output of one layer of perceptrons serves as the input to the next layer.
- This interconnected structure allows ANNs to represent much more complex functions than a single perceptron.