383 Final

Term Definition
What would it mean for a machine to pass the Turing Test? That a human cannot distinguish between that machine and another human over a conversation
What is the primary advantage of the rational agent approach for the purpose of science research? It provides a framework for creating machines whose "intelligence" is measurable and comparable
Following Moravec's Paradox, which of these tasks would require the most computational power? a. A program that solves tic-tac-toe b. A machine that can pick up objects using eating utensils c. A program that can solve math equations A machine that can pick up objects using eating utensils
What is Moravec’s Paradox? Moravec’s Paradox highlights that AI systems easily outperform humans on abstract, logic‑based tasks (like chess or arithmetic) yet struggle with “simple” skills (like recognizing objects) that evolution has optimized over millions of years.
What is the difference between Tree-Search Algorithm and Graph-Search Algorithm? Graph Search algorithms keep track of explored nodes to prevent cycles
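The explored-set idea can be sketched in a few lines. This is a minimal illustration only: the function name `graph_search_bfs`, the `toy_graph` data, and the choice of breadth-first expansion are assumptions made for the example, not part of the course material.

```python
from collections import deque

def graph_search_bfs(start, goal, neighbors):
    """Breadth-first graph search; returns a list of states or None."""
    frontier = deque([start])
    parent = {start: None}  # doubles as the set of already-reached states
    while frontier:
        state = frontier.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in neighbors(state):
            # Skipping reached states is what keeps the search out of cycles.
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None

# Hypothetical toy graph containing the cycle A -> B -> A.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": []}
path = graph_search_bfs("A", "C", lambda s: toy_graph[s])
```

A tree-search version would drop the `parent` membership check and could revisit A and B indefinitely on this graph.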
T or F: A* search is always optimal False
In what year was the field of AI officially founded? 1956
Which of the following best describes how leading textbooks define the study of AI? The study of Intelligent Agents
What are the inputs to a search problem? a. Initial State b. Goal State c. Action cost function d. Transition Model e. All of the above initial state, goal state, action cost function, transition model (all of the above)
T or F: Adversarial Search is exactly the same as performing regular search in a multi-agent environment? False
What is the name of the computer that famously defeated world champion Garry Kasparov in chess in 1997? Deep Blue
What are the space and time complexities of BFS? Space: Exponential, Time: Exponential
What is the main way that graph search algorithms differ? By how they select the next node for exploration
Which element of search algorithms is unique to informed search strategies? Heuristic function
What is the output of a general search algorithm? A sequence of actions to reach the goal state
What is the key difference between a goal-based agent and a utility-based agent? A goal-based agent selects actions to achieve a specific goal, while a utility-based agent selects actions based on how desirable they are
In what way does online search demonstrate an example of machine learning? The agent learns the transition model of a problem after completing online search
Which of the following is an example of a pruning algorithm? Alpha-beta
In blind search problem formulation, how is the goal test changed? The goal test checks if all states in the belief state are goal states
Which of the following tasks would always involve non-deterministic actions? off-road driving
Which of the following best describes the Physical Symbol System Hypothesis? Any system capable of general intelligence must operate on symbols and symbolic manipulation
Which best describes the argument in Rodney Brooks's influential 1990 paper "Elephants Don't Play Chess"? Elephants don't play chess but that does not mean they are unintelligent and lack behavior worth studying
In the field of AI, what are the two necessary components of an agent? Sensors and Actuators
Is the heuristic function h(n) = 0 admissible given non-negative edge weights? True (it never overestimates the remaining cost)
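A small A* sketch can make the admissibility point concrete. The graph `toy_edges`, the function name `a_star`, and both heuristics below are hypothetical choices for illustration. With h(n) = 0 the search behaves like uniform cost search and returns the cheapest path; with a heuristic that overestimates at one node, it can return a worse one.

```python
import heapq

def a_star(start, goal, edges, h):
    """edges maps state -> list of (next_state, cost). Returns (cost, path) or None."""
    frontier = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        if g > best_g.get(state, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for nxt, cost in edges.get(state, []):
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt, path + [nxt]))
    return None

# Hypothetical weighted graph; the cheapest S -> G route is S, A, B, G with cost 3.
toy_edges = {"S": [("A", 1), ("B", 4)], "A": [("B", 1), ("G", 5)], "B": [("G", 1)]}

admissible = a_star("S", "G", toy_edges, h=lambda s: 0)
# Overestimating at B (true remaining cost is 1) makes the heuristic inadmissible.
inadmissible = a_star("S", "G", toy_edges, h=lambda s: 10 if s == "B" else 0)
```

Here the inadmissible run steers away from B and settles for the direct A to G edge, which is why A* is only guaranteed optimal under conditions on the heuristic.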
What are the four environment assumptions needed for proper execution of "pure" search algorithms? Deterministic, Observable, Discrete, Known
What conditions must be met in order for a reflex agent to be rational? A reflex agent is rational if it selects actions that maximize its performance measure based on its current percept and knowledge, given its environment and actions. ex. a thermostat controlling a heating system is rational if it turns on when temp is below
what is the critical assumption of the minimax algorithm? If that assumption were not true, would the algorithm still yield the optimal action to take? - the critical assumption is that your opponent (MIN) will take the optimal move at every step to minimize MAX's score - if this is not true, the algorithm may not yield optimal actions
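As a rough sketch of the idea, minimax can be written over a game tree given as nested lists, with leaves holding MAX's score. The tree `toy_tree` and the function names are assumptions for illustration. An alpha-beta variant is included because it returns the same value while skipping branches that cannot change the result.

```python
def minimax(node, maximizing):
    """Leaves are numbers (utility for MAX); internal nodes are lists of children."""
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Same result as minimax, but prunes once alpha >= beta."""
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # MIN above would never allow this branch
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # MAX above already has a better option
    return best

# Hypothetical two-ply tree: MAX moves first, MIN replies.
toy_tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
value = minimax(toy_tree, True)
pruned_value = alphabeta(toy_tree, True)
```

Both calls assume MIN plays optimally, which is exactly the assumption the card describes.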
what is the backpropagation algorithm used for? to find incremental adjustments to make to all the weights in a neural network
in general, how do we optimize an ML model? - Minimize a loss function with respect to the parameters of the network - Minimize a cost function with respect to the parameters of the network
Gradient Descent is a general algorithm to do what? Find a (local) minimum of a function
Why is a step function not an ideal activation function for a neural network? It's not differentiable
T or F: Linear regression can be applied only on 2 dimensional data false
which best describes a loss function? a measure of the imperfection of our prediction
in the context of Machine learning, which best describes a Validation Set? A partition of data used to tune hyper-parameters of a model before testing
what does alpha describe in the gradient descent update equation? the learning rate
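The update rule can be sketched as x ← x − α · f′(x), repeated for a fixed number of steps. The function name `gradient_descent`, the quadratic objective, and the step count are assumptions chosen only to illustrate the role of alpha.

```python
def gradient_descent(grad, x0, alpha=0.1, steps=100):
    """Repeatedly step against the gradient; alpha is the learning rate."""
    x = x0
    for _ in range(steps):
        x = x - alpha * grad(x)
    return x

# Hypothetical objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

A larger alpha takes bigger steps and may overshoot; a smaller alpha converges more slowly.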
in the context of machine learning, which statement best describes the purpose of a hypothesis function h(x)? h(x) is an estimate of the underlying true function f(x), which relates features, x, to labels, y
which best describes the intuition for k-nearest neighbors classifications? - each nearby datapoint within a certain distance radius votes for the classification of a novel datapoint --> set of datapoints closest to a novel datapoint vote for its classification
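A minimal sketch of the voting idea, assuming Euclidean distance and a simple majority vote. The function name `knn_classify` and the toy points are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train is a list of (point, label); the k closest points vote on the label."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-cluster dataset.
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
label_near_origin = knn_classify(train, (0.5, 0.5))
label_far = knn_classify(train, (5.5, 5.5))
```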
which of the following best describes the process of selecting a model for a machine learning problem? model selection involves experimentation influenced by the problem, data, and evaluation
what is the backpropagation algorithm used to compute? the influence that changing any weight of a neural network has on the prediction error
what is a valid interpretation of a trained perceptron's output? a binary linear classifier
which of the following best describes the intuition behind linear regression? fitting a hyperplane through a set of datapoints using the residuals between estimate and actual output
which of the following is not a category of ML? a. unsupervised learning b. reinforcement learning c. supervised learning d. harmonic learning harmonic learning
T or F: A neural network of sufficient size can in theory approximate any continuous function true
which of the following is a commonly used type of artificial neural network? multi-layer perceptron
in context of machine learning, which best describes the concept of Ockham's Razor? given equal performance the least complex model is often preferred
which describes a likely observation you could make on an overfit model? it performs poorly on the validation data but well on the training data
which of the following best describes the intuition behind an SVM? a hyperplane divides data in such a way that maximizes the margin between the categories
what is the main reason why a step function is not an ideal activation function for a neural network? it's not differentiable and thus incompatible with differential optimization techniques
T or F: When approaching a ML problem, there's only one correct model to use to create accurate predictions false
for a perceptron, which of the following best describes a hard threshold activation? a linear sum of weighted inputs is taken. if that sum exceeds a set value then the perceptron activates sending a fixed signal to its downstream connections
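The hard threshold can be sketched as a weighted sum followed by a comparison. The weights and bias below are hypothetical values picked so the unit behaves like a logical AND; here the bias plays the role of the negated threshold.

```python
def perceptron_output(weights, bias, inputs):
    """Fire (1) if the weighted sum plus bias exceeds 0, otherwise stay silent (0)."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

# Hypothetical AND gate: only fires when both inputs are 1 (1 + 1 - 1.5 > 0).
and_weights, and_bias = [1.0, 1.0], -1.5
outputs = [perceptron_output(and_weights, and_bias, x)
           for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```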
Which best describes the "Kernel Trick"? A trick to find a decision boundary in a different coordinate space
T or F: Multilayer neural networks can in theory predict any continuous function true
T or F: a ML model that is overfit to a dataset will generalize well false
physical symbol system hypothesis: the idea that manipulating symbols according to rules is sufficient for general intelligence. - top-down - rule-based
Which hypothesis underlies “Good Old-Fashioned AI” Physical symbol system hypothesis
In connectionist AI systems, “learning” primarily means: Adjusting neural-network parameters from data
Connectionist AI: - bottom-up, inductive approach. - Systems learn rules and patterns directly from data (observations) rather than being explicitly programmed.
Which approach is top-down and rule-based? A. Symbolic AI B. Connectionist AI C. Reinforcement learning D. Evolutionary algorithms A. Symbolic AI
Natural Language Processing (NLP) is primarily concerned with: A. Visual scene understanding B. Physical robot control C. Understanding and generating human language D. Optimizing search algorithms C. Understanding and generating human language
Computer Vision: Enabling agents to "see" and interpret visual information.
The chain rule in language modeling expresses joint probability as a product of conditional probabilities
A bigram model relies on which assumption? P(wᵢ|w₁…wᵢ₋₁) ≈ P(wᵢ|wᵢ₋₁) - a bigram is two consecutive units, e.g. 'th' or 'the cat' - edge only from wᵢ₋₁ to wᵢ.
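A minimal sketch of a bigram model estimated by counting, assuming whitespace tokenization and no smoothing. The function names and the toy sentence are hypothetical.

```python
from collections import Counter

def train_bigram(tokens):
    """Return a function giving P(word | prev) as count(prev, word) / count(prev)."""
    prev_counts = Counter(tokens[:-1])
    pair_counts = Counter(zip(tokens, tokens[1:]))

    def prob(prev, word):
        if prev_counts[prev] == 0:
            return 0.0  # unseen history; a real model would smooth instead
        return pair_counts[(prev, word)] / prev_counts[prev]

    return prob

def sequence_probability(prob, words):
    """Chain rule under the bigram assumption, conditioning on the first word."""
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= prob(prev, word)
    return p

tokens = "the cat sat on the mat".split()
p = train_bigram(tokens)
p_cat_given_the = p("the", "cat")
p_seq = sequence_probability(p, ["the", "cat", "sat"])
```

In this toy corpus "the" is followed once by "cat" and once by "mat", so each continuation gets half the probability mass.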
Which n-gram model ignores word order entirely? Unigram
Corpus Corpus: A collection of text or speech data used for analysis or training
Why is the full joint distribution P(w₁…wₘ) infeasible to compute directly? Table size grows as |V|ᵐ (explodes)
Language identification via n-grams works by: A. Counting parts of speech B. Comparing sequence probability under each language model C. Parsing with a CFG D. Measuring sentence length Comparing sequence probability under each language model
In LLMs, the “context window” is: A. Number of GPUs used B. Maximum length of prefix tokens the model can attend to C. Batch size during training D. Size of the model vocabulary Maximum length of prefix tokens the model can attend to
Which stage of LLM training uses human-labelled prompt/output pairs to teach formatting? A. Pre-training B. Tokenization C. Instruction tuning D. Inference Instruction tuning
RLHF stands for: A. Reinforcement Learning from Human Feedback B. Recurrent Language Hierarchical Framework C. Randomized Learning Hyperparameter Fitting D. Rule-based Language Heuristic Fusion Answer: A Reinforcement Learning from Human Feedback
“Chain-of-thought” prompting aims to: A. Force shorter outputs B. Make the model list bullet points C. Have the model articulate its step-by-step reasoning D. Restrict vocabulary usage Have the model articulate its step-by-step reasoning
Information Extraction - Extract specific pieces of structured information from unstructured or semi-structured text - An example is extracting product names and their prices from websites
Regex A sequence of characters defining a search pattern (ex. format of a price or phone number)
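For instance, a price pattern might look like the sketch below. The pattern and the sample text are assumptions for illustration; real price formats vary more than this.

```python
import re

# Hypothetical pattern: a dollar sign, digits, and an optional two-digit cents part.
PRICE_PATTERN = re.compile(r"\$\d+(?:\.\d{2})?")

text = "Widget: $19.99, Gadget: $5, Call 555-1234"
prices = PRICE_PATTERN.findall(text)
```

The phone number is ignored because it does not start with a dollar sign.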
Risk of LLMs being prompted to perform information extraction: LLMs may "hallucinate" or invent information that isn't actually present in the text, making them less reliable than regex for tasks requiring high accuracy
Information retrieval find documents relevant to a user's query from a large collection (corpus) - ex. web search engines (google, bing, etc.)
components of information retrieval - document collection: The large set of documents to search within - query: The user's expression of their information need - retrieval system: The algorithm/system that processes the query and returns a ranked subset of documents deemed relevant
In TF-IDF scoring, a term gets high weight if it is: A. Frequent in all documents B. Rare in the corpus but frequent in the current document C. Absent from the current document D. Only appears in stop-word list high score if frequent in the doc but rare overall. Sum scores for query terms
TF-IDF - TF: How often a term appears in the document. A high score suggests relevance - IDF: How rare a term is across the entire corpus. Rarer terms are considered more informative. IDF(t) = log(N/df(t)) [N = total docs, df(t) = # docs containing term t]
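A minimal sketch of the scoring, assuming raw term counts for TF and IDF(t) = log(N/df(t)) as on the card. The function name `tf_idf` and the three toy documents are hypothetical.

```python
import math

def tf_idf(term, doc, corpus):
    """doc is a list of tokens; corpus is a list of such documents."""
    tf = doc.count(term)                           # how often the term occurs here
    df = sum(1 for d in corpus if term in d)       # how many documents contain it
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)

corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred", "cat"],
]
score_the = tf_idf("the", corpus[0], corpus)   # appears in every document
score_cat = tf_idf("cat", corpus[2], corpus)   # frequent here, rarer overall
```

A query score would then be the sum of these values over the query terms.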
PageRank ranks web pages based on: A. Term frequency B. Link structure (“importance” via incoming links) C. Document length D. Keyword density Scores pages based on link structure ("popularity contest")
In a TF-IDF scheme, what role does the IDF component play? It downweights terms that appear in many documents, so common words carry less influence. (ex. "the", "and", etc)
NLP Task: Syntactic Parsing Goal: Analyze the grammatical structure of a sentence according to a formal grammar - relies on formal grammars, often CFGs, which define rules for how words (terminals) group into constituents (non-terminals) like noun phrases and verb phrases.
Context-Free Grammar A type of grammar where rules apply regardless of surrounding context. In a plain CFG, if you have two ways to expand NP, say 1. NP → Det Noun (ex. "the cat") 2. NP → Name (ex. Sam), you don’t say which one is preferred; both parses are just “allowed.”
A Probabilistic CFG (PCFG) extends a CFG by Assigning probabilities to each production rule - instead of treating every grammar rule as equally “possible,” a PCFG lets you say “Rule X is twice as likely as Rule Y.”
Probabilistic CFG (PCFG) Assigns probabilities to each grammar rule based on its observed frequency in a corpus
NP → Det Noun (tree) NP ├─ Det → “the” └─ Noun → “dog”
Calculate the PCFG probability of "the cat": Rule / Probability: NP → Det Noun 0.6; Noun → “dog” 0.5; Det → “the” 1.0; Noun → “cat” 0.5. Probability: 0.6 × 1.0 × 0.5 = 0.3. Tree: NP ├─ Det → “the” └─ Noun → “cat”
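The calculation multiplies the probability of every rule used in the parse tree. The sketch below assumes the lexical rule for "cat" is Noun → "cat", as the tree indicates, and uses a hypothetical tuple encoding of trees.

```python
def tree_probability(tree, rule_prob):
    """tree = (label, children); children are subtrees or terminal strings."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = rule_prob[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            prob *= tree_probability(child, rule_prob)
    return prob

# Rule probabilities from the card (Noun -> "cat" is an assumed reading).
rule_prob = {
    ("NP", ("Det", "Noun")): 0.6,
    ("Det", ("the",)): 1.0,
    ("Noun", ("cat",)): 0.5,
}
the_cat = ("NP", [("Det", ["the"]), ("Noun", ["cat"])])
p_the_cat = tree_probability(the_cat, rule_prob)
```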
In a parse tree, “terminals” are: A. Non-terminal symbols like NP or VP B. Actual words of the sentence C. Probability values D. Grammar rules - Actual words of a sentence (leaves of the parse tree) - the set of terminals is the lexicon/vocabulary
Word embeddings differ from one-hot vectors because they are: A. Sparse and high-dimensional B. Dense and low-dimensional, learned to capture similarity C. Randomly assigned D. Always binary - word embed: Dense and low-dimensional, learned to capture similarity - one-hot: very sparse and high dimensional, treats words as independent symbols
Word-count (bag-of-words) vector example Vocabulary: "cat" "dog" "mouse" Example sentence: “cat dog cat”, Indexing: cat→0, dog→1, mouse→2 Word-count vector looks like: [2, 1, 0] What you see: "cat" appears twice, "dog" once, "mouse" zero times Drawbacks: - order is lost ("dog cat cat" gives the same vector)
One-Hot Encoding example "cat" → [1,0,0] "dog" → [0,1,0] "mouse" → [0,0,1] every word is turned into a vector of length |V| = 3, with a single 1 at its index
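Both encodings can be sketched with the same three-word vocabulary. The helper names `one_hot` and `count_vector` are hypothetical.

```python
vocab = ["cat", "dog", "mouse"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Vector of length |V| with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

def count_vector(tokens):
    """Sum of one-hot vectors: counts per vocabulary word, word order discarded."""
    vec = [0] * len(vocab)
    for token in tokens:
        vec[index[token]] += 1
    return vec

dog_vec = one_hot("dog")
counts = count_vector("cat dog cat".split())
```

Note that `count_vector("dog cat cat".split())` gives the same result, which is the lost-order drawback.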
why are word embeddings better than one-hot encoding? One-hot vectors are sparse and don't capture meaning. Word embeddings are better: each word is mapped to a dense, relatively low-dimensional vector whose values are learned during training. These embeddings often capture semantic relationships between words.
Softmax activation is used to: A. Normalize outputs into a probability distribution (sum=1) B. Compute the maximum activation only C. Introduce non-linearity by thresholding at 0 D. Pool features spatially Normalize outputs into a probability distribution (sum=1)
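A minimal sketch of softmax in plain Python. Subtracting the maximum before exponentiating is a common numerical-stability step and is an addition here, not something the card states.

```python
import math

def softmax(logits):
    """Map arbitrary real scores to positive values that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```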
An RNN’s hidden state ht is updated by combining: A. Previous output only B. Previous hidden state ht₋₁ and current input embedding C. Unrelated random noise D. Future target tokens Previous hidden state ht₋₁ and current input embedding
RNNs often struggle to learn long-range dependencies due to: A. Exploding gradients only B. Vanishing gradients only C. Both vanishing and exploding gradients D. Lack of embeddings Both vanishing and exploding gradients
Greedy decoding always picks: A. A random next word B. The highest-probability next word C. The least probable next word D. A word based on TF-IDF - The highest-probability next word - often leads to repetitive and deterministic output
How do RNNs build on fixed-window models? - RNNs were developed to handle sequential data more effectively - added: a hidden state (ht) that is updated at each time step (t) based on the current input (Et) and the previous hidden state (ht-1) - allows the network to maintain a summary of the sequence seen so far
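The hidden-state update can be sketched as h_t = tanh(W_h · h_{t-1} + W_e · E_t + b). The tanh nonlinearity, the weight names, and the tiny matrices below are assumptions for illustration; the card only says the new state depends on the current input and the previous state.

```python
import math

def rnn_step(h_prev, e_t, W_h, W_e, b):
    """One hidden-state update from the previous state and the current input embedding."""
    h_t = []
    for i in range(len(b)):
        total = b[i]
        total += sum(W_h[i][j] * h_prev[j] for j in range(len(h_prev)))
        total += sum(W_e[i][j] * e_t[j] for j in range(len(e_t)))
        h_t.append(math.tanh(total))
    return h_t

# Hypothetical 2-unit hidden state and 2-dimensional input embeddings.
W_h = [[0.5, 0.0], [0.0, 0.5]]
W_e = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]

h = [0.0, 0.0]
for e_t in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    h = rnn_step(h, e_t, W_h, W_e, b)  # h carries a summary of the inputs so far
```

The same weights are reused at every time step, so the sequence can be of any length.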
Search algorithm components states actions initial state goal state transition edge cost
temperature the parameter controlling randomness - higher temp = more randomness (flattens distribution) - lower temp = more greedy (sharpens distribution)
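One common way to apply temperature, assumed here, is to divide the logits by it before the softmax. The function name and the example logits are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then normalize into probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, temperature=0.5)  # closer to greedy
flat = softmax_with_temperature(logits, temperature=2.0)   # closer to uniform
```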
nucleus sampling (top-p) - A common advanced sampling method that considers only the most probable words whose cumulative probability reaches a threshold 'p' - keeps the smallest set of top-ranked tokens with cumulative probability ≥ p - balances temperature by sampling over a core set (nucleus) of the most probable next words
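A rough sketch of top-p filtering followed by sampling, assuming the next-word distribution is given as a dictionary. The function names and the toy distribution are hypothetical.

```python
import random

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of most probable tokens whose total reaches p, then renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in nucleus)
    return {token: prob / total for token, prob in nucleus}

def sample_top_p(probs, p=0.9, rng=random):
    """Draw one token from the renormalized nucleus."""
    filtered = top_p_filter(probs, p)
    tokens = list(filtered)
    return rng.choices(tokens, weights=[filtered[t] for t in tokens], k=1)[0]

# Hypothetical next-word distribution.
next_word = {"cat": 0.5, "dog": 0.3, "mouse": 0.15, "zebra": 0.05}
nucleus = top_p_filter(next_word, p=0.75)
choice = sample_top_p(next_word, p=0.75, rng=random.Random(0))
```

With p = 0.75 the low-probability tail ("mouse", "zebra") can never be sampled, while "cat" and "dog" keep their relative odds.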
Rule-based chatbots rely on: A. Large neural networks B. Predefined patterns and templates (often regex) C. Probabilistic context-free grammars D. Reinforcement learning Predefined patterns and templates (often regex)
Conversational Agents Agents interacting within a conversational environment using natural language
two categories of conversational agents -chatbots: designed for open-ended convo -task-oriented: designed to help user accomplish specific goals (e.g. Siri/Alexa, automated phone system)
Corpus-based chatbot retrieves responses from a large database of existing conversations (e.g. movie scripts, twitter data)
Rule-based chatbot strength and weakness -precise for inputs they recognize -fail on anything unexpected and requires significant manual effort
Corpus-based chatbots strength and weakness -handle more variety -lack control and can inherit biases or undesirable content from the training data
Automatic Speech Recognition (ASR) -Convert user's spoken audio into text ("utterance") -modern ASR uses deep learning (e.g. transformers)
Natural Language Understanding (NLU) Analyze the utterance text to determine the user's goal. steps: 1. Domain classification 2. Intent Determination 3. Slot filling
NLU: Domain Classification Identify the general topic (e.g. Weather, Music)
NLU: Intent Determination Identify the specific action requested within the domain (e.g. GetWeather, PlayMusic etc.)
NLU: Slot filling Extract specific parameters (slots) needed to fulfill the user's intent (e.g. Location="Boston", Date="Tomorrow" for GetWeather)
Computer vision - extract high-level knowledge and understanding from visual data - focuses on enabling machines to interpret and understand information from images and videos
computer vision relation to agent paradigm Computer vision provides the "perception" component, allowing agents to sense and interpret their visual environment to inform state representation and action selection
Digital pixel colors: Grayscale and Color Grayscale: One value per pixel --> single intensity value Color (RGB): typically 3 values per pixel
Haar Cascade classifier - an algorithm for object detection, particularly face detection - uses rectangular features called Haar-like features which capture basic patterns of intensity differences in faces (e.g. the eye region is typically darker than upper cheeks)
Convolutional Neural Networks (CNNs) A CNN scans an image with small, learnable filters to pick out features like edges or textures, then uses pooling to shrink and summarize those feature maps before feeding them into a final classifier.
What key operations are a part of CNNs? - convolution - pooling
convolution - Apply small, learnable filters across the input image - Each filter slides over the image, computing a dot product to detect patterns - Outputs a feature map showing where each pattern appears - Uses parameter sharing (same filter everywhere) to keep the model compact
Pooling – Downsample each feature map by summarizing small regions (e.g., taking the maximum in each 2×2 block). – Reduces spatial dimensions and computation. – Introduces slight invariance to shifts or distortions in the input.
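Convolution and max-pooling can be sketched on a tiny grayscale image using nested lists. The image, the hand-written edge kernel, and the function names are assumptions for illustration; in a real CNN the kernel values are learned.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1), taking a dot product at each spot."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool(feature_map, size=2):
    """Summarize each non-overlapping size x size block by its maximum."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [[max(feature_map[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in cols]
            for i in rows]

# Hypothetical 5x5 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical kernel that responds to dark-to-bright vertical edges.
edge_kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, edge_kernel)
pooled = max_pool(feature_map)
```

The feature map lights up only in the column where the edge sits, and pooling halves each spatial dimension while keeping that response.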
A convolutional layer differs from a fully-connected layer by: A. Using hand-designed filters only B. Applying the same small filter (kernel) across the spatial dimensions (parameter sharing) C. Operating on one pixel at a time Applying the same small filter (kernel) across the spatial dimensions (parameter sharing)
Pooling layers (e.g., max-pooling) serve to: A. Increase feature map size B. Reduce spatial dimensions and add invariance to small shifts C. Normalize pixel intensities D. Learn filter weights Reduce spatial dimensions and add invariance to small shifts
AlexNet (2012): A deep CNN architecture (8 layers: 5 convolutional, 3 fully connected) developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton.
Why is AlexNet so important? -Dramatically outperformed all previous approaches (which used traditional computer vision methods) -ignited the modern deep learning era
Moravec’s paradox states that tasks easy for humans (walking, grasping) are often harder for AI than: A. Simple lookup tables B. High-level abstract reasoning (e.g., chess) C. Calculating arithmetic D. Sorting numbers High-level abstract reasoning (e.g., chess)
The “localization” problem in robotics is about determining: The robot’s current state (position/orientation/joint angles)
A Kalman filter is used to: A. Generate trajectories B. Estimate the true state by filtering noisy sensor measurements C. Plan collision-free paths D. Optimize control gains offline Estimate the true state by filtering noisy sensor measurements
In an MDP for reinforcement learning, the agent aims to learn a policy that maximizes: A. Immediate reward only B. Cumulative (discounted) future rewards C. Total number of states visited D. Size of the action space Cumulative (discounted) future rewards
PID Controller tracks the error between the desired and actual states and adjusts its output using proportional, integral, and derivative terms to counter disturbances and accurately reach and maintain the target state.
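A minimal discrete-time sketch of a PID loop. The gains, the time step, and the toy plant (whose state changes at a rate equal to the controller output) are all assumptions picked for illustration, not tuned values from any real system.

```python
class PIDController:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, target, actual, dt):
        """Return the control output for the current error."""
        error = target - actual
        self.integral += error * dt                    # accumulated past error
        derivative = (error - self.prev_error) / dt    # rate of change of error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def simulate(steps=500, dt=0.1, target=1.0):
    """Drive a hypothetical plant (state += output * dt) toward the target."""
    pid = PIDController(kp=2.0, ki=0.5, kd=0.1)
    state = 0.0
    for _ in range(steps):
        state += pid.update(target, state, dt) * dt
    return state

final_state = simulate()
```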
Kalman filters algorithm used to estimate the true underlying state by statistically averaging noisy measurements over time, producing a smoother, more reliable signal.
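A one-dimensional sketch of the idea, assuming a scalar state that stays (nearly) constant and measurements with known noise variance. The function name, the default parameters, and the alternating test signal are hypothetical.

```python
def kalman_1d(measurements, meas_var, process_var=0.0, init_est=0.0, init_var=1e3):
    """Scalar Kalman filter; returns the state estimate after each measurement."""
    est, var = init_est, init_var
    estimates = []
    for z in measurements:
        var += process_var                # predict: uncertainty grows between steps
        gain = var / (var + meas_var)     # how much to trust the new measurement
        est += gain * (z - est)           # update: move toward the measurement
        var *= (1 - gain)                 # uncertainty shrinks after the update
        estimates.append(est)
    return estimates

# Hypothetical noisy readings of a true value of 5.0, alternating below and above it.
readings = [4.0, 6.0] * 50
estimates = kalman_1d(readings, meas_var=1.0)
```

Early estimates swing with each reading; later ones settle near 5.0 as the gain shrinks.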
Control Theory: A field dedicated to designing control systems that handle uncertainty and ensure desired behavior despite noise and disturbances. -ex. PID Controllers Reinforcement Learning (RL)
Reinforcement Learning (RL) Agent learns through trial-and-error interaction with an environment, receiving feedback in the form of rewards or punishments, without explicit data/label pairs like supervised learning.
Markov Decision Process (MDP) The standard mathematical framework for RL problems -defined by: States (S), Action (A), Transition Probabilities, Reward, Discount Factor
MDP Policy (π(s)→a) A function mapping states to actions. RL aims to find the optimal policy π∗ that maximizes expected discounted future rewards.
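The quantity being maximized can be sketched for a single observed reward sequence as G = r₀ + γ·r₁ + γ²·r₂ + …, where γ is the discount factor. The function name and the reward list are hypothetical.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each weighted by gamma raised to its time step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g  # fold from the last reward back to the first
    return g

g_half = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25
```

A policy's value is the expectation of this return over the trajectories it produces.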
Reinforcement learning (RL) vs Supervised Learning (SL) In SL, a model learns to map inputs to known outputs by minimizing prediction error on labeled training data. In RL, an agent learns through trial and error by interacting with an environment and optimizing its actions to maximize cumulative reward.
Deep Reinforcement Learning (Deep RL) combines reinforcement learning with deep neural networks, using networks to approximate policies or value functions so agents can handle very large or continuous state/action spaces.
How to train Deep RL agent interacts with the environment using its current neural network policy. The collected experiences (state, action, reward, next state) are used to update the network parameters (θ) via gradient-based optimization, aiming to improve expected rewards.
Adversarial Search deals with multi-agent environments where other agents are actively trying to prevent the agent from reaching its goal -ex. board games
A* search time and space complexity Both exponential
uniform cost search time and space complexity both exponential
Uniform Cost Search search algorithm that finds the path with the lowest cumulative cost
DFS time and space complexity Time: exponential in the worst case (O(b^m)); Space: linear (O(bm))
BFS vs DFS BFS: explores level by level DFS: explores as deep as possible branch by branch
utility-based agents Extend goal-based agents by adding a utility function to measure how well a state achieves the goal.
rational agent defined as one that selects actions expected to maximize its performance measure, given its perceptions and built-in knowledge.
the perceptron model mathematical model of a biological neuron, a foundational element of artificial neural networks. -developed in 1958
Artificial Neural Network (Multilayer Perceptrons) -The output of one layer of Perceptrons serves as the input to the next layer. -This interconnected structure allows ANNs to represent much more complex functions than a single Perceptron.
Created by: user-1948709