Deep Learning final

Question Answer
CNN biological inspiration D. Hubel and T. Wiesel
CNN backstory Neocognitron, LeNet-5
CNN output size ((W - F + 2P) / S) + 1, where W = input size, F = filter size, P = padding, S = stride
receptive field region of the input feature map whose values contribute to the response of that unit
receptive field size 1 + L(F - 1) for L stacked F x F convolutions with stride 1
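A minimal sketch of the output size and receptive field formulas above. The function names are hypothetical, and the floor division is an assumption that mirrors how frameworks commonly handle strides that do not divide evenly.

```python
def conv_output_size(w: int, f: int, p: int = 0, s: int = 1) -> int:
    """Spatial output size of a convolution: ((W - F + 2P) / S) + 1."""
    span = w - f + 2 * p
    if span < 0:
        raise ValueError("filter is larger than the padded input")
    # Assumption: floor when the stride does not divide the span evenly.
    return span // s + 1


def stacked_receptive_field(num_layers: int, f: int) -> int:
    """Receptive field of L stacked F x F, stride-1 convolutions: 1 + L(F - 1)."""
    return 1 + num_layers * (f - 1)


print(conv_output_size(32, 5, p=2, s=1))    # 32
print(conv_output_size(227, 11, p=0, s=4))  # 55
print(stacked_receptive_field(3, 3))        # 7
```

Three stacked 3 x 3 convolutions see a 7 x 7 input region, the same as one 7 x 7 convolution.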
early CNNs nearly all learnable parameters are located in the FC layers
GoogLeNet / Inception uses a "stem" network to aggressively downsample the image early on.
ResNet uses skip connections
ResNeXt splits feature maps into groups, applies a separate convolution to each group, and concats the results
DenseNet concats all previous layers' feature maps to the next layer
small batches less memory, more gradient noise (regularization)
large batches lower gradient variance, better hardware utilization
learning rate decay needed so the network can settle into a local minimum instead of stepping over it
momentum adds a fraction of the previous update to the current one
adagrad adapts the learning rate per parameter. flaw: accumulates all past squared gradients, so the effective LR eventually shrinks toward zero
RMSprop fixes adagrad by using exponential moving average of past squared gradients
Adam combines RMSProp and Momentum
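The optimizer cards above can be compared side by side with a sketch for a single scalar parameter. The function names and hyperparameter defaults (lr, beta, decay, eps) are assumptions chosen for illustration, and the Adam step includes the bias correction that is commonly used with it.

```python
import math


def momentum_step(w, g, v, lr=0.1, beta=0.9):
    # New update = fraction of the previous update plus the current gradient step.
    v = beta * v - lr * g
    return w + v, v


def adagrad_step(w, g, cache, lr=0.1, eps=1e-8):
    # cache only grows, so the effective step lr / sqrt(cache) keeps shrinking.
    cache = cache + g * g
    return w - lr * g / (math.sqrt(cache) + eps), cache


def rmsprop_step(w, g, cache, lr=0.1, decay=0.9, eps=1e-8):
    # Exponential moving average of squared gradients instead of a running sum.
    cache = decay * cache + (1 - decay) * g * g
    return w - lr * g / (math.sqrt(cache) + eps), cache


def adam_step(w, g, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # m: momentum-style first moment; v: RMSprop-style second moment.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)  # bias correction for zero initialization
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


# Minimize f(w) = w^2 (gradient 2w) with Adam, starting from w = 5.
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 501):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(abs(w) < 0.5)
```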
pre-processing zero-centering and normalizing variance. important so that the inputs to activation functions aren't all strictly positive or all strictly negative.
weight initialization don't initialize to zero (causes symmetric gradients)
batch norm shifts and rescales activations using the mean and variance of current mini-batch
L1/L2 (weight decay) penalizes large weights in the loss function
early stopping stop training when validation error starts to rise
dropout randomly turns off neurons during training with probability p
label smoothing replaces hard labels with soft targets
distillation train a large teacher network, then train a smaller student network to fit the softmax probabilities generated by the teacher
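A small sketch of the batch norm and label smoothing cards above, using plain Python lists. The names gamma, beta, and eps are assumptions (gamma and beta are normally learned), and the smoothing rule shown, which spreads eps uniformly over all classes, is one common convention rather than the only one.

```python
import math


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then rescale and shift."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]


def smooth_labels(true_class, num_classes, eps=0.1):
    """Soft targets: 1 - eps on the true class plus eps spread uniformly."""
    uniform = eps / num_classes
    return [(1.0 - eps if k == true_class else 0.0) + uniform
            for k in range(num_classes)]


print(batch_norm([1.0, 2.0, 3.0]))
print(smooth_labels(1, 4))
```

The smoothed targets still sum to 1, so they can be used directly with a cross-entropy loss.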
IoU (intersection over union) area of overlap / area of union. measures how well a predicted box aligns with the ground truth. >0.5 is decent
NMS (non-maximum suppression) detectors output multiple overlapping boxes for one object. NMS selects the highest scoring box and deletes all other boxes that overlap it significantly
evaluation (mAP) calculate precision vs. recall curve. area under curve = average precision (AP). mean AP averages across all classes
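The IoU and NMS cards above can be sketched in a few lines. The (x1, y1, x2, y2) box format, the function names, and the 0.5 threshold are assumptions for illustration. This is the greedy form of NMS.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with IoU 81 / 119, about 0.68, so it is suppressed. The third box does not overlap and survives.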
R-CNN 1. generate regions using external tool (selective search) 2. warp regions to fixed size. 3. pass each region through a CNN. 4. Classify with SVM. Flaw: extremely slow
Fast R-CNN 1. Pass the whole image through CNN to get a feature map. 2. Project region proposals onto the feature map. 3. RoI Pooling to extract fixed-size features for each region. 4. FC layers for class + box offsets. Flaw: still relies on selective search
Faster R-CNN replaces external proposal generator with region proposal network. fully end-to-end.
Mask R-CNN adds a third branch to Faster R-CNN to predict a pixel-level binary mask for the object. uses bilinear interpolation (RoIAlign) to sample region features
YOLO divide image into a coarse grid. each grid cell directly predicts class probabilities and bounding boxes.
SSD predicts boxes from multiple different convolutional feature maps at different resolutions
RetinaNet & Focal loss down-weights the loss for easy samples so the network focuses on hard examples
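A hedged sketch of the focal loss idea above for the binary case. The function name is hypothetical, and gamma = 2 and alpha = 0.25 are assumed values, not something the card specifies. With gamma = 0 the modulating factor disappears and the loss reduces to weighted cross-entropy.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for predicted probability p in (0, 1) and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks toward 0 for confident, correct predictions.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.95, 1)  # confident and correct
hard = focal_loss(0.30, 1)  # wrong-leaning prediction
print(easy < hard)  # True
```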
vanilla RNN h_t = tanh(W_hh * h_{t-1} + W_xh * x_t). hidden state acts as memory
vanishing/exploding gradients because the exact same weight matrix W is multiplied repeatedly across time steps, gradients shrink to 0 or grow to inf
bi-directional RNN process sequence both ways, concating hidden states. good when you need future context
LSTM adds a separate cell state c_t. uses sigmoid gates to decide updating memory
GRU (gated recurrent unit) simplified LSTM. merges cell state and hidden state. merges forget and input gates into a single update gate
beam search keep top k best overall sequences at every step
seq2seq standard encoder-decoder bottlenecks the entire input sequence into a single hidden vector
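The vanilla RNN recurrence above can be sketched with plain lists. The helper names and the tiny weight matrices are hypothetical, and the bias term is omitted to match the card. The final hidden state returned by run_rnn is the single vector that a standard seq2seq encoder would hand to its decoder.

```python
import math


def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_step(h_prev, x, w_hh, w_xh):
    """h_t = tanh(W_hh * h_{t-1} + W_xh * x_t), without a bias term."""
    pre = [a + b for a, b in zip(matvec(w_hh, h_prev), matvec(w_xh, x))]
    return [math.tanh(z) for z in pre]


def run_rnn(xs, h0, w_hh, w_xh):
    h = h0
    for x in xs:  # the same weights are reused at every time step
        h = rnn_step(h, x, w_hh, w_xh)
    return h


w_hh = [[0.5, 0.0], [0.0, 0.5]]  # hidden-to-hidden (2 x 2)
w_xh = [[1.0], [-1.0]]           # input-to-hidden (2 x 1)
h_final = run_rnn([[1.0], [0.0]], [0.0, 0.0], w_hh, w_xh)
print(h_final)
```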
Attention(Q,K,V) softmax(QK^T/sqrt(D)) V
query what the current token is looking for
key what the other tokens contain
value actual information the token will pass along if selected
why divide by sqrt(D) keeps the variance of the dot products consistent as D grows, so the softmax does not saturate and cause vanishing gradients
encoder self-attention Q,K,V all come from previous encoder layer. every token looks at every other token
decoder self-attention Q,K,V come from previous decoder layer. masked so token can only look at past tokens
encoder-decoder (cross) attention Q comes from decoder. K and V come from encoder's final output
transformer memory O(N^2) w.r.t. sequence length
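A minimal single-head sketch of the attention formula above, written with plain lists so it needs no libraries. The function names and the causal flag are assumptions for illustration. The flag applies the decoder-style mask described in the decoder self-attention card.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, causal=False):
    """softmax(Q K^T / sqrt(D)) V for lists of row vectors."""
    d = len(k[0])
    out = []
    for i, qi in enumerate(q):
        # One score per key: this N x N score table is the O(N^2) memory cost.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        if causal:
            # Position i may not attend to positions j > i.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


x = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
print(attention(x, x, values, causal=True)[0])  # [1.0, 2.0]
```

With the causal mask, the first token can only attend to itself, so its output is exactly its own value vector.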
positional encoding adds sine/cosine waves of different frequencies to the input embeddings
RoPE (Rotary Positional Embedding) instead of adding position to embeddings, it rotates Q and K vectors in space.
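A sketch of the sine/cosine positional encoding card above. The base of 10000 and the interleaved sin/cos layout are assumptions following the convention commonly used for Transformers, and the function names are hypothetical.

```python
import math


def sinusoidal_encoding(position, d_model, base=10000.0):
    """Interleaved sin/cos values for one position; d_model must be even."""
    enc = []
    for i in range(0, d_model, 2):
        # Each dimension pair uses a different frequency.
        angle = position / (base ** (i / d_model))
        enc.extend([math.sin(angle), math.cos(angle)])
    return enc


def add_position(embedding, position):
    """Add the encoding elementwise to a token embedding."""
    enc = sinusoidal_encoding(position, len(embedding))
    return [e + p for e, p in zip(embedding, enc)]


print(sinusoidal_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
```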
ViT (vision transformer) split image into a grid of non-overlapping patches, flatten each patch, apply linear projection, add positional embeddings, feed them into a Transformer encoder
Swin (shifted window transformer) solves quadratic cost of ViT. computes self-attention only within local windows. Merges patches in deeper layers
Markov decision process states (s), actions (a), transition model P(s' | s, a), and reward function r(s). Next state s' depends only on the current state s and action a
Bellman-equations V(s) = r(s) + discount * max_a E[V(s')]. Q(s, a) = r(s) + discount * E[max_a' Q(s', a')]
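The Bellman equation for V(s) above can be applied repeatedly until the values settle (value iteration). The two-state MDP below is hypothetical and exists only to make the sketch runnable. The discount and the number of sweeps are assumed values.

```python
def value_iteration(states, actions, transition, reward, discount=0.9, sweeps=100):
    """Repeatedly apply V(s) = r(s) + discount * max_a E[V(s')]."""
    v = {s: 0.0 for s in states}
    for _ in range(sweeps):
        v = {
            s: reward[s] + discount * max(
                # Expected next-state value under P(s' | s, a).
                sum(p * v[s2] for s2, p in transition[(s, a)].items())
                for a in actions
            )
            for s in states
        }
    return v


# Hypothetical two-state chain: "stay" keeps the state, "move" switches it.
states = ["A", "B"]
actions = ["stay", "move"]
transition = {
    ("A", "stay"): {"A": 1.0}, ("A", "move"): {"B": 1.0},
    ("B", "stay"): {"B": 1.0}, ("B", "move"): {"A": 1.0},
}
reward = {"A": 0.0, "B": 1.0}
values = value_iteration(states, actions, transition, reward)
print(values)
```

Here V(B) approaches 1 / (1 - 0.9) = 10 and V(A) approaches 0.9 * 10 = 9, since the best plan from A is to move to B and stay there.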
Q learning (DQN) uses a neural net to approximate Q values. uses an experience replay buffer and a frozen target network to prevent instability
double DQN Standard DQN uses the max operator to both select and evaluate actions, causing overestimation. Double DQN uses the online network to select the action, and the target network to evaluate its value.
dueling DQN splits the network head into two paths: one predicts the state value V(s), the other predicts the advantage of each action A(s, a).
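The difference between the DQN variants above shows up in how the training target is built. In this sketch the Q-values are plain lists standing in for network outputs, the done flag and function names are assumptions, and subtracting the mean advantage in the dueling head is one common way to combine the two paths, not something the card states.

```python
def dqn_target(reward, next_q_target, discount=0.99, done=False):
    """Standard DQN: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + discount * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, discount=0.99, done=False):
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + discount * next_q_target[best_action]


def dueling_q(value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a), subtracting the mean advantage."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


next_q_online = [1.0, 3.0]  # online network prefers action 1
next_q_target = [2.0, 0.5]  # target network rates action 1 much lower
print(dqn_target(1.0, next_q_target, discount=0.9))
print(double_dqn_target(1.0, next_q_online, next_q_target, discount=0.9))
```

The standard target bootstraps from the larger value 2.0, while the Double DQN target bootstraps from 0.5, the target network's estimate for the action the online network picked.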
REINFORCE Learns the policy pi(a | s) directly. Multiplies the gradient of the log-probability of an action by the actual return received.
Actor-Critic Combines policy gradients (actor) with a learned value estimate, as in Q-learning (critic).
PPO (proximal policy optimization) uses a clipped surrogate objective that prevents the new policy from changing too much from the old policy. allows safe reuse of the same data over several mini-batch updates.
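A per-sample sketch of the PPO clipped surrogate objective above. Here ratio stands for pi_new(a | s) / pi_old(a | s). The function name and the clip range of 0.2 are assumptions for illustration.

```python
def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """Per-sample clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the min removes any incentive to push the ratio outside the clip range.
    return min(ratio * advantage, clipped * advantage)


print(ppo_clipped_objective(1.5, 1.0))   # gain is capped at the clip boundary
print(ppo_clipped_objective(0.5, -1.0))  # capped on the other side
print(ppo_clipped_objective(1.5, -1.0))  # a harmful change is not clipped
```

The third case shows why the min matters: when the new policy raises the probability of a bad action, the unclipped, more pessimistic term is kept.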
Created by: user-2040991
 

 


