Question 1

CNN biological inspiration

Accepted Answer

D. Hubel and T. Wiesel

Question 2

CNN backstory

Accepted Answer

Neocognitron, LeNet-5

Question 3

CNN output size

Accepted Answer

((W - F + 2P) / S) + 1
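
A minimal sketch of the output-size formula, assuming W is the input width, F the filter size, P the padding and S the stride. The function name is hypothetical, and the floor division is an assumption about what happens when the stride does not divide evenly.

```python
def conv_output_size(w: int, f: int, p: int = 0, s: int = 1) -> int:
    """Spatial output size of a convolution: ((W - F + 2P) / S) + 1."""
    # Floor division is assumed for strides that do not divide evenly.
    return (w - f + 2 * p) // s + 1


print(conv_output_size(32, 5))              # 28
print(conv_output_size(224, 7, p=3, s=2))   # 112
```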

Question 4

receptive field

Accepted Answer

region of the input feature map whose values contribute to the response of that unit

Question 5

receptive field size

Accepted Answer

1 + L(F - 1) for L stacked stride-1 layers with filter size F
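
A quick check of the formula, assuming stride 1 and no dilation. The function name is hypothetical. It shows the usual observation that two stacked 3x3 layers see the same 5x5 region as a single 5x5 layer.

```python
def receptive_field(num_layers: int, f: int) -> int:
    """Receptive field of a unit after num_layers stride-1 convs of size f."""
    return 1 + num_layers * (f - 1)


print(receptive_field(2, 3))  # 5, same as one 5x5 layer
print(receptive_field(3, 3))  # 7, same as one 7x7 layer
```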

Question 6

early CNNs

Accepted Answer

nearly all learnable parameters are located in the FC layers

Question 7

GoogLeNet / Inception

Accepted Answer

Uses a "stem" network to aggressively downsample the image early on.

Question 8

ResNet

Accepted Answer

uses skip connections

Question 9

ResNeXt

Accepted Answer

splits feature maps into groups, applies a separate convolution to each group, and concats the results

Question 10

DenseNet

Accepted Answer

concats all previous layers' feature maps to the next layer

Question 11

small batches

Accepted Answer

less memory, more gradient noise (regularization)

Question 12

large batches

Accepted Answer

lower gradient variance, better hardware utilization

Question 13

learning rate decay

Accepted Answer

needed so the network can settle into a local minimum instead of overshooting it

Question 14

momentum

Accepted Answer

adds a fraction of the previous update to the current one
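
A minimal sketch of one common way to write the momentum update, on a scalar parameter. The names `sgd_momentum_step`, `mu` and `lr` are illustrative, and the toy loss f(w) = w^2 is an assumption for the demo.

```python
def sgd_momentum_step(w, v, grad, lr=0.1, mu=0.9):
    # New update = fraction mu of the previous update, plus the plain gradient step.
    v = mu * v - lr * grad
    return w + v, v


w, v = 5.0, 0.0
for _ in range(200):
    w, v = sgd_momentum_step(w, v, grad=2 * w)  # gradient of f(w) = w^2
# w has now oscillated its way close to the minimum at 0
```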

Question 15

adagrad

Accepted Answer

adapts learning rate per parameter. flaw: accumulates all past squared gradients, so the effective LR eventually decays toward zero

Question 16

RMSprop

Accepted Answer

fixes adagrad by using exponential moving average of past squared gradients
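
A small sketch contrasting the two accumulators on a constant gradient of 1. The helper names and the decay value 0.9 are assumptions for illustration. Adagrad's running sum keeps growing, while the moving average levels off.

```python
import math


def adagrad_cache(cache, grad):
    # Running sum of all past squared gradients (never shrinks).
    return cache + grad ** 2


def rmsprop_cache(cache, grad, decay=0.9):
    # Exponential moving average of squared gradients (forgets old ones).
    return decay * cache + (1 - decay) * grad ** 2


def effective_lr(lr, cache, eps=1e-8):
    return lr / (math.sqrt(cache) + eps)


ada, rms = 0.0, 0.0
for _ in range(1000):
    ada = adagrad_cache(ada, 1.0)
    rms = rmsprop_cache(rms, 1.0)

print(effective_lr(0.01, ada))  # about 0.0003 and still shrinking
print(effective_lr(0.01, rms))  # stays close to 0.01
```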

Question 17

Adam

Accepted Answer

combines RMSProp and Momentum
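
A minimal scalar sketch of one Adam step, showing the momentum-style first moment and the RMSprop-style second moment. The bias correction and the default hyperparameters are the commonly used ones, not something the card states. The function name is hypothetical.

```python
import math


def adam_step(w, m, v, grad, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # momentum part
    v = beta2 * v + (1 - beta2) * grad ** 2   # RMSprop part
    m_hat = m / (1 - beta1 ** t)              # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# On the first step the update size is roughly lr, whatever the gradient scale.
w, m, v = adam_step(1.0, 0.0, 0.0, grad=4.0, t=1)
```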

Question 18

pre-processing

Accepted Answer

zero-centering and normalizing variance. important so that the inputs to a layer aren't all positive (or all negative), which would force the gradients on its weights to share one sign.

Question 19

weight initialization

Accepted Answer

don't initialize to zero (causes symmetric gradients)

Question 20

batch norm

Accepted Answer

shifts and rescales activations using the mean and variance of current mini-batch
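
A training-time sketch for a single feature across one mini-batch. The learnable scale `gamma` and shift `beta` follow the usual convention. The running statistics used at test time are left out.

```python
import math


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over the mini-batch xs, then scale and shift."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]


out = batch_norm([1.0, 2.0, 3.0, 4.0])  # roughly zero mean, unit variance
```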

Question 21

L1/L2 (weight decay)

Accepted Answer

penalizes large weights in the loss function

Question 22

early stopping

Accepted Answer

stop training when validation error starts to rise

Question 23

dropout

Accepted Answer

randomly turns off neurons during training with probability p
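
A minimal sketch of dropout on a list of activations. The 1 / (1 - p) rescaling ("inverted dropout") is an assumption beyond what the card says. It keeps the expected activation the same, so nothing needs to change at test time.

```python
import random


def dropout(xs, p=0.5, training=True, rng=random):
    if not training:
        return list(xs)  # no units are dropped at test time
    keep = 1.0 - p
    # Each unit is zeroed with probability p, survivors are scaled by 1 / keep.
    return [x / keep if rng.random() < keep else 0.0 for x in xs]


out = dropout([1.0] * 1000, p=0.5, rng=random.Random(0))
```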

Question 24

label smoothing

Accepted Answer

replaces hard labels with soft targets
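
One common formulation, sketched under the assumption that the smoothing mass `eps` is spread uniformly over all K classes. The function name is hypothetical.

```python
def smooth_labels(true_class, num_classes, eps=0.1):
    """Soft target: (1 - eps) * one_hot + eps / K."""
    off = eps / num_classes
    return [1.0 - eps + off if k == true_class else off
            for k in range(num_classes)]


target = smooth_labels(1, 4)  # about [0.025, 0.925, 0.025, 0.025]
```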

Question 25

distillation

Accepted Answer

train a large teacher network, then train a smaller student network to fit the softmax probabilities generated by the teacher

Question 26

IoU (intersection over union)

Accepted Answer

area of overlap / area of union. measures how well a predicted box aligns with the ground truth. >0.5 is decent
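
A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates. The card does not fix a box format.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 if boxes are disjoint
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```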

Question 27

NMS (non-maximum suppression)

Accepted Answer

detectors output multiple overlapping boxes for one object. NMS selects the highest scoring box and deletes all other boxes that overlap it significantly
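
A greedy sketch of the procedure, assuming (x1, y1, x2, y2) boxes and an IoU threshold of 0.5 as the meaning of "overlap significantly". Both are illustrative choices. It returns the indices of the boxes that survive.

```python
def _iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the chosen one too much.
        order = [i for i in order
                 if _iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```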

Question 28

evaluation (mAP)

Accepted Answer

calculate precision vs. recall curve. area under curve = average precision (AP). mean AP averages across all classes

Question 29

R-CNN

Accepted Answer

1. generate regions using external tool (selective search) 2. warp regions to fixed size. 3. pass each region through a CNN. 4. Classify with SVM. Flaw: extremely slow

Question 30

Fast R-CNN

Accepted Answer

1. Pass the whole image through CNN to get a feature map. 2. Project region proposals onto the feature map. 3. RoI Pooling to extract fixed-size features for each region. 4. FC layers for class + box offsets. Flaw: still relies on selective search

Question 31

Faster R-CNN

Accepted Answer

replaces external proposal generator with region proposal network. fully end-to-end.

Question 32

Mask R-CNN

Accepted Answer

adds a third branch to Faster R-CNN to predict a pixel-level binary mask for the object. uses RoIAlign (bilinear interpolation) instead of RoI Pooling

Question 33

YOLO

Accepted Answer

divide image into a coarse grid. each grid cell directly predicts class probabilities and bounding boxes.

Question 34

SSD

Accepted Answer

predicts boxes from multiple different convolutional feature maps at different resolutions

Question 35

RetinaNet & Focal loss

Accepted Answer

down-weights the loss for easy examples so the network focuses on hard examples
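
A minimal sketch of the focal loss for one example, where `p_true` is the predicted probability of the correct class. The focusing parameter `gamma=2.0` is a commonly used value. The optional class-balancing weight is left out.

```python
import math


def focal_loss(p_true, gamma=2.0):
    """Cross-entropy scaled by (1 - p_true)^gamma; gamma=0 is plain CE."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


easy, hard = focal_loss(0.9), focal_loss(0.1)
# The easy example keeps about 1% of its cross-entropy loss,
# the hard example keeps about 81%.
```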

Question 36

vanilla RNN

Accepted Answer

h_t = tanh(W_hh * h_{t-1} + W_xh * x_t). hidden state acts as memory
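
A plain-Python sketch of one recurrence step, with weight matrices as nested lists. Bias terms are omitted, as in the formula on the card. The toy weights are made up for the demo.

```python
import math


def matvec(W, v):
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]


def rnn_step(h_prev, x, W_hh, W_xh):
    """h_t = tanh(W_hh * h_{t-1} + W_xh * x_t)."""
    a = matvec(W_hh, h_prev)
    b = matvec(W_xh, x)
    return [math.tanh(ai + bi) for ai, bi in zip(a, b)]


W_hh = [[0.5, 0.0], [0.0, 0.5]]
W_xh = [[1.0], [-1.0]]
h = [0.0, 0.0]
for x_t in ([1.0], [0.0], [0.0]):
    h = rnn_step(h, x_t, W_hh, W_xh)  # the same weights are reused every step
```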

Question 37

vanishing/exploding gradients

Accepted Answer

because the exact same weight matrix W is multiplied repeatedly across time steps, gradients shrink to 0 or grow to inf

Question 38

bi-directional RNN

Accepted Answer

process the sequence in both directions, concatenating the hidden states. good when you need future context

Question 39

LSTM

Accepted Answer

adds a separate cell state c_t. uses sigmoid gates to decide how the memory is updated

Question 40

GRU (gated recurrent unit)

Accepted Answer

simplified LSTM. merges cell state and hidden state. merges forget and input gates into a single update gate

Question 41

beam search

Accepted Answer

keep top k best overall sequences at every step

Question 42

seq2seq

Accepted Answer

standard encoder-decoder bottlenecks the entire input sequence into a single hidden vector

Question 43

Attention(Q,K,V)

Accepted Answer

softmax(QK^T/sqrt(D)) V
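
A plain-Python sketch of the formula with Q, K, V as lists of row vectors. D is taken to be the key dimension. There is no masking and no batching.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    d = len(K[0])
    out = []
    for q in Q:
        # Scaled dot product of this query with every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out


K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention([[0.0, 0.0]], K, V))  # [[2.0, 3.0]], uniform weights
```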

Question 44

query

Accepted Answer

what the current token is looking for

Question 45

key

Accepted Answer

what the other tokens contain

Question 46

value

Accepted Answer

actual information the token will pass along if selected

Question 47

why divide by sqrt(D)

Accepted Answer

keeps the variance of the dot products consistent as D grows, so the softmax doesn't saturate and cause vanishing gradients

Question 48

encoder self-attention

Accepted Answer

Q,K,V all come from the previous encoder layer. every token looks at every other token

Question 49

decoder self-attention

Accepted Answer

Q,K,V come from previous decoder layer. masked so token can only look at past tokens

Question 50

encoder-decoder (cross) attention

Accepted Answer

Q comes from decoder. K and V come from encoder's final output

Question 51

transformer memory

Accepted Answer

O(N^2) w.r.t. sequence length

Question 52

positional encoding

Accepted Answer

adds sine/cosine waves of different frequencies to the input embeddings
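
A sketch of the encoding for one position, assuming the base-10000 frequency schedule and the interleaved sin/cos layout from the original Transformer. The card does not specify either.

```python
import math


def positional_encoding(pos, d_model):
    pe = []
    for i in range(0, d_model, 2):
        # Higher dimensions get progressively lower frequencies.
        angle = pos / (10000 ** (i / d_model))
        pe.append(math.sin(angle))
        pe.append(math.cos(angle))
    return pe[:d_model]


print(positional_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
```

The resulting vector is added elementwise to the token embedding at that position.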

Question 53

RoPE (Rotary Positional Embedding)

Accepted Answer

instead of adding position to embeddings, it rotates Q and K vectors in space.

Question 54

ViT (vision transformer)

Accepted Answer

split image into a grid of non-overlapping patches, flatten each patch, apply linear projection, add positional embeddings, feed them into a Transformer encoder

Question 55

Swin (shifted window transformer)

Accepted Answer

solves quadratic cost of ViT. computes self-attention only within local windows. Merges patches in deeper layers

Question 56

Markov decision process

Accepted Answer

states (s), actions (a), transition model P(s' | s, a), and reward function r(s). Next state s' depends only on the current state s and action a

Question 57

Bellman-equations

Accepted Answer

V(s) = r(s) + discount * max_a E[V(s')]
Q(s,a) = r(s) + discount * E[max_a' Q(s', a')]
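
A value-iteration sketch that applies the V(s) equation repeatedly until it converges. The two-state MDP is made up. Transitions are assumed deterministic, so the expectation reduces to a single next state.

```python
def value_iteration(rewards, transitions, discount=0.9, iters=200):
    """transitions[s] lists the next state reached by each action in s."""
    V = [0.0] * len(rewards)
    for _ in range(iters):
        # Bellman backup: V(s) = r(s) + discount * max_a V(s')
        V = [rewards[s] + discount * max(V[s2] for s2 in transitions[s])
             for s in range(len(rewards))]
    return V


# State 0 has no reward and can stay or move to state 1.
# State 1 pays 1 per step and loops on itself.
V = value_iteration(rewards=[0.0, 1.0], transitions=[[0, 1], [1]])
# V is close to [9.0, 10.0]
```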

Question 58

Q learning

Accepted Answer

deep Q-learning (DQN) uses a neural net to approximate Q values, with an experience replay buffer and a frozen target network to prevent instability

Question 59

double DQN

Accepted Answer

Standard DQN uses the max operator to both select and evaluate actions, causing overestimation. Double DQN uses the online network to select the action, and the target network to evaluate its value.
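
A sketch of the two bootstrap targets side by side, taking the networks' next-state Q values as plain lists. Function names are hypothetical. Terminal-state handling is omitted.

```python
def dqn_target(reward, q_target_next, discount=0.99):
    # The target network both picks and scores the best action.
    return reward + discount * max(q_target_next)


def double_dqn_target(reward, q_online_next, q_target_next, discount=0.99):
    # The online network picks the action, the target network scores it.
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + discount * q_target_next[best]


q_online, q_target = [1.0, 2.0], [5.0, 3.0]
print(dqn_target(0.0, q_target, discount=1.0))                   # 5.0
print(double_dqn_target(0.0, q_online, q_target, discount=1.0))  # 3.0
```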

Question 60

dueling DQN

Accepted Answer

splits the network head into two paths: one predicts the state value V(s), the other predicts the advantage of each action A(s, a).
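
A sketch of how the two paths are typically recombined into Q values. Subtracting the mean advantage is the usual trick for making V and A identifiable. It is an assumption here, since the card does not mention it.

```python
def dueling_q(value, advantages):
    """Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


print(dueling_q(2.0, [1.0, 0.0, -1.0]))  # [3.0, 2.0, 1.0]
```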

Question 61

REINFORCE

Accepted Answer

Learns the policy pi(a | s) directly. Multiplies the gradient of the log-probability of an action by the actual return received.

Question 62

Actor-Critic

Accepted Answer

Combines policy gradients (actor) with a learned value function, as in Q-learning (critic).

Question 63

PPO (proximal policy optimization)

Accepted Answer

uses clipped surrogate objective that prevents new policy from changing too much from the old policy. allows safe reuse of mini-batches.
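
A per-sample sketch of the clipped surrogate, where `ratio` is pi_new(a | s) / pi_old(a | s). The clip range 0.2 is a commonly used default, not something the card states.

```python
def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the min removes any incentive to push the ratio outside the clip range.
    return min(ratio * advantage, clipped * advantage)


print(ppo_clip_objective(1.5, 1.0))   # 1.2, gain capped at 1 + clip_eps
print(ppo_clip_objective(0.5, -1.0))  # -0.8
```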

Deep Learning final