LLM
| Question | Answer |
|---|---|
| 1. What is the main limitation of context-free word embeddings like Word2Vec and GloVe? A) Too computationally expensive B) Assign the same embedding to a word regardless of its context C) Require labeled data D) Can only embed rare words | B Assign the same embedding to a word regardless of its context |
| 2. In transformer models, the purpose of positional encoding is to: A) Reduce model size B) Encode the sequence order of tokens C) Improve attention span D) Remove redundant information | B Encode the sequence order of tokens (see the positional-encoding sketch after the table) |
| 3. Which of the following is true about self-attention? A) Compares queries from one sequence with keys from a different sequence B) Compares elements within the same sequence C) Ignores position information D) Used only in decoder blocks | B Compares elements within the same sequence (see the self-attention sketch after the table) |
| 4. Which task is BERT primarily designed for? A) Text generation B) Next-word prediction C) Masked Language Modeling D) Image captioning | C Masked Language Modeling |
| 5. What is the role of the query in a cross-attention mechanism? A) To retrieve tokens sequentially B) To query information from keys and values from another modality C) To predict the output labels D) To ignore modality differences | B To query information from keys and values from another modality |
| 6. In CLIP, the Image Encoder is most commonly based on which architecture? A) RNN B) ResNet or Vision Transformer (ViT) C) BERT D) CNN-LSTM hybrid | B ResNet or Vision Transformer (ViT) |
| 7. Which architecture uses masked self-attention to prevent tokens from seeing future tokens during training? A) Encoder Transformers B) Decoder Transformers C) CNNs D) Vision Transformers | B Decoder Transformers |
| 8. In multimodal retrieval, cosine similarity is used to: A) Translate text to image directly B) Compute similarity between image and text embeddings C) Generate embeddings from raw inputs D) Tokenize text efficiently | B Compute similarity between image and text embeddings (see the retrieval sketch after the table) |
| 9. Which type of tokenization is typically used in modern LLMs to handle rare words efficiently? A) Word-based tokenization B) Character-based tokenization C) Subword tokenization D) Punctuation-based tokenization | C Subword tokenization |
| 10. In Vision Transformers (ViT), the input image is first: A) Flattened into a 1D vector B) Split into patches and embedded C) Directly fed into a CNN D) Tokenized using text tokenizers | B Split into patches and embedded (see the patch-embedding sketch after the table) |
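
The sinusoidal scheme from the original Transformer paper is one common way to realize the positional encoding asked about in question 2. The sketch below is a minimal NumPy version; the function name, sequence length, and model dimension are illustrative choices, not taken from any particular library.

```python
# Minimal sketch of sinusoidal positional encoding (question 2).
# Shapes and names are illustrative assumptions.
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix encoding each token's position."""
    positions = np.arange(seq_len)[:, None]                   # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                        # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                          # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                     # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                     # odd dimensions: cosine
    return pe

# The encoding is added to the token embeddings so the model can tell positions apart.
token_embeddings = np.random.randn(8, 16)                     # 8 tokens, d_model = 16
inputs = token_embeddings + sinusoidal_positional_encoding(8, 16)
```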
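
For question 3, the defining property of self-attention is that queries, keys, and values are all projections of the same sequence, so every token is compared with every other token in that sequence. Below is a minimal single-head NumPy sketch; the weight matrices and shapes are assumptions made for illustration.

```python
# Minimal sketch of scaled dot-product self-attention (question 3):
# Q, K and V all come from the same input sequence.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); the same sequence produces queries, keys and values."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = k.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)        # every token compared with every other token
    weights = softmax(scores, axis=-1)     # attention weights over the same sequence
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))               # 5 tokens, d_model = 16
w_q, w_k, w_v = (rng.normal(size=(16, 16)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)     # (5, 16)
```

In cross-attention (question 5) the only structural change is that `q` comes from one sequence while `k` and `v` come from another, for example text queries attending to image features.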
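
Questions 6 and 8 come together in CLIP-style retrieval: an image encoder and a text encoder map their inputs into a shared embedding space, and cosine similarity ranks the candidates. In the sketch below the embeddings are random placeholders standing in for encoder outputs; the dimensions and names are assumptions.

```python
# Minimal sketch of retrieval with cosine similarity (questions 6 and 8).
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between two sets of embeddings."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(3, 512))    # 3 images in an assumed 512-d shared space
text_embeddings = rng.normal(size=(5, 512))     # 5 candidate captions in the same space

sims = cosine_similarity(image_embeddings, text_embeddings)   # (3, 5) similarity matrix
best_caption_per_image = sims.argmax(axis=1)                  # retrieval: pick the closest text
```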
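
For question 10, a Vision Transformer first cuts the image into fixed-size patches, flattens each one, and linearly projects it to the model dimension before positional information is added. A minimal sketch, assuming a 224x224 RGB input and 16x16 patches:

```python
# Minimal sketch of the ViT input pipeline (question 10): split into patches, flatten, embed.
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """image: (H, W, C) -> (num_patches, patch*patch*C) flattened patches."""
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    patches = image[:rows * patch, :cols * patch].reshape(rows, patch, cols, patch, c)
    patches = patches.transpose(0, 2, 1, 3, 4)            # (rows, cols, patch, patch, C)
    return patches.reshape(rows * cols, patch * patch * c)

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))                         # a single RGB image
patches = patchify(image, patch=16)                       # (196, 768) for a 14x14 grid
projection = rng.normal(size=(768, 512))                  # stand-in for the learned patch embedding
patch_embeddings = patches @ projection                   # (196, 512) tokens fed to the transformer
```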