Question 1

________ comprises facts, observations, or perceptions

Accepted Answer

Data

Question 2

A ________ is a statement of some element of truth about a subject matter or a domain

Accepted Answer

Fact

Question 3

Alone, _______ represents raw numbers or assertions, and may therefore be devoid of context, meaning, or intent. However, it can easily be captured, stored, and communicated using electronic or other media.

Accepted Answer

Data

Question 4

________ is processed data that is in a form that is useful for making decisions

Accepted Answer

Information

Question 5

_______ typically involves the manipulation of raw _______ to obtain a more meaningful indication of trends or patterns in the data

Accepted Answer

Information/data
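
A minimal sketch of turning raw data into information, assuming a hypothetical list of daily sales counts. The moving average is only one possible manipulation; it is used here because it makes a trend visible that the raw numbers do not state on their own.

```python
from statistics import mean

# Hypothetical raw data: daily sales counts with no context attached.
daily_sales = [12, 15, 11, 18, 21, 24, 26]

def moving_average(values, window):
    """Return the mean of each consecutive window of the given size."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]

# Information: a smoothed series that indicates the direction of the trend.
trend = moving_average(daily_sales, 3)
is_rising = trend[-1] > trend[0]
```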

Question 6

Whether certain facts are considered ________ or only _______ depends on the individual who is using those facts

Accepted Answer

information/data

Question 7

The problem with too much ______ is that it offers no judgement and no basis for action

Accepted Answer

data

Question 8

The practical pursuit of computerized information retrieval began in the late _______

Accepted Answer

1940s

Question 9

A great increase in the production of scientific literature coupled with the availability of computers led to interest in automatic ___________

Accepted Answer

document retrieval

Question 10

____________ is finding material (usually documents) of an unstructured nature (usually text) that satisfy an information need from within large collections (usually stored on computers)

Accepted Answer

Information Retrieval (IR)

Question 11

The term "___________" refers to data which does not have clear, semantically overt, easy-for-a-computer structure. It is the opposite of ___________, the canonical example of which is a relational database

Accepted Answer

unstructured data/structured data

Question 12

IR is also used to facilitate "__________" searches such as finding a document where the Title contains Java and the Body contains threading

Accepted Answer

semi-structured
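
A minimal sketch of the Title/Body search described above, assuming documents are stored as hypothetical dictionaries with `title` and `body` fields. The helper name `field_contains` is made up for illustration.

```python
import re

# Hypothetical documents with two named fields each.
docs = [
    {"id": 1, "title": "Java Concurrency Basics", "body": "An introduction to threading and locks."},
    {"id": 2, "title": "Python Tips", "body": "Notes on threading in Python."},
    {"id": 3, "title": "Java Collections", "body": "Lists, maps, and sets."},
]

def field_contains(doc, field, word):
    """True if the named field contains the word (case-insensitive, whole word)."""
    return word.lower() in re.findall(r"\w+", doc[field].lower())

# Title contains "Java" AND Body contains "threading".
matches = [d["id"] for d in docs
           if field_contains(d, "title", "Java") and field_contains(d, "body", "threading")]
```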

Question 13

In the traditional model used in the field of information retrieval, information is organized into __________, and it is assumed that there is a large number of them

Accepted Answer

documents

Question 14

Data contained in documents is __________ (without any associated schema)

Accepted Answer

unstructured

Question 15

Traditional examples of _________ systems are online library catalogs and online document-management systems, such as those that store newspaper articles

Accepted Answer

information-retrieval

Question 16

By documents, we mean whatever __________ we have decided to build a retrieval system over. They might be individual memos or chapters of a book

Accepted Answer

units

Question 17

We will refer to the group of documents over which we perform retrieval as the ________. It is sometimes also referred to as a corpus

Accepted Answer

collection

Question 18

Information retrieval has also played a critical role in making the ________ a productive and useful tool

Accepted Answer

Web

Question 19

In the context of the Web, each _________ page is considered to be a document

Accepted Answer

HTML

Question 20

Documents are associated with a set of _____________

Accepted Answer

keywords

Question 21

_________-based information retrieval can be used not only for retrieving textual data, but for retrieving other types of data (such as video and audio data)

Accepted Answer

Keyword

Question 22

In ______ ______ retrieval, all the words in each document are considered to be keywords

Accepted Answer

full text

Question 23

_________ is the task of coming up with a good grouping of the documents based on their contents

Accepted Answer

Clustering

Question 24

Does clustering or classification have an unknown number of final groupings?

Accepted Answer

Clustering

Question 25

Does clustering or classification have a known number of classes?

Accepted Answer

Classification

Question 26

_________ is the task of deciding to which class(es), if any, each of a set of documents belongs

Accepted Answer

Classification

Question 27

Solutions for text information retrieval are generally not effective for _____, ______, or ______ information retrieval, unless the media object is associated with a textual description

Accepted Answer

image, audio, or video

Question 28

The dominant mode of text search is by its _________ in order to satisfy an information need

Accepted Answer

content

Question 29

In the most common mode of searching for text, the information need is represented by a ______, and the user may issue several _______ in pursuit of one information need

Accepted Answer

query/queries

Question 30

Our goal is to develop a system to address the ________ task (the most standard IR task)

Accepted Answer

ad hoc retrieval

Question 31

In the ___________ task, a system aims to provide documents from within the collection that are relevant to an arbitrary user information need, communicated to the system by means of a one-off, user-initiated query

Accepted Answer

ad hoc retrieval

Question 32

________: Topic which the user desires to find

Accepted Answer

Information need

Question 33

______: What the user conveys to the computer in an attempt to communicate the information need. May be incrementally developed until a user obtains desired results

Accepted Answer

Query

Question 34

Query vs Information Need
__________: Is drinking red wine effective at reducing the risk of a heart attack?
__________: "red" and "wine" and "heart" and "attack"

Accepted Answer

Information Need
Query

Question 35

The primary challenge in information retrieval is the ________ between the language of the user and the language of the author

Accepted Answer

difference/mismatch

Question 36

A(n) _________ is a system that ingests information, transforms it into a searchable format, and provides an interface for a user to search and retrieve information. This includes both hardware and software

Accepted Answer

IR system

Question 37

The overall goal of a(n) _____________ is to provide the information needed to satisfy the user's question while minimizing the user overhead in locating the information value

Accepted Answer

information retrieval system

Question 38

Information retrieval system architecture can be segmented into four major processing subsystems: ______, ______, ______, and ______

Accepted Answer

ingest, index, search, and display

Question 39

Which of the four IR processing subsystems is concerned with the acquisition and initial normalization and processing of the source items?

Accepted Answer

Ingest

Question 40

Which of the four IR processing subsystems is concerned with taking the normalized item's processing tokens and metadata and creating the searchable index from it?

Accepted Answer

Index

Question 41

Which of the four IR processing subsystems is concerned with mapping the user search information need into a form that can be processed as defined by the searchable index and determining which items are to be returned to the user?

Accepted Answer

Search

Question 42

Which of the four IR processing subsystems is concerned with how the user can locate the items of interest in all of the possible results returned?

Accepted Answer

Display

Question 43

IR systems have much in common with _______ systems: documents are stored in a repository, and an index is maintained; queries are evaluated utilizing the index to identify matches, which are then returned to the user.

Accepted Answer

database

Question 44

The ______ in the field of information retrieval is different from that in database systems

Accepted Answer

emphasis

Question 45

A document matches an information need if the user perceives it to be ______

Accepted Answer

relevant

Question 46

_________ return all matching records, while _________ return a fixed number of matches, which are ranked by their statistical similarity

Accepted Answer

Database systems/search engines
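
A rough sketch of the contrast, assuming a hypothetical four-document collection. Counting query-term occurrences is only a stand-in for a real statistical similarity measure; the function names are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical collection: document ID -> text.
docs = {
    1: "red wine and heart health",
    2: "wine tasting notes for red wine lovers",
    3: "heart attack risk factors",
    4: "gardening tips for spring",
}

def tokens(text):
    return re.findall(r"\w+", text.lower())

def exact_match(query_terms):
    """Database-style: return every document containing all query terms, unranked."""
    return [doc_id for doc_id, text in docs.items()
            if all(t in tokens(text) for t in query_terms)]

def top_k(query_terms, k):
    """Search-engine-style: score by query-term occurrences and keep the best k."""
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(tokens(text))
        score = sum(counts[t] for t in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document ID.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:k]
```

Here `exact_match(["red", "wine"])` returns every match, while `top_k(["wine"], 1)` returns only the single highest-scoring document.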

Question 47

Updates are not as common in traditional _____ systems as in traditional ____ systems

Accepted Answer

IR/DB

Question 48

The ____________ (also known as exact-match retrieval) was used by the earliest search engines

Accepted Answer

Boolean retrieval model

Question 49

The Boolean retrieval model assumes that relevance is ______ (either relevant or not relevant)

Accepted Answer

binary

Question 50

In the Boolean Retrieval Model, we can pose any query which is in the form of a Boolean expression of terms: terms which are combined with the operators ______, ______, ______

Accepted Answer

AND, OR, NOT
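
A minimal sketch of how the three operators can be mapped onto set operations, assuming a hypothetical three-document collection. The helper `matching` is illustrative and rescans the text on every call, which a real system would avoid.

```python
import re

# Hypothetical collection: document ID -> text.
docs = {
    1: "Red wine may reduce heart attack risk",
    2: "Red wine pairs well with cheese",
    3: "Heart attack symptoms and treatment",
}

def matching(term):
    """Return the set of document IDs whose text contains the term."""
    return {doc_id for doc_id, text in docs.items()
            if term in re.findall(r"\w+", text.lower())}

all_ids = set(docs)

and_result = matching("red") & matching("wine")    # red AND wine
or_result = matching("wine") | matching("heart")   # wine OR heart
not_result = all_ids - matching("attack")          # NOT attack

# red AND wine AND NOT attack
combined = (matching("red") & matching("wine")) - matching("attack")
```

Relevance is binary here: a document is either in the result set or it is not, and no ranking is produced.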

Question 51

The simplest form of document retrieval is for a computer to linearly scan through all the text of each document (commonly referred to as _______ through text)

Accepted Answer

grepping
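
A minimal sketch of a linear scan, assuming a hypothetical in-memory collection. The function name `grep` is only a nod to the Unix tool; this is not its implementation.

```python
import re

# Hypothetical collection: document ID -> text.
docs = {
    1: "Red wine may reduce heart attack risk",
    2: "Red wine pairs well with cheese",
    3: "Heart attack symptoms and treatment",
}

def grep(pattern, collection):
    """Scan every document in turn and return the IDs of those matching the pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [doc_id for doc_id, text in collection.items() if regex.search(text)]

hits = grep(r"\bheart\b", docs)
```

Every query reads every document, so the cost grows with the total size of the collection.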

Question 52

To avoid linearly scanning the texts for each query, index the documents in advance by building an ______ matrix

Accepted Answer

incidence

Question 53

The structure of an incidence matrix includes: ______, ______, and a _______

Accepted Answer

rows, columns, and a matrix element (i,j)

Question 54

In an incidence matrix, do rows or columns correspond to words that appear in the collection (sorted alphabetically)?

Accepted Answer

Rows

Question 55

In an incidence matrix, do rows or columns correspond to documents that appear in the collection?

Accepted Answer

Columns

Question 56

In an incidence matrix, the matrix element (i,j) is ______ if the document in column j contains the word in row i, and is ______ otherwise

Accepted Answer

true/false
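
A minimal sketch of a term-document incidence matrix, assuming a hypothetical three-document collection. Rows are the alphabetically sorted words, columns are the documents, and the Boolean query at the end is just one example of how the rows can be combined.

```python
import re

# Hypothetical collection; list position j is the column index.
docs = ["red wine heart", "red wine cheese", "heart attack"]

tokenized = [set(re.findall(r"\w+", d.lower())) for d in docs]
terms = sorted(set().union(*tokenized))  # row labels, sorted alphabetically

# matrix[i][j] is True if the document in column j contains the word in row i.
matrix = [[term in doc_tokens for doc_tokens in tokenized] for term in terms]

def row(term):
    """Return the incidence row for a term."""
    return matrix[terms.index(term)]

# Boolean query "red AND wine AND NOT attack", evaluated column by column.
answer = [j for j in range(len(docs))
          if row("red")[j] and row("wine")[j] and not row("attack")[j]]
```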

Question 57

__________ (or inverted file): Central concept in information retrieval

Accepted Answer

Inverted index

Question 58

A(n) ____________ consists of two major components: The search structure (the dictionary) and a set of inverted lists

Accepted Answer

inverted index

Question 59

The idea of a(n) __________ is that the lists contain the IDs of the documents that contain the corresponding vocabulary term

Accepted Answer

inverted index

Question 60

Each item in an inverted list is conventionally called a _______. Each one contains a document ID and the number of times the term appears in that document

Accepted Answer

posting

Question 61

Steps in building a(n) ____________:
1) collect the documents to be indexed
2) tokenize the text, turning each document into a list of tokens
3) linguistic preprocessing
4) index the documents

Accepted Answer

Inverted Index
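
The four steps above can be sketched as follows, assuming a hypothetical two-document collection. The preprocessing step is deliberately reduced to lowercasing, and each posting is stored as a `(document ID, term frequency)` pair; all function names are illustrative.

```python
import re
from collections import Counter, defaultdict

# Step 1: collect the documents to be indexed (hypothetical examples).
docs = {
    1: "The cat jumps. The dog jumps too.",
    2: "A cat sleeps.",
}

def tokenize(text):
    """Step 2: turn a document into a list of tokens."""
    return re.findall(r"[A-Za-z]+", text)

def normalize(token):
    """Step 3: minimal linguistic preprocessing (lowercasing only)."""
    return token.lower()

def build_index(collection):
    """Step 4: map each term to a list of (document ID, term frequency) postings."""
    index = defaultdict(list)
    for doc_id in sorted(collection):
        counts = Counter(normalize(t) for t in tokenize(collection[doc_id]))
        for term, freq in counts.items():
            index[term].append((doc_id, freq))
    # Sort the dictionary so that the terms are in alphabetical order.
    return dict(sorted(index.items()))

index = build_index(docs)
```

Because documents are processed in ID order, each list of postings comes out sorted by document ID without a separate sort.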

Question 62

______________ - Each document becomes a list of normalized tokens (e.g., "jumps" is reduced to "jump"), which are the indexing terms

Accepted Answer

Linguistic preprocessing
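
A rough sketch of token normalization, assuming a crude rule that lowercases the token and strips a trailing "s". This is a stand-in for real stemming or lemmatization, not an implementation of any standard algorithm.

```python
def normalize(token):
    """Lowercase a token and strip a single trailing 's' from longer words."""
    token = token.lower()
    # Skip short words ("is") and words ending in "ss" ("glass").
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

tokens = ["Jumps", "jump", "Glass", "is"]
terms = [normalize(t) for t in tokens]
```

After normalization, "Jumps" and "jump" map to the same indexing term, so a query for one finds documents containing the other.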

Question 63

When building inverted indices, sort the list so that the terms are ________

Accepted Answer

alphabetical

Question 64

To build an index, it is necessary to determine what the document ______ for indexing is

Accepted Answer

unit

4315 Exam 1