click below
click below
Normal Size Small Size show me how
DMBOK - Chp 5
Data Modeling and Design
| Question | Answer |
|---|---|
| Data Modeling | The process of discovering, analyzing, and scoping data requirements, and then representing and communicating these data requirements in a precise form called the data model. |
| Goal of data modeling and design | To confirm and document an understanding of different perspectives, which leads to applications that more closely align with current and future business requirements, and creates a foundation to successfully complete broad-scoped initiatives. |
| Data modeling and design business drivers | Provide a common vocabulary around data Capture and document explicit knowledge about an organization's data and systems Serve as a primary communication tool during projects |
| Confirming and documenting understanding of different perspectives facilitates | Formalization - a data model documents concise definition of data structures and relationships Scope - a data model can help explain the boundaries for data context Knowledge retention - a data model can preserve corporate memory regarding a system |
| A data model describes an organization's | data as the organization understands it, or as the organization wants it to be. The model is a form of documentation for data requirements and data definitions resulting from the modeling process. |
| Types of data that can be modeled | Category information Resource information Business event information Detail transaction information |
| Data model components | Entities Relationships Attribute Domain |
| Data entity | a thing about which an organization collects information, sometimes referred to as the nouns of an organization; answers fundamental questions of who, what, when, where, why, or how |
| Entity type | A common alias for entity, as a type of something being represented; e.g., Jane is an entity and Employee is the entity type |
| Entity instance | Occurrences or values of a particular entity |
| Aliases for entity in other schemes | Dimensional schemes: dimension and fact table Object-oriented: class or object Time-based: hub, satellite, and link NoSQL: document or node |
| Entity definitions | Essential contributors to the business value of any data model. Clarify the meaning of business vocabulary and provide rigor to the business rules governing entity relationships. |
| Data relationship | The association between entities. |
| Aliases for relationships in other schemes | Dimensional: navigation path NoSQL: edge or link |
| Relationship cardinality | Captures how many of one entity (entity instances) participates in the relationship with how many of the other entity. Options are zero, one, or many |
| Arity of Relationships | Number of entities in a relationship. Common examples are unary, binary, and ternary |
| Unary (Recursive) Relationship | Self-referencing relationship involving only one entity. A one to many relationship describes a hierarchy, whereas a many-to-many describes a network or graph. |
| Binary Relationship | Most common on a traditional data model diagram, involves two entities. |
| Ternary Relationship | A relationship that includes three entities. |
| Foreign Key | Used in physical and sometimes logical relational data modeling schemes to represent a relationship. |
| Data attribute | A property that identifies, describes, or measures an entity. The physical correspondent is a column, field, tag, or node in a table, view, document, graph, or file. |
| Data identifier | Also called a key, set of one or more attributes that uniquely defines an instance of an entity. |
| Construction-type Keys | Simple Surrogate Compound Composite |
| Simple key | One attribute that uniquely identifies an entity instance |
| Surrogate key | A unique identifier for a table |
| Compound key | Set of two or more attributes that together uniquely identify an entity instance |
| Composite key | Contains one compound key and at least one other simple or compound key or non-key attribute. E.g., key on a multi-dimensional fact table |
| Function-type Keys | Super key Candidate key Business key Primary key Alternate key |
| Super key | Any set of attributes that uniquely identify an entity instance |
| Candidate key | Minimal set of one or more attributes that identifies the entity instance to which it belongs; minimal means that no subset of the candidate key uniquely identifies the entity instance |
| Business key | One or more attributes that a business professional would use to retrieve a single entity instance |
| Primary key | The candidate key that is chosen to be the unique identifier for an entity |
| Alternate key | candidate key that although unique, was not chosen as the primary key |
| Independent entity | Entity where the primary key contains only attributes that belong to that entity |
| Dependent entity | Entity where the primary key contains at least one attribute from another entity; Has at least one identifying relationship (primary key of the parent is migrated as a foreign key to the child) |
| Data domain | The complete set of possible values that an attribute can be assigned. |
| Domain types | Data Type (e.g., integer, date, etc) Data Format (e.g., alpha only, allowable characters, etc) List Range Rule-based (e.g., attribute 1 must be > attribute 2) |
| Data Modeling Schemes | Relational Dimensional Object-Oriented (UML) Fact-Based Time-Based NoSQL |
| Relational scheme | Provides a systematic way to organize data so that they reflect their meaning. Reduces redundancy in data storage. Most effectively managed in terms of two-dimensional relations. |
| Dimensional scheme | Capture business questions focused on a particular business process. Data is structured to optimize the query and analysis of large amounts of data. |
| Fact Table | Used within dimensional scheme, rows correspond to particular measurements and are numeric. |
| Dimension Table | Used within dimensional scheme, table represent the important objects of the business and contain mostly textual descriptions. Serve as the primary source for 'query by' or 'report by' constraints. |
| Snowflaking | Normalizing the flat, single-table, dimensional structure in a star schema into the respective component hierarchical or network structures |
| Grain | Description of a single row of data in a fact table |
| Conformed Dimensions | Built with the entire organization in mind instead of just a particular project; this allows these dimensions to be shared across dimensional models, due to containing consistent terminology and values |
| Conformed Facts | Use standardized definitions of terms across individual marts. |
| Object-Oriented scheme (UML) | Unified Modeling Language is a graphical language for modeling software. |
| Fact-Based Modeling (FBM) | Family of conceptual modeling languages, based in the analysis of natural verbalization (plausible sentences) that might occur in the business domain. |
| Object Role Modeling (ORM or ORM2) | Model-driven engineering approach that starts with typical examples of required information or queries presented in any external formulation familiar to users, and then verbalizes these examples at the conceptual level, in terms of simple facts |
| Time-Based Scheme | Used when data values must be associated in chronological order and with specific time |
| Data Vault | Detail-oriented, time-based, and uniquely linked set of normalized tables that support one or more functional areas of business. Hybrid approach between third normal form and star schema |
| NoSQL | Category of databases built on non-relational technology. Four main types: document, key-value, column-oriented, and graph |
| Document (NoSQL) | Instead of taking a business subject and breaking it up into multiple relational structures, they frequently store business subject in one structure called a docuent. |
| Key-Value (NoSQL) | Allow an application to store its data in only two columns with the feature of storing both simple and complex information. |
| Column-oriented (NoSQL) | Similar to RDBMS but can work with more complex data types including unformatted text and imagery. |
| Graph (NoSQL) | Designed for data whose relations are well represented as a set of nodes with an undetermined number of connections between these nodes. |
| Conceptual data model | Captures the high-level data requirements as a collection of related concepts. It contains only the basic and critical business entities within a given realm and function, with a description of each entity and the relationship between entities. |
| Logical data model | Detailed representation of data requirements, usually in support of a specific usage context, such as application requirements. Independent of any technology or specific implementation constraints. Extended from the conceptual model by adding attributes. |
| Physical data model (PDM) | Represents detailed technical solution, often using the logical model as a starting point and then adapted to work within a set of hardware, software, and network tools. Built for a particular technology. |
| Canonical model | Used for data in motion between systems. This model describes the structure of data being passed as packets or messages. A variant of a physical scheme. |
| View | A virtual table; provides a means to look at data from one or many tables that contain or reference the actual attributes. |
| Partitioning | the process of splitting a table. Vertically: create subset of columns Horizontally: create subset of rows |
| Denormalization | The deliberate transformation of normalized logical data model entities into physical tables with redundant or duplicate data structures. E.g., intentionally puts one attribute in multiple places. |
| Reasons to denormalize | Combining data from multiple other tables in advance to avoid costly run-time joins Creating smaller, pre-filtered copies of data to reduce costly run-time calculations Pre-calculating and storing costly data calculations Enforce user security |
| Normalization | The process of applying rules in order to organize business complexity into stable data structures. The basic goal is to keep each attribute in only one place to eliminate redundancy and inconsistencies. |
| First normal form (1NF) | Ensures each entity has a valid primary key, and every attribute depends on the primary key; removes repeating groups, and ensures each attribute is atomic (not multi-valued.) |
| Second normal form (2NF) | Ensures each entity has the minimal primary key and that every attribute depends on the complete primary key. |
| Third normal form (3NF) | Ensures each entity has no hidden primary keys and that each attribute depends on no attributes outside the key. The term 'normalized model' usually means the data is in 3NF. |
| Boyce/Codd normal form (BCNF) | Resolves overlapping composite candidate keys. |
| Fourth normal form (4NF) | Resolves all many-to-many-to-many relationships (and beyond) in pairs until they cannot be broken down into any smaller pieces. |
| Fifth normal form (5NF) | Resolves inter-entity dependencies into basic pairs, and all join dependencies use parts of primary keys |
| Abstraction | the removal of details in such a way as to broaden applicability to a wide class of situations while preserving the important properties and essential nature from concepts or subjects. |
| Generalization (method of abstraction) | Groups the common attributes and relationships of entities into supertype entities |
| Specialization (method of abstraction) | Separates distinguishing attributes within an entity into subtype entities |
| Deliverables of data modeling process | Diagram Definitions Issues and outstanding questions Lineage |
| Creating conceptual data model steps | Select scheme Select notation complete initial conceptual data model Incorporate enterprise terminology Obtain sign-off |
| Creating logical data model steps | Analyze information requirements Analyze existing documentation Add associative entities Add attributes Assign domains Assign keys |
| Creating physical data model steps | Resolve logical abstractions Add attribute details Add reference data objects Assign surrogate keys Denormalize for performance Index for performance Partition for performance Create views |
| Reverse engineering | The process of documenting an existing database, starting with the physical data model and working towards logical and conceptual data models. |
| Tools to assist data modelers | Data Modeling Lineage Data profiling Metadata repositories Data Model patterns Industry data models |
| ISO 11179 Metadata Registry | international standard for representing Metadata in an organization. |
| Database design principles | Performance and ease of use Reusability Integrity Security Maintainability |
| Data professionals must balance | data requirements of the information consumers and the application requirements of data producers |
| Design reviews should be held with | subject matter experts representing different backgrounds, skills, expectations, and opinions. |
| Components of Data Model and Design quality management | Develop Data Modeling and Design Standards Review Data Model and Database Design Quality Manage Data Model Versioning and Integration |