click below
click below
Normal Size Small Size show me how
DMBOK - Chp 6
Data Storage
| Question | Answer |
|---|---|
| What is the definition of data storage and operations? | The design, implementation, and support of stored data to maximize it's value. |
| What are the two sub-activities of Data Storage and Operations? | Database support and database technology support |
| What are the 4 inputs to Data Storage & Ops | Data Architecture; Data Requirements, Data Models; Service Level Agreements |
| Definition of database? | Any collection of stored data regardless of structure or content. |
| What type of improvement work should be done in collaboration with data modeling and data architecture? | Automating database development processes, developing tools, and processes that shorten each development cycle, reduce errors and rework and minimize the impact on the development team. |
| Developing and promoting the use of abstracted and reusable data objects prevents what kind of issue in the database? | Developing and promoting the use of abstracted and reusable data objects prevents applications from being tightly coupled to database schemas - 'object-relational impedance mismatch' |
| Database Administrators and developers need to agree on accepted methods for doing what? | Methods of ensuring data integrity and data security. |
| What is the definition of an instance? | An execution of database software controlling access to a certain area of storage. An organization will usually have multiple instances executing concurrently, using different areas of storage. Each instance is independent of all other instances. |
| What is the definition of node? | An individual computer hosting either processing or data as part of a distributed database . |
| What is the definition of schema? | A subset of a database objects contained within the dtabase or an instance. Schemas are used to organize objects into more manageable parts. |
| What is the definition of database abstraction? | A common application interface (API) is used to call database functions, such than an application can connect to multiple different databases without the programmer having to know all function calls for all possible databases. |
| Data lifecycle management involves implement which 5 main policies and procedures. | Acquisition, migration, retention, expiration, and disposition. |
| What are the 3 characteristics of a process for moving database changes to production environment? | Processes should be controlled, documented and auditable |
| What are the 3 types of DBAs described in this section? | Production, Application, Procedure and Development |
| What are some of the primary data operations management responsibilities for Production DBAs? | Ensure performance and reliability; error reporting; implement backup & recovery mechanisms; implement clustering and failover mechanisms if continual data availability is a requirement; execute other database maintenance activities such as archiving; |
| Which role creates the deliverable of mechanisms and processes for monitoring database performance such as workloads and data volume. | Production database administrator |
| How is an application DBA different from a Production DBA? | Application DBA is responsible for all the environments of one database while a Production DBA is responsible for data operations management. |
| Which type of DBA leads the review and administration of procedural database objects? | A procedural DBA |
| Which type of DBA leads the designing, creating, and managing of special use databases such as a training 'sandbox'? | A development DBA |
| What element of data storage are Network Storage Administrators concerned with? | NSAs are concerned with the hardware and software supporting data storage arrays as they have different needs and monitoring requirements. |
| What are the two types of database architecture types? | A database can be classified as either centralized or distributed. |
| How is a centralized database different from a distributed database? | A centralized system manages a single database while a distributed system manages multiple databases on multiple systems. |
| What are the 2 classifications for distributed systems? | Federated and non-federated |
| An integration of component database mgmt systems that are not autonomous. They are controlled, managed and governed by a centralized DBMS. | Non-federated database system. |
| Database type that is best for heterogeneous and distributed integration projects such as enterprise information integration. | Federated databases |
| A type of federated database used to securely manage financial transactions, contract management, or health information. | Blockchain database |
| The two basic types of database processing | ACID & BASE |
| What does the acronym ACID stand for? | Atomicity Consistency Isolation Durability |
| Atomicity (A of the ACID acronym) | All operations are performed or none of them are so if one part of the transaction fails, the entire transaction failes |
| Consistency (C of the ACID acronym) | The transaction must meet all rules defined by the system at all times and must void half-completed transactions |
| Isolation (I of the ACID acronym) | Each transaction is independent unto itself |
| Durability (D of the ACID acronym) | Once complete, the transaction cannot be undone. |
| BASE | Basically Available, Soft state, Eventual consistency |
| Basically Available (BA of the BASE acronym) | System guarantees some level of availability to the data even when there are node failures. |
| Soft state (S of the BASE acronym) | The data is in a constant state of flux; while a response may be given, the data is not guaranteed to be current |
| Eventual consistency (E of the BASE acronym) | The data will eventually be consistent through all nodes and in all databases, but not every transaction will be consistent at every moment. |
| CAP Theorem | A distributed system cannot comply with all parts of ACID at all times. At most 2 of the 3 properties of consistency, availability, and partition tolerance can exist in any shared-data system. |
| A very stable method of storing data persistently | Disk and Storage Area Networks (SAN) |
| Permanent story loaded into volatile memory when the system is turned on, and all processing occurs within the memory array. | IMDB (In-Memory Database) |
| Designed to handle data sets in which data values are repeated to a great extent. | Columnar-based databases |
| Combines access speed of memory-based storage with the persistence of disk-based storage | Flash memory |
| Production environment | The technical environment where all business processes occur. |
| Pre-production environments | Environments are used to develop and test changes before such changes are introduced to the production environment. |
| Development environments | This environment is used to create and test code for changes in separate environments, which then are combined the QA environment for full integration testing. |
| Test environment | Used to execute quality assurance and user acceptance testing. |
| Sandbox or experimental environment | An alternate environment that allows read-only connections to production data and can be managed by the users. |
| The 3 general ways databases are organized | hierarchical, relational, non-relational |
| Most rigid of the 3 organizational models | hierarchical |
| Relational databases | Based on set theory and relational algebra, where data elements or attributes are related into tuples (rows) |
| Multidimensional database technologies | Data stored in a structure that allows searching using several data element filter simultaneously. |
| Temporal database | A relational database with built-in support for handling data involving time usually valid time and transaction time. |
| Non-relational databases | A database where the data structure is no longer bound to a tabular relational design. |
| Column-oriented database | A non-relational database is mostly used in Business Intelligence applications as they can compress redundant data. |
| Spatial database | A database that is optimized to store and query data that represents objects defined in a geometric space |
| Object/Multi-media | Includes a hierarchical storage management system for the efficient management of a hierarchy of magnetic and optical storage media. |
| Flat file database | Any of various means to encode a data set as a single file such as a plain text file or a binary file. |
| Key-Value Pair database | A database contains sets of two items: a key identifier and a value. |
| Triplestore databases | A data entry composed of subject-predicate-object where the subject denotes a resource, the predicate expresses a relationship between the subject and the object, and the object itself. |
| Specialized databases | Designed for specialized situations that are managed differently from traditional relations databases such as CAD/CAM, and GIS |
| Archiving | The process of moving data off immediately accessible storage media and onto media with lower retrieval performance. |
| Partitioning | The archival process must be aligned with this strategy to ensure optimal availability and retention. |
| Tests of archive restoration | It is wise to schedule these regularly so as to avoid surprises in an emergency |
| When thinking about capacity and growth projections what three elements need to be considered? | Size to hold all data, how much data goes in and how quickly, and how much data comes out and how quickly |
| Change data capture | The process of detecting that data has changed and ensuring that information relevant to the change is stored appropriately. |
| Purging | The process of completely removing data from storage media such that it cannot be recovered. |
| Goal of data storage management | The cost of maintaining data should not exceed its value to the organization. |
| Replication | the same data is stored on multiple storage devised. |
| Active Replication | Recreating and storing the same data at every replica from every other replica |
| Passive Replication | Recreating and storing the data on a single primary replica from and then transforming its resultant state to other secondary replicas |
| Horizontal data scaling | Data scaling that uses more replicas |
| Vertical data scaling | Data scaling that has data replicas located further away geographically |
| Multi-master replication | Updates can be submitted to any database node and then ripple through to other servers. |
| Replication transparency | When data is replicated between database servers so that the information remains consistent throughout the database system and users cannot tell or even which database copy they are using. |
| Mirroring replication pattern | Updates to the primary database are replicated immediately to the secondary database as part of a two-phase commit process. |
| Log shipping replication pattern | When a secondary server receives and applies copies of the primary database's transaction logs at regular intervals. |
| Resiliency | The measure of how tolerant a system is to error conditions. |
| Highly resilient | If a system can tolerate a high level of processing errors and still function as expected, it is high resilient. |
| Resilient | A database that can detect and either abort or automatically recover from common processing errors. |
| Not resilient | A database that crashes upon the first unexpected condition. |
| 3 recovery types | Immediate, critical, non-critical |
| Data processing errors | data load failures, query return failures, obstacles to completing ETL or other processes |
| Methods to increase resiliency | Trap and re-route, detect and ignore, and flags |
| Sharding | A process where small chunks of the database are isolated and can be updated independently of other shards so replication is merely a file copy. |
| What are the tw main activities in Data Operations and Storage? | Database technology support and database operations support. |
| Database technology support | The selecting and maintaining the software that stores and manages the data. |
| Factors to consider when selecting DBMS software | Architecture and complexity; volume, velocity, application profile , business intelligence, hardware platform, operating system support; availability of supporting tools; performance benchmarks, scalability, memory and storage requirements |
| What elements other than data should be considered when planning for capacity and growth? | Other elements to be included in the planning are indexes, logs, and any redundant images such as mirrors. |
| Part of managing databases is defining the requirements. What are the 3 types of requirements? | Storage, Usage and Access |
| Business continuity | A plan for the business to continue in the event of a disaster or adverse event that impacts systems and their ability to use their data. |
| What two factors must be balanced when making plans for creating back ups | The importance of the data must be balanced against the cost of protecting it. |
| How should back up files be stored? | A separate filesystem, separate storage medium, in a secure off-site facility |
| Configuration identification process | To identify the attributes that define every aspect of a configuration for end-user purposes. |
| Configuration change control process | Set of processes and approval stages required to change configuration item's attributes and to re-baseline them. |
| Configuration status accounting | Ability to record and report on the configuration baseline associated with each configuration item at any point in time. |
| Configuration audits | A physical configuration audit ensures that a configuration item is installed in accordance with the requirements. A functional configuration audit ensures that performance attributes of a configuration item are achieved. |
| Physical data model | Storage objects, indexing objects and any encapsulated code objects required to enforce data quality rules, connect database objects, and achieve database performance. |
| Database performance depends on which two interdependent facets? | Availability and speed |
| Availability | The percentage of time that a system or database can be used for productive work. |
| Recoverability | The ability to reestablish service after interruption, and correct errors caused by unforeseen events or component failures. |
| Reliability | The ability to deliver service at specified levels for a stated period |
| Serviceability | The ability to identify the existence of problems, diagnose their causes, and repair/solve them |
| Built-in query optimizer (DB issues) | Uses stored statistics about the data and indexes to make decisions about how to execute a given query most effectively. |
| Poor coding (DB issues) | The most common cause of poor database performance. |
| Inefficient complex table joins (DB issues) | Uses views to pre-define complex table joins. Unlike stored procedures, these are opaque to the query optimizer |
| Insufficient indexing (DB issues) | Create the indexes necessary to support complex queries and queries involving large tables |
| Application activity (DB issues) | Applications should be running on a server separate from the Database mgmt system to avoid database performance issues. |
| Overloaded servers (DB issues) | For DBMS supporting multiple databases and applications there may be a breaking point. |
| Database volatility (DB issues) | Large numbers of table inserts and deletes over a short while can create inaccurate database distribution statistics. |
| Runaway queries (DB issues) | User may unintentionally submit queries that use a majority of the system's shared resources. |
| Test data set | Data that has been specifically identified to test a system. This can include verifying that a given set of inputs produces the expected output or challenging the ability of programming to respond to unusual or extreme input. |
| Data migration | The process of transferring data between storage types, formats or computer systems with as little change as possible. |
| Phases of data migration | Design, extraction, remediation, load, verification |
| Tools for managing databases | Modeling, monitoring, managing, and support |
| Test in lower environments | The best practice of testing upgrades and patches on the lowest level environment first - usually development and installing on the production environment last. |
| Risk and readiness revolves around which two central ideas? | Risk of data loss and risks related to technology readiness |
| Common data storage metrics | Count of DBs by type; Aggregated transaction statistics; capacity metrics (amount of storage used, number of storage containers, number of data objects); storage service usage; requests made against the storage services; improvement to performance |
| Performance metrics | transaction frequency and quantity; query performance; API service performance |
| Operational metrics | Aggregated statistics about retrieval time; backup size; data quality measurement; availability |
| Service metrics | issue submission, resolution, escalation count by type; issue resolution time |
| Information asset tracking | Ensuring that an organization complies with all licensing agreements and regulatory requirements through yearly audits of software license and annual support costs |
| Data audit | Evaluation of a data set based on defined criteria. Usually performed to investigate specific concerns about a data set to determine if the data was stored in compliance with contractual and methodological requirements |
| Data validation | The process of evaluating stored data against established acceptance criteria to determine its quality and usability. |