click below
click below
Normal Size Small Size show me how
DMBOK - Chp 11
Data Warehousing and Business Intelligence
| Term | Definition |
|---|---|
| Primary driver for data warehousing is | to support operational functions, compliance requirements, and Business Intelligence activities. |
| Goals of data warehousing | Support BI activity Enable effective business analysis and decision making Find ways to innovate based on insights from their data |
| Data warehousing guiding principles | Focus on business goals Start with the end in mind Think and design globally, act and build locally Summarize and optimize last, not first Promote transparency and self-service Build metadata with the warehouse Collaborate One size does not fit all |
| Business Intelligence (BI) two meanings | Refers to a type of data analysis aimed at understanding organizational activities and opportunities. Refers to a set of technologies that support this kind of data analysis. |
| Data Warehouse primary components | An integrated decision support database and the related software programs used to collect, cleanse, transform, and store data from a variety of operational and external sources. |
| Data Warehousing definition | Describes the operational extract, cleansing, transformation, control, and load processes that maintain the data in a data warehouse. |
| Approaches to Data Warehousing | Stores data from other systems Includes organizing data in ways that increase its value Makes data accessible and usable for analysis Makes reliable, integrated data available to authorized stakeholders Serves multiple purposes |
| Corporate Information Factory (CIF) - Bill Inmon | a subject oriented, integrated, time variant, and nonvolatile collection of summary and detailed historical data. |
| Corporate Information Factory (CIF) components | Applications Staging Area Integration and transformation Operational Data Store (ODS) Data marts Operational Data Mart (OpDM) Data Warehouse Operational reports Reference, Master, and external data |
| Dimensional Data Warehouse (Kimball) | A copy of transaction data specifically structured for query and analysis. Warehouse data is stored in a dimensional data model. |
| Star schema | Dimensional model comprised of facts, which contain quantitative data about business processes, and dimensions, which store descriptive attributes related to fact data. |
| Dimensional Data Warehouse components | Operational source systems Data staging area Data presentation area Data access tools |
| Data Warehouse storage areas | Staging area (intermediate data store between original source and centralized repository) Reference and Master Data conformed dimensions Central Warehouse Operational Data Store (ODS) Data Marts Cubes |
| Types of data integration processes | Historical data loads Ongoing updates |
| Historical Data (Inmon) | Inmon DW suggests that all data is stored in a single DW layer. This layer will store cleansed, standardized, and governed atomic level data. |
| Historical Data (Kimball) | Kimball DW suggests that the DW is composed of a combination of departmental data marts containing cleansed, standardized, and governed. |
| Historical Data (Data Vault) | cleanses and standardizes as part of the staging process. History is stored in a normalized atomic structure, dimensional surrogate, primary and alternate keys are defined. |
| Batch Change Data Capture (CDC) | DW are often loaded daily and serviced by a nightly batch window. The load process can accommodate a variety of change detection. |
| CDC Techniques | Time-stamped Delta Load Log Table Delta Load Database Transaction Log Message Delta Full Load |
| Near-real-time and Real-time architecture types | Trickle feeds (source accumulation) Messaging (bus accumulation) Streaming (target accumulation) |
| Trickle feeds (source accumulation) | Executes batch loads on a more frequent basis |
| Messaging (bus accumulation) | Extremely small packets of data are published to a bus as they occur. Target systems subscribe to the bus, and incrementally process the packets into the DW |
| Streaming (target accumularion) | Target system collects data as it is received into a buffer area or queue, and processes it in order. |
| Data Warehousing activities | Understand Requirements Define and maintain the DW and BI Architecture Develop the DW and Data Marts Populate the DW Implement the BI Portfolio Maintain data products |
| The best DW/BI architectures will | design a mechanism to connect back to transactional level and operational level reports in an atomic DW |
| Typical DW/BI project development tracks | Data: necessary to support the analysis Technology: back-end systems and processing Business Intelligence tools: suite of applications necessary for data consumers to gain meaningful insight |
| Source-to-target mapping | establishes transformation rules for entities and data elements from individual sources to a target system. The most difficult part of any mapping effort is determining valid links or equivalencies between data elements in multiple systems. |
| Data remediation or cleansing activities | enforce standards and correct and enhance the domain values of individual data elements. |
| Data transformation focuses on | activities that implement business rules within a technical system. |
| DW population key factors | Required latency Availability of resources Batch windows or upload intervals Target databases Dimensional aspects Timeframe consistency of the DW and DM |
| Implementation of BI Portfolio steps | Group users according to needs Match tools to User Requirements |
| Maintain Data Products components | Release Management Manage Data Product Development Lifecycle Monitor and tune Load Processes Monitor and tune BI Activity Performane |
| Data Dictionary | Describes data in business terms and includes other information needed to use the data. Often the content for the data dictionary comes directly from the logical data model. |
| Data and Data Model Lineage purposes | Investigation of the root cause of data issues Impact analysis for system changes or data issues Ability to determine the reliability of data, based on its origin |
| Data Integration Tools | Used to populate a data warehouse Enable scheduling of jobs that account for complex data delivery from multiple sources |
| Business Intelligence Tools Types | Operational Reporting to analyze business trends Business performance management includes formal assessment of metrics aligned with organizational goals Descriptive, self-service analytics to guide operational decisions |
| Operation Reporting | Involves business users generating reports directly from transactional systems, operational applications, or a data warehouse. |
| Business Performance Management (BPM) | Set of integrated organizational processes and applications designed to optimize execution of business strategy. |
| Operational Analytic Applications | Include the logic and processes to extract data from well-known source systems, a data model for the data mart, and pre-built reports and dashboards. |
| Online Analytical Process (OLAP) | refers to an approach to providing fast performance for multi-dimensional analytic queries. Multi-dimensional analysis with cubes is particularly useful where there are well-known ways analysts want to look at summaries of data. |
| Common OLAP operations | Slice: subset of multi-dimensional array corresponding to a single value Dice: A slice on more than two dimensions Drill Down / Up: navigate among levels of data Roll-up: computing all the data relationships for one or more dimensions Pivot: |
| Classic OLAP implementation approaches | Relational OLAP with star schema joins as a common database design technique Multi-dimensional OLAP uses proprietary and specialized multi-dimensional database technology Hybrid OLAP |
| Prototypes to Drive Requirements | Quickly prioritize requirements before the implementation activities begin by creating a demonstration set of data and applying discovery steps in a joint prototype effort |
| Self-Service BI | A fundamental delivery channel within BI portfolio. Typically funnels user activity within a governed portal. Visualization and statistical analysis tooling allows for rapid data exploration and discovery. |
| Audit Data that can be queried | In order to maintain lineage, all structures and processes should have the capability to create and store audit information at a grain useful for tracking and reportings. |
| Readiness Assessment / Risk Assessment | Prerequisite checklist Business support Aligned to business strategy Defined architectural approach |
| Release Roadmap | Warehouses are built incrementally. An incremental approach leveraging the DW bus matrix as a communication and marketing tool is suggested. |
| Configuration Management | Aligns with the release roadmap and provides the necessary back office stitching and scripts to automate development, testing, and transportation to production. |
| DW / BI Implementation Guidelines | 1. Readiness / Risk Assessment 2. Release Roadmap 3. Configuration Management 4. Organizational and Cultural Change |
| Organization and Cultural Change | Align projects behind real business needs with these success factors: Business sponsorship, Business goals and scope, Business resources, and Vision alignment |
| Data Warehouse Governance for regulated industries | Industries that are highly regulated and need compliance-centric reporting will benefit greatly from a well-governed data warehouse. |
| DW Governance critical functions | Those that govern the business-operational discovery or refinement area, and those that ensure pristine quality within the warehouse itself. |
| Enabling Business Acceptance of data | Data should be understandable, have verified quality, and have demonstrable lineage. |
| Customer / User Satisfaction | Perceptions of the quality of data will drive customer satisfaction but satisfaction is dependent on other factors as well, such as data consumers' understanding of the data and the operations team's responsiveness. |
| Reporting Strategy | Includes standards, processes, guidelines, best practices, and procedures. |
| Usage Metrics | DW usage metrics typically include the number of registered users, as well as connected users or concurrent connected users. |
| Subject Area Coverage Percentages | Measure how much of the DW (from a data topology perspective) is being accessed by each department. |