DMBOK - Chapter 8: Data Integration & Interoperability
| Question | Answer |
|---|---|
| Data Integration and Interoperability (DII) describes | processes related to the movement and consolidation of data within and between data stores, applications, and organizations. |
| Data Integration and Interoperability is critical to Data Warehousing and BI, as well as Reference and Master Data Management, because | these functions focus on transforming and integrating data from source systems to consolidated data hubs, and from hubs to target systems where it can be delivered to data consumers. |
| Primary driver for Data Integration and Interoperability | The need to manage data movement efficiently. |
| Data Integration and Interoperability Suppliers | Data Producers; IT Steering Committee; Executives and Managers; Subject Matter Experts (SMEs) |
| Data Integration and Interoperability Participants | Data Architects; Business and Data Analysts; Data Modelers; Data Stewards; ETL, Service, and Interface Developers; Project and Program Managers |
| Data Integration and Interoperability Consumers | Information consumers; Knowledge workers; Managers and Executives |
| The implementation of DII practices and solutions aims to: | Make data available in the format and timeframe needed; consolidate data physically and virtually; lower the cost and complexity of managing solutions with shared models and interfaces; identify meaningful events and automatically trigger alerts and actions |
| ETL | Extract, Transform, and Load |
| The extract process includes | selecting the required data, extracting it from its source, and staging it (see the ETL sketch after this table) |
| The transform process | makes the selected data compatible with the structure of the target data store |
| Examples of transformations | Format changes; structure changes; semantic conversion; de-duplication; re-ordering |
| The load process | physically stores or presents the result of the transformations in the target system. |
| ELT (Extract, Load, and Transform) | If the target system has more transformation capability than either the source or an intermediary application system, the order of the load and transform steps may be switched: data is loaded first and then transformed within the target (see the ELT note after the ETL sketch). |
| Mapping (synonym for transformation) | Both the process of developing the lookup matrix from source to target structures and the result of that process. |
| Batch processing | Most data moves between applications and organizations in clumps or files, either on request by a human data consumer or automatically on a periodic schedule, often with a significant delay between the source and target systems. |
| Change Data Capture (CDC) | Method of reducing bandwidth by filtering to include only data that has changed within a defined timeframe (illustrated by the watermark filter in the ETL sketch after this table). |
| Near-real-time | Data is processed in smaller sets spread across the day in a defined schedule, with lower latency than batch processing. Usually implemented using an enterprise service bus. |
| Event-driven | Data is processed when an event happens, such as a data update, with lower latency than batch processing. |
| Asynchronous data flow (near-real-time) | The system providing data does not wait for the receiving system to acknowledge the update before continuing processing. Implies that either the sending or receiving system could be offline for some period without the other system also being offline. |
| Synchronous (real-time) | Used when one data set must be kept perfectly in sync with the data in another data set. |
| Streaming (low latency) | Data flows from computer systems on a real-time continuous basis immediately as events occur. These solutions require a large investment in hardware and software. |
| Replication | Maintain exact copies of data sets in multiple locations to provide better response time for users located around the world. |
| Archiving | Data that is used infrequently or not actively used may be moved to an alternate data structure or storage solution. |
| Canonical model | Common model used by an organization or data exchange group that standardizes the format in which data will be shared (see the canonical-mapping sketch after this table). |
| Point-to-point | Data passed directly between systems. |
| Hub-and-spoke | Consolidates shared data (either physically or virtually) in a central data hub that many applications can use. All systems that want to exchange data do so through a central common data control system rather than directly with one another. |
| Publish-Subscribe (Pub & Sub) | Involves publishing systems pushing data out and subscribing systems pulling data in (see the pub/sub sketch after this table). |
| Application Coupling | Describes the degree to which two systems are entwined. Two systems that are tightly coupled usually have a synchronous interface. Tight coupling represents a riskier operation. Where possible, loose coupling is preferred. |
| Orchestration | Describes how multiple processes are organized and executed in a system. All systems handling messages or data packets must be able to manage the order of execution of those processes in order to preserve consistency and continuity (see the orchestration sketch after this table). |
| Process Controls | Components that ensure the shipment, delivery, extraction, and loading of data are accurate and complete as part of orchestration. |
| Enterprise Application Integration (EAI) | Software modules interact with one another only through well-defined interface calls (application programming interfaces - APIs). Data stores are updated only by their own software modules and other software cannot reach into the data in an application. |
| Enterprise Service Bus (ESB) | System that acts as an intermediary between systems, passing messages between them. Applications can send and receive messages or files and are encapsulated from other processes. This is an example of loose coupling. |
| Service-Oriented Architecture (SOA) | The functionality of providing data or updating data (or other data services) can be provided through well-defined service calls between applications. |
| Complex Event Processing (CEP) | Method of tracking and analyzing streams of information about things that happen, and deriving a conclusion from them (see the CEP sketch after this table). |
| Data Federation and Virtualization | Provides access to a combination of individual data stores, regardless of structure. Enables distributed databases to be accessed and viewed as a single database (see the federation sketch after this table). |
| Data-as-a-Service (DaaS) | A delivery and licensing model analogous to Software-as-a-Service: data is licensed and provided on demand, but the software and data are located at a data center controlled by the vendor rather than by the licensee. |
| Data Exchange Standards | Formal rules for the structure of data elements. A common model used by an organization or data exchange group that standardizes the format in which data will be shared. E.g., National Information Exchange Model (NIEM) |
| DII Plan and Analyze steps | Define Data Integration Requirements; Perform Data Discovery; Document Data Lineage; Profile Data; Collect Business Rules |
| Design Data Integration Solution steps | Design Data Integration Architecture; Model data hubs, interfaces, messages, and data services; Map data sources to targets; Design data orchestration |
| Develop Data Integration Solutions steps | Develop data services; Develop data flows (e.g., ETL); Develop a data migration approach; Develop a publication approach; Develop complex event processing flows; Maintain DII metadata |
| DII Tools | Data transformation engine (ETL tool); Data virtualization server; Enterprise Service Bus; Business rules engine; Data modeling tool; Data profiling tool; Metadata repository |
| Organizational change must determine | whether responsibility for managing data integration implementation is centralized or whether it resides with decentralized application teams. |
| Data sharing agreement | Stipulates the responsibilities and acceptable use of data to be exchanged, approved by the business data stewards of the data in question |
| Data Integration Metrics examples | Availability of data requested; data volumes and speeds (e.g., speed of transmission); solution costs and complexity (e.g., ease of acquiring new data) |
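
The cards above define ETL and Change Data Capture abstractly; the minimal sketch below puts them together. It assumes hypothetical SQLite files `source.db` and `target.db` and an invented `orders` table with a `last_updated` column; none of these names come from DMBOK. The extract step uses a watermark filter as a simple CDC mechanism, the transform step applies a format change and a semantic conversion, and the load step physically stores the result in the target.

```python
import sqlite3

def etl_incremental(source_path, target_path, watermark):
    """Extract rows changed since `watermark` (a CDC-style filter),
    transform them, and load them into the target store."""
    src = sqlite3.connect(source_path)
    tgt = sqlite3.connect(target_path)
    tgt.execute(
        "CREATE TABLE IF NOT EXISTS orders_clean "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, last_updated TEXT)"
    )

    # Extract: select only data changed within the defined timeframe (CDC).
    rows = src.execute(
        "SELECT order_id, customer, amount, last_updated "
        "FROM orders WHERE last_updated > ?",
        (watermark,),
    ).fetchall()

    # Transform: format change (uppercase names) and semantic conversion
    # (cents to dollars), matching the transformation examples above.
    staged = [
        (order_id, customer.upper(), amount / 100.0, last_updated)
        for order_id, customer, amount, last_updated in rows
    ]

    # Load: physically store the transformed result in the target system.
    tgt.executemany("INSERT OR REPLACE INTO orders_clean VALUES (?, ?, ?, ?)", staged)
    tgt.commit()
    src.close()
    tgt.close()
```

ELT note: with a more capable target, the raw rows would be loaded first and the same transformation expressed in SQL and run inside the target, e.g. `INSERT INTO orders_clean SELECT order_id, UPPER(customer), amount / 100.0, last_updated FROM orders_raw;`.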
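
A canonical model can be sketched as one shared structure that every system maps into. The two source record formats below are invented for illustration; the point is that each of N systems needs only one mapping to the canonical form, rather than point-to-point mappings to every other system.

```python
from dataclasses import dataclass

@dataclass
class CanonicalCustomer:
    """Shared format agreed by the organization or data exchange group."""
    customer_id: str
    full_name: str
    country: str

def from_crm(record: dict) -> CanonicalCustomer:
    # Hypothetical CRM-native format -> canonical format.
    return CanonicalCustomer(
        customer_id=str(record["id"]),
        full_name=f"{record['first']} {record['last']}",
        country=record["country_code"],
    )

def from_billing(record: dict) -> CanonicalCustomer:
    # Hypothetical billing-native format -> canonical format.
    return CanonicalCustomer(
        customer_id=record["cust_no"],
        full_name=record["name"],
        country=record["iso_country"],
    )
```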
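
Publish-subscribe can be illustrated with a toy in-memory bus (not any particular ESB product): publishers push messages to a named topic without knowing the consumers, and subscribers receive whatever is published there, which is the loose coupling the coupling card recommends.

```python
from collections import defaultdict

class MessageBus:
    """Toy pub/sub hub: senders and receivers reference only the
    topic name, never each other."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers[topic]:
            handler(message)

bus = MessageBus()
bus.subscribe("orders.updated", lambda msg: print("warehouse saw:", msg))
bus.subscribe("orders.updated", lambda msg: print("billing saw:", msg))
bus.publish("orders.updated", {"order_id": 42, "status": "shipped"})
```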
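
Orchestration and process controls can be sketched as dependency-ordered execution with a completeness check between steps. The three steps and the row-count control below are hypothetical stand-ins for real pipeline stages.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def extract():   return {"rows": 100}
def transform(): return {"rows": 100}
def load():      return {"rows": 100}

steps = {"extract": extract, "transform": transform, "load": load}
# Orchestration: transform depends on extract, load depends on transform.
dag = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}

expected = None
for name in TopologicalSorter(dag).static_order():
    result = steps[name]()
    # Process control: verify each step handled a complete data set.
    if expected is not None and result["rows"] != expected:
        raise RuntimeError(f"{name} is incomplete: {result['rows']} != {expected} rows")
    expected = result["rows"]
    print(f"{name} complete ({result['rows']} rows)")
```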
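
Complex Event Processing tracks a stream of events and derives a conclusion from patterns in it. The rule below (three failed logins per account within 60 seconds) is invented; the sliding window over the event stream is the CEP-typical part.

```python
from collections import deque

def detect_bursts(events, threshold=3, window_seconds=60):
    """Flag any account with `threshold` failed logins inside the window.
    Each event is a (timestamp, account) pair, assumed time-ordered."""
    windows = {}
    alerts = []
    for ts, account in events:
        window = windows.setdefault(account, deque())
        window.append(ts)
        # Drop events that have fallen out of the sliding window.
        while ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            # Derived conclusion: trigger an alert, per the DII aims above.
            alerts.append((ts, account))
    return alerts

print(detect_bursts([(0, "alice"), (10, "alice"), (20, "alice"), (500, "bob")]))
# -> [(20, 'alice')]
```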
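
Data federation presents physically separate stores as one queryable source. SQLite's `ATTACH DATABASE` statement (a real SQLite feature) gives a small-scale illustration: two database files queried in a single session as if they were one database. The file and table names are hypothetical.

```python
import sqlite3

con = sqlite3.connect("sales.db")
# Attach a second physical database file; both stores are now
# visible to the same session under one namespace.
con.execute("ATTACH DATABASE 'inventory.db' AS inv")

# One federated query joining tables that live in different stores.
rows = con.execute(
    "SELECT s.order_id, i.stock_level "
    "FROM orders AS s JOIN inv.stock AS i ON s.sku = i.sku"
).fetchall()
```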