DMBOK - Chp 8

Data Integration & Interoperability

Question / Answer
Data Integration and Interoperability (DII) describes processes related to the movement and consolidation of data within and between data stores, applications, and organizations.
Data Integration and Interoperability is critical to Data Warehousing and BI, as well as MRDM, because these focus on transforming and integrating data from source systems into consolidated data hubs, and from hubs to target systems where it can be delivered to data consumers.
Primary driver for Data Integration and Interoperability: the need to manage data movement efficiently.
Data Integration and Interoperability suppliers: data producers, IT steering committee, executives and managers, SMEs.
Data Integration and Interoperability participants: data architects, business and data analysts, data modelers, data stewards, ETL/service/interface developers, project and program managers.
Data Integration and Interoperability consumers: information consumers, knowledge workers, managers and executives.
The implementation of DII practices and solutions aims to: make data available in the format and timeframe needed; consolidate data physically and virtually; lower the cost and complexity of managing solutions with shared models and interfaces; identify meaningful events and automatically trigger alerts and actions.
ETL: Extract, Transform, and Load.
The extract process includes selecting the required data, extracting it from its source, and staging the data.
The transform process makes the selected data compatible with the structure of the target data store.
Examples of transformations: format change, structure change, semantic conversion, de-duplication, re-ordering.
The load process physically stores or presents the result of the transformations in the target system.
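The three ETL steps above can be sketched in a few lines of Python. This is a minimal illustration only; the table names, field names, and the full-name split are invented for the example, not taken from DMBOK.

```python
import sqlite3

def extract(source_conn):
    # Extract: select only the required rows/columns and stage them in memory.
    return source_conn.execute(
        "SELECT id, full_name, signup_date FROM customers"
    ).fetchall()

def transform(rows):
    # Transform: a structure change that splits full_name into first/last,
    # making the data compatible with the target table's layout.
    out = []
    for id_, full_name, signup_date in rows:
        first, _, last = full_name.partition(" ")
        out.append((id_, first, last, signup_date))
    return out

def load(target_conn, rows):
    # Load: physically store the transformed result in the target system.
    target_conn.executemany(
        "INSERT INTO customers_dim (id, first_name, last_name, signup_date) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    target_conn.commit()
```

In practice each step runs against separate source and target stores; SQLite stands in for both here only to keep the sketch self-contained.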
ELT (Extract, Load, and Transform): if the target system has more transformation capability than either the source or an intermediary application system, the order of processes may be switched.
Mapping (a synonym for transformation): both the process of developing the lookup matrix from source to target structures and the result of that process.
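A source-to-target mapping in the lookup-matrix sense can be captured as a simple table. The sketch below uses invented field names; it shows how the mapping is both an artifact (the dictionary) and a process (the function that applies it).

```python
# Lookup matrix: source field -> target field (names are illustrative).
SOURCE_TO_TARGET = {
    "cust_no":  "customer_id",
    "cust_nm":  "customer_name",
    "sgnup_dt": "signup_date",
}

def apply_mapping(record, mapping):
    # Rename each source field to its target equivalent.
    return {target: record[source] for source, target in mapping.items()}
```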
Batch processing: most data moves between applications and organizations in clumps or files, either on request by a human data consumer or automatically on a periodic schedule, often with a significant delay between source and target systems.
Change Data Capture (CDC): a method of reducing bandwidth by filtering to include only data that has changed within a defined timeframe.
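One common way to implement CDC is timestamp-based filtering against a high-water mark: ship only rows whose last-modified marker is newer than the previous run's. The field names and integer timestamps below are invented for illustration.

```python
def capture_changes(rows, high_water_mark):
    # Keep only rows modified since the last extraction run.
    changed = [r for r in rows if r["modified_at"] > high_water_mark]
    # Advance the high-water mark to the latest timestamp seen.
    new_mark = max((r["modified_at"] for r in rows), default=high_water_mark)
    return changed, new_mark
```

Other CDC techniques (log scraping, trigger-based capture) achieve the same effect without relying on a reliable modified-timestamp column.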
Near-real-time: data is processed in smaller sets spread across the day on a defined schedule, with lower latency than batch processing. Usually implemented using an enterprise service bus.
Event-driven: data is processed when an event happens, such as a data update, with lower latency than batch processing.
Asynchronous data flow (near-real-time): the system providing data does not wait for the receiving system to acknowledge the update before continuing its process. This implies that either the sending or receiving system can be offline for some period without the other system also being offline.
Synchronous (real-time): used when one data set must be kept perfectly in sync with the data in another data set.
Streaming (low latency): data flows from computer systems on a real-time, continuous basis as events occur. These solutions require a large investment in hardware and software.
Replication: maintains exact copies of data sets in multiple locations to provide better response times for users located around the world.
Archiving: data that is used infrequently or not actively used may be moved to an alternate data structure or storage solution.
Canonical model: a common model used by an organization or data exchange group that standardizes the format in which data will be shared.
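The benefit of a canonical model is that each system needs only one translation, to and from the shared format, rather than one per partner system. The sketch below, with invented system and field names, shows two different source formats converging on one canonical record shape.

```python
# Each source system translates its own record layout into the shared
# canonical shape (field names here are illustrative assumptions).
def billing_to_canonical(rec):
    return {"customer_id": rec["acct"], "amount": rec["amt_due"]}

def crm_to_canonical(rec):
    return {"customer_id": rec["id"], "amount": rec["balance"]}
```

With N systems, point-to-point exchange can require on the order of N^2 translations; a canonical model reduces this to roughly 2N (one in, one out per system).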
Point-to-point: data is passed directly between systems.
Hub-and-spoke: consolidates shared data (either physically or virtually) in a central data hub that many applications can use. All systems that want to exchange data do so through a central common data control system rather than directly with one another.
Publish-Subscribe (Pub/Sub): involves systems pushing data out and other systems pulling data in.
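A minimal in-process sketch of the publish-subscribe pattern: publishers push messages to a topic without knowing who, if anyone, is subscribed. Real implementations (message brokers, an ESB) add durability and delivery guarantees; the class and topic names below are invented.

```python
from collections import defaultdict

class Bus:
    """Toy in-memory pub/sub bus: topics map to lists of handlers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # A consumer registers interest in a topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # The publisher pushes data out; every subscriber pulls it in.
        for handler in self._subscribers[topic]:
            handler(message)
```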
Application coupling: describes the degree to which two systems are entwined. Two systems that are tightly coupled usually have a synchronous interface. Tight coupling represents riskier operation; where possible, loose coupling is preferred.
Orchestration: describes how multiple processes are organized and executed in a system. All systems handling messages or data packets must be able to manage the order of execution of those processes in order to preserve consistency and continuity.
Process controls: components that ensure the shipment, delivery, extraction, and loading of data are accurate and complete, as part of orchestration.
Enterprise Application Integration (EAI): software modules interact with one another only through well-defined interface calls (application programming interfaces, or APIs). Data stores are updated only by their own software modules; other software cannot reach into the data in an application.
Enterprise Service Bus (ESB): a system that acts as an intermediary between other systems, passing messages between them. Applications can send and receive messages or files and are encapsulated from other processes. This is an example of loose coupling.
Service-Oriented Architecture (SOA): the functionality of providing data or updating data (or other data services) is provided through well-defined service calls between applications.
Complex Event Processing (CEP): a method of tracking and analyzing streams of information about things that happen, and deriving a conclusion from them.
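A toy CEP rule makes the idea concrete: individual events (single failed logins) are unremarkable, but a pattern across the stream (several failures for one user within a short window) yields a higher-level conclusion. The event shape, window, and threshold below are invented for illustration.

```python
from collections import defaultdict, deque

def detect_bursts(events, window=60, threshold=3):
    """Alert when `threshold` events for one key fall within `window` seconds.

    `events` is a time-ordered iterable of (timestamp, user) pairs.
    """
    recent = defaultdict(deque)
    alerts = []
    for ts, user in events:
        q = recent[user]
        q.append(ts)
        # Drop events that have fallen out of the sliding window.
        while q and ts - q[0] > window:
            q.popleft()
        if len(q) >= threshold:
            alerts.append((ts, user))
    return alerts
```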
Data federation and virtualization: provides access to a combination of individual data stores, regardless of structure, enabling distributed databases to be accessed and viewed as a single database.
Data-as-a-Service (DaaS): a delivery and licensing model in which an application is licensed to provide services, but the software and data are located at a data center controlled by the vendor.
Data exchange standards: formal rules for the structure of data elements; a common model used by an organization or data exchange group that standardizes the format in which data will be shared, e.g., the National Information Exchange Model (NIEM).
DII Plan and Analyze steps: define data integration requirements; perform data discovery; document data lineage; profile data; collect business rules.
Design Data Integration Solution steps: design the data integration architecture; model data hubs, interfaces, messages, and data services; map data sources to targets; design data orchestration.
Develop Data Integration Solutions steps: develop data services; develop data flows (e.g., ETL); develop a data migration approach; develop a publication approach; develop complex event processing flows; maintain DII metadata.
DII tools: data transformation engine (ETL tool), data virtualization server, enterprise service bus, business rules engine, data modeling tool, data profiling tool, metadata repository.
The organization must determine whether responsibility for managing data integration implementation is centralized or resides with decentralized application teams.
Data sharing agreement: stipulates the responsibilities and acceptable use of data to be exchanged; approved by the business data stewards of the data in question.
Data integration metrics, examples: availability of requested data; data volumes and speeds (e.g., speed of transmission); solution costs and complexity (e.g., ease of acquiring new data).
Created by: sshupert