K10 - Combining Data

Evaluates the benefits and risks inherent in combining data

Terms and definitions

What is meant by "combining data"?
Combining data is the practice of bringing together data from more than one source or data set to produce a single usable data set for analysis, hypothesis testing and similar work.

What other terms are used to describe combining data?
Data merging, joining and integration are other terms used to describe processes for combining data.

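To make joining concrete, here is a minimal Python sketch of an inner join on a shared key. The data, the `customer_id` key and the `inner_join` helper are hypothetical illustrations, and the sketch assumes each key appears at most once in the right-hand data set. In practice a library such as pandas (`DataFrame.merge`) would usually do this.

```python
# Hypothetical example data: two sources describing the same customers,
# keyed on a shared "customer_id" field.
crm_records = [
    {"customer_id": 1, "name": "Ada Lovelace"},
    {"customer_id": 2, "name": "Alan Turing"},
    {"customer_id": 3, "name": "Grace Hopper"},
]
billing_records = [
    {"customer_id": 1, "total_spend": 120.0},
    {"customer_id": 3, "total_spend": 75.5},
]


def inner_join(left, right, key):
    """Combine two lists of dicts, keeping only keys present in both (an inner join)."""
    # Index the right-hand data set by key (assumes keys are unique there).
    right_by_key = {row[key]: row for row in right}
    combined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the two records into one; right-hand values win on clashes.
            combined.append({**row, **match})
    return combined


merged = inner_join(crm_records, billing_records, "customer_id")
for row in merged:
    print(row)
```

Records with no partner in the other data set (customer 2 here) are dropped, which is one way combining data can silently lose information.
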
What is meant by ETL?
ETL stands for Extract, Transform, Load: a process used to combine data from different sources. Extract gets the data out of the source systems; Transform cleans it and converts it into the required format; Load writes it into the target location (for example, a data warehouse).

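A minimal ETL sketch in Python using only the standard library. The CSV content, the `customers` table and the cleaning rules are assumptions made for illustration; real pipelines usually extract from databases or APIs and load into a warehouse rather than an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: a CSV export with inconsistent formatting.
RAW_CSV = """id,name,country
1, alice ,uk
2,Bob,UK
3,,fr
"""


def extract(raw_text):
    """Extract: read rows from the source (here an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: clean and standardise rows, dropping any without a name."""
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        if not name:
            continue  # skip incomplete records
        cleaned.append((int(row["id"]), name.title(), row["country"].strip().upper()))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM customers ORDER BY id").fetchall())
```
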
What is middleware software?
Middleware is software that sits between systems and automatically moves and transforms data between them.

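A minimal sketch of the middleware idea, assuming two hypothetical systems (`SourceSystem`, `TargetSystem`) that use different field names. The `Middleware` class and its field mapping are illustrative only, not any real product's API.

```python
class SourceSystem:
    """Hypothetical upstream system that exposes records in its own format."""

    def __init__(self, records):
        self._records = records

    def fetch_new(self):
        # Hand over pending records and clear them so they are not sent twice.
        records, self._records = self._records, []
        return records


class TargetSystem:
    """Hypothetical downstream system that expects a different field layout."""

    def __init__(self):
        self.stored = []

    def receive(self, record):
        self.stored.append(record)


class Middleware:
    """Sits between the two systems: moves records and renames their fields."""

    def __init__(self, source, target, field_map):
        self.source = source
        self.target = target
        self.field_map = field_map  # source field name -> target field name

    def sync(self):
        moved = 0
        for record in self.source.fetch_new():
            # Transform: keep mapped fields only, under the target's names.
            translated = {self.field_map[k]: v for k, v in record.items() if k in self.field_map}
            self.target.receive(translated)  # Move: deliver to the target
            moved += 1
        return moved


source = SourceSystem([{"fullName": "Alice Smith", "postcode": "AB1 2CD"}])
target = TargetSystem()
bridge = Middleware(source, target, {"fullName": "name", "postcode": "post_code"})
print(bridge.sync(), target.stored)
```

Real middleware typically runs continuously or reacts to events instead of being called by hand, but the core job is the same: move records and translate them between formats.
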
What is fuzzy matching?
Fuzzy matching is the process of finding records that are similar but not exactly identical. The Levenshtein distance is a helpful metric for this analysis.

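A minimal fuzzy-matching sketch in Python. Note that it swaps in the standard library's `difflib.SequenceMatcher`, which scores similarity with a ratio based on matching subsequences (related to Ratcliff-Obershelp pattern matching) rather than the Levenshtein distance. The candidate names and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity score between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_match(name, candidates, threshold=0.8):
    """Return the most similar candidate if it clears the threshold, else None."""
    best = max(candidates, key=lambda c: similarity(name, c))
    return best if similarity(name, best) >= threshold else None


# Hypothetical reference list of clean company names.
known_names = ["Acme Ltd", "Globex Corporation", "Initech", "Umbrella Corp"]
print(fuzzy_match("Globex Corportion", known_names))  # a close typo
print(fuzzy_match("Hooli", known_names))  # nothing similar enough
```

The threshold is the key design choice: set it too low and different entities get merged (inaccurate matches), too high and genuine duplicates are missed.
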
Levenshtein distance
A mathematical way to measure how different two words (or strings) are. It counts the minimum number of single-character edits (insertions, deletions or substitutions) needed to turn one into the other.

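The Levenshtein distance can be computed with the standard dynamic-programming approach, sketched below in Python. For example, turning "kitten" into "sitting" takes three edits: substitute k with s, substitute e with i, and insert g.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    # previous[j] = distance between the first i-1 chars of a and the first j chars of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or a free match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory grows with the length of the shorter pass rather than with the product of both string lengths.
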
Probabilistic matching
Instead of just counting edits, probabilistic matching uses statistics to decide whether two records refer to the same entity. It calculates the probability that two records belong to the same entity based on multiple fields, which makes it well suited to messy real-world data.

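One common way to implement probabilistic matching is a Fellegi-Sunter style model, sketched below in Python. The per-field `m` and `u` probabilities and the prior are made-up assumptions; real systems estimate them from data (often with the EM algorithm), and the model assumes fields are independent given whether a pair is a match.

```python
import math

# Hypothetical parameters for each compared field:
#   m = P(field agrees | records are the same entity)
#   u = P(field agrees | records are different entities)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_year": (0.98, 0.02),
    "postcode": (0.85, 0.001),
}
PRIOR_MATCH = 0.001  # hypothetical prior probability that a random pair is a match


def match_probability(rec_a, rec_b):
    """Posterior probability that two records refer to the same entity."""
    log_odds = math.log(PRIOR_MATCH / (1 - PRIOR_MATCH))
    for field, (m, u) in FIELD_PARAMS.items():
        if rec_a[field] == rec_b[field]:
            log_odds += math.log(m / u)  # agreement is evidence for a match
        else:
            log_odds += math.log((1 - m) / (1 - u))  # disagreement is evidence against
    return 1 / (1 + math.exp(-log_odds))  # convert log-odds back to a probability


record_a = {"surname": "Smith", "first_name": "Jon", "birth_year": 1980, "postcode": "AB1 2CD"}
record_b = {"surname": "Smith", "first_name": "John", "birth_year": 1980, "postcode": "AB1 2CD"}
record_c = {"surname": "Smith", "first_name": "Mary", "birth_year": 1992, "postcode": "XY9 8ZW"}
print(round(match_probability(record_a, record_b), 4))
print(round(match_probability(record_a, record_c), 6))
```

Here "Jon" and "John" count as a plain disagreement, yet the pair still scores as a very likely match because the other fields agree; real systems often also use fuzzy comparisons within each field.
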
What are some of the benefits of combining data?
Improved decision making; Better data quality and accuracy; Enhanced insights and analysis; Elimination of data silos; Efficiency and automation; Real-time monitoring and reporting; Support for advanced analytics.

What are the risks in combining data?
Data quality issues; Privacy and compliance risks (e.g. GDPR); Data integration complexity; Inaccurate matches; Security risks; Increased storage and processing costs; Data governance challenges.

What are some of the best practices for combining data?
Data profiling; Standardisation; Metadata management; Data governance framework; Iterative testing; Monitoring and auditing.

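Two of these practices, data profiling and standardisation, can be sketched in a few lines of Python. The sample records and the country lookup table are hypothetical; the profile only counts missing and distinct values per field and flags duplicate ids, which is a small subset of what dedicated profiling tools report.

```python
from collections import Counter

# Hypothetical sample of records from one source, before combining.
records = [
    {"id": "1", "country": "UK", "email": "a@example.com"},
    {"id": "2", "country": "uk ", "email": None},
    {"id": "3", "country": "United Kingdom", "email": "c@example.com"},
    {"id": "3", "country": "FR", "email": "d@example.com"},
]


def profile(rows):
    """Data profiling: count missing values and distinct values per field."""
    report = {}
    for field in rows[0]:
        values = [row[field] for row in rows]
        report[field] = {
            "missing": sum(v in (None, "") for v in values),
            "distinct": len(set(values)),
        }
    return report


# Hypothetical standardisation lookup mapping free-text countries to ISO codes.
COUNTRY_CODES = {"uk": "GB", "united kingdom": "GB", "fr": "FR"}


def standardise_country(value):
    """Standardisation: map variant spellings to one code; leave unknowns as-is."""
    return COUNTRY_CODES.get(value.strip().lower(), value)


report = profile(records)
duplicate_ids = [k for k, n in Counter(r["id"] for r in records).items() if n > 1]
print(report)
print("Duplicate ids:", duplicate_ids)
print([standardise_country(r["country"]) for r in records])
```

Profiling first shows what needs fixing (a missing email, a duplicated id, three spellings of one country) before any records are matched or merged.
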
 


