Data Analytics Exam

Question Answer
What are the four stages of the Problem-Solving Methodology and one key activity in each? Analysis: identify requirements/constraints; Design: create detailed designs and evaluation criteria; Development: build, test and document solution; Evaluation: assess efficiency and effectiveness, recommend improvements.
In the analysis stage, what must you produce to define the scope and constraints? A requirements analysis that lists functional and non-functional requirements, constraints (time, cost, legal), stakeholders and scope boundaries.
Give a concise exam-style definition of a functional requirement and provide an example. A functional requirement describes what the system must do. Example: "Generate a weekly sales report that lists top 10 products."
Give a concise exam-style definition of a non-functional requirement and provide an example. A non-functional requirement describes how the system performs. Example: "Report must load within 5 seconds for 10,000 records."
Explain why documenting constraints is essential in the analysis stage. Constraints define limits (budget, time, legislation) that affect design choices and scope; they prevent unrealistic solutions and guide trade-offs.
What is a data dictionary and why is it used in design? A data dictionary lists datasets, fields, types and meanings; it provides a single reference ensuring consistent field usage and correct queries.
Describe an IPO chart and how it helps in design. Input-Process-Output chart maps inputs, processing steps and outputs; it clarifies required data transformations and stages for development.
List three design tools used in the study design and one purpose for each. Mock-ups (UI appearance), flowcharts/pseudocode (program logic), IPO charts (data flow and transformation).
What are evaluation criteria? Give two examples relevant to a dashboard project. Measures used to judge solution success. Examples: Accuracy of displayed figures; Load time under 3 seconds.
Explain what should be included in a project plan (Gantt chart). Tasks, start/end dates, durations, dependencies, milestones, resources and critical path to manage timeline and sequencing.
Define 'critical path' in project management. The longest sequence of dependent tasks through the project, which determines the shortest possible project duration; a delay to any task on it delays the whole project.
What is the purpose of a research question in Unit 3 Area of Study 2? To focus data collection and analysis; it defines what the visualisation should answer and informs methodology and datasets.
Explain primary vs secondary data with an example of each. Primary data is collected firsthand (survey responses); secondary data is pre-existing (ABS census data).
What is data cleansing and why is it necessary before analysis? Removing errors, duplicates and inconsistent entries so analysis yields valid, unbiased results and correct visualisations.
Define data validation and give two common validation checks. Validation ensures data is reasonable at entry. Examples: existence checks and range checks.
Define data verification and give one example method. Verification confirms data matches original source. Example: double-data entry comparison or proofreading.
What is data integrity? Give two factors that threaten it. Data integrity = accuracy, completeness and reliability. Threats: human entry errors, corrupted transfers.
Explain why referencing datasets is required in Unit 3 projects. To acknowledge sources, allow verification of data provenance, and meet academic honesty and legal requirements.
Write an exam-style reason to prefer structured data from repositories over scraped web data. Repository data is likely to be clean, documented and reliable; web-scraped data may be inconsistent and require heavy cleansing.
Define descriptive statistics and give three examples used in Unit 3. Statistics that summarise data: mean, median, standard deviation.
How would you use mean and median to describe a skewed income dataset? Use median to represent central tendency (less affected by outliers) and mean to show average influenced by high/low values.
What is standard deviation and why is it useful? Measure of spread; shows how much values deviate from the mean, indicating variability in data.
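A minimal sketch of the mean, median and standard deviation cards, using Python's standard `statistics` module. The income figures are invented for illustration; they are not from any real dataset.

```python
import statistics

# Hypothetical weekly incomes in dollars; one very high earner skews the data.
incomes = [620, 680, 700, 720, 750, 780, 5000]

mean_income = statistics.mean(incomes)      # pulled upward by the 5000 value
median_income = statistics.median(incomes)  # middle value, barely affected by the outlier
spread = statistics.stdev(incomes)          # sample standard deviation

print(f"mean={mean_income:.2f} median={median_income} stdev={spread:.2f}")
```

The mean lands well above every typical income in the list, while the median stays at 720. That gap is why the median is preferred for describing a skewed dataset.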
Explain why you might use frequency tables in analysis. To summarise counts of categorical data and identify most common categories or distribution shapes.
What is an outlier and one method to handle it? A value far from others; handle by investigating source, correcting, removing, or using robust statistics (median).
Define data manipulation in the context of data analytics. Transforming, cleaning, aggregating and preparing raw data using SQL and spreadsheet functions for analysis and visualisation.
Give two spreadsheet functions useful for data cleaning and one purpose each. TRIM() removes extra spaces; IFERROR() handles calculation errors preventing analysis breakage.
What is the role of SQL in Units 3 & 4 data analytics? Extract, filter, aggregate and join data from relational databases to prepare datasets for analysis and visualisation.
Write a short SQL query to select Name and Score from Students where Score > 75. SELECT Name, Score FROM Students WHERE Score > 75;
How does GROUP BY interact with aggregate functions? GROUP BY groups rows by column values; aggregate functions (SUM, COUNT, AVG) compute metrics per group.
Provide a short SQL example using GROUP BY and HAVING to find products with sales count > 50. SELECT ProductID, COUNT(*) AS SalesCount FROM Sales GROUP BY ProductID HAVING COUNT(*) > 50;
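The GROUP BY and HAVING query above can be tried against a throwaway in-memory SQLite database. This is a sketch only: the `Sales` table layout (including the `SaleID` column) and the row counts are assumptions made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (SaleID INTEGER PRIMARY KEY, ProductID INTEGER)")

# Hypothetical data: product 1 sold 60 times, product 2 sold 10 times, product 3 sold 51 times.
rows = [(1,)] * 60 + [(2,)] * 10 + [(3,)] * 51
conn.executemany("INSERT INTO Sales (ProductID) VALUES (?)", rows)

query = """
SELECT ProductID, COUNT(*) AS SalesCount
FROM Sales
GROUP BY ProductID
HAVING COUNT(*) > 50;
"""
# The query has no ORDER BY, so sort in Python to get a stable order for printing.
popular = sorted(conn.execute(query).fetchall())
print(popular)
conn.close()
```

Product 2 is grouped and counted like the others, then dropped by HAVING because its count is not above 50. WHERE could not do this, because it filters individual rows before the groups are formed.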
Explain INNER JOIN in two lines and give when you'd use it. INNER JOIN returns rows with matching keys in both tables. Use it to combine student names with enrolment records matching StudentID.
Write a combined SQL example using SELECT, FROM, INNER JOIN, WHERE and ORDER BY. SELECT S.Name, C.CourseName FROM Students S INNER JOIN Enrolments E ON S.ID = E.StudentID INNER JOIN Courses C ON E.CourseID = C.ID WHERE C.Level='Advanced' ORDER BY S.Name ASC;
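To see what the combined query returns, it can be run on a small in-memory SQLite database. The table contents below (student names, course names, levels) are hypothetical sample data; only the table and column names come from the query itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Courses (ID INTEGER PRIMARY KEY, CourseName TEXT, Level TEXT);
CREATE TABLE Enrolments (StudentID INTEGER, CourseID INTEGER);
INSERT INTO Students VALUES (1, 'Mia'), (2, 'Arjun'), (3, 'Zoe');
INSERT INTO Courses VALUES (10, 'Data Modelling', 'Advanced'), (20, 'Spreadsheets', 'Basic');
INSERT INTO Enrolments VALUES (1, 10), (2, 10), (2, 20), (3, 20);
""")

query = """
SELECT S.Name, C.CourseName
FROM Students S
INNER JOIN Enrolments E ON S.ID = E.StudentID
INNER JOIN Courses C ON E.CourseID = C.ID
WHERE C.Level = 'Advanced'
ORDER BY S.Name ASC;
"""
advanced = conn.execute(query).fetchall()
print(advanced)
conn.close()
```

Zoe does not appear because her only enrolment is in a Basic course, and Arjun appears once because only one of his two enrolments passes the WHERE filter.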
What is an index and how does it improve query performance? Index is a data structure (like a book index) that speeds up lookups on keyed columns, reducing search time on large tables.
Explain trade-offs of using SELECT * in exam answers. SELECT * returns all fields but is inefficient and can expose unnecessary data; list specific fields to improve performance and security.
What is normalisation and why is it important in database design? Process of organising tables to reduce redundancy and dependency, improving integrity and update efficiency.
Describe first normal form (1NF) briefly. Each field contains atomic values, each record is unique, and there are no repeating groups.
What does referential integrity enforce? That foreign keys match existing primary keys, preventing orphan records and preserving relationships.
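A short sketch of referential integrity in action, using SQLite through Python. The two-table schema is an assumption for illustration. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection; other database systems usually enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores foreign keys unless this is on
conn.execute("CREATE TABLE Students (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""
CREATE TABLE Enrolments (
    StudentID INTEGER NOT NULL REFERENCES Students(ID),
    CourseName TEXT
)
""")
conn.execute("INSERT INTO Students VALUES (1, 'Mia')")
conn.execute("INSERT INTO Enrolments VALUES (1, 'Data Analytics')")  # matches Students.ID = 1

try:
    # No student has ID 99, so this row would be an orphan record.
    conn.execute("INSERT INTO Enrolments VALUES (99, 'Data Analytics')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print("orphan rejected:", orphan_rejected)
conn.close()
```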
Give two reasons why you would de-identify data before publishing results. Protect privacy, comply with laws; reduce risk of re-identification.
What is an example of a de-identification technique? Remove identifiers, aggregate small categories, or apply pseudonyms to personal fields.
Explain why ethics matter in data analytics projects. Ethics protect privacy, prevent harm, ensure fairness, avoid bias and maintain public trust in data use.
What is dataset bias? Give one cause and mitigation. Bias is skewed representation in data. Cause: non-representative sampling. Mitigation: adjust sampling method or weight data.
How would you apply APA referencing to cite a government dataset in Unit 3? Include author (agency), year, title, the bracketed description [Data set], publisher (omitted when it is the same as the author) and URL. Example: Australian Bureau of Statistics. (2022). Dataset name [Data set]. URL.
What is the purpose of a test table in the development stage? Compare expected and actual results for various test cases, documenting correctness and edge-case handling.
Give an example of a validation test case for a date-of-birth field. Test with valid DOB, future date (invalid), blank value (invalid), and boundary date to confirm checks.
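The test cases above can be written as a small test table that compares expected and actual results. This is a sketch under stated assumptions: the function name `validate_dob`, the ISO `YYYY-MM-DD` input format, and the rule that today's date is the latest valid value are all illustrative choices, not requirements from the study design.

```python
from datetime import date

def validate_dob(text, today):
    """Return True if text is a non-blank ISO date (YYYY-MM-DD) that is not after today."""
    if not text or not text.strip():            # existence check
        return False
    try:
        dob = date.fromisoformat(text.strip())  # type/format check
    except ValueError:
        return False
    return dob <= today                         # range check (no future dates)

today = date(2025, 1, 1)  # fixed reference date so the tests are repeatable

# Test table: (input value, expected result)
test_table = [
    ("2007-05-14", True),   # valid DOB
    ("2030-01-01", False),  # future date
    ("", False),            # blank value
    ("2025-01-01", True),   # boundary: exactly the reference date
    ("14/05/2007", False),  # wrong format
]

for value, expected in test_table:
    actual = validate_dob(value, today)
    print(f"{value!r:14} expected={expected} actual={actual}")
```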
Define alpha and beta testing and their differences. Alpha: developer/internal testing for functionality; Beta: limited user testing in real environment to assess usability and acceptance.
Explain two debugging techniques used during development. Breakpoints to inspect runtime state; logging output to trace execution and identify faults.
What is version control and why is it useful in software/data projects? System (e.g., Git) that tracks changes, enables collaboration, rollback and record of development history.
Describe what a School-Assessed Task (SAT) requires in Units 3 & 4 data analytics. Documented research question, project plan (Gantt), data collection and preparation, design folio and detailed design specs in Unit 3; development and evaluation in Unit 4.
What is meant by 'dynamic data visualisation'? Interactive graphics that update with live data feeds or respond to user input for exploratory analysis.
Give two tools commonly used to create dynamic visualisations. Tableau (with live connections) and Power BI (dashboards with refreshable datasets).
Explain why choosing the correct chart type matters using a time series example. Line charts show trends over time; bar charts compare discrete categories; wrong type can hide trends or mislead interpretation.
What is a dashboard and one best-practice design principle? Collection of visualisations for quick insight. Principle: prioritise clarity and avoid clutter; show most important metrics prominently.
Define 'storytelling with data' in one sentence. Using visualisations and annotations to guide the audience through findings and insights logically and persuasively.
What is an infographic and how does it differ from a dashboard? Infographic is static, curated for presentation; dashboard is interactive and often used for operational monitoring.
Explain two accessibility considerations for visualisations. Use high-contrast colours, provide text alternatives and avoid relying solely on colour to convey meaning.
What is data provenance and why does it matter? Record of data origin and processing steps; essential for reproducibility, trust and validating analysis.
Define 'metadata' and give one example field. Data that describes data. Example: date of collection, source name, or update timestamp.
What is a research question suitable for Unit 3 SAT? "How have monthly public transport passenger numbers changed in Melbourne suburbs between 2015 and 2024?" (measurable and data-available).
Explain why feasibility is important when selecting a research question. Ensures required data, time and tools are available to complete the project within constraints.
Give two sampling methods and one advantage of each. Random sampling (reduces bias); stratified sampling (ensures representation across groups).
What is the role of ethics approval in research involving human data? Formal review to protect participants, ensure consent, risk minimisation and compliance with ethical standards.
Define anonymisation vs pseudonymisation. Anonymisation removes identifiers irreversibly; pseudonymisation replaces identifiers with reversible codes.
Why might you choose pseudonymisation over anonymisation during analysis? Maintains the ability to re-link records for validation while protecting IDs during analysis.
Explain one legal implication of failing to protect personal data in a government dataset. Can lead to investigations by the Office of the Victorian Information Commissioner (OVIC), legal penalties, reputational damage and mandatory remediation.
Name three security controls you would recommend for a dataset containing personal info. Encryption at rest and in transit, multi-factor authentication, role-based access control.
Define encryption and its two main types. Process of encoding data; symmetric (single shared key) and asymmetric (public/private key pair).
Explain how multi-factor authentication (MFA) improves security. Requires two or more proofs of identity (e.g., password + token), reducing risk from compromised credentials.
What is a data breach and the first two steps an organisation should take after detection? Unauthorised access/disclosure. Steps: contain the breach (disrupt access) and assess scope (what data, who affected).
Explain what log files are and how they assist incident investigation. Records of system events; they provide timestamps, user actions and sources to trace and reconstruct incidents.
What is cross-border data flow and one compliance consideration? Transfer of data outside jurisdiction; ensure receiving country has comparable protections and update contracts/clauses.
Define 'ethical use' of AI in data analytics in one sentence. Using AI transparently, avoiding unfair bias, ensuring explainability and protecting privacy and consent.
Give one strength and one limitation of using machine learning in analytics. Strength: identifies complex patterns; Limitation: requires large, high-quality data and can encode biases.
What is model overfitting and one method to prevent it? When model matches training data too closely, losing generalisation. Prevent by cross-validation or regularisation.
Why is documentation essential in the development stage? Documentation supports reproducibility, helps handover, supports testing, and satisfies authentication requirements.
What should be included in the evaluation section of Unit 4 SAT? Efficiency and effectiveness analysis, testing results, user feedback, and assessment of project plan accuracy.
Explain how to evaluate the effectiveness of an infographic. Check if it answers the research question, clarity of message, accuracy of data and user comprehension tests.
What is an efficiency metric you could use to evaluate a dashboard? Average load time per view or time taken to generate a report.
What is a usability test and one quick method to perform it? Evaluates how real users interact with a solution. Quick method: conduct 5-user task-based tests observing success rates and times.
Explain how you would assess the accuracy of processed data. Compare sample outputs to source data, run verification checks, and compute error rates or mismatch percentages.
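A minimal sketch of computing a mismatch percentage by comparing processed values back to the source. The record IDs and scores are hypothetical, and in practice the comparison would usually be run on a sample rather than on every record.

```python
# Hypothetical source records and the values produced after processing.
source = {"S001": 82, "S002": 67, "S003": 91, "S004": 74, "S005": 58}
processed = {"S001": 82, "S002": 67, "S003": 19, "S004": 74, "S005": 58}

# A record mismatches if it is missing from the processed data or its value differs.
mismatched_ids = [key for key, value in source.items() if processed.get(key) != value]
mismatch_rate = 100 * len(mismatched_ids) / len(source)

print(f"{len(mismatched_ids)} of {len(source)} records differ ({mismatch_rate:.1f}%)")
print("Check these IDs against the source:", mismatched_ids)
```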
What is regression testing and when is it used? Re-running tests after changes to ensure existing functionality still works during development or updates.
Give two examples of documentation to submit for a School-Assessed Task (SAT). Project plan (Gantt chart) and software requirements specification or data collection log and design folios.
What is internal documentation and why is it required for Unit 3 software modules? Code comments, README and design notes that explain structure and operation, facilitating maintenance and assessment.
Explain the difference between formative and summative evaluation in projects. Formative: ongoing feedback used to improve development; Summative: final assessment of outcomes at project completion.
What is an example of a non-digital constraint that might affect your SAT? Access to survey respondents limited by school holidays reducing primary data availability.
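How would you structure an exam answer when asked to recommend security improvements? State the issue, reference relevant law (the Australian Privacy Principles (APPs), or Victoria's Information Privacy Principles (IPPs) and Health Privacy Principles (HPPs)), propose specific controls, and justify each with expected impact.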
Write an exam-style sentence linking APP 11 to a security control. "Under APP 11, the organisation must take reasonable steps to protect personal information, for example by encrypting databases and enforcing role-based access."
What is an effective way to present evaluation findings in a report? Use clear headings, include metrics and tables, visual examples, user feedback summaries and a concise conclusion with recommendations.
Explain how to demonstrate 'efficiency' improvements in a final report. Provide before/after metrics (e.g., query time reduced from X to Y), outline optimisation steps and quantify resource savings.
What is the significance of 'authentication' for VCE assessments? Teachers must verify student work is authentic; documentation and process logs support authenticity claims.
Give one method for ensuring student authentication in a school-based task. Keep development logs, version control commits, and teacher-supervised in-class assessments to verify authorship.
What is a practical exam tip when asked to evaluate a mock case study? State the relevant stage of the methodology, name laws/principles that apply, give specific technical and policy recommendations, and conclude with measurable success criteria.
Explain how to calculate the 'critical path' length from a Gantt chart in two steps. Identify sequence of dependent tasks with no slack; sum the durations of tasks on that sequence.
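The two steps can be sketched in code as a longest-path calculation over task dependencies. The task names, durations and dependencies below are hypothetical; a real Gantt chart would supply them.

```python
# Hypothetical tasks: name -> (duration in days, list of predecessor tasks)
tasks = {
    "Analysis": (3, []),
    "Design": (5, ["Analysis"]),
    "Collect data": (4, ["Analysis"]),
    "Develop": (6, ["Design", "Collect data"]),
    "Evaluate": (2, ["Develop"]),
}

earliest_finish = {}

def finish(name):
    """Earliest finish = own duration + the latest earliest-finish among predecessors."""
    if name not in earliest_finish:
        duration, preds = tasks[name]
        earliest_finish[name] = duration + max((finish(p) for p in preds), default=0)
    return earliest_finish[name]

# The critical path length is the largest earliest-finish time of any task.
project_length = max(finish(t) for t in tasks)

# Step 1: trace the zero-slack sequence backwards from the last-finishing task.
path = [max(tasks, key=finish)]
while tasks[path[-1]][1]:
    path.append(max(tasks[path[-1]][1], key=finish))
path.reverse()

# Step 2: sum the durations of the tasks on that sequence.
print(" -> ".join(path), "=", sum(tasks[t][0] for t in path), "days")
```

"Collect data" is not on the critical path here: it finishes on day 7 while "Design" finishes on day 8, so it has one day of slack.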
What is 'data lifecycle' and why is it relevant to privacy? Stages from collection to disposal; privacy must be considered at each stage (collection, storage, use, sharing, disposal).
Give an exam-style justification for using a relational database for the SAT. Relational databases enforce constraints and relationships, reduce redundancy and support complex SQL queries needed for analysis.
Explain how to evaluate the reliability of a secondary dataset. Check source credibility, update frequency, collection methods, sample size and cross-reference with other sources.
What is an exam-strong explanation of why you would run test cases with edge values? Edge values expose boundary behaviour and validation weaknesses ensuring the solution handles extremes and avoids runtime errors.
Name three common SQL aggregate functions and a short use-case for each. COUNT() for number of transactions, SUM() for total sales, AVG() for average customer rating.
How would you describe the concept of 'scalability' in a data project? The system's ability to handle increasing data volumes or users without performance degradation; important for future-proofing.
What is a real-world example of a dynamic data source for a dashboard? API feed from public transport live timetables or real-time sales data from an e-commerce platform.
Explain the role of user stories in design using one example. User stories capture requirements from perspective of user: "As a manager, I want daily sales graphs so I can monitor performance."
What is 'legal compliance' when using third-party APIs with personal data? Ensure provider terms allow intended use, secure cross-border transfers, and protect data per APPs/IPPs/HPPs.
Give one exam-ready mitigation for insider threats in a data project. Implement least-privilege access, audit logs, staff training and periodic access reviews.
Describe a short scenario where APP 6 applies and give the correct action. Company collected emails for order confirmation; using them for marketing breaches APP 6—obtain consent before marketing or stop the practice.
Explain how to present sampling bias in an exam answer for a survey. Define sampling bias, identify how survey method produced it, quantify impact if possible and propose sampling redesign or weighting.
What is a policy recommendation to meet IPP 5 openness for a council website? Publish a concise privacy policy page detailing data types collected, purposes, storage practices and contact for privacy enquiries.
Give two ways to reduce risk when outsourcing data storage to a cloud provider. Contractual clauses requiring equivalent privacy protections and data residency clauses; implement encryption with keys controlled by the organisation.
What is a practical check to perform before publishing visualisations with small category counts? Aggregate small cells or suppress categories to prevent re-identification of individuals in sparse bins.
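One way to apply this check before charting is to fold sparse categories into a combined bucket. This is a sketch: the function name `suppress_small_cells`, the threshold of 5 and the travel-mode counts are assumptions for illustration, and the right threshold depends on the publishing organisation's policy.

```python
def suppress_small_cells(counts, threshold=5):
    """Merge categories whose count is below threshold into a single 'Other' bucket."""
    safe = {}
    other = 0
    for category, n in counts.items():
        if n < threshold:
            other += n          # too sparse to publish on its own
        else:
            safe[category] = n
    if other:
        safe["Other"] = other
    return safe

# Hypothetical survey counts by travel mode.
travel_modes = {"Bus": 120, "Tram": 85, "Ferry": 2, "Scooter": 3}
published = suppress_small_cells(travel_modes)
print(published)
```

If the combined "Other" bucket is itself below the threshold, it should be suppressed entirely or merged with a larger category before publishing.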
How would you answer an exam question asking for three checks to ensure data quality? Verify against source records, run validation rules (type/range/existence), and perform consistency checks across fields.
Provide a concise model answer: Why is reproducibility important in data analytics? Reproducibility allows others to validate findings, ensures transparency, and supports reliability of conclusions.
What are two considerations when using historical datasets for predictive models? Data relevance (feature drift) and completeness; ensure training data reflects current context or retrain models appropriately.
Explain how to structure a short justification for choosing a visualisation type in the exam. State the chart, state what relationship it shows (trend/comparison/proportion), and justify with data type and audience needs.
What are three common causes of poor dashboard performance and one fix each? Unoptimised queries → index/tune SQL; excessive visuals → reduce tiles; large data pulls → cache or pre-aggregate.
Describe one method to evaluate the usability of a dashboard quantitatively. Task completion time and success rate for representative user tasks measured across multiple users.
What is one concise way to mention 'ethical auditing' in an exam answer? Recommend independent review of datasets and models for bias, privacy risk and fairness before deployment.
What is a succinct summary line linking data governance to analytics success? Strong data governance ensures data quality, legal compliance and reliable analytics outputs, enabling trustworthy decisions.
Created by: LiamJ84
 

 


