Data Analytics Exam
| Question | Answer |
|---|---|
| What are the four stages of the Problem-Solving Methodology and one key activity in each? | Analysis: identify requirements/constraints; Design: create detailed designs and evaluation criteria; Development: build, test and document solution; Evaluation: assess efficiency and effectiveness, recommend improvements. |
| In the analysis stage, what must you produce to define the scope and constraints? | A requirements analysis that lists functional and non-functional requirements, constraints (time, cost, legal), stakeholders and scope boundaries. |
| Give a concise exam-style definition of a functional requirement and provide an example. | A functional requirement describes what the system must do. Example: "Generate a weekly sales report that lists top 10 products." |
| Give a concise exam-style definition of a non-functional requirement and provide an example. | A non-functional requirement describes how the system performs. Example: "Report must load within 5 seconds for 10,000 records." |
| Explain why documenting constraints is essential in the analysis stage. | Constraints define limits (budget, time, legislation) that affect design choices and scope; they prevent unrealistic solutions and guide trade-offs. |
| What is a data dictionary and why is it used in design? | A data dictionary lists datasets, fields, types and meanings; it provides a single reference ensuring consistent field usage and correct queries. |
| Describe an IPO chart and how it helps in design. | Input-Process-Output chart maps inputs, processing steps and outputs; it clarifies required data transformations and stages for development. |
| List three design tools used in the study design and one purpose for each. | Mock-ups (UI appearance), flowcharts/pseudocode (program logic), IPO charts (data flow and transformation). |
| What are evaluation criteria? Give two examples relevant to a dashboard project. | Measures used to judge solution success. Examples: Accuracy of displayed figures; Load time under 3 seconds. |
| Explain what should be included in a project plan (Gantt chart). | Tasks, start/end dates, durations, dependencies, milestones, resources and critical path to manage timeline and sequencing. |
| Define 'critical path' in project management. | The longest sequence of dependent tasks, which determines the shortest possible project duration; a delay to any task on it delays the whole project. |
| What is the purpose of a research question in Unit 3 Area of Study 2? | To focus data collection and analysis; it defines what the visualisation should answer and informs methodology and datasets. |
| Explain primary vs secondary data with an example of each. | Primary data is collected firsthand (survey responses); secondary data is pre-existing (ABS census data). |
| What is data cleansing and why is it necessary before analysis? | Removing errors, duplicates and inconsistent entries so analysis yields valid, unbiased results and correct visualisations. |
| Define data validation and give two common validation checks. | Validation ensures data is reasonable at entry. Examples: existence checks and range checks. |
| Define data verification and give one example method. | Verification confirms data matches original source. Example: double-data entry comparison or proofreading. |
| What is data integrity? Give two factors that threaten it. | Data integrity = accuracy, completeness and reliability. Threats: human entry errors, corrupted transfers. |
| Explain why referencing datasets is required in Unit 3 projects. | To acknowledge sources, allow verification of data provenance, and meet academic honesty and legal requirements. |
| Write an exam-style reason to prefer structured data from repositories over scraped web data. | Repository data is likely to be clean, documented and reliable; web-scraped data may be inconsistent and require heavy cleansing. |
| Define descriptive statistics and give three examples used in Unit 3. | Statistics that summarise data: mean, median, standard deviation. |
| How would you use mean and median to describe a skewed income dataset? | Report the median as the measure of central tendency because it is less affected by outliers; report the mean alongside it to show how extreme values pull the average (a mean well above the median signals right skew). |
| What is standard deviation and why is it useful? | Measure of spread; shows how much values deviate from the mean, indicating variability in data. |
| Explain why you might use frequency tables in analysis. | To summarise counts of categorical data and identify most common categories or distribution shapes. |
| What is an outlier and one method to handle it? | A value far from others; handle by investigating source, correcting, removing, or using robust statistics (median). |
| Define data manipulation in the context of data analytics. | Transforming, cleaning, aggregating and preparing raw data using SQL and spreadsheet functions for analysis and visualisation. |
| Give two spreadsheet functions useful for data cleaning and one purpose each. | TRIM() removes extra spaces; IFERROR() handles calculation errors preventing analysis breakage. |
| What is the role of SQL in Units 3 & 4 data analytics? | Extract, filter, aggregate and join data from relational databases to prepare datasets for analysis and visualisation. |
| Write a short SQL query to select Name and Score from Students where Score > 75. | SELECT Name, Score FROM Students WHERE Score > 75; |
| How does GROUP BY interact with aggregate functions? | GROUP BY groups rows by column values; aggregate functions (SUM, COUNT, AVG) compute metrics per group. |
| Provide a short SQL example using GROUP BY and HAVING to find products with sales count > 50. | SELECT ProductID, COUNT(*) AS SalesCount FROM Sales GROUP BY ProductID HAVING COUNT(*) > 50; |
| Explain INNER JOIN in two lines and give when you'd use it. | INNER JOIN returns rows with matching keys in both tables. Use it to combine student names with enrolment records matching StudentID. |
| Write a combined SQL example using SELECT, FROM, INNER JOIN, WHERE and ORDER BY. | SELECT S.Name, C.CourseName FROM Students S INNER JOIN Enrolments E ON S.ID = E.StudentID INNER JOIN Courses C ON E.CourseID = C.ID WHERE C.Level='Advanced' ORDER BY S.Name ASC; |
| What is an index and how does it improve query performance? | Index is a data structure (like a book index) that speeds up lookups on keyed columns, reducing search time on large tables. |
| Explain trade-offs of using SELECT * in exam answers. | SELECT * returns all fields but is inefficient and can expose unnecessary data; list specific fields to improve performance and security. |
| What is normalisation and why is it important in database design? | Process of organising tables to reduce redundancy and dependency, improving integrity and update efficiency. |
| Describe first normal form (1NF) briefly. | Each field contains atomic (single) values, each record is unique and there are no repeating groups. |
| What does referential integrity enforce? | That foreign keys match existing primary keys, preventing orphan records and preserving relationships. |
| Give two reasons why you would de-identify data before publishing results. | Protect privacy, comply with laws; reduce risk of re-identification. |
| What is an example of de-identification technique? | Remove identifiers, aggregate small categories, or apply pseudonyms to personal fields. |
| Explain why ethics matter in data analytics projects. | Ethics protect privacy, prevent harm, ensure fairness, avoid bias and maintain public trust in data use. |
| What is dataset bias? Give one cause and mitigation. | Bias is skewed representation in data. Cause: non-representative sampling. Mitigation: adjust sampling method or weight data. |
| How would you apply APA referencing to cite a government dataset in Unit 3? | Include author (agency), year, title, dataset type, and URL. Example: Australian Bureau of Statistics. (2022). Dataset name [Data set]. URL. |
| What is the purpose of a test table in the development stage? | Compare expected and actual results for various test cases, documenting correctness and edge-case handling. |
| Give an example of a validation test case for a date-of-birth field. | Test with valid DOB, future date (invalid), blank value (invalid), and boundary date to confirm checks. |
| Define alpha and beta testing and their differences. | Alpha: developer/internal testing for functionality; Beta: limited user testing in real environment to assess usability and acceptance. |
| Explain two debugging techniques used during development. | Breakpoints to inspect runtime state; logging output to trace execution and identify faults. |
| What is version control and why is it useful in software/data projects? | System (e.g., Git) that tracks changes, enables collaboration, rollback and record of development history. |
| Describe what a School-Assessed Task (SAT) requires in Units 3 & 4 data analytics. | Documented research question, project plan (Gantt), data collection and preparation, design folio and detailed design specs in Unit 3; development and evaluation in Unit 4. |
| What is meant by 'dynamic data visualisation'? | Interactive graphics that update with live data feeds or respond to user input for exploratory analysis. |
| Give two tools commonly used to create dynamic visualisations. | Tableau (with live connections) and Power BI (dashboards with refreshable datasets). |
| Explain why choosing the correct chart type matters using a time series example. | Line charts show trends over time; bar charts compare discrete categories; wrong type can hide trends or mislead interpretation. |
| What is a dashboard and one best-practice design principle? | Collection of visualisations for quick insight. Principle: prioritise clarity and avoid clutter; show most important metrics prominently. |
| Define 'storytelling with data' in one sentence. | Using visualisations and annotations to guide the audience through findings and insights logically and persuasively. |
| What is an infographic and how does it differ from a dashboard? | Infographic is static, curated for presentation; dashboard is interactive and often used for operational monitoring. |
| Explain two accessibility considerations for visualisations. | Use high-contrast colours, provide text alternatives and avoid relying solely on colour to convey meaning. |
| What is data provenance and why does it matter? | Record of data origin and processing steps; essential for reproducibility, trust and validating analysis. |
| Define 'metadata' and give one example field. | Data that describes data. Example: date of collection, source name, or update timestamp. |
| What is a research question suitable for Unit 3 SAT? | "How have monthly public transport passenger numbers changed in Melbourne suburbs between 2015 and 2024?" (measurable and data-available). |
| Explain why feasibility is important when selecting a research question. | Ensures required data, time and tools are available to complete the project within constraints. |
| Give two sampling methods and one advantage of each. | Random sampling (reduces bias); stratified sampling (ensures representation across groups). |
| What is the role of ethics approval in research involving human data? | Formal review to protect participants, ensure consent, risk minimisation and compliance with ethical standards. |
| Define anonymisation vs pseudonymisation. | Anonymisation removes identifiers irreversibly; pseudonymisation replaces identifiers with reversible codes. |
| Why might you choose pseudonymisation over anonymisation during analysis? | Maintains the ability to re-link records for validation while protecting IDs during analysis. |
| Explain one legal implication of failing to protect personal data in a government dataset. | Can lead to investigations by OVIC, legal penalties, reputational damage and mandatory remediation. |
| Name three security controls you would recommend for a dataset containing personal info. | Encryption at rest and in transit, multi-factor authentication, role-based access control. |
| Define encryption and its two main types. | Process of encoding data so that only parties holding the correct key can read it; symmetric (single shared key) and asymmetric (public/private key pair). |
| Explain how multi-factor authentication (MFA) improves security. | Requires two or more proofs of identity (e.g., password + token), reducing risk from compromised credentials. |
| What is a data breach and the first two steps an organisation should take after detection? | Unauthorised access/disclosure. Steps: contain the breach (disrupt access) and assess scope (what data, who affected). |
| Explain what log files are and how they assist incident investigation. | Records of system events; they provide timestamps, user actions and sources to trace and reconstruct incidents. |
| What is cross-border data flow and one compliance consideration? | Transfer of data outside jurisdiction; ensure receiving country has comparable protections and update contracts/clauses. |
| Define 'ethical use' of AI in data analytics in one sentence. | Using AI transparently, avoiding unfair bias, ensuring explainability and protecting privacy and consent. |
| Give one strength and one limitation of using machine learning in analytics. | Strength: identifies complex patterns; Limitation: requires large, high-quality data and can encode biases. |
| What is model overfitting and one method to prevent it? | When model matches training data too closely, losing generalisation. Prevent by cross-validation or regularisation. |
| Why is documentation essential in the development stage? | Documentation supports reproducibility, eases handover, supports testing and helps satisfy authentication requirements. |
| What should be included in the evaluation section of Unit 4 SAT? | Efficiency and effectiveness analysis, testing results, user feedback, and assessment of project plan accuracy. |
| Explain how to evaluate the effectiveness of an infographic. | Check if it answers the research question, clarity of message, accuracy of data and user comprehension tests. |
| What is an efficiency metric you could use to evaluate a dashboard? | Average load time per view or time taken to generate a report. |
| What is a usability test and one quick method to perform it? | Evaluates how real users interact with a solution. Quick method: conduct 5-user task-based tests observing success rates and times. |
| Explain how you would assess the accuracy of processed data. | Compare sample outputs to source data, run verification checks, and compute error rates or mismatch percentages. |
| What is regression testing and when is it used? | Re-running tests after changes to ensure existing functionality still works during development or updates. |
| Give two examples of documentation to submit for a School-Assessed Task (SAT). | Project plan (Gantt chart) and software requirements specification; alternatively, data collection log and design folio. |
| What is internal documentation and why is it required for Unit 3 software modules? | Code comments, README and design notes that explain structure and operation, facilitating maintenance and assessment. |
| Explain the difference between formative and summative evaluation in projects. | Formative: ongoing feedback used to improve development; Summative: final assessment of outcomes at project completion. |
| What is an example of a non-digital constraint that might affect your SAT? | Access to survey respondents limited by school holidays reducing primary data availability. |
| How would you structure an exam answer when asked to recommend security improvements? | State the issue, reference relevant law (APPs/IPPs/HPPs), propose specific controls, and justify each with expected impact. |
| Write an exam-style sentence linking APP 11 to a security control. | "Under APP 11, the organisation must take reasonable steps to protect personal information, for example by encrypting databases and enforcing role-based access." |
| What is an effective way to present evaluation findings in a report? | Use clear headings, include metrics and tables, visual examples, user feedback summaries and a concise conclusion with recommendations. |
| Explain how to demonstrate 'efficiency' improvements in a final report. | Provide before/after metrics (e.g., query time reduced from X to Y), outline optimisation steps and quantify resource savings. |
| What is the significance of 'authentication' for VCE assessments? | Teachers must verify student work is authentic; documentation and process logs support authenticity claims. |
| Give one method for ensuring student authentication in a school-based task. | Keep development logs, version control commits, and teacher-supervised in-class assessments to verify authorship. |
| What is a practical exam tip when asked to evaluate a mock case study? | State the relevant stage of the methodology, name laws/principles that apply, give specific technical and policy recommendations, and conclude with measurable success criteria. |
| Explain how to calculate the 'critical path' length from a Gantt chart in two steps. | Identify the longest sequence of dependent tasks (those with no slack); sum the durations of the tasks on that sequence. |
| What is 'data lifecycle' and why is it relevant to privacy? | Stages from collection to disposal; privacy must be considered at each stage (collection, storage, use, sharing, disposal). |
| Give an exam-style justification for using a relational database for the SAT. | Relational databases enforce constraints and relationships, reduce redundancy and support complex SQL queries needed for analysis. |
| Explain how to evaluate the reliability of a secondary dataset. | Check source credibility, update frequency, collection methods, sample size and cross-reference with other sources. |
| What is an exam-strong explanation of why you would run test cases with edge values? | Edge values expose boundary behaviour and validation weaknesses ensuring the solution handles extremes and avoids runtime errors. |
| Name three common SQL aggregate functions and a short use-case for each. | COUNT() for number of transactions, SUM() for total sales, AVG() for average customer rating. |
| How would you describe the concept of 'scalability' in a data project? | The system's ability to handle increasing data volumes or users without performance degradation; important for future-proofing. |
| What is a real-world example of a dynamic data source for a dashboard? | API feed from public transport live timetables or real-time sales data from an e-commerce platform. |
| Explain the role of user stories in design using one example. | User stories capture requirements from perspective of user: "As a manager, I want daily sales graphs so I can monitor performance." |
| What is 'legal compliance' when using third-party APIs with personal data? | Ensure provider terms allow intended use, secure cross-border transfers, and protect data per APPs/IPPs/HPPs. |
| Give one exam-ready mitigation for insider threats in a data project. | Implement least-privilege access, audit logs, staff training and periodic access reviews. |
| Describe a short scenario where APP 6 applies and give the correct action. | A company collected emails for order confirmation; using them for marketing breaches APP 6. Correct action: obtain consent before marketing or stop the practice. |
| Explain how to present sampling bias in an exam answer for a survey. | Define sampling bias, identify how survey method produced it, quantify impact if possible and propose sampling redesign or weighting. |
| What is a policy recommendation to meet IPP 5 openness for a council website? | Publish a concise privacy policy page detailing data types collected, purposes, storage practices and contact for privacy enquiries. |
| Give two ways to reduce risk when outsourcing data storage to a cloud provider. | Contractual clauses requiring equivalent privacy protections and data residency clauses; implement encryption with keys controlled by the organisation. |
| What is a practical check to perform before publishing visualisations with small category counts? | Aggregate small cells or suppress categories to prevent re-identification of individuals in sparse bins. |
| How would you answer an exam question asking for three checks to ensure data quality? | Verify against source records, run validation rules (type/range/existence), and perform consistency checks across fields. |
| Provide a concise model answer: Why is reproducibility important in data analytics? | Reproducibility allows others to validate findings, ensures transparency, and supports reliability of conclusions. |
| What are two considerations when using historical datasets for predictive models? | Data relevance (feature drift) and completeness; ensure training data reflects current context or retrain models appropriately. |
| Explain how to structure a short justification for choosing a visualisation type in the exam. | State the chart, state what relationship it shows (trend/comparison/proportion), and justify with data type and audience needs. |
| What are three common causes of poor dashboard performance and one fix each? | Unoptimised queries → index/tune SQL; excessive visuals → reduce tiles; large data pulls → cache or pre-aggregate. |
| Describe one method to evaluate the usability of a dashboard quantitatively. | Task completion time and success rate for representative user tasks measured across multiple users. |
| What is one concise way to mention 'ethical auditing' in an exam answer? | Recommend independent review of datasets and models for bias, privacy risk and fairness before deployment. |
| What is a succinct summary line linking data governance to analytics success? | Strong data governance ensures data quality, legal compliance and reliable analytics outputs, enabling trustworthy decisions. |
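
The SQL cards above (the WHERE filter, the GROUP BY/HAVING count and the two-step INNER JOIN) can be tried against a small practice database. The sketch below is one way to do that with Python's built-in `sqlite3` module. The table layouts and all sample rows are assumptions made up for illustration; only the three query strings come from the cards.

```python
import sqlite3


def run_flashcard_queries():
    """Build a tiny in-memory database and run the three card queries."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical schema and rows, invented for practice only.
    cur.executescript("""
        CREATE TABLE Students (ID INTEGER PRIMARY KEY, Name TEXT, Score INTEGER);
        CREATE TABLE Courses (ID INTEGER PRIMARY KEY, CourseName TEXT, Level TEXT);
        CREATE TABLE Enrolments (
            StudentID INTEGER REFERENCES Students(ID),
            CourseID INTEGER REFERENCES Courses(ID)
        );
        CREATE TABLE Sales (SaleID INTEGER PRIMARY KEY, ProductID INTEGER);

        INSERT INTO Students VALUES (1, 'Bao', 82), (2, 'Asha', 91), (3, 'Cam', 60);
        INSERT INTO Courses VALUES (10, 'Data Modelling', 'Advanced'),
                                   (11, 'Spreadsheets', 'Basic');
        INSERT INTO Enrolments VALUES (1, 10), (2, 10), (3, 11);
    """)
    # 51 sales of product 1 and 3 sales of product 2.
    cur.executemany(
        "INSERT INTO Sales (ProductID) VALUES (?)",
        [(1,)] * 51 + [(2,)] * 3,
    )

    results = {
        # Filter rows with WHERE.
        "filter": cur.execute(
            "SELECT Name, Score FROM Students WHERE Score > 75;"
        ).fetchall(),
        # Count per group, then keep only groups above the threshold.
        "having": cur.execute(
            "SELECT ProductID, COUNT(*) AS SalesCount FROM Sales "
            "GROUP BY ProductID HAVING COUNT(*) > 50;"
        ).fetchall(),
        # Join three tables through the Enrolments linking table.
        "join": cur.execute(
            "SELECT S.Name, C.CourseName FROM Students S "
            "INNER JOIN Enrolments E ON S.ID = E.StudentID "
            "INNER JOIN Courses C ON E.CourseID = C.ID "
            "WHERE C.Level='Advanced' ORDER BY S.Name ASC;"
        ).fetchall(),
    }
    conn.close()
    return results


if __name__ == "__main__":
    for name, rows in run_flashcard_queries().items():
        print(name, rows)
```

Note that WHERE filters individual rows before grouping, while HAVING filters the groups after the aggregate has been computed, which is why the count threshold sits in HAVING.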
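
The descriptive statistics cards above (mean, median, standard deviation and the skewed income question) can be illustrated with Python's standard `statistics` module. This is a minimal sketch; the income figures are invented for illustration, and `describe` is a hypothetical helper name rather than anything from the study design.

```python
import statistics


def describe(values):
    """Return the three summary statistics named on the cards."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation (divides by n - 1).
        "stdev": statistics.stdev(values),
    }


# Hypothetical incomes: six typical values and one very high outlier.
incomes = [42000, 45000, 48000, 50000, 52000, 55000, 400000]
summary = describe(incomes)

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```

With this sample the median stays at the middle value (50000) while the single outlier drags the mean far above it, which is the pattern an exam answer about skewed data should point out.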
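
The critical path cards above describe finding the longest chain of dependent tasks and summing its durations. The sketch below shows one way to compute that, assuming the tasks form an acyclic dependency graph. The task names, durations (in days) and dependencies are hypothetical, and `critical_path` is an illustrative helper, not a standard library function.

```python
from functools import lru_cache


def critical_path(tasks):
    """tasks maps name -> (duration, [dependency names]).

    Returns (total_duration, path) for the longest dependency chain.
    Assumes there are no circular dependencies.
    """

    @lru_cache(maxsize=None)
    def longest_to(name):
        duration, deps = tasks[name]
        if not deps:
            return duration, (name,)
        # A task cannot start until its slowest predecessor chain finishes.
        best_length, best_path = max(longest_to(dep) for dep in deps)
        return best_length + duration, best_path + (name,)

    return max(longest_to(name) for name in tasks)


# Hypothetical SAT plan: durations in days.
plan = {
    "Analysis": (3, []),
    "Design": (5, ["Analysis"]),
    "Collect data": (4, ["Analysis"]),
    "Develop": (6, ["Design", "Collect data"]),
    "Evaluate": (2, ["Develop"]),
}

length, path = critical_path(plan)

if __name__ == "__main__":
    print(length, " -> ".join(path))
```

In this sample plan "Collect data" has one day of slack because "Design" takes longer, so it is not on the critical path.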
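
The validation cards above (existence and range checks, and the date-of-birth test cases) can be turned into a small test table in code. The following is a sketch under stated assumptions: dates arrive as ISO `YYYY-MM-DD` text, the earliest acceptable date is 1900-01-01 (an arbitrary lower bound chosen for illustration), and `validate_dob` is a hypothetical function name.

```python
from datetime import date

# Assumed lower bound for the range check; adjust to suit the project.
EARLIEST_DOB = date(1900, 1, 1)


def validate_dob(text, today):
    """Return 'valid' or the name of the first check that fails."""
    # Existence check: the field must not be blank.
    if text is None or text.strip() == "":
        return "missing"
    # Type/format check: the text must parse as an ISO date.
    try:
        dob = date.fromisoformat(text.strip())
    except ValueError:
        return "wrong format"
    # Range check: not before the lower bound and not in the future.
    if dob < EARLIEST_DOB or dob > today:
        return "out of range"
    return "valid"


# Test table: (input, expected result), including boundary values.
TODAY = date(2025, 1, 1)
test_cases = [
    ("2008-05-17", "valid"),         # typical valid value
    ("", "missing"),                 # blank value
    ("2030-01-01", "out of range"),  # future date
    ("2025-01-01", "valid"),         # upper boundary (today)
    ("1900-01-01", "valid"),         # lower boundary
    ("17/05/2008", "wrong format"),  # not ISO format
]

if __name__ == "__main__":
    for value, expected in test_cases:
        actual = validate_dob(value, TODAY)
        print(repr(value), expected, actual, "PASS" if actual == expected else "FAIL")
```

Printing expected and actual results side by side mirrors the layout of a test table submitted in the development stage.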