Data Analytics Exam
| Term | Definition |
|---|---|
| Data | Information that is collected, organized, and analyzed to support decision-making. |
| Structured Data | Data organized in fixed fields within a record or file, easily stored in tables. |
| Unstructured Data | Data without a predefined format, such as text, images, or videos. |
| Categorical Data | Data that represent characteristics or labels, such as gender or color. |
| Numerical Data | Data that represent measurable quantities or counts. |
| Metadata | Data that provides information about other data, such as author or file size. |
| Big Data | Extremely large datasets analyzed computationally to reveal patterns and trends. |
| ETL Process | Extract, Transform, Load – the process of preparing and moving data for analysis. |
| Fixed-width Format | File type where each column has a set width and position. |
| Delimited Format | File type where data is separated by commas, tabs, or other delimiters. |
| XML | eXtensible Markup Language used to structure and store data using custom tags. |
| HTML | HyperText Markup Language used to display content in web browsers. |
| JSON | JavaScript Object Notation, a lightweight format for data exchange. |
| Data Validation | Ensuring data completeness and integrity during extraction and processing. |
| Data Cleaning | Detecting and correcting inaccurate, incomplete, or inconsistent data. |
| Handling Null Values | Replacing or removing missing data to ensure accuracy. |
| Sorting | Arranging data in a specific order such as ascending or descending. |
| Filtering | Selecting data that meets certain criteria, e.g., State |
| Slicing | Extracting a portion or subset of a dataset. |
| Transposing | Switching rows and columns in a dataset. |
| Appending | Adding new data to the end of an existing dataset. |
| Truncating | Shortening text or records by removing unneeded parts. |
| Aggregation | Combining data to provide summarized insights like totals or averages. |
| Grouping | Organizing data into categories to perform aggregate calculations. |
| Merging/Joining | Combining datasets based on shared keys or identifiers. |
| Summarizing | Condensing detailed data into high-level statistics or insights. |
| Pivoting | Reshaping data from long format to wide format or vice versa. |
| Data Roll-up | Combining data into larger group totals for summary analysis. |
| Aggregation Level | The degree of detail or granularity in a dataset. |
| Temporal Granularity | The time-based level of detail, e.g., daily vs monthly data. |
| Spatial Aggregation | Summarizing data by geographic units such as regions or states. |
| Data Analysis | The process of inspecting and modeling data to discover useful information. |
| Descriptive Analysis | Summarizes past data to understand what happened. |
| Diagnostic Analysis | Examines data to understand why something happened. |
| Predictive Analysis | Uses data and models to predict future outcomes. |
| Prescriptive Analysis | Recommends actions based on data analysis and scenarios. |
| Aggregation Function | Mathematical operations like sum, average, count, min, or max. |
| Standard Deviation | Measure of how much data values vary from the mean. |
| Exploratory Data Analysis (EDA) | Visually and statistically examining data to find patterns or anomalies. |
| Boxplot | A visual graph that shows the distribution, median, and outliers of data. |
| Quartiles | Values that divide an ordered dataset into four equal parts (Q1, Q2/median, Q3). The interquartile range (IQR) is Q3 minus Q1. |
| Data Mining | Using algorithms to identify hidden patterns or relationships in large datasets. |
| Machine Learning | AI technique where systems learn patterns from data to make predictions. |
| Evaluating Results | Interpreting and validating findings from data analysis. |
| Responsible Analytics | Using data in ways that are ethical, fair, and respectful of privacy. |
| GDPR | General Data Protection Regulation – EU law protecting personal data and privacy. |
| FERPA | Family Educational Rights and Privacy Act – protects student education records. |
| HIPAA | Health Insurance Portability and Accountability Act – protects medical data privacy. |
| IRB | Institutional Review Board – reviews and oversees research involving human subjects to ensure it is ethical. |
| PCI | Payment Card Industry – its Data Security Standard (PCI DSS) sets requirements for securing cardholder data. |
| PII | Personally Identifiable Information – data that can identify an individual. |
| Data Anonymity | Removing personal identifiers to protect privacy. |
| Interpretability | The degree to which a model's behavior can be understood by humans. |
| Accuracy | The closeness of a model’s predictions to actual values. |
| Confirmation Bias | Favoring information that supports preexisting beliefs. |
| Cognitive Bias | Systematic errors in thinking that affect judgments and decisions. |
| Motivational Bias | Bias resulting from personal incentives or desired outcomes. |
| Sampling Bias | When the sample collected is not representative of the population. |
| Pie Chart | Used to show parts of a whole. Best for displaying percentage or proportional data when the total adds up to 100%. |
| Bar Chart | Used to compare quantities of different categories. Best for categorical data comparisons across groups. |
| Line Chart | Used to show trends over time or continuous data. Ideal for time series and progression tracking. |
| Scatterplot | Used to show relationships or correlations between two numerical variables. |
| Box Plot | Used to show the distribution, spread, and outliers in a dataset. Useful for comparing variability across groups. |
| Histogram | Used to show frequency distributions of numerical data by grouping values into bins. |
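
Several of the terms above (filtering, sorting, grouping, aggregation functions, pivoting, quartiles) describe operations that are easier to remember once seen in code. The following is a minimal sketch using only the Python standard library. The records, field names, and helper function names are hypothetical and invented purely for illustration; they are not part of the study set. Note that quartile values depend on the calculation convention, and this sketch simply uses the default method of `statistics.quantiles`.

```python
from collections import defaultdict
from statistics import mean, quantiles

# Hypothetical records (invented for illustration only).
records = [
    {"state": "CA", "month": "Jan", "sales": 100},
    {"state": "CA", "month": "Feb", "sales": 150},
    {"state": "NY", "month": "Jan", "sales": 80},
    {"state": "NY", "month": "Feb", "sales": 120},
    {"state": "TX", "month": "Jan", "sales": 60},
]


def filter_rows(rows, field, wanted):
    """Filtering: keep only the rows whose field equals the wanted value."""
    return [r for r in rows if r[field] == wanted]


def sort_rows(rows, field, descending=False):
    """Sorting: order rows by one field, ascending unless told otherwise."""
    return sorted(rows, key=lambda r: r[field], reverse=descending)


def group_and_aggregate(rows, key, value, func):
    """Grouping + aggregation: collect values per category, then summarize."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r[value])
    return {k: func(v) for k, v in groups.items()}


def pivot(rows, index, column, value):
    """Pivoting: reshape long rows into a wide {index: {column: value}} table."""
    wide = defaultdict(dict)
    for r in rows:
        wide[r[index]][r[column]] = r[value]
    return dict(wide)


def quartile_summary(values):
    """Quartiles: the three cut points Q1, Q2 (median), Q3, plus IQR = Q3 - Q1."""
    q1, q2, q3 = quantiles(values, n=4)
    return {"Q1": q1, "Q2": q2, "Q3": q3, "IQR": q3 - q1}


ca_rows = filter_rows(records, "state", "CA")
by_sales = sort_rows(records, "sales", descending=True)
totals_by_state = group_and_aggregate(records, "state", "sales", sum)
avg_by_month = group_and_aggregate(records, "month", "sales", mean)
wide_table = pivot(records, "state", "month", "sales")
summary = quartile_summary([r["sales"] for r in records])
```

Here `totals_by_state` is a roll-up to the state level, while `avg_by_month` aggregates the same data at a different (temporal) level; passing a different aggregation function such as `min`, `max`, or `len` changes the summary without changing the grouping.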