Data visualization
Exam 1
| Question | Answer |
|---|---|
| Data visualization | the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data |
| Data visualization makes it possible to see | insights and trends in the data that cannot be identified in the text/tabular form |
| The goal of visualization is | to aid our understanding of data by leveraging the human visual system’s highly tuned ability to see patterns, spot trends, and identify outliers |
| Why Visualize Data? | 1. Find patterns and see data in context 2. Make data accessible to everyone 3. Expand memory 4. Answer questions (or discover them) 5. Make decisions / Persuade others to make decisions 6. Persuade using evidence through narratives |
| Who created the table? | Claudius Ptolemy |
| Who created the coordinate system? | René Descartes |
| Who created the line graph, bar chart, and pie chart? | William Playfair |
| Who created the coxcomb (polar area) chart? | Florence Nightingale |
| Who created the histogram? | Karl Pearson |
| Who wrote *Sémiologie Graphique*? | Jacques Bertin |
| Who created the box plot? | John Tukey |
| Who created the flow line map? | Charles Minard |
| Who wrote the book *The Visual Display of Quantitative Information*? | Edward Tufte |
| Categorical data | Nominal data that represents things: mutually exclusive labels without any numerical value. |
| Categorical data examples | name, gender, product types |
| Ordinal data | Similar to categorical data, except it has a clear order. Although ordinal values often have numbers associated with them, the interval between those values is arbitrary. |
| Ordinal data examples | level of education, satisfaction level, salary bands |
| Quantitative data - Discrete | Takes values only at exact, predefined points; there is no “in between” |
| Quantitative data - Discrete Examples | you can have 175 patients but not 175.5 patients in a hospital; number of page views on a website |
| Quantitative data - Continuous | Has an infinite number of possible intermediate values. |
| Quantitative data - Continuous Examples | sales, profit, height |
| Temporal data | data collected over a time period; typically involves dates |
| Temporal data Examples | yearly sales data, monthly sales data |
| Exploratory | testing a hypothesis and mining for patterns, trends, and anomalies (data scientist) |
| Exploratory Example | a business owner thinks that purchases on mobile vs. desktop peak at different times of the day |
| Explanatory | usually simple, everyday visualizations (line charts, bar charts, pies, and scatter plots) conveying a single message (managers) |
| Explanatory Example | Your managers ask you to build a report to show that firm spending on employee health benefits has declined |
| MICROSOFT EXCEL | Pros: supports processing of data; relatively easy to learn; widely used. Cons: good only for basic visualization (not interactive); requires customization to adhere to design standards; may not process large datasets (~1 GB) |
| GOOGLE CHARTS | Pros: free and open option to create charts; includes interactive data graphics; integrates well with Google Apps; data can easily be accessed from different computers. Cons: may not be effective for very large datasets (approaching ~1 GB) |
| MICROSOFT POWER BI | Pros: integrates with Microsoft tools; highly intuitive user interface; imports data from a wide range of sources. Cons: not many options for visuals; handles multiple relationships between tables poorly; issues when importing large datasets |
| TABLEAU | Pros: integrates a wide range of data sources and file types; strong design and aesthetics; allows for interactive displays; powerful community collaboration. Cons: expensive; in the free version, any data becomes publicly available |
| What are the two main types of data visualization tools? | Out-of-the-box visualization and Programming |
| Out-of-the-box visualization | • Basic productivity applications • Visualization software and Business intelligence tools • Geospatial visualization tools |
| Programming | • Developer-based packages |
| A local retail outlet needs to decide which products it should continue to stock. The manager needs to analyze a dataset of roughly 30 products in stock to examine trends in sales and profitability over the past year. Which software tool should he use? | EXCEL AND GOOGLE CHARTS, DUE TO PRICE, SINCE IT'S A LOCAL COMPANY |
| A marketing campaign analyst needs to evaluate the impact and return on investment of marketing campaigns and build dashboards and reports to drive marketing storytelling across different products and teams. Which software tool? | TABLEAU AND MICROSOFT POWER BI |
| A marketing analyst at an e-commerce platform needs to decide which products the platform should continue to offer by analyzing the sales, user reviews and comments, social media mentions, and profitability of 3,000 products sold. Which software tool? | PYTHON, DUE TO THE VOLUME OF DATA AND TEXT |
| A media manager for a consumer packaged goods firm needs to examine different types of media spending by geographic area, such as designated market area, and decide in which regions the firm should increase media spending. Which software tool? | GEOSPATIAL VISUALIZATION TOOLS |
| Visual Cues - Position | • Commonly used on scatter plots • You compare values based on where they are placed relative to others in the coordinate system (upward, downward, cluster, outlier) |
| Visual Cues - Length | • Commonly used on bar charts • The length of the bars in a bar graph provides the visual cue: the longer the bar, the greater the absolute value • Start the axis at zero, because people visually compare the distance from 0 to the end of the bar |
| Visual Cues - Angles | • Commonly used for pie charts • Commonly used to represent parts of a whole • Donut charts do not use angles, since the center of the circle is cut out; arc lengths are used as the visual cue |
| Visual Cues - Direction | • Commonly noticed in line graphs • Direction provides one basic visual cue • Direction helps with noticing trends • Slope can be used to signal sharp/drastic changes in direction |
| Visual Cues - Shapes | • Shapes can be used to denote categories and objects • Visually, shapes are readily recognized |
| Visual Cues - Area | • Commonly used on area charts and bubble charts • Bigger objects represent greater values • Make sure the scaling is correct (the area, not the radius, should be proportional to the value) |
| Visual Cues - Color | • Hue refers to the different colors • Saturation refers to the intensity of a given color, e.g., gradients • Be mindful of color blindness |
| Cartesian Coordinate ex. | bar and line graphs |
| Polar Coordinate ex. | pie graphs |
| Geographic Coordinate ex. | geospatial graphs |
| Scale types | numerical, categorical, time |
| Bar Chart | • Intuitive • Appropriate for non-technical people • One of the most commonly used charts • Useful to visualize discrete data • Start axis at zero |
| What’s wrong with the pie chart? | ▪ Too many categories ▪ Have to switch back and forth between pie and labels ▪ Unintentional use of color ▪ Difficult to interpret data values |
| General rules about Pie Charts | ▪ Appropriate for non-technical audiences ▪ Widely used despite criticism ▪ Use when there are only a few categories ▪ Use when data sums to 100% ▪ Add labels for percentage ▪ No 3D – Keep it simple |
| when visualizing categories, use | a bar chart |
| when visualizing proportions, use | a pie chart, donut chart, or stacked bar chart |
| Visualizing Time | bar chart, line chart, Gantt chart, points |
| Visualizing Distributions | histogram, continuous density plot, box plot, stem-and-leaf plot |
| Temporal data Discrete Example | Number of people who graduate from A&M in the spring each year |
| Temporal data Continuous Example | Constantly changing; can be measured at any time of the day, at any interval (e.g., temperature) |
| Visualizing Relationships | Scatter plot |
| Correlation | means one thing tends to change a certain way as another thing changes. |
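| Coefficient of correlation | quantifies how tightly coupled the values of two variables are with respect to each other • Can be obtained by taking the square root of R² from a simple linear regression (the sign follows the slope) • Strength is judged by the absolute value of the coefficient |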
| weak correlation | 0-0.4 |
| moderate correlation | 0.4-0.6 |
| strong correlation | 0.6-1 |
| What chart to use when showing the distribution of discounts? | HISTOGRAM: ALWAYS USE A HISTOGRAM FOR A DISTRIBUTION |
| What chart to use when showing the proportion of sales by region? | pie chart |
| Measures are usually | numerical data like Shipping Cost |
| Inside of Tableau, measures are aggregations – they’re aggregated up to the granularity set by the dimensions in the view. | The value of a measure therefore depends on the context of the dimensions. |
| For example, the result for the sum of Shipping Cost is different if we have no dimensions in the view (just a single overall sum) | versus when we add Order Priority as a dimension – now we have a sum for each priority level. |
| What chart to use when showing the strength of the relationship between sales and profit? | Scatter plot: a scatter plot ALWAYS shows the relationship between two metrics |
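The bar-chart rule "start the axis at zero" follows from how length is read: viewers compare the distance from the baseline to the end of each bar. The sketch below uses two hypothetical bar values, 50 and 55. It shows how far a non-zero baseline exaggerates the visible difference. The helper name `visible_ratio` is made up for illustration.

```python
def visible_ratio(a: float, b: float, baseline: float) -> float:
    """Ratio of the drawn bar lengths for values a and b when the axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)

# Hypothetical bar values; the true ratio is 55 / 50 = 1.1
true_ratio = visible_ratio(50, 55, baseline=0)
truncated_ratio = visible_ratio(50, 55, baseline=45)

print(f"axis at 0:  bar B looks {true_ratio:.2f}x as long as bar A")       # 1.10x
print(f"axis at 45: bar B looks {truncated_ratio:.2f}x as long as bar A")  # 2.00x
```

A 10% difference in the data looks like a 100% difference once the axis starts at 45.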
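For bubble charts, "make sure the scaling is correct" usually means that each circle's area should be proportional to its value, not its radius. Area is πr², so the radius has to grow with the square root of the value. The sketch below assumes circular markers. The helper `bubble_radius` and its parameters are hypothetical.

```python
import math

def bubble_radius(value: float, max_value: float, max_radius: float) -> float:
    """Radius whose circle area is proportional to `value`.

    Area = pi * r**2, so r grows with the square root of the value.
    """
    return max_radius * math.sqrt(value / max_value)

# A value four times larger gets a radius only twice as large,
# which makes its area (what the eye compares) four times larger.
r_small = bubble_radius(25, max_value=100, max_radius=40)
r_large = bubble_radius(100, max_value=100, max_radius=40)
print(r_small, r_large)          # 20.0 40.0
print((r_large / r_small) ** 2)  # area ratio: 4.0
```

Scaling the radius directly by the value would make the larger bubble look sixteen times as big instead of four.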
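The coefficient of correlation in these cards is most commonly Pearson's r. The plain-Python sketch below does three things:

* It computes r for two lists.
* It checks that r² matches the R² of a simple least-squares line, so the square root of R² gives back the size of r.
* It labels the strength using the 0.4 and 0.6 cutoffs from the cards.

Two details are assumptions: the cutoffs are applied to the absolute value of r, and a value of exactly 0.4 or 0.6 falls into the higher band. The sales and profit numbers are hypothetical.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient r of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def r_squared(xs: list[float], ys: list[float]) -> float:
    """R² of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def strength(r: float) -> str:
    """Label |r| with the cutoffs from the cards (0.4 and 0.6)."""
    size = abs(r)
    if size < 0.4:
        return "weak"
    if size < 0.6:
        return "moderate"
    return "strong"

# Hypothetical monthly sales and profit figures
sales = [100, 200, 300, 400, 500]
profit = [12, 25, 29, 45, 51]

r = pearson_r(sales, profit)
r2 = r_squared(sales, profit)
print(f"r = {r:.3f}, R² = {r2:.3f}, sqrt(R²) = {math.sqrt(r2):.3f} -> {strength(r)}")
```

Taking the square root of R² loses the sign. The direction of the relationship comes from the slope, or from r itself.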
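A histogram shows a distribution by counting how many values fall into equal-width bins. The minimal standard-library sketch below bins a list of hypothetical order discounts. It assumes whole-percent values and a bin width of 10. Charting tools do the binning for you; this only shows the counting step behind the bars.

```python
def histogram_counts(values: list[int], bin_width: int, start: int = 0) -> dict[int, int]:
    """Count values per half-open bin [start + k*w, start + (k+1)*w), keyed by k."""
    counts: dict[int, int] = {}
    for v in values:
        k = (v - start) // bin_width
        counts[k] = counts.get(k, 0) + 1
    return counts

# Hypothetical order discounts, in whole percent (integers avoid float-rounding
# surprises at bin edges)
discounts = [0, 5, 10, 10, 15, 20, 20, 20, 30, 45]
counts = histogram_counts(discounts, bin_width=10)

# Print a text histogram, including any empty bins between the extremes
for k in range(min(counts), max(counts) + 1):
    lo = k * 10
    print(f"[{lo:>2}, {lo + 10:>2}) {'#' * counts.get(k, 0)}")
```

The bin width is a design choice. Bins that are too wide hide the shape of the distribution, and bins that are too narrow turn it into noise.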
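In Tableau, a measure such as the sum of Shipping Cost is aggregated to the level of detail set by the dimensions in the view. The sketch below imitates that behavior in plain Python; it is not Tableau's API. With no dimension there is a single overall sum. Adding Order Priority as a dimension produces one sum per priority level. The rows and the helper `sum_measure` are hypothetical.

```python
from collections import defaultdict

# Hypothetical order rows using the field names from the cards
orders = [
    {"Order Priority": "High", "Shipping Cost": 12.5},
    {"Order Priority": "Low", "Shipping Cost": 4.0},
    {"Order Priority": "High", "Shipping Cost": 7.5},
    {"Order Priority": "Medium", "Shipping Cost": 6.0},
    {"Order Priority": "Low", "Shipping Cost": 2.0},
]

def sum_measure(rows, measure, dimension=None):
    """SUM(measure) over all rows, or one SUM per value of `dimension`."""
    if dimension is None:
        return sum(row[measure] for row in rows)
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)

overall = sum_measure(orders, "Shipping Cost")
by_priority = sum_measure(orders, "Shipping Cost", dimension="Order Priority")
print(overall)      # 32.0
print(by_priority)  # {'High': 20.0, 'Low': 6.0, 'Medium': 6.0}
```

The measure is the same in both calls. Only the dimension context changes, and that changes the result.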