DA - K14-S11
The principles of descriptive, predictive and prescriptive analytics.
| Term | Definition |
|---|---|
| An example of Predictive Analytics | Predictive analytics uses data to forecast future outcomes, for example forecasting next quarter's sales from past sales figures. |
| What is Predictive Analytics? | Forecasting and predicting trends using historical data |
| The benefits of Predictive Analytics | Improves decision-making: the business can act proactively instead of reactively and be better prepared ahead of time. |
| What competitive advantage does it provide? | Gives you an edge over competitors if you can identify areas of weakness or emerging trends before they do. |
| The biggest risk with Predictive Analytics | Data quality is the biggest risk: if your data isn’t consistent or accurate, you may make incorrect predictions. |
| Additional points on the risk of using Predictive Analytics | Over-reliance on predictive models instead of stepping back, analysing the data yourself and applying your own judgement. We sometimes know our industries better than machines do. |
| How would you ensure source data quality? | Clean the data; Remove errors; Standardise the data; Remove or fill missing values; Keep data up to date; Check for bias; Check the file's metadata; Check file hashes (checksums) for data transfers |
| How can you check the authenticity of source data? | Find the root source of the data; Check the data lineage or audit trail showing where the data came from; Cross-reference with other sources and authors to confirm the data source is trusted. |
| Statistical Sanity Testing using Benford's Law | In many naturally occurring datasets (like financial records or street addresses), the leading digit is more likely to be a 1 (about 30% of the time) than a 9 (less than 5%). If the leading digits are evenly distributed (11% for each), the data may be fabricated. |
| Duplicate Analysis | Genuine data usually has a natural amount of "noise." An identical row appearing thousands of times without a logical reason (like a system error) suggests the data was "padded." |
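The last two cards (Benford's Law and duplicate analysis) can be sketched in a few lines of Python. This is a minimal illustration rather than a complete fraud test: the function names (`leading_digit`, `benford_expected`, `benford_max_deviation`, `duplicate_rows`) are hypothetical, the sample data is synthetic, and any cut-off you apply to the deviation is a judgement call. Benford's Law gives the expected share of leading digit d as log10(1 + 1/d), which is about 30.1% for 1 and about 4.6% for 9.

```python
import math
from collections import Counter


def leading_digit(value):
    """Return the first non-zero digit of a number, or None for 0 / non-finite."""
    if not math.isfinite(value) or value == 0:
        return None
    # Scientific notation puts the leading digit first, e.g. '1.23...e+02'.
    return int(f"{abs(value):.15e}"[0])


def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_shares(values):
    """Observed share of each leading digit 1-9 in the given numbers."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        return {}
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


def benford_max_deviation(values):
    """Largest absolute gap between observed and expected digit shares."""
    observed = leading_digit_shares(values)
    expected = benford_expected()
    return max(abs(observed.get(d, 0.0) - expected[d]) for d in range(1, 10))


def duplicate_rows(rows, min_count=2):
    """Return rows that appear at least min_count times, with their counts."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Synthetic data: powers of two track Benford's Law closely,
    # while the integers 100..999 have perfectly even leading digits.
    natural_like = [2 ** n for n in range(1, 101)]
    evenly_spread = list(range(100, 1000))
    print("powers of two:", round(benford_max_deviation(natural_like), 3))
    print("even spread:  ", round(benford_max_deviation(evenly_spread), 3))

    # Synthetic rows: one record repeated many times stands out.
    rows = [("INV-001", 250.0)] * 1000 + [("INV-002", 99.5), ("INV-003", 12.0)]
    print(duplicate_rows(rows, min_count=100))
```

A large deviation or a heavily repeated row is only a prompt to investigate further. Benford's Law tends to hold for data spanning several orders of magnitude, and duplicates can have a legitimate cause such as a system error.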