Marketing Analytics
Final
| Term | Definition |
|---|---|
| Regression Modeling | Captures the strength of the relationship between a single numerical dependent (target) variable & one or more predictor variables (numerical or categorical). |
| Dependent or target variables | The variable being predicted (e.g., purchase amount). |
| Independent variables | Variables used to make the prediction (e.g., email promotion & income level). |
| Linear Regression | A relationship between the independent & dependent variables is represented by a straight line that best fits the data. |
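| Simple Linear Regression | y = b0 + b1x + e, where b0 is the intercept, b1 is the slope, & e is the error term. The fitted line is ŷ = b0 + b1x. |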
| Ordinary Least Squares (OLS) | Minimizes the sum of squared errors. The goal is to find the straight line that produces the minimum sum of squared distances from the data points to the line. |
| Multiple Regression | Is used to determine whether two or more independent variables are good predictors of a single dependent variable (e.g., price, promotion, weather, location). |
| R² | Measures the proportion of variance in the dependent variable that is explained by the independent variables. Closer to 1 = better prediction; closer to 0 = not a good predictor. |
| Descriptive Regression | Identifies the relationship between the dependent and independent variables. |
| Explanatory Regression | Supports causal inferences based on theory-driven hypotheses. |
| Training dataset | Used to develop the initial set of weights for the predictor variables. |
| Validation dataset | Used to assess how well the regression model's estimates of the target variable compare to the actual values. |
| Optional dataset | Test data: a third dataset used to evaluate how well the model performs. |
| Main Focus | Represents causation & association between dependent and independent variables. Used to predict a new observation. |
| Model use and reporting | Interpreting the coefficients & the strengths of the relationships between the dependent and independent variables. |
| Datasets | Used to build models. Divided into a training dataset & a validation dataset. |
| Performance Measures | Uses coefficient significance, goodness of fit, & overall model significance. |
| Mean Absolute Error (MAE) | Measures the average absolute difference between the predicted & actual values of the model. |
| Mean Absolute Percentage Error (MAPE) | The average absolute difference between the prediction & the actual target, expressed as a percentage of the actual value. |
| Root Mean Squared Error (RMSE) | The square root of the average squared residual. Indicates how different the residuals are from 0. |
| Residuals | The difference between the observed & predicted value of the dependent variable. |
| Hold-out validation | Part of the data (e.g., 2/3) is randomly selected to build the regression model; the remainder is held out to validate it. |
| N-fold cross validation | The dataset is divided into N mutually exclusive samples (folds), e.g., 10. Each fold in turn is selected as the validation dataset while the other 9 serve as training datasets. |
| Dummy Variables | A dichotomous (0/1) variable created from a categorical variable. |
| Overfitting | Occurs when sample-specific characteristics are included in the regression model that cannot be generalized to new data. |
| Multicollinearity | Situation where the predictor variables are highly correlated with each other. Results can be misleading: the coefficient estimates of the independent variables can be unstable, & the size of the coefficients can also be incorrect. |
| Feature Selection | Choosing which predictor variables to include in the model, either quantitatively or qualitatively. |
| Backward Elimination | Starts with all predictors in the model. The least predictive feature is removed, & this is repeated until the variables remaining in the regression model are statistically significant. |
| Forward Selection | Creates a separate regression model for each predictor, then adds predictors one at a time, starting with the most predictive. |
| Stepwise Selection | Adds variables at each step, but also removes variables that no longer meet the threshold. |
| Cluster Analysis | Segmenting a market using shared characteristics. |
| K-means Clustering | Uses the mean value of each cluster & minimizes the distance from that mean to the individual observations in the cluster. |
| Cluster Centroids | Cluster seeds are randomly selected & designated as the initial cluster centroids; the centroids are then recalculated as the cluster means. |
| "Elbow" chart | Plots the reduction in cluster error as the number of clusters increases. |
| Silhouette score | Helps identify the optimal number of clusters for the data. Ranges from -1 to +1: near +1 = correct cluster, near 0 = poor fit. |
| Standardized Variables | Converts variables to comparable measures. |
| Hierarchical clustering | Identifies subgroups by building a hierarchy of nested clusters. |
| Agglomerative Clustering | Bottom-up approach: each observation starts as its own cluster, & the closest clusters are merged step by step. |
| Divisive Clustering | Top-down approach: all records start in a single cluster, which is split step by step. |
| Euclidean | The distance is measured as the true straight line distance between two points. |
| Manhattan | City-block distance: measures the path between two points using only right-angle turns. |
| Matching | Measures the similarity between 2 observations with values that represent the minimum difference between two points. |
| Jaccard's | Measures the similarity between 2 observations based on how dissimilar two observations are from each other. |
| Complete Linkage | The maximum distance between observations in 2 different clusters. |
| Single Linkage | The shortest distance from an object in a cluster to an object from another cluster. |
| Average Linkage | The group average of the distances from observations in 1 cluster to all observations in another cluster. |
| Ward's Method | Applies a measure of the sum of squares within the clusters, summed over all variables. |
| Dendrogram | Treelike graph that illustrates the hierarchy of clusters in the dataset. |
| Influencers | Individuals who initiate or actively engage others in conversation & are often well-connected to others in the network. |
| Social Network Analysis | Identifies relationships, influencers, information dissemination patterns, & behaviors among connections in network. |
| Nodes | An entity in the network, also called a vertex (e.g., a person or product). |
| Edges | Links that represent relationships between nodes (e.g., friendship or family ties). |
| Edge weight | The strength of the relationship between two nodes (the thicker the line, the higher the exchange). |
| Graph | Visualization that shows the relationships between nodes. |
| Singleton | A node that is unconnected to all others in the network. |
| Egocentric Networks | The network of a single individual (the ego) & its direct connections (alters). |
| Directed vs. undirected networks | Network connections can be directional, showing which way each node connects to others (directed). Two-way connection = undirected. |
| Density | Measures the extent to which the nodes in the network are connected by edges & indicates how fast information is transmitted (the higher the density, the faster the information is transmitted). |
| Distribution of Nodes | Measures the degree of relationship or connectedness among the nodes. |
| Measures of Centrality | Indicate the influence a node has in the network & the node's strategic network position. |
| Degree of Centrality | Measures centrality based on the number of edges connected to the node. |
| Indegree vs. outdegree | Indegree = # of connections that point in toward a node. Outdegree = # of arrows that begin with the node & point toward other nodes. |
| Betweenness Centrality | Measures # of times a node is on the shortest path between other nodes. |
| Closeness Centrality | Measures the proximity of the node to all other nodes in the network. (higher the closeness scores, shorter the distance) |
| Eigenvector Centrality (relation centrality) | Measures the # of links from a node & the # of connections those nodes have (0 = no centrality to 1 = high centrality). |
| Louvain Communities | Identifies non-overlapping communities, or groups of closely connected nodes, in the network. |
| Polarized Crowd | Structure is split into 2 groups. |
| Tight Crowds | Twitter topics are all highly interconnected by similar conversations. |
| Brand clusters | Structures have many independent participants that might share information about a popular topic or brand but are not interacting much with each other. |
| Community Cluster | Structure represents groups that are large & connected, with a few independent participants. |
| Broadcast Networks | Structure in which information is disseminated from a central anchor, so it appears like a hub-&-spoke structure. |
| Support Network | Structure represents otherwise unconnected participants that are connected by the anchor, resulting in outward spokes. |
| Link Predictions | Objective is to predict new links between unconnected nodes. (looking for possible links) |
| Digital Marketing | Marketing touchpoints that are executed electronically through a digital channel to communicate & interact with current & potential customers & partners. |
| Owned Digital media | Managed by the company; includes touchpoints such as email marketing, social media pages, & company websites (personalized content). |
| Paid Digital media | Exposure that the company pays others to provide (e.g., influencers, sponsors). |
| Earned Digital media | Communication or exposure not initiated or posted by the company. |
| Digital marketing analytics | Enables marketers to monitor, understand, & evaluate the performance of digital marketing initiatives. |
| Quantity of impressions or visitors | How many people see the advertisement (impressions) or visit the site (visitors)? |
| User Demographic | The demographic characteristics of the users. |
| Geography | The location of the owned, paid, & earned media. |
| Traffic source | Which sources visitors use to click on paid advertising or to enter the company's social media pages or websites. |
| Campaigns | How are different marketing campaigns driving visitors to the website? |
| Pageviews | Represents the # of sessions during which a particular web page was viewed. |
| Frequency of engagement | Frequency of visitors returning to a site within a certain time frame. |
| Site speed | The quickness of user interaction |
| Bounce Rate | Rate at which customers arrive at your site & leave without clicking on anything else. |
| Click-through rate | The rate at which customers click on paid advertisements or campaign emails. |
| Site Content | The pages that are most & least visited. |
| Site search | The things that visitors are searching for. |
| Conversion Rate | Measures the effectiveness of the marketing campaign & website design: the ratio of visitors who have become customers. |
| Conversion by traffic source | The origins of visitors who enter the site & go on to convert. |
| A/B testing (split testing) | Enables marketers to experiment with different digital options to identify which ones are likely to be most effective (test & measure how visitors respond to change). |
| Treatment | The digital marketing intervention being tested (e.g., placement or color). |
| Multivariate Testing | Enables companies to test whether changing several different variables on their websites at the same time leads to a higher conversion rate. |
| Multichannel attributions | Assesses how, when, & where the various touchpoints influence customers. |
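
The regression error measures defined in the table (residuals, MAE, MAPE, RMSE, and R²) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed method: the function name `regression_metrics` and the purchase amounts are hypothetical, and MAPE is assumed to be expressed in percent with no actual value equal to 0.

```python
import math

def regression_metrics(actual, predicted):
    """Return MAE, MAPE (in percent), RMSE, and R^2 for paired values."""
    n = len(actual)
    # Residual = observed value minus predicted value
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    # MAPE assumes no actual value is 0
    mape = 100 * sum(abs(r / a) for r, a in zip(residuals, actual)) / n
    sse = sum(r * r for r in residuals)            # sum of squared errors
    rmse = math.sqrt(sse / n)
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - sse / sst
    return {"mae": mae, "mape": mape, "rmse": rmse, "r2": r2}

# Hypothetical purchase amounts (actual vs. predicted)
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 380.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

With these made-up values the MAE is 12.5 and R² is 0.986, so the predictions are close to the actual amounts. RMSE comes out larger than MAE because squaring gives the single 20-unit miss more weight.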
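
The Euclidean and Manhattan distances and the single, complete, and average linkage rules from the clustering rows can be illustrated as follows. This is a small sketch with hypothetical function names and made-up two-dimensional points; it only compares two fixed clusters and does not run a full hierarchical clustering.

```python
import math

def euclidean(p, q):
    """True straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def pairwise(c1, c2, dist):
    """All distances between observations in two different clusters."""
    return [dist(p, q) for p in c1 for q in c2]

def single_linkage(c1, c2, dist=euclidean):
    return min(pairwise(c1, c2, dist))   # shortest distance

def complete_linkage(c1, c2, dist=euclidean):
    return max(pairwise(c1, c2, dist))   # maximum distance

def average_linkage(c1, c2, dist=euclidean):
    d = pairwise(c1, c2, dist)
    return sum(d) / len(d)               # group average

# Hypothetical clusters of 2-D observations
cluster_a = [(0, 0), (0, 1)]
cluster_b = [(3, 4), (6, 8)]

print(euclidean((0, 0), (3, 4)))   # 5.0
print(manhattan((0, 0), (3, 4)))   # 7
print(single_linkage(cluster_a, cluster_b))
print(complete_linkage(cluster_a, cluster_b))
print(average_linkage(cluster_a, cluster_b))
```

Passing `dist=manhattan` to any linkage function switches the distance measure without changing the linkage rule, which shows that the two choices are independent.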
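
The click-through, bounce, and conversion rates in the digital marketing rows are all simple ratios. The sketch below uses hypothetical campaign counts and assumes common denominators that the table does not spell out: clicks per impression for click-through rate, single-page sessions per session for bounce rate, and customers per visitor for conversion rate.

```python
def rate(numerator, denominator):
    """Return numerator / denominator as a percentage (0.0 if the denominator is 0)."""
    return 100.0 * numerator / denominator if denominator else 0.0

# Hypothetical campaign counts (not from any real dataset)
impressions = 20_000          # times the ad was shown
ad_clicks = 500               # clicks on the paid advertisement
sessions = 400                # visits to the site
single_page_sessions = 180    # visits that left without clicking anything else
visitors = 350                # distinct visitors
customers = 14                # visitors who became customers

click_through_rate = rate(ad_clicks, impressions)
bounce_rate = rate(single_page_sessions, sessions)
conversion_rate = rate(customers, visitors)

print(f"CTR: {click_through_rate:.1f}%")
print(f"Bounce rate: {bounce_rate:.1f}%")
print(f"Conversion rate: {conversion_rate:.1f}%")
```

Analytics tools may define these denominators differently (for example, conversions per session rather than per visitor), so the definitions should be checked before comparing numbers across tools.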