Tuesday, March 17, 2009

Analytical harakiri: Ignoring latent class models

My black books are out, and here are some notes I jotted down on projects:
  • Scenario 1: You're working on an analysis of the drivers of purchase intent, and you keep feeling that different sets of customers may have different drivers. The client, however, has not asked for the customer base to be segmented.
  • Scenario 2: Your data is a mix of continuous, ordinal and nominal variables, yet you keep trying to segment your customers with distance-based methods. You wonder why two customers end up in different segments even though they differ little in average spend or on the other variables.
  • Scenario 3: You know the major themes that your client's brand stands for but are unable to fully break them out from the data into meaningful, actionable sets and subsets for understanding or positioning purposes.

Researchers and analysts keep grappling with these issues in market research data because they are not getting the most from their analyses. One reason I love statisticians in the social sciences is that they are the 'early adopters' of most new and emerging statistical techniques. They quickly absorb new developments in the field, apply them to their everyday problems, and thus boldly go where the rest of us are too timid or too lazy to venture.

How do we get more from our analyses of regression, choice, factor and segmentation problems? Latent class (or finite mixture) models are the answer. This family of models differs structurally from traditional models because it includes one or more discrete latent (unobserved) variables in the model alongside the observed ones. The categories, or classes, of these latent variables are then interpreted as latent segments.
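For readers who like to see the structure written down, here is the standard finite mixture form in generic notation (the symbols are my own shorthand, not taken from any particular package). Each respondent's observed responses y_i come from one of K latent classes with sizes pi_k, and segment membership is read off as a posterior probability. In the classic latent class model the indicators are assumed independent within a class (local independence).

```latex
f(\mathbf{y}_i) = \sum_{k=1}^{K} \pi_k \, f_k(\mathbf{y}_i \mid \boldsymbol{\theta}_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

% local independence within a class (classic latent class model)
f_k(\mathbf{y}_i \mid \boldsymbol{\theta}_k) = \prod_{j=1}^{J} P(y_{ij} \mid c_i = k)

% posterior probability that respondent i belongs to latent segment k
P(c_i = k \mid \mathbf{y}_i) =
  \frac{\pi_k \, f_k(\mathbf{y}_i \mid \boldsymbol{\theta}_k)}
       {\sum_{l=1}^{K} \pi_l \, f_l(\mathbf{y}_i \mid \boldsymbol{\theta}_l)}
```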

Some key advantages of latent class models are:

  1. They are less affected by data that do not conform to modeling assumptions (linear relationships, normal distributions, homogeneity).
  2. They work with variables of mixed scale types (continuous, ordinal, nominal, count) in the same analysis.
  3. They can perform two analyses at once, i.e., segment and predict, which eliminates the need for a two-step analysis.

In segmentation, these models bring a model-based approach and the ability to accommodate categorical and continuous data, as well as predictive and descriptive segmentation, under a common modeling framework. This leads to superior and more insightful results when testing and estimating market size and structure and when profiling market segments. The probability-based classification criterion gives a more realistic picture of the market, since each consumer receives a probability of belonging to every segment rather than being forced into a single one. Areas where latent class segmentation models should be used to 'do more' with the data include classical segmentation problems (descriptive ones among them), global segmentation, and studying how segments change over time.
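To make the probability-based assignment concrete, here is a minimal sketch of a classic latent class model for yes/no survey items, fitted with the EM algorithm in plain Python. It assumes binary items only and local independence within a class; real latent class software also handles ordinal, nominal, count and continuous indicators, which this sketch does not. The function name `fit_lca`, the toy data and the fixed iteration count are hypothetical choices for illustration, not any package's API.

```python
import math
import random


def fit_lca(data, n_classes, n_iter=200, seed=0):
    """Fit a latent class model to binary (0/1) items with EM.

    Returns (class_sizes, item_probs, posteriors); posteriors come from
    the final E-step, one probability vector per respondent.
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    # Start from equal class sizes and random item-endorsement probabilities.
    sizes = [1.0 / n_classes] * n_classes
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]

    for _ in range(n_iter):
        # E-step: posterior probability of each class for each respondent,
        # computed in log space and normalised so each row sums to 1.
        posteriors = []
        for row in data:
            logs = []
            for k in range(n_classes):
                log_p = math.log(sizes[k])
                for y, p in zip(row, probs[k]):
                    log_p += math.log(p if y == 1 else 1.0 - p)
                logs.append(log_p)
            top = max(logs)
            weights = [math.exp(v - top) for v in logs]
            total = sum(weights)
            posteriors.append([w / total for w in weights])

        # M-step: update class sizes and item probabilities from soft counts.
        for k in range(n_classes):
            mass = max(sum(post[k] for post in posteriors), 1e-12)
            sizes[k] = mass / len(data)
            for j in range(n_items):
                hits = sum(post[k] * row[j] for post, row in zip(posteriors, data))
                # Clamp away from 0 and 1 so the logs in the E-step stay finite.
                probs[k][j] = min(max(hits / mass, 1e-6), 1 - 1e-6)

    return sizes, probs, posteriors


# Hypothetical survey: 4 yes/no items, two groups answering differently.
data = [[1, 1, 0, 0]] * 30 + [[0, 0, 1, 1]] * 20
sizes, probs, post = fit_lca(data, n_classes=2)
print([round(s, 2) for s in sizes])
```

Each respondent's row in `post` is a full probability vector over segments; a hard label, if a client wants one, is simply the class with the largest probability.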

In factor analysis, latent class factor models are better at building composites from variables because they handle non-continuous data more elegantly. Plots and perceptual maps generated from the analysis improve on the traditional technique because the factor scores have a probabilistic interpretation. The categorical nature of the composites allows a more holistic extraction of themes and sub-themes and helps in developing a more precise brand positioning strategy. Being able to form factors from fewer variables is an added plus. Applying latent class factor models to attribute data (with ordinal and categorical scales) is an absolutely delightful exercise once the results are compared with traditional factor analysis: the new models provide a more accurate and vivid picture of the brand or product.
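One common way to write a latent class factor model is sketched below, assuming two mutually independent dichotomous latent factors x_1 and x_2 and items that are independent given the factors; the notation is mine, and real applications may use more factors or ordinal factor levels. It shows where the probabilistic factor scores come from: for a dichotomous factor, the score is the posterior probability of its upper level.

```latex
P(\mathbf{y}_i) = \sum_{x_1=0}^{1} \sum_{x_2=0}^{1}
  P(x_1)\, P(x_2) \prod_{j=1}^{J} P(y_{ij} \mid x_1, x_2)

% factor score for respondent i on factor 1 (dichotomous case)
\hat{x}_{i1} = E(x_1 \mid \mathbf{y}_i) = P(x_1 = 1 \mid \mathbf{y}_i)
```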

Analysts very often attribute the low predictive power of a regression model to a 'lack of all the right explanatory variables' and tell clients that the model might have been better with more data. What they don't check is whether the same model holds for all customers. Latent class regression performs segmentation and regression simultaneously, unearthing latent segments that may have different regression equations and estimates. This makes estimation more precise and gives clients a more informative view of a drivers analysis. Applications of latent class regression include conjoint analysis, customer satisfaction studies, purchase intent drivers, and any traditional regression model that benefits from explaining unobserved heterogeneity in the data.
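A minimal sketch of latent class regression, assuming two latent classes, a single predictor and Gaussian errors within each class. The EM algorithm alternates between fitting a weighted regression line per class and updating each customer's posterior class probabilities. The function names, the start-up heuristic (split on the sign of the pooled residual) and the simulated data are hypothetical and for illustration only; production software typically uses multiple random starts instead.

```python
import math
import random


def weighted_line(xs, ys, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_latent_class_regression(xs, ys, n_iter=100):
    """Two-class mixture of simple linear regressions fitted by EM."""
    n = len(xs)
    # Heuristic start (an assumption): points above the pooled OLS line
    # lean towards class 0, points below it towards class 1.
    a0, b0 = weighted_line(xs, ys, [1.0] * n)
    resp = [[0.9, 0.1] if y > a0 + b0 * x else [0.1, 0.9]
            for x, y in zip(xs, ys)]

    for _ in range(n_iter):
        # M-step: class share, weighted regression line, residual variance.
        params = []
        for k in range(2):
            w = [r[k] for r in resp]
            a, b = weighted_line(xs, ys, w)
            sw = sum(w)
            var = sum(wi * (y - a - b * x) ** 2
                      for wi, x, y in zip(w, xs, ys)) / sw
            params.append((sw / n, a, b, max(var, 1e-6)))

        # E-step: posterior class probabilities from the Gaussian residuals.
        resp = []
        for x, y in zip(xs, ys):
            logs = [math.log(share) - 0.5 * math.log(2 * math.pi * var)
                    - (y - a - b * x) ** 2 / (2 * var)
                    for share, a, b, var in params]
            top = max(logs)
            w = [math.exp(v - top) for v in logs]
            total = sum(w)
            resp.append([v / total for v in w])
    return params, resp


# Hypothetical data: two customer groups whose outcome responds differently to x.
rng = random.Random(1)
xs = [i / 5 for i in range(50)] * 2
ys = ([10 + 2.0 * x + rng.gauss(0, 0.5) for x in xs[:50]]
      + [0.5 * x + rng.gauss(0, 0.5) for x in xs[50:]])
params, resp = fit_latent_class_regression(xs, ys)
for share, a, b, var in params:
    print(f"share={share:.2f}  y = {a:.2f} + {b:.2f}*x")
```

On this simulated data a single pooled regression would blend the two slopes into one value in between, while the mixture recovers a separate equation for each latent segment.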

Latent class models thus represent powerful improvements in model building, prediction and insight generation over traditional approaches to segmentation, factor and regression analysis. They truly let the data say much more, and analysts need to take them mainstream by learning to use them and applying them to the projects that can benefit.
