Tuesday, March 31, 2009

Quantifying creativity in developing and evaluating package design

I'm a product and graphic design buff, and as I sat drooling over Phaidon's wonderful book Area_2 on upcoming graphic designers, I wondered whether quantitative research could really pick out winners in this creative field. I am no expert, but I can tell instantly whether I like what I see in a visual image. If I am undecided, I need to process the visual and then understand it before I take a call.

Yes, things work a lot differently when product packages are on the shelf and consumers are filtering the visual among many others, with heaps of information in their heads (brand affinity, the time they have, the size of package they buy, advertising awareness, frequency of buying and a lot more). But is it so hard to pick a winner quantitatively when it comes to package design, or do companies simply rely more on non-quantitative or flawed quantitative approaches to choose a winner?

I am a loyal Tropicana consumer, and I thought the change in package design for the brand smacked of 'not listening to consumers and not quantifying their voice in research'. Otherwise, why would a design change that drastic (it makes the brand look ordinary) make it through research? If consumers phoned, e-mailed and wrote in to express their feelings about the new design, where were these consumers when the design was tested? A well-done online test (among other tests) with the right samples of loyalists and other segments would have saved PepsiCo a lot of grief.


What could have gone wrong in the research? Some hypotheses I generated about the consumers in the study:

  • They were the wrong sample (it can happen)
  • There were too few of them, in numbers and in voice
  • They gave wrong answers
  • They favored the new design but had a violent reaction later when they saw it on the shelf and wanted the old packaging back (blame it on the recession)
  • They were misinformed or did not understand the research
  • They were not taken seriously about something as creative as packaging design
  • They could not evaluate the new design clearly since it was a radical change from the original
  • ...

A combination of qualitative and rigorous quantitative research (and I prefer quantitative for all but initial research) can pack a punch when it comes to developing and evaluating package design. Here is how to get it right:

  1. Set goals and objectives for the new design using qualitative research.
  2. Communicate the objectives and vision for the new design clearly to package designers.
  3. Evaluate the initial rough designs through online testing. Identify the best four or five.
  4. Fine tune the best designs through quantitative research.
  5. Quantitatively test the best designs via various simulated tests (online or offline) to identify the winner.
  6. Go ahead with the new design only if it emerges as a clear winner against the control, keeping in mind the status quo bias in marketing research (a minimal sketch of such a check follows this list).
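
A minimal sketch in R of the kind of check step 6 calls for, assuming a simple two-cell test where one cell sees the new design and the other sees the current (control) design. The counts below are hypothetical, and a real study would involve more cells, weighting and diagnostics.

    # Hypothetical monadic test: each cell sees one pack and answers a buy/no-buy question
    would_buy <- c(new = 296, control = 342)   # respondents saying they would buy
    n         <- c(new = 600, control = 600)   # respondents per cell

    # Two-sample test of proportions: the new design should clearly beat (not just match)
    # the control before we act on it, since status quo bias cuts the other way.
    test <- prop.test(would_buy, n)
    test$estimate   # buy rates in each cell
    test$conf.int   # confidence interval for the difference (new minus control)
    test$p.value    # evidence that the two rates differ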

Online package design research tools are helping marketers evaluate and quantify how consumers will react to the creative aspects of the design. Package Design Magazine talks about three of these solutions.

Pure quantitative analysis of a creative process like package design is still viewed with skepticism among marketers. However, using the numbers to aid the creative process helps companies avoid big mistakes and lets designers work and create within a framework that echoes consumers' needs and wants.

One loyal customer is happy Pepsi scrapped the new Tropicana package and brought back the old one.

Why do frequentist statistical approaches still win in the analysis of market research data?

I recently read part two of Ray Kent's article 'Rethinking Data Analysis-Some alternatives to frequentist approaches' in the latest issue of the International Journal of Market Research (Vol. 51, Issue 2). The article makes a case for looking at alternatives such as Bayesian statistics, configurational and fuzzy-set analysis, association rules in data mining, neural network analysis, chaos theory and the theory of the tipping point when data do not meet the requirements of frequentist approaches.
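
Kent's article contains no code, but purely to make the Bayesian alternative he mentions concrete, here is a minimal sketch in R of the same question answered two ways: a frequentist confidence interval for a preference share versus a beta-binomial posterior. The counts and the flat prior are hypothetical illustration, not a recommendation.

    # Hypothetical data: 68 of 120 respondents prefer concept A over concept B.
    x <- 68; n <- 120

    # Frequentist: test the share against 50% and report a confidence interval.
    prop.test(x, n, p = 0.5)$conf.int

    # Bayesian: with a flat Beta(1, 1) prior, the posterior for the preference share
    # is Beta(1 + x, 1 + n - x); report a 95% credible interval and the posterior
    # probability that the share exceeds 50%.
    qbeta(c(0.025, 0.975), 1 + x, 1 + n - x)
    1 - pbeta(0.5, 1 + x, 1 + n - x)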

My point of view on the article is:
  1. As someone who works in this field, I find it annoying to be constantly told about limitations of frequentist methods that I am already aware of.
  2. The reasons for the lack of adoption of newer, more appropriate techniques in market research are more basic than researcher knowledge (or the lack of it), challenges in presenting results or client adoption.

Here are some reasons why a lot of market researchers continue to rely primarily on frequentist approaches:

  1. Most researchers are not statisticians and thus find it hard to understand and apply complex newer techniques. In fact, most market research companies don't have an adequate number of statisticians on board.
  2. Companies need to put money, research and time behind these techniques in order to sell them to clients (we see this trend among data analysis software companies like SAS, Sawtooth Software, Latent GOLD etc.). Without this backing, it is difficult for lone researchers to push newer ways of analysing data to clients.
  3. Researchers prefer to be 'shown' and not 'told' how these new techniques apply to their industry. A lot more collaboration is needed between academics and practising researchers to apply these techniques to relevant data and see the merits. Trying to replicate the results of published articles in real time still falls under 'exploratory research not paid for by the client'.
  4. While it makes sense to argue for an approach that looks at data using a variety of techniques, in reality researchers are pushed for time and evaluating several alternatives is very hard.

It feels good to get that off my chest...


Wednesday, March 18, 2009

Breakthrough Ideas for 2009: Should you get together a global analytics team?

I read Thomas Davenport and Bala Iyer's article 'Should you outsource your brain?' in the Harvard Business Review's 'Breakthrough Ideas for 2009' with mixed feelings. The title of the piece is provocative and fails to address the real issue. The article talks about how companies are now outsourcing their decision-making analytics, along with other less cerebral jobs, to destinations like China, India and Eastern Europe. Towards the end of the piece, the authors mention a shortage of talent, deep domain expertise at the vendor's end and the project structure of the offshoring engagement as some reasons for outsourcing important decisions to third parties.

Here is my problem with this piece (apart from the title). The issue, I believe, is not whether companies should outsource their brains (decision making based on analytical insights would be a more appropriate term) but how these companies can gain competitive advantage in analytics by leveraging a global talent pool of professionals. That this global talent pool happens to reside primarily in India, China and Eastern Europe is incidental.

A shortage of analytics talent exists across the world, not just in the US. In India, for example, while there is a large number of people with statistics degrees, they often lack the skills to apply that training to real-life business problems. Statistics and analytical thinking are hard to teach, and the teaching methods at universities across the world do not match the requirements of the industry. It is then left to individual organisations and passionate practitioners to train and mentor new professionals in this field, or to wait for them to learn by experience. In the short term, however, this training and mentoring does not figure as a top priority in most companies. As a result, everyone is left fighting over the same small talent pool and no one wins.

The analytics required for strategic decisions involves people with different skill sets working together: the marketing manager, key people in marketing and other areas in charge of various customer portfolios, and an analytical team adept at data management, statistical analysis and insight generation. For companies that are comfortable with analytical decision making, the main challenge lies in recruiting and retaining analytical talent. I know of instances where six months to a year are set aside to hire mid-level talent for advanced analytical jobs. Under these circumstances, what should these companies do?

Graduate programs in statistics and mathematics in the US are flush with students from China, India and Eastern Europe. Most of these graduates go on to find jobs in programming, risk management, database management and biostatistics. Ironically, the pharmaceutical industry in the US has managed to leverage these global talent pools much better than any other industry. The rest of us could learn from its experience.

Since data analysis, technical model building and statistics are necessary for decision making in analytically savvy companies, pulling in professionals from across the world to aid the process is the smart way to go. It gives these companies the firepower they need to make sound decisions rather than ones based on the past, hasty analysis or the gut.

The trend for 2009 is not that businesses are outsourcing their thinking but that they are recognising that adding talent from across the globe to their rolls helps them get ahead in the game.

Tuesday, March 17, 2009

Analytical harakiri-Ignoring latent class models

My black books are out, and here are some notes jotted down on projects:
  • Scenario 1: You're working on an analysis to study drivers of purchase intent, and you keep feeling that different sets of customers may have different drivers. The client, however, has not asked for a segmentation of the customer base.
  • Scenario 2: Your data is a mix of continuous, ordinal and nominal variables, yet you continue to try to segment your customers using distance metrics. You wonder why two customers are slotted into different segments even though there is not much difference in their average spend or other variables.
  • Scenario 3: You know the major themes that your client's brand stands for but are unable to fully break them out from the data into meaningful, actionable sets and subsets for understanding or positioning purposes.

The reason researchers and analysts continue to grapple with these issues in market research data is that they are not getting the most from their analysis. One reason I love statisticians in the social sciences is that they are the 'early adopters' of most new and emerging techniques in statistics. They quickly work on and distill new learnings in the area into their everyday problems, and thus boldly go where the rest of us are too timid or lazy to venture.

How do we get more from our analyses of regression, choice, factor and segmentation problems? Latent class or finite mixture models are the answer. These models differ from traditional models in structure through the inclusion of one or more discrete latent (unobserved) variables in the model relationship along with the observed ones. The categories/classes of these unobserved variables are interpreted as latent segments.

Some key advantages of latent class models are:

  1. They are less affected by data not conforming to modeling assumptions (linear relationships, normal distributions, homogeneity)
  2. They work with mixed-scale (continuous, ordinal, nominal, count) variables in the same analysis
  3. They can do two analyses simultaneously, i.e. segment and predict, eliminating the need for a two-step analysis

In the area of segmentation, these models bring a model-based approach and the ability to accommodate categorical and continuous data, and predictive and descriptive segmentation, under a common modeling framework. This leads to far superior and more insightful results in testing and estimating market size, market structure and segment profiles. The probability-based segment assignment provides a more realistic picture of market reality, since consumers can belong to more than one segment at a time. Some areas where latent class segmentation models should be used to 'do more' with the data are classical segmentation problems (including descriptive ones), global segmentation and studying change in segments over time.
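
For readers who want to try this, here is a minimal latent class segmentation sketch in R using the poLCA package (commercial tools such as Latent GOLD offer richer variants). The data frame, item names and number of classes are hypothetical; poLCA expects categorical indicators coded as positive integers.

    # Hypothetical latent class segmentation on categorical survey items.
    # 'survey' is assumed to hold items q1..q4 coded 1, 2, 3, ... (e.g. agreement scales).
    library(poLCA)

    f <- cbind(q1, q2, q3, q4) ~ 1          # indicators only; covariates could go on the right
    set.seed(42)
    lc3 <- poLCA(f, data = survey, nclass = 3, nrep = 10)   # 3 segments, 10 random starts

    lc3$P                 # estimated segment sizes
    lc3$probs             # item response probabilities that profile each segment
    head(lc3$posterior)   # each respondent's probability of belonging to each segment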

When working with factor analyses, latent class factor models are better able to make composites out of variables because they handle non-continuous data in a more elegant way. Plots and perceptual maps generated from the analysis score over the traditional technique because the factor scores have probabilistic interpretations. The categorical nature of the composites allows a more holistic extraction of themes and sub-themes and helps in developing a more precise brand positioning strategy. Using fewer variables to form factors is an added plus. Applying latent class factor models to attribute data (with ordinal and categorical scales) is an absolutely delightful exercise once the results are compared to traditional factor analysis. The new models provide a more accurate and vivid picture of the brand/product.

Very often analysts attribute the low predictive power of a regression model to a 'lack of all the right explanatory variables' and tell clients that the model might have been better with more data. What they don't check is whether the same model holds for all customers. Latent class regression modeling allows a simultaneous segmentation and regression of the data, unearthing latent segments that may have different regression equations and estimates. This makes the estimation more precise and gives clients a more informative way to look at a drivers analysis. Applications of latent class regression lie in conjoint analysis, customer satisfaction studies, purchase intent drivers or any traditional regression model that benefits from an explanation of unobserved heterogeneity in the data.
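
As a concrete illustration, here is a minimal latent class (mixture) regression sketch in R using the flexmix package. The data frame and driver names are hypothetical; the point is simply to let the data reveal segments with different driver equations rather than forcing one equation on everyone.

    # Hypothetical drivers-of-purchase-intent data: one outcome, two candidate drivers.
    # 'study' is assumed to hold purchase_intent, price_perception and quality_perception.
    library(flexmix)

    set.seed(7)
    lcr <- flexmix(purchase_intent ~ price_perception + quality_perception,
                   data = study, k = 2)   # two latent segments, each with its own regression

    summary(lcr)           # segment sizes and fit
    parameters(lcr)        # separate coefficient estimates for each segment
    table(clusters(lcr))   # hard segment assignments for profiling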

Latent class models thus represent powerful improvements in model building, prediction and insight generation over traditional approaches to segmentation, factor and regression analysis. They truly allow the data to say much more, and analysts need to take them mainstream by learning to use them and unleashing them on projects that can benefit from them.

Tuesday, February 24, 2009

Predicting the success of Slumdog Millionaire and other movies

The dust from the Oscar award ceremony has settled, and my fellow countrymen continue to go about their business with big smiles on their faces over the Oscar wins for AR Rahman, Resul Pookutty and the film. A question I keep thinking about is whether analytics could have predicted Slumdog's success.

Movie box office revenue prediction is coming of age, and Hollywood is beginning to recognise that it may need the help of number crunchers to utilise its funds better by getting behind films with higher chances of success. I remember reading in detail the article by Ramesh Sharda a few years ago, where he deployed a neural net model to predict the box office receipts of movies before their theatrical release. The risk and money involved for investors in the movie business are very high, and a large portion of a movie's total revenue comes from the first few weeks after release. Interest in the ability of statistical and mathematical models to predict this revenue, not only for funding purposes but also for better distribution and marketing strategy, has been growing over the last few years.

There are some companies and individuals who have cracked the code using a variety of variables. While there are still many expert sceptics out there, predictions that do significantly better than chance present a win-win for the developers of these algorithms and for investors and studios.

  • Can it be done? YES, YES and YES! With more and more people throwing their weight behind the science of the subject, prediction algorithms in this space continue to get better and better.

  • Is it easy? NO! That's where the frustration and challenge among data crunchers lies.

What's my recipe for getting this right? In my experience (yes, I've had the pleasure of taking a shot at this exciting problem), employ the 'layered accuracy approach'.

  1. Decide which part of the problem you want to tackle: pre-release prediction, post-release prediction (first few weeks) or both.

  2. Identify the structure of your base model (the model that will provide your benchmark predictive power and your understanding of the revenue aspect of movies). Try a model structure that is easy to execute and interpret and that fits the data well. Make decisions about quantitative vs. behavioral models, point estimates vs. classification into revenue groups, and segment models vs. all-movie population models (a minimal classification sketch appears after this list).

  3. Use tried and tested variables relevant to the model being built: star power, number of screens, genre, MPAA rating, time of release, competition at the time of release, critics' ratings, sequel status etc. I recommend that you break down any variable that is still too dense; for example, create your own version of the traditional genre variable, as it usually does not add much in its present form.

  4. Use other, not-so-mainstream variables: plot, positive buzz on internet forums and the Hollywood blacklist, for starters. This is your creative space; use it to construct variables that you believe can add more punch to the model.

  5. Build the model and examine predictive accuracy and insights. Rank-order the insight variables. If something does not make sense, explore it again.

  6. Validate the model to see that it stands up tall.

  7. Try another model structure and see if you get better results (it's all about accuracy, Watson; even a little more lift counts when we are talking millions of dollars).

  8. Get a movie-fanatic data cruncher to do all of the above for you (I promise the predictive accuracy will improve dramatically).

  9. Explore other non-conventional ways to better your prediction accuracy. A big area now is prediction markets.
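
To make the base-model steps concrete, here is a minimal sketch in R of a classification-style model that slots movies into revenue groups, loosely in the spirit of Sharda's neural net approach. The data frame, variable names and revenue classes are hypothetical, and a real model would need far more variables, careful feature construction and proper out-of-sample validation.

    # Hypothetical pre-release data: 'movies' is assumed to hold a release year, opening
    # screens, a constructed star-power score, a sequel flag, genre dummies, an MPAA dummy
    # and the observed revenue class (e.g. flop / average / hit) for past releases.
    library(nnet)   # multinomial logit as a simple, interpretable base model

    train <- movies[movies$year < 2008, ]
    test  <- movies[movies$year >= 2008, ]

    base_fit <- multinom(revenue_class ~ screens + star_power + sequel +
                           genre_action + genre_comedy + mpaa_r,
                         data = train)

    # Predictive accuracy on held-out releases (validate before trusting the model).
    pred <- predict(base_fit, newdata = test)
    table(predicted = pred, actual = test$revenue_class)
    mean(pred == test$revenue_class)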

As science makes the business of revenue prediction in movies and other entertainment areas much easier, the issue becomes less about whether we could have predicted the success of Slumdog Millionaire and more about whether we want to. Malcolm Gladwell presents this case so eloquently in his absolute must-read piece in The New Yorker.

Thursday, February 19, 2009

The Practical Statistician-A Toolkit

I have had the pleasure of working with a lot of statisticians, mathematicians, data miners and econometricians (let's call them PEMDs, persons extracting meaning from data, for ease) in my career. An observation I have often made is that while all of them know the tools of their trade, only a few eventually go on to become excellent practitioners, or as I call them, 'practical statisticians' in the industry. What is it that these experts have that gets them far ahead in their trade? A toolkit that helps them survive the real-world journey. Here is the list of items in that toolkit:


Item #1: Pen and notebook (a thick one)-they carry this around at all times even to bed. This helps them make copious notes when others are talking and think aloud when they are structuring their thoughts, attacking problems and analyzing outputs. They guard this notebook zealously and get visibly upset if it ever gets lost or misplaced. They recognize that in order to streamline loads of work, manage their time well, analyze the problem fully and present the output lucidly without going insane they must structure their thoughts. Written matter is the key.

Item #2: Three books for reference and speed reading skills-one is usually about the software they are using, the other two are the best applied texts on the most-used techniques in their field and on new emerging areas (which no one else has a clue about). They read many more research articles than other people (and yes, they usually do that during their breaks or in their leisure time). If they don’t understand an article the first time round, they absolutely have to read it again and again till they do.

Item #3: Data-dirty fingers-they execute projects no matter how high they rank in the corporate hierarchy. They recognize that leading from the front means the ability to do the work at the back end, especially when all hell breaks loose.

Item #4: Non-technical speak-they are able to communicate their ideas and statistical methods to a wide audience without using statistical jargon.

Item #5: Graphs-they like to graph data and get a sense of numbers visually. This ability to look at both numbers and graphs helps them get a finer sense of the data and what they don’t know and must find out from it.

Item #6: A good dose of imagination, critical thinking and skepticism-they function like detectives, and for them most business problems present cases to be cracked. Once a project starts, they devote all their effort to cracking the case, oblivious to everything and everyone else.

Item #7: Mentoring and training calendar-unless they pass on their wisdom and how they put the problem, method and experience together, they know they will continue to do the same work over and over again.

Item #8: A broad view of their role-they define their role rather than let clients, coworkers and organizations peg them. They like their roles to be larger and ‘more whole’, not constrained by their degree and specialization.

Item #9: Practical, adequate solutions-while striving for the best solution, they recognize that they may need to deliver less-than-optimal solutions given project constraints and client readiness.


Item #10: A passion for statistics-especially its applications in different fields, and an understanding of what it can and cannot do.

Wednesday, February 18, 2009

Trends: R vs. SAS-What's really at the heart of the matter?

Okay, I promised myself that I would not jump into this debate, and I bit my tongue and fingers like a thousand times last week. Go ahead and shoot me; I'm only human.

Here goes...

Methinks this R vs. SAS debate is less about the merits and demerits of the two software packages and more about the David vs. Goliath (or Hare vs. Tortoise) effect. David, in this case, also provides strong competition in a somewhat monopolistic market.



I have worked with SAS and I don't have a strong opinion against it (except for its really bad graphics). I am new to R and I like it (yes, there will be some pet peeves as time goes by). I have also used most of the other competing software in this space (SPSS, Minitab, Stata, Matlab etc.).


So what's the issue, you may ask? Well, no matter what anyone says (or posts), I believe one of the main reasons R is generating so much press (and don't get me wrong, it has strong merits) is that, with all its merits, it is also FREE! Whether we like to admit it or not, it bothers us that we have to pay to use SAS when R, which is as good (if not better in some areas), is available at zero cost. Would the same debate be as heated if R did not deliver? I doubt it.

Add to this the fact that R comes in as the 'underdog' most of us like to see win, and you get a better idea of why there is so much angst all around on this issue.

Enough said.