Monday, April 13, 2009

Poor randomized testing: why a rose by any other name does not smell as sweet

While rigorous testing of new ideas, offerings, and approaches is the order of the day at companies like Capital One (my hero), Amazon, Google, and Netflix, as well as at some retailers, direct marketers, and pharmaceutical companies, at many others important decisions are still based on 'gut feel' and 'wrong evidence'. Despite the availability of software, capability, and adequate research in the area of randomized testing, most companies still flounder when it comes to executing a test.

The two main reasons why I have seen testing break down (in spite of good intentions and an adequate hypothesis) are:

  1. Lack of rigor in the design

  2. Execution of a half-hearted test to show evidence

Lack of rigor creeps into the test design in many ways:

  • Small sample sizes (not adequate to yield statistically valid results): Clients usually cite cost as the reason, but a large margin of error in the results makes the test a no-go right from the start. This applies not just to the overall sample sizes but also to the sample sizes for the breakouts at which the data needs to be analyzed and reported (see the first sketch after this list).

  • Inadequate matching of test and control groups: Too little analysis is done to ensure that the test and control groups are comparable, so the results cannot be attributed to the new stimulus because of confounding factors. The rush to start the experiment is one reason for this lack of fit between test and control (see the second sketch after this list).

  • Wrong number of cells in the design: Designs exist (usually fractional factorial) that reduce the number of cells needed without compromising reads on the data, yet simpler, less adequate designs continue to be used. I like the idea of simple models explaining complex phenomena, but that should not deter us from using more complex designs for complex real-world scenarios (see the third sketch after this list).

  • Too short a testing period: In a rush to complete the test and report results, clients don't give the test the time it needs to generate stable metrics (especially when those metrics have high variance).
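
On the first point, the required sample size is a calculation, not a budget line. Here is a minimal sketch in Python using the standard two-proportion power formula; the 2% base response rate and half-point lift are illustrative assumptions, not client data:

```python
from statistics import NormalDist

def cell_size(p_base, lift, alpha=0.05, power=0.80):
    """Approximate customers needed per cell to detect an absolute
    `lift` over a base response rate with a two-sided z-test."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)  # 1.96 for a 95% confidence level
    z_b = z.inv_cdf(power)          # 0.84 for 80% power
    p_test = p_base + lift
    p_bar = (p_base + p_test) / 2
    num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_b * (p_base * (1 - p_base) + p_test * (1 - p_test)) ** 0.5) ** 2
    return int(num / lift ** 2) + 1

# Illustrative numbers: a 2% base response rate, hoping to detect
# a half-point lift, needs roughly 13,800 customers in each cell.
print(cell_size(p_base=0.02, lift=0.005))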
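
On the second point, a pre-launch balance check is cheap. One common diagnostic is the standardized mean difference on pre-period covariates; the spend figures below are made up:

```python
from statistics import mean, stdev

def standardized_mean_diff(test_vals, control_vals):
    """Standardized mean difference for one covariate; a common rule
    of thumb flags |SMD| > 0.1 as a meaningful imbalance."""
    pooled_sd = ((stdev(test_vals) ** 2 + stdev(control_vals) ** 2) / 2) ** 0.5
    return (mean(test_vals) - mean(control_vals)) / pooled_sd

# Made-up pre-period spend for the two groups.
test_spend = [120, 95, 143, 88, 210, 99, 105, 134]
control_spend = [89, 77, 101, 65, 150, 72, 80, 95]
print(round(standardized_mean_diff(test_spend, control_spend), 2))
```

A test group that already spends visibly more than its control before the stimulus launches will flatter whatever is being tested.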
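
And on the design point, a fractional factorial cuts cells without giving up the main-effect reads. A sketch of a 2^(3-1) half fraction (the factors, say offer, creative, and channel, are placeholders):

```python
from itertools import product

# Full 2^3 factorial over three two-level factors: 8 cells.
full = list(product([-1, 1], repeat=3))

# Half fraction with defining relation I = ABC: keep the runs where
# the third factor equals the product of the first two. Main effects
# stay estimable from only 4 cells, at the cost of aliasing each of
# them with a two-factor interaction.
half = [run for run in full if run[0] * run[1] == run[2]]

print(len(full), "cells in the full design")
print(len(half), "cells in the half fraction:", half)
```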

Since most marketers recognize the need for a 'test, learn, roll out' approach, the second reason randomized tests fail is harder to understand. There seems to be a 'need to test' simply to show evidence of 'having tested', and the results of such tests are couched in scientific jargon with liberal extrapolation. Roll-out decisions for initiatives are then made on the basis of these tests, backed by numerous rationalizations, for example:

  • The results pan out for some regions, so they will work at a national level
  • The results are positive even though the margins of error are large; with a big enough sample, things will be fine

Here is my advice for marketers:

DON'T TEST if a new approach cannot be tested (for whatever reasons, some of them valid). Use a think tank of key executives to do a SWOT analysis and go with their final call.

DON'T TEST if you don't want to test, whether out of a lack of belief in testing or a disinclination to test with rigor. Roll out the new product without testing and be ready to explain to the boss if the initiative fails. Something that merely looks and feels like a test is not a test.

BUT...

DO TEST if you:

  1. Want to find out what really works and put your hypothesis under a rigorous scanner.
  2. Want to optimize the money you put behind a new product or idea before pushing it to customers (who may be unwilling to accept it).
  3. Want to learn and apply and not make the same mistakes twice.

Saturday, April 11, 2009

Trends: Recommendations, or tell me what else I should buy and do it well

Here are three scenarios that illustrate the power of recommendations and how they can work for both consumers and marketers:

Scene 1: I log in to Amazon and search for the book 'Predictably Irrational'. Their recommendation algorithm shows me the other books that customers who bought this book have also bought: 'Sway', 'Outliers', 'The Drunkard's Walk', 'The Black Swan', 'The Wisdom of Crowds', and many more. Sometimes the recommendations are interesting enough for me to browse through them, and I end up buying more books than I budgeted for.

Scene 2: I enter Debenhams, the UK department store, with my son in tow for a quick buy to wear at an anniversary lunch. I am in a huge rush, so getting it right quickly is key. I show the shop assistant the style I am looking for, and she promptly picks out three of that kind and hands them to me. While I try them on, she comes back with some more tops that match my style and tells me what a deal I would be getting on them: Betty Jackson designs at a 70% reduction, that's a steal! Well, you guessed it: I buy three tops and a pair of shoes and walk out happy and satisfied after thanking her personally.

Scene 3: I call Nirulas for a home-delivery order and ask for my favorite item on the menu, their hot chocolate fudge (HCF for short). For those new to this homegrown north Indian brand: they have the best hot chocolate fudge in the world. Before I can say 'some extra nuts and chocolate, please', the order taker tells me that if I add extra nuts and chocolate, they will charge me Rs 17 extra for each. While the consumer in me is chagrined at having to cough up money for something I got free for years, the data analyst in me realizes that someone has been analyzing the orders and pricing them better.

Recommendations make sense to us because they help us sift through piles of information and focus quickly on what will maximize our buying experience, i.e. finding relevant, new, and interesting things. However, for them to work, the underlying assumptions must hold:

  1. They must come from a source we deem 'trusted', whose judgment we value

  2. They must hit our sweet spot in terms of experience

  3. They must be consistent and thus build trust

How does this translate at ground level? With data on purchases now being recorded both offline and online, I envisage very soon walking into a store (physical or online) and being told not just what I should buy based on my taste, but what else I should be looking at. While many e-commerce websites already offer this personalized shopping experience through crude or sophisticated variants of recommendation algorithms, recommendations that truly fit individual customer preferences still have a long way to go.
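
As an aside, Amazon has not published the exact algorithm behind Scene 1, but the 'customers who bought this also bought' behavior can be approximated with simple co-purchase counts. A minimal sketch, with made-up order baskets:

```python
from collections import defaultdict
from itertools import combinations

# Made-up order baskets; in practice these come from transaction logs.
baskets = [
    ["Predictably Irrational", "Sway", "Outliers"],
    ["Predictably Irrational", "The Black Swan"],
    ["Predictably Irrational", "Sway", "The Black Swan"],
    ["Outliers", "The Wisdom of Crowds"],
]

# Count how often each pair of titles appears in the same basket.
co_counts = defaultdict(lambda: defaultdict(int))
for basket in baskets:
    for a, b in combinations(basket, 2):
        co_counts[a][b] += 1
        co_counts[b][a] += 1

def also_bought(title, k=3):
    """The k titles most often co-purchased with `title`."""
    ranked = sorted(co_counts[title].items(), key=lambda pair: -pair[1])
    return [name for name, _ in ranked[:k]]

print(also_bought("Predictably Irrational"))
# ['Sway', 'The Black Swan', 'Outliers']
```

Real systems layer normalization, recency, and personalization on top of these counts, but the raw co-occurrence signal is the starting point.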

Consumers inundated with choice want good subsets of that choice, but within the context of what they like or would like. Marketers want to offer the consumer products that have a higher probability of being bought. Looking at past purchase data or users' ratings of items is an attempt to marry the need of the customer with that of the marketer. The problem lies in how to read and interpret what the customer is looking for. My experience has been that the answer is tied to satisfaction and loyalty: if the customer comes back for more and increases his burn rate over time, then what you are recommending is working; if not, there is scope to improve the recommendation algorithm. Testing which recommendations worked can help fine-tune those that did not. Comparing customers who picked up recommended items with those who did not, for a given purchased product, may also lend insight into what is going on.

An interesting article by Anand V. Bodapati, "Recommendation Systems with Purchase Data" (Journal of Marketing Research, Vol. XLV, February 2008), argues that recommendation decisions should be based not on historical purchase probabilities but on the elasticity of those probabilities with respect to the recommendation action, i.e. on how much the recommendation itself changes the chance of a purchase.
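
Bodapati's point can be made concrete with a toy calculation. All the numbers below are invented: imagine a randomized test in which each title was recommended to one group of customers and withheld from a comparable holdout group.

```python
# Hypothetical per-title results from a randomized recommendation test:
# (buys when recommended, exposed customers, buys in holdout, holdout size)
stats = {
    "Sway":           (120, 1000,  90, 1000),
    "The Black Swan": (300, 1000, 290, 1000),
    "Outliers":       ( 80, 1000,  20, 1000),
}

def uplift(buys_rec, n_rec, buys_no, n_no):
    """Change in purchase probability caused by the recommendation."""
    return buys_rec / n_rec - buys_no / n_no

# Ranking by raw purchase rate picks the book people buy anyway;
# ranking by uplift picks the book the recommendation actually moves.
by_rate = max(stats, key=lambda t: stats[t][0] / stats[t][1])
by_uplift = max(stats, key=lambda t: uplift(*stats[t]))
print("highest purchase rate:", by_rate)   # 'The Black Swan' (0.30, barely moved)
print("highest uplift:", by_uplift)        # 'Outliers' (0.08 vs 0.02 unprompted)
```

Recommending 'The Black Swan' wastes the slot on a sale that would have happened anyway; the elasticity view spends it where it changes behavior.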


How would I rank the suggestions given by the three companies, based on my experience with them, and would I go back for more?

  1. Debenhams: Bang on. I got what I wanted at a good price and looked at the right variety of relevant alternatives before making my choice (remember, time was an issue). Would definitely go back.

  2. Amazon: It's hit or miss, and the list of suggestions is very long and not always worth the browse. They could do better, but it's not bad. Would definitely go back.

  3. Nirulas: While I appreciate that someone recognized that the 'extras' needed to be paid for, I would like some suggestions, like 'try our Jamoca almond fudge' or 'the Mango Tango is to die for'. They could do much, much better. Would definitely go back (it's a monopolistic situation; no other brand comes close on the HCF).