Movie box office revenue prediction is coming of age, and Hollywood is beginning to recognise that it may need the help of number crunchers to use its funds better by getting behind films with higher chances of success. A few years ago I read, in detail, the article by Ramesh Sharda in which he deployed a neural network model to predict the box office receipts of movies before their theatrical release. The risk and money involved for investors in the movie business are very high, and a large portion of a movie's total revenue comes from the first few weeks after release. Interest in the ability of statistical and mathematical models to predict this revenue, not only for funding purposes but also for better distribution and marketing strategy, has been growing over the last few years.
There are some companies and individuals who have cracked the code using a variety of variables. While there are still many expert sceptics out there, making predictions that do significantly better than chance presents a win-win for the developers of these algorithms, the investors and the studios.
- Can it be done? YES, YES and YES! With more and more people throwing their weight behind the science of the subject, prediction algorithms in this space continue to get better and better.
- Is it easy? NO! That's where the frustration and challenge among data crunchers lies.
What's my recipe to get this right? In my experience (yes, I've had the pleasure of taking a shot at this exciting problem), the answer is to employ a 'layered accuracy approach':
- Decide which part of the problem you want to tackle: pre-release prediction, post-release prediction (the first few weeks), or both.
- Identify the structure of your base model (the model that will give you benchmark predictive power and a first understanding of what drives the revenue of the movies). Try a model structure that is easy to execute and interpret and that fits the data well. Make decisions about quantitative vs. behavioral models, point estimates vs. classification into revenue groups, and segment models vs. models of the whole movie population.
- Use tried and tested variables relevant to the model being built: star power, number of screens, genre, MPAA rating, time of release, competition at the time of release, critics' ratings, sequel status, etc. I recommend that you break down any variable that is still too dense; for example, create your own version of the traditional genre variable, as it usually does not add much in its present form.
- Use other, less mainstream variables: plot, positive buzz on internet forums and the Hollywood Black List of unproduced screenplays, for starters. This is your creative space; use it to construct variables that you believe can add more punch to the model.
- Build the model and examine its predictive accuracy and insights. Rank-order the insight variables. If something does not make sense, explore it again.
- Validate the model on movies it has not seen (for example, a holdout sample) to see that it stands up tall.
- Try another model structure and see if you get better results (it's all about accuracy, Watson; even a little more lift counts when we are talking millions of dollars).
- Get a movie-fanatic data cruncher to do all of the above for you (I promise the predictive accuracy will improve dramatically).
- Explore other non-conventional ways to improve your prediction accuracy. A big area now is prediction markets.
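To make the advice on breaking down dense variables and classifying into revenue groups concrete, here is a minimal Python sketch. It assumes the genre field is stored as a slash-separated label such as "Action/Comedy"; the tag vocabulary, the revenue cut-offs and the function names `genre_flags` and `revenue_group` are all hypothetical, chosen only for illustration and not taken from any published model.

```python
from __future__ import annotations

# Hypothetical tag vocabulary; replace with the genre labels in your own data.
GENRE_TAGS = ["action", "comedy", "drama", "horror", "family", "sci-fi"]

# Hypothetical revenue cut-offs in millions of US dollars, lowest first.
REVENUE_BINS = [(1.0, "flop"), (20.0, "modest"), (100.0, "hit")]


def genre_flags(genre_label: str) -> dict[str, int]:
    """Split a compound label such as 'Action/Comedy' into one 0/1 flag per tag."""
    parts = {p.strip().lower() for p in genre_label.split("/")}
    flags = {f"genre_{tag}": int(tag in parts) for tag in GENRE_TAGS}
    # Count the components that fall outside the assumed vocabulary.
    flags["genre_other"] = len(parts - set(GENRE_TAGS))
    return flags


def revenue_group(revenue_musd: float) -> str:
    """Map a revenue figure (millions of USD) to an ordered revenue group."""
    for upper, label in REVENUE_BINS:
        if revenue_musd < upper:
            return label
    return "blockbuster"


print(genre_flags("Action/Comedy"))
print(revenue_group(45.0))  # -> hit
```

Flags like these let a model weigh each genre component separately instead of treating every compound label as a category of its own. The bin cut-offs are placeholders; in practice they should come from the revenue distribution of your own movie sample.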
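The build, validate and try-another-structure steps can be sketched as a loop that fits a simple base model on part of the data and scores it, against a naive benchmark, on movies it has not seen. This is a hedged, standard-library-only sketch: it assumes a linear model on log revenue fitted by ordinary least squares, and it runs on synthetic data from the hypothetical helper `make_synthetic_movies`. The two drivers merely stand in for variables such as log screens and a star-power score; they are not real box office figures.

```python
from __future__ import annotations

import math
import random


def make_synthetic_movies(n: int, seed: int = 42) -> tuple[list[list[float]], list[float]]:
    """Hypothetical helper: made-up movies with two drivers and a log-revenue target."""
    rng = random.Random(seed)
    X = [[rng.uniform(0, 5), rng.uniform(0, 3)] for _ in range(n)]
    # Assumed "true" relationship plus noise, used only to exercise the code.
    y = [1.0 + 0.8 * a + 0.5 * b + rng.gauss(0, 0.1) for a, b in X]
    return X, y


def fit_ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares with an intercept, solving (X'X) b = X'y."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented system.
    # No guard for a singular X'X (e.g. perfectly collinear variables).
    aug = [row + [v] for row, v in zip(xtx, xty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(coef: list[float], x: list[float]) -> float:
    """Intercept plus the weighted sum of the movie's variables."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean squared error between actual and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def holdout_compare(X, y, test_share: float = 0.2, seed: int = 0):
    """Fit on a training split; score the base model and a mean-only benchmark on held-out movies."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_share))
    test, train = idx[:n_test], idx[n_test:]
    coef = fit_ols([X[i] for i in train], [y[i] for i in train])
    y_test = [y[i] for i in test]
    model_rmse = rmse(y_test, [predict(coef, X[i]) for i in test])
    train_mean = sum(y[i] for i in train) / len(train)
    baseline_rmse = rmse(y_test, [train_mean] * len(test))
    return coef, model_rmse, baseline_rmse


X, y = make_synthetic_movies(200)
coef, model_rmse, baseline_rmse = holdout_compare(X, y)
print(f"coefficients: {[round(c, 3) for c in coef]}")
print(f"holdout RMSE, base model: {model_rmse:.3f}; mean-only benchmark: {baseline_rmse:.3f}")
```

To try another model structure, fit it on the same training split (same `seed`) and compare its holdout RMSE with the base model's; a more complex structure earns its place only if it reliably lowers the held-out error.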
As science makes the business of revenue prediction in movies and other entertainment areas much easier, the question becomes less about whether we could have predicted the success of Slumdog Millionaire and more about whether we want to. Malcolm Gladwell makes this case eloquently in his must-read piece in The New Yorker.