Friday, December 5, 2008

Trends: Going the way of R and other open source software

My colleague Girish recently emailed me a New York Times business computing article about how data analysts have taken to R as their open source programming language of choice.

The article took me back to 1996, when I was a graduate student in the US. Fellow statisticians were raving about R as the new-generation data-crunching language, something that was going to give other data packages a run for their money. At that point we were doing our statistical number crunching on student licenses of SAS. While intrigued by the whole idea, I was too busy learning applied statistics and SAS, and just getting through grad school semesters.

Now, years later, I read the article with a feeling of déjà vu, because today I am much closer to embracing R as 'the' crunching language for myself and our business.

We've done the testing and it has won hands down every time:

  • Ease of use

  • Readily available code modules (learning from others is key here; we techies love to outsmart each other)

  • Wonderful graphics

  • Excellent data manipulation

  • No fees

  • Ability to customise

  • Lots more...

While competitors are quick to dismiss it, R works because it has created a democratic community of statisticians and others who want number crunching to become easier and more visual. Being open source gives it an added kick: anyone can create customised modules for the community to use. It blends programming and statistical skills more elegantly than anything else I have seen. That it has a fan following among my tribe is therefore not surprising.

So, are R and other open source software the way to go? Absolutely! The reasons are many, but let me rank them in the order that mattered when we took the leap:

  1. Stacks up against, and beats, the competition on most data crunching modules.
  2. Easy to use.
  3. Collaborative value model: the conviction that a collective community can create better thought and tools than a competitive one.
  4. Better service: less downtime, quicker error resolution and a help desk of people dedicated to fixing issues.
  5. Excellent customisation options: The ability to create what you want for your business and put it out there.
  6. Cutting edge graphics.
  7. The geek factor: the thrill of creating, bettering and showing off to other like-minded individuals should not be underestimated.
  8. Lower technology cost: while this is great, believe me, it is not the main reason that businesses use open source.

1 comment:

Girish said...

What's encouraging is that I recently read that Zementis has been working with the R community to extend support for the Predictive Model Markup Language (PMML) standard, which allows model exchange among various statistical software tools.

What this would mean is that anyone who develops models in R can easily deploy and execute them in the Zementis ADAPA scoring engine using the PMML standard.
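To give a feel for what "model exchange" through PMML looks like, here is a minimal sketch in Python. It is not the R export tooling or the ADAPA engine; it only assumes the PMML 3.2 layout for a simple regression model, and the names `linear_model_to_pmml` and `score` are made up for illustration. One function writes a fitted linear model out as a PMML document, and a toy scorer reads the same document back and evaluates it, which is roughly the hand-off a real producer and scoring engine would perform.

```python
import xml.etree.ElementTree as ET

# Namespace of the PMML 3.2 schema (assumed version for this sketch).
PMML_NS = "http://www.dmg.org/PMML-3_2"


def linear_model_to_pmml(intercept, coefficients, target="y"):
    """Build a minimal PMML RegressionModel for y = intercept + sum(coef * x).

    `coefficients` maps predictor names to fitted coefficients.
    This is a hypothetical helper, not part of any PMML library.
    """
    root = ET.Element("PMML", {"version": "3.2", "xmlns": PMML_NS})
    ET.SubElement(root, "Header", {"description": "hand-written linear model"})

    # The data dictionary declares every field the model refers to.
    data_dict = ET.SubElement(
        root, "DataDictionary", {"numberOfFields": str(len(coefficients) + 1)}
    )
    for name in list(coefficients) + [target]:
        ET.SubElement(
            data_dict,
            "DataField",
            {"name": name, "optype": "continuous", "dataType": "double"},
        )

    model = ET.SubElement(
        root, "RegressionModel", {"functionName": "regression", "modelName": "demo"}
    )
    # The mining schema says which fields are inputs and which is predicted.
    schema = ET.SubElement(model, "MiningSchema")
    for name in coefficients:
        ET.SubElement(schema, "MiningField", {"name": name})
    ET.SubElement(schema, "MiningField", {"name": target, "usageType": "predicted"})

    # The regression table carries the actual fitted numbers.
    table = ET.SubElement(
        model, "RegressionTable", {"intercept": repr(float(intercept))}
    )
    for name, coef in coefficients.items():
        ET.SubElement(
            table,
            "NumericPredictor",
            {"name": name, "coefficient": repr(float(coef))},
        )
    return ET.tostring(root, encoding="unicode")


def score(pmml_text, record):
    """Toy scorer: read the RegressionTable back and evaluate one record."""
    root = ET.fromstring(pmml_text)
    table = root.find(f".//{{{PMML_NS}}}RegressionTable")
    total = float(table.get("intercept"))
    for pred in table.findall(f"{{{PMML_NS}}}NumericPredictor"):
        total += float(pred.get("coefficient")) * record[pred.get("name")]
    return total


if __name__ == "__main__":
    pmml = linear_model_to_pmml(1.5, {"x1": 2.0, "x2": -0.5})
    print(pmml)
    print(score(pmml, {"x1": 3.0, "x2": 4.0}))
```

The point of the format is that the producer and the scorer share nothing but the XML text, so the tool that fits the model and the tool that runs it can be entirely different products.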

Zementis also offers access to the Amazon EC2 cloud computing infrastructure. This not only eliminates potential memory constraints in R but also speeds execution and real-time predictive analytics.

Open source statistical computing and data mining sure are gathering steam.