Having recently moved from an academic research lab to a private company, I realized how different designing an algorithm for a research paper is from designing one for a product that will be used by non-expert users. In academic research, people tend to have all sorts of tricks for pre-processing the data and choosing the parameters appropriately. Indeed, they develop a dual expertise: expertise in their algorithms and expertise in the datasets they use to demonstrate them.
Besides introducing a lot of bias in the comparisons (and making results unreproducible), this shows that Machine Learning is often closer to an art than to a science.
It is a really tough challenge to actually build algorithms that can be used by someone without Machine Learning expertise. Indeed, you always face the dilemma of either including tunable parameters (which may improve performance in certain cases but are hard for a non-expert to manipulate) or pre-setting them to fixed values (which may give only average performance in most cases).
Chih-Jen Lin has done interesting work on implementing SVMs in a usable way. He recently gave a very nice talk about this.
Machine Learning has been considerably influenced by theoreticians, which is a good thing because it helped build solid foundations. However, there has been very little influence from the experimental sciences. As a result, two important things are lacking:
- proper design of experiments, with a concern for reproducibility of results and a community-wide agreement on an experimental methodology
- concern for usability (in other words, reproducibility of results by non-experts)
I hope that, as the research community grows and establishes more contacts with industry (where there is a booming demand), these issues will be addressed more systematically.