This rant is inspired by this wonderful blog post which I found through Greg Linden's blog.
Most people who've worked with me know that I'm very skeptical about machine learning algorithms. In most of my work I've avoided using them as the solution to a problem.
My problem with machine learning algorithms is the way they are used. A beautiful example of this misuse is music genre classification. Countless papers have been published claiming around 80% classification accuracy, and several have even argued that this is close to the level of disagreement between humans (i.e. that the machine's 80% is as good as the genre classification performance of any human).
Anyone who has seriously examined such trained genre classifiers will have wondered why the measured accuracy is almost perfect and yet the results on a new data set are often far less satisfactory. The simple explanation for this specific problem is that most researchers have not been measuring genre classification accuracy at all, but rather artist classification accuracy: training and test sets often included pieces by the same artists, and most pieces by an artist belong to the same genre. (I've been arguing for the use of an artist filter since 2005, and yet I still see lots of papers published which ignore the issue completely...)
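An artist filter is simple to implement: partition the data by artist before splitting, so that no artist contributes pieces to both the training and the test set. Here is a minimal pure-Python sketch of the idea; the track/artist/genre tuples are made-up toy data, and the function name is my own, not from any particular library:

```python
import random

# Hypothetical toy collection: (track_id, artist, genre) tuples.
tracks = [
    ("t1", "artist_a", "rock"), ("t2", "artist_a", "rock"),
    ("t3", "artist_b", "jazz"), ("t4", "artist_b", "jazz"),
    ("t5", "artist_c", "rock"), ("t6", "artist_c", "rock"),
    ("t7", "artist_d", "jazz"), ("t8", "artist_d", "jazz"),
]

def artist_filter_split(tracks, test_fraction=0.5, seed=42):
    """Split tracks so that no artist appears in both train and test."""
    artists = sorted({artist for _, artist, _ in tracks})
    rng = random.Random(seed)
    rng.shuffle(artists)
    # Assign whole artists (not individual tracks) to the test side.
    n_test = max(1, int(len(artists) * test_fraction))
    test_artists = set(artists[:n_test])
    train = [t for t in tracks if t[1] not in test_artists]
    test = [t for t in tracks if t[1] in test_artists]
    return train, test

train, test = artist_filter_split(tracks)
# No artist overlap between the two sets:
assert {a for _, a, _ in train}.isdisjoint({a for _, a, _ in test})
```

With a plain random split over tracks, a classifier can score well just by recognizing artists it has already seen; with the artist-disjoint split above, the measured accuracy is forced to reflect generalization to unseen artists. (Grouped splitters in libraries such as scikit-learn, e.g. GroupShuffleSplit, do the same thing.)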
Anyway, the point is that people using machine learning algorithms often consider their problem to be solved if their model performs well on their test set. I often have the impression that no effort is made to understand what the machine has learned.
Most of the time when I explain that I'm skeptical about machine learning I'm met with raised eyebrows. How can anyone seriously challenge best-practice methods? Surely anyone who is skeptical about machine learning simply hasn't understood what it is about?
Despite all the responses I've received so far, today, after reading the wonderful blog post mentioned above, I feel like there are lots of people out there (many of whom are surely a lot smarter than me) who are skeptical about the use of machine learning algorithms. The next time I try to explain why blindly using and trusting a machine-learned model is not a solution, I'll point to Google's ranking of search results :-)
Having said that, I think there is a lot of potential for machine learning algorithms that generate human-readable explanations of the models they build. In particular, just as a human data analyst uses experience, common sense, and data to justify every decision when building a model, I'd like to see machine learning algorithms do the same. In addition, it would be nice if (like a human learner) the algorithm could point out possible limitations which cannot be foreseen based solely on the given data.
I guess I should also add that all of this is partly a matter of definitions. By machine learning I mean black boxes which people use without bothering to understand what happens inside them. Many would object to that definition and consider statistical data mining an important part of machine learning, whereas I sometimes don't, because it requires specific domain knowledge as well as human learning, understanding, and judgment. Furthermore, I have no doubt that Google applies machine learning algorithms all the time, and that combining machine learning with human learning is a very natural and easy step, despite what my rant above might suggest.