Tuesday, 10 June 2008

Machine Learning Rant

This rant is inspired by this wonderful blog post, which I found through Greg Linden's blog.

Most people who've worked with me probably know that I'm very skeptical about machine learning algorithms. In most of my work I've avoided relying on them.

My problem with machine learning algorithms is the way they are used. A beautiful example of this failure is genre classification. Countless papers have been published claiming around 80% classification accuracy. A number of papers have even suggested that this 80% is close to the level of disagreement between humans (i.e., the machine's 80% is as good as any human's genre classification performance).

Anyone who has seriously looked at such trained genre classifiers in more detail will have wondered why the measured accuracy is so high and yet the results on a new data set are often so unsatisfactory. The simple explanation for this specific problem is that instead of genre classification accuracy, most researchers have been measuring artist classification accuracy: training and test sets often include pieces by the same artists, and most pieces by an artist belong to the same genre. (I've been arguing for the use of an artist filter since 2005, and yet I still see lots of papers published which ignore the issue completely...)
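
To make the artist filter concrete, here is a minimal sketch of such a split in Python, using scikit-learn's GroupShuffleSplit; the feature matrix, genre labels, and artist names are made-up placeholders, not a real data set:

    # Hypothetical data: 8 tracks with 4 audio features each, a genre
    # label per track, and the artist who recorded each track.
    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit

    X = np.random.rand(8, 4)
    y = np.array(["rock", "rock", "jazz", "jazz",
                  "rock", "jazz", "rock", "jazz"])
    artists = np.array(["A", "A", "B", "B", "C", "C", "D", "D"])

    # Split so that all tracks by one artist land on the same side.
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
    train_idx, test_idx = next(splitter.split(X, y, groups=artists))

    # No artist appears in both sets, so the classifier can no longer
    # score by recognizing the artist instead of the genre.
    assert not set(artists[train_idx]) & set(artists[test_idx])

With a split like this, the part of the measured "genre" accuracy that actually came from artist overlap disappears.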

Anyway, the point is that people using machine learning algorithms often consider their problem to be solved if their model performs well on their test set. I often have the impression that no effort is made to understand what the machine has learned.

Most of the time when I explain that I'm skeptical about machine learning, I'm confronted with raised eyebrows. How can someone seriously challenge best-practice methods? Surely anyone who is skeptical about machine learning hasn't understood what it's about?

Despite all the responses I've received so far, today, after reading the wonderful blog post mentioned above, I feel like there are lots of people out there (many of whom are surely a lot smarter than me) who are skeptical about the use of machine learning algorithms. The next time I try to explain why blindly using and trusting a machine-learned model is not a solution, I'll point to Google's ranking of search results :-)

Having said that, I think there is a lot of potential for machine learning algorithms that generate human readable explanations for the models they generate. In particular, I'd like to see machine learning algorithms justify their decisions the way a human data analyst does: with experience, common sense, and data. In addition, it would be nice if, like a human learner, the algorithm could point to possible limitations which cannot be foreseen based solely on the given data.
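
To give one small, hedged illustration of what I mean by human-readable: a shallow decision tree can be printed as nested threshold rules that an analyst can inspect and sanity-check against domain knowledge. The sketch below uses scikit-learn's export_text on the iris data set, which merely stands in for any labeled feature data:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=2, random_state=0)
    tree.fit(iris.data, iris.target)

    # Prints the learned model as if/else rules over named features,
    # e.g. "petal width (cm) <= 0.80", so a human can judge whether
    # the learned decision boundaries make sense.
    print(export_text(tree, feature_names=list(iris.feature_names)))

This is of course only a toy: it shows readable rules, not the justifications or caveats a human analyst would add.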

I guess I should also add that all of this is just a matter of definitions. By machine learning I mean black boxes which people use without bothering to understand what happens inside them. In contrast to my definition, many consider statistical data mining to be an important part of machine learning (which I sometimes don't, because it requires specific domain knowledge as well as human learning, understanding, and judgment). Furthermore, I have no doubt that Google applies machine learning algorithms all the time, and that combining machine learning with human learning is a very natural and easy step, contrary to what my rant above might suggest.

5 comments:

Jeremy said...

Amen!

On the other hand, would the artist filter have been found without plugging everything into a "black-box" and observing the bad behavior? Heuristics, however, suffer the same drawback - sometimes the assumed knowledge or truth has flaws (e.g., detect a saxophone, so it must be Jazz). The important thing, I think, is to "break the design" and discover when it will fail and, hopefully, how likely this is to occur.

Anonymous said...

I think there is a lot of potential for machine learning algorithms that generate human readable explanations for the models they generate.

I think that you have hit the nail on the head. I absolutely agree with this. I was talking with Gary Marchionini about this a year ago, and we ended up calling it "Explanatory Information Retrieval". I think there should be more of this.

I think you're also correct when you draw a distinction between machine learning algorithms, themselves, and the manner in which those algorithms are applied. I am also somewhat of a skeptic when it comes to pure, "bag of features", black box machine learning. I agree with you that there are aspects of what machine learning is trying to do that could be incorporated into some sort of Explanatory IR system.

Elias said...

@ jeremy (not the different one):

On the other hand, would the artist filter have been found without plugging everything into a "black-box" and observing the bad behavior?

I encountered the need for an artist filter while playing with playlist generation algorithms. So the answer is: no, it wasn't black boxes that helped me understand that problem in the evaluation.

However, I believe Brian had previously been talking about album/production effects when measuring artist identification performance. Those effects are related in many ways to the artist effect in genre classification. I'm not sure how Brian found out about those; maybe he did so using black boxes (but I doubt it).

Will Dwinnell said...

My problem with machine learning algorithms is the way they are used. ... I mean black boxes which people use without bothering to understand what happens inside them.

It seems as though your criticism centers on misapplication, and has little if anything to do with machine learning. Isn't misapplication always bad, though? I'm not clear on why you think this problem is worse in machine learning than elsewhere.

Elias said...

@ Will:

I agree: misuse is a problem everywhere. For example, p-values and significance tests have probably been misused just as much if not more. (And yet I work with p-values every day.)

However, it seems to me that the danger of misusing significance tests is well understood (at least there are plenty of critical Wikipedia articles). On the other hand, there seems to be a lot less awareness of how machine learning algorithms are being misused.