Saturday, 24 March 2007

Is industry outrunning MIR research?

I feel lost when trying to get an overview of all the different MIR-related services emerging every month. Industry is moving extremely fast and coming up with solutions to problems that some researchers (like me) have been trying to solve.

For example, I've done some work on algorithms that classify genres by analyzing audio. The best algorithms I've seen are not so great. On an interesting music collection, where the genres are not as obviously distinguishable as techno vs. death metal, the algorithms perform a lot worse than humans. Why bother with that if a better solution already exists? By a better solution I mean one that is not limited to genres and that assigns multiple categories to each item: tagging by people. If you doubt it, you might want to check out the tags. I (and many other researchers) have been checking them out recently (see Paul’s blog for some interesting ideas and comments on tagging).

So instead of trying to do some simple mathematics to understand music, just give millions of users the option to tag songs and artists (and give them benefits for doing so). If you don't trust the masses, hire experts. If you think I'm kidding, check out Pandora. If my ears didn't fail me, Tim Westergren (founder of Pandora) recently said in an interview that they have 600,000 tracks in their database. He also once mentioned that it takes an expert 20 minutes to annotate each track. (That's 100 person-years of work, if it’s true. Spare a moment to think of all the wonderful MIR research you could do with that kind of data.)
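The 100 person-years figure can be checked with a quick back-of-the-envelope calculation. The sketch below uses the two numbers quoted above; the length of a working year (2,000 hours) is an assumption made here, not something stated in the interview.

```python
# Back-of-the-envelope check of the annotation effort quoted above.
TRACKS = 600_000               # tracks in the database (as quoted)
MINUTES_PER_TRACK = 20         # expert annotation time per track (as quoted)
HOURS_PER_PERSON_YEAR = 2_000  # ASSUMPTION: roughly 50 weeks x 40 hours

total_hours = TRACKS * MINUTES_PER_TRACK / 60
person_years = total_hours / HOURS_PER_PERSON_YEAR

print(f"{total_hours:,.0f} hours = {person_years:.0f} person-years")
# prints "200,000 hours = 100 person-years"
```

With a shorter assumed working year the estimate only goes up.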

Btw, it's not just music classification that seems solved. You’ll find services that successfully create playlists, give recommendations, ...

I'd also dare to say that it's not just stuff I've been working on that seems to be outrun by industry. Have you ever tried Midomi query-by-humming (or singing, or whistling)? It works!

People in the MIR research community have been working on query-by-humming for many years. As far as I know, the mainstream research direction was to extract the melody from the human input, extract the melody from the music, and match the two.
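To make the melody-matching idea concrete, here is a minimal sketch. It assumes the hard part (extracting a note sequence from the humming and from the recordings) is already done, and it picks one simple matching scheme: compare pitch intervals with an edit distance. The toy database, the note numbers, and the function names are all hypothetical; real systems may well work differently.

```python
def intervals(pitches):
    """Pitch steps between consecutive notes (unchanged by transposition)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution
        prev = curr
    return prev[-1]


def rank(query, database):
    """Song names ordered from best to worst melodic match."""
    q = intervals(query)
    return sorted(database,
                  key=lambda name: edit_distance(q, intervals(database[name])))


# Hypothetical toy database of melodies as MIDI note numbers.
database = {
    "ode_to_joy": [64, 64, 65, 67, 67, 65, 64, 62],
    "frere_jacques": [60, 62, 64, 60, 60, 62, 64, 60],
    "twinkle": [60, 60, 67, 67, 69, 69, 67],
}

# "Ode to Joy" hummed a whole tone lower, with one wrong note.
query = [62, 62, 63, 65, 65, 64, 62, 60]

print(rank(query, database)[0])  # prints "ode_to_joy"
```

Working on intervals makes the match independent of the key the user hums in, and the edit distance tolerates a few wrong notes. It does nothing about rhythm or about errors in the extracted melodies, which is where the real difficulty lies.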

I know nothing about query-by-humming research. However, I've seen demonstrations and the results were not so great. Some people (like me) don’t sing/hum accurately enough. Furthermore, extracting melody information from songs is a lot harder than it seems. So while researchers were working on the problem and envisioning how great a system would be that allows you to query a large archive by humming, the problem was solved by Midomi.

Midomi found a shortcut to the problem. Instead of relying on computers that understand music, they found a clever way to use the information humans can give them easily. (For an interesting analysis check out Cristian Francu’s post on the music-ir list on 2007/02/02.)

I'm not saying that any of the research was wasted. Not one second of it! Extracting (and matching) melody information is extremely interesting. It's a big step towards truly understanding music. Also developing algorithms that can analyze music and classify it into styles/genres/moods will always remain an extremely interesting research direction, despite tagging. However, it seems that in some cases the motivation for our work needs to be reformulated.

Or maybe MIR research should just give up? Wait and see what industry comes up with and then solve the problems that are left over?

Well, I got to continue writing my ISMIR paper now... which btw is my answer to all these questions :-)

And I hope everyone else is busy writing their ISMIR papers, too.


jeremy said...

Perhaps we could go into a bigger discussion in person, but I think the solutions industry is currently providing are quick fix bandaids, and not deeper, long term solutions. They might work well in the short run, but not great in the long run.

Take Pandora for example. While I love what they are doing, how well does it scale? And if it does scale, what happens when my information need does not match one of the 400 categories or labels that they pre-defined, six years ago? What if there is some acoustic/statistical/musicological feature in a song that I really like, that really defines the essence of a song for me? If that definition does not match Pandora's definition, I will be sore out of luck.

Let's look at this from the perspective of the genre task. I heard George T at ISMIR 2005 even say that he believes that the genre task is kinda silly, that it is a good first stepping stone, but that ultimately "search by genre" is not the way to go forward. However, the things that we learn by trying to label genre are going to move forward with us to the next generation of music information retrieval. What I mean is, ultimately we won't care about actually labeling songs. But the acoustic/statistical/musicological features, that we have trained automatic algorithms to extract from songs, those will be usable for much better, much more profound music similarity measures, and thus music similarity search engines.

And at that point, things like Pandora or Last.fm, which rely on human annotation, will collapse. Or at least be overwhelmed by the sheer volume of songs x people. The long tail is too long for humans to come back and reannotate everything. At that point, a system that automatically extracts some new essence of songs will be able to serve a much larger population.

Know what I mean?

Elias said...

Jeremy, thanks for your comments!

Scalability is a very interesting point.

I guess on Pandora's behalf one could argue that the foundations of music haven't really changed much in western culture in the last few hundred years. I'm not a musician, but as far as I can tell there haven't been too many changes with respect to the musical scales we listen to or the harmonic structure, and even the instruments don't seem to change much. Even if there are changes, Pandora does seem to allow a certain degree of flexibility in their system... after all they are including classical music right now, which surely requires a very different description schema than the other forms of music they have been playing so far.

You also mention using genre classification to improve music similarity measures. That's something I've been trying since 2003. I'm sure you are aware of the "glass ceiling" (Aucouturier & Pachet, 2004). I've been knocking my head very hard against it... Maybe it just takes someone with smarter brains to crack it, but I'm also inclined to think that it's comparable to what we have been seeing in speech processing or image processing. To get computers to do the simplest tasks that humans can do seemingly without any effort, we might need to construct an artificial intelligence which has common sense and sensors similar to ours (or at least a good understanding of how our sensors work). And of course, such a machine would need to know about human emotions. Talking about scalability, I don't see the necessary research budgets scaling up to that anytime soon :-)

I also don't see Last.fm collapsing. What time frame were you thinking of? I'd bet a lot that Last.fm will manage to scale more or less smoothly :-)

It would be great to have some numbers available for discussions like this... for example, how many new songs does audioscrobbler track every month? How is the ratio between the average number of times a song was listened to and the total number of songs available changing? Based on numbers like that it shouldn't be too hard to predict the day music recommendation as we know it dies.

I got to get back to writing my ISMIR paper now... I'll try to include the word scalability somewhere in the intro. (Thanks again, Jeremy!)

jeremy said...

I also don't see Last.fm collapsing. What time frame were you thinking of? I'd bet a lot that Last.fm will manage to scale more or less smoothly :-)

Elias, I don't mean that Last.fm or Pandora will "die", in that they will go out of business or will stop providing their service. What I actually said was "And at that point, things like Pandora or Last.fm, which rely on human annotation, will collapse."

By collapse, I mean "be unable to provide a search service that handles particular types of user information needs".

Let me give you a more focused example. Suppose a new style, let's even say "genre" (though I personally think that term is too broad) of music suddenly appears. That genre is broadly defined by a certain type of percussive timbre combined with a certain type of funky rhythm combined with.. oh.. let's say.. accordions.

If Pandora's human label annotators have not marked all the songs in the collection with that percussive timbre, or with that rhythm type, or with that instrument type, then they will never be able to find similar songs of this nature, without going back and manually annotating tens of millions of songs, searching for that particular rhythm and instrument, etc. Thus, in the face of this particular information need, Pandora will "collapse".

By the same token, even if Last.fm has a few people who manually, "wisdom of crowds" style, tag a few of these songs with the correct rhythm and timbre information, there is no way of propagating those labels, and so no way of telling whether a song with no tags is an instance of this genre or not. Again, Last.fm will "collapse" when trying to meet the user's information need.
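For illustration, this is roughly what "propagating labels" could look like if automatically extracted audio features were available. It is only a sketch under strong assumptions: the three-dimensional feature vectors (percussive timbre, rhythm, accordion presence), the song names, the tags, and the 0.9 threshold are all made up, and nearest-neighbour copying is just one simple way to propagate tags.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def propagate(features, tags, threshold=0.9):
    """Copy tags from the most similar tagged song to each untagged song."""
    result = {song: set(t) for song, t in tags.items()}
    for song, vec in features.items():
        if song in tags:
            continue
        best = max(tags, key=lambda t: cosine(vec, features[t]))
        if cosine(vec, features[best]) >= threshold:
            result[song] = set(tags[best])
    return result


# HYPOTHETICAL features: [percussive timbre, funky rhythm, accordion].
features = {
    "song_a": [0.9, 0.8, 0.9],   # tagged by a few listeners
    "song_b": [0.1, 0.2, 0.0],   # tagged by a few listeners
    "song_c": [0.8, 0.9, 0.8],   # no tags yet, sounds like song_a
    "song_d": [0.5, 0.0, 0.0],   # no tags yet, unlike anything tagged
}
tags = {"song_a": {"accordion funk"}, "song_b": {"ambient"}}

print(propagate(features, tags))
```

Here song_c inherits the tag of song_a, while song_d stays untagged because nothing tagged is similar enough. Without content-based features there is nothing to compute the similarity from, which is the gap described above.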

That's all I meant.

Elias said...

Thanks again, Jeremy. I understand why Pandora has scalability issues.

Regarding Last.fm...
If a few people start listening to this completely new type of music, then based on their previous listening experience Last.fm can figure out that other listeners (with a similar previous listening experience) might want to give it a try too. (No tagging needed.)

Am I missing something?
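A minimal sketch of that idea, assuming nothing more than a set of artists per listener: score what similar listeners have played and the current listener has not. The user names, artist names, and the choice of Jaccard similarity are assumptions for illustration, not a description of how any real service works.

```python
def jaccard(a, b):
    """Overlap between two listening histories, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, histories):
    """Artists the user has not heard, ranked by similar listeners' histories."""
    mine = histories[user]
    scores = {}
    for other, theirs in histories.items():
        if other == user:
            continue
        sim = jaccard(mine, theirs)
        if sim == 0:
            continue  # no shared taste, so no evidence either way
        for artist in theirs - mine:
            scores[artist] = scores.get(artist, 0.0) + sim
    return sorted(scores, key=lambda a: (-scores[a], a))


# HYPOTHETICAL listening histories.
histories = {
    "early_adopter": {"band_x", "band_y", "new_accordion_funk"},
    "alice": {"band_x", "band_y"},
    "bob": {"band_z"},
    "carol": {"band_z", "band_w"},
}

print(recommend("alice", histories))  # prints "['new_accordion_funk']"
```

No tags are involved: the new music reaches alice only because early_adopter shares her history and has already played it.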

jeremy said...

If a few people start listening to this completely new type of music, then based on their previous listening experience Last.fm can figure out that other listeners (with a similar previous listening experience) might want to give it a try too. (No tagging needed.)

Well, I was abstractly considering the "I have listened to this song" metadata as a sort of social tag. Not a keyword tag, but a behavior tag. Kinda like what Steve Gillmor talks about when he talks about "gestures".

But the point is this: Until the first person listens to the song, no one is going to know that the song exists. The "long tail" of songs x information needs means that there are going to be a lot of singleton, disconnected graphs.

I think I am still not doing a full job of explaining what I mean. :-(

Elias said...

I think I am still not doing a full job of explaining what I mean. :-(

Well, even if I might be missing the point... at least you have given me some interesting stuff to think about over the last few days :-)

I wonder how much of a problem singleton, disconnected graphs are. If a song doesn't connect to the world, couldn't you blame the artist?

I think musicians always have needed, and always will need, to make the first steps. They need to reach out and connect to an audience. Play gigs. Give friends music to listen to. Sure, getting attention is hard, and it's getting harder. But communities like Last.fm can help them find their audience.

Anyway, I hope you'll be attending ISMIR. I'm sure there will be lots to discuss.

Paul said...

Great post Elias, on a subject that I think about quite a bit. And great comments too Jeremy.

Jeremy points out that for systems like Pandora, scalability is a big issue. Now, if you talk to Pandora, they will tell you that there is no shortage of out-of-work musicians that they can put to work classifying music. But when it gets down to it, it still costs them at least $10 to analyze a song. In a world where there are 50,000 new albums released per year (about 500,000 tracks), they can keep up. But when everyone with a laptop and GarageBand is putting tracks on the net, it is not hard to imagine a world where there are 1,000,000 new tracks released onto the web *every day*. Already, Pandora has to have a gatekeeper that decides which music is in and which is out. So it is mid-tail at best. Now for many people, that is just where they want to be. They want all the junk filtered out, so in the end, the gatekeeper that Pandora has turns out to be a feature. A listener can count on Pandora having filtered out the 90% of music that is crap.

Last.fm and other social systems work really well. I'm amazed at how many different ways one can manipulate the data to get good music recommendations. I think Last.fm is only skimming the surface of what they can do with their data. I don't think they will collapse. But they do have problems that are difficult to deal with. With millions of users contributing billions of pieces of taste data, there is incredible inertia in their system. It is really difficult for a new artist to make an entry into a system like Last.fm.

With the social model we are back to the old days where the 'rich get richer'. Popular bands get recommended a lot, and because they are recommended a lot, they are played a lot, which means they are recommended a lot. Another problem these social systems tend to have is their lack of transparency as to why they recommended something. Pandora can tell you it recommended music because it has a 'minor tonality with female vocals similar to Evanescence' while Last.fm can only tell you it is recommending Y because people who listen to X also listen to Y.
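The 'rich get richer' loop can be illustrated with a deliberately crude toy model. Every number and name below is made up, and the rule "only the single most-played band gets recommended" is an assumption chosen to make the feedback visible, not a claim about how any real recommender works.

```python
def simulate(plays, rounds, top_k=1, organic=1, boost=10):
    """Each round every band gets a few organic plays, and the top_k
    most-played bands get extra plays driven by recommendations."""
    plays = dict(plays)
    for _ in range(rounds):
        recommended = sorted(plays, key=plays.get, reverse=True)[:top_k]
        for band in plays:
            plays[band] += organic
        for band in recommended:
            plays[band] += boost
    return plays


def share(plays, band):
    """Fraction of all plays that went to one band."""
    return plays[band] / sum(plays.values())


# HYPOTHETICAL starting point: band_a is only slightly ahead of band_b.
start = {"band_a": 11, "band_b": 10, "new_band": 0}
end = simulate(start, rounds=20)

print(end)  # prints "{'band_a': 231, 'band_b': 30, 'new_band': 20}"
print(f"{share(start, 'band_a'):.0%} -> {share(end, 'band_a'):.0%}")
# prints "52% -> 82%"
```

A one-play head start turns into a dominant share because the recommendations keep feeding the band that is already ahead, while the new band never gets recommended at all.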

And finally, even with the depth that, at some point you come across a track that has only been listened to by a few hundred people (or fewer). These tracks have only sparse similarity data, so it gets difficult to position them in the similarity space. There just isn't enough data. If you want to do nifty things like visualize a music space, then this sparseness becomes a real problem. The automatic methods don't have these problems.
I think that the social methods that we've seen be so successful in the 'industry' have their strengths, but they also have their weaknesses. The content-based methods that we see coming out of the MIR community really are the only way to deal with the incredible growth in the amount of new music that we'll see in the next few years.
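To put rough numbers on the growth argument, the per-track cost and release volumes mentioned earlier in this comment can be multiplied out. The $10 figure is the stated lower bound, and the 1,000,000 tracks per day scenario is the imagined one from above, so the second result is a thought experiment rather than a forecast.

```python
# Annual cost of expert annotation under the two scenarios described above.
COST_PER_TRACK = 10              # dollars, "at least $10" (lower bound)
TRACKS_PER_YEAR_NOW = 500_000    # about 50,000 albums per year
TRACKS_PER_DAY_IMAGINED = 1_000_000
DAYS_PER_YEAR = 365

cost_now = COST_PER_TRACK * TRACKS_PER_YEAR_NOW
cost_imagined = COST_PER_TRACK * TRACKS_PER_DAY_IMAGINED * DAYS_PER_YEAR

print(f"today:    ${cost_now:,} per year")
print(f"imagined: ${cost_imagined:,} per year")
print(f"factor:   {cost_imagined // cost_now}x")
# prints "today:    $5,000,000 per year"
# prints "imagined: $3,650,000,000 per year"
# prints "factor:   730x"
```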

Elias said...

Thanks for your comments Paul. It all makes sense to me, except the part where you say:

With the social model we are back to the old days where the 'rich get richer'.

I'd like to challenge that. But I might need to think about it a bit more.

It seems odd that a service that helps you discover things you've never heard before is supposed to favor those that are popular already. Isn't the greatest part about Last.fm that you can beam yourself far out of the mainstream with just a few clicks? In fact, a significant part of the community seems to be quite proud that they are not mainstream. (I'm sure you've seen the unofficial mainstream-o-meter already.)

Elias said...

I forgot to add that I found the taste-o-meter via the music interfaces blog. Last.fm user "v11v11v" (the author of music interfaces) writes:

Or to put it another way, how underground are you? Fun for subscribers. I eagerly entered my own ID and learnt that I’m 10.01% mainstream.

Btw, I'm 40% mainstream :-(

Stephan said...

hihi! 8.89% mainstream, so proud :) and thanks for the controversial positions on this thread!