Wednesday, 23 January 2008

The future of MIR?

The European Commission (Unit DG INFSO/E2) is planning to invest 2B€ in research and development (2009-2010, FP7). They recently sent out emails asking for comments from researchers:

“[…] we would very much like to hear your views on what you think are the most pressing problems and ripe opportunities in your field. We are interested as much in scientific advances as in innovative applications or infrastructural initiatives in domains where they are likely to have a large positive impact.”

(The context is “knowledge technologies, interactive media and online content”.)

It’s an interesting question to think about.

I’m tempted to point them to work on unstructured collaborative tagging of music, i.e., folksonomies. There are lots of interesting opportunities there, some of which might still be there by the time the FP7 projects start.

If you have any ideas, let the EC know:
http://cordis.europa.eu/ist/kct/fp7-consultation.htm

UPDATE: In the comments, Jeremy points out some interesting topics the EU should be funding.

5 comments:

jeremy said...

I’m tempted to point them to work on unstructured collaborative tagging of music, i.e., folksonomies. There are lots of interesting opportunities there, some of which might still be there by the time the FP7 projects start.

I still think that there is a large opportunity in music structure inference and matching. Pulling out the actual notes and rhythms and keys and chords from an audio track, and using that to infer mood, style, etc. (and even just matching raw notes and rhythms) is still a largely unsolved problem for MIR. It gets at a whole different dimension of the music than social tagging.

I think the EU should fund more research into that area.

Elias said...

Jeremy,

Music segmentation is surely one of the most interesting research directions in MIR, but it's also been around for a while now. I'm not sure if there are many "ripe opportunities" in that direction? For some subproblems, such as chroma-based segmentation (e.g. for chorus detection), working solutions have already been around for many years. Also solutions for related problems such as cover song detection have been around for a while now. Do you think we should be expecting a big breakthrough in the next few years?

Btw, I think that tags associated to music on a large scale could be used to aid research on music segmentation. (Even if each tag is associated with the whole piece of music and not individual segments.) For example, given 20k pieces frequently tagged "piano", and many more pieces which are frequently tagged with all sorts of tags but never with "piano", it would be conceivable to train an algorithm which can identify piano segments in a piece. (Maybe I should add that I'm thinking of Last.fm tags, where, for example, it often happens that a piece is tagged "piano" although there are only a few segments in the piece where a piano can be heard.)
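A minimal sketch of that weak-labelling idea, under loud assumptions: each segment is reduced to a single made-up scalar feature (a hypothetical "piano-likeness" score where higher means more piano-like), the track data is invented, and a simple midpoint threshold stands in for whatever learning algorithm one would actually train.

```python
from statistics import mean

def weak_segment_labels(tracks, tag="piano"):
    """Give every segment the label of its whole track (weak, noisy supervision)."""
    examples = []
    for track in tracks:
        label = 1 if tag in track["tags"] else 0
        for feature in track["segments"]:
            examples.append((feature, label))
    return examples

def fit_threshold(examples):
    """Midpoint between the mean feature of the two weakly labelled classes.

    Assumes the tagged class has the higher mean feature value.
    """
    pos = mean(f for f, y in examples if y == 1)
    neg = mean(f for f, y in examples if y == 0)
    return (pos + neg) / 2

def piano_segments(track, threshold):
    """Flag the segments whose feature lies above the learned threshold."""
    return [f > threshold for f in track["segments"]]

# Hypothetical toy data: per-segment feature values are invented.
tracks = [
    {"tags": {"piano", "jazz"}, "segments": [0.9, 0.8, 0.2]},   # piano in 2 of 3 segments
    {"tags": {"piano"},         "segments": [0.85, 0.1, 0.9]},
    {"tags": {"rock"},          "segments": [0.1, 0.2, 0.15]},  # never tagged "piano"
    {"tags": {"electronic"},    "segments": [0.05, 0.1, 0.2]},
]

threshold = fit_threshold(weak_segment_labels(tracks))
print(piano_segments(tracks[0], threshold))
```

Even though the non-piano segments of the tagged tracks are labelled wrongly during training, the never-tagged tracks pull the threshold low enough to separate them in this toy case. Whether that holds on real audio features is exactly the open question.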

I'm curious what the FP7 will look like at the end. I wouldn't be surprised if part of it is about recommendations, personalization, social networks, mobile applications, and distributed computing. Although that probably wouldn't make too much sense as most of these are being tackled by industry already... and shouldn't academia have a long term perspective that is way beyond what industry is currently focusing on?

jeremy said...

I'm not sure if there are many "ripe opportunities" in that direction?

We're just barely getting to the point where we can extract semantically meaningful features from music, automatically. But no one is really applying it, at any decent scale, to the problem of recommendation and general similarity. Y'know, like Pandora does. That's the research I'd like to see. Pandora-like music similarity, but not with human music semantic tags... with automatic music semantic tags.

Btw, I think that tags associated to music on a large scale could be used to aid research on music segmentation. (Even if each tag is associated with the whole piece of music and not individual segments.)

I'm thinking about something more along the lines of using automatically-processed musical knowledge about a song to aid recommendation. For example, suppose one person tags a song with the "son clave" rhythm. That same person tags another song with a "rumba clave" rhythm. Now, if enough people have both son clave and rumba clave pieces in their collection, you might start to recommend rumba clave rhythms to someone searching for songs with the son clave rhythm.
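The tag-based version of this can be sketched as a simple co-occurrence count. This is only an illustration: the user collections are invented toy data, and the function names and the `min_users` cutoff are assumptions, not anything a real service is known to use.

```python
from collections import Counter
from itertools import combinations

def tag_cooccurrence(collections):
    """Count, for each pair of tags, how many collections contain both."""
    counts = Counter()
    for tags in collections:
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts

def related_tags(tag, counts, min_users=2):
    """Tags that share at least `min_users` collections with `tag`, most frequent first."""
    related = []
    for (a, b), n in counts.items():
        if n >= min_users and tag in (a, b):
            related.append((b if a == tag else a, n))
    return sorted(related, key=lambda item: (-item[1], item[0]))

# Hypothetical toy data: one set of rhythm tags per user collection.
collections = [
    {"son clave", "rumba clave"},
    {"son clave", "rumba clave", "waltz"},
    {"son clave", "rumba clave"},
    {"waltz"},
]

counts = tag_cooccurrence(collections)
print(related_tags("son clave", counts))
```

With this data, someone searching for "son clave" would be pointed to "rumba clave" (three shared collections), while "waltz" falls below the cutoff.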

But with musical analysis of the audio, you would not need to have any social tags at all. You would just detect that a certain song had a rhythm. That rhythm happens to be son clave. And you would detect that another song has a rumba clave rhythm. And then you could do the recommendation, directly, based on the similarity of the rhythmic patterns.

In this way, you can get at many more attributes of the song than you will ever get through social tagging. With social tagging, there is a long tail of songs that only have a single tag. And many songs that are actually of the son clave rhythm will not be tagged as such. If you use musical analysis, you can label *every* song in your collection with its rhythm. Whether or not it has a tag. And then you can also have semantic similarity distances between tags, where the "waltz" rhythm would likely be very distant from the son clave, and the "rumba clave" rhythm would be very close.
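A toy illustration of such a distance, assuming the onset patterns have already been extracted from the audio (which is the hard part and is not shown). The patterns below are hand-written one-bar versions of the 3-2 son clave, the 3-2 rumba clave, and a plain three-beat waltz bar. The 48-step grid and the "count the differing steps" distance are assumptions chosen for simplicity, not an established rhythm-similarity measure.

```python
from fractions import Fraction

# Onset positions as fractions of one bar (hand-written, not detected from audio).
PATTERNS = {
    "son clave":   [Fraction(k, 16) for k in (0, 3, 6, 10, 12)],
    "rumba clave": [Fraction(k, 16) for k in (0, 3, 7, 10, 12)],
    "waltz":       [Fraction(k, 3) for k in (0, 1, 2)],
}

def to_grid(onsets, steps=48):
    """Map onsets to step indices on a grid that fits both 16ths and triple metre."""
    return frozenset(int(onset * steps) for onset in onsets)

def rhythm_distance(a, b, steps=48):
    """Number of grid steps where exactly one of the two patterns has an onset."""
    return len(to_grid(a, steps) ^ to_grid(b, steps))

son = PATTERNS["son clave"]
for name, pattern in PATTERNS.items():
    print(name, rhythm_distance(son, pattern))
```

Under these assumptions the rumba clave differs from the son clave in a single displaced onset (distance 2), while the waltz bar shares only the downbeat (distance 6).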

Does that make sense?

jeremy said...

Also solutions for related problems such as cover song detection have been around for a while now.

FWIW, I think cover song detection solutions are trying to solve a different problem.

Take my own work from ISMIR 2002, using chord sequence similarities. One thing that I applied it to was the problem of detecting, not covers, but composed variations. (Michael Casey has a paper about the continuum between pure covers, through variations, on to generic similarity. Variations are not covers.)

What was most interesting about that work, however, was not necessarily the "true" variations that were found. What was interesting were the songs that were not necessarily composed as variations, but that had very similar harmonic progressions. Some Mozart pieces retrieved some plausible Beatles pieces, for example. You could really hear similar chord progressions.

So chord progressions could be used, along with rhythmic patterns, along with timbral features, to aid the recommendation process.

Certain types of music really are characterized by chord progressions, e.g. 12 bar blues. Jazz often has a II-V-I progression, for example. So that information could be used to help infer song similarities, which is something that you do not get from either folksonomic tags or Pandora-level human music content tagging.
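As a rough illustration of matching progressions regardless of key (not the method from the ISMIR 2002 work mentioned above), the sketch below assumes chords have already been transcribed into hypothetical (root, quality) pairs. It looks at the semitone steps between successive roots, so a II-V-I looks the same in every key: a minor chord, then a dominant chord a fourth up, then a major chord a further fourth up.

```python
PITCH_CLASS = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "G": 7, "Ab": 8, "A": 9, "Bb": 10, "B": 11,
}

def root_intervals(chords):
    """Transposition-invariant view: semitone steps between successive chord roots."""
    roots = [PITCH_CLASS[root] for root, _quality in chords]
    return [(b - a) % 12 for a, b in zip(roots, roots[1:])]

def find_ii_v_i(chords):
    """Start indices of minor -> dominant -> major runs whose roots rise by a fourth twice."""
    steps = root_intervals(chords)
    hits = []
    for i in range(len(chords) - 2):
        qualities = [quality for _root, quality in chords[i:i + 3]]
        if steps[i] == 5 and steps[i + 1] == 5 and qualities == ["min", "dom", "maj"]:
            hits.append(i)
    return hits

# Hypothetical transcriptions; the quality labels "min", "dom", "maj" are assumed.
in_c = [("C", "maj"), ("D", "min"), ("G", "dom"), ("C", "maj")]
in_f = [("G", "min"), ("C", "dom"), ("F", "maj")]

print(find_ii_v_i(in_c), find_ii_v_i(in_f))
```

The same interval view could be compared across whole songs to look for shared progressions, though real chord transcriptions from audio would be far noisier than these hand-written ones.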

So, as noted, there are some techniques that have used features like chords to do cover song detection. But that's not what makes chords interesting. What makes them interesting is that you can use them to find variations as well as genre-similar songs. And as far as I know, no one is really doing that on a large scale yet. Which is why I think it would be an interesting project for FP7.

Elias said...

Jeremy,

First of all sorry for the delayed response. I found your comments very interesting, but I've been a bit busy.

I totally agree with you: there should be more research on what you call "Pandora-like music similarity" using audio analysis. The EU should be funding research in that direction.

But I think it would need to be funded with a longer-term perspective. Many of the important features that are used in Pandora's impressive Music Genome Project are far from trivial to extract from audio. You mention rhythm as an example where reaching an acceptable accuracy might be possible in the near future. I hope you're right, but I remain skeptical. (I'm aware that ballroom dance music has been classified successfully. But beyond that I don't remember having seen anything that would indicate that rhythm similarity, or even just classifying rhythms in general, is something that will be solved in the next 3 years.)

As far as I'm concerned, the EU should dump a significant amount of money into the labs of research institutes that have demonstrated their ability to do interesting research, and give them at least the time it takes to find the right PhD candidates to work on the topics, and the time it takes to complete a PhD (i.e. at least 4 years). Two-year projects targeted at ripe opportunities don't necessarily seem like the best context for such research to me.

You make a really good point about scalability. I think whenever the EU decides to fund MIR research, they should also provide the necessary funding for the researchers to build up an environment that allows them to run large scale experiments. (I.e. give them the money to have access to large collections of music, and money to build up the clusters they'll need to crunch the data).