MIR Research: 2007

Tuesday, 11 December 2007

London Calling

Last.fm is expanding! We are seeking over 20 people to join in on the fun. The complete list is here.

And one of the open positions is related to MIR research! :-)

Btw, have I mentioned that working at Last.fm has been so much fun that I've totally forgotten about updating this blog? We get free organic fruit, we've got a table tennis table and a table soccer table, we get enough data every day to keep our Hadoop clusters busy for hours analyzing it, we get to design and build the next generation of music recommenders and contribute to lots of other fun projects at Last.fm, we've got users supplying us with instant feedback on any changes we make, and we've also got the Best American pizza place and the Best kebab just around the corner (they actually mention they are the best in their name, so it must be right).

See also my previous post about MIR research at Last.fm.

Somewhat related: Christopher Raphael just posted a tenure-track position at Indiana University. Unfortunately no URL was given, but I guess the position only targets people who have already subscribed to music-ir. It's nice to see so many opportunities in MIR.

Monday, 1 October 2007

Life after ISMIR 2007

As usual after an ISMIR I’m totally burnt out. The program was intense. My brain is still trying to absorb all the interesting conversations, ideas, results, and theories that are echoing in my ears. Some of the things I saw made me realize how close we are to reaching some of the scenarios we’ve been talking about in the last 5 years (for example, Klaas Bosteels demonstrated a content-based playlist generator that learns from user feedback, which he implemented on his tiny MP3 player). Other things made me reconsider assumptions I’ve made in the past (for example, Hamish Allan et al. presented interesting work on music similarity: "Methodological Considerations in Studies of Musical Similarity").

It was also wonderful to have the opportunity to spend time with people that seem very familiar although I only see them once every one or two years. It’s strange how reading papers can make the authors seem so familiar.

One of the many highlights of ISMIR 2007 was when Don Byrd announced the locations of future ISMIRs. I was particularly happy to hear that ISMIR 2009 will be organized by Masataka Goto et al. (in Japan!). However, ISMIR 2008 organized by Youngmoo Kim, Dan Ellis, and Juan Bello et al. (US) and ISMIR 2010 organized by Frans Wiering et al. (Netherlands) will surely be great, too. It’s good to see ISMIR, for the first time in its history (afaik), planned out 3 years in advance. And I heard some rumors that ISMIR 2011 will be held in Vienna again because it was such a huge success ;-)

Tuesday, 25 September 2007

ISMIR Highlights

ISMIR is only halfway through and I can’t believe how many interesting things I’ve already missed. I guess it’s unavoidable given parallel sessions and so many poster presentations in limited time. Nevertheless, my brain is already overflowing and I feel burnt out. In fact, there were so many interesting presentations that I don’t have enough time to write all of them down (although it would be a great way to remember them).

One thing that might have a very high impact is that IMIRSEL is just about to launch their online evaluation system. Researchers will be able to submit their newest algorithms and find out how well they do compared to others. I think having an evaluation system like that is actually worth a lot to the whole community, and I can well imagine that (once all issues are solved) research labs will have something like a paid subscription which allows them to use a certain amount of CPU time on IMIRSEL’s clusters. However, to be truly successful they’d need to be 100% neutral and transparent. (Which I think means they shouldn’t have IMIRSEL show up in their rankings, and they should clarify how One Llama is linked to IMIRSEL.)

I also liked the poster Skowronek, McKinney, and Van de Par presented (“A Demonstrator for Automatic Music Mood Estimation”). They allowed me to test their system with one of my own MP3s which I had on my USB drive (I used a song by Roberta Sá) and it did really well. Another demo I liked a lot was the system Peter Knees presented (“Search & Select – Intuitively Retrieving Music from Large Collections”). Unfortunately I was asked to leave after I had been playing around with the demo for a bit too long, I guess. Ohishi, Goto, Itou, and Takeda (“A Stochastic Representation of the Dynamics of Sung Melody”) showed me some videos which I thought were simply amazing. Apparently it isn’t hard to compute them (once you know how to extract the F0 curve), but I’ve never seen the characteristics of a singing voice visualized that way. The demo of Eck, Bertin-Mahieux, and Lamere (“Autotagging Music Using Supervised Machine Learning”) was really impressive too… and it was interesting to learn that Ellis (“Classifying Music Audio with Timbral and Chroma Features”) found ways to use chroma information to increase artist identification performance. (And his Matlab source code is available!!) I once worked on a similar problem, but never got that far. Btw, it seems that chroma is everywhere now :-)

I was also happy to see that Flexer’s poster (“A Closer Look on Artist Filters for Musical Genre Classification”) was receiving a lot of attention. I liked his conclusions. There were also lots of interesting papers in the last two days. For example, I liked the paper presented by Cunningham, Bainbridge, and McKay (“Finding New Music: A Diary Study of Everyday Encounters with Novel Songs”). I particularly liked their discussion on how nice it would be to have a “scrobble everywhere” device that keeps track of everything I ever hear (including ring tones).

Beatles Chord Transcriptions

Chris Harte from the C4DM announced last night that he completed his amazing effort of transcribing the chords for all songs on the 12 studio albums of the Beatles. The transcriptions are extremely high quality. Anyone who wants a copy just needs to contact him. This will definitely boost research in any direction related to chords (chord recognition, chord progressions, harmony analysis...). It's also a good excuse for any research lab to buy the complete Beatles collection. Btw, don't forget to cite his work when you use his annotations! ;-)

Below are excerpts from the two emails he sent to the music-ir mailing list, so that Google can index them (afaik he hasn’t set up a website for this yet).

(Chris' email is christopher dot harte at elec dot qmul dot ac dot uk).

Chris Harte wrote in his first email (Sep 24, 2007, 9pm):

[...] I have just completed work on the full set of chord transcriptions for the beatles songs from all 12 studio albums.

The verification was done by synthesizing the transcriptions in MIDI then putting that back together with the original audio (with correct timing and tuning) so that people could spot any errors by listening through to them.

Hopefully, after the verification process that we have just completed, these transcriptions should now be accurate enough to serve as a ground truth for various kinds of chord and harmony work in the MIR field.

If you would like a copy of the new version of the collection then please let me know and I will send them to you. [...]

Chris Harte wrote in his second email (Sep 25, 2007, 4am):

[...] To clear up a few things:

The transcription files are in wavesurfer ".lab" format which is just flat text arranged like this:

Start-time end-time label
Start-time end-time label
Start-time end-time label
...

Times are in seconds.

".lab" files can be opened as a transcription pane in wavesurfer (I have made a wavesurfer conf file set up for showing these transcriptions nicely if people need one) and also in Sonic Visualiser as an annotation layer (use "A point in time" for the "each row specifies" option when loading in sonic visualiser).

The chord symbols used in the transcriptions basically conform to the syntax described in our ISMIR 2005 paper "Symbolic Representation of Musical Chords: A Proposed Syntax for Text Annotations" available here:
http://ismir2005.ismir.net/proceedings/1080.pdf

There has been one slight change to the syntax described in this paper which is that now a chord symbol, which is defined as a root and a list of component degrees, should not automatically be assumed to include the given root note unless the '1' degree is explicitly included in the list
- e.g. C major can be written C or C:maj which are both equivalent to writing C:(1,3,5) so the "major" pattern should be (1,3,5) instead of just (3,5). This makes it possible to annotate a chord where it is obvious that the intended harmony is C major even though only the notes E and G are present by using C:(3,5). I hope that makes sense...

For those who do not already know, I have written a set of tools for manipulating these chord symbols in matlab (they don't use any toolkits so I guess they should also work fine in Octave) - if you would like a copy of those then let me know. There will be an updated version of these tools available soon as well.

For more information on the chord symbols, chord tools and transcription process, my long awaited (long awaited by me at any rate...) PhD thesis will include a whole chapter about it all. I hope to submit the thesis sometime around christmas this year. [...]
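
For anyone who wants to start playing with the files right away, here is a minimal Python sketch (mine, not part of Chris' emails or his Matlab tools) of how the format he describes could be read: one "start-time end-time label" row per line, times in seconds. The names `parse_lab` and `expand_shorthand` are my own invention, and the shorthand handling only covers the one case spelled out in his email (a bare root or `:maj` standing for the degrees 1, 3, 5). Everything else is passed through untouched.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    label: str    # chord symbol, e.g. "C:maj"


def parse_lab(text: str) -> List[Segment]:
    """Parse rows of 'start-time end-time label' (whitespace separated)."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        start, end, label = line.split(None, 2)
        segments.append(Segment(float(start), float(end), label))
    return segments


# A root (A-G plus optional sharps/flats), optionally followed by ":maj".
# Assumption: this is the only shorthand handled here.
_PLAIN_MAJOR = re.compile(r"([A-G][#b]*)(:maj)?")


def expand_shorthand(chord: str) -> str:
    """Rewrite 'C' or 'C:maj' as 'C:(1,3,5)'; leave any other symbol as is."""
    match = _PLAIN_MAJOR.fullmatch(chord)
    return f"{match.group(1)}:(1,3,5)" if match else chord


# Made-up rows for illustration, not taken from the actual transcriptions.
example = "0.0 1.5 C\n1.5 3.0 G:maj\n3.0 4.0 C:(3,5)\n"
for seg in parse_lab(example):
    print(seg.start, seg.end, expand_shorthand(seg.label))
```

Note that `C:(3,5)` stays as it is: following the changed syntax, it describes a C major harmony where only E and G actually sound, so it must not be expanded to include the root.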

Sunday, 23 September 2007

One Llama, IMIRSEL, MIREX

One of the interesting things I learned in the recommendation tutorial today is that IMIRSEL launched a startup called One Llama. Seems like they have some ideas on how to make money with MIR technologies. I wonder how many of the MIREX participants were aware of this before submitting their latest implementations to IMIRSEL.

ISMIR Highlight: Recommendation Tutorial

Paul Lamere and Oscar Celma did a wonderful job presenting the recommendation tutorial. I wouldn't be surprised if this turns out to be my personal highlight of ISMIR 2007. They presented an overview of all the standard techniques used for recommendations, they talked about the typical (and unsolved) problems recommenders face, they had plenty of examples, and they also presented results from an evaluation of recommenders. The parts I personally liked best were the in-depth analysis of tags and folksonomies, the part they called "novelty and relevance" (with interesting ideas on how to reach deeper into the long tail), the analysis of artist similarity networks, and the evaluation of recommenders. They also made an interesting point about how nice it would be to have something like a Netflix competition for music recommendation. I'm guessing the slides of the tutorial will be online soon. I highly recommend having a look at them ;-)

I only attended the recommendation tutorial, but I've been told the other tutorials were also really well done. Seems like this year's ISMIR is not only the best in terms of number of papers submitted, number of people attending, and location, but also in terms of content! ;-)

Btw, Paul's blogging about ISMIR, in case you haven't noticed yet. And a number of pictures tagged ismir2007 have already been uploaded to Flickr.

Saturday, 22 September 2007

Fun things to do with fingerprinting

Erik Frey just posted some fun things he did using Last.fm's fingerprinter. For example, it's really easy to find out if an artist released "live" versions that are identical to the studio version except that some cheering has been added. In terms of false positives and fingerprinting he raises some interesting questions.

Friday, 21 September 2007

The most frequently cited ISMIR paper

I just did a quick Google Scholar search to find the most frequently cited ISMIR paper. I'm not sure if I missed any, but it seems the most frequently cited paper is "Mel Frequency Cepstral Coefficients for Music Modeling" (PDF) presented in 2000 by Beth Logan. According to Google Scholar it has been cited 127 times as of today. My coauthors and I have cited that paper several times :-)

MFCCs were originally developed in the speech processing community. Back in 2000 it wasn't obvious if the same techniques could just be "copied and pasted" to music information retrieval. MFCCs are now a standard technique used to compute music similarity, classify genres, identify instruments, segment music, ... In fact, today MFCCs are so common that they are often mentioned in ISMIR papers without citing a source.

Thursday, 20 September 2007

One evening and no testing

It’s been a long and busy day, and it’s taken me a while to go through the flood of emails that landed in my inbox today. Several of those were related to a singing microwave which seems to be at the height of its career.

Another interesting email I found in my inbox explains how the system that scored highest in several MIREX 2007 tasks was built:

“The system was not tuned - in fact it was not tested on any dataset (from the competition or otherwise) beyond making sure it was outputting feature values into its feature files and was in fact cobbled together in one evening.”

Something that has never been tested before, and sounds like a preliminary prototype, outperformed them all. Since it hasn't been tweaked yet, the system probably has very good potential to generalize, and can probably be tweaked easily to add at least another 1 or 2 percentage points of accuracy to the genre classification results. That's pretty impressive.

Talking about MIREX, I would like to add the following to clarify things I have written in a previous blog post:

I highly value MIREX; it's a driving force behind advances in MIR. I've personally learned a lot from it.

I understand that IMIRSEL has sacrificed a lot to make MIREX happen. It's been an amazing effort organized by Stephen Downie and his team.

I'm sorry my comments on the conflicts of interest issues have been perceived as personal attacks. That was not my intention.

I realize that my previous blog post on the topic should have clearly stated that I am (and always have been) fully convinced that no one at IMIRSEL had the intention to cheat. I have absolutely no doubts about that.

However, I'm still fully convinced that IMIRSEL submissions should not be listed in the same ranking as the submissions of others.

Tuesday, 18 September 2007

Marsyas & Music Classification

One of the most interesting things I found in the MIREX results so far has been the good performance of George Tzanetakis in different categories. He scored highest in mood classification, and did well in the other classification tasks.

Since George is well known to have published some of the most frequently cited papers on music classification, this isn't really interesting news. However, what's really interesting is that George did so well despite using Marsyas. Marsyas is open source and has been around in the MIR community for as long as I can remember. At the ISMIR 2004 and MIREX 2005 evaluations Marsyas didn't do too well (although, afaik, it's always been by far the fastest implementation). Perhaps as a result, I've recently been seeing fewer papers on genre classification using Marsyas as a baseline. But given the excellent performance this year, I think it's fair to say it has re-established itself as the baseline for any new music classification algorithm. In fact, it has done so well that I doubt we will see any papers in the near future which can report significant gains compared to this solid baseline. (Btw, never forget to use an artist filter when evaluating genre classification performance!)
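
In case the artist filter is new to you: the idea is simply that all tracks by one artist end up either in the training set or in the test set, never in both, so that a classifier can't score points by recognizing the artist instead of the genre. Here is a minimal Python sketch, assuming tracks are given as (track_id, artist, genre) tuples. The function name and the way the split is drawn are my own choices for illustration, not something Marsyas or MIREX prescribes.

```python
import random
from typing import List, Tuple

Track = Tuple[str, str, str]  # (track_id, artist, genre)


def artist_filtered_split(
    tracks: List[Track], test_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[Track], List[Track]]:
    """Split tracks so that no artist appears in both train and test."""
    # Shuffle the artists (not the tracks) and reserve a share of them
    # for the test set.
    artists = sorted({artist for _, artist, _ in tracks})
    rng = random.Random(seed)
    rng.shuffle(artists)
    n_test = max(1, round(len(artists) * test_fraction))
    test_artists = set(artists[:n_test])

    train = [t for t in tracks if t[1] not in test_artists]
    test = [t for t in tracks if t[1] in test_artists]
    return train, test
```

One caveat of this simple version: because it splits by artist and not by track, the resulting test set can be larger or smaller than `test_fraction` of the tracks, and the genre proportions are not preserved.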

Overfitting and MIREX

IMIRSEL (the organizer of MIREX) hasn't officially responded yet to the conflicts of interest of organizing a non-transparent evaluation and at the same time participating in it. What I've heard from others is that they don't see any problems with it.

Btw, has anyone else noticed that they won in every classification category where overfitting is a big issue? However, in a very related category (mood classification) where overfitting isn't an issue (thanks to a human component in the evaluation) they were outperformed by several others.

Furthermore, IMIRSEL never had their name put down on the list of potential candidates. Given the lack of transparency of the respective MIREX tasks I think this is something every participant should have known before submitting their work. Btw, so far it isn't even known who the researchers are who actually did the work. AFAIK, no entry so far in the history of ISMIR evaluations has been submitted without mentioning who the authors are.

Btw, as of now, IMIRSEL are the only ones in the genre classification task who haven't published an abstract (describing what their algorithm does and how it was optimized) yet. (They also haven't submitted one yet for the other tasks they won in.)

UPDATE:
Regarding anonymous MIREX submissions, I just remembered that the ISMIR 2004 evaluation hosted by MTG allowed anonymous submissions... and some authors did choose to do so. (However, as I already mentioned in the comments of this post: MTG clearly stated that they did not participate in the tasks they organized to avoid any conflict of interest.)

Monday, 17 September 2007

Vocaloid 2 is a big hit in Japan

Vocaloid 2 is Yamaha software that sings. The $100 software seems to be a big hit now in Japan. Read more about it here and here.

Congratulations to MTG who contributed largely to the development of Vocaloid! Seems like we're a lot closer now to having 5 billion new (anime) songs per week flood the MIR universe.

(Thanks Norman for the pointer!)

MIREX Results Online!

The MIREX results just got posted by Stephen Downie. Interestingly, the organizers scored highest in a number of categories. To be honest, if I were a participant in a task like genre classification I’d be a bit suspicious. (Knowing the distribution of the genres beforehand can be a huge advantage when designing an algorithm.)
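
To illustrate why the genre distribution matters, here is a toy Python sketch. The genre counts are invented for the example and have nothing to do with the actual MIREX collection. A "classifier" that ignores the audio completely and always answers with the most frequent genre already reaches an accuracy equal to that genre's share of the data.

```python
from collections import Counter
from typing import List


def majority_baseline_accuracy(labels: List[str]) -> float:
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(labels)


# Invented, skewed distribution: 50 rock, 30 jazz, 20 classical tracks.
labels = ["rock"] * 50 + ["jazz"] * 30 + ["classical"] * 20
print(majority_baseline_accuracy(labels))  # 0.5
```

The more skewed the distribution, the more there is to gain from biasing decisions towards the frequent genres, which is exactly the kind of knowledge other participants don't have.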

Congratulations to Tim Pohle and Dominik Schnitzer (two very clever PhD students I once worked together with in Vienna) who scored highest in the audio similarity task. I wouldn’t be surprised if they also had one of the fastest implementations. Tim also scored second highest last year in the same task. And Dominik recently made the results of his Master’s thesis available (open source playlist generation).

Congratulations also to Joan Serrà and Emilia Gomez (a former SIMAC colleague) who scored highest in the cover song identification task.

And congratulations to everyone who participated and the organizers for managing to complete all the tasks before ISMIR!

Listening

The best thing about working in MIR research is that it’s part of the job to spend lots of time listening to music. Which makes me realize that I've been working very hard this weekend ;-)

I spent the last few hours listening to music I found while playing around with a recommender. It's one of those recommenders which takes one of my favorite tracks as input and returns a list of similar tracks. Seeing how amazing some of the recommendations are makes me wonder if I’ll ever again bother to browse lists of similar artists to find new music.

620, 10, 5

Michael Fingerhut announced today on the music-ir list that his complete list of ISMIR papers now contains 620 entries (including this year’s papers). That’s an impressive pile of papers that the ISMIR community has produced since 2000... Btw, there have been only 10 papers so far which contained “recommend” in the title. 5 of those will be presented this year... The fact that there will also be a tutorial on recommendation, and that I've mostly been blogging about recommendations recently, makes me wonder if recommendation is about to establish itself as one of the core topics of MIR.

Monday, 10 September 2007

Good Recommendations (2)

Inspired by Paul’s ongoing evaluation I tried my own tiny little evaluation.

As a seed I used Le Volume Courbe. I recently stumbled upon Charlotte while browsing the Last.fm music profiles of friends.

I wanted to find more of the same (unfortunately she only recorded one album), and the most obvious place to start was the Last.fm similar artist list. There’s lots of good music there, but nothing that I enjoyed as much. I also browsed the top listener profiles, and profiles of people who commented on the Last.fm page for Le Volume Courbe. Again I found lots of great music, but not really more of the same.

The next obvious stop was Pandora, but they had never heard of Le Volume Courbe before. So I tried iLike but they didn’t know about any similar artists and ZuKool couldn’t help either. MyStrands had a long list, but after sampling the first two on the list I had the impression that they were pointing me in the wrong direction (too much towards electronic music). Amazon had some interesting recommendations (first time I heard about shoegaze) but not really more of the same. And finally my flatmate recommended some great and related music, but also not really more of the same.

So my preliminary verdict is: either there isn't more of the same out there, or the music recommendation services I tried need to be improved.

UPDATE: I just had a look at the AMG similar artist list. There are some interesting recommendations there, some of which I had already stumbled upon while browsing similar artists on Last.fm, but still nothing that's truly more of the same.

UPDATE Part 2: I just tried the AMG Tapestry Demo suggested by Zac in the comments. It's a lot more convenient than browsing the AMG pages, and it's similar to Last.fm's similar artist radio stations (except that it only has 30-second previews). There were some recommendations on the list that I appreciated (and hadn't found in the AMG list of similar artists). However, somehow the recommendations seem to be missing some of the "darkness" I like about Le Volume Courbe. Anyway, it's great to have so many nice ways of exploring similar artists.

UPDATE Part 3: I just sampled some of the artists on the list Paul posted in the comments. Whatever system he's using, it's doing a great job of surfacing very unknown artists, some of which are hardly even present on Last.fm, and most of which seem to be present on MySpace (which makes me wonder if his recommendation machine is gathering information from there?). Again, I failed to find more of the same. However, a number of the recommendations were related (in particular, some were related with respect to the lo-fi, singer-songwriter, DIY aspects I like about Le Volume Courbe), and since none of the recommendations in his list had shown up in any of the previous recommendations I had seen (at least as far as I can remember) it was rather refreshing to hear them. It's really nice to see a recommendation machine that has such a strong emphasis on surfacing rather unknown artists.

UPDATE Part 4: Oscar suggested in the comments that I try the Hype Machine, which I did. I found some interesting comments about Le Volume Courbe there, but didn't really find more of the same. I also tried the much hyped SeeqPod, but they not only failed to find music related to Le Volume Courbe, but also gave some not so trustworthy recommendations for Mozart. Others I tried that didn't have any results were musicmatch and musicplasma.

UPDATE Part 5: Ian mentioned that ZuKool now has Le Volume Courbe in their catalog. I gave it a quick try by adding all the songs from the album into the list (because I didn't like how when I'd only choose one track all other tracks from the same album would show up in the recommendation list). The results were as refreshing as those from Paul's list (none of them I had previously seen in a recommendation list) and they were interesting to listen to. However, I couldn't find more of the same. Btw, getting Celeste Zepponi's "Jesus Is Here" recommended when searching for similar music to Le Volume Courbe suggests that ZuKool completely ignores socio-cultural information, which makes it an interesting alternative to all other music recommenders I use.

Saturday, 8 September 2007

Good Recommendations

Paul launched a very interesting survey on music recommendations. The results will be presented at their tutorial in 2 weeks at ISMIR and I'm sure presentation slides will be available online after that. I highly recommend participating :-)

Trying to answer the questions I realized how difficult it can be to recommend music given just one artist. Would it be good to recommend someone a rather unknown (= not so popular) artist when they are looking for something similar to an extremely popular artist (like The Beatles)? Or would it be better to recommend similar artists which are also very popular? Btw, in the case of The Beatles, would it really make sense to recommend John Lennon and other members of the group? And if someone is looking for music similar to a not so well known artist, would it make sense to recommend similar but popular artists? Or is it safe to assume that this person already knows these?

Evaluating recommendations is another very interesting topic… and I’m very curious what the outcomes of Paul’s survey will be.

Friday, 31 August 2007

Researchers far ahead of industry

Delete the filter and other playlist generation tools you might have installed on your system, here comes Dominik's open source solution.

Thanks Klaas for the link!

Thursday, 30 August 2007

MIR Related Books

Here are two books I’m currently reading and which I would like to recommend.

Last night a DJ saved my life
The history of the disc jockey
Bill Brewster and Frank Broughton
Website

I’m still in the late 70s, but it’s been a wonderful voyage through time so far. I think anyone working on DJ-ing algorithms might enjoy this. The book starts out with the impact recorded music had, the first broadcasts, and then dives very deep into the role of the DJ in terms of discovering and distributing rare records, breaking new records, creating new styles of music, and enabling others to have a great time. The book isn’t structured like a research paper, but it’s very well researched.

Net, Blogs and Rock ‘n’ Roll
How digital discovery works and what it means for consumers, creators and culture
David Jennings
Blog, Last.fm Profile

I just started reading this one, but it’s been fun so far. Sometimes it feels a bit like all of the Music 2.0 hype words thrown into a mixer, but beyond that there's lots of interesting information in the book about everything related to Music 2.0. This book might be a great source of inspiration for project proposals or motivations/introductions for research papers on MIR technologies that could somehow fit under the Music 2.0 umbrella.

Most of all I love contrasting the two books with each other. The two worlds couldn’t be more different and yet they are somehow about the exact same thing.

Wednesday, 29 August 2007

Fingerprinting and Music Recommendation

It's a huge step forward for Last.fm:
Audio Fingerprinting for Clean Metadata

Soon Last.fm will stop recommending Roberta Sa because I've listened to Roberta Sá. And once Last.fm's catalog is cleaned up, all Last.fm users will be able to use our massive metadata catalog to fix and extend the tags of their own MP3 collections. Furthermore, the quality of recommendations, neighbors, the radio stations, and just everything should gradually improve as a nice side effect of this major cleanup.
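
A rough idea of how fingerprints can help clean up metadata, as a Python sketch: if every submitted track carries a fingerprint ID, then all spellings attached to the same ID refer to the same recording, and the most common spelling can be taken as the canonical one. This is only my illustration with made-up IDs and counts. It is not a description of the actual Last.fm pipeline, and `canonical_artists` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def canonical_artists(scrobbles: Iterable[Tuple[int, str]]) -> Dict[int, str]:
    """Map each fingerprint ID to its most frequently submitted artist name."""
    names_by_fingerprint = defaultdict(Counter)
    for fingerprint_id, artist in scrobbles:
        names_by_fingerprint[fingerprint_id][artist] += 1
    # Ties go to the spelling that was seen first.
    return {
        fingerprint_id: names.most_common(1)[0][0]
        for fingerprint_id, names in names_by_fingerprint.items()
    }


# Made-up submissions: (fingerprint ID, artist name as tagged by the user).
scrobbles = [
    (1001, "Roberta Sá"),
    (1001, "Roberta Sa"),
    (1001, "Roberta Sá"),
    (1002, "Le Volume Courbe"),
]
print(canonical_artists(scrobbles))
```

With the mapping in place, "Roberta Sa" and "Roberta Sá" collapse into one artist for the recording with ID 1001.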

Btw, the fingerprint extraction code is open source :-)

Monday, 27 August 2007

Exciting times for music recommendations

The Filter recently secured another round of financing worth USD 5 million. Not too long ago MyStrands secured a massive USD 25 million. And today's job announcement on the music-ir mailing list sounds like BMAT has started to build their own social music recommendation web site, too. Btw, wouldn't it make sense for BMAT and MyStrands to work together? Anyway, that's just a tiny sample of all the companies working on music recommendations; some of the startups seem very promising.

There’s still a very long way to go. But with every step forward, music listeners will find it easier to discover new artists. And artists will find it easier to find an audience. (Btw, if you use Last.fm you might notice some larger steps forward in the next months.)

It's amazing how much has happened since 2001 (when I finished my Master's thesis on a related topic). It’s fun to be working in such a dynamic environment. And it's never been easier to discover amazing music.

Sunday, 26 August 2007

ISMIR: Short List of Papers

I just compiled my short list of papers I don’t want to miss at ISMIR 2007 which starts in about 4 weeks. Of course I’m interested in all papers, but if I run out of time while exploring posters, or need to choose between different sessions, I’ll prefer the ones listed here.

Fuzzy Song Sets for Music Warehouses
To be honest, this is just on the list because given the title I don’t have the slightest clue what this paper is about. I know what fuzzy sets are thanks to Klaas. I’m guessing that a music warehouse is a synonym for a digital library of music. I wonder if the second part of the title got lost?

Music Clustering with Constraints
Another title that puzzles me. Seems like titles have been cut off a lot. They forgot to mention according to what they are clustering the music. Number of musical notes in a piece? AFAIK, most clustering algorithms have some form of constraints. For example, in standard k-means the number of clusters is constrained. When using GMMs it is very common to constrain the minimum variance of an individual Gaussian. Anyway, I’m into clustering algorithms, so this could be an interesting presentation.

Sequence Representation of Music Structure Using Higher-Order Similarity Matrix and Maximum-Likelihood Approach
The author of this one has done lots of interesting stuff in the past. I’m curious what he’s up to this time. Music structure analysis is definitely something very interesting that could be very useful in many ways.

Algorithms for Determining and Labelling Approximate Hierarchical Self-Similarity
Again, at least one of the authors has done very interesting stuff in the past and I’m really interested in music structure analysis.

Transposition-Invariant Self-Similarity Matrices
I’m only guessing but this one could be about self-similarity with respect to melody. (I’m guessing that the previous 2 are focusing on self-similarity with respect to timbre or chroma.) Melodic similarity is a lot harder than timbre similarity. I’m curious how they did it.

A Supervised Approach for Detecting Boundaries in Music Using Difference Features and Boosting
If I miss this presentation I might upset my coauthors ;-)

Automatic Derivation of Musical Structure: A Tool for Research on Schenkerian Analysis
I had to Google Schenkerian. It sounds interesting.

Improving Genre Classification by Combination of Audio and Symbolic Descriptors Using a Transcription System
I’m very curious what kind of symbolic descriptors the authors used. Note density? I’ve seen lots of work on audio-based genre classification, and some work on using MIDI (which is usually referred to as symbolic information, but the authors could also mean something very different by symbolic). I’m pretty sure I’ve read at least one article on the combination of audio and MIDI information, but I don’t think I’ve ever seen anyone actually succeed. I’m curious what results the authors got, and I hope they used an artist filter.

Exploring Mood Metadata: Relationships with Genre, Artist and Usage Metadata
Let me guess: pop is usually happy and upbeat, and death metal is rather aggressive? :-) I wonder though what usage metadata is (if people listen to it while driving their cars, working, jogging etc?).

How Many Beans Make Five? The Consensus Problem in Music-Genre Classification and a New Evaluation Method for Single-Genre Categorisation Systems
Single-category classification? I think I’m good at that ;-) (Yes, I know that with single they mean binary classification.) Anyway, I’m curious what the authors say about genre classification and consensus. The authors probably have a very different perspective than I do.

Bayesian Aggregation for Hierarchical Genre Classification
I hope they either compare it to existing techniques, or use evaluation DBs that have been used previously. And I hope they used an artist filter. I’m very curious though what they aggregated.

Finding New Music: A Diary Study of Everyday Encounters with Novel Songs
If I had a very, very short list of papers I wouldn’t want to miss, then this would be on it :-)

Improving Efficiency and Scalability of Model-Based Music Recommender System Based on Incremental Training
Made in Japan, what else is there left to say? ;-)
This would also be on the very, very short list of presentations I wouldn’t want to miss.

Virtual Communities for Creating Shared Music Channels
I’m guessing that this could be really interesting, but I wish the title was more specific. Under the same title one could present, for example, how Last.fm groups and their group radio stations work, or how people get together on Last.fm to tag music to create their own radio stations.

MusicSun: A New Approach to Artist Recommendation
Another title that’s missing lots of information, nevertheless, I won’t skip this one.

Evaluation of Distance Measures Between Gaussian Mixture Models of MFCCs
I’m curious which approaches they tested and how and what their conclusions are.

An Analysis of the Mongeau-Sankoff Algorithm for Music Information Retrieval
Another title that sent me to Google. This time there were only 15 results, none of which did a good job in explaining it to me. Anyway it has MIR in the title, so I think I should have a look.

Assessment of Perceptual Music Similarity
Sounds like a follow-up of the work they presented last year. I’m very curious. I hope they got more than 2 pages in the proceedings. I’d love to read more on this topic.

jWebMiner: A Web-Based Feature Extractor
Sounds like there’s more great software from McGill for everyone to use.

Meaningfully Browsing Music Services
I’ve seen a demo that included Last.fm, so I really can’t miss this one.

Web-Based Detection of Music Band Members and Line-Up
Personally I would be tempted to just use MusicBrainz DB for that. I wonder how much more data the authors could find by crawling the web in general.

Tool Play Live: Dealing with Ambiguity in Artist Similarity Mining from the Web
Artist name ambiguity is an interesting problem, I wonder what solution they are presenting.

Keyword Generation for Lyrics
I’m guessing these are keywords that summarize the lyrics? I wonder if they use some abstraction as well to classify, for example, a song as a love song.

MIR in Matlab (II): A Toolbox for Musical Feature Extraction from Audio
I use Matlab every day and I don’t think I’ve heard of this toolbox before. Sounds interesting.

A Demonstration of the SyncPlayer System
I think I saw a demo of this at the MIREX meeting in Vienna. If I remember correctly the synchronization refers mainly to synchronizing lyrics with the audio but it can do lots of other cool stuff, too.

Performance of Philips Audio Fingerprinting under Desynchronisation
I have no clue what desynchronisation is, but I know that fingerprinting is relevant to what I work on.

Robust Music Identification, Detection, and Analysis
This could be another paper on fingerprinting?

Audio Identification Using Sinusoidal Modeling and Application to Jingle Detection
More fingerprinting fun.

Audio Fingerprint Identification by Approximate String Matching
Seems like fingerprinting has established itself as a research direction again :-)

Musical Memory of the World – Data Infrastructure in Ethnomusicological Archives
It’s not directly related to my own work, but sounds very interesting.

Globe of Music - Music Library Visualization Using Geosom
A visualization of a music library using a metaphor of geographic maps? I’m curious how using a globe improves the experience.

Strike-A-Tune: Fuzzy Music Navigation Using a Drum Interface
I hope they’ll let me have a try :-)

Using 3D Visualizations to Explore and Discover Music
I believe I’ve seen this demo already, but I never got to try it out myself. I hope the waiting line won’t be too long.

Music Browsing Using a Tabletop Display
If the demo is interesting I’ll forgive them their not very informative title ;-)

Search&Select – Intuitively Retrieving Music from Large Collections
I like the author’s work. I’m very curious what he built this time.

Ensemble Learning for Hybrid Music Recommendation
It has the words music recommendation in the title, and the authors have done some interesting work in the past.

Music Recommendation Mapping and Interface Based on Structural Network Entropy
Another music recommendation paper, I’m guessing this one is about a certain MyStrand visualization. I’m particularly interested in the “structural network entropy” part.

Influence of Tempo and Subjective Rating of Music in Step Frequency of Running
My guess is that tempo has an impact and that this impact is even higher for music I like? But I wouldn’t expect the subjective rating to have a very high impact. I often notice how I start walking to the beats of music I hear even if I don’t like the music.

Sociology and Music Recommendation Systems
Another paper I’d put on the very, very short list :-)

Visualizing Music: Tonal Progressions and Distributions
Sounds great! I should check if they already have some videos online.

Localized Key Finding from Audio Using Nonnegative Matrix Factorization for Segmentation
I’m curious how the author used a nonnegative matrix factorization for this task. I’ve never used one, but I thought they are usually used for mixtures. However, segments (like chorus and instrument solos) are usually not best described as mixtures?

Invited Talk
Sounds like I’ll learn interesting things about copyright, creative commons, and other intellectual property issues involved in music information retrieval.

Audio-Based Cover Song Retrieval Using Approximate Chord Sequences: Testing Shifts, Gaps, Swaps and Beats
I mainly want to know what the author has been up to, but I’m also interested in cover song detection.

Polyphonic Instrument Recognition Using Spectral Clustering
I want to see this one too, but it’s at the same time as the previous paper. The papers use rather similar techniques and deal with rather similar problems. I don’t understand why they were put up to compete with each other. Something non-audio related would have been a much better counterpart.

Supervised and Unsupervised Sequence Modelling for Drum Transcription
I wonder how well their drum transcription works. I hope they have lots of demos.

A Unified System for Chord Transcription and Key Extraction Using Hidden Markov Models
Again a paper I really don’t want to miss but it’s at the same time as the one above. There are so many papers that don’t deal with extracting interesting information from audio signals that I absolutely don’t understand why they arranged this parallel session the way they did.

Combining Temporal and Spectral Features in HMM-Based Drum Transcription
I’m not sure if I’ll check out this one or the one below. Both are really interesting.

A Cross-Validated Study of Modelling Strategies for Automatic Chord Recognition in Audio
Sounds like they might have some interesting results.

Improving the Classification of Percussive Sounds with Analytical Features: A Case Study
I must see this one because I recently did some work on drum sounds. I’m curious if the authors include all sorts of percussive instruments (such as a piano) or if it’s drums mainly.

Discovering Chord Idioms Through Beatles and Real Book Songs
I’d love to see this one, too :-(
Don’t get me wrong: I fully support parallel sessions (there isn’t really an alternative given this many oral presentations) but unfortunately the sessions weren’t split in a way that would allow me to see everything I would like to see. Why not put chords and alignment parallel to each other?? To demonstrate my point I won’t list any papers of the alignment session.

Automatic Instrument Recognition in a Polyphonic Mixture Using Sparse Representations
Another strange thing about how the sessions were split is that one parallel session always ends 15 minutes earlier than the other one. Do the organizers expect that everyone from the other session runs to the other session? I’d prefer if all sessions would end at the same time and thus make it easier to find a group to go join for lunch. Anyway, sounds like an interesting paper.

ATTA: Implementing GTTM on a Computer
It’s been a while since I first heard a presentation on GTTM. I guess it’s about time to refresh my knowledge.

An Experiment on the Role of Pitch Intervals in Melodic Segmentation
I have no clue… but segments often have different “local keys”. The chords within keys are usually clearly defined. Each chord has specific pitch intervals… I wonder what experiment they did.

Vivo - Visualizing Harmonic Progressions and Voice-Leading in PWGL
A visualization!

Visualizing Music on the Metrical Circle
Another visualization :-)

Applying Rhythmic Similarity Based on Inner Metric Analysis to Folksong Research
I’m curious how they compute rhythmic similarity. I have seen a lot of work on extracting rhythm information, but haven’t seen much on computing similarities using it.

Music Retrieval by Rhythmic Similarity Applied on Greek and African Traditional Music
Another rhythmic similarity paper :-)

A Dynamic Programming Approach to the Extraction of Phrase Boundaries from Tempo Variations in Expressive Performances
A long time ago I did some work on segmenting tempo variations… I’m curious how they represent tempo (do they apply temporal smoothing?) and how well detecting phrase boundaries works given only tempo. (Why not use loudness as well?)

Creating a Simplified Music Mood Classification Ground-Truth Set
Sounds like this might also be related to the MIREX mood classification task.

Assessment of State-of-the-Art Meter Analysis Systems with an Extended Meter Description Model
I wonder how well state-of-the-art methods work for meter detection.

Evaluating a Chord-Labelling Algorithm
Chord detection is great.

A Qualitative Assessment of Measures for the Evaluation of a Cover Song Identification System
Cover song detection is great, too.

The Music Information Retrieval Evaluation Exchange “Do-It-Yourself” Web Service
Wow! I wonder if they will have a demo ready?

Preliminary Analyses of Information Features Provided by Users for Identifying Music
I have no clue what this one is about, but it’s probably MIREX related.

Finding Music in Scholarly Sets and Series: The Index to Printed Music (IPM)
One of the many things I know nothing about, but it sounds interesting.

Humming on Audio Databases
I wonder if they provide a demo, and if they can motivate people to use it. (It will probably be more fun listening to people sing than see if their system works.)

A Query by Humming System that Learns from Experience
Would be nice to have this one right next to the previous one.

Classifying Music Audio with Timbral and Chroma Features
Another one for the very, very short list. I’m curious how the author combined the features, and if he measured improvements, and if he did artist identification or genre classification (and if he used an artist filter if so).

A Closer Look on Artist Filters for Musical Genre Classification
Sounds like something everyone should be using :-)

A Demonstrator for Automatic Music Mood Estimation
I definitely want to see this demonstration.

Mood-ex-Machina: Towards Automation of Moody Tunes
I wonder what this sounds like.

Pedagogical Transcription for Multimodal Sitar Performance
I wonder if it’s so pedagogical that I can understand it?

Drum Transcription in Polyphonic Music Using Non-Negative Matrix Factorisation
Not sure what’s new here, but I’ll be there to find out.

Tuning Frequency Estimation Using Circular Statistics
No clue what this is about. My best guess would be that it’s related to the pitch corrections I’ve seen in chord transcription systems.

TagATune: A Game for Music and Sound Annotation
Wow another music game! I haven’t heard of this one yet and Google hasn’t either. I’m very curious how it differs from the Listen Game and the MajorMinor game.

A Web-Based Game for Collecting Music Metadata
Would be great if they publish some usage statistics.

Autotagging Music Using Supervised Machine Learning
I’m very curious what results they got.

A Stochastic Representation of the Dynamics of Sung Melody
Another Japanese production :-)

Singing Melody Extraction in Polyphonic Music by Harmonic Tracking
I wonder how high the improvements were by tracking the harmony.

Singer Identification in Polyphonic Music Using Vocal Separation and Pattern Recognition Methods
I wonder how they evaluated this. Did all singers have the same background instruments and sing in the same musical style?

Transcription and Multipitch Estimation Session
I know nothing about multipitch estimation. But I hope to hear some nice demonstrations in the session.

Identifying Words that are Musically Meaningful
I wonder what the most musically meaningful word is. At Last.fm I think it’s “rock”. Another word very high up in the Last.fm ranks is “chillout” :-)

A Semantic Space for Music Derived from Social Tags
I’m curious what their tag space looks like.

The Music Ontology
I don’t know much about ontologies, but it sounds like this is the one and only one for music, so I better not miss it.

Signal + Context = Better Classification
I love this title. I hope the first author will be presenting it.

A Music Information Retrieval System Based on Singing Voice Timbre
I’ll probably be totally exhausted from having seen so many presentations and posters by this time, but I’ll try to reserve some energy to be able to concentrate on this talk.

Poster session 3 (MIREX)
Usually one of the highlights at ISMIR. I hope the MIREX team manages to have the results ready in time. Only about 4 weeks left to get everything done.

Methodological Considerations in Studies of Musical Similarity
I wish this paper had been published before I wrote my thesis. But I guess it's never too late to learn :-)

Similarity Based on Rating Data
Sounds like something Last.fm has been doing for years: the ratings are measured based on how often people listen to a song. Then standard collaborative filtering techniques are applied. The results are not too bad. I’m guessing that the authors used very sparse data compared to the data Last.fm has. Another paper I’d put on my very, very short list.
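To make the idea of "play counts as ratings" concrete, here is a minimal sketch of item-based collaborative filtering with cosine similarity. This is only one of several standard techniques, and it is not meant to describe what Last.fm or the paper's authors actually do; the listeners, songs, and counts are made up for illustration.

```python
import math

# Hypothetical play counts: listener -> {song: number of plays}.
play_counts = {
    "alice": {"song_a": 10, "song_b": 8},
    "bob": {"song_a": 5, "song_b": 6},
    "carol": {"song_c": 7, "song_d": 9},
    "dave": {"song_b": 1, "song_c": 4, "song_d": 3},
}


def song_vectors(counts):
    """Turn listener -> song counts into song -> {listener: plays} vectors."""
    vectors = {}
    for listener, songs in counts.items():
        for song, plays in songs.items():
            if plays > 0:
                vectors.setdefault(song, {})[listener] = plays
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(song, counts):
    """Rank all other songs by similarity of their listener vectors."""
    vectors = song_vectors(counts)
    target = vectors[song]
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != song
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = most_similar("song_a", play_counts)
```

Songs that share no listeners get a similarity of zero, which is exactly where sparse data hurts: with few listeners per song, most pairs never overlap.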

A Study on Attribute-Based Taxonomy for Music Information Retrieval
I wonder if this is similar to Pandora’s music genome project?

Variable-Size Gaussian Mixture Models for Music Similarity Measures
I wonder if and how the author was able to measure significant improvements.

Towards Integration of MIR and Folk Song Research
I like folk music, and I like MIR.

From Rhythm Patterns to Perceived Tempo
I’m curious how they approach this. A rhythm pattern (as defined in music books) does (AFAIK) not have any tempo information and can be played at different tempi. But I’m sure this is an interesting paper :-)

The Quest for Ground Truth in Musical Artist Tagging in the Social Web Era
The title reminds me of one of the more important papers in the short history of ISMIR. Tags are something very subjective, there is no right or wrong. You’ll always find people complaining about how other people mistagged the genre of a song. It will be interesting to see if this paper has the potential to join the ranks of the original ISMIR paper with a similar title.

Annotating Music Collections: How Content-Based Similarity Helps to Propagate Labels
Sounds like something very useful.

A Game-Based Approach for Collecting Semantic Annotations of Music
I hope they’ll present some usage statistics.

Human Similarity Judgments: Implications for the Design of Formal Evaluations
I wonder why this paper isn’t presented before the MIREX panel. Seems like it might contain a lot of information that would be useful for the discussion.

Saturday, 25 August 2007

Justin Donaldson’s Blog

For some reason I only found Justin’s blog today. The first time I heard of Justin was when I stumbled upon the visualizations he implemented at MyStrands. Basically he visualizes songs as little disks which are spread out on the screen according to similarity. He uses a modified version of (2-dimensional) MDS and he implemented a clever way to deal with songs that overlap spatially in the visualization. It's quite different from, for example, the way Paul uses (3-dimensional) MDS to visualize a music space.

Justin’s blog covers different topics, not all of which are related to MIR, but some of his entries are. For example, in this entry he addressed reviewer feedback regarding a paper he submitted on his MyStrands visualization. The blog entry's character is kind of similar to mine on the reviewer feedback I received for the MusicSun paper. Personally, I found it interesting to see that he was confronted with feedback similar to what I have received in the past: “Why is the visualization more informative/useful to a user than a ranked list of results?” In times where the simple Google style search interface dominates despite having so many nice visual alternatives (like kartOO) there is actually a need to explain this although it seems so obvious sometimes.

It seems to me that publishing a paper on a visual interface for music recommendation is much harder than publishing one about a genre classifier that achieves 100% classification accuracy (which isn't hard if the test set contains the same artists as the training set). I like how Justin implemented his visualization as part of a real music site (instead of just building a research prototype). I wonder if Justin couldn’t have easily conducted a simple evaluation using all the data gathered from that test. (Maybe just simple usage statistics over time?) I wonder if those statistics would be similar to the Alexa statistics for kartOO over the last years. Maybe visualizations that need more than one color and more than one dimension are too complex for music recommendation systems?
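As an aside on the genre classification remark above: an artist filter simply makes sure that no artist contributes tracks to both the training set and the test set. The following is a minimal sketch of such a split; the function name, the track dictionaries, and the 30% test fraction are assumptions made for illustration, not taken from any particular paper.

```python
import random


def artist_filtered_split(tracks, test_fraction=0.3, seed=0):
    """Split tracks so that no artist appears in both train and test."""
    # Split on the level of artists, not individual tracks.
    artists = sorted({track["artist"] for track in tracks})
    rng = random.Random(seed)
    rng.shuffle(artists)
    n_test = max(1, round(len(artists) * test_fraction))
    test_artists = set(artists[:n_test])
    train = [t for t in tracks if t["artist"] not in test_artists]
    test = [t for t in tracks if t["artist"] in test_artists]
    return train, test


# Hypothetical collection: two tracks per artist.
tracks = [
    {"artist": "artist_1", "title": "t1", "genre": "rock"},
    {"artist": "artist_1", "title": "t2", "genre": "rock"},
    {"artist": "artist_2", "title": "t3", "genre": "jazz"},
    {"artist": "artist_2", "title": "t4", "genre": "jazz"},
    {"artist": "artist_3", "title": "t5", "genre": "rock"},
    {"artist": "artist_3", "title": "t6", "genre": "rock"},
    {"artist": "artist_4", "title": "t7", "genre": "jazz"},
    {"artist": "artist_4", "title": "t8", "genre": "jazz"},
]

train, test = artist_filtered_split(tracks)
```

Without the filter, a classifier can score well by recognizing the artist (or even the album production) rather than the genre, which is why accuracies reported without it are hard to interpret.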

Btw, seems like Justin will be giving an interesting talk at the Recommendation Systems 2007 conference (as part of the doctoral symposium) and he’s also presenting a poster at ISMIR 2007.

Monday, 13 August 2007

MusicSun: A New Approach to Artist Recommendation

I previously announced that I’ll be posting reviews I received for a paper my co-author Masataka Goto and I submitted to ISMIR’07. You’ll find them below. Btw, I’d like to see others publish their reviews, too. Then I could decide based on those if I want to read a paper or not. It would be like using Amazon customer reviews when deciding to buy a book or not :-)

I’ve added the paper to my list of publications, there’s a link to the PDF and to the demonstration videos.

The paper was 6 pages long when we first submitted it and accepted as a 4 page version in the proceedings. (For the final version we had to shorten almost everything a bit. The biggest part we dropped was a comparison of the interface to a simplified version which was closer to the simple Google search interface. When reading the reviews it’s important to keep in mind that the reviewers were reading a rather different version of the paper than the one which is online now.) I’ve added my remarks to the reviewers in italic.

I really like receiving feedback on my work, and usually conference and journal reviews are a wonderful source of feedback. However, one thing I missed in the reviews the program chairs sent me was the final paper length each reviewer recommended (there was a field for that in the review form). Maybe next year they could improve this.

Btw, I would like to thank reviewers 1 and 2 as they have helped improve the quality of the paper. Reviewer 3 seems to have forgotten to write something. Reviewer 4 has helped a bit, but pissed me off a bit more. However, others have told me that there is nothing wrong with his or her review. I guess my view on this is not very objective ;-)

REVIEW SUMMARY

Reviewer	R1	R2	R3	R4
Relevance	+++	++	+++	++
Originality	++	++	++	-
Quality	++	++	+++	-
Presentation	+++	+++	+++	+

Legend:
+++ Strong Accept, ++ Accept, + Weak Accept
--- Strong Reject, -- Reject, - Weak Reject

=====================================
Reviewer 1: Detailed comments

Overall: Cool interface, and a decent user study. In general, I would prefer to see a more task-specific evaluation (how long does it take a user to find music they like? do they succeed?) and for specific features how often are they used? But the self-reporting survey is a good start.

It’s always nice when a reviewer starts the review with something positive :-)

Regarding “how long does it take a user to find music they like?”: I think an interface to explore music is closer to a computer game than a tool to get work done. For games measuring how much fun the users are having is way more important than measuring how long it takes to get something done (which is one of the main criteria for tools). Nevertheless, I agree with the reviewer, the evaluation we provided is not as thorough as it could be. (Although this has been by far the most extensive evaluation of a user interface I’ve ever conducted.)

My biggest criticism is that there isn't a true baseline; users are only comparing two different versions of the author's system, and not comparing against, say a simple web search, or a competing tool like collaborative filtering.

This comparison the reviewer is referring to has been removed because we didn’t have enough space.

Regarding baselines: comparing our system against state-of-the-art recommendation engines such as the one from Last.fm wouldn’t have been a fair comparison either. We thought that since the main contributions of our paper are the new interface features, a useful evaluation would have been to remove those new elements and see what the users think. I’d be very interested to get some more advice on how to better evaluate user interfaces to explore music.

Specific comments:

- “music” and “review” as constraints for finding relevant web docs: for many ambiguous band names, this doesn't perform very well. For instance, for the band “Texas”, the query “texas music review” brings up many irrelevant pages. this is a hard problem, and for research systems probably not too important to worry about, but it may be worth mentioning.

Excellent point. My only excuse is that we didn’t have enough space to discuss everything.

- how do you generate the vocabulary lists? if it's just and ad-hoc manual process, please mention that, and perhaps suggest other ways to make it more principled and to evaluate changes to these lists.

Good question. I think we’re a bit more specific on this in the 4 page version, but maybe it got squeezed out at the end. Automatically generating such vocabularies would be really interesting. But I wouldn’t know how to do that (without using e.g. Last.fm tag data).

- Similarly, why did you choose 4 vocabularies vs some other number?

I think we didn’t have space to explain this in the 4 page version.

The number of vocabularies is more or less random. The vocabularies are based on previous work (we used them in an ISMIR’06 paper, and before that in an ECDL’05 paper). However, there’s an upper limit on the number of vocabularies that would make sense to use, and the users seemed fine with the 4 we used. (Of course it would be really nice to adapt it to different languages as well.)

- not all readers will know what “tf-idf” is. please explain or provide a reference.

We added a more explicit reference in the final version. I find things like these really hard to notice. I talk about tfidf all the time and just started assuming that the whole world talks about tfidf the whole time, too. :-)

- table 3: unnecessarily confusing to introduce the “L” and “R” just call them Easy and Hard, and explain that you grouped the top/bottom 3 points in a 7-point scale. and instead of “?”, label it “No Answer” or something

Again something that’s hard to notice if you are too deep into the material. Thanks to this reviewer’s comment the respective table should be more understandable.

- rather than the self-reported results about which optional features the user found useful, i think a better eval would be a count of actually how many times the user used them.

Actually the usage of features wasn’t something the users reported themselves. I was sitting next to them and taking notes while they were using the interface. Thinking of it now, I realize that maybe even in the final version this might not be clear enough :-/
However, automatically counting how often functions were used would have been better. Unfortunately, I didn’t store the log files for all users because I had some technical problems (and I thought that making notes while watching them use the interface would be sufficient). Next time...
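Since the reviewer pointed out that not everyone knows what tf-idf is, here is a minimal sketch of one common variant: term frequency within a document multiplied by the log of the inverse document frequency. The exact weighting used in the paper may differ, and the example pages and words are made up.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical web pages about three artists, already reduced to vocabulary terms.
pages = [
    ["rock", "guitar", "loud", "rock"],
    ["rock", "piano", "calm"],
    ["rock", "guitar", "electronic"],
]

weights = tf_idf(pages)
```

In this toy example "rock" occurs on every page and therefore gets a weight of zero, while "piano", which occurs on only one page, gets the highest weight on that page. That is the whole point: words that appear everywhere say little about any particular artist.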

=====================================
Reviewer 2: Detailed comments

This is a well written paper... and it demonstrates a nice system for showing users new songs.

Again, it’s really nice when a reviewer starts a review with something positive.

This paper largely walks through one design, a very nice looking design, but how does this system compare to other systems? I found the evaluation in this paper weak.

How does the part of this work that combines recommendations compare to the work of Fagin (Combining Fuzzy Information From Multiple Systems )? Would that be a better approach?

I’m not familiar with Fagin. (But fuzzy combinations sound interesting.) Regarding the reviewer’s criticism of the evaluation, I guess it is in line with Reviewer 1’s.

UPDATE: Klaas has posted a link to a really nice introduction to aggregation operators in the comments.

I really wanted to know which similarity approach worked best. This paper doesn't address that issue.

This was beyond the scope of our paper. But it surely would have been very interesting to evaluate the different individual similarities we used :-/

Testing UI design is hard.. one needs a task and then lots of users. Can you do this?

Yes, it is hard :-)
And no, it doesn’t seem like we did a good job :-/

=====================================
Reviewer 3: Detailed comments

Unfortunately this reviewer didn't explain why he or she gave us such high scores.

=====================================
Reviewer 4: Detailed comments

The paper addresses an interesting issue, recommendation systems and interfaces to support them. I found the idea of using multiple information sources very interesting, and potentially useful.

Again, it’s always nice when a reviewer finds something positive to start with. However, the idea of using multiple information sources for recommendations isn’t new, and I don’t think my co-author and I can take the credit for it. And I don't understand how someone can say that combining different sources of information is only "potentially" useful. Even if I close both of my eyes I can clearly see that there's no way around that :-)

The major problem that I have with the paper is the experimental design: I am not quite sure what is being evaluated. Is it the recommendation system interface or the underlying software used to create the recommendations? If it is the former, which I think it is, then the design of the experiment seems to confound many issues.

I think it's difficult to separate the two. It’s not really possible to evaluate the user interface without considering limitations of the underlying recommendation system. The way the user interface deals with these limitations and presents these shortcomings to the user is a very critical aspect of systems using state-of-the-art content-based algorithms. This is something we explicitly dealt with in the long version of the paper and briefly mention in the short version (e.g. the indicators for how reliable the system thinks the recommendations are). Furthermore, the recommendation system and the interface are very closely linked to each other (e.g. the way the users are given the option to adjust the aspect of similarity they are most interested in).
But then again, as Reviewers 1 and 2 have already pointed out, there are limits to the evaluation we present in our paper.

For example, the authors do not control for user expertise, nor do
they control for system issues (e.g., the database not being large
enough to provide a user with the song he or she is seeking).

We gathered and analyzed lots of statistics on the users’ expertise (in terms of using computers, music interfaces, and musical knowledge), music taste, and general music listening & discovery habits, but didn’t include everything because we ran out of space. Nevertheless, we allocated some space in the 4 page version to describe the participants in more detail.

Moreover, conclusions like, and I'm paraphrasing, "the users say they would use it again" are, for the most part, without any normative value.

We never claimed that such a conclusion is normative. Of course we measured user satisfaction in different ways (direct and indirect), but most of the evaluation part of our paper deals with parts of the interface we thought users would like/understand/use but (surprisingly) didn’t. We believe the contribution of our evaluation is to point out a number of directions for future work.

Under what circumstance would they use it (e.g., if they were paid to evaluate it)? It is a stretch to conclude they would use the system if it were part of Amazon-- part of Amazon in what way; in comparison to what; etc.?

I’m really confused by the reviewer’s remarks. Just because (when asked) the users said they would like to use it doesn’t mean that users would really use it. And we never drew this conclusion. Instead we’ve pointed out several (in the longer version even more) limitations of the interface. We never tried to market MusicSun as a finished system, but rather as a prototype from which there is something to learn.

In addition, the paper is rife with typos and stylistic problems
(e.g., citations are not a part of speech), and the reference section
relies quite heavily on the authors' own work.

It would have been nice if the reviewer had explicitly mentioned that this is not why he voted for a weak reject. Furthermore, there are nicer ways of putting this. Neither my co-author nor I are native speakers. It would have been more helpful if the reviewer had pointed out some of the typos.

Regarding the self-citation: we cited everything we thought was relevant to understand the work we presented. Most of the interface is built on techniques we previously used. We didn’t have room to describe everything, so we referenced it instead. It would have been more helpful if the reviewer had pointed us to references that are missing or unnecessary.

=====================================

Btw, check out these links my colleague Norman Casagrande pointed me to (both from the legendary Phd comics series):
- Paper review worksheet
- Addressing reviewers comments

Sunday, 29 July 2007

ISMIR 8.0

Paris in 2002, Barcelona in 2004, London in 2005, and now finally Vienna! For the fourth time in its history the International Conference on Music Information Retrieval (ISMIR) will be held in Europe. Needless to say that ISMIR in Vienna will be the best ever. Following a new all time high in the number of submitted papers, rumors are quickly spreading that tickets are already almost sold out (just like in London 2005). Seems like everyone wanted to get the early registration bonus (which ends on the 31st of July). It also seems like two of my Last.fm colleagues and I have been lucky enough to grab some of the last tickets for the highly anticipated tutorial on music recommendation given by Paul Lamere and Oscar Celma. However, the other three tutorials are just as exciting, and I wouldn’t be surprised if the organizers will need to find bigger lecture rooms.

There are so many reasons why ISMIR in Vienna will be the best ever that I could spend the rest of my life writing them down. Two of the main reasons are that it’s the right place and the right time.

It’s the right place because Vienna is the most beautiful city in the world with the highest living standards (and yet affordable prices). Vienna has a rich history in music, and Vienna is located in the heart of Europe which is currently one of the leading forces in music information retrieval research.

It’s the right time because music information retrieval has never been more exciting. The whole music industry is just about to undergo massive transformations driven by technological changes. Music consumption habits of the younger generation have already changed drastically, and music is surrounding us like never before.

> Join the ISMIR 2007 group on Facebook here.

> Read about ISMIR 2007 on Paul’s blog here and here.

Sunday, 15 July 2007

More PhDs

Hirokazu Kameoka recently completed his PhD thesis on “Statistical Approach to Multipitch Analysis” (PDF) at the University of Tokyo. Hirokazu is now working at NTT Communication Science Laboratories, Media Information Laboratory.

It’s great to see another thesis written in English emerge from Japan. Btw, so far 5 out of 10 who completed their MIR related PhD thesis this year are now living and working in Japan.

In addition, it seems that Don Byrd and Tim Crawford have pointed me to the very first PhD thesis mentioning MIR in the title. It dates back to 1988!

[Email from Tim on 11 June 2007]

Don and Elias,

I have the thesis in front of me:

Stephen Dowland Page, 'Computer Tools for Music Information Retrieval', thesis submitted for the degree of Doctor of Philosophy, Oxford, New College, 1988. 252 pp. (Available from the Bodleian Library, Oxford.)

Basically it describes a simple encoding, matching and query specification method for monophonic symbolic searching, which uses a query language based on regular expressions. He never got as far as polyphonic queries, though he had hoped to do so (no surprise there!). The introductory chapters, on the nature of the problem and on previous work, are - I would say - still of value, although the methods he describes would need a lot of work to make them useful for polyphonic music. Full code for the search engine is given (in Modula 2!). The ~40-page bibliography is useful for listing a lot of early work - though a vague reference to "Kassler's series of radio talks on the subject" is frustrating, since he gives no further details!

NB, for Don:
There is a sheet at the beginning of my copy listing copies supplied up to the day I ordered it from the Bodleian Library. Second on the list is Indiana University Libraries, Bloomington, IN 47405: 20 June 1989. (My copy was supplied on 10th January 1991.)

Tim

(The list of MIR related PhDs has been updated.)
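
To give a rough feel for what a regular-expression query over monophonic music can look like, here is a minimal sketch. It is not Page's encoding or query language (I have not seen those); it assumes a simple up/down/repeat contour encoding of MIDI note numbers and uses Python's standard `re` module. The function name `contour` and the example query are purely illustrative.

```python
import re


def contour(pitches):
    """Encode a monophonic melody (MIDI note numbers) as U/D/R steps.

    U = next note is higher, D = lower, R = repeated pitch.
    This encoding is an assumption for illustration only.
    """
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            steps.append("U")
        elif cur < prev:
            steps.append("D")
        else:
            steps.append("R")
    return "".join(steps)


# Opening of "Twinkle, Twinkle, Little Star" in C major.
melody = [60, 60, 67, 67, 69, 69, 67, 65, 65, 64, 64, 62, 62, 60]
encoded = contour(melody)

# Query: a repeated note, one or more rising steps, then another repeat.
query = re.compile(r"RU+R")
match = query.search(encoded)
print(encoded, match.span() if match else None)
```

A contour string throws away interval sizes and rhythm, so a real system would need a richer encoding; the point is only that once a melody is a string, standard regular expressions can express fairly flexible queries.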

Wednesday, 11 July 2007

The Future of the Paper Industry

Google Scholar is a wonderful way to measure the quality of scientific papers based on the number of citations. I actually find the number of citations so interesting that I display it in my list of publications (the yellow bars on the left correspond to citations).

I learned, for example, that one of my papers (which I thought was one of my better ones) never got cited (I'm guessing that's because there aren’t too many people working on organizing drum sample libraries). In addition, I believe it might be useful for someone browsing my list of publications to identify which papers are probably more readable. But I'm not sure about the usability in its current form.

I also just learned that somehow my PhD thesis has made it into the top 30 results for the search term music information retrieval! :-)

Anyway, the point is that Google Scholar as a form of evaluating the quality of papers is highly insufficient: the delay between having a final version of the paper and the point where reliable quality estimates can be made is way too large (often taking several years).

Given the limitations of Google Scholar I’ve been thinking about what the future of the paper industry should be like (and in particular how the quality of a paper could be measured more quickly), and here are some things I’d love to see:

1. Papers should be publicly reviewed similar to the book reviews by Amazon’s customers. (Including the option to rate the usefulness of reviews.)

2. Researchers, research projects, and research teams should have blogs in which they present their findings and encourage open discussions and criticism of their work.

3. Researchers (and in particular students) should post their ideas on public sites to get instant feedback from peers (and document who came up with the idea first).

And here’s how I plan to contribute to the future:

I plan to publish some comments from reviewers, and my response, for the ISMIR'07 paper of which I’m first author (including a link to the paper and the demonstration video).

I plan to write reviews on papers that I read in the future. Maybe I’ll do so for some ISMIR'07 papers that are already online.

However, I’m also planning to spend most of my time in the next weeks helping improve how Last.fm's radio stations listen to their listeners. So it might take me a while to get my "Paper Industry 2.0" contributions started :-)

Btw, using A/B tests to evaluate algorithms on radio stations beats number of citations any time :-)
(And if you like A/B tests, you’re probably also a fan of Greg Linden’s blog.)

UPDATE:
I just read the two links posted in the comments and they are great!
The first link posted by Paul basically talks about how the Nature community is still very old school. The second link posted by Chris talks about how things will need to change in the future and why so many researchers are still very old school.

Saturday, 30 June 2007

C4DM’s Listening Room Celebration

Last Thursday C4DM celebrated their new listening room with an open air concert. The performances included classical guitar duets, modern arrangements of Persian music, beat boxing, electronic music, and lots of great singing. It’s amazing how many excellent musicians they have on their team. Mark Sandler mentioned that they might organize another concert next year.

Wednesday, 27 June 2007

Interrupting the Silence

There is an unwritten rule that blogs should be updated frequently... and I’ve been silent for over two weeks now. I’ve even considered discontinuing this blog. The main reason is that I enjoy what I’m currently doing too much to even think about writing a new entry. I guess it’s a side effect of working for the most exciting music 2.0 service in the world :-)

If I tried to write a paragraph about what it is that currently fascinates me so much I would probably include words such as: music recommendations, music similarity based on audio analysis & collaborative filtering & tags associated with music, computing audio similarity on millions of tracks, playlist generation, connecting users with users with similar tastes, millions of users, scalability issues, evaluation procedures, Hadoop, LSH and other clever ways to deal with lots of data. For example, I only recently learned that an algorithm as simple as one that efficiently computes an intersection can actually be something very interesting :-)

Furthermore, I not only get to work on fascinating problems, I’m also very lucky to be surrounded by lots of very clever and highly motivated colleagues!

Btw, if you’re a frequent Last.fm user you might have noticed some improvements of the weekly recommendations recently... Norman did a great job on that, and he’s got some further significant improvements lined up.

And of course London itself is another reason why I hardly find time to put something on this blog :-)

Despite all these reasons not to write, there is a reason to write. Webcasters in the US have been demonstrating today what might happen if the recent CRB ruling (dramatic increase in royalties) remains unchanged. Last.fm has decided not to join the protest (for an official statement read this). However, there have been ongoing discussions at the office between those who think we should join the struggles of the small webcasters (after all not too long ago Last.fm was one of them), those arguing that our users are already pissed off (because we’ve had way too many technical problems recently) and that we really shouldn’t upset them any more, and those arguing that all of this won’t help us advance the social music revolution. Anyway, it’s good to see that the day of silence has apparently awakened lots of listeners (as I conclude from these comments on digg). There should be a lot more public discussions on how to compensate artists.

Sunday, 10 June 2007

Digital Libraries of Music

Last Wednesday I enjoyed a talk by David Bainbridge at the C4DM about the Greenstone digital library project. David and his team have been developing it for 12 years, and they seem to have been well ahead of their time. The talk inspired me to start dreaming of the ultimate capoeira music digital library. Thousands of capoeira songs and their variations and interpretations (from all well known masters from various capoeira schools) available at a single click, full length audio, rhythm and melody annotations, lyrics and their translations. It would also be great to have wiki style discussions of the meanings and origins of the respective lyrics, links to similar songs (similar with respect to the style, lyrics, melody, rhythm), pictures of people who created them, or even videos of them performing them, …

Even though capoeira music is just a tiny subset of music in general it seems like putting all this information together would be a huge project. The hardest part would probably be to gather the recordings, if they exist at all. As far as I know many songs have already died because they were never recorded or written down :-/

Friday, 1 June 2007

MusicBrainz

If I were in academia and about to set up a DB for my own evaluations the first thing I’d do is to obtain the MusicBrainz IDs for each track. In the long run it seems like the best way to have a common basis for sharing annotations and other metadata. Even Last.fm will (hopefully soon) support those IDs and then it will be really easy to access the correct tag data (and all the other metadata) for the corresponding tracks. Btw, I heard McGill has already started working with them.

Wednesday, 30 May 2007

Last.fm and CBS

Last.fm is going to be working together with CBS, which is known for great TV shows such as CSI. Announcements will follow. Seems like the social music revolution got some serious tailwinds now.

Update: Check out what BBC says about it.

Another Update: Check out blog.last.fm. You'll also find the full story about my recent late night experience at the last.fm office.

Tuesday, 29 May 2007

Risking our lives for the social music revolution at Last.fm

This one is a bit off topic, but at least it's work related :-)

Today (Monday 28 May, a bank holiday in London), around midnight I was alone at Last.fm’s office going through some comments I’ve received from reviewers (for a journal publication on computational models of similarity for drum sounds). It was nice and quiet and peaceful...

At Last.fm we have an IRC channel to communicate; we are always logged on when we are online (also when we’re not in the office). Here’s an excerpt from the IRC chat: (note that all of us have set highlights to important words such as “beer” or “pub” and some other words)

<elias> i think someone just tried to kick the last.fm office door open :-/
<elias> pub beer porn
<elias> ...
<russ> inner one or outer one?
<elias> I'm inside
<elias> there was a hell of a noise at the door
<elias> i went there, saw that part of the door closing system has come off, and there seems to be a hughe crack in the upper part, I'm not sure if that has been there before
<russ> I don't remember any huge cracks before
<elias> I guess I should have a look outside? ... are there any weapons around? :-)
<russ> this is the white one, not the black one right?
<elias> yes
<muz> :/
<russ> there's an electric drill and a pocket knife by my desk

To find out how the story continued (obviously I have survived), check out blog.last.fm tomorrow.

Monday, 28 May 2007

Lyrics

Given the lyrics of a song, algorithms can relatively easily figure out if it’s a Christmas song, a love song, if it contains explicit lyrics, or simply just identify the language it’s sung in. Such algorithms could be used to improve music discovery, recommendation, and playlist generation.

Personally I think lyrics are one of the most important parts of a song. If I don’t like the lyrics, I won’t like the song. I’d love to have lyrics on my display while listening to a song, and I’d love to be able to search for or listen to music with similar lyrics. There is so much that would be so easy to do if the lyrics were available, but they are not.

There is an interesting article in the WSJ by Jason Fry on this, which I stumbled upon via this post on the Lefsetz Letter (which also contains some interesting thoughts on the topic). Sometimes I just don't understand how far detached some parts of the music industry have become from the artists and their fans. And I also wonder how much longer those who are battling to enforce ancient business models will survive.

CrestMuse Webpage & Video

CrestMuse (currently the biggest MIR related research project in Japan) launched an English version of their project webpage about a month ago. I believe the best part of the web page is the CrestMuse symposium video. The video has been produced very professionally, offers a nice summary of the many aspects of the project, features a native English female narrator, and it even briefly features myself or at least the back of my head (IIRC). Thus, it is well worth the long wait for the download (250MB Mpeg4). My download window says it will take 5 hours until the download is complete :-/

One of the impressions I got from Japanese researchers is that they are masters of using videos to communicate their work. Wouldn't it be nice if Omras2 also created similar videos to communicate their work? :-)
And I guess SIMAC's only flaw was that we didn't have such fancy videos.

Sunday, 27 May 2007

Music Similarity: G1C Implementation

I’ve been planning to do this for over a year: The MA (Music Analysis) Toolbox for Matlab now finally includes the G1C implementation which I described in my thesis (btw, the code probably also runs in the freely available Scilab in case you don’t have Matlab). The code is packaged as required for the MIREX’06 evaluation, where the implementation was overall fastest and scored highest (but not significantly better than other submissions).

The code might be useful for those who are new to the field and just want a quick start. Btw, last October I gave a presentation on music similarity which might also be helpful for starters, and the best documentation and explanation of the code I can offer is my thesis.

I also hope the implementation is somehow useful for those interested in comparing their work on computational models of music similarity to work by others. I believe the best option to do so is to conduct perceptual tests similar to those I conducted for my thesis and those done for MIREX’06 (btw, I wrote some comments about the MIREX’06 evaluation here).

A much easier approach to evaluate many different algorithms is to use a genre classification scenario (assuming that pieces from the same genre are generally more similar to each other than pieces from different genres). However, this doesn’t replace perceptual tests; it just helps pre-select the algorithms (and their parameters). Btw, I think it would even be interesting for those working directly on genre classification to compare G1C (combined with a NN classifier) against their genre classification algorithms.
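
As a rough illustration of the genre classification scenario, here is a minimal Python sketch (not part of the MA Toolbox, which is Matlab). It assumes a similarity algorithm has already produced a full pairwise distance matrix, and it scores that matrix with a leave-one-out nearest neighbour classifier. The function name and the toy numbers are made up for the example.

```python
from typing import Sequence


def loo_nn_accuracy(dist: Sequence[Sequence[float]], genres: Sequence[str]) -> float:
    """Leave-one-out 1-NN genre accuracy from a precomputed distance matrix.

    dist[i][j] is the distance between track i and track j as computed
    by whatever similarity algorithm is being evaluated.
    """
    n = len(genres)
    correct = 0
    for i in range(n):
        # The query track itself is never a candidate neighbour.
        candidates = [j for j in range(n) if j != i]
        nearest = min(candidates, key=lambda j: dist[i][j])
        if genres[nearest] == genres[i]:
            correct += 1
    return correct / n


# Toy data: four tracks, two genres, invented distances.
dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.7, 0.9],
    [0.9, 0.7, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
genres = ["jazz", "jazz", "rock", "rock"]
accuracy = loo_nn_accuracy(dist, genres)
print(accuracy)
```

Running the same function on the distance matrices of several algorithms (or parameter settings) gives one number per candidate, which is all that is needed for a first pre-selection.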

There are lots of things to be careful about when running evaluations based on genre classes (or other tags associated with music). Most of all I think everyone should be using an artist filter: The test set and the training set shouldn’t contain music from the same artists. Some previous work reported accuracies of up to 80% for genre classification. I wouldn’t be surprised to see some of those numbers drop to 30% if an artist filter had been applied.
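
One simple way to apply an artist filter is to split by artist instead of by track, so that every artist ends up entirely in the training set or entirely in the test set. The following Python sketch shows the idea; the function name, the `test_fraction` parameter and the toy artist labels are assumptions for illustration and are not taken from the MIREX setup.

```python
import random
from typing import List, Sequence, Tuple


def artist_filtered_split(
    artists: Sequence[str], test_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split track indices so that no artist appears in both sets.

    artists[i] is the artist of track i. Whole artists (not single
    tracks) are assigned to the test set.
    """
    unique = sorted(set(artists))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # At least one artist goes to the test set.
    n_test = max(1, round(len(unique) * test_fraction))
    test_artists = set(unique[:n_test])
    train_idx = [i for i, a in enumerate(artists) if a not in test_artists]
    test_idx = [i for i, a in enumerate(artists) if a in test_artists]
    return train_idx, test_idx


# Toy collection: seven tracks by four (hypothetical) artists.
artists = ["A", "A", "B", "C", "C", "C", "D"]
train_idx, test_idx = artist_filtered_split(artists)
print(train_idx, test_idx)
```

Note that the test set size is only approximately `test_fraction` of the tracks, because artists with many tracks move as a block.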

I first noticed the impact of an artist filter when I was doing some work on playlist generation. In particular, I noticed that songs from the same artist appeared very frequently in the top 20 most similar lists for each song, which makes sense (because usually pieces by the same artists are somehow similar). However, some algorithms which were better than others in identifying songs from the same artists did not necessarily perform better in finding similar songs from other artists. I reported the differences in the evaluation at ISMIR’05, discussed them again in my MIREX'05 submission, and later in my thesis. An artist filter was also used for the MIREX’06 evaluation. Btw, I’m thankful to Jean-Julien Aucouturier (who was one of the reviewers of that ISMIR’05 paper) for some very useful comments on that. His thesis is highly relevant for anyone working on computational models of music similarity.

Another thing to consider when running evaluations based on genre classes is to use different music collections with different taxonomies to measure overfitting. For example, one collection could be the Magnatune ISMIR 2004 training set and one could be the researcher’s private collection. It can easily happen that a similarity algorithm is overfitted to a specific music collection (I demonstrated this in my thesis using a very small collection). Although I was careful to avoid overfitting, G1C is slightly overfitted to the Magnatune collection. Thus, even if G1C outperforms an algorithm on Magnatune, the other algorithm might still be much better in general.

There’s some room for improvements of this G1C implementation in terms of numerical issues, and some parts can be coded a lot more efficiently. However, I’d recommend trying something very different. Btw, I recently noticed how much easier it is to find something that works much better when having lots of great data. I highly recommend using Last.fm’s tag data for evaluations; there’s even an API.