MIR Research: August 2007

Friday, 31 August 2007

Researchers far ahead of industry

Delete The Filter and other playlist generation tools you might have installed on your system: here comes Dominik's open source solution.

Thanks Klaas for the link!

Thursday, 30 August 2007

MIR Related Books

Here are two books I’m currently reading and would like to recommend.

Last night a DJ saved my life
The history of the disc jockey
Bill Brewster and Frank Broughton
Website

I’m still in the late 70s, but it’s been a wonderful voyage through time so far. I think anyone working on DJ-ing algorithms might enjoy this. The book starts out with the impact recorded music had, the first broadcasts, and then dives very deep into the role of the DJ in terms of discovering and distributing rare records, breaking new records, creating new styles of music, and enabling others to have a great time. The book isn’t structured like a research paper, but it’s very well researched.

Net, Blogs and Rock ‘n’ Roll
How digital discovery works and what it means for consumers, creators and culture
David Jennings
Blog, Last.fm Profile

I just started reading this one, but it’s been fun so far. Sometimes it feels a bit like all of the Music 2.0 hype words thrown into a mixer, but beyond that there's lots of interesting information in the book about everything related to Music 2.0. This book might be a great source of inspiration for project proposals or motivations/introductions for research papers on MIR technologies that could somehow fit under the Music 2.0 umbrella.

Most of all I love contrasting the two books with each other. The two worlds couldn’t be more different and yet they are somehow about the exact same thing.

Wednesday, 29 August 2007

Fingerprinting and Music Recommendation

It's a huge step forward for Last.fm:
Audio Fingerprinting for Clean Metadata

Soon Last.fm will stop recommending Roberta Sa because I've listened to Roberta Sá. And once Last.fm's catalog is cleaned up, all Last.fm users will be able to use our massive metadata catalog to fix and extend the tags of their own MP3 collections. Furthermore, the quality of recommendations, neighbors, the radio stations, and just about everything else should gradually improve as a nice side effect of this major cleanup.

Btw, the fingerprint extraction code is open source :-)

Monday, 27 August 2007

Exciting times for music recommendations

The Filter recently secured another round of financing worth USD 5 million. Not too long ago MyStrands secured a massive USD 25 million. And today's job announcement on the music-ir mailing list sounds like BMAT has started to build their own social music recommendation web site, too. Btw, wouldn't it make sense for BMAT and MyStrands to work together? Anyway, that's just a tiny sample of all the companies working on music recommendations, and some of the startups seem very promising.

There’s still a very long way to go. But with every step forward, music listeners will find it easier to discover new artists. And artists will find it easier to find an audience. (Btw, if you use Last.fm you might notice some larger steps forward in the next months.)

It's amazing how much has happened since 2001 (when I finished my Master's thesis on a related topic). It’s fun to be working in such a dynamic environment. And it's never been easier to discover amazing music.

Sunday, 26 August 2007

ISMIR: Short List of Papers

I just compiled my short list of papers I don’t want to miss at ISMIR 2007 which starts in about 4 weeks. Of course I’m interested in all papers, but if I run out of time while exploring posters, or need to choose between different sessions, I’ll prefer the ones listed here.

Fuzzy Song Sets for Music Warehouses
To be honest, this is just on the list because given the title I don’t have the slightest clue what this paper is about. I know what fuzzy sets are thanks to Klaas. I’m guessing that a music warehouse is a synonym for a digital library of music. I wonder if the second part of the title got lost?

Music Clustering with Constraints
Another title that puzzles me. Seems like titles have been cut off a lot. They forgot to mention what they are clustering the music by. Number of musical notes in a piece? AFAIK, most clustering algorithms have some form of constraints. For example, in standard k-means the number of clusters is constrained. When using GMMs it is very common to constrain the minimum variance of an individual Gaussian. Anyway, I’m into clustering algorithms, so this could be an interesting presentation.

Sequence Representation of Music Structure Using Higher-Order Similarity Matrix and Maximum-Likelihood Approach
The author of this one has done lots of interesting stuff in the past. I’m curious what he’s up to this time. Music structure analysis is definitely something very interesting that could be very useful in many ways.

Algorithms for Determining and Labelling Approximate Hierarchical Self-Similarity
Again, at least one of the authors has done very interesting stuff in the past, and I’m really interested in music structure analysis.

Transposition-Invariant Self-Similarity Matrices
I’m only guessing but this one could be about self-similarity with respect to melody. (I’m guessing that the previous 2 are focusing on self-similarity with respect to timbre or chroma.) Melodic similarity is a lot harder than timbre similarity. I’m curious how they did it.

A Supervised Approach for Detecting Boundaries in Music Using Difference Features and Boosting
If I miss this presentation I might upset my coauthors ;-)

Automatic Derivation of Musical Structure: A Tool for Research on Schenkerian Analysis
I had to Google Schenkerian. It sounds interesting.

Improving Genre Classification by Combination of Audio and Symbolic Descriptors Using a Transcription System
I’m very curious what kind of symbolic descriptors the authors used. Note density? I’ve seen lots of work on audio-based genre classification, and some work on using MIDI (which is usually referred to as symbolic information, but the authors could also mean something very different by symbolic). I’m pretty sure I’ve read at least one article on the combination of audio and MIDI information, but I don’t think I’ve ever seen anyone actually succeed. I’m curious what results the authors got, and I hope they used an artist filter.

Exploring Mood Metadata: Relationships with Genre, Artist and Usage Metadata
Let me guess: pop is usually happy and upbeat, and death metal is rather aggressive? :-) I wonder though what usage metadata is (if people listen to it while driving their cars, working, jogging etc?).

How Many Beans Make Five? The Consensus Problem in Music-Genre Classification and a New Evaluation Method for Single-Genre Categorisation Systems
Single-category classification? I think I’m good at that ;-) (Yes, I know that with single they mean binary classification.) Anyway, I’m curious what the authors say about genre classification and consensus. The authors probably have a very different perspective than I do.

Bayesian Aggregation for Hierarchical Genre Classification
I hope they either compare it to existing techniques, or use evaluation DBs that have been used previously. And I hope they used an artist filter. I’m very curious though what they aggregated.

Finding New Music: A Diary Study of Everyday Encounters with Novel Songs
If I had a very, very short list of papers I wouldn’t want to miss, then this would be on it :-)

Improving Efficiency and Scalability of Model-Based Music Recommender System Based on Incremental Training
Made in Japan, what else is there left to say? ;-)
This would also be on the very, very short list of presentations I wouldn’t want to miss.

Virtual Communities for Creating Shared Music Channels
I’m guessing that this could be really interesting, but I wish the title was more specific. Under the same title one could present, for example, how Last.fm groups and their group radio stations work, or how people get together on Last.fm to tag music to create their own radio stations.

MusicSun: A New Approach to Artist Recommendation
Another title that’s missing lots of information. Nevertheless, I won’t skip this one.

Evaluation of Distance Measures Between Gaussian Mixture Models of MFCCs
I’m curious which approaches they tested and how and what their conclusions are.

An Analysis of the Mongeau-Sankoff Algorithm for Music Information Retrieval
Another title that sent me to Google. This time there were only 15 results, none of which did a good job in explaining it to me. Anyway it has MIR in the title, so I think I should have a look.

Assessment of Perceptual Music Similarity
Sounds like a follow-up of the work they presented last year. I’m very curious. I hope they got more than 2 pages in the proceedings. I’d love to read more on this topic.

jWebMiner: A Web-Based Feature Extractor
Sounds like there’s more great software from McGill for everyone to use.

Meaningfully Browsing Music Services
I’ve seen a demo that included Last.fm, so I really can’t miss this one.

Web-Based Detection of Music Band Members and Line-Up
Personally I would be tempted to just use MusicBrainz DB for that. I wonder how much more data the authors could find by crawling the web in general.

Tool Play Live: Dealing with Ambiguity in Artist Similarity Mining from the Web
Artist name ambiguity is an interesting problem. I wonder what solution they are presenting.

Keyword Generation for Lyrics
I’m guessing these are keywords that summarize the lyrics? I wonder if they use some abstraction as well to classify, for example, a song as a love song.

MIR in Matlab (II): A Toolbox for Musical Feature Extraction from Audio
I use Matlab every day and I don’t think I’ve heard of this toolbox before. Sounds interesting.

A Demonstration of the SyncPlayer System
I think I saw a demo of this at the MIREX meeting in Vienna. If I remember correctly the synchronization refers mainly to synchronizing lyrics with the audio but it can do lots of other cool stuff, too.

Performance of Philips Audio Fingerprinting under Desynchronisation
I have no clue what desynchronisation is, but I know that fingerprinting is relevant to what I work on.

Robust Music Identification, Detection, and Analysis
This could be another paper on fingerprinting?

Audio Identification Using Sinusoidal Modeling and Application to Jingle Detection
More fingerprinting fun.

Audio Fingerprint Identification by Approximate String Matching
Seems like fingerprinting has established itself as a research direction again :-)

Musical Memory of the World - Data Infrastructure in Ethnomusicological Archives
It’s not directly related to my own work, but sounds very interesting.

Globe of Music - Music Library Visualization Using Geosom
A visualization of a music library using a metaphor of geographic maps? I’m curious how using a globe improves the experience.

Strike-A-Tune: Fuzzy Music Navigation Using a Drum Interface
I hope they’ll let me have a try :-)

Using 3D Visualizations to Explore and Discover Music
I believe I’ve seen this demo already, but I never got to try it out myself. I hope the waiting line won’t be too long.

Music Browsing Using a Tabletop Display
If the demo is interesting I’ll forgive them their not very informative title ;-)

Search&Select - Intuitively Retrieving Music from Large Collections
I like the author’s work. I’m very curious what he built this time.

Ensemble Learning for Hybrid Music Recommendation
It has the words music recommendation in the title, and the authors have done some interesting work in the past.

Music Recommendation Mapping and Interface Based on Structural Network Entropy
Another music recommendation paper. I’m guessing this one is about a certain MyStrands visualization. I’m particularly interested in the “structural network entropy” part.

Influence of Tempo and Subjective Rating of Music in Step Frequency of Running
My guess is that tempo has an impact and that this impact is even higher for music I like? But I wouldn’t expect the subjective rating to have a very high impact. I often notice how I start walking to the beats of music I hear even if I don’t like the music.

Sociology and Music Recommendation Systems
Another paper I’d put on the very, very short list :-)

Visualizing Music: Tonal Progressions and Distributions
Sounds great! I should check if they already have some videos online.

Localized Key Finding from Audio Using Nonnegative Matrix Factorization for Segmentation
I’m curious how the author used a nonnegative matrix factorization for this task. I’ve never used one, but I thought they are usually used for mixtures. However, segments (like chorus and instrument solos) are usually not best described as mixtures?

Invited Talk
Sounds like I’ll learn interesting things about copyright, creative commons, and other intellectual property issues involved in music information retrieval.

Audio-Based Cover Song Retrieval Using Approximate Chord Sequences: Testing Shifts, Gaps, Swaps and Beats
I mainly want to know what the author has been up to, but I’m also interested in cover song detection.

Polyphonic Instrument Recognition Using Spectral Clustering
I want to see this one too, but it’s at the same time as the previous paper. The papers use rather similar techniques and deal with rather similar problems. I don’t understand why they were put up to compete with each other. Something non-audio related would have been a much better counterpart.

Supervised and Unsupervised Sequence Modelling for Drum Transcription
I wonder how well their drum transcription works. I hope they have lots of demos.

A Unified System for Chord Transcription and Key Extraction Using Hidden Markov Models
Again a paper I really don’t want to miss but it’s at the same time as the one above. There are so many papers that don’t deal with extracting interesting information from audio signals that I absolutely don’t understand why they arranged this parallel session the way they did.

Combining Temporal and Spectral Features in HMM-Based Drum Transcription
I’m not sure if I’ll check out this one or the one below. Both are really interesting.

A Cross-Validated Study of Modelling Strategies for Automatic Chord Recognition in Audio
Sounds like they might have some interesting results.

Improving the Classification of Percussive Sounds with Analytical Features: A Case Study
I must see this one because I recently did some work on drum sounds. I’m curious if the authors include all sorts of percussive instruments (such as a piano) or if it’s drums mainly.

Discovering Chord Idioms Through Beatles and Real Book Songs
I’d love to see this one, too :-(
Don’t get me wrong: I fully support parallel sessions (there isn’t really an alternative given this many oral presentations) but unfortunately the sessions weren’t split in a way that would allow me to see everything I would like to see. Why not put chords and alignment parallel to each other?? To demonstrate my point I won’t list any papers of the alignment session.

Automatic Instrument Recognition in a Polyphonic Mixture Using Sparse Representations
Another strange thing about how the sessions were split is that one parallel session always ends 15 minutes earlier than the other one. Do the organizers expect everyone from the session that ends early to run over to the other one? I’d prefer it if all sessions ended at the same time, which would make it easier to find a group to join for lunch. Anyway, sounds like an interesting paper.

ATTA: Implementing GTTM on a Computer
It’s been a while since I first heard a presentation on GTTM. I guess it’s about time to refresh my knowledge.

An Experiment on the Role of Pitch Intervals in Melodic Segmentation
I have no clue… but segments often have different “local keys”. The chords within keys are usually clearly defined. Each chord has specific pitch intervals… I wonder what experiment they did.

Vivo - Visualizing Harmonic Progressions and Voice-Leading in PWGL
A visualization!

Visualizing Music on the Metrical Circle
Another visualization :-)

Applying Rhythmic Similarity Based on Inner Metric Analysis to Folksong Research
I’m curious how they compute rhythmic similarity. I have seen a lot of work on extracting rhythm information, but haven’t seen much on computing similarities using it.

Music Retrieval by Rhythmic Similarity Applied on Greek and African Traditional Music
Another rhythmic similarity paper :-)

A Dynamic Programming Approach to the Extraction of Phrase Boundaries from Tempo Variations in Expressive Performances
A long time ago I did some work on segmenting tempo variations… I’m curious how they represent tempo (do they apply temporal smoothing?) and how well detecting phrase boundaries works given only tempo. (Why not use loudness as well?)

Creating a Simplified Music Mood Classification Ground-Truth Set
Sounds like this might also be related to the MIREX mood classification task.

Assessment of State-of-the-Art Meter Analysis Systems with an Extended Meter Description Model
I wonder how well state-of-the-art methods work for meter detection.

Evaluating a Chord-Labelling Algorithm
Chord detection is great.

A Qualitative Assessment of Measures for the Evaluation of a Cover Song Identification System
Cover song detection is great, too.

The Music Information Retrieval Evaluation Exchange “Do-It-Yourself” Web Service
Wow! I wonder if they will have a demo ready?

Preliminary Analyses of Information Features Provided by Users for Identifying Music
I have no clue what this one is about, but it’s probably MIREX related.

Finding Music in Scholarly Sets and Series: The Index to Printed Music (IPM)
One of the many things I know nothing about, but it sounds interesting.

Humming on Audio Databases
I wonder if they provide a demo, and if they can motivate people to use it. (It will probably be more fun listening to people sing than seeing if their system works.)

A Query by Humming System that Learns from Experience
Would be nice to have this one right next to the previous one.

Classifying Music Audio with Timbral and Chroma Features
Another one for the very, very short list. I’m curious how the author combined the features, and if he measured improvements, and if he did artist identification or genre classification (and if he used an artist filter if so).

A Closer Look on Artist Filters for Musical Genre Classification
Sounds like something everyone should be using :-)

A Demonstrator for Automatic Music Mood Estimation
I definitely want to see this demonstration.

Mood-ex-Machina: Towards Automation of Moody Tunes
I wonder what this sounds like.

Pedagogical Transcription for Multimodal Sitar Performance
I wonder if it’s so pedagogical that I can understand it?

Drum Transcription in Polyphonic Music Using Non-Negative Matrix Factorisation
Not sure what’s new here, but I’ll be there to find out.

Tuning Frequency Estimation Using Circular Statistics
No clue what this is about. My best guess would be that it’s related to the pitch corrections I’ve seen in chord transcription systems.

TagATune: A Game for Music and Sound Annotation
Wow another music game! I haven’t heard of this one yet and Google hasn’t either. I’m very curious how it differs from the Listen Game and the MajorMinor game.

A Web-Based Game for Collecting Music Metadata
Would be great if they publish some usage statistics.

Autotagging Music Using Supervised Machine Learning
I’m very curious what results they got.

A Stochastic Representation of the Dynamics of Sung Melody
Another Japanese production :-)

Singing Melody Extraction in Polyphonic Music by Harmonic Tracking
I wonder how high the improvements were by tracking the harmony.

Singer Identification in Polyphonic Music Using Vocal Separation and Pattern Recognition Methods
I wonder how they evaluated this. Did all singers have the same background instruments and sing in the same musical style?

Transcription and Multipitch Estimation Session
I know nothing about multipitch estimation. But I hope to hear some nice demonstrations in the session.

Identifying Words that are Musically Meaningful
I wonder what the most musically meaningful word is. At Last.fm I think it’s “rock”. Another word very high up in the Last.fm ranks is “chillout” :-)

A Semantic Space for Music Derived from Social Tags
I’m curious what their tag space looks like.

The Music Ontology
I don’t know much about ontologies, but it sounds like this is the one and only one for music, so I better not miss it.

Signal + Context = Better Classification
I love this title. I hope the first author will be presenting it.

A Music Information Retrieval System Based on Singing Voice Timbre
I’ll probably be totally exhausted from having seen so many presentations and posters by this time, but I’ll try to reserve some energy to be able to concentrate on this talk.

Poster session 3 (MIREX)
Usually one of the highlights at ISMIR. I hope the MIREX team manages to have the results ready in time. Only about 4 weeks left to get everything done.

Methodological Considerations in Studies of Musical Similarity
I wish this paper had been published before I wrote my thesis. But I guess it's never too late to learn :-)

Similarity Based on Rating Data
Sounds like something Last.fm has been doing for years: the ratings are measured based on how often people listen to a song. Then standard collaborative filtering techniques are applied. The results are not too bad. I’m guessing that the authors used very sparse data compared to the data Last.fm has. Another paper I’d put on my very, very short list.
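
To make the idea concrete, here is a minimal sketch of item-based collaborative filtering on play counts. It is an illustration under stated assumptions, not a description of what Last.fm or the paper's authors actually do: the log damping of play counts, the cosine measure, and all user and song names are hypothetical choices made for this example.

```python
import math
from collections import defaultdict

def item_similarities(play_counts):
    """play_counts: {user: {song: number_of_plays}}.
    Returns {(song_a, song_b): cosine similarity} for every pair of songs."""
    # Invert to song -> {user: implicit rating}. log1p damps heavy listeners
    # (an assumption for this sketch); zero-play entries are ignored.
    by_song = defaultdict(dict)
    for user, songs in play_counts.items():
        for song, plays in songs.items():
            if plays > 0:
                by_song[song][user] = math.log1p(plays)
    norms = {song: math.sqrt(sum(v * v for v in users.values()))
             for song, users in by_song.items()}
    sims = {}
    songs = sorted(by_song)
    for i, a in enumerate(songs):
        for b in songs[i + 1:]:
            # Only users who played both songs contribute to the dot product.
            shared = by_song[a].keys() & by_song[b].keys()
            dot = sum(by_song[a][u] * by_song[b][u] for u in shared)
            sims[(a, b)] = dot / (norms[a] * norms[b])
    return sims

# Hypothetical listening data.
play_counts = {
    "u1": {"song_x": 10, "song_y": 8},
    "u2": {"song_x": 5, "song_y": 4, "song_z": 1},
    "u3": {"song_z": 7},
}
sims = item_similarities(play_counts)
for pair, value in sorted(sims.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 3))
```

With data this small, song_x and song_y come out as near neighbors because the same users play both, while song_z shares only one occasional listener with them. The sparser the data, the fewer shared listeners each pair has, which is why data volume matters so much here.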

A Study on Attribute-Based Taxonomy for Music Information Retrieval
I wonder if this is similar to Pandora’s music genome project?

Variable-Size Gaussian Mixture Models for Music Similarity Measures
I wonder if and how the author was able to measure significant improvements.

Towards Integration of MIR and Folk Song Research
I like folk music, and I like MIR.

From Rhythm Patterns to Perceived Tempo
I’m curious how they approach this. A rhythm pattern (as defined in music books) does (AFAIK) not have any tempo information and can be played at different tempi. But I’m sure this is an interesting paper :-)

The Quest for Ground Truth in Musical Artist Tagging in the Social Web Era
The title reminds me of one of the more important papers in the short history of ISMIR. Tags are very subjective; there is no right or wrong. You’ll always find people complaining about how other people mistagged the genre of a song. It will be interesting to see if this paper has the potential to join the ranks of the original ISMIR paper with a similar title.

Annotating Music Collections: How Content-Based Similarity Helps to Propagate Labels
Sounds like something very useful.

A Game-Based Approach for Collecting Semantic Annotations of Music
I hope they’ll present some usage statistics.

Human Similarity Judgments: Implications for the Design of Formal Evaluations
I wonder why this paper isn’t presented before the MIREX panel. Seems like it might contain a lot of information that would be useful for the discussion.

Saturday, 25 August 2007

Justin Donaldson’s Blog

For some reason I only found Justin’s blog today. The first time I heard of Justin was when I stumbled upon the visualizations he implemented at MyStrands. Basically he visualizes songs as little disks which are spread out on the screen according to similarity. He uses a modified version of (2-dimensional) MDS and he implemented a clever way to deal with songs that overlap spatially in the visualization. It's quite different from, for example, the way Paul uses (3-dimensional) MDS to visualize a music space.

Justin’s blog covers different topics, not all of which are related to MIR, but some of his entries are. For example, in this entry he addressed reviewer feedback regarding a paper he submitted on his MyStrands visualization. The blog entry's character is kind of similar to mine on the reviewer feedback I received for the MusicSun paper. Personally, I found it interesting to see that he was confronted with feedback similar to what I have received in the past: “Why is the visualization more informative/useful to a user than a ranked list of results?” At a time when the simple Google-style search interface dominates despite so many nice visual alternatives (like kartOO), there is actually a need to explain this, although it seems so obvious sometimes.

It seems to me that publishing a paper on a visual interface for music recommendation is much harder than publishing one about a genre classifier that achieves 100% classification accuracy (which isn't hard if the test set contains the same artists as the training set). I like how Justin implemented his visualization as part of a real music site (instead of just building a research prototype). I wonder if Justin couldn’t have easily conducted a simple evaluation using all the data gathered from that test. (Maybe just simple usage statistics over time?) I wonder if those statistics would be similar to the Alexa statistics for kartOO over the last years. Maybe visualizations that need more than one color and more than one dimension are too complex for music recommendation systems?
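
The artist filter mentioned above can be sketched in a few lines. This is a minimal illustration, assuming tracks are given as (track id, artist, genre) tuples; the function name `artist_filtered_split` and the example data are hypothetical. The split is done per artist rather than per track, so no artist ends up on both sides.

```python
import random
from collections import defaultdict

def artist_filtered_split(tracks, test_fraction=0.3, seed=0):
    """Split (track_id, artist, genre) tuples so that no artist
    appears in both the training set and the test set."""
    by_artist = defaultdict(list)
    for track in tracks:
        by_artist[track[1]].append(track)
    # Shuffle artists, not tracks: all tracks of an artist stay together.
    artists = sorted(by_artist)
    random.Random(seed).shuffle(artists)
    n_test = max(1, round(len(artists) * test_fraction))
    test = [t for a in artists[:n_test] for t in by_artist[a]]
    train = [t for a in artists[n_test:] for t in by_artist[a]]
    return train, test

# Hypothetical collection: four artists, two genres.
tracks = [
    ("t1", "Artist A", "rock"), ("t2", "Artist A", "rock"),
    ("t3", "Artist B", "jazz"), ("t4", "Artist B", "jazz"),
    ("t5", "Artist C", "rock"), ("t6", "Artist D", "jazz"),
]
train, test = artist_filtered_split(tracks)
train_artists = {artist for _, artist, _ in train}
test_artists = {artist for _, artist, _ in test}
print(sorted(train_artists), sorted(test_artists))
```

Note that this sketch does not balance genres between the two sets, and `test_fraction` refers to the share of artists, not tracks. A real evaluation would probably want a stratified variant.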

Btw, seems like Justin will be giving an interesting talk at the Recommendation Systems 2007 conference (as part of the doctoral symposium) and he’s also presenting a poster at ISMIR 2007.

Monday, 13 August 2007

MusicSun: A New Approach to Artist Recommendation

I previously announced that I’ll be posting reviews I received for a paper my co-author Masataka Goto and I submitted to ISMIR’07. You’ll find them below. Btw, I’d like to see others publish their reviews, too. Then I could decide based on those if I want to read a paper or not. It would be like using Amazon customer reviews when deciding to buy a book or not :-)

I’ve added the paper to my list of publications, there’s a link to the PDF and to the demonstration videos.

The paper was 6 pages long when we first submitted it and was accepted as a 4-page version in the proceedings. (For the final version we had to shorten almost everything a bit. The biggest part we dropped was a comparison of the interface to a simplified version which was closer to the simple Google search interface. When reading the reviews it’s important to keep in mind that the reviewers were reading a rather different version of the paper than the one which is online now.) I’ve added my remarks to the reviewers in italic.

I really like receiving feedback on my work, and usually conference and journal reviews are a wonderful source of feedback. However, one thing I missed in the reviews the program chairs sent me was the final paper length each reviewer recommended (there was a field for that in the review form). Maybe next year they could improve this.

Btw, I would like to thank reviewers 1 and 2 as they have helped improve the quality of the paper. Reviewer 3 seems to have forgotten to write something. Reviewer 4 has helped a bit, but pissed me off a bit more. However, others have told me that there is nothing wrong with his or her review. I guess my view on this is not very objective ;-)

REVIEW SUMMARY

Reviewer	R1	R2	R3	R4
Relevance	+++	++	+++	++
Originality	++	++	++	-
Quality	++	++	+++	-
Presentation	+++	+++	+++	+

Legend:
+++ Strong Accept, ++ Accept, + Weak Accept
--- Strong Reject, -- Reject, - Weak Reject

=====================================
Reviewer 1: Detailed comments

Overall: Cool interface, and a decent user study. In general, I would prefer to see a more task-specific evaluation (how long does it take a user to find music they like? do they succeed?) and for specific features how often are they used? But the self-reporting survey is a good start.

It’s always nice when a reviewer starts the review with something positive :-)

Regarding “how long does it take a user to find music they like?”: I think an interface to explore music is closer to a computer game than a tool to get work done. For games, measuring how much fun the users are having is way more important than measuring how long it takes to get something done (which is one of the main criteria for tools). Nevertheless, I agree with the reviewer: the evaluation we provided is not as thorough as it could be. (Although this has been by far the most extensive evaluation of a user interface I’ve ever conducted.)

My biggest criticism is that there isn't a true baseline; users are only comparing two different versions of the author's system, and not comparing against, say a simple web search, or a competing tool like collaborative filtering.

This comparison the reviewer is referring to has been removed because we didn’t have enough space.

Regarding baselines: comparing our system against state-of-the-art recommendation engines such as the one from Last.fm wouldn’t have been a fair comparison either. We thought that since the main contributions of our paper are the new interface features, a useful evaluation would have been to remove those new elements and see what the users think. I’d be very interested to get some more advice on how to better evaluate user interfaces to explore music.

Specific comments:

- “music” and “review” as constraints for finding relevant web docs: for many ambiguous band names, this doesn't perform very well. For instance, for the band “Texas”, the query “texas music review” brings up many irrelevant pages. this is a hard problem, and for research systems probably not too important to worry about, but it may be worth mentioning.

Excellent point. My only excuse is that we didn’t have enough space to discuss everything.

- how do you generate the vocabulary lists? if it's just and ad-hoc manual process, please mention that, and perhaps suggest other ways to make it more principled and to evaluate changes to these lists.

Good question. I think we’re a bit more specific on this in the 4 page version, but maybe it got squeezed out at the end. Automatically generating such vocabularies would be really interesting. But I wouldn’t know how to do that (without using e.g. Last.fm tag data).

- Similarly, why did you choose 4 vocabularies vs some other number?

I think we didn’t have space to explain this in the 4 page version.

The number of vocabularies is more or less random. The vocabularies are based on previous work (we used them in an ISMIR’06 paper, and before that in an ECDL’05 paper). However, there’s an upper limit on the number of vocabularies that would make sense to use, and the users seemed fine with the 4 we used. (Of course it would be really nice to adapt it to different languages as well.)

- not all readers will know what “tf-idf” is. please explain or provide a reference.

We added a more explicit reference in the final version. I find things like these really hard to notice. I talk about tfidf all the time and just started assuming that the whole world talks about tfidf the whole time, too. :-)

- table 3: unnecessarily confusing to introduce the “L” and “R” just call them Easy and Hard, and explain that you grouped the top/bottom 3 points in a 7-point scale. and instead of “?”, label it “No Answer” or something

Again something that’s hard to notice if you are too deep into the material. Thanks to this reviewer’s comment the respective table should be more understandable.

- rather than the self-reported results about which optional features the user found useful, i think a better eval would be a count of actually how many times the user used them.

Actually the usage of features wasn’t something the users reported themselves. I was sitting next to them and taking notes while they were using the interface. Thinking of it now, I realize that maybe even in the final version this might not be clear enough :-/
However, automatically counting how often functions were used would have been better. Unfortunately, I didn’t store the log files for all users because I had some technical problems (and I thought that making notes while watching them use the interface would be sufficient). Next time...
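
As an aside on the tf-idf weighting the reviewer asked about above: the sketch below shows one common variant (term frequency times the log of inverse document frequency). It is not necessarily the exact weighting used in the paper, and the artist names and page contents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: {name: list of tokens}. Returns {name: {term: weight}}."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in documents.values() for term in set(tokens))
    weights = {}
    for name, tokens in documents.items():
        tf = Counter(tokens)
        weights[name] = {
            # Relative term frequency, scaled down for terms found everywhere.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return weights

# Hypothetical words found on web pages about three artists.
pages = {
    "artist_1": "guitar rock loud guitar music".split(),
    "artist_2": "piano jazz quiet music".split(),
    "artist_3": "guitar folk acoustic music".split(),
}
weights = tf_idf(pages)
for name, terms in weights.items():
    print(name, sorted(terms, key=terms.get, reverse=True)[:2])
```

A word like "music" that occurs on every page gets weight zero, while a word that is frequent for one artist and rare elsewhere is weighted highly. That is the property that makes tf-idf useful for describing artists by the words on their web pages.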

=====================================
Reviewer 2: Detailed comments

This is a well written paper... and it demonstrates a nice system for showing users new songs.

Again, it’s really nice when a reviewer starts a review with something positive.

This paper largely walks through one design, a very nice looking design, but how does this system compare to other systems? I found the evaluation in this paper weak.

How does the part of this work that combines recommendations compare to the work of Fagin (Combining Fuzzy Information From Multiple Systems )? Would that be a better approach?

I’m not familiar with Fagin. (But fuzzy combinations sound interesting.) Regarding the reviewer’s criticism of the evaluation, I guess it is in line with that of Reviewer 1.

UPDATE: Klaas has posted a link to a really nice introduction to aggregation operators in the comments.

I really wanted to know which similarity approach worked best. This paper doesn't address that issue.

This was beyond the scope of our paper. But it surely would have been very interesting to evaluate the different individual similarities we used :-/

Testing UI design is hard.. one needs a task and then lots of users. Can you do this?

Yes, it is hard :-)
And no, it doesn’t seem like we did a good job :-/

=====================================
Reviewer 3: Detailed comments

Unfortunately this reviewer didn't explain why he or she gave us such high scores.

=====================================
Reviewer 4: Detailed comments

The paper addresses an interesting issue, recommendation systems and interfaces to support them. I found the idea of using multiple information sources very interesting, and potentially useful.

Again, it’s always nice when a reviewer finds something positive to start with. However, the idea of using multiple information sources for recommendations isn’t new, and I don’t think my co-author and I can take the credit for it. And I don't understand how someone can say that combining different sources of information is only "potentially" useful. Even if I close both of my eyes I can clearly see that there's no way around that :-)

The major problem that I have with the paper is the experimental design: I am not quite sure what is being evaluated. Is it the recommendation system interface or the underlying software used to create the recommendations? If it is the former, which I think it is, then the design of the experiment seems to confound many issues.

I think it's difficult to separate the two. It’s not really possible to evaluate the user interface without considering limitations of the underlying recommendation system. The way the user interface deals with these limitations and presents these shortcomings to the user is a very critical aspect of systems using state-of-the-art content-based algorithms. This is something we explicitly dealt with in the long version of the paper and briefly mention in the short version (e.g. the indicators for how reliable the system thinks the recommendations are). Furthermore, the recommendation system and the interface are very closely linked to each other (e.g. the way the users are given the option to adjust the aspect of similarity they are most interested in).
But then again, as Reviewers 1 and 2 have already pointed out, there are limits to the evaluation we present in our paper.

For example, the authors do not control for user expertise, nor do they control for system issues (e.g., the database not being large enough to provide a user with the song he or she is seeking).

We gathered and analyzed lots of statistics on the users’ expertise (in terms of using computers, music interfaces, and musical knowledge), music taste, and general music listening & discovery habits, but didn’t include everything because we ran out of space. Nevertheless, we allocated some space in the 4 page version to describe the participants in more detail.

Moreover, conclusions like, and I'm paraphrasing, "the users say they would use it again" are, for the most part, without any normative value.

We never claimed that such a conclusion is normative. Of course we measured user satisfaction in different ways (direct and indirect), but most of the evaluation part of our paper deals with parts of the interface we thought users would like/understand/use but (surprisingly) didn’t. We believe the contribution of our evaluation is to point out a number of directions for future work.

Under what circumstance would they use it (e.g., if they were paid to evaluate it)? It is a stretch to conclude they would use the system if it were part of Amazon-- part of Amazon in what way; in comparison to what; etc.?

I’m really confused by the reviewer’s remarks. Just because (when asked) the users said they would like to use it doesn’t mean that users would really use it. And we never drew this conclusion. Instead we’ve pointed out several (in the longer version even more) limitations of the interface. We never tried to market MusicSun as a finished system, but rather as a prototype from which there is something to learn.

In addition, the paper is rife with typos and stylistic problems (e.g., citations are not a part of speech), and the reference section relies quite heavily on the authors' own work.

It would have been nice if the reviewer had explicitly mentioned that this is not why he voted for a weak reject. Furthermore, there are nicer ways of putting this. Neither my co-author nor I are native speakers. It would have been more helpful if the reviewer had pointed out some of the typos.

Regarding the self-citation: we cited everything we thought was relevant to understand the work we presented. Most of the interface is built on techniques we previously used. We didn’t have room to describe everything, so we referenced it instead. It would have been more helpful if the reviewer had pointed us to references that are missing or unnecessary.

=====================================

Btw, check out these links my colleague Norman Casagrande pointed me to (both from the legendary PhD Comics series):
- Paper review worksheet
- Addressing reviewers comments
