Sunday, 24 August 2008

Tagging Critics

I was doing some research for the ISMIR tag tutorial when I stumbled upon (via this interesting paper, "Playing Tag: An Analysis of Vocabulary Patterns and Relationships Within a Popular Music Folksonomy" by Abbey E. Thompson) the following excerpt from this paper:

[...] "tags are often ambiguous, overly personalised and inexact" [...] "The result is an uncontrolled and chaotic set of tagging terms that do not support searching as effectively as more controlled vocabularies do." [...]

This was published in the D-Lib magazine in early 2006. I wouldn't be surprised if by now the authors realized they were wrong.

But why would anyone ever want to control the vocabulary people use when describing something so extremely multifaceted and so fast-evolving as the content on the web (delicious), snapshots of life (flickr), or music? I guess I'd need to think more like an old-skool librarian to understand that.


Jeremy said...

There are possibly two reasons I can think of to control tagging. One is to give a degree of authority to the tags (e.g., Pandora). Is this more accurate for describing people's tastes? Depends. I've found that playlists are easier to predict when the tags are not a closed set, even when accounting for dimension. However, the test songs were already on the person's playlists; therefore, they were not really the same as musical discovery. Pandora's business has shown that some people prefer their model for musical discovery.

The second reason is that it helps with discovery by reducing noise. One issue is that tags are often misspellings of, or very closely related to, other tags. Research into finding and correcting these is imperfect (from what I've seen). Some sort of correction feedback with the user, similar to what Google does with misspelled searches, could be beneficial. That is, if the user actually does anything after this.

Elias said...

Thanks Jeremy!

But I still don't really see how a controlled vocabulary (which I'm interpreting as a vocabulary controlled by a limited number of experts) could ever be a better way of organizing all the information in the world than the vocabulary used by those who also search for the information (=anyone).

Regarding Pandora: User feedback on radio stations is an extremely valuable data source. Ignoring it would seem like a strange thing to do. E.g. if the music genome project thinks that A is similar to B, but the fans of A disagree (voting thumbs-down on B), then I'd trust the fans more than the genomes. I guess what I'm trying to say is that I've seen no evidence that Pandora only uses data from the music genome project.

I agree that an intelligent tag search system would understand what common misspellings and synonyms are. But that does not require a controlled vocabulary? Or is my interpretation of control wrong?
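To make that point concrete, here is a minimal sketch of folding misspellings into the spelling most people already use, with no expert-curated vocabulary involved. It only assumes raw tag counts are available; the function name `merge_misspellings`, the similarity cutoff, and the sample counts are made up for illustration, and a real system would need something smarter for true synonyms.

```python
from collections import Counter
from difflib import get_close_matches


def merge_misspellings(tag_counts, cutoff=0.85):
    """Map each tag to a more frequently used near-identical spelling, if any.

    The "canonical" spellings come from the crowd itself: whichever variant
    is used most often wins. The cutoff is an arbitrary assumption.
    """
    canonical = {}
    accepted = []  # spellings kept so far, most frequently used first
    # Visit tags from most to least used so popular spellings are accepted first.
    for tag, _ in Counter(tag_counts).most_common():
        key = tag.lower().strip()
        match = get_close_matches(key, accepted, n=1, cutoff=cutoff)
        if match:
            canonical[tag] = match[0]
        else:
            accepted.append(key)
            canonical[tag] = key
    return canonical


# Made-up counts, purely for illustration.
counts = {
    "shoegaze": 120,
    "shoegase": 3,
    "trip-hop": 80,
    "trip hop": 75,
    "poetry": 5,
}
mapping = merge_misspellings(counts)
for tag, target in mapping.items():
    print(f"{tag!r} -> {target!r}")
```

Nothing here stops anyone from inventing a new tag: an unfamiliar tag simply becomes its own canonical spelling until a more popular variant shows up.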

Anyway, I don't see why anyone would want to limit the way people talk about music. To some extent tagging is like poetry. Creative tags made me laugh countless times. And creative tags have helped me discover music I would have otherwise never discovered.

Yves said...

Thankfully, not all librarians think like that!

But we have to keep in mind that tags do not solve all data management problems. Even in your personal music library, would you consider using just tags to manage it? Dropping all the "artist", "track" etc. to move towards a flat data model Item --> Tag? If so, what if you want to describe one of the tags in more detail (e.g., this artist was born in London)? The crowd doesn't keep the underlying data model from being flat.

Both approaches (I wouldn't say "controlled", but structured vocabularies and folksonomies) are complementary, and are useful for different facets of music-related information.

(And both approaches can be easily integrated if you give web identifiers to everything (tracks, artists, tags, etc.) and use a bit of RDF to explain how they relate to each other :-) )
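
A rough sketch of what that integration could look like, using plain Python tuples as stand-ins for RDF triples rather than a real RDF library. Every URI below is a hypothetical identifier under example.org, and the property names (`performedBy`, `taggedWith`, `bornIn`) are invented for illustration, not taken from any actual ontology.

```python
# Hypothetical identifiers: none of these URIs or property names are real.
EX = "http://example.org/"
PERFORMED_BY = EX + "vocab/performedBy"
TAGGED_WITH = EX + "vocab/taggedWith"
BORN_IN = EX + "vocab/bornIn"

# Each statement is a (subject, predicate, object) triple. Tracks, artists,
# places and tags all get their own identifier, so any of them can be
# described further instead of being a flat string.
triples = {
    (EX + "track/1", PERFORMED_BY, EX + "artist/42"),
    (EX + "track/1", TAGGED_WITH, EX + "tag/trip-hop"),
    (EX + "track/2", PERFORMED_BY, EX + "artist/7"),
    (EX + "track/2", TAGGED_WITH, EX + "tag/shoegaze"),
    (EX + "artist/42", BORN_IN, EX + "place/london"),
    (EX + "artist/7", BORN_IN, EX + "place/vienna"),
}


def tags_for_artists_born_in(place):
    """Combine structured data (birthplace) with folksonomy data (tags)."""
    artists = {s for s, p, o in triples if p == BORN_IN and o == place}
    tracks = {s for s, p, o in triples if p == PERFORMED_BY and o in artists}
    return {o for s, p, o in triples if p == TAGGED_WITH and s in tracks}


london_tags = tags_for_artists_born_in(EX + "place/london")
print(london_tags)
```

The query walks from a structured fact (where an artist was born) to free-form tags in one pass, which is the kind of question a flat Item --> Tag model cannot answer.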

Elias said...


Most definitely! Tags are just one of many interesting dimensions. Lyrics! Audio similarity! "Standard" catalog information (performer, title, composer, producer, collaborators, year performed, year composed, conductor, ...) are extremely important... I wouldn't want to miss any of those.

Mathias said...


Regarding tags and MIR research: Is there a chance that the API will get some research friendly methods in the near future? E.g. I'm thinking of possibilities like retrieving random user or track tag sets in order to gain information about the general tag distribution in the system. Being able to access all tags for a given track (and the raw tag counts) would be interesting, too.

Regards, Mathias

Elias said...


I'm not sure. Afaik priority is given to services that make it easy for users to access their own data, and services that enable third parties to develop additional features. And the main person involved in the API has an extremely long todo list.

If you have a very specific question regarding tags, e.g., how many tags taggers use on average, I might be able to get you that number.

If you are interested in raw tag counts: Paul Lamere is distributing a large chunk of tag data with raw tag counts.

Mathias said...


Thanks for your offer and for the link. Paul's data set indeed looks very promising. Information on the average number of tags would be useful to me, so if you don't mind I will contact you via e-mail about that.

Regards, Mathias

[tourist].Tam said...

Is this not a problem of language? What if the user of this ideal system you are talking about starts to use the spelling of the youngster "txt msging mobile fone usr"?
Plus the fact that we are agreeing in plain English here, and not in another western language (I am not even starting to mention the difficulties of implementing English-formed ideas in Japanese, Indus, Malaysian, etc.).

I do think that tagging shouldn't be part of the music object in a music related system, but attached to it. : )