Monday 11 February 2008

Simple Stuff

Jeff Hammerbacher from Facebook’s data and analytics team gave an interesting talk, which can be viewed here (in case that link is broken, here is a direct link). I’d highly recommend it to anyone interested in what happens behind the scenes on websites dealing with a lot of interesting data.

Some of the stuff he talks about is related to what happens at Last.fm. For example, at Last.fm we also use Hadoop.

I found the part around minutes 33-38 most interesting, where he talks about the skills needed to work with their data. Being able to write code is a must, and most of the people on his team write more code than they did in their previous positions. He also says that very simple statistical tools, such as curve fitting and an understanding of statistical significance, are enough to solve most of their learning-from-data challenges, and that visualizing data is very important (e.g. to identify and understand outliers).

Obviously, music recommendation is a much more complex problem than any of the challenges Facebook is facing. Scrobbles, tags, skipping behaviour, etc. require very different treatment than the data Facebook gathers. Or maybe not?

Some of the most interesting things I’ve had the pleasure to work on at Last.fm needed only very basic statistical techniques: non-linear curve fitting and measuring the significance of improvements. However, while the “machine learning” parts could hardly be any simpler, the complexity of dealing with terabytes of data is something completely different.
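To give an idea of how basic "measuring the significance of improvements" can be, here is a minimal sketch of one standard way to do it: a two-sided two-proportion z-test. This is not the code we use at Last.fm. The function name is hypothetical and the skip counts in the example are made up purely for illustration.

```python
import math


def two_proportion_z_test(successes_a, trials_a, successes_b, trials_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value). Assumes both samples are large enough for the
    normal approximation to be reasonable.
    """
    p_a = successes_a / trials_a
    p_b = successes_b / trials_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up example: tracks skipped under a baseline (A) and a new variant (B).
z, p_value = two_proportion_z_test(1000, 10000, 900, 10000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference in skip rates is unlikely to be pure chance. It says nothing about whether the improvement is big enough to matter, which is where looking at the actual data comes in.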

Btw, at Last.fm we are hiring someone to work on data and analytics, and we also have a position related to data warehousing. Both positions would be facing challenges very closely related to the stuff Jeff talks about. Except that the data we have is a lot more interesting! It made me a bit sad to hear that one of the things they were actually looking at is communication streams between universities… you’d think they’d have a lot more interesting insights to gain? ;-)

Thursday 7 February 2008

Web Services for Researchers

It just occurred to me how soon every research lab might be offering a long list of web services. Bandwidth is not a limiting factor. Building a web service is not as hard as it was 5 years ago. It's a great way to share without giving away code (and IP). It's also more user-friendly, as it doesn't require installing someone else's most likely buggy code on your own system. And it's potentially a great way to make money, too!

I wonder if I'm the last one to realize this? :-)

Anyway, what helped me realize this was Thomas Lidy's announcement of his team's new web service, and The Echo Nest's web services that I recently found out about through Paul. Both let you upload music; they then extract features from the audio signal and send them back to you.

I just gave both a try and they worked very smoothly. The two pictures below show results for the same track. The first one was created with the processing music visualization tool provided by The Echo Nest, the second one using Matlab to analyze the fluctuation pattern that Tom's tool extracts.





I wonder if the Echo Nest's service would crunch 100k tracks. (I believe there are at least a few research groups already dealing with collections beyond 100k tracks.) The service Tom announced is limited to 100 tracks/day and a maximum of 300 in total per voucher (which requires you to sign up with your email address). Anyway, it's a great start. And it seems that Tom will soon be making more announcements on further services that allow anyone to visually organize their music collections using a metaphor of geographic maps. Nice!

Btw, the Last.fm web services also seem to be very popular amongst researchers, at least some have been hitting them very hard ;-)

And one of the most eagerly anticipated web services is probably the MIREX DIY web service, which was announced at ISMIR 2007 by Stephen Downie's team. The service will allow researchers to upload their implementations and receive evaluation results in return, which will make it very easy for them to test if they are heading in the right direction.

Tuesday 5 February 2008

2 PhDs and 1 MSc

Matthew Davies recently made his PhD (Towards Automatic Rhythmic Accompaniment) available online.

Arturo Camacho recently announced on the music-ir list that his PhD (SWIPE: A Sawtooth Waveform Inspired Pitch Estimator for Speech and Music) and the corresponding Matlab code are available online.

Claudia Wronski recently finished her Master's thesis (in German) on "Die veränderten Zugriffsmöglichkeiten auf die Ressource Musik – Auswirkungen auf das Kauf-, Nutzungs- und Rezeptionsverhalten der Musikkonsumenten" (freely translated: the changing access to music and its impact on shopping and consumption habits of music listeners). She covers topics such as how the music culture is changing, the crisis of the music industry, the long tail, the emancipation of artists, and the future of music.

Claudia is an active Last.fm user, and has successfully been leading a very interesting user group: The Special Interest Tag Radio Collective.