This year's MIREX evaluation has been one of my personal ISMIR 2008 highlights. Stephen Downie and his team computed more numbers than I could possibly keep track of for lots of different algorithms across 12 different MIR tasks. That's a lot more than in any of the previous years, and that's a lot of interesting data to dig into.
I've been particularly interested in the auto-tagging task. It's the first time MIREX ran this kind of task, and there have been only a few research papers in the MIR community on the subject. As far as I understand, there is no agreement yet on how exactly to evaluate the algorithms, which is also reflected on the results page. Kris West has added information on the statistical significance of the results, which shows that none of the submissions was consistently and significantly better than the others. Nevertheless, there's a lot to learn from the evaluation, and I hope we'll see many more participants next year.
Paul has a good summary of the discussion at this year's MIREX panel.