How do you organize music? By genre? Artists? Instruments? Is it a process that can be universalized, or is it personal?
Organizing music is the fundamental problem that the Music Genome Project attempts to solve. Named after the Human Genome Project, the Music Genome Project is a "sophisticated taxonomy of musical information": it uses a controlled vocabulary to classify music and helps connect people to music they love, including music they haven't discovered yet.
The terms in this vocabulary describe the musical attributes of the songs being classified, analogous to genetic traits in human beings. At the time of writing, approximately 450 distinct attributes are used to analyze the songs in the library. Most of them are objective and observable (e.g. vocal duets, acoustic guitar solo, percussion, triple meter style), although some are more nuanced (e.g. driving shuffle feel, wildly complex rhythm, epic buildup/breakdown).
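To make the idea of a controlled vocabulary concrete, here is a minimal Python sketch. It assumes a song can be modeled as a set of attribute names drawn from a fixed list of allowed terms. The attribute names are the examples above; the `VOCABULARY` set and the `describe_song` helper are hypothetical and are not Pandora's actual data model.

```python
# Hypothetical sketch: a controlled vocabulary as a fixed set of allowed terms.
# Only the example attributes mentioned above are listed, not the full ~450.

VOCABULARY = frozenset({
    "vocal duets",
    "acoustic guitar solo",
    "percussion",
    "triple meter style",
    "driving shuffle feel",
    "wildly complex rhythm",
    "epic buildup/breakdown",
})


def describe_song(title: str, attributes: set) -> dict:
    """Build a song record, rejecting any term outside the vocabulary."""
    unknown = set(attributes) - VOCABULARY
    if unknown:
        # Analysts pick from the agreed list; free-form tags are not allowed.
        raise ValueError(f"not in controlled vocabulary: {sorted(unknown)}")
    return {"title": title, "attributes": frozenset(attributes)}


song = describe_song("Example Song", {"vocal duets", "triple meter style"})
print(sorted(song["attributes"]))  # ['triple meter style', 'vocal duets']
```

Passing an ad hoc term such as "sounds sad" raises a ValueError, which is the point of a controlled vocabulary: every song is described with the same agreed-upon terms.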
The Music Genome Project is the engine that powers Pandora, an online radio service that plays music based on stations defined by the user. Each station can be as granular as a specific artist ("Britney Spears") or as generic as a genre ("pop"). The algorithm identifies the attributes that make up the station, finds songs that share similar "genes" (or, to use a 202 term, a "family resemblance"), and adds them to the user's listening queue. Users can find out why a song is being played on the station, e.g. "it features heavy use of samples, house roots, four-on-the-floor beats, and beats made for dancing." They can also vote a song down if it doesn't match their listening preferences, or vote it up if it does.
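The "shared genes" matching can also be sketched in Python. Pandora's actual algorithm isn't public, so this is only an illustration under stated assumptions: songs are sets of attributes, similarity is Jaccard similarity (attributes shared by two songs divided by all attributes either song has), the shared attributes double as the "why is this song playing?" explanation, and a thumbs-down simply removes a song from consideration. The song titles and the `jaccard` and `build_queue` functions are hypothetical.

```python
# Hypothetical sketch of attribute-based matching; NOT Pandora's algorithm.
# Each song is modeled as a frozenset of attribute names (an assumption).


def jaccard(a: frozenset, b: frozenset) -> float:
    """Share of attributes two songs have in common, from 0.0 to 1.0."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def build_queue(seed: frozenset, library: dict, disliked=frozenset(), size=3):
    """Rank library songs by similarity to the seed's attributes.

    Returns (title, score, shared_attributes) tuples. The shared attributes
    stand in for the "why is this song playing?" explanation. Songs the
    listener voted down are skipped entirely (a simplification).
    """
    scored = []
    for title, attrs in library.items():
        if title in disliked:
            continue
        shared = seed & attrs
        if shared:  # ignore songs with no resemblance at all
            scored.append((title, jaccard(seed, attrs), sorted(shared)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:size]


# Hypothetical seed built from the explanation quoted above.
seed = frozenset({
    "heavy use of samples",
    "house roots",
    "four-on-the-floor beats",
    "beats made for dancing",
})
library = {
    "Song A": frozenset({"heavy use of samples", "house roots",
                         "four-on-the-floor beats"}),
    "Song B": frozenset({"beats made for dancing", "acoustic guitar solo"}),
    "Song C": frozenset({"triple meter style"}),
}

for title, score, why in build_queue(seed, library):
    print(f"{title} ({score:.2f}): {', '.join(why)}")
# Song A (0.75): four-on-the-floor beats, heavy use of samples, house roots
# Song B (0.20): beats made for dancing
```

A real system would presumably weight attributes and let votes reshape the station rather than just filter songs (thumbs-up is not modeled here at all); Jaccard similarity is used only because it is the simplest measure of "family resemblance" between two sets.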
Although the project is updated continuously, it interestingly doesn't use "automated data extraction" at all to organize its library. Instead, it relies on experts (music analysts) trained in music theory and history who analyze songs precisely using the prescribed methodology. This helps maintain a higher level of data quality and integrity, but it also leads to a smaller catalog than competing services such as Spotify (roughly 1 million versus 16 million tracks, respectively, as of June 2012).
Another fascinating aspect of this curation process is the attempt to minimize qualitative bias. For the most part, analysts try to ignore "taste" and the cultural baggage that comes with a song, genre, or artist. For instance, an analyst may tag a song with "gritty male vocal", but how good or bad the singer sounds, or how popular the artist is, are irrelevant to the organizing system. The New York Times article includes an amusing anecdote that shows the social and cultural baggage attached to certain artists and genres: a user listening to a Sarah McLachlan station was served a Celine Dion song. "… it was the right sort of thing — but it was Celine Dion!"
Organizing music is inherently difficult, but the Music Genome Project has tackled the problem successfully by applying several information organization concepts, namely controlled vocabularies, resource description and categorization, and data maintenance and integrity. The result is a fascinating music engine that lets users enjoy music they already love and discover new music along the way. The engine is intelligent enough that, in some ways, it seems to know users better than they know themselves.
Sources:
http://arstechnica.com/tech-policy/2011/01/digging-into-pandoras-music-genome-with-musicologist-nolan-gasser/
http://en.wikipedia.org/wiki/List_of_Music_Genome_Project_attributes
http://www.nytimes.com/2009/10/18/magazine/18Pandora-t.html?pagewanted=all