Wednesday, May 13, 2009

She Blinded Me With Science - identifying similar songs

For decades, artists from Coldplay to various Beatles have been accused of song plagiarism. Actually detecting plagiarism is hard: there are only so many notes, which can be arranged in only so many ways, and the dividing line between inspiration and appropriation is hard to draw.

Or is it? Back in 2003, some people from the University of Queensland tried to mathematically determine music similarity:

Exhibitor: Michael Wignall

Supervisor: Peter Kootsookos

Research Group: Electromagnetics and Imaging

Industry Sector: Media / Entertainment

Content-Based Music Similarity

The fundamental purpose of this project is to develop a system by which music similarity can be measured based solely on the content of the music itself. This system will analyse the inherent characteristics of musical pieces and use that analysis to compare songs, independently of any metadata that may exist.

The basis for this comparison is a song’s musical ‘fingerprint’, which is computed by clustering a set of spectral features represented by the Mel-Frequency Cepstral Coefficients (MFCC) of the audio signal. This fingerprint not only uniquely identifies a musical piece; it also provides information about its musical characteristics.

A database of fingerprints can be generated for a selection of songs, from which the similarity between those pieces of music can be evaluated. This provides an opportunity for a number of different applications, including genre grouping, instrument matching, artist isolation and many more.
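To make the fingerprint idea a bit more concrete, here is a rough sketch of my own in Python. It is not the project's code, and I'm making several assumptions: the MFCC frames have already been computed by some audio library (the toy two-number "frames" below are made up), the clustering is plain k-means, and the `fingerprint` and `fingerprint_distance` names and the distance formula are my own inventions for illustration.

```python
import math
import random

def kmeans(frames, k, iterations=20, seed=0):
    """Plain k-means over feature vectors (stand-ins for MFCC frames)."""
    rng = random.Random(seed)
    # Start from k frames picked at random (fixed seed, so runs repeat).
    centroids = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for frame in frames:
            nearest = min(range(k), key=lambda i: math.dist(frame, centroids[i]))
            groups[nearest].append(frame)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def fingerprint(frames, k=2):
    """Assumed fingerprint: the k cluster centres of a song's frames."""
    return kmeans(frames, k)

def fingerprint_distance(fp_a, fp_b):
    """Assumed measure: average nearest-centroid distance, both directions."""
    def one_way(src, dst):
        return sum(min(math.dist(c, d) for d in dst) for c in src) / len(src)
    return (one_way(fp_a, fp_b) + one_way(fp_b, fp_a)) / 2

# Made-up frames: song_b sits close to song_a, song_c is far from both.
song_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0], [11.0, 10.0], [10.0, 11.0]]
song_b = [[0.5, 0.5], [1.5, 0.5], [0.5, 1.5], [10.5, 10.5], [11.5, 10.5], [10.5, 11.5]]
song_c = [[100.0, 100.0], [101.0, 100.0], [100.0, 101.0], [110.0, 100.0], [111.0, 100.0], [110.0, 101.0]]

fp_a, fp_b, fp_c = fingerprint(song_a), fingerprint(song_b), fingerprint(song_c)
print(fingerprint_distance(fp_a, fp_b) < fingerprint_distance(fp_a, fp_c))
```

A smaller distance means "more similar" under this made-up measure. Real MFCC frames would have a dozen or so coefficients each rather than two, and a real system would presumably need many more clusters per song.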

Now, the application that was developed didn't try to find identical songs, just similar ones:

An application to automatically create “DJ Sets” of music was created to demonstrate the fingerprinting and comparison technology developed as part of this project. This application involves generating a database of dance music fingerprints and then, based upon a number of set descriptors (including a ‘seed’ song), automatically generating a playlist of music which when mixed together forms one continuous synchronised DJ set.
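Here is my guess at how a seed-based playlist could be put together, as a minimal Python sketch. Everything in it is an assumption on my part: the `build_set` function, the greedy "always pick the closest unused track" rule, the distance measure, and the toy fingerprints are all hypothetical, and the tempo matching that a synchronised set would need is left out entirely.

```python
import math

def fingerprint_distance(fp_a, fp_b):
    """Assumed measure: average nearest-centroid distance, both directions."""
    def one_way(src, dst):
        return sum(min(math.dist(c, d) for d in dst) for c in src) / len(src)
    return (one_way(fp_a, fp_b) + one_way(fp_b, fp_a)) / 2

def build_set(database, seed_title, length):
    """Greedy chain: start at the seed, then keep adding the unused track
    whose fingerprint is closest to the track added last."""
    playlist = [seed_title]
    remaining = set(database) - {seed_title}
    while remaining and len(playlist) < length:
        current = database[playlist[-1]]
        # sorted() keeps tie-breaking deterministic
        closest = min(
            sorted(remaining),
            key=lambda title: fingerprint_distance(current, database[title]),
        )
        playlist.append(closest)
        remaining.remove(closest)
    return playlist

# Made-up fingerprint database: title -> list of cluster centres.
database = {
    "track_a": [[0.0, 0.0], [10.0, 10.0]],
    "track_b": [[1.0, 0.0], [11.0, 10.0]],
    "track_c": [[4.0, 0.0], [14.0, 10.0]],
    "track_d": [[50.0, 50.0], [60.0, 40.0]],
}

dj_set = build_set(database, "track_a", 3)
print(dj_set)  # ['track_a', 'track_b', 'track_c']
```

The outlier, track_d, never makes it into a three-song set seeded with track_a, which is roughly the behaviour you would want from a set that is supposed to flow.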

Now I kind of doubt that "He's So Fine" and "My Sweet Lord" would end up in the same DJ set.

However, the existence of this experiment does suggest that similar mathematical programs could be developed to determine, based upon some human-derived rules, whether one song copies another. But let's say that some enterprising company developed such a program and sold it to the RIAA or music industry lawyers or whatever. Would the company be obligated to reveal its source code, exposing the rules used to decide that one song copied another?
