You may cringe, but the project had some really impressive results when you think about it.
And of course science advances, and seven years have passed. So now it's not enough to feed vocals to a program. What about feeding it...a picture? And then letting the program create both the music AND the lyrics?
Enter the University of Toronto, a picture of a Christmas tree, and this.
(Video: "Neural Story Singing Christmas" from Hang Chu on Vimeo.)
Now I'm a chord guy, so I've been (repeatedly) enjoying the chord progressions that the program came up with. But I guess I ought to pay attention to the lyrics.
Lots to decorate the room.
The Christmas tree is filled with flowers.
I swear its Christmas Eve.
I hope that is what you say.
The best Christmas present in the world is a blessing.
I've always been there for the rest of our lives.
A hundred and a half hour ago.
I am glad to meet you.
I can hear the music coming from the hall.
A fairy tale.
A Christmas tree.
There are lots and lots and lots of flowers.
Clayton Purdom of The A.V. Club described the result as bone-chilling:
These are not lyrics, they are the moans of the damned, trapped between this world and something beyond it, just conscious enough to know they are not at rest.
While Purdom (and even smithsonian.com) has a point about the emotional emptiness of the lyrics (perhaps they will appeal to the narcissists I will soon discuss in a separate blog post), the song is more impressive than it appears at first glance. The program analyzed an image and, based upon the information it had learned, came up with something recognizable as a Christmas song. But first the program had to learn:
Neural karaoke emerged from a broader research effort to use computer programs to make music, write lyrics and even generate dance routines. Taking music creation as a starting point, Hang Chu, a PhD student at the lab, trained a neural network on 100 hours of online music. Once trained, the program can take a musical scale and melodic profile and produce a simple 120-beats-per-minute melody. It then adds chords and drums....
Another hour of Just Dance tunes and 50 hours of song lyrics from the internet helped teach the program how to put words to music. Drawing on words that appeared at least four times in the dataset, the program built up a vocabulary of 3390 words, which the computer could then string together at a rate of one word per beat.
For the final step of the latest work, the program trained on a collection of pictures and their captions to learn how specific words can be linked to visual patterns and objects. When fed a fresh image, the program can compile some relevant lyrics and sing them using phonemes, or units of sound, linked to the words in its vocabulary.
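For readers who like to see the moving parts, here is a minimal Python sketch of the vocabulary-and-beats bookkeeping described in the passage above: keep only the words that appear at least four times in the training lyrics, then emit one word per beat at 120 beats per minute. This is only an illustration of the counting and timing; the actual University of Toronto system uses a trained neural network to choose the words (random sampling stands in for it here), and the lyrics.txt corpus file is purely hypothetical.

```python
import random
from collections import Counter

MIN_COUNT = 4   # frequency cutoff mentioned in the article
BPM = 120       # tempo of the generated melody
BEATS = 16      # number of beats (and therefore words) to generate


def build_vocabulary(corpus_text: str) -> list[str]:
    """Return the words that occur at least MIN_COUNT times in the corpus."""
    counts = Counter(corpus_text.lower().split())
    return [word for word, n in counts.items() if n >= MIN_COUNT]


def words_for_beats(vocabulary: list[str], beats: int = BEATS) -> list[str]:
    """Pick one word per beat (the one-word-per-beat rule from the article).

    random.choice is a stand-in for the trained model's word choices.
    """
    return [random.choice(vocabulary) for _ in range(beats)]


if __name__ == "__main__":
    # "lyrics.txt" is a hypothetical file of training lyrics.
    with open("lyrics.txt", encoding="utf-8") as f:
        vocab = build_vocabulary(f.read())

    seconds_per_beat = 60 / BPM
    for beat, word in enumerate(words_for_beats(vocab)):
        print(f"beat {beat:2d} (t={beat * seconds_per_beat:4.1f}s): {word}")
```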
In this case, the program saw a tree that appeared to be a Christmas tree, some cubes that appeared to be presents, and something (the stars? the lights?) that appeared to be lots and lots and lots of flowers. To top it off, because Christmas songs often evoke emotional reactions, the line "The best Christmas present in the world is a blessing" was thrown in there.
Frankly, I'm impressed. And not just by the chord progressions.
Although the program is still a little lacking in love songs.