Amazon's AI Creates Synthesized Singers (venturebeat.com)
Kyle Wiggers, writing for VentureBeat: AI and machine learning algorithms are quite skilled at generating works of art -- and highly realistic images of apartments, people, and pets to boot. But relatively few have been tuned to singing synthesis, or the task of cloning musicians' voices. Researchers from Amazon and Cambridge put their collective minds to the challenge in a recent paper in which they propose an AI system that requires "considerably" less modeling of features like vibratos and note durations than previous work. It taps a Google-designed algorithm -- WaveNet -- to synthesize the mel-spectrograms, or representations of the power spectrum of sounds, which another model produces using a combination of speech and singing data.
The system comprises three parts, the first of which is a frontend that takes a musical score as input and produces note embeddings (i.e., numerical representations of notes) to be sent to an encoder. The second is a model that is modified to accept the aforementioned embeddings, whose decoder produces mel-spectrograms. As for the third and final component -- the WaveNet vocoder, which mimics things like stress and intonation in speech -- it synthesizes the spectrograms into song. The frontend performs linguistic analysis on the score lyrics, allowing for three possible levels of vowel stress and ignoring punctuation. In time, it discovers which phonemes (perceptually distinct units of sound) correspond to each note of the score using syllabification information specified in the score itself. It also computes the expected duration in seconds of each note, as well as the tempo and time signature of the score, which it combines into embeddings.
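To make the duration step concrete, here is a minimal Python sketch of how a frontend might turn a note's length into seconds and bundle it with tempo and time signature. Everything in it is an assumption for illustration: the `Note` class, the `note_seconds` and `note_features` helpers, and the flat feature list are hypothetical, and the summary does not say how the paper actually builds its embeddings. It also assumes tempo is given in quarter-note beats per minute.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """Hypothetical score note; not the paper's data structure."""
    pitch: str      # e.g. "C4"
    beats: float    # length in quarter-note beats
    syllable: str   # lyric syllable sung on this note


def note_seconds(beats: float, tempo_bpm: float) -> float:
    """Duration in seconds of a note lasting `beats` beats at `tempo_bpm`."""
    if tempo_bpm <= 0:
        raise ValueError("tempo must be positive")
    return beats * 60.0 / tempo_bpm


def note_features(note: Note, tempo_bpm: float, time_signature: tuple) -> list:
    """Assumed feature layout: [seconds, tempo, beats per bar, beat unit]."""
    numerator, denominator = time_signature
    return [
        note_seconds(note.beats, tempo_bpm),
        float(tempo_bpm),
        float(numerator),
        float(denominator),
    ]


if __name__ == "__main__":
    # A quarter note at 120 BPM in 4/4 time.
    print(note_features(Note("C4", 1.0, "la"), 120, (4, 4)))
```

A real system would presumably feed values like these through a learned embedding layer rather than use them raw; the sketch only shows the arithmetic behind "expected duration in seconds".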
Wow. Full of malware pop-ups. (Score:4, Informative)
I followed the story link to get more info, and I got hit by a giant pile of malware popups. I did not dig any further.
Re: (Score:3, Funny)
I followed the link in your sig to get more info, and I got hit by a font that attacked my retina like a circular saw. I did not dig any further.
In other words (Score:2)
Re: (Score:2)
Re: (Score:2)
Good news (Score:1)
The good news is that it works. The bad news is that the output always sounds like Taylor Swift.
Re: (Score:2)
It's not bad news for Swift. Perhaps after 20 million people make and publish tunes, Taylor comes back and sues them all for likeness copyrights, making jillions.
It reminds me of what Oracle is trying to do with Java APIs. Fuck You, Oracle! May somebody swiftly tailor you into bankruptcy.
Seen this before... (Score:3)
There have been plenty of attempts to create "virtual singers" which are basically the deepfakes of the music world.
- AutoTune has fallen in price and is used by many pop acts....
- MP3 briefly had commands that imitated the notes of several notable singers.
- Halsey has recently accused BTS of using Korean syllables she can't make with her notes in the song Boy With Luv, a virtualization tech that isn't allowed here.
So, really, we're dependent on Kiss-FM and YouTube to make sure this tech doesn't wipe out the RIAA... for once something RIAA and Slashdot can agree about.
Re: (Score:2)
I don't know. It might be worth it to get rid of the RIAA.
Re: (Score:2)
It happened maybe 15 years ago. There is a product called Vocaloid which is basically a vocal synth. Like you have a synth to do drums or piano you can have one to do backing vocals and harmonies.
It couldn't really sing in English, it wasn't that advanced. But Japanese is simpler and people started making songs with Vocaloid doing the lead singing.
The particular voice they used was called "Hatsune Miku". They gave each voice a character... Anyway she became a star in her own right, with other people making music
One Year From Now... (Score:2)
"Alexa, sing me a song."
"Okay."
"All your Bass and Treble are Belong to MEEEEEEEE!"
Can Prime members (Score:2)
download the songs for free?
Re: (Score:2)
Until SkyNet sues for damages!
Quickies (Score:2)
I'd love something like this for making demos and spoof tunes, such as political satire. Quality matters less for those, since I don't trust "the bots" to do quality work at this stage in history.
I'd like to be able to say with text: here are my chords and my melody with notes and lyrics. Make me a demo in the style of song X with singer Y. Such markup may look something like:
In 20 years cheap music will be made this way (Score:1)
Ten years ago you had the "generic choir sings Christmas Favorites" holiday CDs in the discount bin.
Now you have "generic choir sings Christmas Favorites" on digital download.
In 2029 you'll have "Year your Christmas Favorites" but it will all be synthetic.
The marketer won't have to pay a choir. If the music and lyrics are in the public domain, once he's bought the software, he can do it all himself and not pay anyone except maybe the hosting service/distribution service/what-not.
In 50 years, the demand for
grr s/Year/Hear (Score:1)
"Hear your Christmas Favorites" not "Year your Christmas Favorites."
I still need to tweak my AI-assisted spell-checker.
Vocaloid Alexa? (Score:1)
Is it soulful? or machine generated gibberish? (Score:2)