AI Music Technology

Amazon's AI Creates Synthesized Singers (venturebeat.com) 19

Kyle Wiggers, writing for VentureBeat: AI and machine learning algorithms are quite skilled at generating works of art -- and highly realistic images of apartments, people, and pets to boot. But relatively few have been tuned to singing synthesis, or the task of cloning musicians' voices. Researchers from Amazon and Cambridge put their collective minds to the challenge in a recent paper in which they propose an AI system that requires "considerably" less modeling of features like vibratos and note durations than previous work. It taps a Google-designed algorithm -- WaveNet -- to synthesize the mel-spectrograms, or representations of the power spectrum of sounds, which another model produces using a combination of speech and singing data.

The system comprises three parts, the first of which is a frontend that takes a musical score as input and produces note embeddings (i.e., numerical representations of notes) to be sent to an encoder. The second is a model that is modified to accept the aforementioned embeddings, whose decoder produces mel-spectrograms. As for the third and final component -- the WaveNet vocoder, which mimics things like stress and intonation in speech -- it synthesizes the spectrograms into song. The frontend performs linguistic analysis on the score lyrics, allowing for three possible levels of vowel stress and ignoring punctuation. In time, it discovers which phonemes (perceptually distinct units of sound) correspond to each note of the score using syllabification information specified in the score itself. It also computes the expected duration in seconds of each note, as well as the tempo and time signature of the score, which it combines into embeddings.
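The duration step described above can be illustrated with a little arithmetic. What follows is only a minimal sketch, not the paper's frontend: it assumes tempo is given in quarter-note beats per minute and that note lengths are fractions of a whole note, and the names `Note`, `note_seconds`, and `note_features` are hypothetical. The real system feeds learned embeddings to an encoder; here the "features" are just a plain list of numbers.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Note:
    """One score note (hypothetical structure, not from the paper)."""
    pitch: str         # e.g. "A4"
    length: Fraction   # fraction of a whole note, e.g. Fraction(1, 4)
    syllable: str      # lyric syllable sung on this note


def note_seconds(length: Fraction, bpm: int) -> float:
    """Expected duration in seconds, assuming bpm counts quarter notes."""
    quarter_beats = length * 4          # a whole note spans 4 quarter beats
    return float(quarter_beats * 60 / bpm)


def note_features(note: Note, bpm: int, time_signature: tuple) -> list:
    """Combine duration, tempo, and time signature into one numeric vector."""
    beats_per_bar, beat_unit = time_signature
    return [
        note_seconds(note.length, bpm),
        float(bpm),
        float(beats_per_bar),
        float(beat_unit),
    ]


if __name__ == "__main__":
    half_note = Note(pitch="A4", length=Fraction(1, 2), syllable="la")
    print(note_features(half_note, bpm=60, time_signature=(3, 4)))
```

At 120 quarter-note beats per minute, a quarter note lasts half a second under these assumptions; a half note at 60 bpm lasts two seconds.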

This discussion has been archived. No new comments can be posted.

  • by dgatwood ( 11270 ) on Friday December 20, 2019 @04:50PM (#59542706) Homepage Journal

    I followed the story link to get more info, and I got hit by a giant pile of malware popups. I did not dig any further.

    • I followed the link in your sig to get more info, and I got hit by a font that attacked my retina like a circular saw. I did not dig any further.

  • Author: "I have no idea what they're talking about, but singing AI sure sounds cool. I'm being paid to write an article though, so I'll just regurgitate the technobabble from the whitepaper and hope no one notices."
    • It's Slashdot so almost no one will even read the article in the first place. I quit after getting the link to the actual paper [arxiv.org] from the article.
    • That's pretty much a summary of how The Media in general deals with the crapware they keep calling 'AI': "We don't understand a single word of it, but it's HOT and FRESH and we're getting PAID to write about it, so we'll just copypaste a bunch of shit with some other shit to glue it together into an article and call it good, collect the paycheck".
  • The good news is that it works. The bad news is that the output always sounds like Taylor Swift.

    • by Tablizer ( 95088 )

      The good news is that it works. The bad news is that the output always sounds like Taylor Swift.

      It's not bad news for Swift. Perhaps after 20 million people make and publish tunes, Taylor comes back and sues them all for likeness copyrights, making jillions.

      It reminds me of what Oracle is trying to do with Java APIs. Fuck You, Oracle! May somebody swiftly tailor you into bankruptcy.

  • by The New Guy 2.0 ( 3497907 ) on Friday December 20, 2019 @05:14PM (#59542776)

    There's been plenty of attempts to create "virtual singers" which are basically the deepfakes of the music world.

    - AutoTune has fallen in price and is used by many pop acts....
    - MP3 briefly had commands that imitated the notes of several notable singers.
    - Halsey has recently accused BTS of using Korean syllables she can't make with her notes in the song Boy With Luv, a virtualization tech that isn't allowed here.

    So, really, we're dependent on Kiss-FM and YouTube to make sure this tech doesn't wipe out the RIAA... for once something RIAA and Slashdot can agree about.

    • by HiThere ( 15173 )

      I don't know. It might be worth it to get rid of the RIAA.

    • by AmiMoJo ( 196126 )

      It happened maybe 15 years ago. There is a product called Vocaloid which is basically a vocal synth. Like you have a synth to do drums or piano you can have one to do backing vocals and harmonies.

      It couldn't really sing in English, it wasn't that advanced. But Japanese is simpler and people started making songs with Vocaloid doing the lead singing.

      The particular voice they used was called "Hatsune Miku". They gave each voice a character... Anyway she became a star in her own right, with other people making music

  • "Alexa, sing me a song."

    "Okay."

    "All your Bass and Treble are Belong to MEEEEEEEE!"

  • download the songs for free?

  • I'd love something like this for making demos and spoof tunes, such as political satire. Quality matters less for those, since I don't trust "the bots" to do quality work at this stage in history.

    I'd like to be able to say with text: here are my chords and my melody with notes and lyrics. Make me a demo in the style of song X with singer Y. Such markup may look something like:

    @bm // b-minor chord
    b: I // [note]: [phrase]
    c#: am // c-sharp
    b: the
    a: orange walrus,
    b: goo goo ka cho
    @C // c major chord
    d: you
    b#: ca
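A reader for markup like the above could be sketched as follows. The comment's example is cut off, so the grammar here is an assumption inferred from the visible lines: `@name` starts a chord, `note: phrase` attaches a lyric fragment to a note, and `//` begins a comment. The function name `parse_demo_markup` is hypothetical, and nothing here does any singing synthesis; it only turns the text into a list of events.

```python
def parse_demo_markup(text):
    """Parse chord/note markup into a list of event tuples.

    Assumed grammar (inferred from the example, not a real format):
      @<chord>          -> ("chord", chord)
      <note>: <phrase>  -> ("note", note, phrase)
      // ...            -> comment, ignored
    """
    events = []
    for raw in text.splitlines():
        line = raw.split("//", 1)[0].strip()   # drop trailing comment
        if not line:
            continue                           # skip blank lines
        if line.startswith("@"):
            events.append(("chord", line[1:].strip()))
        elif ":" in line:
            note, phrase = line.split(":", 1)
            events.append(("note", note.strip(), phrase.strip()))
        else:
            raise ValueError(f"unrecognized line: {raw!r}")
    return events


if __name__ == "__main__":
    sample = """\
@bm // b-minor chord
b: I // [note]: [phrase]
c#: am // c-sharp
b: the
"""
    for event in parse_demo_markup(sample):
        print(event)
```

Keeping chords and notes as separate events leaves the "style of song X with singer Y" part to whatever synthesizer consumes the list.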

  • Ten years ago you had the "generic choir sings Christmas Favorites" holiday CDs in the discount bin.

    Now you have "generic choir sings Christmas Favorites" on digital download.

    In 2029 you'll have "Year your Christmas Favorites" but it will all be synthetic.

    The marketer won't have to pay a choir. If the music and lyrics are in the public domain, once he's bought the software, he can do it all himself and not pay anyone except maybe the hosting service/distribution service/what-not.

    In 50 years, the demand for

  • Virtual singers are already a huge thing, mostly in Japan - in fact maybe don't be surprised if you see a certain Hatsune Miku possibly performing during the Tokyo Olympics opening ceremony! Various engines exist for using virtual singing synthesizers in music - Yamaha's Vocaloid is by far the most well known, with many dozens of voices and characters released using the tech since its debut in 2006. Things really began to take off with Crypton Future Media's release of Hatsune Miku for the Vocaloid 2 engin
  • music/singing is inherently human; there is something that drives you deeper away from the chattering mind. The exhale pattern (how you chop the exiting air/prana) has to influence the mind due to the MBX (mind - breath nexus -- how you feel nice after a long run - pranayama). An AI/computer generating these patterns will likely be off for this soul touching/piercing. It's like in the symbolic processing domain (nama roopa processing - like spoken text/thoughts/mind), AI starts making math conjectures -- can th
