Stability AI Brings Text-To-Audio Generation To the Masses (venturebeat.com)
Stability AI today announced the initial public release of its Stable Audio technology, giving anyone the ability to generate short audio clips from simple text prompts. VentureBeat reports: Stable Audio is a new capability, though it is based on many of the same core AI techniques that enable Stable Diffusion to create images. Namely, Stable Audio uses a diffusion model, trained on audio rather than images, to generate new audio clips. "Stability AI is best known for its work in images, but now we're launching our first product for music and audio generation, which is called Stable Audio," Ed Newton-Rex, VP of Audio at Stability AI, told VentureBeat. "The concept is really simple, you describe the music or audio that you want to hear in text and our system generates it for you."
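The summary only says that Stable Audio uses a diffusion model trained on audio. As a generic illustration (not Stability AI's code or configuration), here is the forward, noise-adding half of a DDPM-style diffusion process applied to a raw waveform. The `forward_diffuse` helper, the `alpha_bar` values, and the 440 Hz test tone are all hypothetical.

```python
import math
import random


def forward_diffuse(x0, alpha_bar, rng):
    """One jump of a DDPM-style forward (noising) process.

    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps,  eps ~ N(0, 1)

    alpha_bar in [0, 1] is the cumulative signal-retention factor for step t
    (1.0 = clean signal, 0.0 = pure noise). Returns (x_t, eps); a denoising
    network would be trained to predict eps from x_t.
    """
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must lie in [0, 1]")
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal_scale * s + noise_scale * e for s, e in zip(x0, eps)]
    return xt, eps


# Hypothetical training clip: 1,024 raw samples of a 440 Hz tone at 44.1 kHz.
SAMPLE_RATE_HZ = 44_100
clip = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ) for n in range(1024)]

rng = random.Random(0)
for alpha_bar in (0.99, 0.5, 0.01):
    xt, _ = forward_diffuse(clip, alpha_bar, rng)
    # Mean squared deviation from the clean clip grows as alpha_bar shrinks.
    msd = sum((a - b) ** 2 for a, b in zip(xt, clip)) / len(clip)
    print(f"alpha_bar={alpha_bar}: mean squared deviation {msd:.3f}")
```

Generation runs the learned process in the opposite direction: start from pure noise and repeatedly subtract the noise a trained network predicts, with the text prompt steering each step.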
Newton-Rex is no stranger to the world of computer-generated music, having founded the startup Jukedeck in 2011 and sold it to TikTok in 2019. The technology behind Stable Audio, however, does not have its roots in Jukedeck but in Harmonai, Stability AI's internal music-generation research studio, which was created by Zach Evans. Stable Audio works directly with raw audio samples for higher-quality output. The model was trained on over 800,000 pieces of licensed music from the audio library AudioSparx. [...]
According to Evans, the Stable Audio diffusion model has approximately 1.2 billion parameters, roughly on par with the original release of Stable Diffusion for image generation. The text model that interprets prompts was built and trained entirely by Stability AI; Evans explained that it uses a technique known as Contrastive Language-Audio Pretraining (CLAP). As part of the launch, Stability AI is also releasing a prompt guide to help users write text prompts that produce the kinds of audio they want. "Stable Audio will be available both for free and in a $12/month Pro plan," notes VentureBeat. "The free version allows 20 generations per month of up to 20-second tracks, while the Pro version increases this to 500 generations and 90-second tracks."
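The plan limits in the VentureBeat quote reduce to simple arithmetic. This sketch uses a hypothetical `Plan` class to compute each tier's monthly audio ceiling, assuming every generation uses the maximum track length, and the Pro price per generation. Only the prices and limits come from the quote; the rest is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    price_usd_per_month: float
    generations_per_month: int
    max_track_seconds: int

    def max_audio_seconds(self) -> int:
        """Upper bound on audio per month if every generation uses the full length."""
        return self.generations_per_month * self.max_track_seconds

    def cost_per_generation_usd(self) -> float:
        """Monthly price spread evenly over the generation allowance."""
        return self.price_usd_per_month / self.generations_per_month


# Limits as quoted by VentureBeat.
FREE = Plan("Free", 0.0, 20, 20)
PRO = Plan("Pro", 12.0, 500, 90)

for plan in (FREE, PRO):
    print(f"{plan.name}: up to {plan.max_audio_seconds()} s/month, "
          f"${plan.cost_per_generation_usd():.3f} per generation")
```

Under that assumption the Pro tier allows 45,000 seconds (750 minutes) of audio per month, 112.5 times the free tier's 400 seconds, at about 2.4 cents per generation.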
Wow! (Score:4, Informative)
How much did this ad cost, I wonder?
Re: (Score:3)
Yes that's what Slashdot is for, old technically-inclined men complaining together.
Re: (Score:1)
It's just a loading GIF (Score:2)
I don't think anyone in the music industry has too much to worry about just yet.
Re: (Score:3)
According to Stability AI, Stable Audio can render 95 seconds of stereo audio at a 44.1 kHz sample rate (often called "CD quality") in less than one second on an Nvidia A100 GPU. The A100 is a beefy data center GPU designed for AI use, and it's far more capable than a typical desktop gaming GPU.
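To put that claim in perspective, the quoted figures can be turned into sample counts. The byte estimate assumes 16-bit PCM (the CD format); the claim itself only specifies 95 seconds, stereo, and 44.1 kHz. Because the render takes "less than one second", the real-time factor computed for a 1-second render is a lower bound. The helper names are illustrative.

```python
SAMPLE_RATE_HZ = 44_100   # from the quoted claim
CHANNELS = 2              # stereo
DURATION_S = 95
BYTES_PER_SAMPLE = 2      # assumption: 16-bit PCM, as on an audio CD


def total_samples(duration_s: int, rate_hz: int, channels: int) -> int:
    """Number of individual sample values across all channels."""
    return duration_s * rate_hz * channels


def realtime_factor(audio_s: float, render_s: float) -> float:
    """Seconds of audio produced per second of compute."""
    return audio_s / render_s


samples = total_samples(DURATION_S, SAMPLE_RATE_HZ, CHANNELS)
pcm_bytes = samples * BYTES_PER_SAMPLE
# Rendering takes "less than one second", so 1.0 s yields a lower bound.
min_speedup = realtime_factor(DURATION_S, 1.0)

print(f"{samples:,} samples, {pcm_bytes / 1e6:.1f} MB as 16-bit PCM, "
      f">{min_speedup:.0f}x real time")
# prints: 8,379,000 samples, 16.8 MB as 16-bit PCM, >95x real time
```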
Huggingface downloadables or stfu (Score:2)
I got spoiled by Stable Diffusion and was really hoping this would be about a model you could download and run on your own hardware, but alas, another ad for another cloudy service encumbered by the realities of licensed source material. It's the Netflix of AI. Paid by the piece and probably yanked tomorrow when enough source rightsholders freak out or get wind that they can make a few cents on a class action suit.
AI seems to be the nexus of linux wrenching spirit and pirate spirit in that it requires a cert