Somewhen ago, I wondered whether I could take a two-dimensional rally stage route and analyse its “wiggliness” as a spectrogram (Thinks Another: Using Spectrograms to Identify Stage Wiggliness?).
Today, I note, via @charlesarthur, Riffusion, a Stable Diffusion-based AI model that relates text to spectrograms: give it some text and it tries to generate a spectrogram (which is to say, a picture) associated with that text.
As a picture, a spectrogram is a scientific diagram that visualises a set of frequencies over time. And just as we can generate a spectrogram by processing a sound file, such as a recording of a piece of music, we can also generate a sound file from a spectrogram, at least in part: the picture records how strong each frequency is at each moment but not its phase, so the phase information has to be estimated.
Which is to say, we can use text to generate an image that can be mapped directly onto an audio file.
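To make the "picture to sound" step a bit more concrete, here's a minimal sketch of my own (not Riffusion's code) using nothing but numpy. It turns a test tone into a magnitude spectrogram, throws the phase away, and then estimates a plausible phase using the Griffin-Lim algorithm, which is one standard technique for the job. The function names, the frame size, the hop size, the sample rate and the 440 Hz test tone are all assumptions I've picked for illustration.

```python
import numpy as np

N_FFT = 512   # samples per analysis frame (assumed value)
HOP = 128     # step between frames, i.e. 75% overlap (assumed value)


def stft(signal, n_fft=N_FFT, hop=HOP):
    """Complex short-time Fourier transform, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=N_FFT, hop=HOP):
    """Windowed overlap-add inverse of stft()."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    length = n_fft + hop * (len(frames) - 1)
    audio = np.zeros(length)
    weight = np.zeros(length)
    for i, frame in enumerate(frames):
        audio[i * hop:i * hop + n_fft] += frame * window
        weight[i * hop:i * hop + n_fft] += window ** 2
    # Floor the weights so the barely covered first and last samples
    # are not divided by something close to zero.
    return audio / np.maximum(weight, 0.1)


def griffin_lim(magnitude, n_iter=32, n_fft=N_FFT, hop=HOP, seed=0):
    """Estimate audio from a magnitude-only spectrogram (Griffin-Lim)."""
    rng = np.random.default_rng(seed)
    # Start from random phase, since the picture does not tell us the real one.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        audio = istft(magnitude * phase, n_fft, hop)
        # Keep the phase of the re-analysed audio, but reimpose the
        # magnitudes we were given on the next pass.
        phase = np.exp(1j * np.angle(stft(audio, n_fft, hop)))
    return istft(magnitude * phase, n_fft, hop)


# Demo: a 440 Hz tone, sampled at an assumed 8 kHz.
sample_rate = 8000
t = np.arange(8192) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

magnitude = np.abs(stft(tone))     # the "picture": phase thrown away
rebuilt = griffin_lim(magnitude)   # audio again, with an estimated phase
```

The `magnitude` array is what you would plot as the spectrogram image, and `rebuilt` is a waveform you could write out to a sound file. It won't be sample-for-sample identical to `tone`, because the phase is only a guess, but its spectrogram should look much the same.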
PS I wonder, can a similar approach also be used as a way of generating spoken texts in a particular voice?
PPS In our final ‘Tis Tales storytelling performance of the year last night, I told a tale about Gambrinus, the “King of Beer”, a devil story which along the way included the invention of the carillon, a fantastic instrument involving bells/chimes, a keyboard and foot pedals. In the telling, I likened the music to mediaeval techno. And in the Riffusion post, I note that they included a generated example of an interpolation “from church bells to electronic beats”…
PPPS This site really creeps me out…: This Voice Does Not Exist.
Your blog is on fire, Tony. Don’t douse it.
The generating of sound from text is wild. Not quite the same, but I recall (or the blog does) the idea of bending images with Audacity: importing images, applying effects, and exporting anew.
https://cogdogblog.com/2014/07/image-bending-in-audacity/
Thanks too for This Voice Does Not Exist; that’s some fascinating work.
You don’t seem bored ;-)
That Audacity editing trick looks fun… I was struck some time ago how lots of machine recognition things were being recast as image recognition tasks by mapping different sorts of signal into image space. Just add synaesthesia…