Some time ago, I wondered whether I could take a two-dimensional rally stage route and analyse its “wiggliness” as a spectrogram (Thinks Another: Using Spectrograms to Identify Stage Wiggliness?).
Today, I note, via @charlesarthur, Riffusion, a Stable Diffusion based AI model that relates text to spectrograms: give it some text and it tries to generate a spectrogram (which is to say, a picture) associated with that text.
As a picture, a spectrogram is a scientific diagram that visualises a set of frequencies over time. And just as we can generate a spectrogram by processing a sound file, such as a sound file that results from a recorded piece of music, we can also generate a sound file from a spectrogram, at least approximately (a magnitude spectrogram lacks phase information, which has to be estimated).
Which is to say, we can use text to generate an image that can be mapped directly onto an audio file.
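To make that round trip concrete, here’s a minimal sketch in Python using only NumPy and SciPy: take a signal, keep just the magnitude spectrogram (the “picture”), then recover audio by estimating the discarded phase with the Griffin–Lim style iteration. The signal, window sizes and iteration count are all illustrative choices, not anything Riffusion itself specifies.

```python
import numpy as np
from scipy.signal import stft, istft

sr = 8000
t = np.linspace(0, 1, sr, endpoint=False)
x = np.sin(2 * np.pi * 440 * t)  # a plain test tone

# Forward: keep only the magnitude spectrogram -- the phase is thrown away,
# just as it is when a spectrogram is treated purely as an image.
_, _, Z = stft(x, fs=sr, nperseg=256)
mag = np.abs(Z)

# Inverse: Griffin-Lim phase estimation -- start from random phase, then
# alternate between enforcing the known magnitudes and re-deriving a
# consistent phase from the resulting time-domain signal.
rng = np.random.default_rng(0)
phase = np.exp(2j * np.pi * rng.random(mag.shape))
for _ in range(50):
    _, y = istft(mag * phase, fs=sr, nperseg=256)
    _, _, Z2 = stft(y[:len(x)], fs=sr, nperseg=256)
    phase = np.exp(1j * np.angle(Z2))

# Final reconstruction, trimmed back to the original length.
_, y = istft(mag * phase, fs=sr, nperseg=256)
y = y[:len(x)]
```

The result is audio whose spectrogram matches the input picture; what’s lost is any guarantee that the waveform itself matches the original, which is exactly the “at least approximately” caveat above.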
PS I wonder, can a similar approach also be used as a way of generating spoken texts in a particular voice?
PPS In our final ‘Tis Tales storytelling performance of the year last night, I told a tale about Gambrinus, the “King of Beer”, a devil story which along the way included the invention of the carillon, a fantastic instrument involving bells/chimes, a keyboard and foot pedals. Along the way, I likened the music to mediaeval techno. And in the Riffusion post, I note that they included a generated example of an interpolation “from church bells to electronic beats”…
PPPS This site really creeps me out…: This Voice Does Not Exist.