Somewhen ago, I wondered whether I could take a two-dimensional rally stage route and analyse its “wiggliness” as a spectrogram (Thinks Another: Using Spectrograms to Identify Stage Wiggliness?).
Today, I note, via @charlesarthur, Riffusion, a Stable Diffusion based AI model that relates text to spectrograms: give it some text and it tries to generate a spectrogram (which is to say, a picture) associated with that text.
As a picture, a spectrogram is a scientific diagram that visualises a set of frequencies over time. And just as we can generate a spectrogram by processing a sound file, such as a recording of a piece of music, we can also generate a sound file from a spectrogram, at least in part (we’re lacking phase information).
Which is to say, we can use text to generate an image that can be mapped directly onto an audio file.
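As a rough sketch of that “lacking phase information” caveat, the following standard-library Python builds two tones that differ only in their starting phase and compares their magnitude spectrograms. Everything here is an illustrative assumption rather than anything Riffusion does: the helper names (`stft`, `magnitude_spectrogram`, `tone`), the frame and hop sizes, and the naive DFT itself, which is far too slow for real audio.

```python
import cmath
import math

FRAME = 64   # samples per analysis frame (illustrative choice)
HOP = 32     # samples between frame starts (illustrative choice)

def stft(signal, frame=FRAME, hop=HOP):
    """Naive short-time Fourier transform: Hann-windowed frames, direct DFT."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    out = []
    for start in range(0, len(signal) - frame + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame)]
        # Keep the non-negative frequency bins 0 .. frame/2 only.
        out.append([
            sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame) for n in range(frame))
            for k in range(frame // 2 + 1)
        ])
    return out

def magnitude_spectrogram(signal):
    """The 'picture': one row per time frame, one column per frequency bin."""
    return [[abs(c) for c in row] for row in stft(signal)]

def tone(freq_bin, phase, length=256):
    """A cosine sitting exactly on DFT bin `freq_bin`, with a chosen starting phase."""
    return [math.cos(2 * math.pi * freq_bin * n / FRAME + phase) for n in range(length)]

# Two waveforms that differ only in phase.
wave_a = tone(8, 0.0)
wave_b = tone(8, math.pi / 2)

spec_a = magnitude_spectrogram(wave_a)
spec_b = magnitude_spectrogram(wave_b)

peak_bin = max(range(len(spec_a[0])), key=lambda k: spec_a[0][k])
max_wave_diff = max(abs(a - b) for a, b in zip(wave_a, wave_b))
max_spec_diff = max(abs(a - b) for ra, rb in zip(spec_a, spec_b) for a, b in zip(ra, rb))

print("frames:", len(spec_a), "bins:", len(spec_a[0]), "peak bin:", peak_bin)
print("largest waveform difference:", round(max_wave_diff, 3))
print("largest spectrogram difference:", max_spec_diff)
```

The two waveforms are clearly different sample by sample, yet their pictures agree to within floating-point error, so going back from picture to sound means guessing a plausible phase. Iterative phase-estimation methods such as Griffin-Lim are one common way of making that guess, and in practice a numerical library would replace the hand-rolled DFT.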
PS I wonder, can a similar approach also be used as a way of generating spoken texts in a particular voice?
PPS In our final ‘Tis Tales storytelling performance of the year last night, I told a tale about Gambrinus, the “King of Beer”, a devil story which along the way included the invention of the carillon, a fantastic instrument involving bells/chimes, a keyboard and foot pedals. In the telling, I likened the music to mediaeval techno. And in the Riffusion post, I note that they included a generated example of an interpolation “from church bells to electronic beats”…
PPPS This site really creeps me out…: This Voice Does Not Exist.
2 thoughts on “Picturing Music”
Your blog is on fire, Tony. Don’t douse it.
The generating of sound from text is wild. It’s not quite the same, but I recall (or the blog does) the idea of bending images with Audacity: importing images, applying effects, and exporting anew.
Thanks too for the This Voice Does Not Exist link; that’s some fascinating work.
You don’t seem bored ;-)
That Audacity editing trick looks fun… I was struck some time ago how lots of machine recognition things were being recast as image recognition tasks by mapping different sorts of signal into image space. Just add synaesthesia…