Picturing Music

Some time ago, I wondered whether I could take a two-dimensional rally stage route and analyse its “wiggliness” as a spectrogram (Thinks Another: Using Spectrograms to Identify Stage Wiggliness?).

Today, I note, via @charlesarthur, Riffusion, a Stable Diffusion-based AI model that relates text to spectrograms: give it some text and it tries to generate a spectrogram (which is to say, a picture) associated with that text.

As a picture, a spectrogram is a scientific diagram that visualises a set of frequencies over time. And just as we can generate a spectrogram by processing a sound file, such as a recording of a piece of music, we can also generate a sound file from a spectrogram, at least in part (we’re lacking the phase information).
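
For example, here’s a minimal sketch of the forward step using librosa (nothing to do with the Riffusion pipeline itself): turn an audio file into a spectrogram image. The filename is just a placeholder.

```python
# Sketch: audio file -> magnitude spectrogram image (the phase is discarded here).
import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt

y, sr = librosa.load("recording.wav")            # placeholder filename
S = np.abs(librosa.stft(y))                      # magnitude spectrogram
S_db = librosa.amplitude_to_db(S, ref=np.max)    # decibel scale for display

librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="hz")
plt.colorbar(format="%+2.0f dB")
plt.title("Magnitude spectrogram")
plt.savefig("spectrogram.png")
```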

Which is to say, we can use text to generate an image that can be mapped directly onto an audio file.
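
And here’s a minimal sketch of the reverse step, using librosa’s Griffin-Lim implementation to estimate the missing phase and resynthesise audio from a magnitude spectrogram. Again, this is just an illustration with placeholder filenames, not how Riffusion itself decodes its generated images back into sound.

```python
# Sketch: magnitude spectrogram -> audio, with Griffin-Lim estimating the
# missing phase. The result is recognisable, but not a perfect reconstruction.
import librosa
import numpy as np
import soundfile as sf

y, sr = librosa.load("recording.wav")        # placeholder filename
S = np.abs(librosa.stft(y))                  # keep only the magnitudes
y_hat = librosa.griffinlim(S, n_iter=32)     # iteratively estimate a phase
sf.write("reconstructed.wav", y_hat, sr)     # close to, but not identical to, the original
```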

PS I wonder, can a similar approach also be used as a way of generating spoken texts in a particular voice?

PPS In our final ‘Tis Tales storytelling performance of the year last night, I told a tale about Gambrinus, the “King of Beer”, a devil story which along the way included the invention of the carillon, a fantastic instrument involving bells/chimes, a keyboard and foot pedals. I likened the music to mediaeval techno. And in the Riffusion post, I note that they included a generated example of an interpolation “from church bells to electronic beats”…

PPPS This site really creeps me out…: This Voice Does Not Exist.


2 thoughts on “Picturing Music”

    1. That Audacity editing trick looks fun… I was struck some time ago how lots of machine recognition things were being recast as image recognition tasks by mapping different sorts of signal into image space. Just add synaesthesia…

