Tuesday, April 23, 2013

Image to Spectrogram

Convert an image to a sound clip.  The spectrogram of the clip reproduces that image.


This project will take a digital picture and convert it into a wave file. Creating a spectrogram of that file then reproduces that picture.


Treat the image as a matrix of pixels with r, g, b values. Each column of the matrix is played for a fixed time interval, and each pixel in that column, at row y, is mapped to a frequency with the following formula:

max_frequency - (y/picture_height)*max_frequency
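As a rough illustration, the mapping can be sketched in Python (not the Perl the script is written in). The function name and the 20000 Hz default for max_frequency are assumptions made for the example, not values taken from the script:

```python
def row_to_frequency(y, picture_height, max_frequency=20000.0):
    """Map pixel row y (0 = top of the image) to a frequency in Hz.

    max_frequency=20000.0 is an assumed value for illustration only.
    """
    return max_frequency - (y / picture_height) * max_frequency


# Rows of a 4-pixel-tall image, top to bottom
for y in range(4):
    print(y, row_to_frequency(y, 4))
```

With this formula the top row (y = 0) gets the highest frequency and lower rows get progressively lower ones, which matches how a spectrogram is drawn with high frequencies at the top.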

We can get some coloration by changing the decibel level of each frequency based on the pixel's RGB values. The level for each pixel is a power of 10, of the form:

decibelN = 10 ^ ((r + g + b)/ max_sum_rgb )
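A small Python sketch of this formula follows. The function name is hypothetical, and max_sum_rgb = 255 * 3 assumes 8-bit color channels, which the post does not state explicitly:

```python
def decibel_level(r, g, b, max_sum_rgb=255 * 3):
    """Return 10 ^ ((r + g + b) / max_sum_rgb) for one pixel.

    max_sum_rgb=765 assumes 8-bit channels (an assumption for this sketch).
    """
    return 10 ** ((r + g + b) / max_sum_rgb)


print(decibel_level(0, 0, 0))        # black pixel
print(decibel_level(255, 255, 255))  # white pixel
print(decibel_level(255, 0, 0) == decibel_level(0, 0, 255))  # same channel sum
```

Under that assumption the value runs from 1 for a black pixel to 10 for a white one, and any two colors with the same channel sum give the same level.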

All frequencies for a column are merged using:

(sin(2*pi + sample1)/decibel1 + sin(2*pi + sample2)/decibel2 ...
   + sin(2*pi + sampleN)/decibelN) / N
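Putting the three formulas together, one column of pixels could be mixed roughly as below. This is a Python sketch under stated assumptions, not the Perl script itself: the sine argument is assumed to be the conventional 2*pi*frequency*t (the formula above writes it differently), and the function name, the 44100 Hz sample rate, the 2205 samples per column and the 20000 Hz max_frequency are all made up for the example.

```python
import math


def mix_column(column, max_frequency=20000.0, sample_rate=44100,
               samples_per_column=2205):
    """Mix one image column into audio samples.

    column is a list of (r, g, b) tuples, top to bottom.
    All default values are assumptions for illustration.
    """
    height = len(column)
    max_sum_rgb = 255 * 3  # assumes 8-bit channels
    samples = []
    for i in range(samples_per_column):
        t = i / sample_rate  # time of this sample in seconds
        total = 0.0
        for y, (r, g, b) in enumerate(column):
            frequency = max_frequency - (y / height) * max_frequency
            decibel = 10 ** ((r + g + b) / max_sum_rgb)
            # Assumed sinusoid: sin(2*pi*f*t), divided by the pixel's level
            total += math.sin(2 * math.pi * frequency * t) / decibel
        samples.append(total / height)  # average over the N rows
    return samples


# A 2-pixel column: black on top, white below
samples = mix_column([(0, 0, 0), (255, 255, 255)], samples_per_column=100)
print(len(samples), min(samples), max(samples))
```

Because every divisor is at least 1 and the sum is divided by N, each mixed sample stays within [-1, 1], so it can be scaled to the sample width of the wave file without clipping.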



Note that color is effectively treated as grayscale, since the decibel level has only 1 dimension. So the picture is really being treated like this:




Edit: Following a suggestion from a comment on the old site, the picture was generated with the following command: sox fract.wav -n spectrogram -x 4 -Z -60 -z 60

Downloads and Execution


The code is a single Perl script. It handles png, jpg, or gif files, and requires that the extension be present in the file name. The simplest way to run it is with ./imageSpectrogram.pl image_filename.
It needs 2 Perl CPAN dependencies:
  • cpan Audio::Wav
  • cpan GD
To get a spectrogram, the incredibly versatile SoX tool works well. Just run:
sox filename.wav -n spectrogram
This produces a spectrogram.png image.


I originally tried doing this with just a bash script of SoX calls, but figured out that it doesn't handle merging 100+ frequencies. After I found this page showing how to add a frequency in Perl, it was pretty simple. Overall it works as expected, but it is horribly slow on larger images with lots of color.
