“Bink” – Composed, arranged, and all parts played by me, then sequenced and recorded (to cassette!) in 1987 via the gfmusic sequencer. (I conceived and developed the gfmusic sequencer in 1987 using Borland Turbo Pascal on an early IBM PC clone with a 5 MB hard disk! Before Windows.) Many years later the cassette was played into a Windows PC and recorded/converted to MP3 using Audacity. And now it’s here. “A little travelin’ music Sammy!”
I wrote this article for Google Knol back in 2008 as a part of the Gloplug Music Visualization program. Gloplug was (and is) a music visualization plugin for Microsoft’s Media Player. In essence, it is a sophisticated “Light Organ” (aka “Color Organ”) running on the PC under Windows XP/Vista/7/8. Unlike many other Music Visualization programs, Gloplug actually reacts to frequencies and not just to the “beat” or amplitude of the overall audio data. That said, this article discusses some of the techniques used by Gloplug for representing music in a visual/graphic manner. The information presented here is based on the model and source code used by the Gloplug Visualization Plugin for Windows Media Player. However, some of these techniques (such as “Falloff”) may also be used in other programs, and all of these techniques could be applied to graphically representing Music in general.
Here are the topics covered:
- Compression of Audio/Music Data
- Ignoring and Emphasizing Audio Data
- Ignoring “Zero Values”
- High-Frequency Compensation
- Order Of Algorithm Application
- Double Buffering Graphics – Preventing “Flicker”
- Assigning Frequency Ranges – Gaps and Preventing “Bleeding”
- Not Every Song Is A Visualization Winner
- “Adjusting the Gain” For Each Song and Within the Song
Compression of Audio/Music Data
At its most fundamental level, graphically representing music is based on compressing the audio spectrum and “loudness” of the music into just a few visual cues that the eyes/brain will correlate to what we “hear.”
Without “compression” we would simply draw a line (or circle or arc…) for each “cycle” in the audio spectrum. This would simply be too much data to perceive as “aesthetically pleasing.” We would tire of it quickly. On the other hand, merely summing or averaging the audio data into just one visual representation would be too simplistic to be engaging. Somewhere in between is what will be “pleasing.” That is, “pleasing” is the result of a relatively few visual cues which are (somehow) synchronized to the music we hear.
In the case of Gloplug, as with the “classic light organ,” the audio spectrum is compressed down into just 3 colors and the amplitude (loudness) is represented by the “size” of lines, circles, etc., AND/OR the “brightness” of the color. In this “3 color model” the default, in the case of Gloplug, is red for the Bass, green for the Mid-range, and blue for the “Highs” (but the user can change the colors or have the program periodically vary them). The point is that the audio spectrum is compressed down into 3 visual color cues. Other programs could use more colors (or even a visual cue different from color!). The “loudness” is represented by a different visual cue: the “size” of the graphic being displayed, such as the length of a line or the radius of a circle. In addition, “brightness” is sometimes used as an additional visual cue for “loudness.” Here is a very simple example… A combination of loud Bass notes in a song might be shown as a large (and bright) red circle. Quieter Bass passages would appear as a smaller red circle that is not so bright.
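As a rough illustration of the “3 color model,” here is a minimal Python sketch. It is not Gloplug’s actual source code. The band edges, the names `BANDS`, `COLORS`, and `compress_spectrum`, and the assumption that amplitudes arrive normalized to 0.0-1.0 are all hypothetical choices made for the example.

```python
# Hypothetical band edges in Hz (assumed for illustration; user-adjustable in practice).
BANDS = {"bass": (0, 200), "mid": (220, 4000), "high": (4200, 16000)}
# Default colors described in the text: red = Bass, green = Mid-range, blue = Highs.
COLORS = {"bass": (255, 0, 0), "mid": (0, 255, 0), "high": (0, 0, 255)}


def compress_spectrum(freqs_hz, amplitudes, max_radius=100):
    """Reduce a spectrum to one (color, radius) cue per band.

    Amplitudes are assumed to be normalized to the range 0.0-1.0.
    """
    cues = {}
    for name, (lo, hi) in BANDS.items():
        values = [a for f, a in zip(freqs_hz, amplitudes) if lo <= f <= hi]
        level = sum(values) / len(values) if values else 0.0
        # Loudness drives both cues: brightness of the color and size of the circle.
        color = tuple(round(c * level) for c in COLORS[name])
        cues[name] = {"color": color, "radius": round(max_radius * level)}
    return cues
```

With this sketch, a loud Bass-only spectrum yields a large, fully bright red circle, while the Mid and High circles shrink toward nothing.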
Ignoring and Emphasizing Audio Data
First let’s discuss the “Ignoring” (filtering) of audio data. At first it may seem that ignoring audio data in the “visualization” process would lead to a less “realistic” visualization of the music. However, the truth is just the opposite. Here is a simple example… When we first try to graphically represent the “Bass Range” we soon discover that the audio data (e.g. from Windows Media Player or Winamp) contains a significant amount of amplitude (loudness) throughout the “Bass frequency range” that we simply do not hear! This is just a fact of life that we may not realize until we actually analyze the audio data. If we do not filter this “background amplitude” then the graphical representation of the “Bass” will be bigger than the ears perceive! Or, perceived “silence” would still show up on the screen! The same effect exists to a lesser degree in the Mid-Range and to an even lesser degree in the Highs, but it is still there throughout much of the audio spectrum. It is imperative to “throw away” (filter) this “background noise” in the data in order for the graphics to “realistically” represent what is actually “heard” as opposed to what exists in the raw audio data! There are techniques for mitigating this problem and they are discussed below (e.g. the use of “Thresholds”).
In addition to “throwing away” some audio data, it is imperative to dramatically emphasize other audio data. In some sense this might be considered as “creating” audio data. Emphasizing some audio data is used to prevent the visual “loss” of music that we distinctly “hear.” In fact, most of the “data massaging” algorithms used exist just to emphasize some of the raw audio data so that the graphical rendering of the music more closely matches what we actually “hear.” For a good example see “Ratios” below.
Next is a discussion of specific techniques used to manipulate audio data so that the visualization more closely matches what we “hear.”
A threshold assigns a zero value to any audio data below a specified level. In the case of Gloplug, there is a separate threshold for each of the 3 audio “ranges” (Bass, Mid-range, and Highs). These thresholds have default values but can be changed by the user. Ideally we would have a threshold for each data point in the spectrum, but experience has shown that this is impractical and unnecessary.
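A threshold of this kind might be sketched in Python as follows. The numeric values in `THRESHOLDS` are made up for illustration and are not Gloplug’s defaults; the function name `apply_threshold` is likewise hypothetical.

```python
# Hypothetical per-range thresholds (illustrative values only, not Gloplug's defaults).
THRESHOLDS = {"bass": 0.20, "mid": 0.10, "high": 0.05}


def apply_threshold(values, threshold):
    """Zero out every data point that falls below the threshold."""
    return [v if v >= threshold else 0.0 for v in values]
```

For example, `apply_threshold([0.05, 0.2, 0.5], THRESHOLDS["bass"])` drops the quiet first point to zero and leaves the other two untouched.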
Ignoring “Zero Values”
This concept/technique really applies mainly to the Highs since it is rarely seen in the Bass and Mid-range portion of the music spectrum. Here is how it works… If data points supplied by the Media Player in the “Highs range” have a value of zero, then those data points should NOT be included in any “averaging” for the purpose of graphically rendering the “Highs.” Why? Consider this example… Suppose that a piece of music contains a flute playing a single loud note around 5000 Hz. If we do not ignore zero values in the rest of the surrounding High range, then the “Note” that our brain clearly “hears” will be visually “shown” as relatively insignificant! This is because the process of compressing the huge “High” range down into a single graphic will average this clearly heard “Note” amongst all of the “empty” portions of the High range. The result will be a very small “graphical” representation of what we clearly “hear” as “significant.” Ignoring zero values (after applying thresholds) is an important and valuable technique in graphically rendering what our “ears” focus on as opposed to what the audio data contains!
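The flute example can be made concrete with a small Python sketch. The function names and the ten-point “Highs range” are invented for the illustration.

```python
def plain_average(values):
    """Average every data point, zeros included."""
    return sum(values) / len(values) if values else 0.0


def average_ignoring_zeros(values):
    """Average only the non-zero data points; return 0.0 if there are none."""
    nonzero = [v for v in values if v > 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0


# One loud "flute note" surrounded by nine empty data points.
highs = [0.0] * 9 + [0.9]
diluted = plain_average(highs)            # the note is averaged away to about 0.09
focused = average_ignoring_zeros(highs)   # the note keeps its full level of 0.9
```

The plain average shrinks the note to roughly a tenth of its level, while the zero-ignoring average keeps it at full size.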
It is helpful to emphasize portions of the audio spectrum by applying ratios. For example, we clearly hear the single loud flute note but it may be surrounded by softer music that passed through the threshold mechanism. If this single note is many times louder than the average of all the other data surrounding it, then we can further emphasize this particular “Note” by multiplying its loudness by some additional factor. This will graphically emphasize music that we clearly hear as “loud” but which would otherwise be de-emphasized because of the sheer volume of other data in its range (Bass, Mid-range, or Highs).
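One way to express the ratio idea in Python is sketched below. The trigger ratio and the boost factor are hypothetical tuning values, and the function name `emphasize_peaks` is made up; the real values would have to be found by experiment.

```python
def emphasize_peaks(values, ratio_trigger=4.0, boost=1.5):
    """Boost any point that is much louder than the average of the other points.

    ratio_trigger and boost are hypothetical tuning values.
    """
    result = []
    for i, v in enumerate(values):
        others = values[:i] + values[i + 1:]
        avg_others = sum(others) / len(others) if others else 0.0
        if avg_others > 0 and v / avg_others >= ratio_trigger:
            result.append(v * boost)   # stands out from its surroundings
        else:
            result.append(v)           # left unchanged
    return result
```

Given `[0.1, 0.1, 0.8, 0.1]`, only the third point is about eight times louder than its neighbors, so only that point is multiplied by the boost.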
High-Frequency Compensation
The physical “cycle-span” of what we perceive as the “High Range” for music is HUGE compared to the span of the Bass and Mid-range. For example, the Bass Range might be considered as 0-260 Hz while the “Highs Range” is something like 4,000-16,000 Hz. And in general, we will clearly hear “Notes” in the High range even though their “power” (amplitude) in the raw data from the Media Player is relatively small as compared to “Notes” in the Bass and Mid-ranges. I.e. a relatively small amount of “power” in the Highs results in a relatively large amount of “hearing” (I don’t know how else to describe it). Because of this, it is helpful to apply a “compensation factor” that also uses logarithms and ratios to compensate for this phenomenon. This is another technique used by Gloplug and the user can, if they wish, change the factor.
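The text does not give the actual compensation formula, so the Python sketch below is purely an assumption: it invents a factor that grows with the logarithm of the ratio between the Highs span and a reference span. The function names, the `strength` parameter, and the clamp to 1.0 are all hypothetical.

```python
import math


def high_compensation_factor(high_span_hz, reference_span_hz, strength=0.5):
    """Hypothetical factor that grows with the log of the span ratio."""
    ratio = high_span_hz / reference_span_hz
    return 1.0 + strength * math.log10(max(ratio, 1.0))


def compensate_highs(level, factor):
    """Scale a Highs level (assumed 0.0-1.0) by the factor, clamped to 1.0."""
    return min(1.0, level * factor)


# Spans taken from the example figures in the text: 12,000 Hz of Highs vs 260 Hz of Bass.
factor = high_compensation_factor(16000 - 4000, 260)
```

The `strength` parameter stands in for the user-adjustable factor mentioned above; a value of 0 switches the compensation off.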
Application of logarithms to discrete audio data points, and even to averages of data for the Bass, Mid-range, and High ranges, is also an important technique. Use of logarithms helps “massage” the audio data into a more realistic graphical representation. It is widely used in (all?) music visualization programs. However, the application of logarithms alone will NOT result in a pleasing visualization of the music. The quintessential example of why logarithms alone are not enough is discussed above under “Ratios.”
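A common way to apply logarithms to amplitude is a decibel-style mapping, sketched here in Python. This is a generic technique and not necessarily the formula Gloplug uses; the function name `log_scale` and the -60 dB floor are assumptions.

```python
import math


def log_scale(value, floor_db=-60.0):
    """Map a linear amplitude (0.0-1.0) onto 0.0-1.0 using a decibel scale.

    floor_db is a hypothetical cutoff: anything quieter maps to 0.0.
    """
    if value <= 0:
        return 0.0
    db = 20.0 * math.log10(value)   # 0 dB at full scale, negative below it
    return min(1.0, max(0.0, 1.0 - db / floor_db))
```

On this scale an amplitude of 0.1 (one tenth of full scale) still draws at about two thirds of full size, which keeps quiet passages visible.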
Order Of Algorithm Application
The above techniques can NOT be applied in a haphazard way. The order in which the algorithms are applied to the data is very important! To a large degree, the order is determined by the “Edison Method.” I.e. by “educated” trial and error. In fact, to a large degree, the specific implementation of each technique is also arrived at by some amount of trial and error. During software development, implementation of each algorithm can have unexpected “feedback” on other algorithms being applied to the audio data. Sometimes this is good but usually it results in changing the order of the algorithms and tuning the algorithms to account for the effects of the others. And sometimes it results in abandoning a technique altogether.
Falloff is a common (universal?) technique used in music visualization. In a nutshell, the falloff parameter prevents the size of the graphic from “receding” too rapidly as the “notes” go from loud to soft. Use of falloff allows the graphic to appear to recede (get smaller) smoothly. For example, let’s say we have a loud Bass note, followed by silence, followed by another Bass note of the same amplitude. With falloff, the graphic (e.g. “line”) will recede smoothly and pleasingly toward zero during the “silence” instead of going from a long line immediately to a length of zero. Without this “smoothing” of the transition from long (loud) to short (zero or much softer) the graphic display would appear to “flicker” or be “jittery.” Falloff is important, and a “pleasing” falloff speed will vary depending on the overall potential size of the graphic being displayed. For example, a graphic with a maximum length of X will have a smaller falloff factor than a graphic with a maximum length of 2X. The falloff parameter ultimately boils down to how fast the graphic gets smaller in terms of pixels per second. In the case of Gloplug and other visualization programs, falloff is adjustable by the user (though the manufacturer generally picks a pretty good value as a default).
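Falloff expressed in pixels per second might look like the following Python sketch. The function and parameter names are invented for the example, and the instant-growth behavior is an assumption about how the rising edge is handled.

```python
def apply_falloff(previous_size, target_size, falloff_px_per_sec, dt_sec):
    """Let the graphic grow instantly but shrink at a limited rate.

    previous_size: size drawn on the last frame, in pixels
    target_size:   size the current audio data asks for, in pixels
    dt_sec:        time elapsed since the last frame, in seconds
    """
    if target_size >= previous_size:
        return target_size
    # Recede by at most falloff_px_per_sec * dt_sec, never below the target.
    return max(target_size, previous_size - falloff_px_per_sec * dt_sec)
```

With a falloff of 200 pixels per second, a 100-pixel line followed by silence shrinks to 50 pixels after a quarter of a second instead of vanishing at once.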
Double Buffering Graphics – Preventing “Flicker”
Just a quick mention of “Double Buffering.” This is a technique where the program draws all of the graphics in an “off screen” buffer. Then, when all of the drawing is completed, the entire “off screen buffer” is copied in one fell swoop to the memory actually used for the display. This prevents an annoying flickering effect on the screen and the technique is common knowledge among (most?) programmers. Luckily, modern software development environments include this feature (unlike the old days where we had to code it!). All we have to do now is assign a value of TRUE to the doubleBuffered variable and it’s taken care of for us!
Although it’s only one line of code, it must be done to prevent the screen from flickering while displaying the visualization. I was reluctant to include this topic, but then again, not everyone reading this will be a graphics programmer.
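For readers who have never seen the idea, here is a toy Python sketch of double buffering using plain lists as “screens.” It is a conceptual model only and uses no real graphics library; the class and method names are made up, and it swaps the two buffers rather than copying one into the other, which gives the same all-at-once effect.

```python
class DoubleBufferedCanvas:
    """Toy canvas: draw into a back buffer, then publish it in one step."""

    def __init__(self, width, height, background=" "):
        self.width = width
        self.height = height
        self.background = background
        self.front = self._blank()   # what the viewer sees
        self.back = self._blank()    # where drawing happens

    def _blank(self):
        return [[self.background] * self.width for _ in range(self.height)]

    def draw_point(self, x, y, char="#"):
        self.back[y][x] = char       # never touches the visible buffer

    def present(self):
        # The finished frame becomes visible all at once, and a fresh
        # back buffer is prepared for the next frame.
        self.front, self.back = self.back, self._blank()
```

Nothing drawn with `draw_point` is visible until `present` is called, so the viewer never sees a half-drawn frame.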
Assigning Frequency Ranges – Gaps and Preventing “Bleeding”
With Gloplug, as with the classic light organ, the audio range is broken up into 3 frequency ranges for Bass, Mid-range, and Highs. For Gloplug the factory setting for the Bass range is 0-200 Hz (but the user can change it). And the default setting for the Mid-range is 220-4000 Hz. Note that there is a 20 Hz gap! And there is a 200 Hz gap between the settings for the Mid and Highs frequency ranges! We are again throwing away audio data.
These gaps help prevent “bleeding.” An example of bleeding is a lower pitched a cappella vocal at the boundary of both the Bass and Mid ranges. Without the small gap between the Bass and Mid ranges, both the Bass and Mid-range graphic displays would be triggered. Experience has shown that this is annoying! The relatively small gaps between the 3 ranges (Bass, Mid, and High) greatly mitigate this distraction.
Experience has also shown that without the gaps the distracting graphical “bleeding” would occur quite often! If you have a mind to, with Gloplug you can adjust the frequency ranges for the Bass, Mid, and High ranges. In this way you can even experiment with wider gaps, no gaps, or even have the ranges overlap in order to see the effect and benefits of having such gaps.
Use of gaps between the Bass, Mid, and High frequency ranges is yet another example of “throwing away” a small amount of audio data in the interest of an aesthetically pleasing visualization. And it’s a technique that is not intuitively obvious.
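The gap idea can be sketched in Python as a lookup that simply returns nothing for frequencies that fall between ranges. The Bass and Mid-range edges are the defaults quoted above; the Highs edges (4200-16000 Hz) are an assumption derived from the stated 200 Hz gap, and the names `RANGES` and `band_for_frequency` are hypothetical.

```python
# Bass and Mid edges are the defaults quoted in the text; the Highs edges are assumed.
RANGES = [("bass", 0, 200), ("mid", 220, 4000), ("high", 4200, 16000)]


def band_for_frequency(freq_hz, ranges=RANGES):
    """Return the band name, or None if the frequency falls in a gap."""
    for name, lo, hi in ranges:
        if lo <= freq_hz <= hi:
            return name
    return None   # in a gap (or out of range): this data is thrown away
```

A vocal component at 210 Hz lands in the gap and triggers neither the Bass nor the Mid-range graphic. Passing a different `ranges` list makes it easy to experiment with wider gaps or none at all.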
Not Every Song Is A Visualization Winner
The title says it all. Some songs lend themselves to visualizations and some do not. In general, songs that do NOT respond well to visualizations are ones that tend to be “frequency dense” and/or always loud throughout the spectrum. These songs may be pleasing to the ear and our ear may easily pick out the distinct pieces of the music amid the mountains of surrounding sound. However, for these songs the visualization programs are often forced into “all graphics maxed out all the time!” On the other hand, some songs are excellent when it comes to visualization. An example is Linda Ronstadt’s “Blue Bayou.”
A good technique is to use your Media Player playlist feature. When you listen to a song that also “looks good” then you can easily add that song to a playlist you create named “GoodVisualizationSongs.”
“Adjusting the Gain” For Each Song and Within the Song
One problem that immediately surfaces when developing a visualization program is that some songs are “loud” and some are “soft.” If we “adjust the gain” so to speak, for the loud song, then the graphical displays for the “soft” songs will be too small. Likewise, if we adjust the display for the soft song then the graphics for the loud song will be “maxed out” all of the time. The solution is to “reset” the “gain” appropriately for each song. This is possible because the Media Players will tell the visualization program when the song changes. Also, as there are transitions between loud and soft passages within a song, it may become necessary to again adjust (but not reset!) the “gain.”
Dynamically adjusting the “gain” between songs and also within a song is a very important part of presenting visualizations that are pleasing and which appear to synchronize well with the music.
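A simple way to model “reset per song, adjust within the song” is a running peak that relaxes slowly, as in this Python sketch. It is an assumed approach, not Gloplug’s actual algorithm; the class name `AutoGain` and the `decay` and `floor` values are hypothetical.

```python
class AutoGain:
    """Track a running peak and scale levels so loud passages fill the display."""

    def __init__(self, decay=0.999, floor=0.05):
        self.decay = decay   # hypothetical: how slowly the peak relaxes per frame
        self.floor = floor   # hypothetical: avoids huge gain on near-silence
        self.peak = floor

    def reset(self):
        """Call when the media player reports that the song has changed."""
        self.peak = self.floor

    def scale(self, level):
        """Adjust (but do not reset) the gain, then return the level as 0.0-1.0."""
        self.peak = max(level, self.peak * self.decay, self.floor)
        return min(1.0, level / self.peak)
```

A soft song that peaks at 0.2 fills the display just as a loud one does, and a passage at half that level then draws at roughly half size until the tracked peak has relaxed.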
Hopefully, after reading the above, you have a better understanding of what is involved with developing visualizations that synchronize with the music in terms of frequency and loudness. As with many things in life, the devil is in the details.