Master of Loudness

Group: VoicePro
Posts: 694
Joined: 15-July 05
Member No.: 1,269
State / Province: IN
Country: USA
Sex: Male
Call Letters: Sweetwater Sound
Class: Other
Reputation: 450 pts
Well, I finished early. This one is long and will take some time to absorb. I will gladly accept questions, but please read the entire tutorial before asking. If you have a question, please be specific. If you just say "I don't get it", I can't help you. I wrote this to be as easy to understand as possible, but dynamics processing is a complex topic and some of it may be tough. Be patient. Please boost my rep if you find this useful and I would love to hear how it helped you. If you do have a specific question, you may PM me, but I would prefer that you post in this thread because others may have the same question.
I don't claim to know it all, but I spend a lot of time trying to learn it all. I hope that by sharing some of what I've learned so far, I can help you learn your craft a little better.
PLEASE DOWNLOAD THE ATTACHED IMAGE FILES AT THE BOTTOM OF THIS POST!
In this tutorial, I will explain some of the ways to use compression, expansion, limiting and gating for voiceover work. It's important to understand that I could write hundreds of pages on this topic, and there's still a lot I don't know. What that means is that there are virtually infinite possibilities with dynamics processing. There are no magic recipes or buttons to push that will instantly make you sound great. There's also no tutorial that will make you completely understand compression techniques. The best teacher is always experimentation. This tutorial will give you some basic starting ideas, but experimentation with your own equipment is essential.
SECTION I: WHAT IS DYNAMICS PROCESSING?
Imagine an engineer controlling the incoming level of your voice through the soundcard. When you're getting too loud, he'll turn you down. When you're too soft, he'll turn you up. When you're not speaking and there's too much room noise, he'll turn you completely off, but when you start speaking again, he'll be right there, without missing a beat. He can react to changes in your levels in a fraction of a second. You can tell him how fast to start adjusting your levels. You can tell him how quickly to make those adjustments. You can even tell him how much to adjust your voice. He will never make a mistake and will always do exactly as you tell him. That engineer's name: Dynamics Processing!
SECTION II: REQUIRED EQUIPMENT
There are literally hundreds of different compressors out there. Some are hardware units, some are software based. Some are better than others. Fewer and fewer people are buying hardware. Why spend $600 for a hardware unit when you can spend $300 on software that will do a better job? Hardware has its advantages, such as level control before the soundcard and no processing being required after it's been recorded. But overall, I believe that hardware simply cannot compete with the versatility of software. Most editors include a dynamics processor of some sort. Programs like Audition, Wavelab, Sound Forge, Pro Tools and other professional-grade editors include very good dynamics processors. Free programs like Audacity don't offer nearly as many options. But there is good news for users of free programs. There are MANY options with DirectX and VST plug-ins. Some of these are absolutely top-grade and very expensive, like the WAVES C1. Others, however, are completely free and yet very good. A quick Google search of "Free VST compressor" should turn up countless results. Choose what works best for you. In this tutorial, all examples will be based on the Adobe Audition 1.5 Dynamics Processing module. It is a "Graph" style processor, which is very common in software packages today. The ideas are exactly the same as traditional compressors, but the interface is very different. For someone familiar with traditional compression controls, the graph-style may be very intimidating. But, in my opinion, the graph makes a great deal of sense, once you understand what it's telling you.
SECTION III: BEHAVIORS OF COMPRESSORS
Several main types of compression exist. In addition to different functions, there are also different behaviors. Don't worry, I promise this will all make sense.
a) OPTO COMPRESSOR
One type of compressor is an opto compressor. This type of processor reacts slowly and isn't very commonly used. Opto compressors sound wonderful for music, as they allow a lot of "punch" in the recording. They, however, do not sound very good for spoken voice. Avoid opto compressors for voiceover work.
b) ELECTRO COMPRESSOR
Electro compressors are extremely common. So common, in fact, that unless otherwise specified, the compressor will have electro-type behaviors. This is the behavior that is most desirable for voiceover applications.
c) MULTI-BAND COMPRESSOR
Multi-band compressors are typically very versatile and very expensive. These are rarely seen in a single hardware unit (however engineers chain multiple units to create multi-band compressors). They can be opto or electro compressors. But multi-band compressors offer more than one or two frequency bands that can be compressed differently. This means you can add more compression to the lows in your voice and less to the highs, which will make you sound deeper. You can also control the gain of each band (usually four or five bands), which means you can EQ while you compress. If you can't afford the $400 for the WAVES C4 multi-band compressor, don't worry. Multi-band compressors do some cool things, but for VO, these are not really much more effective than single-band compressors.
SECTION IV: LEVELS
Before you can understand compression, you must have a working knowledge of amplitude and levels. Most people know that going into the red is bad and green is good, but not a lot of people really understand what's going on. Digital meters are not the same as analog VU meters, which most people are accustomed to. With analog VU, it's okay to exceed 0dB. With analog, going into the red a little bit is okay. With digital, exceeding 0dB is not an option. If you exceed 0dB with digital, you will clip and ruin the signal. The idea is to get as close to 0dB as possible without going over. The best way to do this is to leave enough headroom to work. Try to keep your level from exceeding -3dB when you record. After you record, normalize the file to about 98% (or -0.2dB). Processing in 32-bit or 24-bit and converting later is highly recommended over 16-bit.
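For anyone who likes to see the math, here is a rough Python sketch of the level numbers above. It is only an illustration: the function names are made up for this example, and it assumes samples are stored as floating-point values where 1.0 is digital full scale (0dB).

```python
import math

def to_db(amplitude):
    """Convert a linear amplitude (1.0 = full scale) to decibels."""
    return 20.0 * math.log10(amplitude)

def normalize(samples, target=0.98):
    """Scale samples so the loudest peak lands at `target` of full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = target / peak
    return [s * gain for s in samples]

# A made-up take whose loudest peak is around -3dB (about 0.708 of full scale).
take = [0.1, -0.5, 0.708, -0.3]
normalized = normalize(take)

print(round(to_db(0.708), 2))                     # recorded peak in dB
print(round(to_db(0.98), 2))                      # 98% of full scale in dB
print(round(max(abs(s) for s in normalized), 2))  # new peak as a fraction of full scale
```

The second print shows that 98% of full scale sits a little under 0.2dB below 0dB, which is where the rounded -0.2dB figure comes from.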
SECTION V: FUNCTIONS OF COMPRESSION
Compression (dynamics processing) can be used in a variety of different ways. The obvious is to make a constant overall level and bring up the quiet parts, but a good dynamics processor can do much more.
a) COMPRESSION
Compression is obviously the most common function of a compressor. Compression brings down higher levels that exceed a defined threshold, which makes a smoother overall sound. This also allows the entire file to become louder overall.
b) LIMITING
Limiting does exactly what it sounds like. It limits the peaks to a defined level. Most limiters are "brick wall" limiters, which means that sound cannot exceed the defined level. Where compression will allow some sound to exceed the threshold, limiting stops the audio completely at the threshold. This allows for peak control and overall loudness maximizing. Limiting is usually not desirable for dry voice work, except in VERY carefully chosen applications.
c) GATING
Rather than affecting the loud parts of the audio, gating focuses on the softer parts. Gating will cut out all audio below a defined threshold. True gating occurs very abruptly, similar to an on/off button, and is not desirable for voice recordings.
d) DOWNWARD EXPANSION
A downward expander is a type of gate. The idea is basically the same, however, a downward expander uses a much gentler slope, which means that rather than "switching" the sound on and off, the downward expander will slide the sound toward off and on. This makes a downward expander essential for voice work. The downward expander will gently remove fan noise, air noise and other subtle, quiet noise found between words. It is the most effective tool for "cleaning up" a voice recording and does not produce nasty artifacts as noise reduction does. If your processor offers a "gate" but not an "expander", the gate most likely will also be capable of downward expansion.
e) UPWARD EXPANSION
An upward expander is the opposite of a downward expander (did I figure that out all by myself?!?!). An upward expander will turn up the quietest parts of the recording. This can be very useful for adding "bite" or "edge" to a recording, but it will also make background noise much louder. When used correctly, this is an incredible tool for production, but is rarely useful for dry voice. Many processors simply offer an expander, without explanation. In those cases, it is generally understood to be a downward expander.
f) DE-ESSING
Compression can be used to minimize harsh "S" sounds, without EQ that may make the audio dull or lifeless. This is sometimes done through what is called a "sidechain". Essentially, a sidechain separates a certain frequency range and compresses it differently than the rest of the signal. Not all software compressors offer a sidechain function, but most do allow you to define a frequency range to be compressed. Then you can use the compressor twice: Once for normal compression and once for de-essing. When de-essing, it's normal to compress between 4kHz and 8kHz. Finding the right frequency is tough, and different for each speaker.
SECTION VI: UNDERSTANDING THE GRAPH AND TRADITIONAL CONTROLS
The graphical interface varies from one compressor to another, but most have the same general idea. The graphical interface is tough to explain, so I will also use attached jpg picture files to help illustrate the idea. The good news is that once you understand the graph, you'll just "get it". And once you "get it", dynamics processing becomes much easier.
If you look at fig. 1A, you will see the Audition dynamics processor graph with "flat" setting. That means that there is absolutely no processing going on. Running sound through this processor will not alter the sound in any way.
In fig. 1B, I have pasted in some color to represent sound. You can see the sound peaking at -40dB. You must use both the horizontal and vertical scales. The horizontal scale represents the original signal, while the vertical scale represents the compressed signal. We'll get further into use of the scales later in this tutorial.
In fig. 1C, you can see the sound at -10dB. Am I making sense? The color simply represents what the sound would look like inside the compressor at that level. If you can understand what I'm demonstrating, understanding compression will be easy! Some high-end compressor plug-ins have actually turned the graph into a meter and will automatically color in the graph, much like a VU meter.
On fig. 1D, I have put a limiter on that same -10dB signal. I placed the limiter at -30dB. As you can see, the horizontal scale shows the original -10dB signal, while the vertical scale shows the -30dB limited signal. I have also illustrated the level that the audio would have reached without the limiter. You can see how far the signal has been limited.
That should give you a pretty good idea of how the graphical interface can be used to adjust some controls of the dynamics processor.
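If it helps, the graph can be thought of as a simple lookup: find the input level on the horizontal scale, follow it up to the line, and read the output level off the vertical scale. Here is a minimal Python sketch of that idea. The function name and the exact corner points are my own assumptions for illustration (this is not how Audition stores its graph); the two example curves are modeled on the flat setting and the -30dB limiter described above.

```python
def graph_output(input_db, points):
    """Read the output level off a graph made of (input_db, output_db) points.

    `points` must be sorted by strictly increasing input level. Values
    between points are linearly interpolated, like the straight lines
    drawn on the graph.
    """
    if input_db <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if input_db <= x1:
            t = (input_db - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return points[-1][1]

# Flat setting: output always equals input (no processing).
flat = [(-100.0, -100.0), (0.0, 0.0)]

# Limiter at -30dB: the line goes flat once the input passes -30dB.
limited = [(-100.0, -100.0), (-30.0, -30.0), (0.0, -30.0)]

print(graph_output(-40.0, flat))     # quiet signal through the flat graph
print(graph_output(-10.0, flat))     # louder signal through the flat graph
print(graph_output(-10.0, limited))  # same signal held at the limiter level
```

The last line is the situation in fig. 1D: a -10dB input on the horizontal scale comes out at -30dB on the vertical scale.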
SECTION VII: ATTACK, RELEASE, AND MAKE-UP GAIN
Certain controls of the dynamics processor cannot be defined using the graphical interface, and numeric values must be used. The two most important of these are attack and release.
a) ATTACK
The attack is how fast the compression takes effect. Or perhaps a better way to look at it is the time it takes the "engineer" to react to level changes. This is defined in "ms". A medium attack will give a little "bite" to hard-consonants and beginnings of words. It will not give you a very smooth sound at all, but sometimes makes a nice effect. A slow attack will preserve the dynamics of the original, while giving a smoother overall sound. For voice, a fast attack is usually best. I use an attack of no more than 5ms, and sometimes as low as 0.5ms.
b) RELEASE
The release is how long the compressor "holds" before making the next move. Or how long it will take the "engineer" to begin riding the fader again. Release can have a very strong impact on the character of the sound. A slow release will make a very smooth sound, but won't allow for a very aggressive or punchy sound. A fast release will allow for louder overall volume, but may create a "pumping" sound or even become distorted. With voice, it's usually best to use a medium release. I use a release of 40ms to 100ms. This allows a loud, punchy sound, but doesn't distort or pump.
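To make attack and release a little more concrete, here is a rough Python sketch of one common way to smooth gain changes over time. This is an assumption on my part, not a description of any particular product: it treats the attack and release times as time constants (the gain covers about 63% of the remaining distance in that time), and different processors define these times differently. The function name and default values are made up for the example.

```python
import math

def smooth_gain(target_gains_db, sample_rate=44100, attack_ms=5.0, release_ms=60.0):
    """Smooth a per-sample list of desired gain changes (dB, 0 = no change).

    When more reduction is requested, the gain moves at the attack speed.
    When less reduction is requested, it recovers at the release speed.
    """
    attack = math.exp(-1.0 / (attack_ms * 0.001 * sample_rate))
    release = math.exp(-1.0 / (release_ms * 0.001 * sample_rate))
    current = 0.0
    smoothed = []
    for target in target_gains_db:
        coeff = attack if target < current else release
        current = coeff * current + (1.0 - coeff) * target
        smoothed.append(current)
    return smoothed

# The "engineer" is asked for 6dB of reduction for 50ms, then for none.
sr = 44100
request = [-6.0] * 2205 + [0.0] * 4410
gain = smooth_gain(request, sample_rate=sr)

print(round(gain[219], 2))   # about 5ms into the loud part
print(round(gain[2204], 2))  # end of the loud part
print(round(gain[4850], 2))  # about 60ms after the loud part ends
```

A shorter attack makes the first number get close to -6dB sooner, and a shorter release makes the last number return to 0dB sooner, which is where pumping can start to creep in.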
c) MAKE-UP GAIN
Many software dynamics processors offer a "make-up gain" button or switch. When compressing a file, the file will, by nature, become quieter. The make-up gain option will automatically bring the overall level of the file back to where it was before compression occurred and perhaps louder. If your compressor doesn't have an automatic make-up button, it should have an "output gain" slider or knob. This allows for the same function, but it must be determined manually.
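Here is a small Python sketch of the manual version: measure how far the peak dropped and add that much output gain back. Matching the peak is just one simple assumption (automatic make-up gain in real processors may be calculated differently), and the function names and sample values are invented for the example.

```python
import math

def peak_db(samples):
    """Peak level in dB, where 1.0 is full scale."""
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak) if peak > 0.0 else float("-inf")

def apply_gain_db(samples, gain_db):
    """Apply an output gain given in dB."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]

def make_up(original, compressed):
    """Raise `compressed` so its peak matches the peak of `original`."""
    before, after = peak_db(original), peak_db(compressed)
    if after == float("-inf"):
        return list(compressed), 0.0  # silent file: nothing to make up
    gain_db = before - after
    return apply_gain_db(compressed, gain_db), gain_db

# Made-up samples: compression pulled the loudest peak from 0.8 down to 0.4.
original = [0.2, -0.8, 0.5]
compressed = [0.2, -0.4, 0.35]
restored, gain_db = make_up(original, compressed)

print(round(gain_db, 2))                        # make-up gain in dB
print(round(max(abs(s) for s in restored), 2))  # peak after make-up gain
```

Notice that the quiet sample (0.2) ends up louder than it started. That is the whole point of compressing and then making up the gain.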
SECTION IX: APPLICATION - EXPANDER/GATE
This section will cover the controls of an expander or gate. Since we should use expansion with voice, we will refer to this simply as "expansion" from here on out, with the understanding that this may be achieved through some gate plug-ins. We are discussing primarily downward expansion, but we will touch on upward expansion. We will assume that I am referring to a downward expander, unless otherwise specified. When expansion and compression are two separate modules, the expander should always come first. You should also expand before any other processing. Expanding later can cause unwanted noise. The idea is to remove as much noise as possible, without making it sound like noise has been removed.
a) THRESHOLD
The threshold determines the level at which the expander begins to work. Since an expander works on the quietest part of the signal, an expander will kick in once a signal drops below the threshold. As long as the signal is louder than the threshold, the expander will not activate. Fig. 2 shows the parts of an expander, as they appear in the graph. For now, focus on the threshold. It is set at -50dB. As long as the signal stays above -50dB, this expander will not operate. The steepness of the slope is determined by ratio, which will be covered shortly. This particular expander shows a 3:1 ratio. This will all make sense soon enough. For now, though, focus on the threshold.
Fig. 3A shows the simple graph of an expander with a ratio of 3:1 and, as you can see, a threshold of -50dB.
Fig. 3B illustrates audio that is too loud to activate the expander. As you can see, the audio is peaking at -20dB, but with the threshold being much lower, the expander will not activate. Therefore, if we assume that the -20dB signal is voice, the voice will be unaltered.
Fig. 3C shows audio peaking at around -53dB. Imagine that this could be a breath. The expander has begun to activate, which means the level of the breath will be lowered, but not completely removed. By looking at the horizontal scale, you see that the original audio was near -53dB, while the vertical scale shows that the expanded audio has been lowered to near -62dB. Basically, you're only lowering the volume of breath, without tampering with the voice.
Fig. 3D shows what might be background noise, CPU fan noise or other low-level hiss or hum. The level is roughly -73dB. This level is well below the threshold. So far below, in fact, that it will be completely removed by the expander. This is the best way to remove low-level noise. Notice that there is no bright red, which means there is no audio getting through.
You should now be able to better understand the basics of the threshold of an expander. Finding the right threshold takes experimentation and trial and error. This tutorial isn't a recipe book, but simply a guide to a good starting place and a better understanding of how to effectively experiment.
b) RATIO/SLOPE
The ratio determines how steep the slope of the expander will be. If the slope is too steep, the audio will sound very choppy. The ratio will help you determine the level between the point that the threshold activates the expander and the point that the audio completely disappears. With the right settings, the sound will be very natural. In my opinion, anything more than a 3:1 ratio would be too much for voice. I personally use a 2.3:1 ratio. We'll start with 3:1. 3:1 means that for every 1dB the level drops below the threshold, the expander will lower the level another 3dB. With a 10:1 ratio, a 1dB drop below the threshold would actually cause the level to drop 10dB, a 2dB drop below the threshold would cause the audio to drop 20dB. Lower ratios (which create gentler slopes) will sound the most natural.
Fig 4A illustrates a 3:1 expander, with a threshold of -50dB. The expander will begin to activate at -50dB and for each dB the actual signal drops below -50dB, the processed signal will drop 3dB.
Fig. 4B shows a 10:1 ratio at a -50dB threshold. The original, unprocessed signal reaches only -53dB, but with the ratio at 10:1, the expanded signal reaches only -83dB.
Fig. 4C Shows the same input signal, and the same threshold, but with a 2:1 ratio. You can see how much more of the signal is allowed to pass through this gentle slope.
Fig 4D is a 2.3:1 expander, which is what I use, though I use a different dynamics processor.
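The expander numbers in this section can be reproduced with a few lines of Python. This is only a sketch under one assumption: it follows the worked figures above, where each 1dB below the threshold is pushed down by a further `ratio` dB (so -53dB becomes about -62dB at 3:1 and -83dB at 10:1). Some processors instead define the ratio as the slope of the line itself, which would give -59dB and -80dB for the same inputs, so check how your own expander behaves. The function name is made up for this example.

```python
def expand_db(input_db, threshold_db=-50.0, ratio=3.0):
    """Downward expander curve, following the worked numbers in this section.

    Assumption: each 1dB below the threshold is lowered by a further
    `ratio` dB. Levels at or above the threshold pass through untouched.
    """
    if input_db >= threshold_db:
        return input_db
    below = threshold_db - input_db
    return input_db - ratio * below

# Voice, a breath, and fan noise through a 3:1 expander at -50dB.
for level in (-20.0, -53.0, -73.0):
    print(level, "->", expand_db(level))

# The same breath through steeper and gentler slopes.
for ratio in (10.0, 3.0, 2.3, 2.0):
    print(ratio, "->", round(expand_db(-53.0, ratio=ratio), 1))
```

The second loop shows why the gentler ratios sound more natural: at 2:1 the breath is only nudged down a few dB, while at 10:1 it practically disappears.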
c) UPWARD EXPANSION
An upward expander works conversely to a downward expander or gate. The ratio is represented in negative values, such as -3:1. That means when a signal drops below the threshold by 1dB, the signal will actually be boosted 3dB. There's not a huge amount of use for upward expansion with voice over applications, but it can sometimes be used in production to add a little aggression to the sound.
Fig. 5A Shows what a simple upward expander looks like. This shows a -3:1 ratio and a threshold of -70dB.
Fig. 5B Shows the same settings with audio exceeding the threshold. The audio will not be affected, as long as audio stays above the threshold.
Fig. 5C Illustrates what happens to audio when the level drops below the threshold. The yellow portion shows where the audio has been boosted.
Fig. 5D Shows one creative way a graphical dynamics processor can be used. This is simply not possible with traditional compressors. You can see that mid-level signals will have the bite of an upward expander, while low-level signals (like room noise) will be reduced. These kinds of possibilities are endless. They only require creativity and experimentation.
You should now have a very good understanding of the uses of expansion, and how to make expansion work for you!
SECTION X: APPLICATION - COMPRESSOR
Compressors share the same controls as expanders, but the ideas behind those controls create different results. The idea behind compression is to bring the louder parts down closer to the amplitude of the quieter parts. This will make the overall volume lower, but it will also give you enough headroom to make the overall volume much higher than you could without any type of compression. This section will detail the functions of the compressor. See figure 6 to understand the different parts of the graph as they relate to control.
a) THRESHOLD
Much like an expander, the threshold on a compressor is the setting that defines the level at which the compressor will begin to activate. The amplitude of sound must reach levels louder than the threshold level for sound to be affected. For instance, with a threshold of -10dB, the original level must be louder than -10dB to have any effect on the audio, whatsoever.
Fig. 7A shows a simple compressor with the threshold set at -30dB. By now, you can probably guess how the audio will react to this setting, but I'll go in-depth anyway.
Fig. 7B illustrates audio peaking at -35dB. You can see that the incoming signal is not affected, as it has not reached the threshold.
Fig. 7C shows that the original signal is peaking at -25dB. You can also see where the signal would have gone, without the compressor.
Fig. 7D shows the input signal going as far as -10dB. Again, you can see how high the signal could have gone without compression.
b) RATIO
The ratio of a compressor is based on the same logic as the ratio of an expander. A ratio of 8:1 means that for every 8dB the signal exceeds the threshold, the signal will only actually be allowed to rise 1dB. Or a ratio of 4:1 means that the signal must get 4dB louder than the threshold for the compression to allow 1dB of increase in level. Or with a 2:1 ratio and a signal passing the threshold by 8dB, the level will only rise by 4dB. One final example. Audio is peaking at -2dB. You set the threshold at -15dB and you set the ratio at 3:1. The peak exceeds the threshold by 13dB, so it is only allowed to rise about 4.3dB above the threshold. That will bring the peak down to about -10.7dB. Personally, I use a rather aggressive compressor. I use a ratio of 4.57:1 and a threshold of -22.6dB. I don't recommend settings quite so aggressive for most people. Also, most ad agencies will not accept settings so aggressive. If I am sending to an ad agency, I back the threshold off to -12dB and the compression ratio is taken down to 2.5:1. These are closer to the settings that will likely work best for you. Experiment!
Fig. 8A shows a compressor with a ratio of 8:1 and a threshold of -30dB.
Fig. 8B shows that same compressor, but with an audio peak at -15dB. You can see exactly how that reduces the gain. Remember to look at the horizontal scale to see the input audio level and the vertical scale to see the compressed audio level.
Fig. 8C illustrates a compressor with the same threshold level, but this shows a much stronger ratio of 20:1. Notice how much the level (vertical scale) is reduced.
Fig. 8D shows a 3:1 ratio at -15dB. This is an example of one setting that might work well for voiceover. As you can see, the input sound is peaking at -2dB. You can also see that the audio isn't going to be "squashed" much, which is the idea. The more natural sounding, the better.
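The compressor arithmetic is easy to check for yourself. Here is a minimal Python sketch of the threshold and ratio logic described in this section (anything above the threshold is only allowed to rise 1dB for every `ratio` dB of input). The function name is made up for the example, and it ignores attack, release and make-up gain.

```python
def compress_db(input_db, threshold_db, ratio):
    """Compressor curve: levels above the threshold rise 1dB per `ratio` dB."""
    if input_db <= threshold_db:
        return input_db  # below the threshold: untouched
    over = input_db - threshold_db
    return threshold_db + over / ratio

# The final example from the text: -2dB peak, -15dB threshold, 3:1 ratio.
print(round(compress_db(-2.0, -15.0, 3.0), 2))

# A -15dB peak through the 8:1 and 20:1 settings with a -30dB threshold.
print(round(compress_db(-15.0, -30.0, 8.0), 2))
print(round(compress_db(-15.0, -30.0, 20.0), 2))

# Audio that never reaches the threshold is left alone.
print(compress_db(-35.0, -30.0, 8.0))
```

Try plugging in your own peak level, threshold and ratio before you touch the real controls. It is a quick way to see how hard a setting will squash the audio.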
SECTION XI: APPLICATION - LIMITER
A limiter comes in two main flavors: Hard and soft. A hard-limiter is a brickwall limiter. A hard limiter will stop the audio at the threshold. Simply put, if you set the threshold at -15dB, audio CANNOT exceed -15dB. A brickwall limiter is a true limiter. There is no ratio to adjust, only threshold. For voice work, I only use limiting to control really bad peaks. There are other types of limiters, such as a soft-limiter or a soft-knee limiter. There is even one called an "intelligent" limiter. The intelligent limiter effect cannot be duplicated with most compressors, software or hardware. Since intelligent limiters are so rare, I'm not even going to attempt to go into specifics. I'm also going to avoid multi-band limiters, as they have little or no use for voice processing (but they can be very valuable in mastering situations and other music applications). Back to soft limiters. Soft limiting can really be done in two ways. In Audition, you can create a hard limiter and simply check the "splines" box. This will smooth out the line in the graph and round all of the edges. It will allow more sound to pass the threshold than a hard limiter, but will still hold the audio near that level. The result is a much smoother sound. The other way to create a soft limiter is by simply adding another compressor on top of your first compressor. For instance, assume your original compressor had a threshold of -20dB and a ratio of 3:1. You could add another compressor with a threshold of -6dB and a ratio of 10:1. This would simulate a soft limiter. It means that when the audio started to get really loud, harder compression would kick in.
Fig. 9A is a standard brickwall limiter with a threshold of -15dB. Audio cannot exceed that threshold.
Fig. 9B shows the same threshold, but this time with the "splines" option activated. Not all programs offer splines, but Audition certainly does. This is one way to create a soft limiter effect.
Fig. 9C shows the other option for creating a soft limiter. This is actually my personal choice for soft limiting. This shows a compressor with a threshold of -30dB and a ratio of 3:1, followed by a soft limiter (second compressor) with a threshold of -10dB and a ratio of 10:1.
Fig. 9D has the same settings as fig. 9C, but this time I've added in color to show the audio which is peaking at 0dB. As in previous pictures, the darker red color represents where the audio would have gone, if not for the compressor and soft-limiter.
Limiting is rarely needed with voice work, so use it sparingly, if at all.
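Here is a rough Python sketch of the two limiter flavors described in this section: a brickwall limiter, and a compressor with a second, harder slope on top acting as a soft limiter. The function names are made up, and the sketch assumes both thresholds are input levels drawn on a single graph (as in the -30dB / 3:1 plus -10dB / 10:1 example).

```python
def hard_limit_db(input_db, ceiling_db):
    """Brickwall limiter: nothing gets past the ceiling."""
    return min(input_db, ceiling_db)

def two_stage_db(input_db, first=(-30.0, 3.0), second=(-10.0, 10.0)):
    """Compressor plus soft limiter drawn as two slopes on one graph.

    Assumption: both thresholds are measured on the input (horizontal) scale.
    """
    t1, r1 = first
    t2, r2 = second
    if input_db <= t1:
        return input_db                      # below the first threshold
    if input_db <= t2:
        return t1 + (input_db - t1) / r1     # gentle compression
    knee_out = t1 + (t2 - t1) / r1           # output where the slopes meet
    return knee_out + (input_db - t2) / r2   # harder compression on top

print(hard_limit_db(-5.0, -15.0))        # loud peak stopped at the ceiling
print(hard_limit_db(-20.0, -15.0))       # quieter audio passes untouched
print(round(two_stage_db(-20.0), 2))     # only the 3:1 slope is active
print(round(two_stage_db(0.0), 2))       # 0dB peak through both slopes
```

One thing to keep in mind: if you chain two separate compressor plug-ins instead of drawing both slopes on one graph, the second compressor sees the already-compressed output of the first, so its threshold has to be set against that lower level.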
SECTION XII: APPLICATION - DE-ESSER
A de-esser can be used to reduce harsh sibilance ("S" sounds) in recorded material. De-essers come in many forms, but the science is always the same. De-essers always compress a small frequency range. Sibilance usually sits around 5kHz for men and 8kHz for women. If you decide to attempt de-essing, experimentation with frequencies is going to be the only way to find what works best for you or the particular speaker you're working with. Some compressors have a built in "sidechain" which allows you to perform standard compression on an audio signal, while separating the frequency range for de-essing. The sidechain allows you to compress those frequencies differently than the standard compression you are using to process the rest of the file.
Some compressors (like the one found in Audition) will not allow simultaneous standard and sidechain operations. However, you can achieve the same effect by either chaining two compressors together, or simply processing the file twice. Audition allows you to specify the frequencies that you wish to process.
Also, many free DirectX and VST plug-ins are available that will de-ess. In fact, there are many free plug-ins specifically for de-essing. Like all things, some are better than others.
I won't go into great detail about how to use a de-esser, but I will give you a few de-essing tips. First, de-ess last. De-essing should be done after all compression and EQ. Second, don't de-ess too much. You don't want to remove sibilance, you just want to reduce it. Third, don't substitute de-essing for EQ and don't substitute EQ for de-essing. These are very different processes. Try both and figure out which works best for you!
A note about sidechains: You can also use a sidechain compressor in low frequencies to reduce plosives (popped "P"s). There are countless other creative ways to use this function. Once again, experimentation is the key!
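For the curious, here is a very simplified Python sketch of the split-band idea behind a de-esser: separate the high frequencies, compress only that part, and add it back to the rest. Everything in it is an assumption made for illustration, including the function name, the crude one-pole filter used for the split, the 4kHz split point, the threshold, the ratio and the 50ms release. It compresses everything above the split rather than a true 4kHz to 8kHz band, and a real de-esser plug-in will be far more refined.

```python
import math

def de_ess(samples, sample_rate=44100, split_hz=4000.0,
           threshold_db=-30.0, ratio=4.0, release_ms=50.0):
    """Compress only the content above `split_hz` (crude one-pole split)."""
    a = math.exp(-2.0 * math.pi * split_hz / sample_rate)
    release = math.exp(-1.0 / (release_ms * 0.001 * sample_rate))
    low = 0.0   # state of the one-pole low-pass filter
    env = 0.0   # level detector for the high band
    out = []
    for x in samples:
        low = (1.0 - a) * x + a * low      # low band
        high = x - low                     # whatever is left is the high band
        level = abs(high)
        env = level if level > env else release * env  # instant attack, slow release
        env_db = 20.0 * math.log10(max(env, 1e-9))
        if env_db > threshold_db:
            wanted_db = threshold_db + (env_db - threshold_db) / ratio
            gain_db = wanted_db - env_db   # negative: turn the highs down
        else:
            gain_db = 0.0
        out.append(low + high * 10.0 ** (gain_db / 20.0))
    return out

# Test tones standing in for an "S" sound (7kHz) and a low voice tone (200Hz).
sr = 44100
hiss = [0.5 * math.sin(2.0 * math.pi * 7000.0 * n / sr) for n in range(4410)]
tone = [0.5 * math.sin(2.0 * math.pi * 200.0 * n / sr) for n in range(4410)]

print(round(max(abs(s) for s in de_ess(hiss)[1000:]), 2))  # peak of the treated hiss
print(round(max(abs(s) for s in de_ess(tone)[1000:]), 2))  # peak of the treated low tone
```

The high tone comes out noticeably quieter while the low tone is left alone, which is the whole idea: turn down the sibilance without dulling the rest of the voice.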
SECTION XIII: WAVEFORM
So you've read all this way, but if you haven't been experimenting all along, you probably wonder what the waveform looks like. I'm using exaggerated settings to show the effects of dynamics processing. I have recorded a file of me slowly counting to five. I did this next to a fan, so there would be a lot of background noise.
Fig. 10A shows the original file. You can easily see background noise. You can also easily see that numbers four and five appear louder than the other three.
Fig. 10B illustrates what has happened after I performed expansion and compression on the file. You can see that the background noise is completely gone. You can also see that the levels are much more even and look "thicker".
Fig. 10C shows the processed file after being hard limited. You can see how the tops of the waves are very flat. That was where the threshold was set.
SECTION XIV: CLOSING
By now you should have a pretty good understanding of dynamics processing as it relates to voiceover. You should have a good grasp of how to use the graphical interface commonly found in audio software. You should also have some understanding of compression, in general. Presets are a good place to start, but there is no substitute for actually understanding the controls. Hopefully, you've now got enough information to get you closer to "that sound" that you're looking for. Remember, the key is experimentation! Try new things. Take the time to learn. And above all, have fun!