Misc: Choosing a sample rate and bit depth for recording

Scenario

You are setting up your keyboard (or any instrument, really) for use with an external sequencer program (Steinberg Cubase, Apple Logic, MOTU Digital Performer, Ableton Live, etc.), and you are unsure which sampling frequency and bit depth to use when recording your keyboard.

Disclaimer

A lot of this article is opinion-based. Even the "experts" have differing opinions on the subject, and there is confusion and argument at the expert level. The original author of this article has been involved in audio engineering and production at a semi-pro level for 12+ years and has experienced first-hand the move from a purely analog realm into the various iterations of the digital realm so far. Much of what follows is an attempt to accurately portray the contrasting opinions despite personal bias and preferences. But hey, I'm not perfect and you might well disagree with some of what I'm about to assert.

Closely-related articles

The executive summary

If you want to avoid reading a lot of details, the short story is this:

  • If you're scoring for videos of any kind, you want to record your material at 48000 and 24 bits.
  • If you're producing music, you probably want to record your material at 44100 and 24 bits if you plan to use any commercially-produced sample/loop libraries in your projects.
  • If you're producing music that is entirely and completely composed of your own recorded material, you probably want to record your material at 48000 and 24 bits, even if your ultimate destination is a CD (for which you'll need to convert to 44100 and 16 bits).
  • You would be well-advised to acquire some good audio editing software such as SoundForge or WaveLab, because at some point in any of the preceding three processes you're probably going to need to do some quality sample-rate conversion of audio files between 48000 and 44100, and you'll also need to dither from 24 bits to 16 bits for some outputs such as files to be burned on a CD or files to be loaded into the M3 for use as sampling data. No DAW/Sequencer software out there does nearly as good a job at sample rate conversion or dithering.

A special note about producing songs

Let's face it, CDs are already a dead format. Downloadable file formats are pretty much the target format of choice unless you're a slave to a major recording label, most of which are still futilely entrenched in the paradigm of physical CDs that they can sell.

There's a reason why the iTunes Store is the largest music seller in the world and the record labels are scared to death of the Internet. Everybody prefers those nice compact song files that can be stored and backed up safely and played on their portable music players, etc. And most people cannot hear any difference in audio quality between an MP3/AAC file versus its better-sounding CD or DVD counterpart. Only engineers and audiophiles can hear the difference, and they're not your mass market.

The only reason I buy a CD any more is if I cannot find the music on the iTunes Store or some other legitimate download site such as an indy band's MySpace page. (Because I've been a working musician and refuse to steal the hard work of others even if the RIAA and major labels are a bunch of idiots with their heads up their butts and killing the industry with their stupid DRM and broadcast licensing restrictions killing great music discovery services like Pandora. end rant =))

If you agree with the preceding viewpoint, this definitively suggests working in 44100 for song production of any type if your ultimate destination is primarily downloadable song files such as MP3s, despite the fact that you can render these formats at 48000. Why? Several reasons:

  • First and foremost, all of the popular downloadable formats (namely MP3 and AAC) are encoded with lossy compression, so all of that subtle "air" and preservation of the shape of high frequencies afforded by recording and working in 48000 is going to be squashed like a bug the minute you encode your beautiful mix into a downloadable format.
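  • Depending on the encoder settings, encoding from a 48000 waveform can create a larger downloadable file than encoding from a 44100 file, which means more hard drive space and longer download times. (At a fixed bitrate the file size is the same, so the encoder simply has fewer bits to spend on each sample.)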
  • It's worth noting that a downloadable file encoded from a 48000 24-bit source file will still sound noticeably worse than a CD-based version of that same song that was downsampled to 44100 and dithered to 16 bits. And to most people it won't sound any noticeably different than if the source file was 44100 and 24-bit before you encoded it.

The bottom line is that most people cannot perceive a difference between really good DVD audio, pretty good CD audio, and crappy MP3/AAC audio. To most people, it all sounds the same. You must generally be young, have undamaged hearing, and have very good critical listening ability to hear any difference at all. The only time your general audience starts hearing a difference is when they're listening to the highly compressed frequency ranges of analog radio stations or Internet-based streaming radio stations.

Choosing a bit depth

Bit depth is the easy part. In almost all cases, you want to record at 24 bits if your audio interface supports that bit depth. Yes, your recorded 24-bit samples will be larger and require more hard disk space, but the trade-off in headroom is more than worth the larger resulting file sizes.

The heart of the matter is the huge difference in dynamic range, or "detail", between a 16-bit waveform and a 24-bit waveform. Both bit depths are equally loud at 0dB (full scale), but the 24-bit waveform has many more available dB before you hit the noise floor. When recording at 16 bits, you have only about 96dB of dynamic range between your softest sounds and your loudest sounds, whereas a 24-bit file has about 144dB (roughly 6dB per bit).

Having more dynamic range in each of your recorded audio tracks translates to the ability to mix down your tracks at lower volumes, so that your Master track never exceeds -6dB. This ensures that you will never end up with digital clipping (which is very harsh) in the final mix, and it gives mastering engineers a lot more wiggle room to get a great master out of your mixdown. You don't need to worry about the fact that your mixdown is not as loud as your reference mixes at this point, because a final simple step (usually in a different audio editing program such as SoundForge or WaveLab) is to normalize your mixdown to -0.2dB or thereabouts, which will bring its peaks up to the level of the current crap that's being produced by the commercial recording industry (although how "loud" it sounds also depends on how heavily the mix is compressed and limited). (See the excellent Wikipedia article on the Loudness War for an interesting read.)
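As a rough illustration of what peak normalization does to a mixdown, here is a minimal Python sketch. It assumes the samples are floating-point values between -1.0 and 1.0, and the function names (peak_dbfs, normalize) are made up for this example; they are not taken from SoundForge, WaveLab, or any other editor.

```python
import math

def peak_dbfs(samples):
    """Peak level of the samples in dB relative to full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def normalize(samples, target_dbfs=-0.2):
    """Scale every sample by one gain so the loudest peak lands at target_dbfs."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = 10 ** (target_dbfs / 20) / peak
    return [s * gain for s in samples]

# A hypothetical mixdown whose loudest peak sits at about -6 dB.
mixdown = [0.1, -0.5, 0.25]
print(round(peak_dbfs(mixdown), 2))
print(round(peak_dbfs(normalize(mixdown)), 2))
```

Because a single gain is applied to every sample, the balance of the mix is unchanged; only the overall level moves.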

If you attempted to mix down 16-bit tracks in this manner, you would lose too much "detail" in each track, with a resulting dynamic range of only 90dB. With 24-bit material, however, you end up with a resulting dynamic range of 138dB, which is plenty.
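The dB figures above come from the usual rule of thumb for linear PCM, 20 * log10(2 ** bits), or roughly 6dB per bit. A small Python sketch of the arithmetic, with hypothetical function names and the 6dB of headroom taken from the -6dB Master ceiling mentioned earlier:

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range of linear PCM at the given bit depth."""
    return 20 * math.log10(2 ** bits)

def usable_range_db(bits, headroom_db=6.0):
    """Range left over when peaks are kept headroom_db below full scale."""
    return dynamic_range_db(bits) - headroom_db

for bits in (16, 24):
    print(bits, round(dynamic_range_db(bits), 1), round(usable_range_db(bits), 1))
```

The exact results are a fraction of a dB higher than the rounded 96/144 and 90/138 figures quoted in the text.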

Another big reason to record at 24 bits is that it will reduce the artifacts (distortion) you get from running your tracks (and the Master) through digital effects chains. Every effect requires crunching the numbers for every waveform, and the more "detail" in the numbers, the fewer mathematical rounding errors you'll end up with when all the number crunching is done. Most sequencers and other audio-processing software actually convert your original waveform to 32 bits internally before performing these calculations, and then convert the result back to your original bit depth. A 24-bit source for this conversion yields 3 bits of detail for every 4 bits in the converted 32-bit waveform, but a 16-bit conversion yields only 2 bits of detail for every 4 bits in the converted 32-bit waveform. This means the number-crunching of your effects is working on a more accurate picture of your data.
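To get a feel for how rounding errors pile up, here is a deliberately simplified Python sketch. It is not how any particular sequencer works: it assumes a toy "effects chain" that turns the level down and back up several times and rounds the result to the source bit depth after every step, whereas real software usually does the intermediate math at a higher internal precision. The names quantize and chain_error are invented for this example.

```python
import math

def quantize(x, bits):
    """Round a value in the range -1.0..1.0 to the nearest step at this bit depth."""
    steps = 2 ** (bits - 1)
    return round(x * steps) / steps

def chain_error(bits, stages=8, n=1000):
    """RMS error on one sine cycle after `stages` attenuate/restore passes."""
    total = 0.0
    for i in range(n):
        exact = 0.5 * math.sin(2 * math.pi * i / n)
        x = quantize(exact, bits)
        for _ in range(stages):
            x = quantize(x * 0.7, bits)  # toy "effect": turn the level down...
            x = quantize(x / 0.7, bits)  # ...and back up, rounding each time
        total += (x - exact) ** 2
    return math.sqrt(total / n)

print(chain_error(16))
print(chain_error(24))
```

The point of the comparison is only the relative size of the two errors: the 24-bit chain ends up far closer to the exact waveform than the 16-bit chain.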

With that last bit about effects processing in mind, you might be tempted to record at 32-bits if your audio interface (and sequencing software) supports it. But in general this trade-off (between 24-bits and 32-bits) is not worth it. 32-bit files are really really huge compared to 24-bit files.
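To put rough numbers on the file-size trade-off, uncompressed audio size is simply sample rate x bytes per sample x channels x seconds. A minimal Python sketch (the function name is hypothetical, and file headers are ignored):

```python
def pcm_bytes(sample_rate, bit_depth, channels=2, seconds=60):
    """Size of the raw audio data in bytes, ignoring any file header."""
    return sample_rate * (bit_depth // 8) * channels * seconds

# One minute of stereo audio at the combinations discussed in this article.
for rate, bits in ((44100, 16), (44100, 24), (48000, 24), (48000, 32)):
    megabytes = pcm_bytes(rate, bits) / 1_000_000
    print(rate, bits, round(megabytes, 2), "MB per stereo minute")
```

At the same sample rate, a 32-bit file works out one third larger than its 24-bit equivalent, and a 24-bit file half again as large as a 16-bit one.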

Now, having said all this, it's worth noting that quite a few professional and well-known producers still record and mix in 16-bit. Old habits die hard and it hasn't been that long since 16-bit was the standard best supported by most audio-interfaces, tape-based digital recorders, and DAW software.

It's also worth noting that some well-known producers themselves confuse the importance of bit-depth versus sample rate. William Orbit recently began a blog to help promote his upcoming album. In his own blog, he talks about why he recommends using a 48000 sampling rate, saying that it gives effects processors more information to work with. Which is wrong; it's only bit depth that affects this.

Choosing a sampling rate

This decision is much harder. Your two practical choices are 44100 or 48000. Anything less is sonically inferior, and anything more is overkill. Because of something called the Nyquist frequency (a sampling rate can capture frequencies only up to half its value), any higher sampling rate is just increasing file size to capture frequencies that are far beyond the range of human hearing.

Some experts argue that even 48000 is theoretically too much, because it will accurately capture frequencies up to 24000 Hz, but humans are typically capped at 20000 Hz in the best of cases. But many listeners and engineers report that they can hear a difference between signals recorded at 48000 and 44100. The theory is that when a waveform goes through analog > digital > analog conversion in the process from recording to playback through your speakers, the shape of those frequencies above 20000 Hz exerts an effect on the shape of the frequencies from 16 kHz to 20 kHz. This is what is generally called the "air" in a recorded signal.
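The Nyquist limit is easy to compute: it is half the sampling rate. The Python sketch below also shows where a tone above that limit would "fold back" to if nothing filtered it out first. That second part is an idealized assumption for illustration only, because real converters use anti-aliasing filters; the function names are made up.

```python
def nyquist_hz(sample_rate):
    """Highest frequency a given sampling rate can represent."""
    return sample_rate / 2

def alias_hz(freq, sample_rate):
    """Frequency an unfiltered tone would appear at after sampling."""
    folded = freq % sample_rate
    if folded > sample_rate / 2:
        return sample_rate - folded
    return folded

for rate in (44100, 48000):
    print(rate, nyquist_hz(rate), nyquist_hz(rate) - 20000)

# A 25000 Hz tone is above the limit at 44100 and folds back into the audible range.
print(alias_hz(25000, 44100))
```

The last column of the loop is the margin between the nominal 20000 Hz limit of hearing and the Nyquist frequency: 2050 Hz at 44100 versus 4000 Hz at 48000.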

So given all that, 48000 is the logical choice, right? Unfortunately it's not that simple. There are still some other important factors to consider that can outweigh this subtle sonic superiority. It's also worth noting that many people can hear no difference at all between a 44100 waveform and a 48000 waveform. For that matter, many people can hear no difference at all between an uncompressed CD-quality or DVD-quality audio file and a compressed, lossy file format like an MP3. Especially if you're older or have been a working musician for a while, your hearing is to a large degree shot to hell anyway, lol. Even if you've taken rigorous care of your hearing and stayed away from loud sounds all your life, simple aging degrades your ability to hear higher frequencies. Teenagers and children can hear frequencies that even 25-year-old adults simply cannot. There may also be differences between populations, though this is anecdotal: in the author's experience, Asian listeners tend to hear or prefer certain frequencies differently than Western listeners. A good house mix in an Asian club can sound "harsh" or "fatiguing" to a westerner, and traditional Asian music forms tend to emphasize a lot of bright, jarring cymbal sounds.

What tends to matter a lot more than the better-sounding "air" around vocals, acoustic stringed instruments, etc. that a 48000 sampling rate can yield are three very important factors:

  • Whether the ultimate destination for your music is song files/CDs, or videos/DVDs.
  • Whether you also work with sample libraries and loop libraries, and incorporate these into your projects along with the stuff you've recorded from your keyboard and other instruments.
  • Whether you plan to move clips from your sequencer into your keyboard for use as sampling data for your keyboard programs.

What is the ultimate destination for your music?

If you are scoring music for use in video formats, your choice is clear: 48000. Many professional video editing suites will not work very well with 44100 audio files. The standard audio format for DVDs is 48000.

You can record and mix at 48000 even if your final destination is music CDs and MP3s or other compressed downloadable music file formats. You won't lose anything valuable at all when converting your 48000 mix down to a 44100 output file, as long as you're using a solid audio editing program to do so (such as SoundForge or WaveLab), or if you let your mastering house do the final conversion.
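Part of the reason this conversion needs good software is that 48000 and 44100 are not related by a simple ratio. A short Python sketch of the arithmetic (the function names are hypothetical, and this only computes the ratio and resulting length; it does not perform any actual resampling):

```python
from fractions import Fraction

def conversion_ratio(source_rate, target_rate):
    """Exact ratio of output frames to input frames."""
    return Fraction(target_rate, source_rate)

def converted_frame_count(frames, source_rate, target_rate):
    """Number of frames the converted file will contain (rounded)."""
    return round(frames * conversion_ratio(source_rate, target_rate))

ratio = conversion_ratio(48000, 44100)
print(ratio)  # 147/160
print(converted_frame_count(48000, 48000, 44100))  # 44100
```

Every 160 input samples have to become 147 output samples, so almost every output sample falls between two input samples and has to be interpolated. How well that interpolation is done is where converters differ.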

So in terms of destination format, it's really a matter of being forced into 48000 if your destination is video-based products. You can safely record at either 48000 or 44100 if your destination is music-based products of one sort or another, provided you have access to a good audio editing program for the final stage of sample-rate conversion and dithering of your final stereo mix.

Note that until very recently, Adobe Flash did not support 48000 audio files. They came out sounding "like chipmunks". This situation was rectified in 2007 by Adobe, and while they now convert internally to 44100 in the finished Flash output, their conversion algorithms are excellent and well-tuned, especially if your source format is 48000. Unfortunately, there is still a lot of outdated advice floating around on the web that warns you away from 48000 if your destination is Flash video.

What is the sampling rate of your sample/loop libraries?

If you work purely with your own recorded material, then the choice is again quite easy: go with 48000 for everything you record, and stay in the 48000 world all the way through to your final stereo mix. If your destination format is video-based, you're good to go, and if your destination format is music-based (CDs, MP3s, etc.) then you can always convert down to 44100 with a good audio-editing program (or let your mastering house take care of it).

If, however, you plan to incorporate a lot of purchased (or downloaded) samples and loops into your project, then you will very much be constrained by the sample rate used in your loop libraries.

Here's the essential problem: if your project contains mixed audio files of differing sample rates (some are 48000, some are 44100, some are 32000, some are 96000, etc.), then your sequencing software must perform on-the-fly sample rate conversion of some of those audio files to ensure that all the internal material being processed matches the sample rate defined for your environment/project.

For example, Ableton Live has an Audio Preferences setting where you specify the sample rate (and bit depth) to use for recording new clips. This same sample rate is used for rendering a stereo mixdown. So if Live is configured to use 48000 for the new stuff you record, but you are also including various 44100 loops and samples into your Live Sets, then Live must perform on-the-fly sample rate conversion of that 44100 material every time you play/render your Live Set.
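One way to find out in advance which files would trigger on-the-fly conversion is to scan a folder of samples and list the ones whose sample rate differs from your project rate. The Python sketch below uses the standard wave module and assumes plain uncompressed PCM .wav files; the function name and the folder path in the usage comment are hypothetical, and other formats (AIFF, REX, and so on) would need a different reader.

```python
import wave
from pathlib import Path

def find_rate_mismatches(folder, project_rate):
    """Map each .wav file name in folder to its sample rate if it differs from project_rate."""
    mismatches = {}
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
        if rate != project_rate:
            mismatches[path.name] = rate
    return mismatches

# Hypothetical usage:
# for name, rate in find_rate_mismatches("/path/to/loops", 48000).items():
#     print(name, "is", rate, "and would be converted on the fly")
```

Anything this reports is a candidate for offline conversion in your audio editor before you import it.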

And the sad fact is that all DAW software has generally crappy built-in sample rate conversion. All popular DAWs are in the same boat—they all suck at this. To perform really accurate sample rate conversion in real-time would eat way too many system resources and would limit the number of tracks/clips/loops/samples you could work with without bogging down your computer.

So, you are faced with an important choice if you plan to use a lot of commercial sample/loop libraries, because most of them are recorded at 44100:

  • You can record your own stuff at 44100, in which case your stuff and 95% of all the samples/loops you're likely to use alongside your own stuff will all match at 44100 and no internal sample rate conversion is taking place.
  • You can record your own stuff at 48000 and use some external audio editor (such as SoundForge or WaveLab) to convert all the samples you plan to use to 48000 before you bring those samples into your project. In this case, again, no internal sample rate conversion is taking place. While this approach yields the best sonic quality, it can be a pain in the ass to do all this external sample rate conversion of your commercial samples/loops, and of course it requires that much more hard disk space to store the duplicates.

The key element in both of these cases is that you're ensuring that no internal sample rate conversion is taking place. This will result in much cleaner-sounding mixes. You really can hear the quality difference between samples that were converted with a weak DAW-based sample rate conversion algorithm versus those that were converted with a capable audio editor such as SoundForge or WaveLab. Your sequencer software will also run much faster and smoother, and be able to handle larger numbers of clips and tracks.

So as some general rules of thumb if you're working with lots of external commercial samples in your projects:

  • If your goal is to produce mixes for live performance DJ scenarios (something at which Ableton Live excels, btw), then you probably want to record your own material at 44100 because you're probably using a lot of commercial samples/loops too, and your stuff is going to be played back over a house PA, so sound quality isn't that important.
  • If your goal is music output for CDs (or of course anything to do with video or DVD), you'll get a cleaner sound and better "air" and fewer artifacts around the high frequencies if you record your stuff at 48000 and use an external program to pre-convert your samples/loops to 48000 before importing them into your project.
  • If your goal is to produce downloadable music files and you don't plan for your stuff to ever end up on commercially-mastered CDs or DVDs, then you'll save yourself a lot of hassle by recording in 44100. Most of your samples/loops will already be at 44100 so you'll have to perform little if any external sample rate conversion (although it always pays to check your sample/loop collections because a few here and there are recorded at 48000). Most importantly, though, downloadable file formats such as MP3, AAC (MP4), Vorbis, etc. are all both compressed and lossy, which means that all the very subtle "air" captured at 48000 is going to be smashed into oblivion by the time it ends up as an MP3/AAC or whatever.

Are you planning to move clips from your sequencer back into your keyboard as sampling data?

Let's use the Korg M3 as an example. The M3's internal sampling rate is 48000 and its internal bit-depth for sampling data stored in RAM is 16 bits. Therefore, before you load any sample into your M3 for use in a multisample, you really should take the time to use a good audio editor program (such as SoundForge or WaveLab) to convert the sample to 48000 and dither it from 24 bits down to 16 bits.
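For the curious, here is roughly what dithering from 24 bits down to 16 bits involves. This Python sketch assumes one common approach, TPDF (triangular) dither, and is not the algorithm used by SoundForge, WaveLab, or the M3. It works on plain signed integer samples, and the function name is invented for the example.

```python
import random

def dither_24_to_16(samples_24, rng=None):
    """Reduce signed 24-bit integer samples to signed 16-bit with TPDF dither."""
    rng = rng or random.Random()
    out = []
    for s in samples_24:
        scaled = s / 256.0  # the 24-bit value expressed in 16-bit steps
        # Two uniform random values summed give triangular noise of +/-1 step peak.
        noise = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        q = int(round(scaled + noise))
        out.append(max(-32768, min(32767, q)))  # clamp to the 16-bit range
    return out

print(dither_24_to_16([-8388608, -256, 0, 255, 8388607], random.Random(0)))
```

The small amount of random noise added before rounding is what turns the rounding error into a steady, low-level hiss instead of distortion that follows the signal.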

What this means is that if you plan to use a project in your sequencer primarily for producing stuff that you plan to load back into the M3 as sampling data, then you should record at 48000, and if your project uses any commercial samples/loops, you should just deal with the pain of using an external audio editor program to convert them to 48000 before bringing them into your project.

This same guideline applies to other keyboards too, but the details might differ depending on the native sample rate and bit depth they support for imported sampling data.

What other "experts" have to say

The debate surrounding this subject can make your head explode. You've been duly warned, lol. And it's hard to find good information about the subject. Here are some good links to other popular material on the subject.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License