Nyquist-Shannon; The Backbone of Digital Sound

In 1982, the fruits of a partnership between Sony and Philips were released to the music-loving public. The Compact Disc, or CD, was unveiled as the first consumer digital audio format. These 12 cm discs could store up to 74 minutes of perfect-quality digital audio and, as was famously demonstrated many times over, were immune to liquid damage, could withstand significant scratching, and could be played a theoretically infinite number of times without wearing down. That’s because, unlike all previous sound formats, where the sound is encoded as an analog impression of the sound wave on plastic discs or magnetic tape, the sound on a CD is encoded as a series of samples, which function as a set of instructions on how to recreate the sound.

The Compact Disc was a big freaking deal in many ways. It represented a giant leap in convenience for the consumer, in the quality of the recorded sound, and in raw data storage capacity. That last bit wouldn’t be too relevant to the computing industry for some time, but it solved the central problem of digital sound: needing what was, for the time, an absurdly massive amount of raw data.

Before we get too far into the specifics of the Compact Disc, it’s time to dig a little deeper into how digital sound actually works. Recall from my last video that an analog-to-digital converter takes instantaneous samples of the analog signal at a specific sampling frequency. Then, a digital-to-analog converter can recreate the original analog signal with only those samples. Well, a lot of people think that this can’t possibly work to recreate all the detail in the original analog sound wave. But thanks to mathematics, we know that it can. And does.

It’s time to explore the Nyquist-Shannon sampling theorem. This theorem was co-discovered by E. T. Whittaker and Vladimir Kotelnikov, so it is also, but less commonly, referred to as the Whittaker–Nyquist–Kotelnikov–Shannon sampling theorem. Anyway, Harry Nyquist and Claude Shannon, along with the other two, discovered that a band-limited signal, meaning a signal which does not contain frequencies above a certain limit, can be perfectly described and perfectly reconstructed by taking instantaneous samples at twice the frequency limit. This gets a little complicated, so I’m going to explain it as best I can.

Our hearing itself is bandlimited. Though frequencies above 20,000 hertz exist in nature, our ears cannot detect anything above that frequency. Some people claim they can but, well, most people rapidly lose their hearing at those high frequencies as they age, and I’m talking like as they leave childhood, so let’s just not go there. With the knowledge that anything above 20,000 hertz will just not be audible, we can capture all of the audio we can hear by only recording sounds below 20 kilohertz. Passing an analog input through what’s called a low-pass filter will eliminate any frequencies above a specified point. There’s no sense in recording sounds we can’t hear, so we can use a low-pass filter to eliminate frequency components above 20 kilohertz, and we end up with a signal that represents all that we can hear and is, importantly, band-limited. For now, we’re going to assume this can work perfectly, so we’ll say that any signals above 20 kHz cannot pass through the low-pass filter. Nyquist-Shannon tells us that by sampling at a rate of 40 kilohertz, we can capture all the detail possible in this newly band-limited signal.

That may sound a little unintuitive, so let’s explain why it’s true. When you have a band-limited signal, certain types of sounds get reduced to a representation of themselves made as a sum of sine wave harmonics. Fourier transformation, which is another complicated math subject that will rear its ugly head a little later, means that we can represent any waveform as a sum of sine wave harmonics. The most classic illustration of this is a square wave. A true square wave looks like this, but to actually reproduce this signal would require near infinite bandwidth. And I know what you’re saying: that seems silly, this is just a rapid hard cut in and out of a signal. Like flipping a light switch on and off. Ehh, that’s true, but if we could produce this signal, then this vertical piece, which represents an instantaneous increase in amplitude, would require a frequency response that’s ridiculously high. See, if we can make that instantaneous shift from low to high intensity, then we must be able to produce the same shift downward in the same amount of time--which is no time at all if it truly is instantaneous. To do that would require infinite bandwidth.

That’s more than we have to play with, so when this signal is passed through a low-pass filter of 20 kilohertz, it comes out like this. This is the sum of harmonics that would create this square wave via a Fourier transform, but the highest frequency harmonic possible is 20 kilohertz. Because we’ve placed a bandlimit on the input, we’re dealing with nothing but sine waves now, piled on top of each other. We are constructing any other waveforms as a sum of sine wave harmonics, and this explains why the Nyquist-Shannon theorem holds true. If the sample goes from the lowest possible to the highest possible value and back, the only waveform that can hit those three samples is a sine wave at the Nyquist frequency of 20 kilohertz. We can’t describe any frequencies higher than 20 kilohertz, but just two samples per cycle is enough to define our highest possible frequency. When you band-limit a signal, you eliminate all of the high frequency harmonics that could define a square wave with greater detail than this.

Now before you cry foul that this is creating detail where there shouldn’t be detail, just remember that your hearing is just as bandlimited as the output of the low-pass filter. And also keep in mind that the whole chain here is messy. Everything in nature will oscillate with a sudden impulse of energy. Even if a true square wave were to burst on the scene, your eardrums would oscillate between the peaks in energy, sort of like the wiggly wobblies in this bandlimited waveform. So the key here is--don’t worry about it. This drove me crazy for days, but the fact is--this is just how sound waves and nature work. Any signal can be represented as a sum of sine waves, and bandlimiting it simply forces these oscillations into existence.

The most pressing issue with this representation is that it can create ringing artifacts as a result of the Gibbs phenomenon, but now we’re getting into really nitpicky stuff. You could easily argue that this would happen even in analog systems due, once again, to natural oscillations. In the physical realm, both a phonograph needle and a loudspeaker driver have mass and thus cannot move instantaneously, so they will oscillate at their own resonant frequency and create their own ringing artifacts anyway. And in the electrical realm, any circuit will have some oscillations, too. So again, don’t worry about it.

Now comes the part that sounds crazy but is completely true, and I can show it to you in a moment. Nyquist-Shannon tells us that if we simply take twice as many samples per second as the frequency of our signal’s band limit, the exact bandlimited signal can be reproduced perfectly, and I mean literally perfectly, using only those samples. This is admittedly weird, so let’s talk through it. Imagine an ADC is recording a sound. Every one forty-thousandth of a second, it takes an instantaneous reading of the signal it receives. By quantizing it on a digital scale of our choosing, it creates 40 thousand discrete samples every second. But remember, that signal has passed through a low-pass filter before it reached the ADC, so it does not contain any frequencies above 20 kilohertz.

The truly mind-blowing part about Nyquist-Shannon is that the samples we get from this bandlimited signal can ONLY reproduce the original signal. There is only ONE signal that can possibly produce the exact series of samples that the ADC recorded. Again, this is because we are dealing with a bandlimit. Without a bandlimit in place, the samples could be defining parts of other strange waveforms due to aliasing, but when adhering to this bandlimit, the resulting string of samples can only define exactly one waveform. There is literally only one mathematical solution for the bandlimited waveform that would pass through all samples. (Mind blow)

This is some complicated stuff here. But just know that if I have any series of samples, and I assume these are representing a bandlimited signal, then only one waveform can possibly satisfy them. And that, ladies and gentlemen, is why the myth that digital sound creates a stair-step pattern in the output is false. But the weirder bit is that the DAC very well might. But before you freak out--that doesn’t mean any stair-steppy signal has ever come out of a DAC or CD player or anything. And that’s because of the same low-pass filter that originally bandlimited the input signal.

Many digital-to-analog converters are actually quite simple. They use a resistor ladder, which is tied to the actual bits in each discrete sample, to produce the appropriate voltage. I don’t want to go on too much of a tangent here, but they are really neat and explain how the simplest DACs work. Each bit of the sample is tied to a resistor. If it’s a 1, the resistor is activated and passes voltage through it, and if it’s a 0, it is not. The network of resistors will create a unique voltage for each possible combination of bits, and thus you end up with zeros and ones translating into a voltage that is as precise as you want. A 16-bit DAC, like those used in the Compact Disc standard (most of the time--we’ll get to that), will have 16 resistors, each controlled by one bit of the datastream. These feed into intermediary resistors to create all of the possible voltages. But the more bits you add, the more accurate these resistors have to become, which helps explain why the earliest DACs were very expensive. The technology to produce resistors in an integrated circuit to an accuracy of roughly 0.0015% (about one part in 65,536) was expensive for a while.

Anyway, these R-2R DACs, as they’re sometimes called, will produce a stair-step waveform from the output of the resistor ladder. This is what’s called sample-and-hold. Each sample sustains the given voltage level until the next sample is received by the DAC. This has led many, many, far too many people to believe that this is the signal that comes out of your CD player and goes into your amplifier. It is easy to imagine this blocky-looking waveform screwing around with your favorite recording of Beethoven’s 9th. But you forget, dear audiophile, that the stair-steppy waveform will pass through a low-pass filter on its way out. And that filter will create the same bandlimit on the output from the DAC as was placed on the input of the ADC.
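To make the "only one waveform" claim a bit more concrete, here is a minimal Python sketch. It is not how a CD player is built. It assumes an ideal reconstruction (the Whittaker-Shannon sinc interpolation formula standing in for a perfect low-pass filter), a made-up test signal of 1 kHz and 7 kHz sine waves, and the theoretical 40 kHz sampling rate. The names `signal`, `hold`, and `reconstruct` are purely illustrative. It compares the stair-step (sample-and-hold) value with the reconstructed value at a point halfway between two samples.

```python
import math

FS = 40_000.0  # sampling rate in Hz (the theoretical rate used in the script)


def signal(t):
    """Assumed band-limited test signal: two sine waves well below FS / 2."""
    return (0.6 * math.sin(2 * math.pi * 1_000 * t)
            + 0.3 * math.sin(2 * math.pi * 7_000 * t))


def sinc(x):
    """Normalized sinc: sin(pi * x) / (pi * x), with sinc(0) defined as 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def sample(count):
    """What the ADC does: one instantaneous reading every 1 / FS seconds."""
    return [signal(n / FS) for n in range(count)]


def hold(samples, t):
    """Sample-and-hold (stair-step): keep outputting the most recent sample."""
    return samples[int(math.floor(t * FS))]


def reconstruct(samples, t):
    """Whittaker-Shannon interpolation: one sinc pulse per sample, summed."""
    return sum(s * sinc(t * FS - n) for n, s in enumerate(samples))


samples = sample(2_000)

# Probe halfway between sample 1000 and sample 1001, mid-recording,
# so there are plenty of neighbouring samples on both sides.
t_mid = 1000.5 / FS
true_value = signal(t_mid)
hold_error = abs(hold(samples, t_mid) - true_value)
sinc_error = abs(reconstruct(samples, t_mid) - true_value)

print(f"stair-step error:     {hold_error:.6f}")
print(f"reconstruction error: {sinc_error:.6f}")
```

Because the sketch only has 2,000 samples instead of an endless stream, the reconstruction comes out very close rather than mathematically exact; the leftover error shrinks as more samples are included on each side of the point being probed.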

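The resistor-ladder idea can also be sketched as plain arithmetic. This assumes an idealized ladder in which each bit contributes a binary-weighted share of a hypothetical 1-volt reference (`V_REF`), and it treats the sample as an unsigned code for simplicity; real converters, and the signed samples on a CD, differ in the details. It suggests why 16 bits demand such precise resistors: the smallest step is one part in 65,536 of full scale.

```python
BITS = 16
V_REF = 1.0  # hypothetical full-scale reference voltage, chosen for illustration


def ladder_output(code, bits=BITS, v_ref=V_REF):
    """Ideal output voltage: each set bit adds its binary-weighted share of v_ref."""
    if not 0 <= code < 2 ** bits:
        raise ValueError("code does not fit in the given number of bits")
    voltage = 0.0
    for bit in range(bits):
        if (code >> bit) & 1:
            # Bit 0 is the least significant bit and contributes v_ref / 2**bits.
            voltage += v_ref * (2 ** bit) / (2 ** bits)
    return voltage


smallest_step = ladder_output(1)                 # one least-significant bit
step_percent = 100 * smallest_step / V_REF       # that step as a percentage of full scale
half_scale = ladder_output(0b1000_0000_0000_0000)  # only the most significant bit set

print(f"smallest step: {smallest_step} V ({step_percent:.6f}% of full scale)")
print(f"MSB only:      {half_scale} V")
```

Every one of the 65,536 codes lands on its own voltage, which only holds up in hardware if the resistor ratios are accurate to better than that smallest step.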
Now, what this means is that the output from the DAC is also bandlimited to 20 kilohertz. And why does that matter? Because the very stair-steppy nature of the resistor ladder’s output is impossible with a bandlimit of 20 kilohertz. Just like our square wave example, these vertical components require infinite bandwidth. Good luck with that. But what’s even weirder, and kinda difficult to grasp, is that because the output of the DAC has the same bandlimit as the ADC did, now we are dealing with Nyquist-Shannon again. And the truly strange-but-true part of this is that the only possible result of the output from the low-pass filter is the original waveform that the ADC recorded.

Remember, with a bandlimited signal, we can represent all of the detail within that signal with discrete samples, and with only a sample rate that is twice the bandlimit frequency. If we create a bandlimited waveform that passes through all of the samples, then it must be the original waveform recorded by the ADC. The fact that the waveform comes out of the resistor ladder all choppy-like doesn’t matter in the slightest, because the low-pass filter will bandlimit it and get rid of the choppies. Now it can only contain frequencies of 20 kilohertz or below. Remember, the vertical parts of the stair-step pattern are impossible with that bandlimit, so they just get smoothed out. And since we know that the DAC was outputting the correct voltage level with each sample, all of the samples must have been satisfied. Which means that after the LPF smooths the waveform, it must have passed through all of the samples. And because there’s a bandlimit in place, Nyquist-Shannon proves that the output signal is the exact same one as the input.

To provide some evidence to back this claim up, take a look at this CD player. This is a Sony CD changer from 1993. It has a relatively rudimentary DAC, in part because it’s a cheaper machine and in part because it’s older. Let’s hook an oscilloscope up to it and take a look at the output coming from its RCA jacks. This is just some music it’s playing right now. Notice that there’s nothing in here that looks remotely stair-steppy.

But let’s take it even further. I’ve created a CD with various tones generated in Audacity. Let’s start with a 1 kilohertz square wave. Even though in Audacity the samples look like this--straight up, then hold, with a completely straight line between peaks--the output from the CD player is that wiggly wavy thing. That happens because those wiggly wavies are the only way to make this square wave with a 20 kilohertz bandlimit, and the wiggly bits are passing through each of those samples.

Now let’s switch to some sine waves. This is again 1 kilohertz. This looks perfectly smooth, no stair-steps to be seen. To be fair, though, even in Audacity it looks pretty good. Let’s move up to a 10 kilohertz sine wave. Now in Audacity it looks really gnarly, with the connections between the samples making a barely intelligible wave. There aren’t even 5 samples per cycle, so how can the smooth detail of the sine wave possibly be reproduced? Well, take a look. There’s a perfectly smooth sine wave for you, right there.

This is why some of you cringed when I drew straight lines between the samples. That’s only sort of what happens, and even then it’s not that accurate. But it does serve as a sort of blend between the two realities. There is a stair-step pattern at the intermediate stage between the resistor ladder and the low-pass filter. So the DAC does connect the dots, but like this. Then the LPF smooths out the connections between the dots, but that only happens as a side effect of the fact that it’s creating a bandlimit, so the high frequency components, that’s the vertical parts here, get tossed out. What you’re left with is the only possible waveform that both hits all the samples and does not contain frequency components above Nyquist. Simple, right?

Ah! We haven’t even really talked about the CD itself yet! And this is pushing into the 14 or 15 minute mark already, if my gauge of time per written page is at all correct. OK, I guess we’re going to push the technology of the CD into another video. But that’s OK, since we covered what makes sound out of numbers. And hopefully we’ve destroyed the myth that digital audio cannot produce smooth waveforms. It does.

Much of the information from this video (and indeed some selected clips) came from a lovely video by Monty at xiph.org. I’ve linked to a great article of his down below, and a card will pop up now heading to his video. Many, many people brought this to my attention on Twitter and elsewhere, so thank you. He’s got some much better demonstrations than I do that cover this topic. He also explains why the bit depth affects noise and not clarity, what dithering is and how it reduces quantization noise, and much more.

But I will give you one last tidbit before I sign off. You may have noticed that in the video we’ve been discussing a theoretical sampling rate of 40 kilohertz, as this Nyquist sampling rate could perfectly capture all of what human hearing can pick up. But the CD standard’s sampling rate is 44.1 kilohertz. Why the extra 4.1? That seems awfully specific. Well, that’s due to the fact that low-pass filters aren’t perfect. They can’t just cut off frequencies above a point; they instead have a transition window where the frequencies degrade to zero. The 44.1 kilohertz sampling rate is there to accommodate that window. Without a hard cut on the low-pass filter, aliasing could occur because the samples might define a waveform of a higher frequency than Nyquist. This is precisely why we need an LPF. Both of these waveforms satisfy all the samples, so to prevent one of them from coming through, we need to decide on a limit in frequency. If the red waveform is above the Nyquist limit, then it won’t get reproduced. But if the low-pass filter could let slip some signals above our decided limit, scenarios like this might occur.
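Here is a small Python sketch of that "two waveforms, same samples" situation. It is only an illustration under assumed numbers: the tones (15 kHz and 25 kHz), the helper names `sample_cosine` and `alias_frequency`, and the idea of folding a frequency back below Nyquist with simple arithmetic are all mine, not anything taken from the CD standard.

```python
import math


def sample_cosine(freq_hz, sample_rate_hz, count):
    """Instantaneous samples of a cosine tone, one every 1 / sample_rate_hz seconds."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]


def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency between 0 and Nyquist that a sampled tone is indistinguishable from."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


# At a 40 kHz sampling rate, a 25 kHz tone produces the same samples as 15 kHz.
below_nyquist = sample_cosine(15_000, 40_000, 32)
above_nyquist = sample_cosine(25_000, 40_000, 32)
largest_gap = max(abs(a - b) for a, b in zip(below_nyquist, above_nyquist))
print(f"largest difference between the two sample sets: {largest_gap:.2e}")

# Where does leftover content just above 20 kHz end up after sampling?
for rate in (40_000, 44_100):
    for tone in (21_000, 24_000):
        print(f"{tone} Hz sampled at {rate} Hz looks like "
              f"{alias_frequency(tone, rate)} Hz")
```

With these assumed numbers, a stray 21 kHz component folds down to an audible 19 kHz at a 40 kHz rate, but at 44.1 kHz anything between 20 and 24.1 kHz folds to 20 kHz or higher, which stays out of the range we care about. That headroom is what gives an imperfect filter room to roll off.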

Therefore, the sample rate was chosen to be 44.1 kilohertz, so that it leaves room for the transition window above our desired 20 kilohertz cutoff. And by sampling a bit beyond the audible range, we don’t have to worry about spurious aliasing artifacts from samples in the transition band.

But the more interesting thing about the 44.1 kilohertz rate is that it was also chosen for easy digital sound storage before a CD gets pressed. This was the perfect sampling rate for storing sound on both an NTSC and a PAL U-Matic videocassette recorder. The commercial VCR format from Sony was among the earliest ways to store a digital audio stream, using the video signal sort of like a giant QR code. It’s not literally being read in that sense, but the data is stored on the tape as a field of black and white bars in each line, so if you watched the output on-screen, it would look like a flashing screen of QR codes whizzing by 60 times per second. 44.1 kilohertz would, on both NTSC and PAL signals, work out to 3 samples per video line. So in a strange twist, the world of analog video dictated how digital sound would work.

Thanks for watching, I hope you enjoyed the video. As I end all of my videos, apparently: if this is your first time coming across the channel and you liked what you saw, please consider subscribing. Also, and I know this can sound weird, but for those who follow my videos--you might want to make sure you are actually subscribed. Oftentimes YouTube just serves you content because it knows you like it, and you might think you are subscribed when you aren’t. Same goes for all channels, but I’ll be the selfish one today and post this reminder.

As always, thank you to everyone who supports this channel on Patreon, especially the fine folks who are scrolling up your screen. Supporters on Patreon have turned my weird hobby of making videos about technology into a job, and you all deserve my thanks. If you would like to pledge some support and help the channel grow, please check out my Patreon page. Thanks for your consideration, and I’ll see you next time!

Comments

@TechnologyConnections

There's a shot in this video that's upside down because... I forgot to un-upside down it. Sigh.

@OptimumPx

"Don't worry about it." -Technology Connections 2018

@Jetsetlemming

The hardest part of how sound works for me was how multiple sound sources combine to a single data stream, a single wave form. Everything an ear hears is the input of a single wave traveling through the liquid of your cochlea, and your brain does extremely complicated processing on it to separate those elements, identify them, and locate them in 3d space. This is why your headphones don't need separate audio outputs for every instrument in the song you're listening to.

@JamesNeave1978

This is how I like to think of low pass filters "de-jaggying" (antialiasing). A low pass filter, at its absolute simplest, is a capacitor (with a current limiting resistor to keep the smoke in). Capacitors resist change in voltage, like a weight on a spring resists change in the length of a vertical spring. So that infinite rate change stair step is slowed down by the capacitor. That's it. The digital signal tells the voltage to "CHANGE RIGHT NOW NOW NOW GO GO GO" Then the capacitor replies, "OK, will do, gimme a second, I'm getting there, plod plod plod" Simples.

@thecommenter4629

I was a young lad when CD's became big but I can still remember the first time I heard a CD back in the late 80's... sounded like the musicians were in the room with me. To my ears it was superior to any cassette tape or vinyl record because there was no background noise... just a clarity of sound I had not experienced until then.

@K-o-R

"wiggly wobblies" I told you to cut out the technobabble!

@CapyTapy

Xiph Monty? The same Monty from the opus and vorbis codecs? Damn this dude deserves an award!

@AnnaVannieuwenhuyse

The Nyquist-Shannon theory is one of my favourite. It just can't be questioned. It just is solid mathematics.

@ReikazeRambles

I've been a fan of Technology Connections for the longest time and have been watching the channel for fun, but I never expected it to explain a topic in one of my classes better than my professor! What a pleasant surprise, thanks for the video :)

@lemondropcentral14

I have a 4 year physics degree and have worked as an electronic testing engineer for the avionics industry for 2 years.... this video finally helped me understand what the Fourier transform does. Sure, I used it all the time on my physics and math homework in college, but I could never wrap my head around what it actually does. After this video, it finally makes sense!

@shmehfleh3115

The Covox Speech Thing and Disney Sound Source are two practical examples of really simple ladder resistor DACs. They both plugged into the parallel port of an old PC and used its 8 data lines to drive 8 resistor networks tied into a mono analog audio out. This chumpy setup basically created an 8-bit DAC with a sample rate that was limited only by the speed of the parallel port. Unfortunately, as they were completely dumb hunks of passive electronics, they couldn't tell the difference between data meant to be converted into sound and data meant for the printer. If their output was left on when printing, they made a horrible squealing noise.

@scruffythejanitor1969

I request one in-depth video on DACs, please. Or whatever, since you've yet to have a bad video on this channel.

@LMacNeill

Very cool explanation of the Nyquist-Shannon theorem. Made it easy to wrap my head around it. Thanks!

@jfbaquero

And yet another great video. I had read Monty's article eons ago, but for a long time I had lost track of it. As an audiophile I used to share it with other audiophiles to help them understand how digital audio works and also help explain why buying audio downloads sampled at 192kHz is just a waste of money (maybe not for Batman). The reality is that almost no one understands the math and engineering underneath (including some engineers). The problem usually is a lousy ADC/DAC, not the digital medium. Your lecture in this video is the closest I've seen to making such a complex issue understandable for those who lack the science background. Amazing work! Respect and Thanks!

@NickMoore

Band-limiting was the conceptual bit I have been missing for ages! Thank you for putting it all together.

@richarddeese1991

Thanks! Claude Shannon is absolutely an unsung hero. He should be as famous as Von Neumann (to me)! He was clearly a genius, and - more than any other person - is the father of information theory. He laid the groundwork for the digital age, and he did it back in the 1940s. tavi.

@GenXGrownUp

"The key here is... don't worry about it." Had me literally laughing out loud. Thanks again for these awesome videos. You inspire me to work harder on my own channel's production and entertainment value. 😀

@jacekjagosz

Wow, this is an eye opening video! Like how DAC is a resistor ladder, it is so simple!

@joshsampey2460

You are officially the first channel I have supported on patreon. Every video is informative, free of gimmicks, well written with great nerd humor thrown in.

@82abn34

A beautiful explanation. I can't imagine how you struggled to achieve such an elegant and balanced presentation without mathematics. Thanks! Ive been thinking about this topic for years and today you've helped me reach that very important intuitive feel.
