You can support this channel on Patreon! Link below
Let’s talk a bit more about digital sound. Thanks to a mathematical theorem, we know that a bandlimited signal can be perfectly represented by a series of discrete samples taken at twice the frequency of the bandlimit. OK, that’s hard to explain in a little description blurb, so the video is probably your best bet.
This here is the video of Monty’s. YOU SHOULD TOTALLY WATCH IT!! There is so much good stuff in here and it’s a great resource for dispelling some of the myths of Digital Sound.
https://www.youtube.com/watch?v=cIQ9IXSUzuM
This is his original article that inspired the video (I think--it’s a great article anyway):
https://xiph.org/~xiphmont/demo/neil-young.html
You can support this channel on Patreon! Patrons of the channel are what keep these videos coming, and with the support of viewers like you, I’m making improvements to the channel. If you’d like to pledge some support and help the channel grow, please check out my Patreon page. Thanks for your consideration!
https://www.patreon.com/technologyconnections
And thank you to the following Patrons!
Sen, mark barratt, Tully, Violet Moon, Duncan Ward, Tobias Faller, Justin Smith, Corey A Hudson, EpicLPer, Luc Ritchie, Michael Dragone, Manfred Farris, Eric Romero, John Laur, Patric Bates, Sven Almgren, Lutz Broska, Jürgen Kieser, Luke O’Dell, Nicholas, Ewen McNeill, thefanification, Nicolas, Albin Flyckt, Michael A Kalfas II, Michael Bernstein, Kevin Kostka, Shame Zamora, Brad Wilmot, John Bailey, Alex Ilyin, Miles H, Deovandski Skibinski Junior, Andrew "FastLizard4" Adams, Avi Drissman, Jens Bretschneider, Phil Taprogge, Sam, Rich Jeanes, Jonathan Skowronek, Tim Grov, Pieter van der Eems, Philip Kvist, Brian Condron, Peter Jerde, Torin Zaugg, James Watson, Vince Terranova, Jason Nevins, Andrew Montagne, David Scott, Mike Nichols, MrSonicOSG, Brandon Enright, James Fialho, Christian Torelli, Sunchild, Kim Rypstra, The Paul Allen, toasterking, Seth Robinson, Ralph, Pavel Soukharev, Forrest Miller, Patrick Quinn-Graham, Max Zelinski, Troy Kelly, Ulti, Jason Brandy, Norman Tatlock, Jesper Jansen, Andrew Johnson, Goolashe, Rémy GRANDIN, ce keen, Jake Shep83, Nick Pollard, Drew Holm, David Grossman, Ben Auch, Jeff Puglisi, Andy S, Robert, Johan Greefkes, Jacob Dixon, Matt Luebbert, Alex Corn, SonOfSofaman, Brent Higgins, Rob Kefford, Roger Baker, Alexander Schrøder, Andreas Skagestad, Eric Butterfield, James Holmes, Tim Skloss, James-Ross Harrison, Sean OCallaghan, Lee Wallbank, Jonas, Colin Cogle, Kyle Matheis, Krzysztof Klimonda, Aaron Rennow, Gantradies
Don’t see your name? Don’t worry! To keep this little shout-out alive, the $5 patron shoutout is now on a rotating basis! If you’re not here, you should be here in one of the next two videos. If you’ve slipped through the cracks, don’t hesitate to send me a message via Patreon and I’ll fix it!
In 1982, the fruits of a partnership between Sony and Philips were released to the music-loving public. The Compact Disc, or CD, was unveiled
as the first consumer digital audio format. These 12 cm discs could store up to 74 minutes
of perfect quality digital audio, and as was famously demonstrated many times over, were
immune from liquid damage, could withstand significant scratching, and could be played
a theoretically infinite number of times without wearing out. That’s because,
unlike all previous sound formats where the sound is encoded as an analog impression of
the sound wave on plastic discs or magnetic tape, the sound on a CD is encoded as a series
of samples, which function as a set of instructions on how to recreate the sound. The Compact Disc was a big freaking deal in
many ways. It represented a giant leap in convenience for the consumer, quality of the
recorded sound, and in raw data storage capacity. That last bit wouldn’t be too relevant to
the computing industry for some time, but it solved the central problem of digital sound: needing
what was, for the time, an absurdly massive amount of raw data. Before we get too far into the specifics of
the Compact Disc, it’s time to dig a little deeper into how digital sound actually works.
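To put that "absurdly massive amount of raw data" into numbers, here's a quick back-of-the-envelope calculation in Python. It assumes the standard CD audio parameters (44.1 kHz sampling, 16-bit samples, two channels) and a 74-minute disc, and it counts only the audio samples themselves, not the extra error-correction and subcode data a real disc also carries.

```python
# Assumed CD audio parameters: 44.1 kHz, 16-bit, stereo, 74 minutes.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2
PLAYING_TIME_S = 74 * 60


def pcm_bit_rate(sample_rate_hz: int, bits: int, channels: int) -> int:
    """Raw (uncompressed) PCM bit rate in bits per second."""
    return sample_rate_hz * bits * channels


bit_rate = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
total_bytes = bit_rate * PLAYING_TIME_S // 8  # audio samples only, no error correction

print(f"{bit_rate:,} bits per second")         # 1,411,200 bits per second
print(f"{total_bytes / 1e6:.0f} MB per disc")  # 783 MB per disc
```

Every extra bit of depth or kilohertz of sample rate scales that figure up linearly.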
Recall from my last video that an analog-to-digital converter takes instantaneous samples
of the analog signal at a specific sampling frequency. Then, a digital-to-analog converter
can recreate the original analog signal with only those samples. Well, a lot of people
think that this can’t possibly work to recreate all the detail in the original analog sound
wave. But thanks to mathematics, we know that it can. And does. It’s time to explore the Nyquist-Shannon
sampling theorem. This theorem was also independently discovered by E. T. Whittaker and Vladimir Kotelnikov,
so it is also, though less commonly, referred to as the Whittaker–Nyquist–Kotelnikov–Shannon
sampling theorem. Anyway, Harry Nyquist and Claude Shannon, along with the other two,
discovered that a band-limited signal, meaning a signal that does not contain frequencies
above a certain limit, can be perfectly described and perfectly reconstructed by taking instantaneous
samples at twice the rate of the frequency limit (strictly speaking, at a rate just above twice that limit). This gets a little complicated, so
I’m going to explain it as best as I can. Our hearing itself is bandlimited. Though
frequencies above 20,000 hertz exist in nature, our ears cannot detect anything above that
frequency. Some people claim they can, but most people rapidly lose their hearing
at those high frequencies as they age, and I mean starting as they leave childhood,
so let’s just not go there. With the knowledge that anything above 20,000 hertz will just
not be audible, we can therefore capture all of the audio we can hear by only recording
sounds below 20 kilohertz. Passing an analog input through what’s called
a low-pass filter will eliminate any frequencies above a specified point. There’s no sense
in recording sounds we can’t hear, so we can use a low-pass filter to eliminate frequency
components above 20 kilohertz and we end up with a signal that represents all that we
can hear, and is, importantly, band-limited. For now, we’re going to assume this can
work perfectly, so we’ll say that any signals above 20 kHz cannot pass through the low pass
filter. Nyquist-Shannon tells us that by sampling
at a rate of 40 kilohertz, we can capture all the detail possible in this newly band-limited
signal. That may sound a little unintuitive, but let’s explain why it’s true. When
you have a band-limited signal, even complex sounds are reduced to a representation
of themselves as a sum of sine wave harmonics. Fourier analysis, which is another complicated
math subject that will rear its ugly head a little later, tells us that we can represent
any waveform as a sum of sine wave harmonics. The most classic illustration of this is a
square wave. A true square wave looks like this, but actually reproducing this
signal would require infinite bandwidth. And I know what you’re
saying, that seems silly, this is just a rapid hard cut in and out of a signal. Like flipping
a light switch on and off. Ehh, that’s true, but if we could produce this signal, then
this vertical piece, which represents an instantaneous increase in amplitude, would require a frequency response that’s ridiculously high. See, if we can make that instantaneous shift from low to
high intensity, then we must be able to produce the same shift downward in the same amount
of time--which is no time at all if it truly is instantaneous. To do that would require
infinite bandwidth. That’s more than we have to play with, so when this signal is
passed through a 20 kilohertz low-pass filter, it comes out like this. This is the sum of
harmonics that would create this square wave via a Fourier series, but the highest frequency
harmonic possible is 20 kilohertz. Because we’ve placed a bandlimit on the
input, we’re dealing with nothing but sine waves now, piled on top of each other. We
are constructing any other waveforms as a sum of sine wave harmonics, and this explains
why the Nyquist-Shannon theorem holds true. If the samples go from the lowest possible
to the highest possible value and back, the only waveform that can hit those three samples
is a sine wave at the Nyquist frequency of 20 kilohertz. We can’t describe any frequencies
higher than 20 kilohertz, but just two samples per cycle is enough to define our highest
possible frequency. When you band-limit a signal, you eliminate
all of the high frequency harmonics that could define a square wave with greater detail than
this. Now before you cry foul that this is creating detail where there shouldn’t be
detail, just remember that your hearing is just as bandlimited as the output of the low-pass
filter. And also keep in mind that the whole chain here is messy. Everything in nature
will oscillate with a sudden impulse of energy. Even if a true square wave were to burst on
the scene, your eardrums will oscillate between the peaks in energy, sort of like the wiggly
wobblies in this bandlimited waveform. So the key here is... don’t worry about it. This drove me crazy for days, but the fact
is, this is just how sound waves and nature work. Any signal can be represented as a sum
of sine waves, and bandlimiting it simply forces these oscillations into existence.
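If you'd like to see the wiggly wobblies for yourself, here's a minimal Python sketch. It builds a 1 kHz square wave from its Fourier series and models the 20 kHz low-pass filter in the most idealized way possible: it simply drops every harmonic above 20 kHz, which leaves the odd harmonics at 1, 3, 5, ... 19 kHz. The function name and the 10,000-points-per-period resolution are illustrative choices, not anything from the video. The peak comes out roughly 18% above the flat top of the square wave; that overshoot is the Gibbs phenomenon.

```python
import math


def bandlimited_square(t: float, f0: float, cutoff_hz: float) -> float:
    """Fourier series of a +/-1 square wave, keeping only harmonics <= cutoff_hz.

    A square wave is (4/pi) * sum over odd k of sin(2*pi*k*f0*t) / k.
    The low-pass filter is idealized: every harmonic above the cutoff is dropped.
    """
    total = 0.0
    k = 1
    while k * f0 <= cutoff_hz:
        total += math.sin(2 * math.pi * k * f0 * t) / k
        k += 2  # square waves contain only odd harmonics
    return 4 / math.pi * total


F0 = 1_000.0       # 1 kHz square wave
CUTOFF = 20_000.0  # idealized 20 kHz low-pass filter

# Evaluate one full period at 10,000 points and find the highest point.
points = [bandlimited_square(n / (F0 * 10_000), F0, CUTOFF) for n in range(10_000)]
peak = max(points)
print(f"peak = {peak:.3f}")  # above 1.0: the Gibbs overshoot
```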
The most pressing issue with this representation is that it can create ringing artifacts as
a result of the Gibbs phenomenon, but now we’re getting into really nitpicky stuff,
and you could easily argue that this would happen even in analog systems, once again due
to natural oscillations. In the physical realm, both a phonograph
needle and a loudspeaker driver have mass and thus cannot move instantaneously, so they will
oscillate at their own resonant frequencies and create their own ringing artifacts anyway.
And in the electrical realm, any circuit will have some oscillations,
too. So again, don’t worry about it. Now comes the part that sounds crazy but is
completely true, and I can show it to you in a moment. Nyquist-Shannon tells us that
if we simply take twice as many samples per second as the frequency of our signal’s
band limit, the exact bandlimited signal can be reproduced perfectly, and I mean literally
perfectly, using only those samples. This is admittedly weird, so let’s talk through
it. Imagine an ADC is recording a sound. Every
1/40,000th of a second it takes an instantaneous reading of the signal it receives. By quantizing
that reading on a digital scale of our choosing, it creates 40 thousand discrete samples every
second. But remember, that signal has passed through a low-pass filter before it reached
the ADC, so it does not contain any frequencies above 20 kilohertz. The truly mind-blowing
part about Nyquist-Shannon is that the samples we get from this bandlimited signal can ONLY
reproduce the original signal. There is only ONE signal that can possibly produce the exact
series of samples that the ADC recorded. Again, this is because we are dealing with a bandlimit.
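Here's a tiny Python demonstration of what goes wrong without the bandlimit, using the 40 kHz sampling rate from this example. A 15 kHz cosine and a 25 kHz cosine (5 kHz above the 20 kHz Nyquist frequency) produce exactly the same samples, because 25 kHz folds back down to 40 - 25 = 15 kHz. The specific frequencies are arbitrary picks. Cosines are used so the two sets line up sample for sample; with sines, the alias would come out inverted.

```python
import math

FS = 40_000  # sampling rate from the example above (Hz)


def sample_cosine(freq_hz: float, n_samples: int, fs: float = FS) -> list[float]:
    """Instantaneous samples of cos(2*pi*f*t) taken at t = n / fs."""
    return [math.cos(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]


in_band = sample_cosine(15_000, 32)     # below the 20 kHz Nyquist frequency
above_band = sample_cosine(25_000, 32)  # 5 kHz above Nyquist

# Every sample matches (up to floating-point rounding): 25 kHz aliases to 15 kHz.
max_diff = max(abs(a - b) for a, b in zip(in_band, above_band))
print(f"largest difference between the two sample sets: {max_diff:.1e}")
```

Only the bandlimit lets us say which of the two is "the" signal: with nothing allowed above 20 kHz, the 15 kHz tone is the sole candidate.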
Without a bandlimit in place, the samples could be defining parts of other strange waveforms
due to aliasing, but when adhering to this bandlimit the resulting string of samples
can only define exactly one waveform. There is literally only one mathematical solution
for the bandlimited waveform that would pass through all samples. (Mind blown.) This is some complicated stuff here. But just
know that if I have any series of samples, and I assume these are representing a bandlimited
signal, then they can only possibly satisfy one waveform. And that, ladies and gentlemen,
is why the myth that digital sound creates a stair step pattern in the output is false.
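The "only one waveform" claim has a constructive form, the Whittaker–Shannon interpolation formula: every sample contributes one sinc pulse, and adding them all up gives back the bandlimited signal at any instant, including between samples. Below is a minimal sketch under assumptions of my own: a made-up test signal (1 kHz plus 7 kHz), the 40 kHz rate from the example, and a finite window of samples, since the textbook formula uses infinitely many. A real DAC doesn't compute this sum directly; its output filter ends up doing the equivalent job.

```python
import math

FS = 40_000.0  # sampling rate from the 40 kHz example (Hz)


def signal(t: float) -> float:
    """Hypothetical bandlimited test signal: 1 kHz plus 7 kHz, both under 20 kHz."""
    return math.sin(2 * math.pi * 1_000 * t) + 0.5 * math.sin(2 * math.pi * 7_000 * t + 0.3)


def sinc(x: float) -> float:
    """Normalized sinc, sin(pi*x) / (pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples: dict[int, float], t: float, fs: float = FS) -> float:
    """Whittaker-Shannon interpolation: x(t) = sum over n of x[n] * sinc(fs*t - n)."""
    return sum(x_n * sinc(fs * t - n) for n, x_n in samples.items())


# The ideal formula sums over infinitely many samples; we keep n = -4000..4000
# (0.2 s), which leaves a small truncation error that shrinks as the window grows.
samples = {n: signal(n / FS) for n in range(-4_000, 4_001)}

# Evaluate at instants *between* samples, where the ADC never took a reading.
for frac in (0.25, 0.5, 0.77):
    t = frac / FS
    print(f"t = {frac}/fs  true: {signal(t):+.6f}  rebuilt: {reconstruct(samples, t):+.6f}")
```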
But the weirder bit is that the DAC very well might. But before you freak out--that doesn’t
mean any stair-steppy signal has ever come out of a DAC or CD player or anything.
And that’s because of the same low-pass filter that originally bandlimited the input signal. Many digital-to-analog converters are actually
quite simple. They use a resistor ladder, which is tied to the actual bits in each discrete
sample, to produce the appropriate voltage. I don’t want to go on too much of a tangent
here but they are really neat and explain how the simplest DACs work. Each bit of the
sample is tied to a resistor. If it’s a 1, the resistor is activated and passes voltage
through it, and if it’s a 0, it doesn’t. The network of resistors will create
a unique voltage for each possible combination of bits, and thus you end up with zeros
and ones producing a voltage as precise as you want. A 16-bit DAC, like those used in
the Compact Disc standard (most of the time--we’ll get to that) will have 16 resistors, each
controlled by one bit of the datastream. These feed into intermediary resistors to create
all of the possible voltages. But the more bits you add, the more accurate these resistors
have to become, which helps explain why the earliest DACs were very expensive. The technology
to produce resistors in an integrated circuit with an accuracy of approximately 0.0015% (about one part in 65,536) was expensive for a while. Anyway, these R-2R DACs, as they’re sometimes
called, will produce a stair-step waveform from the output of the resistor ladder. This
is what’s called sample-and-hold. Each sample sustains the given voltage level until the
next sample is received by the DAC. This has led many, many, far too many people to believe
that this is the signal that comes out of your CD player and goes into your amplifier.
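For the curious, here's the idealized arithmetic of a binary-weighted ladder like the R-2R design described above, sketched in Python. It assumes perfect resistors and switches: the most significant bit alone contributes half the reference voltage, and each lower bit contributes half as much as the one above it. The 1-volt reference is an arbitrary choice. The last line shows where the accuracy requirement comes from: one step of a 16-bit DAC is 1/65,536 of full scale, about 0.0015%, so the big resistors have to be accurate to roughly that level or their error swamps the smallest bits.

```python
def ladder_output(code: int, bits: int = 16, v_ref: float = 1.0) -> float:
    """Ideal output voltage of a binary-weighted (e.g. R-2R) ladder DAC.

    Bit i (0 = least significant) contributes v_ref / 2**(bits - i), so the
    most significant bit alone gives half of v_ref.
    """
    if not 0 <= code < 2**bits:
        raise ValueError("code out of range for this bit depth")
    return sum(v_ref / 2 ** (bits - i) for i in range(bits) if code >> i & 1)


BITS = 16
lsb = 1 / 2**BITS  # smallest step, as a fraction of full scale

print(f"MSB only     -> {ladder_output(0x8000):.6f} V")    # 0.500000 V
print(f"all bits set -> {ladder_output(0xFFFF):.6f} V")    # 0.999985 V
print(f"1 LSB        =  {lsb * 100:.5f} % of full scale")  # 0.00153 % of full scale
```

A sample-and-hold stage then simply keeps each of these voltages on the output until the next code arrives, which is where the stair steps come from.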
It is easy to imagine this blocky-looking waveform screwing around with your favorite
recording of Beethoven’s 9th. But you forget, dear audiophile, that the
stair-steppy waveform will pass through a low-pass filter on its way out. And that filter
will create the same bandlimit on the output from the DAC as was placed on the input of
the ADC.
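To see why that output filter matters, here's a rough Python sketch that builds the sample-and-hold staircase for a 1 kHz sine at 44.1 kHz and measures a few of its frequency components. HOLD = 8 is a hypothetical drawing resolution for the staircase, not a property of any real DAC. The staircase carries the 1 kHz tone plus faint mirror images around 44.1 kHz, such as 43.1 and 45.1 kHz (each a couple of percent of the tone's amplitude), and nothing at, say, 10 kHz. Those images are what the vertical parts of the steps are made of, and they sit far above 20 kHz, which is exactly what the low-pass filter throws away.

```python
import math

FS = 44_100             # CD sampling rate (Hz)
HOLD = 8                # hypothetical: points drawn per held sample
F_TONE = 1_000          # 1 kHz test tone
DURATION_SAMPLES = 441  # 10 ms: a whole number of cycles of every frequency checked

# 1) Ideal samples of a 1 kHz sine, as the ADC would record them.
samples = [math.sin(2 * math.pi * F_TONE * n / FS) for n in range(DURATION_SAMPLES)]

# 2) Sample-and-hold: each value is repeated HOLD times -> a stair-step waveform.
staircase = [s for s in samples for _ in range(HOLD)]
fs_fine = FS * HOLD


def amplitude_at(signal: list[float], freq_hz: float, fs: float) -> float:
    """Amplitude of one sinusoidal component (a single DFT bin)."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * n / fs) for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * n / fs) for n, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


levels = {f: amplitude_at(staircase, f, fs_fine) for f in (1_000, 10_000, 43_100, 45_100)}
for f, level in levels.items():
    print(f"{f:>6} Hz: {level:.4f}")
```

The hold also shaves a hair off the 1 kHz level, which comes out just under 1.0.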
Now, what this means is that the output from the DAC is also bandlimited to
20 kilohertz. And why does that matter? Because the very stair-steppy nature of the resistor
ladder’s output is impossible with a bandlimit of 20 kilohertz. Just like our square wave
example, these vertical components require infinite bandwidth. Good luck with that. But what’s even weirder, and kinda difficult
to grasp, is that because the output of the DAC has the same bandlimit as the ADC did,
now we are dealing with
Nyquist-Shannon again. And the truly strange-but-true part of this
is that the only possible result of the output from the low pass filter is the original waveform
that the ADC recorded. Remember, with a bandlimited signal, we can represent all of the detail
within that signal with discrete samples, and with only a sample rate that is twice
the bandlimit frequency. If we create a waveform that passes through all of the samples, then
it must be the original waveform recorded by the ADC. The fact
that the waveform comes out of the
resistor ladder all choppy-like doesn’t matter in the slightest, because the low pass
filter will bandlimit it and get rid of the choppies. Now it can only contain frequencies
of 20 kilohertz or below. Remember, the vertical parts of the stair-step pattern are impossible
with that bandlimit, so they just get smoothed out. And since we know that the DAC was outputting
the correct voltage level with each sample, all of the samples must have been satisfied.
Which
means that after the LPF smooths the waveform, it must have passed through all
of the samples. And because there’s a bandlimit in place, Nyquist-Shannon proves that the
output signal is the exact same one as the input. To provide some evidence to back this claim
up, take a look at this CD player. This is a Sony CD changer from 1993. It has a relatively
rudimentary DAC in part because it’s a cheaper machine and in part because it’s older.
Let’s hook an oscilloscope up to it and take a look at the output coming from its
RCA jacks. This is just some music it’s playing right
now. Notice that there’s nothing in here that looks remotely stair-steppy. But let’s
take it even further. I’ve created a CD with various tones generated in Audacity.
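The video doesn't say how those tones were generated beyond "in Audacity", so as a stand-in, here's a minimal sketch that writes a comparable 1 kHz square-wave test tone using Python's standard-library wave module. Assumptions: mono for simplicity (CD audio itself is two-channel), half of full scale for headroom, and a naive square wave that jumps abruptly between two sample values, like the straight-up-then-hold samples described here. The file name in the comment is hypothetical.

```python
import io
import struct
import wave

FS = 44_100          # CD sampling rate (Hz)
FREQ = 1_000         # 1 kHz test tone
AMPLITUDE = 0.5      # fraction of full scale, leaving some headroom
FULL_SCALE = 32_767  # largest positive signed 16-bit sample value


def square_wave_samples(freq: float, seconds: float, fs: int = FS) -> list[int]:
    """Naive (not bandlimited) square wave, quantized to signed 16-bit integers."""
    out = []
    for n in range(int(seconds * fs)):
        phase = (n * freq / fs) % 1.0          # position within the current cycle
        level = AMPLITUDE if phase < 0.5 else -AMPLITUDE
        out.append(round(level * FULL_SCALE))  # quantize to the 16-bit scale
    return out


def write_wav(samples: list[int], fileobj, fs: int = FS) -> None:
    """Write mono 16-bit PCM with the standard-library wave module."""
    with wave.open(fileobj, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits
        wav.setframerate(fs)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))


buffer = io.BytesIO()  # or open("square_1khz.wav", "wb") to save a real file
write_wav(square_wave_samples(FREQ, 1.0), buffer)
```

Played back through any DAC and its output filter, this file should show the same wiggly, band-limited square wave as on the scope, not the perfect corners stored in the samples.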
Let’s start with a 1 kilohertz square wave. Even though in Audacity the samples look like
this--straight up, then hold, with a completely straight line between peaks--the output from
the CD player is that wiggly wavy thing. That happens because those wiggly wavies are the
only way to make this square wave with a 20 kilohertz bandlimit, and the wiggly bits are
passing through each of those samples. Now let’s switch to some sine waves. This
is again 1 kilohertz. This looks perfectly smooth, no stair-steps to be seen. To be fair,
though, even in Audacity it looks pretty good. Let’s move up to a 10 kilohertz sine wave.
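At 44.1 kHz a 10 kHz tone gets only 44,100 / 10,000 = 4.41 samples per cycle. This sketch compares straight lines drawn between those samples with a sinc (Whittaker–Shannon) reconstruction from the very same samples. The window size and test instants are arbitrary choices of mine; truncating the sinc sum to a finite window leaves a small residual error.

```python
import math

FS = 44_100.0    # CD sampling rate (Hz)
FREQ = 10_000.0  # the 10 kHz test tone
N = 2_000        # samples kept on each side of t = 0 (truncated window)

print(f"samples per cycle: {FS / FREQ:.2f}")  # samples per cycle: 4.41


def tone(t: float) -> float:
    return math.sin(2 * math.pi * FREQ * t)


def sinc(x: float) -> float:
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


samples = {n: tone(n / FS) for n in range(-N, N + 1)}


def connect_the_dots(t: float) -> float:
    """Straight lines between neighbouring samples."""
    n = math.floor(t * FS)
    frac = t * FS - n
    return (1 - frac) * samples[n] + frac * samples[n + 1]


def bandlimited(t: float) -> float:
    """Sinc (Whittaker-Shannon) reconstruction from the same samples."""
    return sum(x * sinc(t * FS - n) for n, x in samples.items())


# 50 instants spread between sample 0 and sample 10, none landing on a sample.
times = [(k / 5 + 0.1) / FS for k in range(50)]
linear_err = max(abs(connect_the_dots(t) - tone(t)) for t in times)
sinc_err = max(abs(bandlimited(t) - tone(t)) for t in times)
print(f"worst error, straight lines: {linear_err:.3f}")
print(f"worst error, sinc rebuild:   {sinc_err:.1e}")
```

The straight lines miss the true sine by roughly a quarter of its amplitude near the peaks, while the sinc rebuild lands on it almost exactly.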
Now in Audacity it looks really gnarly, with the connections between the samples making
a barely intelligible wave. There aren’t even 5 samples per cycle, so how can the smooth
detail of the sine wave possibly be reproduced? Well, take a look. There’s a perfectly smooth
sine wave for you, right there. This is why some of you cringed when I drew
straight lines between the samples. That’s only sort of what happens, and even then it’s
not that accurate. But it does serve as a sort of blend between the two realities. There
is a stair-step pattern in the intermediate between the resistor ladder and the low pass
filter. So the DAC does connect the dots, but like this. Then the LPF smooths out the
connections between the dots, but that only happens as a side-effect of the fact that
it’s creating a bandlimit so the high frequency components, that’s the vertical parts here,
get tossed out. What you’re left with is the only possible waveform that can both hit
all the samples, and which does not contain frequency components above Nyquist. Simple,
right? Ah! We haven’t even really talked about
the CD itself yet! And this is pushing into the 14- or 15-minute mark already, if my gauge
of time per written page is at all correct. OK, I guess we’re going to push the technology
of the CD into another video. But that’s OK, since we covered what makes sound out
of numbers. And hopefully we’ve destroyed the myth that digital audio cannot produce
smooth waveforms. It does. Much of the information from this video (and
indeed some selected clips) came from a lovely video by Monty at xiph.org. I’ve linked
to a great article of his down below, and a card will pop up now linking to his video.
Many, many people brought this to my attention on Twitter and elsewhere, so thank you. He’s
got some much better demonstrations than I do that cover this topic. He also explains
why the bit depth affects noise, and not clarity, what dithering is and how it reduces quantization
noise, and much more. But I will give you one last tid-bit before
I sign off. You may have noticed that in the video we’ve been discussing a theoretical
sampling rate of 40 kilohertz, as this Nyquist sampling rate could perfectly capture all
of what human hearing can pick up. But the CD standard’s sampling rate is 44.1 kilohertz.
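Here's the arithmetic behind the 44.1 kHz figure as a quick Python check. Half of 44.1 kHz is 22.05 kHz, which leaves about 2 kHz above the 20 kHz audible limit for a real filter to roll off. The video-line figures (245 usable lines per field for NTSC at 60 fields per second, 294 for PAL at 50, three samples per line per audio channel) are the commonly cited numbers for the video-based PCM adaptors; with color NTSC's 59.94 Hz field rate, the same recipe gives the closely related 44.056 kHz rate.

```python
FS = 44_100             # CD sampling rate (Hz)
AUDIBLE_LIMIT = 20_000  # top of human hearing used throughout the video (Hz)

nyquist = FS / 2
transition_band = nyquist - AUDIBLE_LIMIT
print(f"Nyquist frequency: {nyquist:,.0f} Hz")                     # 22,050 Hz
print(f"room for the filter's roll-off: {transition_band:,.0f} Hz")  # 2,050 Hz

# Commonly cited PCM-adaptor figures: usable lines per field x fields per
# second x samples stored per line (per audio channel).
ntsc = 245 * 60 * 3  # 60 Hz monochrome field rate
pal = 294 * 50 * 3
print(ntsc, pal)  # 44100 44100
```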
Why the extra 4.1 kilohertz? That seems awfully specific. Well, that’s because low-pass
filters aren’t perfect. They can’t just cut off frequencies above a point, they instead
have a transition window where the frequencies roll off toward zero. The 44.1 kilohertz sampling rate accommodates
that window. Without a hard cut on the low-pass filter, aliasing could occur because
the samples might define a waveform of a higher frequency than Nyquist. This is precisely
why we need an LPF. Both of these waveforms satisfy all the samples, so to prevent one
of them from coming through, we need to decide a limit in frequency. If the red waveform
is above the Nyquist limit, then it won’t get reproduced. But if the low-pass filter
let some signals above the Nyquist frequency slip through, scenarios like this might occur.
Therefore, the sample rate was chosen to be 44.1 kilohertz, so that the filter’s transition window fits between our desired
20 kilohertz cutoff and the 22.05 kilohertz Nyquist frequency. And by sampling a bit beyond the audible range,
we don’t have to worry about spurious aliasing artifacts from samples in the transition band. But the more interesting thing about the
44.1 kilohertz rate is that it was also chosen for easy digital sound storage before a CD
gets pressed. This was the perfect sampling rate for storing sound on both an NTSC and
a PAL U-Matic videocassette recorder. The commercial VCR format from Sony was among
the earliest ways to store a digital audio stream, using the video signal sort of like
a giant QR code. It’s not literally being read in that sense, but the data is stored
on the tape as a field of black and white bars in each line, so if watching the output on-screen
it would look like a flashing screen of QR codes whizzing by 60 times per second. 44.1
kilohertz would, on both NTSC and PAL signals, work out to 3 samples per audio channel on each usable video line. So in
a strange twist, the world of analog video dictated how digital sound would work. Thanks for watching, I hope you enjoyed the
video. As I end all of my videos apparently, if this is your first time coming across the
channel and you liked what you saw, please consider subscribing. Also, and I know this
can sound weird, but for those that follow my videos--you might want to make sure you
are actually subscribed. Oftentimes YouTube just serves you content because it knows you
like it, and you might think you are subscribed when you aren’t. Same goes for all channels,
but I’ll be the selfish one today and post this reminder. As always, thank you to everyone who supports
this channel on Patreon, especially the fine folks who are scrolling up your screen. Supporters
on Patreon have turned my weird hobby of making videos about technology into a job, and you
all deserve my thanks. If you would like to pledge some support and help the channel grow,
please check out my Patreon page. Thanks for your consideration, and I’ll see you next time!
Comments
There's a shot in this video that's upside down because... I forgot to un-upside down it. Sigh.
"Don't worry about it." -Technology Connections 2018
The hardest part of how sound works for me was how multiple sound sources combine to a single data stream, a single wave form. Everything an ear hears is the input of a single wave traveling through the liquid of your cochlea, and your brain does extremely complicated processing on it to separate those elements, identify them, and locate them in 3d space. This is why your headphones don't need separate audio outputs for every instrument in the song you're listening to.
This is how I like to think of low pass filters "de-jaggying" (antialiasing). A low pass filter, at its absolute simplest, is a capacitor (with a current limiting resistor to keep the smoke in). Capacitors resist change in voltage. Like a weight on a spring resists change in length of a vertical spring. So that infinite rate change stair step is slowed down by the capacitor. That's it. The digital signal tells the voltage to "CHANGE RIGHT NOW NOW NOW GO GO GO" Then the capacitor replies, "OK, will do, gimme a second, I'm getting there, plod plod plod" Simples.
I was a young lad when CDs became big but I can still remember the first time I heard a CD back in the late '80s... sounded like the musicians were in the room with me. To my ears it was superior to any cassette tape or vinyl record because there was no background noise... just a clarity of sound I had not experienced until then.
"wiggly wobblies" I told you to cut out the technobabble!
Xiph Monty? The same Monty from the opus and vorbis codecs? Damn this dude deserves an award!
The Nyquist-Shannon theory is one of my favourite. It just can't be questioned. It just is solid mathematics.
I've been a fan of technology connections for the longest time and have been watching the channel for fun, but I never expected it to explain a topic in one of my classes better than my professor! What a pleasant surprise, thanks for the video :)
I have a 4 year physics degree and have worked as an electronic testing engineer for the avionics industry for 2 years.... this video finally helped me understand what the Fourier transform does. Sure, I used it all the time on my physics and math homework in college, but I could never wrap my head around what it actually does. After this video, it finally makes sense!
The Covox Speech Thing and Disney Sound Source are two practical examples of really simple ladder resistor DACs. They both plugged into the parallel port of an old PC and used its 8 data lines to drive 8 resistor networks tied into a mono analog audio out. This chumpy setup basically created an 8-bit DAC with a sample rate that was limited only by the speed of the parallel port. Unfortunately, as they were completely dumb hunks of passive electronics, they couldn't tell the difference between data meant to be converted into sound and data meant for the printer. If their output was left on when printing, they made a horrible squealing noise.
I request one in-depth video on DACs, please. Or whatever, since you've yet to have a bad video on this channel.
Very cool explanation of the Nyquist-Shannon theorem. Made it easy to wrap my head around it. Thanks!
And yet another great video. I had read Monty's article eons ago, for a long time I had lost track of it. As an audiophile I used to share it with other audiophiles to help them understand how digital audio works and also help explain why buying audio downloads sampled at 192kHz is just a waste of money (maybe not for Batman). The reality is that almost no one understands the math and engineering underneath (including some engineers). The problem usually is a lousy ADC/DAC not the digital medium. Your lecture in this video is the closest I've seen to make such complex issue understandable for those who lack the science background. Amazing work! Respect and Thanks!
Band-limiting was the conceptual bit I have been missing for ages! Thank you for putting it all together.
Thanks! Claude Shannon is absolutely an unsung hero. He should be as famous as Von Neumann (to me)! He was clearly a genius, and - more than any other person - is the father of information theory. He laid the groundwork for the digital age, and he did it back in the 1940s. tavi.
"The key here is... don't worry about it." Had me literally laughing out loud. Thanks again for these awesome videos. You inspire me to work harder on my own channel's production and entertainment value. 😀
Wow, this is an eye opening video! Like how DAC is a resistor ladder, it is so simple!
You are officially the first channel I have supported on patreon. Every video is informative, free of gimmicks, well written with great nerd humor thrown in.
A beautiful explanation. I can't imagine how you struggled to achieve such an elegant and balanced presentation without mathematics. Thanks! Ive been thinking about this topic for years and today you've helped me reach that very important intuitive feel.