The Digital Re-creation
What is “digital” in this context? Digital is a re-creation of an analog signal. You may know that computers can actually only understand 1 and 0 (or on/off, or true/false) at the most basic level. Computers can process on/off logic so quickly that they appear to do much more, but all software and digital media must be built on this foundation. The term “bit,” as in 8bit, 16bit, 24bit, 32bit, or 64bit, tells you how many of those 1/0 values are grouped together to represent each number: 1 bit is a single 1 or 0, and every added bit doubles how many distinct values can be represented.
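That doubling is easy to verify. A few lines of Python (an illustrative sketch, not from the original article) print the value counts for the bit depths this article keeps coming back to:

```python
# Each added bit doubles the number of distinct values a sample can take.
for bits in (1, 8, 16, 24):
    values = 2 ** bits
    print(f"{bits:>2}bit: {values:,} possible values")
# Prints:
#  1bit: 2 possible values
#  8bit: 256 possible values
# 16bit: 65,536 possible values
# 24bit: 16,777,216 possible values
```

These are the "65 thousand" and "16 million" figures referenced later when comparing 16bit and 24bit audio.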
Our bodies and senses are analog, capable of many more readings than 1 and 0. The earth is analog. Electricity is analog. Musical instruments – even digital ones – make analog sounds.
Digital cannot move air and does not exist in space as energy. Ultimately it is a collection of data meant to represent the original analog signal. It exists only inside computer circuitry and digital media (CDs, hard drives, RAM) and has to be converted back to analog for us to perceive it.
What digital allows for is infinite transmission and replication without loss. Since digital itself is a re-creation it can be infinitely recreated. This is a major plus for convenience.
Even though consumer music was the first media to convert to digital over 30 years ago, the format has never received an upgrade. The CD standard of 44,100 samples per second stored as 16bit data was set in 1979 and formalized as the “Red Book” CD-DA specification in 1980.
A couple of attempts to upgrade this format for consumers have come and gone, hampered by various issues. Volumes can be filled with explanations of their failures: new file formats or new equipment were often required and backwards compatibility with existing media was questionable at best. People were convinced to re-buy much of their music library on CD and therefore were not receptive to another format change 10 years later.
At the same time, the rapidly expanding internet and the push for mobile convenience worked against these new, larger formats, and MP3 files eventually displaced the CD as the de facto digital music standard. The internet and mobile devices made file size paramount and everything else secondary.
MP3, an offshoot of the MPEG video compression standards (formally MPEG-1 Audio Layer III), takes the music file and applies lossy data-compression tactics to reduce file size. To achieve this it throws out audio data and uses masking and interpolation to fool the ear. The software engineers building MP3 prioritized audible recognition over accuracy, depth, or width, and largely accomplished their goal, reducing file sizes by roughly 90%: a song that clocks in at about 50MB on CD becomes about a 5MB MP3.
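The 50MB/5MB comparison is simple arithmetic. Here is a sketch of that math, assuming a 4.5-minute song and a typical 128 kbps MP3 (both assumptions for illustration, not figures from any specific recording):

```python
# Rough size math for the "50 MB on CD vs 5 MB as MP3" comparison,
# assuming a 4.5-minute song and a typical 128 kbps MP3.
SAMPLE_RATE = 44_100       # samples per second (Red Book CD)
BIT_DEPTH = 16             # bits per sample
CHANNELS = 2               # stereo
MP3_BITRATE = 128_000      # bits per second (assumed encoder setting)
SONG_SECONDS = 4.5 * 60

cd_bits_per_second = SAMPLE_RATE * BIT_DEPTH * CHANNELS   # 1,411,200 bps
cd_mb = cd_bits_per_second * SONG_SECONDS / 8 / 1_000_000
mp3_mb = MP3_BITRATE * SONG_SECONDS / 8 / 1_000_000

print(f"CD audio: {cd_mb:.1f} MB, MP3: {mp3_mb:.1f} MB")  # ~47.6 MB vs ~4.3 MB
```

The uncompressed CD stream runs at about 1.4 megabits per second; a 128 kbps MP3 is slightly less than a tenth of that, which is where the roughly 90% reduction comes from.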
Sound quality suffered. We can all recognize and enjoy songs as MP3 files, but even a casual listener can hear compression artifacts and a general flattening of the musical performance, which overall sounds smaller and less detailed.
The release and success of Apple’s iPod in the early 2000s cemented MP3 as the new consumer format, and it has held that position ever since. MP3 has since been relabeled “lossy,” acknowledging the loss of audio signal, but some now also call 16/44 “lossless,” which is fallacious. Just as lossy MP3 was considered a fair compromise in order to move files over the internet or onto our mobile devices, downsampled 16/44 was simply all that could fit on a CD and be decoded in real time.
Why is calling 16bit/44k files lossless misleading? One reason is that most modern music is recorded and mixed at 24bit, usually 24/48, 24/88.2, or 24/96. To squeeze 24bit data (roughly 16.7 million possible values) into 16bit space (65,536 possible values), dithering is used. Dithering is a fancy way to say fuzz. A raw truncation from 24bit to 16bit, as shown below, gives the waveform a jagged stair-stepping that most people hear as distortion. Letting the computer apply ultra-fast, low-level increases and decreases to the signal covers up that distortion, leaving a faint fuzziness our ears don’t hate.
The picture below shows the reduction from 24bit to 8bit without and with dither. The middle waveform might hurt your ears, whereas the dithered version covers, or masks, the waveform jaggies. Going from 24bit to 16bit is less obvious visually, but dithering is still applied to keep the 16bit file as free from audible distortion as possible.
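For the curious, here is a minimal sketch of one common dithering approach, triangular (TPDF) noise added just before truncation. This is an illustration of the general technique, not necessarily what any particular mastering tool does:

```python
import numpy as np

rng = np.random.default_rng(0)

def truncate(samples, source_bits=24, target_bits=16):
    """Naive bit-depth reduction: drop the low bits.
    On quiet signals this produces harsh, correlated distortion."""
    return samples >> (source_bits - target_bits)

def dither_then_truncate(samples, source_bits=24, target_bits=16):
    """Add triangular (TPDF) noise about one quantization step wide
    before truncating, trading distortion for a faint, benign hiss."""
    step = 1 << (source_bits - target_bits)  # one 16bit step in 24bit units
    noise = (rng.integers(0, step, samples.shape)
             + rng.integers(0, step, samples.shape) - step)
    return (samples + noise) >> (source_bits - target_bits)

# A very quiet 24bit sine wave (peak ~300 out of a possible ±8,388,607):
# naive truncation collapses it into a crude staircase, while the dithered
# version keeps the average shape at the cost of low-level noise.
t = np.arange(1000)
quiet = (np.sin(2 * np.pi * t / 100) * 300).astype(np.int64)
staircase = truncate(quiet)
dithered = dither_then_truncate(quiet)
```

The staircase output only ever takes a handful of values, which is the "jagginess" described above; the dithered output jitters around the true curve instead of snapping to it.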
Another way of thinking of resolution is with a simple X and Y axis. Bit-depth is the Y (vertical) axis and sample rate is the X (horizontal) axis. 24bit data allows roughly 16.7 million possible values on the Y axis, whereas 16bit data allows about 65 thousand. To squeeze 16.7 million points into 65 thousand slots the computer has to make assumptions and throw out values, then use dithering to fuzz up the jaggies. The same problem exists with sample frequency on the X axis: fitting 96 thousand samples per second into 44.1 thousand slots requires compromises.
In this case the engineers have done some amazing predictive math, mixed with the science of sound, to reduce sample rates without obviously audible results.
Focusing on the instruments and vocals – the primary program of the song – doesn’t expose the effects of downsampling as much as focusing on the space and the delay-width of the parts in the recording. Things like the pan of the soundstage, each instrument’s place in the room, the interaction of multiple instruments and voices across the EQ spectrum, and the very timbre and expressive humanity of a performance are nearly impossible to measure but quite simple to hear.
Those who preach science over common sense say 16/44 digital is all anyone can ever hear, because they lack their version of scientific proof that anything better exists. Yet millions of people can and do hear full-resolution audio, many of them in the audio industry, and we all know the consumer has been sold an inferior product for over 30 years.