Is "lossless" really lossless? - A Test

Started by lostflower4, May 27, 2006, 08:31:46


lostflower4

Although I've grown to accept them more and more over the last year, I've always been a bit skeptical of the various compressed lossless formats out there. At first I didn't trust them at all, but after seeing them shared so widely on the net I really grew to like them. Most recently I've become interested in archiving things in smaller formats as opposed to wav, but I wanted to make sure I could fully trust these so-called lossless formats. People always say "it's lossless", but that doesn't really prove much, does it?

So what I did today was convert some wav files to various formats (some were converted multiple times, as you will see). To test how they compared to the originals, I compared the MD5 checksums of the post-conversion wav files to the original ones. Using these checksums is a great way to check file integrity. You can get some info here, as well as other places on the net:

http://en.wikipedia.org/wiki/Md5

http://en.wikipedia.org/wiki/Checksum
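
For anyone who wants to repeat the checksum comparison without a dedicated tool, here is a minimal sketch in Python. It is not the program used for the tests in this post; the function names and the file names in the usage comment are made up for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_checksum(path_a, path_b):
    """True if both files have the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)


# Hypothetical usage (file names are placeholders):
# same_checksum("original.wav", "decoded_from_flac.wav")
```

Reading in chunks keeps memory use low even for large wav files.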


On to the tests... I used five different programs to convert the files, mainly as a way to make sure all programs produce the same results. The programs used were as follows:

WinRAR
dBPowerAMP
FLAC Frontend
Adobe Audition
PlexTools Professional

And the following formats were tested:

Wav (wav)
Zip (zip)
RAR (rar)
Free Lossless Audio Codec (flac)
Shorten (shn)
Monkey's Audio (ape)
WavPack (wvpk)
Apple Lossless (apple)


Two wav files were used as the benchmarks. "Original Wav" is a live version of "Want" by our favorite band. "Original Wav 2" is "Stockholm Syndrome" by Muse. The second wav was used for the Apple lossless test (thanks to Marika for her help in this).

Now the results:

Part 1 (Windows formats test):






Part 2 (Apple format test):




Part 3 ("novelty" tests):




The novelty tests were cases in which I did not expect the results to match up, but on a few I tried to trick the testing method. On the file indicated -0.001 sec, I deleted 1/1000 of a second from the beginning of the original file. On the file indicated -0.01 dB, I lowered the volume of the original wav file by 1/100 of a decibel. And on the file indicated as "flanger", I added some flanger to it. Both the smallest and the most drastic changes to the original files yielded different checksums, but endless conversions among lossless formats didn't change anything. By these tests, it appears that ALL of the popular lossless formats decode to files 100% identical to the original source. It's also clear that you could convert these files hundreds (or thousands) of times between all the different formats and they still wouldn't change! So now you don't have to be so afraid to change all those ancient SHN files to something a little more up-to-date. ;)
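
The same pattern can be reproduced in a few lines of Python. This is only a sketch under a big assumption: Python has no FLAC/SHN/APE codecs built in, so the general-purpose lossless compressors zlib, bz2 and lzma stand in for the audio formats, and the "audio" is made-up bytes rather than a real wav file.

```python
import bz2
import hashlib
import lzma
import zlib


def md5_hex(data):
    return hashlib.md5(data).hexdigest()


# Made-up bytes standing in for the contents of a wav file (assumption).
original = bytes((i * 7 + (i >> 3)) % 256 for i in range(50_000))


def round_trip_chain(data):
    """Compress and decompress through three different lossless formats."""
    data = zlib.decompress(zlib.compress(data, 9))
    data = bz2.decompress(bz2.compress(data))
    data = lzma.decompress(lzma.compress(data))
    return data


restored = round_trip_chain(original)

# "Novelty" case: flip a single bit in the first byte.
tampered = bytearray(original)
tampered[0] ^= 1
tampered = bytes(tampered)

print("original :", md5_hex(original))
print("restored :", md5_hex(restored))   # same digest as the original
print("tampered :", md5_hex(tampered))   # different digest
```

Any number of lossless round trips leaves the digest unchanged, while even a one-bit edit changes it.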

I also compared to mp3 and OGG files just for the fun of it:

mp3(1) = LAME Alt Extreme Preset
mp3(2) = LAME "High Quality" @ 320 kbps
mp3(3) = Fraunhofer "Radium" codec @ 192kbps
ogg = OGG vorbis @ 500 kbps

Of course, they don't match the original...


As far as lossless files go, the only real consideration now seems to be which format to use. As I see it, there are three things to consider: file size, compatibility, and encoding/decoding speed. Here are the file sizes I got from the above tests followed by the percentage relative to the size of the original file, starting with wav (no compression):


Format   Size      % of original
wav      55.2 MB   100.0%
zip      51.2 MB    92.8%
shn      39.7 MB    71.9%
rar      39.4 MB    71.4%
flac     34.8 MB    63.0%
wvpk     33.6 MB    60.9%
ape      32.2 MB    58.3%
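
The percentages in the list above are just each size divided by the size of the uncompressed wav. A small sketch that recomputes them from the sizes given (the dictionary and function names are only for illustration):

```python
# Sizes in MB as reported in the post.
sizes_mb = {
    "wav": 55.2,
    "zip": 51.2,
    "shn": 39.7,
    "rar": 39.4,
    "flac": 34.8,
    "wvpk": 33.6,
    "ape": 32.2,
}


def percent_of_original(size, original):
    """Size as a percentage of the original, rounded to one decimal."""
    return round(100.0 * size / original, 1)


for name, size in sizes_mb.items():
    pct = percent_of_original(size, sizes_mb["wav"])
    print(f"{name:5s} {size:5.1f} MB  {pct:5.1f}%")
```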


The highest levels of compression were used for the various tests. It's interesting to note that RAR at the "best" compression level outperforms SHN. Apple Lossless has not yet been tested on the first wav file, but on the second file it achieved results virtually identical to those of FLAC at level 8. But as Apple Lossless is a proprietary (closed source) format and is only fully usable with iTunes, it probably won't ever catch on as a widespread format in trading communities. Monkey's Audio (APE) in its "insane" compression mode produces the best compression of all the formats, but it is tediously slow at encoding/decoding compared to the others.

I've narrowed it down to either FLAC or WavPack as my ideal format. FLAC at level 6 (not shown above) produces the best speed-size-compatibility combination as far as I can see. However, the file size is often 1-3 MB larger than that of WavPack using the highest compression level. FLAC level 8 (shown above) gets much closer to WavPack here, but it's much slower than FLAC level 6.

As far as only speed and size go, I'm a fan of WavPack (in its high compression mode). It produces the second smallest files almost as quickly as FLAC in level 6.

This turned into a much longer message than I expected, but my main point is to prove that lossless formats are in fact what they are advertised to be. I've heard about tapers deleting their master files after converting to FLAC. Now this doesn't seem so extreme to me. In fact, it's very logical. I also conducted some tests on lossless video formats (Huffyuv and Canopus Lossless). I won't go into the details, but they also seem to be truly lossless.

In conclusion, it appears that "lossless" isn't a loosely used term. It's based on real mathematics after all. I'm sold on the idea now, and maybe you will be too now if you weren't already... :D

bluewater

Thanks for this comprehensive and great article! I think there's only one possible downside to using lossless files:

- less error correction data compared to the original. It would be interesting to test scratched CDs that have FLAC files on them against CDs holding the original CDDA audio. Which one would be more sensitive to physical damage on the disc surface? And which is more reliable as a storage medium for FLAC or other lossless files: DVD-R, DVD+R, or CD-R?
Life's too short to listen to lossy music

japanesebaby

according to my experience you can still mostly play a scratched audio cd with wav files on it: it probably skips and jumps and might get stuck at some point unless you fast forward past that point. some players might also refuse it altogether while some would still play the unscratched part of it. but generally you could still play the major part of it with a proper player. and of course it depends on the degree of the damage - if it's bad enough then it's goodbye. and i've also noticed that if the very beginning of the disc is scratched then you often cannot play it anymore, even though the damage didn't seem so bad(?).

however a scratched datadisc with flac files on it isn't readable at all anymore: once it gets scratched badly enough it's useless. one might think this favors audio cds as the more secure storage format, but i don't think it's that simple. there are really many more aspects to it. and aren't the wav files actually more prone to errors IF one doesn't use some secure program with secure settings when making the copies(?).
and you can also look at it the other way: if something has to get damaged then maybe it's better lose it all and start looking for a new copy than have some sad crippled version there reminding you about the whole thing.  ;-)

i don't know about differences between dvd-/+rs or cd-rs, but there are also scratchproof dvd-rs available (dvd+rs probably too, but i never use them so i haven't had any of those). i suppose these should be more secure - at least they are a bit more expensive so i like to think i'm not paying extra for nothing... but so far none of them has gotten accidentally scratched, and i haven't tried scratching one to see what happens.


and thanks to caley for this interesting and profound survey - i think many of us have been thinking about these things every now and then but never bothered to look into it (we're all so awfully lazy... 8) )
and you really proved the myth about shn not being entirely lossless to be false. i don't know why, but i had heard it said many times on many occasions that shn wasn't designed for storing/encoding audio files and so wasn't quite lossless. well i guess one shouldn't believe everything one hears...  ;-)

marika
Ay, in the very temple of Delight
Veil'd Melancholy has her sovran shrine

bluewater

As all the lossless methods produce the same results (according to Caley), the quality question reduces down to two steps in the process:

1. ripping with a dvd- or cd-rw drive (the most critical step).
2. storage media for re-duplication (if things get damaged)

Firstly to japanesebaby: cd-audio is not in wav format, but in raw pcm format. The ripping program usually decides to change the audio to .wav format for best compatibility.

Audio CDs have a mathematical method for correcting errors, so they produce identical results even if the disc gets damaged. Error correction works perfectly for up to about 2 millimeters of physical damage on the disc surface. This is probably superior to the error correction on a flac data disc.

EAC (Exact Audio Copy) should be a good program for ripping, but with certain combinations of a scratched disc and a bad drive, secure mode will not rip the disc. That EAC secure mode won't rip a disc at all doesn't mean that the disc can't be ripped. If EAC secure mode fails, some other drive or some other ripping method can still produce perfect results within the theoretical (2-2.5 mm) physical limit.

Some further information about the two steps that, besides lossless coding, can raise quality questions:

1. the best cd/dvd drives for audio - rips, statistical results:

http://www.hydrogenaudio.org/forums/index.php?showtopic=45643&hl=

2. about cd-r audio disc error correction

http://www.hydrogenaudio.org/forums/index.php?showtopic=17718&hl=error+correction

And last, some of the mathematics used in cd-audio error correction. I wrote my graduate thesis on Galois theory, which is used in designing error correcting codes. This is a university article too (hard theory stuff):

http://www.ee.washington.edu/conselec/CE/kuhn/cdaudio2/95x7.htm

japanesebaby

Quote from: bluewater
Firstly to japanesebaby: cd-audio is not in wav format, but in raw pcm format. The ripping program usually decides to change the audio to .wav format for best compatibility.

yes, sure. i called it .wav here simply because that's what most people seem to have on their finished audio cds. of course .wav isn't even the only possible format, it could just as well be .aiff, and the stuff is originally raw pcm like you said. but be it .wav or .aiff, a scratched disc seems to behave the same way. my observations were surely on a very general level, but i only wanted to point out that there's a notable difference in the way a scratched audio disc and a scratched data disc tend to behave.

thanks for interesting links - some good holiday reading!   ;-)

marika

bluewater

Quote from: japanesebaby
thanks for interesting links - some good holiday reading!   ;-)

marika

8) . The holiday did make me make one mistake on the error correction thing. Error correction on a cd-rom can actually be better than on an audio-cd. Beyond the 2.47 mm critical limit, and up to even 8 mm of damage, audio cd players interpolate the audio, which means the result will not be identical to the original, though many people will not notice the difference between interpolated samples and the correct ones (destroyed by the big scratch). Thus you were right in saying that cd-rom is better than audio-cd if 100% perfection is required. But if perfection is not required, it is the other way round.

edit: The best way to back up recordings is to encode them to flac and seed them on a torrent tracker to a wide audience of leechers. Then you will have a copy of your files on 10-100 hard disks all over the world, and as long as the tracker is alive and well, even if your house burns down or some other catastrophe happens, they are available to you from the tracker as soon as you get back on the internet.

lostflower4

If you're not keen on checksums, I've come up with a second proof that the lossless formats are indeed perfect. It's a basic property of sound that if you mix a signal with its mirror image (the same waveform inverted), the waves cancel each other out, resulting in silence.

You can test this for yourself, but I've made a simple demonstration. I took a wav, and then converted it to the extreme, just to make a point. 8). At this point I had both the original wav file and the one that had been converted several times. I opened one of the files in Audition and selected the "invert" function. Then I inserted both the original/unconverted file and the converted/inverted file into the multitrack and played them simultaneously. Absolute silence. I have to say it's pretty strange seeing two fairly loud wav files playing with no sound coming out.

Here are the two files I used in my experiment. The conversion steps for the second one are denoted.


File #1

WAV

File #2

WAV > FLAC > SHN > WAVPACK > APE > WAV (Inverted)


If you don't have a multitrack program on your computer, you can use the trusty ol' Sound Recorder program in Windows. Yes, it's still there. :wink: Here are two ways to access it:

1) start, run, and type "sndrec32"
2) or you can probably find it at C:\Windows\system32\sndrec32.exe

Open the original file, then go to edit, "mix with file", and select the inverted file. Hit play. If you did everything right, you won't hear anything. If the lossless conversions weren't 100% perfect, you would hear some background noise. Of course, if you mix any other two non-silent files (or even a file with itself), you're going to hear something. This is all proven science, once again demonstrating that the lossless formats are perfect.
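
The inversion test can also be sketched in code. This is a minimal illustration, assuming 16-bit samples and a synthetic 440 Hz tone in place of a real song; the function names are made up and it does not read actual wav files.

```python
import array
import math

RATE = 44100  # samples per second (CD rate)


def invert(samples):
    """Flip the sign of every 16-bit sample.

    -32768 has no positive counterpart in 16 bits, so it is clamped to 32767.
    """
    return array.array("h", (min(-s, 32767) for s in samples))


def mix(track_a, track_b):
    """Sum two equal-length tracks sample by sample."""
    return [a + b for a, b in zip(track_a, track_b)]


# One second of a 440 Hz tone standing in for the original wav (assumption).
tone = array.array(
    "h", (int(20000 * math.sin(2 * math.pi * 440 * n / RATE)) for n in range(RATE))
)

residue = mix(tone, invert(tone))   # original + inverted copy
doubled = mix(tone, tone)           # a file mixed with itself

print("peak of original + inverted:", max(abs(v) for v in residue))
print("peak of original + itself  :", max(abs(v) for v in doubled))
```

If the second track had been through a conversion that changed even one sample, the first peak would no longer be zero.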

So again, if people are afraid of updating their SHN files to FLAC - or just using any lossless format in general - they are PARANOID! :smt105  I admit I was one of those paranoid people before, but now I have two irrefutable proofs. I think this second one is even more resounding.
   
And actually, it is possible to make perfect extractions from audio CD and endlessly convert, burn back to CD, extract, burn, convert, extract, etc. I've also tested this using both checksums and the inversion test, but proper extraction depends on a lot of things. Using data discs is much more foolproof. But that's another subject for another time.

Questions/comments/criticisms are welcome, but I think I've got an airtight case here.

bluewater

Quote from: lostflower4
So again, if people are afraid of updating their SHN files to FLAC - or just using any lossless format in general - they are PARANOID! :smt105  I admit I was one of those paranoid people before, but now I have two irrefutable proofs. I think this second one is even more resounding.

Thanks for this test! For reference, it would be interesting to hear the digital noise that results from mixing a lame --aps encode (decoded back to .wav) with the inverse of the original. There should be digi-noise there... I'm too lazy to do this test myself, but it would be nice.

Paranoid?  :evil:  :lol: That is a fun word, but the other word which is NOT ok to use on ANY public forum is "paranoia". See the difference? Well, think about this: you go into a store. After you have paid for your things you shout out loud to the people behind the desk "PARANOIA"! They will call the police if you do so. Don't do that test, never!  :lol: But if you shout "Paranoid!" they will not press the button behind the desk, just laugh.

bluewater

Another test using Caley's inverse + original method.

Downmix the inverse .wav to mono and place it in the left or right channel only, and put the original un-inverted .wav in the other channel. You can now check whether your speakers' stereo imaging is perfect: if it is, the sound coming from the left channel should cancel the sound coming from the right channel, leaving complete silence.

The same principle is used in a patented noise-cancelling method: the inverse sound is calculated in real time, and it cancels the noise in environments where silence is needed without ear-plugs.

lostflower4

Quote from: bluewater
It would be interesting for reference to hear the digital noise that results in converted lame --aps .wav to inverse of the original. There should be digi-noise there... This test I'm too lazy to do myself, but it would be nice.

MP3 tests are a bit tricky, because of the stupid microgaps. I created an mp3 file with LAME, and it created microgaps at both the beginning and end of the song. Since there would be a slight offset, there would be virtually no cancellation at all. You could try lining it up manually, but this could be tedious and prone to error. So I settled on a similar test - OGG.
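
To show why an offset ruins the inversion test even when the audio itself is untouched, here is a small sketch. The 440 Hz tone and the 50-sample delay are arbitrary values picked for illustration, not the actual gap LAME adds.

```python
import math

RATE = 44100

# One second of a 440 Hz tone standing in for the song (assumption).
tone = [int(20000 * math.sin(2 * math.pi * 440 * n / RATE)) for n in range(RATE)]


def null_test_peak(original, other, offset=0):
    """Peak level left after mixing `original` with the inverse of `other`,
    where `other` is delayed by `offset` samples (padded with silence)."""
    delayed = [0] * offset + other[: len(other) - offset]
    return max(abs(a - b) for a, b in zip(original, delayed))


print("aligned copy      :", null_test_peak(tone, tone))       # cancels fully
print("50 samples delayed:", null_test_peak(tone, tone, 50))   # barely cancels
```

A delay of about a millisecond is enough to turn perfect cancellation into a residue louder than the original tone, so a meaningful mp3 test needs sample-accurate alignment first.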

I did one test with an inverted OGG at 192 kbps, and another at 500 kbps. The results went as predicted. Hear them here:

OGG @ 192 kbps

OGG @ 500 kbps


OGG at 500 kbps is about as good as it gets in the lossy domain. But it's clearly not perfect...

Time to test my stereo imaging... Great idea! :wink:

Oso Blanco

I really appreciate your efforts, but I am too tired and too lazy to read through all this. My simple question: Is FLAC really 100% lossless? I don't know, but I don't completely trust this format. Every time a file is re-encoded into another format, the original file is changed.

I'd prefer WAV files for really important things, but that takes a lot of patience when it comes to uploading/downloading.
Time is the fire in which we burn ...

lostflower4

Well, I'm not sure I can explain it much more succinctly than I already did, but yes... FLAC and other lossless formats are 100% lossless, bit-for-bit - no question about it.

The reason the file size can go down is that WAV isn't an efficient way to store sound data. When converting to FLAC, for example, the same bits are stored more efficiently, and thus take up less space. It's the same principle as putting something in a ZIP or RAR file, but FLAC can get the size down even more because it's optimized for audio.

When you play a FLAC file, or convert it to WAV for other purposes, it transforms into a WAV file that's 100% identical to what you started with.

I used to be skeptical of this too, but when you do the checksum tests as shown above - or even better, a direct binary comparison, which I've also done - it's irrefutable. There's no question that FLAC, SHN, WavPack, APE, etc. are 100% lossless.
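
A direct binary comparison is easy to script as well. This is a sketch, not the tool used for the comparison mentioned above; `first_difference` is a made-up helper that reports where two files start to differ.

```python
import filecmp


def files_identical(path_a, path_b):
    """Byte-for-byte comparison (shallow=False forces the contents to be read)."""
    return filecmp.cmp(path_a, path_b, shallow=False)


def first_difference(path_a, path_b, chunk_size=1 << 16):
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No differing byte in the common part: one file is shorter.
                return offset + min(len(a), len(b))
            if not a:
                return None  # both files ended without a difference
            offset += len(a)
```

Unlike a checksum, this compares the actual bytes, so a match does not depend on the hash function at all.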

:rocker

bluewater

Quote from: Oso Blanco on March 07, 2007, 07:47:35
Is FLAC really 100% lossless? I don't know, but I don't completely trust this format. Every time a file is re-encoded into another format, the original file is changed.

If the checksums match, then the files are identical. This means that there's no change in the audio data (it's practically impossible for the checksums to match by accident if the files are different). FLAC and other lossless codecs don't use any lossy compression on the audio or any psychoacoustic modelling. The results are therefore bitwise identical, as lostflower has checked here.

bluewater

rjl

Exactly - WAV file sizes are pretty much determined by resolution, sampling rate, number of channels and time. Two different 44.1 kHz/16-bit stereo tracks of exactly the same length will have the same file size.
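
As a rough sketch of that relationship: the audio data in an uncompressed PCM wav is sample rate x bytes per sample x channels x seconds. The 44-byte header used below is the size of the common minimal PCM wav header (an assumption; files with extra chunks will be slightly larger), and the function names are made up for illustration.

```python
def pcm_data_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size of the raw audio data in an uncompressed PCM file."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * seconds


def wav_file_bytes(sample_rate_hz, bits_per_sample, channels, seconds, header=44):
    """Audio data plus a minimal wav header (44 bytes assumed)."""
    return pcm_data_bytes(sample_rate_hz, bits_per_sample, channels, seconds) + header


# One minute of CD-quality stereo, whatever the music sounds like:
print(pcm_data_bytes(44100, 16, 2, 60))   # 10584000 bytes of audio data
print(wav_file_bytes(44100, 16, 2, 60))   # 10584044 bytes including the header
```

Nothing in the formula depends on the content of the audio, which is why two tracks of the same length and format come out the same size.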

Lossless compression formats simply analyze the data and pack it more efficiently. I'm sure that the process involves taking an inventory of a file's makeup, especially redundant or repeating data, and writing up some "instructions" for piecing the file back together again.

As an experiment make 3 text files...

One with a gazillion "a"'s.

One with an equal number of "b"'s.

Another with an equal amount of characters: half "a"'s, half "b"'s.

Name, save and zip all of them as .ZIP files.

The first two files (each consisting of a gazillion identical characters) will have the same filesize. The one consisting of a gazillion mixed characters will have a larger filesize.

Granted, if dealing with small files, the filesizes may be close, as there is some overhead in the finished ZIP file dedicated to info. about the original file.
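
The text-file experiment can be tried without creating any files. This sketch uses Python's zlib (the same DEFLATE method ZIP normally uses) and assumes the mixed file has its "a"s and "b"s shuffled at random; if they were simply stored as one block of "a"s followed by one block of "b"s, it would compress almost as well as the uniform files.

```python
import random
import zlib

N = 100_000  # stand-in for "a gazillion"

all_a = b"a" * N
all_b = b"b" * N

# Half "a", half "b", in random order (fixed seed so the run is repeatable).
letters = [ord("a")] * (N // 2) + [ord("b")] * (N // 2)
random.Random(0).shuffle(letters)
mixed = bytes(letters)

sizes = {
    name: len(zlib.compress(data, 9))
    for name, data in (("all_a", all_a), ("all_b", all_b), ("mixed", mixed))
}
print(sizes)

# Whatever the compressed size, decompression gives back the exact input.
assert zlib.decompress(zlib.compress(mixed, 9)) == mixed
```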

Now, record two WAV files with the same attributes (16/44.1, stereo, exact length). Heck, have one of them be completely silent, or with sparse sound. They will have the same filesize. Compress them as FLAC, with the same compression level, and I'll bet that you'd have wildly different filesizes. Or at least appreciably different, depending upon the content and its nature, etc.

However, they will both decode to files identical to their source WAVs, as can be seen using MD5 checksums, as displayed above.
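
A sketch of that WAV experiment, with two loud caveats: zlib stands in for FLAC (Python has no built-in FLAC encoder), and the "recordings" are one second of digital silence and one second of random noise generated in memory. The helper name `make_wav` is made up for illustration.

```python
import io
import random
import wave
import zlib


def make_wav(pcm_bytes, rate=44100, channels=2, sample_width=2):
    """Wrap raw PCM bytes in a wav container and return the file contents."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(rate)
        w.writeframes(pcm_bytes)
    return buf.getvalue()


n_bytes = 44100 * 2 * 2  # one second of 16-bit stereo at 44.1 kHz
rng = random.Random(0)

wav_silent = make_wav(bytes(n_bytes))
wav_noisy = make_wav(bytes(rng.getrandbits(8) for _ in range(n_bytes)))

packed_silent = zlib.compress(wav_silent, 9)  # zlib as a stand-in for FLAC
packed_noisy = zlib.compress(wav_noisy, 9)

print("wav sizes   :", len(wav_silent), len(wav_noisy))        # equal
print("packed sizes:", len(packed_silent), len(packed_noisy))  # very different

# Both decode back to exactly the wav they came from.
assert zlib.decompress(packed_silent) == wav_silent
assert zlib.decompress(packed_noisy) == wav_noisy
```

Silence and random noise are the two extremes; real music falls somewhere in between.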

The checksums are really neat, as they allow you to trace a fileset back to the original encoding by whomever encoded it. In the case of a taper doing the transfer & encode, you can trace the files all the way back to the recording's source, now with perfect knowledge that you have a bit-for-bit EXACT clone of the lowest-possible circulating copy!

Really neat stuff.

I apologize if I am rambling... at work and have to keep stopping and starting the reply...