I was using sox to convert a 2-channel, 48000 Hz, 24-bit WAV file (new.wav) to a mono WAV file (post.wav). Here are the related commands and outputs:
[Farmer@Ubuntu recording]$ soxi new.wav
Input File : 'new.wav'
Channels : 2
Sample Rate : 48000
Precision : 24-bit
Duration : 00:00:01.52 = 72901 samples ~ 113.908 CDDA sectors
File Size : 447k
Bit Rate : 2.35M
Sample Encoding: 24-bit Signed Integer PCM
[Farmer@Ubuntu recording]$ sox new.wav -c 1 post.wav
[Farmer@Ubuntu recording]$ soxi post.wav
Input File : 'post.wav'
Channels : 1
Sample Rate : 48000
Precision : 24-bit
Duration : 00:00:01.52 = 72901 samples ~ 113.908 CDDA sectors
File Size : 219k
Bit Rate : 1.15M
Sample Encoding: 24-bit Signed Integer PCM
It looks fine. But let us check the header of post.wav.
[Farmer@Ubuntu recording]$ xxd post.wav | head -10
00000000: 5249 4646 9856 0300 5741 5645 666d 7420 RIFF.V..WAVEfmt
00000010: 2800 0000 feff 0100 80bb 0000 8032 0200 (............2..
00000020: 0300 1800 1600 1800 0400 0000 0100 0000 ................
00000030: 0000 1000 8000 00aa 0038 9b71 6661 6374 .........8.qfact
00000040: 0400 0000 c51c 0100 6461 7461 4f56 0300 ........dataOV..
This is the standard WAV file header structure.
The first line is fine.
The second line: "2800 0000" is the size of the "fmt " sub-chunk. Since this is little endian, it is 0x00000028 = 40 bytes. But there are 52 bytes between the size field of the "fmt " sub-chunk and the "data" sub-chunk.
The third line shows that "ExtraParamSize" is 0x0016 = 22 bytes. But it is actually 36 bytes (from the third line's "1600" to the fifth line's "0100"). The preceding 16 bytes are standard.
So what are the extra 36 bytes?
OK, I found the answer.
Look at the second line: the audio format is "feff", whose actual value is 0xFFFE, so this is not the standard PCM wave format but the extensible format (WAVE_FORMAT_EXTENSIBLE).
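To double-check the reading of the "fmt " body, here is a minimal Python sketch that decodes the 40 bytes copied by hand from the xxd dump above. The variable names are my own, and the field layout is assumed to be the usual extensible one (16 standard bytes, a 2-byte extension size, then a 22-byte extension); it is not taken from the sox source.

```python
import struct

# The 40-byte body of the "fmt " chunk (file offsets 0x14..0x3B),
# copied by hand from the xxd dump of post.wav.
fmt_body = bytes.fromhex(
    "feff 0100 80bb0000 80320200 0300 1800"  # the 16 standard bytes
    "1600 1800 04000000"                     # extension size, valid bits, channel mask
    "0100000000001000800000aa00389b71"       # 16-byte SubFormat GUID
)

# "<" means little endian with no padding; H = 16-bit, I = 32-bit unsigned.
(format_tag, channels, sample_rate, byte_rate, block_align, bits_per_sample,
 ext_size, valid_bits, channel_mask, sub_format) = struct.unpack(
    "<HHIIHHHHI16s", fmt_body)

# The first two bytes of the SubFormat GUID carry the real format code.
sub_code = struct.unpack_from("<H", sub_format)[0]

print(f"format tag      : 0x{format_tag:04X}")
print(f"channels        : {channels}")
print(f"sample rate     : {sample_rate}")
print(f"bits per sample : {bits_per_sample}")
print(f"extension size  : {ext_size} bytes")
print(f"sub-format code : {sub_code}")
```

The format tag comes out as 0xFFFE and the extension size as 22, while the sub-format code is 1, so the samples themselves are still plain PCM.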
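For a detailed introduction to the WAV header, refer to this link. The article is well written; thanks to its author.
Since this is a non-PCM format WAV, it is fine for the "fmt " chunk to occupy 40 bytes: the 16 standard bytes, the 2-byte extension size field, and the 22-byte extension. It is followed by a "fact" chunk and then the "data" chunk. The 36 bytes counted above are therefore the 2-byte size field, the 22-byte extension, and the 12-byte "fact" chunk, so everything makes sense.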
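The chunk layout can also be confirmed by walking the chunks. This is only a sketch: it works on the first 80 bytes typed in from the xxd dump rather than on the real file, and the names `header` and `chunks` are mine.

```python
import struct

# First 80 bytes of post.wav, copied by hand from the xxd dump.
header = bytes.fromhex("""
    5249 4646 9856 0300 5741 5645 666d 7420
    2800 0000 feff 0100 80bb 0000 8032 0200
    0300 1800 1600 1800 0400 0000 0100 0000
    0000 1000 8000 00aa 0038 9b71 6661 6374
    0400 0000 c51c 0100 6461 7461 4f56 0300
""")

chunks = []
pos = 12  # skip "RIFF", the RIFF size and "WAVE"
while pos + 8 <= len(header):
    # Every chunk starts with a 4-byte ID and a 4-byte little-endian size.
    chunk_id, size = struct.unpack_from("<4sI", header, pos)
    chunks.append((chunk_id.decode("ascii"), pos, size))
    if chunk_id == b"data":
        break  # the audio samples follow; nothing more to walk in the dump
    pos += 8 + size + (size & 1)  # chunk bodies are padded to an even length

for name, offset, size in chunks:
    print(f"{name!r} at offset 0x{offset:02X}, body size {size} bytes")

# The "fact" body holds the number of samples per channel.
fact_offset = chunks[1][1]
sample_count = struct.unpack_from("<I", header, fact_offset + 8)[0]
print("samples in fact chunk:", sample_count)
```

With these bytes the walk finds "fmt " (40 bytes), "fact" (4 bytes) and "data" (218703 bytes, which is 72901 samples times 3 bytes), matching what soxi reports.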