The md5sum of the same file can differ after gzip compression

Yesterday I got the latest dataset from the collaborator, and some of its files were already included in the previous version. However, “md5sum -c md5sum.text” failed on them, which really puzzled me. When I checked further, I found no differences at all between the decompressed files, so the compression step alone had changed the MD5!!!
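To reproduce the puzzle without the real data, here is a minimal Python sketch (it assumes Python 3.8+, which added the `mtime` argument to `gzip.compress`). The payload and the two timestamps are made up; they just stand in for "the same file, gzipped on two different days". MD5 is taken both of the compressed bytes (what `md5sum` sees on a `.gz` file) and of the decompressed content.

```python
import gzip
import hashlib

def md5_hex(data: bytes) -> str:
    """Hex MD5 digest, in the same format md5sum prints."""
    return hashlib.md5(data).hexdigest()

payload = b"ACGT" * 1000  # hypothetical file content

# Same payload compressed twice; only the header timestamp (mtime) differs.
old_gz = gzip.compress(payload, mtime=1_600_000_000)
new_gz = gzip.compress(payload, mtime=1_700_000_000)

print("compressed:  ", md5_hex(old_gz), md5_hex(new_gz))
print("decompressed:", md5_hex(gzip.decompress(old_gz)), md5_hex(gzip.decompress(new_gz)))
```

The two compressed digests disagree while the two decompressed digests agree, which is exactly the symptom above.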

An email from Ray gave me the hint. I found the following lines on the Wikipedia page for gzip, and now I can finally have sweet dreams tonight~

“gzip” is often also used to refer to the gzip file format, which is:

  • a 10-byte header, containing a magic number, a version number and a timestamp
  • optional extra headers, such as the original file name,
  • a body, containing a DEFLATE-compressed payload
  • an 8-byte footer, containing a CRC-32 checksum and the length of the original uncompressed data
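So the timestamp (and, if present, the original file name) lives in the header, and changing either one changes the bytes that md5sum hashes even though the DEFLATE payload and the footer's CRC-32 stay the same. A minimal Python sketch that reads those header fields, following the layout in RFC 1952 (magic, method, flags, 4-byte little-endian MTIME, XFL, OS, then optional fields). The file names `sample.fq` / `sample_v2.fq` and the timestamps are hypothetical.

```python
import gzip
import io
import struct

FEXTRA = 0x04  # flag bits defined in RFC 1952
FNAME = 0x08

def gzip_header_info(blob: bytes) -> dict:
    """Decode the fixed 10-byte gzip header plus the optional FNAME field."""
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<2sBBIBB", blob[:10])
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    pos = 10
    if flags & FEXTRA:  # skip the optional extra field: 2-byte length + data
        (xlen,) = struct.unpack("<H", blob[pos:pos + 2])
        pos += 2 + xlen
    name = None
    if flags & FNAME:  # zero-terminated original file name (Latin-1)
        end = blob.index(b"\x00", pos)
        name = blob[pos:end].decode("latin-1")
    return {"method": method, "mtime": mtime, "name": name}

def gzip_bytes(payload: bytes, name: str, mtime: int) -> bytes:
    """Compress in memory, recording the given file name and timestamp in the header."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=name, mode="wb", fileobj=buf, mtime=mtime) as gz:
        gz.write(payload)
    return buf.getvalue()

payload = b"ACGT" * 1000  # hypothetical file content
a = gzip_bytes(payload, "sample.fq", mtime=1_600_000_000)
b = gzip_bytes(payload, "sample_v2.fq", mtime=1_700_000_000)

print(gzip_header_info(a))
print(gzip_header_info(b))
```

Two practical ways around it: checksum the decompressed content instead (e.g. `zcat file.gz | md5sum`), or compress with GNU gzip's `-n` option, which leaves out the original name and timestamp. Different gzip versions or compression levels may still produce different bytes, so checking the decompressed data is the safer habit.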