Reading a GZIP file in Java
Classes are provided in the java.util.zip package to read and write files in GZIP format.
The GZIP format, not to be confused with ZIP, is a popular file format used on UNIX systems to
compress a single file. The underlying compression mechanism is the DEFLATE
algorithm. Beyond that, the file format is simple, consisting of a header, the compressed
data and a trailer that includes a CRC-32 checksum of the decompressed data. Note that, unlike ZIP files,
a GZIP file per se has no concept of subfiles or individual "entries" in the archive; it is
a single compressed stream of data. (In practice, it is common to GZIP another
file which is a concatenation of different subfiles, such as a .tar file.)
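The header and trailer layout described above can be checked directly. The following is a minimal sketch: the class name GzipLayout and its helper methods are made up for illustration, and it assumes two details of the GZIP specification (RFC 1952), namely that the header begins with the bytes 0x1f 0x8b and that the final eight bytes hold the CRC-32 followed by the uncompressed length modulo 2^32, both stored little-endian.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only
class GzipLayout {

    /** Compresses the given bytes into an in-memory GZIP stream. */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    /** True if the data starts with the GZIP magic bytes 0x1f 0x8b. */
    static boolean hasGzipMagic(byte[] gz) {
        return gz.length >= 2 && (gz[0] & 0xff) == 0x1f && (gz[1] & 0xff) == 0x8b;
    }

    /** Reads the CRC-32 stored in the trailer (starts 8 bytes from the end). */
    static long trailerCrc(byte[] gz) {
        return readUnsignedIntLE(gz, gz.length - 8);
    }

    /** Reads the uncompressed length (modulo 2^32) stored in the last 4 bytes. */
    static long trailerLength(byte[] gz) {
        return readUnsignedIntLE(gz, gz.length - 4);
    }

    /** Computes the CRC-32 of the uncompressed data for comparison. */
    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    // Reads a 4-byte little-endian value as an unsigned number
    private static long readUnsignedIntLE(byte[] b, int offset) {
        return ByteBuffer.wrap(b, offset, 4).order(ByteOrder.LITTLE_ENDIAN).getInt() & 0xffffffffL;
    }
}
```

For a stream produced by gzip(data), trailerCrc should equal crcOf(data) and trailerLength should equal data.length. This is what GZIPInputStream verifies for us when it reaches the end of the stream.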
To read the decompressed data from a GZIP file, we construct a GZIPInputStream around
the corresponding FileInputStream:
InputStream in = new GZIPInputStream(new FileInputStream(f));
// ... read decompressed data from 'in' as usual
(Of course, the data needn't actually be in a file. We could pass in any old InputStream:
for example, the raw GZIP data could be cached in a byte array and read from a ByteArrayInputStream.)
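As a sketch of the byte array case just mentioned, the example below compresses some data in memory and then reads it back through a ByteArrayInputStream. The class name InMemoryGzip and its method names are hypothetical, chosen only for this illustration, and the 8192-byte buffer size is an arbitrary choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only
class InMemoryGzip {

    /** Produces GZIP-compressed bytes so that we have something to read back. */
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    /** Decompresses GZIP data held in a byte array rather than in a file. */
    static byte[] decompress(byte[] gzipped) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int noRead;
            // read() returns -1 at the end of the compressed stream
            while ((noRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, noRead);
            }
            return out.toByteArray();
        }
    }
}
```

Calling decompress(compress(bytes)) should return an array equal to the original bytes. The only difference from the file case is the stream passed to the GZIPInputStream constructor.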
Decompressing to a file
Reading the data from a GZIP file and writing the decompressed data to another
file is straightforward. We repeatedly read a block of decompressed data into a buffer
and write the contents of the buffer to the output file each time:
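import java.io.*;
import java.util.zip.*;

public class Gunzipper implements Closeable {
    private InputStream in;

    public Gunzipper(File f) throws IOException {
        this.in = new FileInputStream(f);
    }

    /** Decompresses the GZIP data to the given file. Call once per instance. */
    public void unzip(File fileTo) throws IOException {
        // try-with-resources closes 'out' and reports any failure to close it
        try (OutputStream out = new FileOutputStream(fileTo)) {
            // Wrap the raw stream; closing the wrapper later also closes the file
            in = new GZIPInputStream(in);
            byte[] buffer = new byte[65536];
            int noRead;
            // read() returns -1 once the end of the compressed data is reached
            while ((noRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, noRead);
            }
        }
    }

    public void close() {
        // Nothing useful can be done if closing the input fails, so ignore it
        try { in.close(); } catch (IOException e) {}
    }
}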
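For comparison, the same decompression can be written more compactly with try-with-resources and the java.nio.file API. This is a sketch of an alternative to the class above: the name GunzipToFile is hypothetical, and it assumes we are happy to overwrite the target file if it already exists.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, for illustration only
class GunzipToFile {

    /**
     * Decompresses the GZIP file 'source' into 'target', replacing any
     * existing file, and returns the number of decompressed bytes written.
     */
    static long gunzip(Path source, Path target) throws IOException {
        // The stream is closed automatically, even if the copy fails part way
        try (InputStream in = new GZIPInputStream(Files.newInputStream(source))) {
            // Files.copy performs the read-into-buffer, write-to-file loop for us
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Here Files.copy takes the place of the explicit buffer loop, which removes the need for a separate close() method. The explicit loop remains useful when we want to process the data as it is read, for example to report progress.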
Reading from a ZIP file
The GZIP file format is particularly common on UNIX systems. On other systems
such as Windows, the ZIP file format is more common. The ZIP format is also used
for Java archive (jar) files. On the next page,
we look at how to read ZIP files in Java.
Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.