View on GitHub

Ann Molly Paul

learn -- code -- inspire

Unzip a file in-memory, add content, and zip it back up

May 6, 2014

As a part of making the deployment process easy, I was involved in a project where our software needed to be prefilled with authentication credentials per customer basis. All this functionality had to be implemented at the view level which required the file to be unzipped into a memory buffer, modified and zipped back. This needed 2 iterations

Python has a very useful zipfile library for this purpose to read/write zipped files in memory.

zipped_file # This contains the zipped file
file_content_buffer = StringIO() # in_memory buffer
config_filepath = '/sample.conf'
with ZipFile(zipped_file, 'r') as zipread:
    with ZipFile(file_content_buffer, 'w', ZIP_DEFLATED) as zipwrite:
        for item in zipread.infolist():
            if item.filename.endswith(config_filepath):
                credentials = 'xxxxx-top-secret'
                zipwrite.writestr(item.filename, credentials)
            else:
                # copy the data over as is
                data = zipread.read(item.filename)
                zipwrite.writestr(item.filename, data)
file_content_buffer.seek(0)

response = HttpResponse(file_content_buffer.getvalue())
response['Content-Disposition'] = ('attachment; 
                                  filename=custom_zipped_file.zip')
response['Content-Type'] = 'application/x-zip'
return response

This worked well until I tested it out, only to find out that several of the symlinks within the zipped file dont get copied over. They appear as broken files with 0kb size. After a lot of research, it looks like the file properites dont get preserved hence rendering the file broken. This also manifests in the created timestamp where it's set to epoch 0. So, the following changes had to keep the file in tact.

with ZipFile(zipped_file, 'r') as zipread:
    with ZipFile(file_content_buffer, 'w', ZIP_DEFLATED) as zipwrite:
        for item in zipread.infolist():
            # Copy all ZipInfo attributes for each file since by
            # default the properties are not preserved.
            dest = ZipInfo(item.filename)
            dest.CRC = item.CRC
            dest.date_time = item.date_time #sets the date modified correctly
            dest.create_system = item.create_system
            dest.compress_type = item.compress_type
            dest.external_attr = item.external_attr # preserves symlinks
            dest.compress_size = item.compress_size
            dest.file_size = item.file_size
            dest.header_offset = item.header_offset
            if item.filename.endswith(config_filepath):
                # Sets up config file with valid credentials
                zipwrite.writestr(dest, credentials)
            else:
                data = zipread.read(item.filename)
                zipwrite.writestr(dest, data)
file_content_buffer.seek(0)