Python: how to zip a text string

To reduce the size of a long text string, for example to minimize traffic when sending text data over the Internet, we can compress it before sending and decompress it after receiving. For natural-language text, the transmitted data is usually noticeably smaller than the original string.

To compress (“zip”) a text string in memory, we can use the “zlib” module from the standard library.

Let’s use the “compress” function to compress the string. This function takes a byte string as an input parameter and returns the compressed byte string.

import zlib

long_text = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.'

print('long_text', len(long_text))

long_text_compressed = zlib.compress(long_text.encode('utf-8'))

print('long_text_compressed', len(long_text_compressed))

# long_text 445
# long_text_compressed 270

As this simple example shows, the string shrank by a factor of more than 1.5 (from 445 to 270 bytes).
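The ratio also depends on the compression level. As a minimal sketch, “zlib.compress” accepts an optional second argument from 0 (no compression) to 9 (best compression, slowest); the loop and the “sizes” dictionary below are only illustrative, and the exact byte counts may vary with the zlib build.

```python
import zlib

# Same sample text as above, encoded to bytes before compression.
data = (
    'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod '
    'tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim '
    'veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea '
    'commodo consequat. Duis aute irure dolor in reprehenderit in voluptate '
    'velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint '
    'occaecat cupidatat non proident, sunt in culpa qui officia deserunt '
    'mollit anim id est laborum.'
).encode('utf-8')

sizes = {}
for level in (0, 1, 6, 9):
    compressed = zlib.compress(data, level)
    # Every level must decompress back to the same bytes.
    assert zlib.decompress(compressed) == data
    sizes[level] = len(compressed)
    print('level', level, '->', sizes[level], 'bytes')
```

Level 0 only wraps the data in the zlib container, so its output is slightly larger than the input; higher levels trade CPU time for size.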

Since “zlib.compress” takes a byte string as its input argument, we first encoded the source text into UTF-8 bytes with the “encode” method.
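A short sketch of why the encoding step matters: passing a “str” directly raises “TypeError”, and for non-ASCII text the number of characters differs from the number of bytes actually sent. The sample string and the “str_accepted” flag are chosen here only for illustration.

```python
import zlib

text = 'Привет, мир'  # non-ASCII sample text, used only for illustration

# zlib works on bytes-like objects, so a str is rejected.
try:
    zlib.compress(text)
    str_accepted = True
except TypeError:
    str_accepted = False

data = text.encode('utf-8')

# Compare byte counts, not character counts, when judging the savings.
print('str accepted:', str_accepted)
print('characters:', len(text))
print('utf-8 bytes:', len(data))
print('compressed bytes:', len(zlib.compress(data)))
```

When measuring how much compression saves, it is therefore safer to compare the compressed size with the length of the encoded bytes rather than with “len” of the original string.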

The result of compression is also a byte string. If the data transfer protocol requires text, we can encode the result in Base64. Note that “base64.b64encode” itself returns ASCII-only bytes; calling “decode('ascii')” on them yields a regular text string.

import base64

long_text_compressed_b64 = base64.b64encode(long_text_compressed)

print('long_text_compressed_b64', len(long_text_compressed_b64))

# long_text_compressed_b64 360

As we can see, the size has increased, because Base64 emits 4 output characters for every 3 input bytes, but it still remains smaller than the original text string. With longer texts, the gap between the compressed and uncompressed sizes typically widens.
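To see how the savings change with length, here is a rough sketch. The helper “compress_to_text” is a hypothetical name introduced for this example, and the repeated sample sentence is deliberately artificial, so real text will usually compress less well.

```python
import base64
import zlib

def compress_to_text(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, compress, Base64-encode to str."""
    compressed = zlib.compress(text.encode('utf-8'))
    return base64.b64encode(compressed).decode('ascii')

# Artificially repetitive sample; chosen only to make the trend visible.
sample = 'The quick brown fox jumps over the lazy dog. '

results = {}
for repeats in (1, 10, 100):
    text = sample * repeats
    packed = compress_to_text(text)
    results[repeats] = (len(text), len(packed))
    print('repeats', repeats, 'original', len(text), 'packed', len(packed))

# Base64 output length is 4 * ceil(n / 3) for n input bytes.
n = len(zlib.compress(sample.encode('utf-8')))
assert len(compress_to_text(sample)) == 4 * ((n + 2) // 3)
```

For a single short sentence the packed form comes out longer than the original, since the zlib header and the Base64 overhead outweigh the savings; the benefit appears only once the text is long enough.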

To get the original text back from the compressed form, we need to Base64-decode it and then decompress it. Let’s do the reverse operation:

decoded_b64_text = base64.b64decode(long_text_compressed_b64)
decompressed_text = zlib.decompress(decoded_b64_text).decode('utf-8')

print(decompressed_text)

# Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
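The two directions can be wrapped into a pair of functions. This is a minimal sketch, not a definitive implementation: the names “pack” and “unpack” are hypothetical, and turning every decoding failure into “ValueError” is just one possible design choice for handling corrupted input.

```python
import base64
import binascii
import zlib

def pack(text: str) -> str:
    """Compress a text string and return it as Base64 text."""
    compressed = zlib.compress(text.encode('utf-8'))
    return base64.b64encode(compressed).decode('ascii')

def unpack(packed: str) -> str:
    """Reverse pack(); raise ValueError if the input is not valid."""
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        raw = base64.b64decode(packed, validate=True)
        return zlib.decompress(raw).decode('utf-8')
    except (binascii.Error, zlib.error, UnicodeDecodeError) as exc:
        raise ValueError('not a valid compressed string') from exc

message = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.'
restored = unpack(pack(message))
print(restored == message)

# Corrupted or foreign input is reported instead of crashing later.
try:
    unpack('not base64!!')
    rejected = False
except ValueError:
    rejected = True
print('invalid input rejected:', rejected)
```

Catching “zlib.error” matters on the receiving side, because data that is valid Base64 may still not be a zlib stream.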
