To reduce the size of a long text string, for example to minimize traffic when sending text data over the Internet, we can compress it before sending and decompress it after receiving. For typical natural-language text, the transmitted data is noticeably smaller than the original string.
To compress a text string in memory, we can use the “zlib” module from the Python standard library.
Let’s use the “compress” function to compress the string. This function takes a byte string as an input parameter and returns the compressed data, also as a byte string.
```python
import zlib

long_text = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.'

print('long_text', len(long_text))

long_text_compressed = zlib.compress(long_text.encode('utf-8'))

print('long_text_compressed', len(long_text_compressed))

# long_text 445
# long_text_compressed 270
```
As this simple example shows, the string shrank from 445 to 270 bytes, a reduction of more than one and a half times.
Since “zlib.compress” takes a byte string as an input argument, we pre-encoded the source text into a set of UTF-8 bytes with the “encode” function.
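The encoding step also matters when measuring sizes: “len” on a text string counts characters, while “len” on a byte string counts bytes. For the ASCII-only sample text the two numbers are equal, but for other texts they can differ. A minimal sketch, where the sample string is an assumption chosen only for illustration:

```python
# Illustrative sample (not from the article): two of its characters
# are outside ASCII and take two bytes each in UTF-8.
text = "na\u00efve caf\u00e9"  # "naïve café"

char_count = len(text)                  # number of characters
byte_count = len(text.encode("utf-8"))  # number of bytes after encoding

print("characters:", char_count)
print("bytes:", byte_count)
```

When comparing original and compressed sizes, it is therefore safer to compare the encoded byte length with the compressed byte length.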
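The result of compression is also a byte string, and it generally contains arbitrary binary data. If the data transfer protocol only allows text, we can encode the result in base64. Note that “base64.b64encode” itself returns a byte string, although one that contains only ASCII characters; if a text string is required, it can be converted with “decode('ascii')”.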
```python
import base64

long_text_compressed_b64 = base64.b64encode(long_text_compressed)

print('long_text_compressed_b64', len(long_text_compressed_b64))

# long_text_compressed_b64 360
```
As we can see, base64 encoding increased the size by about a third (every 3 bytes become 4 characters), but the result is still smaller than the original text string. With longer texts, the gain from compression usually grows.
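The effect of text length can be checked with a small experiment. The sketch below is only an illustration: the helper name “transfer_size” is hypothetical, and the test text is built by repeating one sentence, which compresses far better than real prose would.

```python
import base64
import zlib

# One sentence repeated several times; artificially repetitive input,
# used here only to make the trend easy to see.
sentence = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "


def transfer_size(text):
    """Return (raw byte size, size after zlib + base64) for a text string."""
    raw = text.encode("utf-8")
    encoded = base64.b64encode(zlib.compress(raw))
    return len(raw), len(encoded)


for repeats in (1, 10, 100):
    raw_size, encoded_size = transfer_size(sentence * repeats)
    print(f"repeats={repeats}: raw={raw_size} bytes, compressed+base64={encoded_size} bytes")
```

For very short strings the compressed and encoded form may even be larger than the original, because zlib adds a small header and checksum and base64 adds its own overhead, so it is worth measuring on realistic data before relying on compression.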
To get the original text back from the compressed form, we need to decode it from base64 and then decompress it. Let’s do the reverse operation:
```python
# Reverse the steps: base64 -> compressed bytes -> original UTF-8 text
decoded_b64_text = base64.b64decode(long_text_compressed_b64)

uncompressed_text = zlib.decompress(decoded_b64_text).decode('utf-8')

print(uncompressed_text)

# Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
```
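The whole round trip can be wrapped in a pair of helper functions. This is a minimal sketch rather than a definitive implementation: the names “pack_text” and “unpack_text” and the sample message are assumptions introduced here for illustration.

```python
import base64
import zlib


def pack_text(text: str) -> str:
    """Compress a text string and return it as a base64 text string."""
    compressed = zlib.compress(text.encode("utf-8"))
    # b64encode returns ASCII-only bytes; decode them to get a str
    return base64.b64encode(compressed).decode("ascii")


def unpack_text(packed: str) -> str:
    """Reverse pack_text: base64-decode, decompress, and decode UTF-8."""
    compressed = base64.b64decode(packed)
    return zlib.decompress(compressed).decode("utf-8")


# Hypothetical sample message, repeated so that compression has an effect
message = "Hello, zlib! " * 50
packed = pack_text(message)
restored = unpack_text(packed)

print(len(message), len(packed))
print(restored == message)
```

If the received data may be damaged or may not be in this format at all, the decoding step can raise “binascii.Error” and the decompression step can raise “zlib.error”, so the receiving side may want to catch those exceptions.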