I have a binary dataset of known size that arrives in fixed-size chunks. The chunks arrive out of order, but each chunk's position in the final result is known when I receive it. Here is a simple example:
from random import sample, seed
import numpy as np

chunk_size = 10
chunk_count = 10

def generate_data():
    seed(0xDEADBEEF)
    for i in sample(range(chunk_count), chunk_count):
        yield i, np.arange(i * chunk_size, (i + 1) * chunk_size, dtype=np.uint8)
My goal is to write this data to a file as it arrives:
with open('output.dat', 'wb') as output:
    for i, d in generate_data():
        output.seek(i * chunk_size)
        d.tofile(output)
This seems to work well on my Windows Anaconda Python 3.7 install: it creates a 100-byte file containing the bytes 0-99:
>>> with open('output.dat', 'rb') as f:
...     print(f.read())
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abc'
I expect this to be independent of the Python version, at least as far back as 2.7. I am less sure that it is platform agnostic, but I would expect it to be.
The example above does not show any artifacts in the file because the data is contiguous once the loop terminates. If I introduce a missing block, I see zeros in the file:
def generate_data():
    seed(0xDEADBEEF)
    for i in sample(range(chunk_count + 2), chunk_count):
        yield i, np.arange(i * chunk_size, (i + 1) * chunk_size, dtype=np.uint8)

with open('output.dat', 'wb') as output:
    for i, d in generate_data():
        output.seek(i * chunk_size)
        d.tofile(output)
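The file is 10 bytes larger (110 bytes) because it contains a hole where one chunk is missing. The other missing chunk lies past the last chunk written, so it does not extend the file. All the elements are placed correctly, including the zero-filled hole:
>>> with open('output.dat', 'rb') as f:
...     print(f.read())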
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0023456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklm'
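Zeros in the file are not a reliable marker for a hole on their own (chunk 0 legitimately starts with a zero byte), so one way to know where the holes are is to record the chunk indices as they are written. A minimal sketch, assuming a hard-coded arrival order in place of generate_data() and plain bytes in place of the NumPy arrays; the names arrivals, make_chunk, holes and never_reached are mine, for illustration only:

```python
import os
import tempfile

chunk_size = 10
chunk_count = 12  # total slots, including the ones that never arrive

# Hypothetical arrival order standing in for generate_data():
# chunks 4 and 11 never arrive.
arrivals = [7, 2, 9, 0, 5, 10, 1, 8, 3, 6]

def make_chunk(i):
    # Same byte values as the np.arange(...) chunks, built with the stdlib only.
    return bytes(range(i * chunk_size, (i + 1) * chunk_size))

path = os.path.join(tempfile.mkdtemp(), 'output.dat')
received = set()
with open(path, 'wb') as output:
    for i in arrivals:
        output.seek(i * chunk_size)
        output.write(make_chunk(i))
        received.add(i)

file_size = os.path.getsize(path)
written_slots = file_size // chunk_size

# Missing chunks inside the file are holes; missing chunks beyond the
# last written chunk never extended the file at all.
holes = [i for i in range(written_slots) if i not in received]
never_reached = [i for i in range(written_slots, chunk_count) if i not in received]
print(file_size, holes, never_reached)
```

This only tells me where the holes are; it says nothing about what bytes the OS puts in them.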
Is zero-fill consistent (documented) behavior I can rely on? Is the behavior of the holes OS-specific (as this question implies)? I have not been able to find anything Python-specific regarding a write following a seek past the current end-of-file.
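If the implicit fill turns out not to be something I can rely on, a fallback I can think of is to write the fill bytes myself before placing any chunks, since the total size is known up front. A minimal sketch under the same assumptions as my example (hard-coded arrival order, plain bytes instead of NumPy arrays); prefill_and_write and its fill parameter are hypothetical names, and 0xFF is used only to make it obvious that the fill comes from my code rather than from the OS:

```python
import os
import tempfile

chunk_size = 10
chunk_count = 12
total_size = chunk_size * chunk_count  # known in advance

# Hypothetical arrival order; chunks 4 and 11 never arrive.
arrivals = [7, 2, 9, 0, 5, 10, 1, 8, 3, 6]

def prefill_and_write(path, arrivals, fill=b'\x00'):
    with open(path, 'wb') as output:
        # Write the fill explicitly so the content of any gap does not
        # depend on what the OS does for a seek past end-of-file.
        output.write(fill * total_size)
        for i in arrivals:
            output.seek(i * chunk_size)
            output.write(bytes(range(i * chunk_size, (i + 1) * chunk_size)))

path = os.path.join(tempfile.mkdtemp(), 'prefilled.dat')
prefill_and_write(path, arrivals, fill=b'\xff')

with open(path, 'rb') as f:
    data = f.read()
print(len(data), data[40:50], data[110:120])
```

Unlike my original loop, this always produces a file of the full known size, and it costs an extra pass over the file. For a large file the fill would need to be written in blocks rather than as one bytes object.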