I'm trying to use a very large numpy array using numpy memmap, accessing each element as a ctypes Structure.
import ctypes
from ctypes import Structure, c_uint32

import numpy as np

class My_Structure(Structure):
    # Seven bitfields sharing one 32-bit word (3+2+2+9+12+2+2 = 32 bits)
    _fields_ = [('field1', c_uint32, 3),
                ('field2', c_uint32, 2),
                ('field3', c_uint32, 2),
                ('field4', c_uint32, 9),
                ('field5', c_uint32, 12),
                ('field6', c_uint32, 2),
                ('field7', c_uint32, 2)]

    def __str__(self):
        return f'MyStruct -- f1{self.field1} f2{self.field2} f3{self.field3} f4{self.field4} f5{self.field5} f6{self.field6} f7{self.field7}'

    def __eq__(self, other):
        # Two structures are equal when every bitfield matches
        for field in self._fields_:
            if getattr(self, field[0]) != getattr(other, field[0]):
                return False
        return True

# Disk-backed array of 1,000,000 uint32 values
_big_array = np.memmap(filename='big_file.data',
                       dtype='uint32',
                       mode='w+',
                       shape=1000000)

# View the same memory through a ctypes pointer to My_Structure
big_array = _big_array.ctypes.data_as(ctypes.POINTER(My_Structure))
big_array[0].field1 = 5
...
And it seems to work correctly, but I'm getting a fault on a 64-bit Windows machine where python.exe simply stops. In Event Viewer, I see that the faulting module name is _ctypes.pyd and the exception code is 0xc0000005, which I believe is an access violation.
I don't seem to be getting the same error on Linux, though my testing has not been thorough.
My questions are:
1. Does my access look correct, i.e. am I using numpy.memmap.ctypes.data_as correctly?
2. Does the fact that I have methods (__str__ and __eq__) defined on My_Structure change its size, i.e. can it still be used in the array as a uint32?
3. Is there anything that you think might cause this behavior, particularly considering the differences between Windows and Linux?
EDIT:
Using ctypes.addressof and ctypes.sizeof on big_array elements, it looks like __str__ and __eq__ do not impact the size of My_Structure.
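For anyone who wants to reproduce that size check, here is a minimal sketch. PlainStruct and StructWithMethods are hypothetical names introduced only for the comparison; the field list is the one from the question. Since the seven bitfields add up to exactly 32 bits, both classes should report the size of a single c_uint32.

```python
import ctypes
from ctypes import Structure, c_uint32

# Same bitfield layout as My_Structure in the question
FIELDS = [('field1', c_uint32, 3), ('field2', c_uint32, 2),
          ('field3', c_uint32, 2), ('field4', c_uint32, 9),
          ('field5', c_uint32, 12), ('field6', c_uint32, 2),
          ('field7', c_uint32, 2)]


class PlainStruct(Structure):
    """Hypothetical comparison class: fields only, no methods."""
    _fields_ = FIELDS


class StructWithMethods(Structure):
    """Hypothetical comparison class: same fields plus Python methods."""
    _fields_ = FIELDS

    def __str__(self):
        return 'StructWithMethods'

    def __eq__(self, other):
        return all(getattr(self, name) == getattr(other, name)
                   for name, *_ in self._fields_)


# Methods live on the Python class, not in the C memory layout
plain_size = ctypes.sizeof(PlainStruct)
methods_size = ctypes.sizeof(StructWithMethods)
print(plain_size, methods_size, ctypes.sizeof(c_uint32))
```

The methods are stored on the Python class object, so only _fields_ determines how many bytes each element occupies.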
I added some asserts before my access to big_array and found that I was attempting to access big_array[-1], which explains the access error and crash.
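A ctypes pointer does no bounds checking and does not wrap negative indices, so big_array[-1] reads the memory just before the start of the buffer. One way to guard the raw pointer is a small helper like the sketch below. normalize_index is a hypothetical name, not part of ctypes or NumPy; it only mimics NumPy's index rules.

```python
def normalize_index(index, length):
    """Map a NumPy-style index to a non-negative offset.

    Raises IndexError when the index falls outside [-length, length).
    """
    if not -length <= index < length:
        raise IndexError(
            f'index {index} is out of bounds for length {length}')
    # Negative indices count back from the end, as they do in NumPy
    return index + length if index < 0 else index
```

With the names from the question, an access would then look like big_array[normalize_index(i, len(_big_array))].field1 = 5.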
Which leaves question 1 open: It looks like my code is technically correct, but I'm wondering if there is a better way to access the numpy array than using a ctypes pointer, so that I still get the benefits of using a numpy array (out-of-bounds access errors, negative index wrapping, etc.). Daniel below suggested using a structured numpy array, but is it possible to do bitfield access with this?
You can cast to ctypes at the last step, not the first step:
_big_array[0, ...].ctypes.data_as(ctypes.POINTER(My_Structure)).contents.field1 = 5
Note that the ... is needed to keep the result as a 0-d array rather than a NumPy scalar, so that the .ctypes attribute exists. The pointer returned by data_as has to be dereferenced with .contents to reach the structure's fields.
Now of course, negative indexing will work just fine:
_big_array[-1, ...].ctypes.data_as(ctypes.POINTER(My_Structure)).contents.field1 = 5
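If you do this in many places, the cast can be wrapped in a small helper. This is only a sketch: struct_at and the two-field Flags structure are hypothetical stand-ins, and a plain in-memory array is used instead of the memmap so the example runs without creating a file. NumPy performs the index check before any pointer is created, and the pointer is dereferenced with .contents.

```python
import ctypes

import numpy as np


class Flags(ctypes.Structure):
    """Hypothetical 32-bit bitfield structure standing in for My_Structure."""
    _fields_ = [('low', ctypes.c_uint32, 3),
                ('high', ctypes.c_uint32, 29)]


def struct_at(array, index, struct_type):
    """Return a struct_type view of array[index].

    NumPy validates the index first, so an out-of-range index raises
    IndexError and a negative index wraps around.
    """
    element = array[index, ...]  # 0-d array view, not a NumPy scalar
    pointer = element.ctypes.data_as(ctypes.POINTER(struct_type))
    return pointer.contents      # structure sharing the array's memory


data = np.zeros(4, dtype=np.uint32)
struct_at(data, -1, Flags).low = 5  # writes into the last element
```

Keep the underlying array alive for as long as you hold on to the returned structure, since the structure only refers to the array's memory and does not copy it.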
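As for "Daniel below suggested using a structured numpy array, but is it possible to do bitfield access with this?":
No. NumPy structured dtypes have no bitfield support, so individual fields would have to be extracted from the uint32 values with masks and shifts.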
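If you want to stay in NumPy entirely, the fields can be read and written with masks and shifts on the uint32 values. The sketch below is an assumption-laden illustration, not something NumPy provides: get_field, set_field, and field_layout are hypothetical helpers, and the offsets assume the first declared field sits in the least significant bits, which is what ctypes does on typical little-endian platforms but is worth verifying on yours.

```python
import numpy as np

# (name, width in bits) in declaration order, taken from My_Structure.
# Assumption: fields are allocated starting from the least significant bit.
FIELD_WIDTHS = [('field1', 3), ('field2', 2), ('field3', 2), ('field4', 9),
                ('field5', 12), ('field6', 2), ('field7', 2)]


def field_layout(widths):
    """Build {name: (bit offset, unshifted mask)} from the widths."""
    layout, offset = {}, 0
    for name, width in widths:
        layout[name] = (offset, (1 << width) - 1)
        offset += width
    return layout


LAYOUT = field_layout(FIELD_WIDTHS)


def get_field(words, name):
    """Extract one field from every uint32 in words."""
    shift, mask = LAYOUT[name]
    return (words >> np.uint32(shift)) & np.uint32(mask)


def set_field(words, name, value):
    """Store value into one field of every uint32 in words, in place."""
    shift, mask = LAYOUT[name]
    cleared = words & ~np.uint32(mask << shift)
    new_bits = (np.uint32(value) & np.uint32(mask)) << np.uint32(shift)
    words[...] = cleared | new_bits


data = np.zeros(4, dtype=np.uint32)
set_field(data[-1:], 'field1', 5)     # NumPy handles the negative index
set_field(data[-1:], 'field5', 1000)
```

Passing a slice such as data[-1:] updates a single element through a view, while passing the whole array updates the field in every element at once.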