Pyarrow write nested array to parquet


I want to write a Parquet file that has some ordinary columns holding 1-D array data and some columns with nested structure, i.e. 2-D arrays.

I have tried the following:

import pyarrow as pa
import pyarrow.parquet as pq
import numpy as np

array1 = np.array([0, 1, 2], dtype=np.uint8)
array2 = np.array([[0,1,2], [3, 4, 5]], dtype=np.uint8).T

t1 = pa.uint8()
t2 = pa.list_(pa.uint8())

fields = [
    pa.field('a1', t1),
    pa.field('a2', t2)
]

myschema = pa.schema(fields)

mytable = pa.Table.from_arrays([
    pa.array(array1, type=t1),
    pa.array([array2[:,0], array2[:,1]], type=t2)],
    schema=myschema)

pq.write_table(mytable, 'example.parquet')

The table creation works as expected. The last line is where the issue lies: it causes the Python interpreter to crash.

On Windows, with Python 3.6.4 64-bit and pyarrow 0.11.1 (EDIT: version added), I get the exit code:

Process finished with exit code -1073741819 (0xC0000005)

I have also tried in the Windows Subsystem for Linux (WSL), using a separate install of Python 3.6.5 64-bit and pyarrow 0.12.1 (EDIT: version added), and I get:

Segmentation fault (core dumped)

I have seen this post suggesting reinstalling Python, but since I have already tried two different installs I don't think that will help.

I can't see anything in the PyArrow docs to suggest that writing nested arrays to Parquet doesn't work. I know there are issues with this in fastparquet.

python
parquet
pyarrow
asked on Stack Overflow Mar 4, 2019 by S.B.G • edited Mar 5, 2019 by S.B.G

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0