This repository has been archived by the owner on Aug 13, 2020. It is now read-only.

Plan to support recursive data structures? #6

Open

MichaelChirico opened this issue Sep 24, 2019 · 6 comments

Comments

@MichaelChirico

A lot of my common use cases store map & array data types. It would be great to have support to read such parquet with miniparquet.

Is this out of scope?
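
For concreteness, here is a minimal sketch of the call I have in mind, assuming miniparquet keeps a single parquet_read()-style entry point; the file name and the list-column representation below are purely illustrative:

library(miniparquet)

# hypothetical file containing array<double> and map<string,double> columns
df = parquet_read("nested.parquet")

# one natural R-side representation would be list-columns:
# df$arr -> a list of numeric vectors, one per row
# df$mp  -> a list of named numeric vectors (names = the map keys)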

@hannes
Owner

hannes commented Sep 24, 2019

Are they stored as nested tables or more complex values? Also, can you provide some sample files please?

@MichaelChirico
Author

I'm not sure how to answer about their storage, but the Hive types are array and/or map. Though those types are potentially recursive (and hence arbitrarily complex), I've only used one level of nesting (e.g. array(int) or map(int, varchar)).

Will try and create something & pass along. Any preferred medium?

@hannes
Owner

hannes commented Sep 25, 2019

Medium, e.g. WeTransfer?

@MichaelChirico
Author

Yes, or Dropbox; I could try a gist...

@MichaelChirico
Author

parquet_test.tar.gz

Seems I can upload a tar.gz here! I ran the following in SparkR; the attachment is the compressed output:

# spark start boilerplate
library(SparkR)
library(magrittr)  # for the %>% pipe
sparkR.session()

# copy the built-in iris and replace '.' in column names, which Spark SQL handles poorly
iris = iris
names(iris) = gsub('.', '_', names(iris), fixed = TRUE)
irisSDF = createDataFrame(iris)
irisSDF %>% createOrReplaceTempView('iris')

# one column per primitive type, plus a map column (mp) and an array column (arr)
sql("
select 1 as int, 'a' as str, 1.1 as dbl,
       timestamp('2019-09-20T12:34:56Z') as ts,
       true as bool, date('2019-09-21') as dt,
       map(Species, Sepal_Length) as mp,
       array(Sepal_Width) as arr
from iris
") %>% write.parquet('/path/to/output')

@hannes
Owner

hannes commented Sep 27, 2019

Thanks, will see what I can do.
