Read xlsx file using 1st row as header even using "open_options=['HEADERS=DISABLE', 'FIELD_TYPES=STRING'])" #323

michaelpopes · 2024-05-26T08:16:42Z

As you can see, there are two sheets in this Excel file. I wish to read all the cells as data, so I use this option "open_options=['HEADERS=DISABLE', 'FIELD_TYPES=STRING'])".

import duckdb

duckdb.install_extension("spatial")
duckdb.load_extension("spatial")

duckdb.sql(f"FROM st_read('DUCKDB_READ_XLSX_BUG.xlsx', layer='2', open_options=['HEADERS=DISABLE', 'FIELD_TYPES=STRING'])").show(max_width=198)
duckdb.sql(f"FROM st_read('DUCKDB_READ_XLSX_BUG.xlsx', layer='3', open_options=['HEADERS=DISABLE', 'FIELD_TYPES=STRING'])").show(max_width=198)

The results are different for the two sheets. The result of sheet "2" diables the header, which is correct. But the result of sheet "3" still uses 1st row as head, which is wrong. I don't see any difference between sheet "2" and sheet "3" to cause the different results.

Python 3.12.3 (tags/v3.12.3:f6650f9, Apr  9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import duckdb
>>> duckdb.sql(f"SELECT VERSION()").show(max_width=198)
┌───────────┐
│ version() │
│  varchar  │
├───────────┤
│ v0.10.3   │
└───────────┘

>>>
>>> duckdb.install_extension("spatial")
>>> duckdb.load_extension("spatial")
>>>
>>> duckdb.sql(f"FROM st_read('DUCKDB_READ_XLSX_BUG.xlsx', layer='2', open_options=['HEADERS=DISABLE', 'FIELD_TYPES=STRING'])").show(max_width=198)
┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ Field1  │ Field2  │ Field3  │ Field4  │ Field5  │ Field6  │ Field7  │ Field8  │ Field9  │ Field10 │ Field11 │ Field12 │ Field13 │ Field14 │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ AC      │ BC      │ CC      │ DC      │ EC      │ FC      │ GC      │ HC      │ IC      │ JC      │ KC      │ LC      │ MC      │ NC      │
│ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │
└─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘

>>> duckdb.sql(f"FROM st_read('DUCKDB_READ_XLSX_BUG.xlsx', layer='3', open_options=['HEADERS=DISABLE', 'FIELD_TYPES=STRING'])").show(max_width=198)
┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│   AC    │   BC    │   CC    │   DC    │   EC    │   FC    │   GC    │   HC    │   IC    │   JC    │   KC    │   LC    │   MC    │   NC    │   OC    │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │
│ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │ NULL    │
└─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘

As you can see from the results, I am using python 3.12.3 and Duckdb v0.10.3.

The correct results should be that both sheet "2" and sheet "3" have the same head, like "Field1", "Field2", etc.

Thank you for your consideration!

The text was updated successfully, but these errors were encountered:

iku000888 · 2024-10-18T04:56:27Z

I ran into a very similar issue attempting to read a file with multiple sheets, where the first sheet respects HEADERS=DISABLE but not for the second sheet.

D .version
SQLite v1.1.1 af39bd0dcf
clang-15.0.0

iku000888 · 2024-10-18T05:02:47Z

I seem to be able to work around it by exporting the sheet of interest to a single sheeted xlsx file and then reading the exported xlsx. Not great but I can possibly live with it.

D COPY (select * from st_read('file.xlsx', layer = 'sheet')) TO 'tempoutput.xlsx' WITH (FORMAT GDAL, DRIVER 'xlsx');

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Read xlsx file using 1st row as header even using "open_options=['HEADERS=DISABLE', 'FIELD_TYPES=STRING'])" #323

Read xlsx file using 1st row as header even using "open_options=['HEADERS=DISABLE', 'FIELD_TYPES=STRING'])" #323

michaelpopes commented May 26, 2024 •

edited

Loading

iku000888 commented Oct 18, 2024

iku000888 commented Oct 18, 2024

Read xlsx file using 1st row as header even using "open_options=['HEADERS=DISABLE', 'FIELD_TYPES=STRING'])" #323

Read xlsx file using 1st row as header even using "open_options=['HEADERS=DISABLE', 'FIELD_TYPES=STRING'])" #323

Comments

michaelpopes commented May 26, 2024 • edited Loading

iku000888 commented Oct 18, 2024

iku000888 commented Oct 18, 2024

michaelpopes commented May 26, 2024 •

edited

Loading