I tried to run the NVIDIA Merlin on Microsoft’s News Dataset (MIND) tutorial ...
At Step 5: Feature Engineering - time-based features, running the following code raised an error:
data_train = nvt.Dataset(os.path.join(data_input_path, "train.parquet"), engine="parquet",part_size="256MB")
data_valid = nvt.Dataset(os.path.join(data_input_path, "valid.parquet"), engine="parquet",part_size="256MB")
dict_dtypes = {}
for col in cat_features.columns:
    dict_dtypes[col] = np.int64
for col in cont_features.columns:
    dict_dtypes[col] = np.float32
for col in labels:
    dict_dtypes[col] = np.float32
%%time
proc.fit(data_train)
%%time
proc.transform(data_train).to_parquet(  # <- this call raises the error
    output_path=output_train_path,
    shuffle=nvt.io.Shuffle.PER_PARTITION,
    dtypes=dict_dtypes,
    out_files_per_proc=10,
    cats=cat_features.columns,
    conts=cont_features.columns,
    labels=labels,
)
/core/merlin/io/dataset.py:863: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1253: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1253: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
Failed to transform operator <nvtabular.ops.lambdaop.LambdaOp object at 0x7fa63bd86a00>
Traceback (most recent call last):
File "/nvtabular/nvtabular/workflow/workflow.py", line 485, in _transform_partition
raise TypeError(
TypeError: Improperly matched output dtypes detected in time, object and datetime64[ns]
distributed.worker - WARNING - Compute Failed
Function: _write_subgraph
args: (<merlin.io.dask.DaskSubgraph object at 0x7fa68c63f6d0>, ('part_0.parquet', 'part_1.parquet', 'part_2.parquet', 'part_3.parquet', 'part_4.parquet', 'part_5.parquet', 'part_6.parquet', 'part_7.parquet', 'part_8.parquet', 'part_9.parquet'), '/share/recommenders/MIND/processed_nvt/train', <Shuffle.PER_PARTITION: 0>, <fsspec.implementations.local.LocalFileSystem object at 0x7fa76da543a0>, ['time_hour', 'hist_cat_0', 'hist_subcat_0', 'hist_cat_1', 'hist_subcat_1', 'hist_cat_2', 'hist_subcat_2', 'hist_cat_3', 'hist_subcat_3', 'hist_cat_4', 'hist_subcat_4', 'hist_cat_5', 'hist_subcat_5', 'hist_cat_6', 'hist_subcat_6', 'hist_cat_7', 'hist_subcat_7', 'hist_cat_8', 'hist_subcat_8', 'hist_cat_9', 'hist_subcat_9', 'impr_cat', 'impr_subcat', 'impression_id', 'uid', 'time_minute', 'time_second', 'time_wd', 'time_day', 'time_day_week', 'time'], ['hist_count'], ['label'], 'parquet', 0, False, '')
kwargs: {}
Exception: "TypeError('Improperly matched output dtypes detected in time, object and datetime64[ns]')"
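As far as I can tell, the message says the `time` column was expected to be `object` but came out of the LambdaOp as `datetime64[ns]`. Here is a minimal pandas sketch of that kind of dtype change, assuming the raw `time` column holds timestamp strings. The sample values and variable names are hypothetical, and this is not NVTabular's own check or a confirmed fix:

```python
import pandas as pd

# Hypothetical stand-in for the raw "time" column: timestamps stored as strings,
# so the column's dtype is object.
df = pd.DataFrame(
    {"time": pd.Series(["2019-11-11 09:05:58", "2019-11-12 18:11:30"], dtype=object)}
)
declared_dtype = df["time"].dtype  # dtype recorded before the transform

# Parsing the strings turns the column into a datetime dtype.
parsed = pd.to_datetime(df["time"])

# The column keeps its name but no longer has the dtype recorded for it.
dtype_changed = parsed.dtype != declared_dtype
print(declared_dtype, "->", parsed.dtype, "| changed:", dtype_changed)

# Derived features such as the hour are plain integers and avoid the datetime dtype.
time_hour = parsed.dt.hour.astype("int64")
print(time_hour.tolist())
```

If this reading is right, the mismatch comes from a column whose dtype changes inside the lambda while its declared dtype stays `object`.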
My environment is [merlin-training:22.04].
Thanks!