Float conversion issue screwing with numeric encoders. #27

germanjoey · 2019-04-25T06:40:31Z

I almost feel bad for reporting this one.

Using the yacht hydrodynamics UCI dataset, I got this error:

(env) (base) C:\Users\josep\Jeenee\AutoML\automl_train>python model.py -d ..\automl-testbench\yacht-hydrodynamics\data.csv -m train
Traceback (most recent call last):
  File "model.py", line 46, in <module>
    model_train(df, encoders, args, model)
  File "C:\Users\josep\Jeenee\AutoML\automl_train\pipeline.py", line 347, in model_train
    X, y = process_data(df, encoders)
  File "C:\Users\josep\Jeenee\AutoML\automl_train\pipeline.py", line 296, in process_data
    df['Length-beam ratio'].values, encoders['length_beam_ratio_bins'], labels=False, include_lowest=True, duplicates='drop')
  File "C:\Users\josep\Jeenee\AutoML\venv\lib\site-packages\pandas\core\reshape\tile.py", line 235, in cut
    raise ValueError('bins must increase monotonically.')
ValueError: bins must increase monotonically.

Hmmm, odd. Let's take a look at pipeline.py...

    # Length-beam ratio
    length_beam_ratio_enc = df['Length-beam ratio']
    length_beam_ratio_bins = length_beam_ratio_enc.quantile(
        np.linspace(0, 1, 10+1))
    encoders['length_beam_ratio_bins'] = length_beam_ratio_bins
    
    # ....

    # Length-beam ratio
    length_beam_ratio_enc = pd.cut(
        df['Length-beam ratio'].values, encoders['length_beam_ratio_bins'], labels=False, include_lowest=True, duplicates='drop')

The error points at the pd.cut line, which I had previously patched to include the duplicates='drop' bit. The current error isn't related to that patch, though; it's complaining about the encoder's bins. Hmmm, nothing looks odd in the data for that column. Let's open up pdb and take a look...

>>> encoders['length_beam_ratio_bins']
[2.73, 2.76, 3.15, 3.15, 3.1499999999999995, 3.15, 3.17, 3.32, 3.51, 3.51, 3.64]

facepalm

Well now! I suppose I'll concede that's technically not monotonically increasing!
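To make the offending step easy to spot, here is a minimal sketch that runs np.diff over the edges printed above and reports where they go backwards. The variable names (edges, steps, bad) are mine, not from pipeline.py, and this only illustrates the kind of check that trips the error rather than reproducing pandas' exact internals.

```python
import numpy as np

# Bin edges copied from the pdb session above.
edges = np.array([2.73, 2.76, 3.15, 3.15, 3.1499999999999995,
                  3.15, 3.17, 3.32, 3.51, 3.51, 3.64])

# Difference between each edge and the one before it.
steps = np.diff(edges)

# Positions i where edges[i + 1] < edges[i], i.e. a backwards step.
bad = np.flatnonzero(steps < 0)

print("backward steps at:", bad)
print("step sizes:", steps[bad])
```

The single backwards step is tiny (well under 1e-15), which is why nothing looks wrong when eyeballing the column itself.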

I appended a .round(4) to the two .quantile lines of encoders/numeric (lines 12 and 15), which worked for this test case. This is certainly not an adequate general solution, however, as e.g. it'll break on data that needs precision at the 5th decimal place...
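One possible alternative that avoids picking a rounding precision is to force the edges to be non-decreasing with a running maximum and then drop exact duplicates. The sketch below is only an idea, not what encoders/numeric currently does: the helper name monotonic_bins is hypothetical, and it assumes that collapsing edges that differ only by rounding noise is acceptable for the encoder.

```python
import numpy as np
import pandas as pd


def monotonic_bins(edges):
    """Return strictly increasing bin edges (hypothetical helper)."""
    edges = np.asarray(edges, dtype=float)
    # A running maximum lifts any edge that dipped below its predecessor.
    edges = np.maximum.accumulate(edges)
    # np.unique sorts and drops exact duplicates, so the result is strictly increasing.
    return np.unique(edges)


# Edges copied from the pdb session in this issue.
raw = [2.73, 2.76, 3.15, 3.15, 3.1499999999999995,
       3.15, 3.17, 3.32, 3.51, 3.51, 3.64]
bins = monotonic_bins(raw)

# A few sample values, binned the same way pipeline.py calls pd.cut.
sample = np.array([2.73, 3.15, 3.64])
codes = pd.cut(sample, bins, labels=False, include_lowest=True)

print(bins)
print(codes)
```

Because the edges are already deduplicated, duplicates='drop' is no longer needed in the pd.cut call. The trade-off is that the number of bins can shrink below the requested 10 when many quantiles coincide, as they do for this column.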
