
Regarding the domain of the Monte Carlo random points #4

Open · lanstonchu opened this issue Apr 11, 2022 · 0 comments
Regarding em_bench_high.py: for the computation of lim_inf and lim_sup and for the generation of unif, shouldn't we use X (i.e. the concatenation of X_train and X_test) instead of X_ (i.e. X_test only)?

My understanding is that if the distributions of the training and testing data are very different, MV becomes inaccurate when the Monte Carlo points are fired based on the range of the testing data only, since those MC points cannot reach the range of the training data. Using X instead of X_ also seems more accurate for computing Leb(s >= u). Moreover, to follow the same logic as the basic file em_bench.py, I believe we should use the concatenation rather than the testing data alone.

Below are the key lines of the current file em_bench_high.py:

    X_train_ = X_train[:, features]
    X_ = X_test[:, features]

    # sampling domain is derived from the testing data only
    lim_inf = X_.min(axis=0)
    lim_sup = X_.max(axis=0)
    volume_support = (lim_sup - lim_inf).prod()
    if volume_support > 0:
        nb_exp += 1
        t = np.arange(0, 100 / volume_support, 0.001 / volume_support)
        axis_alpha = np.arange(alpha_min, alpha_max, 0.001)
        # uniform MC points drawn within the test-data range
        unif = np.random.uniform(lim_inf, lim_sup,
                                 size=(n_generated, max_features))
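
For clarity, here is a minimal sketch of the change I am suggesting. It assumes the surrounding variables (X_train, X_test, features, alpha_min, alpha_max, n_generated, max_features, nb_exp) as they appear in em_bench_high.py; only the lines defining the sampling domain differ:

    import numpy as np

    X_train_ = X_train[:, features]
    X_ = X_test[:, features]

    # use the concatenation of training and testing data, as em_bench.py does,
    # so that the sampling domain covers the range of both sets
    X = np.concatenate((X_train_, X_), axis=0)

    lim_inf = X.min(axis=0)
    lim_sup = X.max(axis=0)
    volume_support = (lim_sup - lim_inf).prod()
    if volume_support > 0:
        nb_exp += 1
        t = np.arange(0, 100 / volume_support, 0.001 / volume_support)
        axis_alpha = np.arange(alpha_min, alpha_max, 0.001)
        # MC points now reach the full range of both training and testing data
        unif = np.random.uniform(lim_inf, lim_sup,
                                 size=(n_generated, max_features))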