
Regarding the domain of the Monte Carlo random points #4

Open · lanstonchu opened this issue Apr 11, 2022 · 0 comments
Regarding em_bench_high.py: for the computation of lim_inf and lim_sup and for the generation of unif, shouldn't we use X (i.e. the concatenation of X_train and X_test) instead of X_ (i.e. X_test only)?

My understanding is that if the distributions of the training and testing data are very different, MV becomes inaccurate when the Monte Carlo points are fired based on the range of the testing data only, since those MC points cannot reach the range of the training data. Using X instead of X_ also seems more accurate for computing Leb(s >= u). Moreover, to follow the same logic as the basic file em_bench.py, I believe we should use the concatenation rather than the testing data alone.

Below are the key lines of the current file em_bench_high.py:

    X_train_ = X_train[:, features]
    X_ = X_test[:, features]

    # sampling domain is derived from the testing data only
    lim_inf = X_.min(axis=0)
    lim_sup = X_.max(axis=0)
    volume_support = (lim_sup - lim_inf).prod()
    if volume_support > 0:
        nb_exp += 1
        t = np.arange(0, 100 / volume_support, 0.001 / volume_support)
        axis_alpha = np.arange(alpha_min, alpha_max, 0.001)
        # uniform MC points drawn within the test-data range
        unif = np.random.uniform(lim_inf, lim_sup,
                                 size=(n_generated, max_features))
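
For clarity, here is a minimal sketch of the change I am suggesting. It assumes the surrounding variables (X_train, X_test, features, alpha_min, alpha_max, n_generated, max_features, nb_exp) as they appear in em_bench_high.py; only the lines defining the sampling domain differ:

    import numpy as np

    X_train_ = X_train[:, features]
    X_ = X_test[:, features]

    # use the concatenation of training and testing data, as em_bench.py does,
    # so that the sampling domain covers the range of both sets
    X = np.concatenate((X_train_, X_), axis=0)

    lim_inf = X.min(axis=0)
    lim_sup = X.max(axis=0)
    volume_support = (lim_sup - lim_inf).prod()
    if volume_support > 0:
        nb_exp += 1
        t = np.arange(0, 100 / volume_support, 0.001 / volume_support)
        axis_alpha = np.arange(alpha_min, alpha_max, 0.001)
        # MC points now reach the full range of both training and testing data
        unif = np.random.uniform(lim_inf, lim_sup,
                                 size=(n_generated, max_features))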