Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/implementation of mimca #163

Open
wants to merge 5 commits into
base: dev
Choose a base branch
from

Conversation

JulienRoussel77
Copy link
Collaborator

No description provided.

@JulienRoussel77 JulienRoussel77 marked this pull request as ready for review September 27, 2024 15:47
Copy link
Collaborator

@hlbotterman hlbotterman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comments:

  • typing in the functions (not only in docstring)
  • I do not understand the structure of the 3 files in imputations/mimca folder and utils. There seems to be some redundancies
  • some tests do not pass
  • remove comments

from tqdm import tqdm


def moy_p(V, weights):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not hesitate to give more explicit name for functions

missing_indices = rng.choice(total_values, n_missing, replace=False)
row_indices = missing_indices // n_cols
col_indices = missing_indices % n_cols
for i in range(n_missing):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a vectorization to avoid loop/
Suggestion :
...
missing_indices = rng.choice(total_values, n_missing, replace=False)
row_indices, col_indices = np.unravel_index(missing_indices, (n_rows, n_cols))
data.values[row_indices, col_indices] = np.nan
return data

return df_reconstructed

def imputeMCA(
don,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename "don" ? Not very an explicit name...

dict
Dictionary containing:
- "tab_disj": Disjunctive coded table after imputation.
- "completeObs": Complete dataset with missing values imputed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename completeObs ? usually, variable names in python do not contain capital letter, but _ if multiple "words"

- "completeObs": Complete dataset with missing values imputed.

"""
don = pd.DataFrame(don)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why pd.DataFrame(don) if don is already a DataFrame (specified as such in docstring) ?

if (
not pd.api.types.is_numeric_dtype(don[col])
or don[col].dtype == "bool"
): # noqa: E501
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need of E501

- "completeObs": Complete dataset with missing values imputed.

"""
# Ensure the data is a DataFrame
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: since it is expected a DataFrame in input, check the type of "don". If not dataframe, report in a log and convert it.

Z = Z.subtract(Z_mean, axis=1)
Zscale = Z.multiply(np.sqrt(M), axis=1)

print("Centered and scaled data (Zscale):")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not fan of print.
Use logging instead

@@ -0,0 +1,665 @@
import numpy as np
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add general documentation for the file.


"""
if verbose:
print(f"{print_msg}...", end="", flush=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logging instead of print

else:
row_w = np.array(row_w, dtype=float)
row_w /= row_w.sum()
ncp = int(min(ncp, X.shape[0] - 1, X.shape[1]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why shape[0] - 1 ?
I think your tests do not passed because of that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants