Feature Engineering using Lambda Layers for an end to end training pipeline. #812

Open. fernandonieuwveldt wants to merge 8 commits into base: master

@fernandonieuwveldt fernandonieuwveldt commented Feb 26, 2022

This example shows how to build a full training and inference pipeline using only the Keras library. As we build up the graph, we also visualize the network.

We end up with a single artifact that contains the full pipeline. It can be deployed as is, with no need to create features with other libraries before feeding data to the model.

Feature engineering is part of the network.

@google-cla

google-cla bot commented Feb 26, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

For more information, open the CLA check for this pull request.

@fernandonieuwveldt fernandonieuwveldt changed the title Example/feature engineering lambda Feature Engineering using Lambda Layers for an end to end training pipeline. Feb 27, 2022
Member

@fchollet fchollet left a comment

Thanks for the PR! Feature engineering for categorical data is a great topic. However, using Lambda layers is not recommended. They're not safely serializable and I wouldn't recommend them in production for this reason.

We also already have an example on structured data feature engineering here: https://keras.io/examples/structured_data/structured_data_classification_from_scratch/

I would recommend turning your example into a tutorial that focuses on something that's absent from the example above. Perhaps we could take the approach of doing the feature engineering in a single Layer subclass that takes in a dict of data. What do you think?
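
To illustrate the suggestion above, here is a minimal sketch of a Layer subclass that takes a dict of data and round-trips through `get_config` / `from_config`. The `RatioFeature` class and the `income` / `household_size` feature names are hypothetical and not taken from the PR, and the sketch assumes the Keras version bundled with TensorFlow.

```python
import numpy as np
from tensorflow.keras import layers


class RatioFeature(layers.Layer):
    """Hypothetical feature layer: divides one feature of a data dict by another."""

    def __init__(self, numerator, denominator, epsilon=1e-6, **kwargs):
        super().__init__(**kwargs)
        self.numerator = numerator
        self.denominator = denominator
        self.epsilon = epsilon

    def call(self, data):
        # `data` maps feature names to tensors.
        return data[self.numerator] / (data[self.denominator] + self.epsilon)

    def get_config(self):
        # Only plain Python values are stored, so the layer can be rebuilt
        # from its config without serializing any Python code.
        config = super().get_config()
        config.update(
            {
                "numerator": self.numerator,
                "denominator": self.denominator,
                "epsilon": self.epsilon,
            }
        )
        return config


layer = RatioFeature("income", "household_size")
restored = RatioFeature.from_config(layer.get_config())

# Toy batch with made-up values, one column per feature.
batch = {
    "income": np.array([[50000.0], [72000.0]], dtype="float32"),
    "household_size": np.array([[2.0], [3.0]], dtype="float32"),
}
ratio = np.asarray(restored(batch))
```

A Lambda layer, by contrast, is saved by serializing Python bytecode, which is what makes it fragile when the saved model is loaded in a different environment.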

@fernandonieuwveldt
Author

@fchollet That sounds great, yes. I will look into changing it to subclass the Layer class.

@fernandonieuwveldt
Author

fernandonieuwveldt commented Mar 8, 2022

@fchollet Hi Francois. Thanks for the suggestion. I have changed the example to use a single feature layer that subclasses the Layer class and takes a dict of Input objects as input. Please let me know if this is what you had in mind.

Member

@fchollet fchollet left a comment

Thanks for the update. A lot of the complexity here comes from the fact that you use a separate Input layer for each feature in the data, which isn't necessary if you use a Layer subclass. In addition, we should be showcasing Keras preprocessing layers.

I recommend something like:

from tensorflow.keras import layers

class FeaturePreprocessing(layers.Layer):
    def __init__(self):
        super().__init__()
        # Create the preprocessing layers that will be needed for feature encoding / normalization / etc.

    def adapt(self, dataset):
        # Split the dataset into individual feature datasets and use them to adapt the previously created layers
        raise NotImplementedError

    def call(self, data):
        # Preprocess the data dict with the previously created layers, then concatenate the features
        raise NotImplementedError

Does that make sense? Perhaps a different dataset might be a better fit too, since we're going to want to do things like:

  • Indexing a set of categorical string values
  • Indexing a set of categorical int values
  • Normalizing numerical features
  • Hashing large categorical feature spaces
  • etc.
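
The skeleton and the list above can be sketched concretely as follows. This is only one possible reading of the suggestion, not the code from the PR: the feature names (`color`, `rooms`, `price`, `city`), the toy data, and the choice of one-hot outputs are all assumptions, and `adapt` takes a dict of NumPy arrays instead of a dataset to keep the sketch short. It assumes the Keras version bundled with TensorFlow.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers


class FeaturePreprocessing(layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # One preprocessing layer per (hypothetical) feature.
        self.color_index = layers.StringLookup(output_mode="one_hot")
        self.rooms_index = layers.IntegerLookup(output_mode="one_hot")
        self.price_norm = layers.Normalization(axis=None)
        self.city_hash = layers.Hashing(num_bins=8, output_mode="one_hot")

    def adapt(self, data):
        # Assumption: `data` is a dict of NumPy arrays, not a tf.data.Dataset.
        # Hashing is stateless, so it needs no adapt step.
        self.color_index.adapt(data["color"])
        self.rooms_index.adapt(data["rooms"])
        self.price_norm.adapt(data["price"])

    def call(self, data):
        encoded = [
            self.color_index(data["color"]),  # categorical string values
            self.rooms_index(data["rooms"]),  # categorical int values
            self.price_norm(data["price"]),   # numerical feature
            self.city_hash(data["city"]),     # large categorical space
        ]
        # Cast to a common dtype before concatenating along the feature axis.
        return tf.concat([tf.cast(x, tf.float32) for x in encoded], axis=-1)


# Made-up toy data, one column per feature.
raw = {
    "color": np.array([["red"], ["green"], ["blue"], ["red"]]),
    "rooms": np.array([[1], [2], [3], [2]], dtype="int64"),
    "price": np.array([[10.0], [20.0], [30.0], [40.0]], dtype="float32"),
    "city": np.array([["paris"], ["lagos"], ["lima"], ["oslo"]]),
}

preprocessing = FeaturePreprocessing()
preprocessing.adapt(raw)
features = preprocessing({name: tf.constant(values) for name, values in raw.items()})
print(features.shape)
```

Because the lookup and normalization state lives inside the layer, the same object can sit at the front of a model, which is what lets the whole pipeline ship as one artifact.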

@fernandonieuwveldt
Author

fernandonieuwveldt commented Mar 21, 2022

Hi @fchollet. Thanks for the suggestions. We now have one layer for feature preprocessing that utilises Keras preprocessing layers. I have implemented the class you suggested. It contains multiple preprocessing layers and combinations of them.

Please let me know what you think. Is the dataset used fine for showcasing preprocessing layers?

Member

@fchollet fchollet left a comment

Thanks for the update!

@pcoet
Collaborator

pcoet commented Aug 15, 2023

Hi @fernandonieuwveldt, thanks again for this PR. Are you planning to make the requested changes? Let us know if you're still working on this. Otherwise we'll close the request. Thanks!

@fernandonieuwveldt
Author

Hi. Let me give it another go and then we can see if this will be a good addition to the website.

@fernandonieuwveldt
Author

Hi. I made the requested changes. Hope we can still work further on this.

@fernandonieuwveldt
Author

@fchollet @pcoet Let me know if the changes look good. All the requested changes have now been implemented.

3 participants