Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Notebook-friendly connectors as importable classes #2685

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

jedrazb
Copy link
Member

@jedrazb jedrazb commented Jul 8, 2024

Generate elastic-connectors package with notebook-friendly connector classes

A lot of changes huh? Dw 95% is autogenerated code :) you can skip package/generated and package/docs, instead provide feedback on the actual template in scripts/package/codegen/templates

See package in action: colab notebook gist

Changes

I added a flow to auto-generate package code to turn the connectors framework into standalone importable connector classes that can be used independently of the framework application (connectors protocol dependent on special connectors indices).

The generated wrapper classes live under package/generate, they add DataSource config fields as constructor arguments, and use label together with tooltip to build docstrings.

The code gen scripts along with jinja2 templates lives under scripts/package.

The package.connector_base.ConnectorBase class is a class from which the generated classes inherit from. It provides some utils such as:

  • async context manager functions (to validate config on enter, and close our async dataprovider on exit)
  • async_get_docs that would both call get_docs and use local Apache Tika lib for content extraction
  • defines common constructor kwargs:
    • logger allows to pass custom logger to attach to the dataprovider logic
    • download_content flag, can be disabled (so Apache Tika is not fetched) when syncing with e.g. sql database where we don't do content extraction (since no files to download)

New requirements are specified under:

  • requirements/package.txt used by package logic
  • requirements/package-dev.txt used to build package

Automated code generation and packaging logic

This happens under the hood when you call make build_connector_package

Steps are as follows:

  • use scripts/package/codegen scripts and templates to update defs in package/generated
  • copy /connectors as well as /package/* and put it in temp folder /package/elastic_connectors
  • update imports to match elastic_connectors namespace
  • generate markdown api spec docs with lazydocs
  • build package with package/setup.py
  • publish generated package with twine (for now to testpypi)

Constructor + Docstrings = Hints from python language server

Screenshot 2024-07-11 at 12 17 29

Potential improvements

  • the package has A LOT of dependencies, since we have a lot of data source integrations, could be great to have per-source requirements list so that we could have multiple packages like
    • elastic_connectors[google_drive] or elastic_connectors[sharepoint_online] - so that you only install stuff that you need
  • all/connectors code is accessible in the package (we don't document but folks could e.g. import ConcurrentTask)
  • the way we build the package is a "naive" way, to be checked with our Python team what is best way to build and manage such package

Pre-Review Checklist

  • this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check config.yml.example)
  • this PR has a meaningful title
  • this PR links to all relevant github issues that it fixes or partially addresses
  • if there is no GH issue, please create it. Each PR should have a link to an issue
  • this PR has a thorough description
  • Covered the changes with automated tests
  • Tested the changes locally
  • Added a label for each target release version (example: v7.13.2, v7.14.0, v8.0.0)
  • Considered corresponding documentation changes
  • Contributed any configuration settings changes to the configuration reference

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant