[WIP] Notebook-friendly connectors as importable classes #2685
Generate elastic-connectors package with notebook-friendly connector classes.
A lot of changes, huh? Don't worry, 95% is autogenerated code :) You can skip package/generated and package/docs; instead, provide feedback on the actual templates in scripts/package/codegen/templates.
See package in action: colab notebook gist
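For readers without the notebook at hand, here is a rough sketch of the intended usage pattern. It does not import the real package: ConnectorBaseSketch, collect and the fake documents are hypothetical stand-ins, and only the names get_docs, async_get_docs, logger and download_content come from this PR. The shape of what get_docs yields is also an assumption.

```python
import asyncio
import logging


class ConnectorBaseSketch:
    """Hypothetical stand-in for package.connector_base.ConnectorBase."""

    def __init__(self, logger=None, download_content=True):
        # A custom logger can be passed in; otherwise use a module logger.
        self.logger = logger or logging.getLogger(__name__)
        # When False, the content extraction step is skipped entirely.
        self.download_content = download_content

    async def get_docs(self):
        # Fake data source: yields (document, raw file bytes or None).
        yield {"_id": "1", "title": "report.pdf"}, b"quarterly numbers"
        yield {"_id": "2", "title": "row from a table"}, None

    async def async_get_docs(self):
        # Wraps get_docs and attaches extracted text when enabled.
        async for doc, raw in self.get_docs():
            if self.download_content and raw is not None:
                # Placeholder for extraction; the PR describes using a
                # local Apache Tika library at this point instead.
                doc["body"] = raw.decode("utf-8")
            self.logger.debug("yielding %s", doc["_id"])
            yield doc


async def collect(connector):
    # Drain the async generator into a list.
    return [doc async for doc in connector.async_get_docs()]


docs_with_content = asyncio.run(collect(ConnectorBaseSketch()))
docs_without_content = asyncio.run(
    collect(ConnectorBaseSketch(download_content=False))
)
```

In a notebook, where an event loop is already running, `await collect(...)` would replace the `asyncio.run(...)` calls.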
Changes
I added a flow that auto-generates package code, turning the connectors framework into standalone, importable connector classes. These can be used independently of the framework application, which relies on the connectors protocol and its special connectors indices.
The generated wrapper classes live under package/generated. They add DataSource config fields as constructor arguments and use label together with tooltip to build docstrings.
The code gen scripts, along with the jinja2 templates, live under scripts/package.
The package.connector_base.ConnectorBase class is the base class that the generated classes inherit from. It provides some utilities, such as:
- async_get_docs, which calls get_docs and uses a local Apache Tika library for content extraction
- logger, which lets you pass a custom logger to attach to the data provider logic
- download_content flag, which can be disabled (so Apache Tika is not fetched) when syncing with e.g. an SQL database, where there are no files to download and so no content extraction
New requirements are specified under:
- requirements/package.txt, used by package logic
- requirements/package-dev.txt, used to build the package
Automated code generation and packaging logic
This happens under the hood when you call make build_connector_package.
Steps are as follows:
- scripts/package/codegen scripts and templates to update defs in package/generated
- /connectors as well as /package/* and put it in temp folder /package/elastic_connectors
- elastic_connectors namespace
- lazydocs
- package/setup.py
- twine (for now to testpypi)
Constructor + Docstrings = Hints from python language server
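To make this concrete, here is a minimal sketch of how config fields could become a constructor plus a docstring. It is not the actual template: plain f-strings stand in for the jinja2 templates, and the CONFIG contents, render_class and ExampleConnector are hypothetical names invented for illustration.

```python
import inspect

# Hypothetical config for one data source; the field names are invented.
CONFIG = {
    "host": {"label": "Host", "tooltip": "Hostname of the server.", "value": "localhost"},
    "port": {"label": "Port", "tooltip": "TCP port to connect to.", "value": 5432},
}


def render_class(class_name, config):
    """Render Python source for a wrapper class from config fields."""
    # Each config field becomes a keyword argument with its default value.
    args = ", ".join(f"{name}={field['value']!r}" for name, field in config.items())
    # label and tooltip are combined into one docstring line per argument.
    doc_lines = "\n".join(
        f"        {name}: {field['label']}. {field['tooltip']}"
        for name, field in config.items()
    )
    assigns = "\n".join(f"        self.{name} = {name}" for name in config)
    return (
        f"class {class_name}:\n"
        f'    """{class_name} (generated).\n\n'
        f"    Args:\n{doc_lines}\n"
        f'    """\n\n'
        f"    def __init__(self, {args}):\n"
        f"{assigns}\n"
    )


source = render_class("ExampleConnector", CONFIG)
namespace = {}
# exec keeps the sketch self-contained; a real flow would write files.
exec(source, namespace)
ExampleConnector = namespace["ExampleConnector"]

# This is the information a language server surfaces as hints.
signature = inspect.signature(ExampleConnector)
docstring = inspect.getdoc(ExampleConnector)
```

Because the arguments and their descriptions are ordinary Python source, editors and notebooks can show them without knowing anything about the connectors framework.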
Potential improvements
- elastic_connectors[google_drive] or elastic_connectors[sharepoint_online], so that you only install the stuff that you need
- /connectors code is accessible in the package (we don't document it, but folks could e.g. import ConcurrentTask)
Pre-Review Checklist
(config.yml.example)
(v7.13.2, v7.14.0, v8.0.0)