DocETL: Powering Complex Document Processing Pipelines

DocETL is a tool for creating and executing data processing pipelines, especially suited for complex document processing tasks. It offers a low-code, declarative YAML interface to define LLM-powered operations on complex data.
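The following Python sketch shows what "declarative" means here. The pipeline is data (a list of operation specs), and a small runner interprets it. It is not DocETL's engine or its YAML schema:

- The dict stands in for a parsed YAML file.
- The spec keys (`type`, `prompt`, `output_key`) are hypothetical.
- `fake_llm` is an offline stub in place of a real model call.

```python
from typing import Callable

# Hypothetical spec standing in for a parsed YAML file; NOT DocETL's actual schema.
PIPELINE_SPEC = {
    "operations": [
        {"type": "filter", "prompt": "Is this long? {text}"},
        {"type": "map", "prompt": "Summarize: {text}", "output_key": "summary"},
    ]
}

def fake_llm(prompt: str) -> str:
    """Offline stub for an LLM call so the sketch runs without an API key."""
    if prompt.startswith("Is this long? "):
        text = prompt[len("Is this long? "):]
        return "yes" if len(text.split()) > 3 else "no"
    if prompt.startswith("Summarize: "):
        # "Summarize" by keeping the first three words.
        return " ".join(prompt[len("Summarize: "):].split()[:3])
    return ""

def run_pipeline(spec: dict, docs: list[dict], llm: Callable[[str], str]) -> list[dict]:
    """Apply each declared operation, in order, to the whole collection."""
    for op in spec["operations"]:
        if op["type"] == "map":
            # map: one LLM call per document, result stored under output_key.
            docs = [{**d, op["output_key"]: llm(op["prompt"].format(**d))} for d in docs]
        elif op["type"] == "filter":
            # filter: keep only documents for which the LLM answers "yes".
            docs = [d for d in docs if llm(op["prompt"].format(**d)).strip().lower() == "yes"]
        else:
            raise ValueError(f"unsupported operation type: {op['type']!r}")
    return docs

docs = [{"text": "short note"}, {"text": "a much longer meeting transcript about budgets"}]
print(run_pipeline(PIPELINE_SPEC, docs, fake_llm))
# → [{'text': 'a much longer meeting transcript about budgets', 'summary': 'a much longer'}]
```

Because the pipeline is plain data, reordering or adding an operation means editing the spec rather than the runner. That property is what a YAML interface exploits.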

When to Use DocETL

DocETL is the ideal choice when you're looking to maximize correctness and output quality for complex tasks over a collection of documents or unstructured datasets. You should consider using DocETL if:

You want to perform semantic processing on a collection of data
You have complex tasks that you want to represent via map-reduce
You're unsure how to best express your task to maximize LLM accuracy
You're working with long documents that don't fit into a single prompt
You have validation criteria and want tasks to automatically retry when validation fails
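Several of the points above fit together: map-reduce, long documents, and validation with retries. The Python sketch below combines them in four steps:

1. Split a document that is too long for one prompt into chunks.
2. Run one "map" call per chunk.
3. Retry any call whose output fails a validation check.
4. "Reduce" the partial results into one answer.

The helpers `chunk_text`, `map_with_retry`, and `make_flaky_llm` are illustrative assumptions, not DocETL APIs. The LLM is an offline stub whose first reply is deliberately invalid.

```python
from typing import Callable

def chunk_text(text: str, max_words: int) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def map_with_retry(prompt: str, llm: Callable[[str], str],
                   validate: Callable[[str], bool], max_retries: int = 2) -> str:
    """Call the LLM; if validation fails, retry up to max_retries more times."""
    for _ in range(max_retries + 1):
        output = llm(prompt)
        if validate(output):
            return output
    raise RuntimeError(f"validation failed after {max_retries + 1} attempts")

def make_flaky_llm() -> Callable[[str], str]:
    """Offline stub: the first call returns junk, later calls count the word 'error'."""
    calls = {"n": 0}
    def llm(prompt: str) -> str:
        calls["n"] += 1
        if calls["n"] == 1:
            return "not a number"  # fails validation, forcing a retry
        return str(prompt.lower().split().count("error"))
    return llm

def count_errors(document: str, llm: Callable[[str], str], max_words: int = 5) -> int:
    # Map: one validated call per chunk. Reduce: sum the per-chunk counts.
    partials = [
        int(map_with_retry(f"Count occurrences of the target word in: {chunk}",
                           llm, validate=str.isdigit))
        for chunk in chunk_text(document, max_words)
    ]
    return sum(partials)

doc = "error in step one then ok error again and error at the end"
print(count_errors(doc, make_flaky_llm()))  # → 3
```

Summing works as the reduce step here because counts are additive. A task such as summarization would need an LLM-based reduce instead.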

Community Projects

Educational Resources

Installation

Prerequisites

Python 3.10 or later
OpenAI API key
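To confirm both prerequisites before installing, you can use a small Python check like the one below. It only inspects the interpreter version and whether `OPENAI_API_KEY` is set in the current environment. For the UI, the key can instead go in `.env`, as described under "Running the UI Locally". The function name `check_prerequisites` is hypothetical and is not part of DocETL.

```python
import os
import sys

def check_prerequisites(version_info=sys.version_info, env=os.environ) -> list[str]:
    """Return a list of problems; an empty list means both prerequisites look satisfied."""
    problems = []
    if tuple(version_info[:2]) < (3, 10):
        problems.append(f"Python 3.10+ required, found {version_info[0]}.{version_info[1]}")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems

if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```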

Quick Start

Install from PyPI:

pip install docetl

To see examples of how to use DocETL, check out the tutorial.

Running the UI Locally

We offer a simple UI for building pipelines. We recommend building up complex pipelines one operation at a time, so you can see the results of each operation as you go and iterate on your pipeline. To run it locally, follow these steps:

Clone the repository:

git clone https://github.com/ucbepic/docetl.git
cd docetl

Install dependencies:

make install      # Install Python package
make install-ui   # Install UI dependencies

Set up environment variables in a .env file at the repository root (the repository includes a .env.sample you can copy):

OPENAI_API_KEY=your_api_key_here
BACKEND_ALLOW_ORIGINS=
BACKEND_HOST=localhost
BACKEND_PORT=8000
BACKEND_RELOAD=True
FRONTEND_HOST=0.0.0.0
FRONTEND_PORT=3000
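A malformed .env is an easy mistake to make. The sketch below sanity-checks one, under these assumptions:

- It parses simple KEY=VALUE lines, skipping blank lines and # comments.
- Unlike full dotenv parsers, it does not handle quoting or `export` prefixes.
- It checks that the keys above are present and typed sensibly.

The helper names `parse_env` and `load_settings` are hypothetical, and DocETL's server may read these variables differently. `BACKEND_ALLOW_ORIGINS` is treated as optional because the example leaves it empty.

```python
REQUIRED_KEYS = ["OPENAI_API_KEY", "BACKEND_HOST", "BACKEND_PORT", "FRONTEND_HOST", "FRONTEND_PORT"]

def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw!r}")
        values[key.strip()] = value.strip()
    return values

def load_settings(text: str) -> dict:
    """Check required keys are non-empty and convert ports and the reload flag."""
    env = parse_env(text)
    missing = [k for k in REQUIRED_KEYS if not env.get(k)]
    if missing:
        raise ValueError(f"missing or empty: {', '.join(missing)}")
    return {
        "backend_url": f"http://{env['BACKEND_HOST']}:{int(env['BACKEND_PORT'])}",
        "frontend_port": int(env["FRONTEND_PORT"]),
        "backend_reload": env.get("BACKEND_RELOAD", "False").lower() == "true",
    }

sample = """OPENAI_API_KEY=your_api_key_here
BACKEND_ALLOW_ORIGINS=
BACKEND_HOST=localhost
BACKEND_PORT=8000
BACKEND_RELOAD=True
FRONTEND_HOST=0.0.0.0
FRONTEND_PORT=3000"""
print(load_settings(sample))
# → {'backend_url': 'http://localhost:8000', 'frontend_port': 3000, 'backend_reload': True}
```

`FRONTEND_HOST=0.0.0.0` is a bind address, meaning "listen on all interfaces." That is why you open the UI through localhost rather than 0.0.0.0.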

Start the development server:

make run-ui-dev

Visit http://localhost:3000/playground

Development Setup

If you're planning to contribute to or modify DocETL, you can verify your setup by running the test suite:

make tests-basic  # Runs basic test suite (costs < $0.01 with OpenAI)

For detailed documentation and tutorials, visit our documentation.
