feat: 985 feature argillalabeller task #986

Open
wants to merge 34 commits into
base: develop

@davidberenstein1957 davidberenstein1957 commented Sep 17, 2024

Things to note:

  • Builds the prompt template from rg.Settings
  • Does not support SpanQuestion or RankingQuestion
  • Does not support ImageField or ChatField
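
As a rough illustration of the first point, a prompt can be assembled from the field values, the question definition, and the guidelines that a dataset's settings expose. The sketch below uses plain dataclasses as stand-ins: `FieldSpec`, `QuestionSpec`, `build_prompt`, and the prompt layout are all hypothetical and are not the Argilla or distilabel API, so the template used in this PR may look different. Question types outside the supported set are rejected, mirroring the limitations listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Question types this sketch accepts; span and ranking are left out on purpose.
SUPPORTED_QUESTION_TYPES = {"label_selection", "multi_label_selection", "text", "rating"}


@dataclass
class FieldSpec:
    """Hypothetical stand-in for a text field definition."""
    name: str
    title: str


@dataclass
class QuestionSpec:
    """Hypothetical stand-in for a question definition."""
    name: str
    type: str
    title: str
    options: List[str] = field(default_factory=list)


def build_prompt(
    fields: List[FieldSpec],
    question: QuestionSpec,
    record_values: Dict[str, str],
    guidelines: Optional[str] = None,
) -> str:
    """Assemble a plain-text prompt from settings-like objects and one record."""
    if question.type not in SUPPORTED_QUESTION_TYPES:
        raise ValueError(f"Unsupported question type: {question.type}")
    lines = []
    if guidelines:
        lines.append(f"Guidelines: {guidelines}")
    # One line per field, using the field title as the label.
    for spec in fields:
        lines.append(f"{spec.title}: {record_values[spec.name]}")
    lines.append(f"Question: {question.title}")
    if question.options:
        lines.append("Options: " + ", ".join(question.options))
    return "\n".join(lines)
```

Keeping the type check at the top means an unsupported question fails before any LLM call is made.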

Annotate records using the same dataset and question:

import argilla as rg
from distilabel.steps.tasks import ArgillaLabeller
from distilabel.llms.huggingface import InferenceEndpointsLLM

# Get information from Argilla dataset definition
dataset = rg.Dataset("my_dataset")
pending_records_filter = rg.Filter(("status", "==", "pending"))
completed_records_filter = rg.Filter(("status", "==", "completed"))
pending_records = list(
    dataset.records(
        query=rg.Query(filter=pending_records_filter),
        limit=5,
    )
)
example_records = list(
    dataset.records(
        query=rg.Query(filter=completed_records_filter),
        limit=5,
    )
)
field = dataset.settings.fields["text"]
question = dataset.settings.questions["label"]

# Initialize the labeller with the model and fields
labeller = ArgillaLabeller(
    llm=InferenceEndpointsLLM(
        model_id="mistralai/Mistral-7B-Instruct-v0.2",
    ),
    fields=[field],
    question=question,
    example_records=example_records,
    guidelines=dataset.guidelines
)
labeller.load()

# Process the pending records
result = next(
    labeller.process(
        [
            {
                "record": record
            } for record in pending_records
        ]
    )
)

# Add the suggestions to the records
for record, suggestion in zip(pending_records, result):
    record.suggestions.add(suggestion["suggestion"])

# Log the updated records
dataset.records.log(pending_records)

Annotate a record with alternating datasets and questions:

import argilla as rg
from distilabel.steps.tasks import ArgillaLabeller
from distilabel.llms.huggingface import InferenceEndpointsLLM

# Get information from Argilla dataset definition
dataset = rg.Dataset("my_dataset")
field = dataset.settings.fields["text"]
question = dataset.settings.questions["label"]
question2 = dataset.settings.questions["label2"]

# Initialize the labeller with the model and fields
labeller = ArgillaLabeller(
    llm=InferenceEndpointsLLM(
        model_id="mistralai/Mistral-7B-Instruct-v0.2",
    )
)
labeller.load()

# Process the record
record = next(dataset.records())
result = next(
    labeller.process(
        [
            {
                "record": record,
                "fields": [field],
                "question": question,
            },
            {
                "record": record,
                "fields": [field],
                "question": question2,
            }
        ]
    )
)

# Add the suggestions for both questions to the record
for output in result:
    record.suggestions.add(output["suggestion"])

# Log the updated record (log expects a list of records)
dataset.records.log([record])

Overwrite default prompts and instructions:

import argilla as rg
from distilabel.steps.tasks import ArgillaLabeller
from distilabel.llms.huggingface import InferenceEndpointsLLM

# Overwrite default prompts and instructions
labeller = ArgillaLabeller(
    llm=InferenceEndpointsLLM(
        model_id="mistralai/Mistral-7B-Instruct-v0.2",
    ),
    system_prompt="You are an expert annotator and labelling assistant that understands complex domains and natural language processing.",
    question_to_label_instruction={
        "label_selection": "Select the appropriate label from the list of provided labels.",
        "multi_label_selection": "Select none, one or multiple labels from the list of provided labels.",
        "text": "Provide a text response to the question.",
        "rating": "Provide a rating for the question.",
    },
)
labeller.load()
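
The mapping above has one entry for each of the four question types it names. Whether the task merges a partial mapping with its defaults is not stated here, so a small check before constructing the labeller can catch a forgotten key. The helper below is a hypothetical sketch, not part of distilabel; `EXPECTED_QUESTION_TYPES` simply repeats the keys from the example.

```python
from typing import Dict, List

# Keys taken from the example mapping above.
EXPECTED_QUESTION_TYPES = ("label_selection", "multi_label_selection", "text", "rating")


def missing_instruction_types(instructions: Dict[str, str]) -> List[str]:
    """Return the expected question types that have no non-empty instruction."""
    return [
        question_type
        for question_type in EXPECTED_QUESTION_TYPES
        if not instructions.get(question_type, "").strip()
    ]


overrides = {
    "label_selection": "Select the appropriate label from the list of provided labels.",
    "text": "Provide a text response to the question.",
}
print(missing_instruction_types(overrides))  # ['multi_label_selection', 'rating']
```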

@davidberenstein1957 davidberenstein1957 linked an issue Sep 17, 2024 that may be closed by this pull request

@davidberenstein1957 davidberenstein1957 changed the base branch from main to develop September 17, 2024 19:13

Documentation for this PR has been built. You can view it at: https://distilabel.argilla.io/pr-986/

codspeed-hq bot commented Sep 17, 2024

CodSpeed Performance Report

Merging #986 will not alter performance

Comparing feat/985-feature-argillalabeller-task (1a0aaf2) with develop (d7e61b5)

Summary

✅ 1 untouched benchmarks

Successfully merging this pull request may close these issues.

[FEATURE] ArgillaLabeller Task