Labels added after project creation are not returned by https://app.cvat.ai/api/tasks?project_id= #8457

Open
2 tasks done
BeanBagKing opened this issue Sep 19, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@BeanBagKing

BeanBagKing commented Sep 19, 2024

Actions before raising this issue

  • I searched the existing issues and did not find anything similar.
  • I read/searched the docs

Steps to Reproduce

  1. Create a new project with some initial labels
  2. After the project is created, add some more labels
  3. Query https://app.cvat.ai/api/labels?project_id= for the project labels
  4. The project labels returned by the API are only the original ones; none of the newly added labels are included
    4a. While this appears to be an API endpoint (based on the URL), the correct data is only returned by the labels API

Expected Behavior

The project labels returned by any API match the current constructor

Possible Solution

Update the source behind https://app.cvat.ai/api/labels?project_id= and/or have it serve the same data as the labels list API.

Context

I was trying to write a script to give me basic statistics on my project (labels per class, etc.). The following script is rather longer than it needs to be with all the debugging, but it will return the results:

from cvat_sdk import Client
from prettytable import PrettyTable
from collections import defaultdict
import requests
import logging

project_id = 123456  # Update with your project ID

# Set up logging to a file
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')
#logging.basicConfig(filename='cvat_log.log', level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s') # Or log to file

# CVAT SDK connection (update these variables accordingly)
cvat_host = 'https://app.cvat.ai'
# CVAT credentials (replace with your username and password)
username = 'USERNAME'
#password = getpass.getpass(prompt='Enter your CVAT password: ')
password = r'PASSWORD'

# Manually login using requests to get the token
session = requests.Session()
login_url = f'{cvat_host}/api/auth/login'
login_data = {'username': username, 'password': password}
login_response = session.post(login_url, json=login_data)

if login_response.status_code == 200:
    token = login_response.json().get('key')
    logging.debug(f"Successfully logged in, received token: {token}")
else:
    logging.error(f"Failed to log in. Status code: {login_response.status_code}")
    raise Exception("Login failed. Please check your credentials.")

# Function to fetch project-level labels by making an API request to the provided URL
def fetch_project_labels(project_id):
    client = Client(cvat_host)
    client.login((username, password))

    # Retrieve project data
    project_data = client.projects.retrieve(project_id)
    
    # Debugging to inspect the structure of labels
    logging.debug(f"Project data for project {project_id}: {project_data}")

    # Get the labels URL
    label_url = project_data.labels['url']

    # Fetch labels from the label URL
    headers = {"Authorization": f"Token {token}"}
    response = requests.get(label_url, headers=headers)
    
    if response.status_code == 200:
        label_data = response.json()

        if 'results' in label_data:
            project_labels = {label['id']: label['name'] for label in label_data['results']}
            logging.debug(f"Fetched project-level labels from: {label_url}")
            logging.debug(f"Project-level labels are: {project_labels}")
            return project_labels
        else:
            logging.error(f"No labels found in the response from {label_url}")
            return {}
    else:
        logging.error(f"Failed to fetch labels from {label_url}, Status Code: {response.status_code}")
        return {}

fetch_project_labels(project_id)

The following is a sanitized version of the logs that are written:

2024-09-18 23:53:05,987 - DEBUG - Starting new HTTPS connection (1): app.cvat.ai:443
2024-09-18 23:53:06,991 - DEBUG - https://app.cvat.ai:443 "POST /api/auth/login HTTP/1.1" 200 50
2024-09-18 23:53:06,992 - DEBUG - Successfully logged in, received token: 8c742c94ef756723e1b2bc0f7344829b20316781
2024-09-18 23:53:08,339 - DEBUG - Project data for project 123456: {'assignee': None,
 'assignee_updated_date': None,
 'bug_tracker': '',
 'created_date': datetime.datetime(2024, 6, 11, 22, 28, 35, 840052, tzinfo=tzutc()),
 'dimension': '2d',
 'guide_id': 12345,
 'id': 123456,
 'labels': {'url': 'https://app.cvat.ai/api/labels?project_id=123456'},
 'name': 'ProjName',
 'organization': None,
 'owner': {'first_name': 'FIRST',
           'id': 111111,
           'last_name': 'LAST',
           'url': 'https://app.cvat.ai/api/users/111111',
           'username': 'USERNAME'},
 'source_storage': {'cloud_storage_id': None,
                    'id': 2222222,
                    'location': 'local'},
 'status': 'annotation',
 'target_storage': {'cloud_storage_id': None,
                    'id': 3333333,
                    'location': 'local'},
 'task_subsets': ['train'],
 'tasks': {'count': 7,
           'url': 'https://app.cvat.ai/api/tasks?project_id=123456'},
 'updated_date': datetime.datetime(2024, 9, 19, 3, 7, 14, 522143, tzinfo=tzutc()),
 'url': 'https://app.cvat.ai/api/projects/123456'}
2024-09-18 23:53:08,862 - DEBUG - Fetched project-level labels from: https://app.cvat.ai/api/labels?project_id=123456
2024-09-18 23:53:08,862 - DEBUG - Project-level labels are: {1: 'Raccoon', 2: 'Possum', 3: 'Armadillo', 4: 'Dog', 5: 'Cat'}

Missing from the project-level labels on the last line are 6: Car, 7: Truck, and 8: Van.
Again, sanitized, but the missing labels are in a similar numerical order. That is what leads me to believe it is the ones added after I started the project that are missing, though I can't be positive.

However, taking a page directly from the API documentation and creating the following script:

from pprint import pprint

from cvat_sdk.api_client import Configuration, ApiClient, exceptions
from cvat_sdk.api_client.models import *

configuration = Configuration(
    host = 'https://app.cvat.ai',
    username = 'USERNAME',
    password = r'PASSWORD'
)

with ApiClient(configuration) as api_client:
    project_id = 149228 
    try:
        (data, response) = api_client.labels_api.list(
            project_id=project_id,
        )
        pprint(data)
    except exceptions.ApiException as e:
        print("Exception when calling LabelsApi.list(): %s\n" % e)

Gets me the literal output of the raw constructor from the project page, with all 8 labels.

Environment

- Windows 11 23H2
- PowerShell 7 shell
- Python 3.12.6
- CVAT SDK updated as of today
@BeanBagKing BeanBagKing added the bug Something isn't working label Sep 19, 2024
@zhiltsov-max
Contributor

Hi,

It's not clear from the request how the project is created and how extra labels are added. I guess the reason might be that your function fetch_project_labels fetches just the 1st page of the results. Try to use cvat_sdk.core.helpers.get_paginated_collection or this snippet:

from cvat_sdk import make_client

with make_client(...) as client:
    project = client.projects.retrieve(42)
    print(project.get_labels())

SDK docs: https://docs.cvat.ai/docs/api_sdk/sdk/highlevel-api/ and https://docs.cvat.ai/docs/api_sdk/sdk/lowlevel-api/
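
To illustrate the "first page only" point, here is a minimal sketch of following pagination links by hand. It assumes the listing endpoint returns pages shaped like `{"count", "next", "results"}`, which this thread does not confirm; `iter_paginated_results`, `fetch_page`, and `FAKE_PAGES` are hypothetical names for illustration and are not part of the CVAT SDK.

```python
from typing import Callable, Iterator, Optional


def iter_paginated_results(
    first_url: str, fetch_page: Callable[[str], dict]
) -> Iterator[dict]:
    """Yield every item from a paginated listing by following `next` links."""
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("results", [])
        url = page.get("next")  # None (or missing) on the last page


# Hypothetical in-memory pages standing in for real HTTP responses.
FAKE_PAGES = {
    "labels?page=1": {
        "count": 3,
        "next": "labels?page=2",
        "results": [{"id": 1, "name": "Raccoon"}, {"id": 2, "name": "Possum"}],
    },
    "labels?page=2": {
        "count": 3,
        "next": None,
        "results": [{"id": 6, "name": "Car"}],
    },
}

# Collect labels from all pages, not just the first one.
labels = {
    item["id"]: item["name"]
    for item in iter_paginated_results("labels?page=1", FAKE_PAGES.__getitem__)
}
print(labels)
```

Against a real server, `fetch_page` could be something like `lambda url: requests.get(url, headers=headers).json()`, reusing the `headers` from the script in the issue description.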

@BeanBagKing
Author

I'm not sure it matters how the project is created or how extra labels are added, though in both cases I would have done it via the web interface. Like I said I can't be positive that this is even exactly what's going on here. The behavior matches though.

Regarding the first page of results, I don't believe there are any other pages, or at least none returned by that API that I can see. Specifically, the script gets the project data from client.projects.retrieve(project_id), which returns JSON containing a URL for retrieving labels: 'labels': {'url': 'https://app.cvat.ai/api/labels?project_id=123456'}. Calling that URL in turn returns a single JSON string with just the label IDs and names: no other information, no indication of a second page, nothing else.

I did find a better way to get the labels, so this isn't a stopper for me and I don't even remember how I ended up in that area to get the labels. I think it is a bug though, I'm just not sure what the proper fix is:

  • Ensure that it does return all the labels
  • Ensure it returns an indication that labels continue on a second page
  • Remove the labels URL from project data altogether so nobody inadvertently references it
  • Something else?
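
One way to tell these cases apart is to inspect the raw response instead of only its `results` list. The sketch below assumes the response carries `count` and `next` fields next to `results`, which nothing in this thread confirms; `summarize_page` is a hypothetical helper and the sample response is made up for illustration.

```python
def summarize_page(page: dict) -> dict:
    """Report whether a single listing page appears to be incomplete."""
    results = page.get("results", [])
    total = page.get("count")  # Assumed field: total number of items
    has_next = bool(page.get("next"))  # Assumed field: URL of the next page
    truncated = has_next or (total is not None and total > len(results))
    return {
        "returned": len(results),
        "total": total,
        "has_next": has_next,
        "truncated": truncated,
    }


# Made-up response: 5 items on this page, 8 reported in total.
sample = {
    "count": 8,
    "next": "https://example.invalid/api/labels?page=2&project_id=123456",
    "results": [{"id": i} for i in range(1, 6)],
}
summary = summarize_page(sample)
print(summary)
```

If `truncated` comes back true for the real response, the missing labels are probably on a later page; if it is false, the endpoint itself is returning incomplete data.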

@zhiltsov-max
Contributor

We need a reproducing example to be able to help, which is why I asked how the task was created. Could you provide a full example, e.g. a complete Python script that creates a task and then retrieves incomplete labels? If this is on app.cvat.ai, you can write to our dedicated support channel (https://www.cvat.ai/ -> Company -> Contact Us).
