Having trouble replicating IrisVectorStore Llama Index demo from iris-vector-search for my program's user table #10

ericmariasis opened this issue Jul 22, 2024 · 0 comments

I'll preface this by saying I'm not sure whether I've found a bug or whether I'm somehow misusing IRISVectorStore. Also, for testing you'll probably need an OpenAI token.

Basically, I have code using the regular llama-index module working in my Python project, with SimpleDirectoryReader objects similar in nature to the demo I mentioned (https://github.com/intersystems-community/iris-vector-search/blob/main/d...). I also have other code working (not shown) that can add new users to a SQL table in IRIS.
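
For context, here is a minimal sketch of the shape of that (not shown) insert code; the connection string, column names, and values are hypothetical placeholders, not my actual schema:

from sqlalchemy import create_engine, text

# Hypothetical sketch only; the real columns/values differ in my project
url = "iris://username:password@hostname:1972/NAMESPACE"
engine = create_engine(url)
with engine.begin() as conn:  # begin() commits automatically on success
    conn.execute(
        # "user" is quoted because USER is a reserved word in SQL
        text('INSERT INTO "user" (name, email) VALUES (:name, :email)'),
        {"name": "example", "email": "example@example.com"},
    )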

I tried to use IRISVectorStore in a manner similar to the excerpt from the demo code below, except that I changed the table name to the name of my user table and swapped the documents object in that code for my own SimpleDirectoryReader output.

However, no matter how many times I run it with those changes, I get a flurry of exceptions whose trace makes little sense to me. I can confirm that the code I have in place to connect to my user table locally does work. I'll include the trace at the bottom.

# StorageContext captures how vectors will be stored
vector_store = IRISVectorStore.from_params(
    connection_string = url,
    table_name = "paul_graham_essay",
    embed_dim = 1536,  # openai embedding dimension
    engine_args = { "connect_args": {"sslcontext": sslcontext} }
)

Below is the entire code module where I use LlamaIndex. You can see the block of commented-out code I tried to add in run_query_on_files, in addition to the setup steps above, similar to the demo.

import textwrap

import nest_asyncio
from openai import OpenAIError
from pydantic import ValidationError

nest_asyncio.apply()

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext
from llama_index.llms.openai import OpenAI

from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_iris import IRISVectorStore

import os
from .myconfig import *

os.environ["OPENAI_API_KEY"] = f'{OPENAI_API_KEY}'

username = f'{DB_USER}'
password = f'{DB_PASS}'
hostname = os.getenv('IRIS_HOSTNAME', f'{DB_URL}')
port = f'{DB_PORT}'
namespace = f'{DB_NAMESPACE}'

from llama_index.core import Settings

Settings.llm = OpenAI(temperature=0.2, model="gpt-3.5-turbo")

import ssl

certificateFile = "/usr/cert-demo/certificateSQLaaS.pem"

if os.path.exists(certificateFile):
    print("Located SSL certificate at '%s', initializing SSL configuration" % certificateFile)
    sslcontext = ssl.create_default_context(cafile=certificateFile)
else:
    print("No certificate file found, continuing with insecure connection")
    sslcontext = None

from sqlalchemy import create_engine, text

url = f"iris://{username}:{password}@{hostname}:{port}/{namespace}"

engine = create_engine(url, connect_args={"sslcontext": sslcontext})
with engine.connect() as conn:
    print(conn.execute(text("SELECT 'hello world!'")).first()[0])

# StorageContext captures how vectors will be stored
vector_store = IRISVectorStore.from_params(
    connection_string = url,
    table_name = "user",
    embed_dim = 1536,  # openai embedding dimension
    engine_args = { "connect_args": {"sslcontext": sslcontext} }
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
def get_filename_before_dot(filename):
    name, extension = os.path.splitext(filename)
    return name


def run_query_on_files(files, query):
    # Check if the OpenAI API key is provided
    if not os.getenv("OPENAI_API_KEY"):
        return "Cannot run model. No API key provided."

    try:
        queryEngineTools = []
        for file in files:
            curDoc = SimpleDirectoryReader(input_files=[file]).load_data()
            # index = VectorStoreIndex.from_documents(
            #     curDoc,
            #     storage_context=storage_context,
            #     show_progress=True,
            # )
            # query_engine = index.as_query_engine()
            # userResp = query_engine.query("Summarize this content.")
            # print(textwrap.fill(str(userResp), 100))
            curVectorStore = VectorStoreIndex.from_documents(curDoc)
            curEngine = curVectorStore.as_query_engine(similarity_top_k=3)
            curTool = QueryEngineTool(query_engine=curEngine, metadata=ToolMetadata(
                name=get_filename_before_dot(file),
                description=get_filename_before_dot(file)
            ))
            queryEngineTools.append(curTool)

        if len(files) > 0:
            s_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=queryEngineTools)
            response = s_engine.query(query)
            return response
        return ''
    except OpenAIError as e:
        return "Cannot run model. Invalid API key or other OpenAI error."
    except ValidationError as e:
        print(f"Validation error: {str(e)}")
        return "Validation error occurred."
    except Exception as e:
        print(f"An unexpected error occurred: {str(e)}")
        return "An unexpected error occurred."

And here is a trace of the error I get.

Parsing nodes: 100%|████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 991.33it/s]
Generating embeddings: 100%|█████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.29s/it]
An unexpected error occurred: 1 validation error for NodeWithScore
node
  Can't instantiate abstract class BaseNode with abstract methods get_content, get_metadata_str, get_type, hash, set_content (type=type_error)
Parsing nodes: 100%|███████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1009.22it/s]
Generating embeddings: 100%|█████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.74it/s] 
An unexpected error occurred: 1 validation error for NodeWithScore
node
  Can't instantiate abstract class BaseNode with abstract methods get_content, get_metadata_str, get_type, hash, set_content (type=type_error)
Parsing nodes: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<?, ?it/s] 
Generating embeddings: 100%|█████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  8.55it/s] 
An unexpected error occurred: 1 validation error for NodeWithScore
node
  Can't instantiate abstract class BaseNode with abstract methods get_content, get_metadata_str, get_type, hash, set_content (type=type_error)
Parsing nodes: 100%|███████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1009.70it/s] 
Generating embeddings: 100%|█████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  5.21it/s] 
An unexpected error occurred: 1 validation error for NodeWithScore
node
  Can't instantiate abstract class BaseNode with abstract methods get_content, get_metadata_str, get_type, hash, set_content (type=type_error)

My question is basically: does anybody know for sure whether IRISVectorStore can successfully extract information from a user table, or might I have hit some weird edge case in trying to use it this way?
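
For anyone trying to reproduce, this is the minimal path I would expect to show the problem, lifted from the commented-out block in the module above (so it only uses objects already defined there; the file path is just a placeholder):

# Assumes the imports, storage_context, etc. from the module above
curDoc = SimpleDirectoryReader(input_files=["somefile.pdf"]).load_data()
index = VectorStoreIndex.from_documents(
    curDoc,
    storage_context=storage_context,  # backed by IRISVectorStore on the "user" table
    show_progress=True,
)
query_engine = index.as_query_engine()
print(textwrap.fill(str(query_engine.query("Summarize this content.")), 100))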
