Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add document structure into GraphRAG #2033

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

KingSkyLi
Copy link
Contributor

image

@Aries-ckt
Copy link
Collaborator

it looks like db upsert error.

 document embedding, failed:xx.txt, Query execution failed: {code: ParserException} {message: ParserException: line 3:12 no

 viable alternative at input 'CALL db.upsertVertex("chunk", [{id: "62c3475f-3ecf-4520-ae97-88336aaca5fe", name: 

"none_header_chunk", content: d748118572'}

@fanzhidongyzby fanzhidongyzby changed the title Add the document structure of the graph. feat: add document structure into GraphRAG Sep 27, 2024
@github-actions github-actions bot added the enhancement New feature or request label Sep 27, 2024
@fanzhidongyzby fanzhidongyzby added the GraphRAG Module: GraphRAG label Sep 27, 2024
@@ -169,13 +246,14 @@ async def asimilar_search_with_scores(
# local search: extract keywords and explore subgraph
keywords = await self._keyword_extractor.extract(text)
subgraph = self._graph_store.explore(keywords, limit=topk).format()
document_subgraph = self._graph_store.explore_text_link(keywords, limit=topk).format()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explore_text_link should be implemented in CommunityStoreAdapter instead of directly modifying the API of GraphStore.

@@ -188,11 +188,11 @@ def get_neighbor_edges(
"""Get neighbor edges."""

@abstractmethod
def vertices(self) -> Iterator[Vertex]:
def vertices(self, vertex_prop:Optional[str] = None) -> Iterator[Vertex]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

过滤器使用lambda表达式实现:
def vertices(self, filter_fn: Optional[Callable[[Vertex], bool]]=None) -> Iterator[Vertex]:

@@ -132,20 +134,95 @@ def community_store_configure(name: str, cfg: VectorStoreConfig):
def get_config(self) -> BuiltinKnowledgeGraphConfig:
"""Get the knowledge graph config."""
return self._config

def _parse_chunks(slef, chunks: List[Chunk]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The responsibility of the _parse_chunks function should be to directly construct a document graph and save. Therefore, it should be declared as: def _load_doc_graph(self, chunks: List[Chunk]):.

@@ -172,3 +173,7 @@ def delete_vector_name(self, index_name: str):

logger.info("Drop triplet extractor")
self._triplet_extractor.drop()

def delete_by_ids(self, ids: str) -> List[str]:
self._graph_store.delete_document(chunk_ids = ids)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The delete_document should be implemented in CommunityStoreAdapter instead of directly modifying the API of GraphStore.

@@ -65,7 +65,7 @@ def _parse_response(self, text: str, limit: Optional[int] = None) -> List[Graph]
match = re.match(r"\((.*?)#(.*?)\)", line)
if match:
name, summary = [part.strip() for part in match.groups()]
graph.upsert_vertex(Vertex(name, description=summary))
graph.upsert_vertex(Vertex(name, description=summary, vertex_type='entity'))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect. The vertex_type and edge_type should be determined by TuGraphStore itself when inserting.

)
document_type: str = Field(
default="document",
description="The type of document vertex, `entity` by default.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong comment.

if self._summary_enabled:
self._upload_plugin()

def _check_label(self, elem_type: str):
result = self.conn.get_table_names()
if elem_type == "vertex":
if elem_type == "entity":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use configuration items instead of hardcoding.

for data in data_list:
chunk_src = Vertex(f"""{data['parent_id']}""",name=data["parent_title"],vertex_type=data["type"],content=data["content"])
chunk_dst = Vertex(f"""{data["id"]}""",name=data["title"],vertex_type=data["type"],content=data["content"])
chunk_include_chunk = Edge(chunk_src.vid,chunk_dst.vid,name=f"include",edge_type="chunk_include_chunk")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The edge_type should not be directly specified at the outer layer. Let GraphStore decide on its own.

Comment on lines +33 to +34


Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Leave one line at last in the code.

@Appointat
Copy link

@KingSkyLi Hi, I was assigned to help review this pr, could you please explain what kind of the doc structure needed to be added in the graph? Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request GraphRAG Module: GraphRAG
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature][GraphRAG] Knowledge graph extraction supports document structure information
4 participants