-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add document structure into GraphRAG #2033
base: main
Are you sure you want to change the base?
Conversation
KingSkyLi
commented
Sep 20, 2024
it looks like db upsert error.
|
@@ -169,13 +246,14 @@ async def asimilar_search_with_scores( | |||
# local search: extract keywords and explore subgraph | |||
keywords = await self._keyword_extractor.extract(text) | |||
subgraph = self._graph_store.explore(keywords, limit=topk).format() | |||
document_subgraph = self._graph_store.explore_text_link(keywords, limit=topk).format() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The explore_text_link
should be implemented in CommunityStoreAdapter
instead of directly modifying the API of GraphStore
.
@@ -188,11 +188,11 @@ def get_neighbor_edges( | |||
"""Get neighbor edges.""" | |||
|
|||
@abstractmethod | |||
def vertices(self) -> Iterator[Vertex]: | |||
def vertices(self, vertex_prop:Optional[str] = None) -> Iterator[Vertex]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
过滤器使用lambda表达式实现:
def vertices(self, filter_fn: Optional[Callable[[Vertex], bool]]=None) -> Iterator[Vertex]:
@@ -132,20 +134,95 @@ def community_store_configure(name: str, cfg: VectorStoreConfig): | |||
def get_config(self) -> BuiltinKnowledgeGraphConfig: | |||
"""Get the knowledge graph config.""" | |||
return self._config | |||
|
|||
def _parse_chunks(slef, chunks: List[Chunk]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The responsibility of the _parse_chunks
function should be to directly construct a document graph and save. Therefore, it should be declared as: def _load_doc_graph(self, chunks: List[Chunk]):
.
@@ -172,3 +173,7 @@ def delete_vector_name(self, index_name: str): | |||
|
|||
logger.info("Drop triplet extractor") | |||
self._triplet_extractor.drop() | |||
|
|||
def delete_by_ids(self, ids: str) -> List[str]: | |||
self._graph_store.delete_document(chunk_ids = ids) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The delete_document
should be implemented in CommunityStoreAdapter
instead of directly modifying the API of GraphStore
.
@@ -65,7 +65,7 @@ def _parse_response(self, text: str, limit: Optional[int] = None) -> List[Graph] | |||
match = re.match(r"\((.*?)#(.*?)\)", line) | |||
if match: | |||
name, summary = [part.strip() for part in match.groups()] | |||
graph.upsert_vertex(Vertex(name, description=summary)) | |||
graph.upsert_vertex(Vertex(name, description=summary, vertex_type='entity')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect. The vertex_type
and edge_type
should be determined by TuGraphStore
itself when inserting.
) | ||
document_type: str = Field( | ||
default="document", | ||
description="The type of document vertex, `entity` by default.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong comment.
if self._summary_enabled: | ||
self._upload_plugin() | ||
|
||
def _check_label(self, elem_type: str): | ||
result = self.conn.get_table_names() | ||
if elem_type == "vertex": | ||
if elem_type == "entity": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use configuration items instead of hardcoding.
for data in data_list: | ||
chunk_src = Vertex(f"""{data['parent_id']}""",name=data["parent_title"],vertex_type=data["type"],content=data["content"]) | ||
chunk_dst = Vertex(f"""{data["id"]}""",name=data["title"],vertex_type=data["type"],content=data["content"]) | ||
chunk_include_chunk = Edge(chunk_src.vid,chunk_dst.vid,name=f"include",edge_type="chunk_include_chunk") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The edge_type
should not be directly specified at the outer layer. Let GraphStore
decide on its own.
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave one line at last in the code.
@KingSkyLi Hi, I was assigned to help review this pr, could you please explain what kind of the doc structure needed to be added in the graph? Thanks a lot! |