chunking_eval_experiment #295

mtawbiaws · 2024-09-05T14:03:29Z

Identifying the best document chunking strategy is key to building a GenAI solution powered by RAG.

Finding the right combination of chunking strategy, chunk size and chunk overlap is a balance between response accuracy, latency and cost.
The goal of the notebook is to provide a tool / framework for quickly testing multiple combinations of these three parameters.
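
As a rough illustration of what "multiple combinations of the three parameters" can look like, here is a minimal sketch that enumerates a test grid. The strategy names come from this PR; the chunk sizes, the overlap values and the helper name `build_grid` are assumptions chosen for illustration, not values taken from the notebook.

```python
from itertools import product

# Strategy names as listed in this PR description.
STRATEGIES = [
    "NLTKTextSplitter",
    "LatexTextSplitter",
    "MarkdownTextSplitter",
    "RecursiveCharacterTextSplitter",
    "SpacyTextSplitter",
]

# Hypothetical values, chosen only to illustrate the grid.
CHUNK_SIZES = [256, 512, 1024]
CHUNK_OVERLAPS = [0, 64, 128]


def build_grid(strategies, chunk_sizes, chunk_overlaps):
    """Return every (strategy, size, overlap) combination as a dict.

    Combinations where the overlap is not smaller than the chunk size
    are skipped, since such a splitter configuration makes no sense.
    """
    return [
        {"strategy": s, "chunk_size": size, "chunk_overlap": overlap}
        for s, size, overlap in product(strategies, chunk_sizes, chunk_overlaps)
        if overlap < size
    ]


grid = build_grid(STRATEGIES, CHUNK_SIZES, CHUNK_OVERLAPS)
print(f"{len(grid)} configurations to evaluate")
```

Each entry in `grid` can then be fed to the splitting and evaluation steps, which keeps the experiment definition separate from the code that runs it.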

This notebook demonstrates how to evaluate different chunking strategies, chunk sizes and overlaps, using Claude 3 Sonnet on Bedrock as the LLM, FAISS as the vector store and the RAGA framework for evaluation.

Tools and libraries used

LLM: anthropic.claude-3-sonnet on Bedrock
Embedding: amazon.titan-embed-text-v1 on Bedrock
Vector Store: FAISS
RAG retriever results evaluation: RAGA (context_recall, context_precision, answer_relevancy)
Evaluated chunking strategies:
1. NLTKTextSplitter
2. LatexTextSplitter
3. MarkdownTextSplitter
4. RecursiveCharacterTextSplitter
5. SpacyTextSplitter
Dataset: AWS 2019-2022 shareholder letters

Workflow

1. Install the required libraries
2. Create a Bedrock client
3. Load the example documents from a local folder
4. Use a custom function "textsplitterStrategy" with 3 parameters: strategy, chunk size and overlap
5. Run the function for the 5 chunking strategies and load the documents into FAISS (5 stores)
6. Use Claude 3 Sonnet and RAGA to evaluate context_recall, context_precision and answer_relevancy
7. Plot the evaluation results
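
For readers who have not opened the notebook, the "textsplitterStrategy" step could be sketched roughly as follows. This is not the notebook's actual implementation: the function body, the `SUPPORTED_STRATEGIES` tuple and the choice of the `langchain_text_splitters` package are assumptions (the notebook may import the same classes from a different LangChain module). NLTKTextSplitter and SpacyTextSplitter additionally need the `nltk` and `spacy` packages and their language data installed.

```python
import importlib

# Strategy names as listed in this PR description.
SUPPORTED_STRATEGIES = (
    "NLTKTextSplitter",
    "LatexTextSplitter",
    "MarkdownTextSplitter",
    "RecursiveCharacterTextSplitter",
    "SpacyTextSplitter",
)


def textsplitterStrategy(strategy, chunk_size, chunk_overlap):
    """Return a configured text splitter for the given strategy name.

    Hypothetical sketch: arguments are validated first, then the
    splitter class is looked up by name in langchain_text_splitters
    (an assumed dependency, imported lazily).
    """
    if strategy not in SUPPORTED_STRATEGIES:
        raise ValueError(f"Unsupported chunking strategy: {strategy!r}")
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("Require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")

    # Lazy import so the module loads even when LangChain is not installed.
    module = importlib.import_module("langchain_text_splitters")
    splitter_cls = getattr(module, strategy)
    return splitter_cls(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
```

A call such as `textsplitterStrategy("RecursiveCharacterTextSplitter", 512, 64)` would return a splitter whose `split_documents` method produces the chunks to index in one of the FAISS stores. Validating the arguments before the import keeps configuration errors cheap to detect.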

chunking_eval_experiment-rm

94cfecc
