From f5f11c76b557b020bf02ac45284e146f829af2d6 Mon Sep 17 00:00:00 2001
From: Liam Thompson
Date: Fri, 4 Aug 2023 15:08:21 +0200
Subject: [PATCH 1/2] YAPR

---
 elser-getting-started.ipynb | 798 ++++++++++++++++++++++++++++++++++++
 1 file changed, 798 insertions(+)
 create mode 100644 elser-getting-started.ipynb

diff --git a/elser-getting-started.ipynb b/elser-getting-started.ipynb
new file mode 100644
index 0000000..80c47c6
--- /dev/null
+++ b/elser-getting-started.ipynb
@@ -0,0 +1,798 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "s49gpkvZ7q53"
+   },
+   "source": [
+    "# Semantic Search using ELSER and Python\n",
+    "\n",
+    "Learn how to use the [Elastic Learned Sparse EncodeR (ELSER)](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html) for text expansion-powered semantic search."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "Y01AXpELkygt"
+   },
+   "source": [
+    "# 🧰 Requirements\n",
+    "\n",
+    "For this example, you will need:\n",
+    "\n",
+    "- Python 3.6 or later\n",
+    "- An Elastic deployment with at least a **4GB machine learning node**\n",
+    "  - We'll be using [Elastic Cloud](https://www.elastic.co/guide/en/cloud/current/ec-getting-started.html) for this example (available with a [free trial](https://cloud.elastic.co/registration?elektra=en-ess-sign-up-page))\n",
+    "- The [ELSER](https://www.elastic.co/guide/en/machine-learning/8.8/ml-nlp-elser.html) model installed on your Elastic deployment\n",
+    "- The [Elastic Python client](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/installation.html)\n"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "N4pI1-eIvWrI"
+   },
+   "source": [
+    "# Create Elastic Cloud deployment\n",
+    "\n",
+    "If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?fromURI=%2Fhome) for a free trial.\n",
+    "\n",
+    "- Go to the [Create deployment](https://cloud.elastic.co/deployments/create) page\n",
+    "- Under **Advanced settings**, go to **Machine Learning instances**: you'll need at least **4GB** RAM per zone for this tutorial\n",
+    "- Select **Create deployment**"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "nSw1R8e28F_E"
+   },
+   "source": [
+    "# Set up ELSER\n",
+    "\n",
+    "To use ELSER, you must have the [appropriate subscription](https://www.elastic.co/subscriptions) level for semantic search, or the trial period activated.\n",
+    "\n",
+    "Follow these [instructions](https://www.elastic.co/guide/en/machine-learning/8.8/ml-nlp-elser.html#trained-model) to download and deploy ELSER in the Kibana UI or using the Dev Tools **Console**.\n",
+    "\n",
+    "(Console commands in comments 👇)"
+   ]
+  },
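+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# The Dev Tools Console commands below are a sketch based on the 8.8 ELSER\n",
+    "# docs linked above — run them in Kibana's Console, not in Python, and\n",
+    "# double-check the docs for your stack version.\n",
+    "\n",
+    "# Create the ELSER model configuration (this triggers the model download):\n",
+    "# PUT _ml/trained_models/.elser_model_1\n",
+    "# {\n",
+    "#   \"input\": {\n",
+    "#     \"field_names\": [\"text_field\"]\n",
+    "#   }\n",
+    "# }\n",
+    "\n",
+    "# Once the download completes, start the deployment:\n",
+    "# POST _ml/trained_models/.elser_model_1/deployment/_start"
+   ]
+  },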
elasticsearch" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "gEzq2Z1wBs3M" + }, + "source": [ + "[TODO: Update]\n", + "Next we need to import the `elasticsearch` module and the `getpass` module.\n", + "`getpass` is part of the Python standard library and is used to securely prompt for credentials." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "uP_GTVRi-d96" + }, + "outputs": [], + "source": [ + "from elasticsearch import Elasticsearch, helpers\n", + "from urllib.request import urlopen\n", + "import getpass\n", + "import requests\n", + "import json" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "AMSePFiZCRqX" + }, + "source": [ + "Now we can instantiate the Python Elasticsearch client.\n", + "First we prompt the user for their password and Cloud ID.\n", + "\n", + "🔐 NOTE: `getpass` enables us to securely prompt the user for credentials without echoing them to the terminal, or storing it in memory.\n", + "\n", + "Then we create a `client` object that instantiates an instance of the `Elasticsearch` class." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "h0MdAZ53CdKL", + "outputId": "96ea6f81-f935-4d51-c4a7-af5a896180f1" + }, + "outputs": [], + "source": [ + "# Found in the 'Manage Deployment' page\n", + "CLOUD_ID = getpass.getpass('Enter Elastic Cloud ID: ')\n", + "\n", + "# Password for the 'elastic' user generated by Elasticsearch\n", + "ELASTIC_PASSWORD = getpass.getpass('Enter Elastic password: ')\n", + "\n", + "# Create the client instance\n", + "client = Elasticsearch(\n", + " cloud_id=CLOUD_ID,\n", + " basic_auth=(\"elastic\", ELASTIC_PASSWORD)\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "bRHbecNeEDL3" + }, + "source": [ + "Confirm that the client has connected with this test" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "rdiUKqZbEKfF", + "outputId": "43b6f1cd-a43e-4dbe-caa5-7fd170464881" + }, + "outputs": [], + "source": [ + "print(client.info())" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "enHQuT57DhD1" + }, + "source": [ + "Refer to https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#connect-self-managed-new to learn how to connect to a self-managed deployment.\n", + "\n", + "Read https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#connect-self-managed-new to learn how to connect using API keys.\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "TF_wxIAhD07a" + }, + "source": [ + "# Create Elasticsearch index with required mappings\n", + "\n", + "To use the ELSER model at index time, we'll need to create an index mapping that supports a [`text_expansion`](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-text-expansion-query.html) query.\n", + "The mapping must include a field of type [`rank_features`](https://www.elastic.co/guide/en/elasticsearch/reference/current/rank-features.html) to work with our feature vectors of interest.\n", + "This field contains the token-weight pairs the ELSER model created based on the input text.\n", + "\n", + "Let's create an index named `elser-movies` with the mappings we need.\n" + ] + }, + { + "cell_type": 
"code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "cvYECABJJs_2", + "outputId": "18fb51e4-c4f6-4d1b-cb2d-bc6f8ec1aa84" + }, + "outputs": [], + "source": [ + "INDEX = 'elser-movies'\n", + "client.indices.create(\n", + " index=INDEX,\n", + " settings={\n", + " \"index\": {\n", + " \"number_of_shards\": 1,\n", + " \"number_of_replicas\": 1\n", + " }\n", + " },\n", + " mappings={\n", + " \"properties\": {\n", + " \"genre\": {\n", + " \"type\": \"text\",\n", + " \"fields\": {\n", + " \"keyword\": {\n", + " \"type\": \"keyword\",\n", + " \"ignore_above\": 256\n", + " }\n", + " }\n", + " },\n", + " \"keyScene\": {\n", + " \"type\": \"text\",\n", + " \"fields\": {\n", + " \"keyword\": {\n", + " \"type\": \"keyword\",\n", + " \"ignore_above\": 256\n", + " }\n", + " }\n", + " },\n", + " \"plot\": {\n", + " \"type\": \"text\",\n", + " \"fields\": {\n", + " \"keyword\": {\n", + " \"type\": \"keyword\",\n", + " \"ignore_above\": 256\n", + " }\n", + " }\n", + " },\n", + " \"released\": {\n", + " \"type\": \"integer\"\n", + " },\n", + " \"runtime\": {\n", + " \"type\": \"integer\"\n", + " },\n", + " \"title\": {\n", + " \"type\": \"text\",\n", + " \"fields\": {\n", + " \"keyword\": {\n", + " \"type\": \"keyword\",\n", + " \"ignore_above\": 256\n", + " }\n", + " }\n", + " },\n", + " \"ml.tokens\": {\n", + " \"type\": \"rank_features\"\n", + " },\n", + " \"keyScene\": {\n", + " \"type\": \"text\"\n", + " }\n", + " }\n", + "}\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "ohcvdngCGJlo" + }, + "source": [] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "EmELvr_JK_22" + }, + "source": [ + "# Create an ingest pipeline with an inference processor to use ELSER\n", + "\n", + "In order to use ELSER on our Elastic Cloud deployment we'll need to create an ingest pipeline that contains an inference processor that runs the ELSER model.\n", + "Let's add that pipeline using the [`put_pipeline`](https://www.elastic.co/guide/en/elasticsearch/reference/master/put-pipeline-api.html) method." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "XhRng99KLQsd", + "outputId": "00ea73b5-45a4-472b-f4bc-2c2c790ab94d" + }, + "outputs": [], + "source": [ + "\n", + "client.ingest.put_pipeline(id=\"elser-v1-test\", body={\n", + " \"processors\": [\n", + " {\n", + " \"inference\": {\n", + " \"model_id\": \".elser_model_1\",\n", + " \"target_field\": \"ml\",\n", + " \"field_map\": {\n", + " \"keyScene\": \"text_field\",\n", + " \"plot\": \"text_field\"\n", + " },\n", + " \"inference_config\": {\n", + " \"text_expansion\": {\n", + " \"results_field\": \"tokens\"\n", + " }\n", + " }\n", + " }\n", + " }\n", + " ]\n", + "})" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "0wCH7YHLNW3i" + }, + "source": [ + "Let's note a few important parameters from that API call:\n", + "\n", + "- `inference`: A processor that performs inference using a machine learning model.\n", + "- `model_id`: Specifies the ID of the machine learning model to be used. In this example, the model ID is set to `.elser_model_1`.\n", + "- `target_field`: Defines the field where the inference result will be stored. 
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "U3vT2g5LVIQF"
+   },
+   "source": [
+    "# Create index and mapping for test data\n",
+    "\n",
+    "We have some test data in a JSON file at this [URL](https://raw.githubusercontent.com/leemthompo/notebook-tests/main/12-movies.json).\n",
+    "Let's load that into our Elastic deployment.\n",
+    "First we'll create an index named `search-movies` to store that data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "X3ONJckPnUIT",
+    "outputId": "07ea0766-c226-4510-c910-893db89757ad"
+   },
+   "outputs": [],
+   "source": [
+    "client.indices.create(\n",
+    "    index=\"search-movies\",\n",
+    "    mappings={\n",
+    "        \"properties\": {\n",
+    "            \"genre\": {\n",
+    "                \"type\": \"text\",\n",
+    "                \"fields\": {\"keyword\": {\"type\": \"keyword\", \"ignore_above\": 256}}\n",
+    "            },\n",
+    "            \"keyScene\": {\n",
+    "                \"type\": \"text\",\n",
+    "                \"fields\": {\"keyword\": {\"type\": \"keyword\", \"ignore_above\": 256}}\n",
+    "            },\n",
+    "            \"plot\": {\n",
+    "                \"type\": \"text\",\n",
+    "                \"fields\": {\"keyword\": {\"type\": \"keyword\", \"ignore_above\": 256}}\n",
+    "            },\n",
+    "            \"released\": {\"type\": \"integer\"},\n",
+    "            \"runtime\": {\"type\": \"integer\"},\n",
+    "            \"title\": {\n",
+    "                \"type\": \"text\",\n",
+    "                \"fields\": {\"keyword\": {\"type\": \"keyword\", \"ignore_above\": 256}}\n",
+    "            }\n",
+    "        }\n",
+    "    }\n",
+    ")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "lFHgRUYVpNKP"
+   },
+   "source": [
+    "# Upload sample data\n",
+    "\n",
+    "> ⚠ To use the UI to upload data, follow the approach described [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/semantic-search-elser.html#load-data).\n",
+    "\n",
+    "Let's upload the JSON data.\n",
+    "The dataset provides information on twelve iconic films.\n",
+    "Each film's entry includes its title, runtime, plot summary, a key scene, genre classification, and release year."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "IBfqgdAcuKRG",
+    "outputId": "3b86daa1-ade1-4ff3-da81-4207fa814d30"
+   },
+   "outputs": [],
+   "source": [
+    "url = \"https://raw.githubusercontent.com/leemthompo/notebook-tests/main/12-movies.json\"\n",
+    "\n",
+    "# Send a request to the URL and get the response\n",
+    "response = urlopen(url)\n",
+    "\n",
+    "# Load the response data into a JSON object\n",
+    "data_json = json.loads(response.read())\n",
+    "\n",
+    "def create_index_body(doc):\n",
+    "    \"\"\"Generate the body for an Elasticsearch document.\"\"\"\n",
+    "    return {\n",
+    "        \"_index\": \"search-movies\",\n",
+    "        \"_source\": doc,\n",
+    "    }\n",
+    "\n",
+    "# Prepare the documents to be indexed\n",
+    "documents = [create_index_body(doc) for doc in data_json]\n",
+    "\n",
+    "# Use helpers.bulk to index\n",
+    "helpers.bulk(client, documents)\n",
+    "\n",
+    "print(\"Done indexing documents into `search-movies` index!\")"
+   ]
+  },
\"\"\"\n", + " return {\n", + " \"_index\": \"search-movies\",\n", + " \"_source\": doc,\n", + " }\n", + "\n", + "# Prepare the documents to be indexed\n", + "documents = [create_index_body(doc) for doc in data_json]\n", + "\n", + "# Use helpers.bulk to index\n", + "helpers.bulk(client, documents)\n", + "\n", + "print(\"Done indexing documents into `search-movies` index!\")\n", + "\n", + "\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "73d3Td-1ubhv" + }, + "source": [ + "# Ingest the data through the inference ingest pipeline\n", + "\n", + "Create tokens from the text by reindexing the data throught the inference pipeline that uses ELSER as the inference model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ysYobyC9uhn5", + "outputId": "27af8c88-9039-4ff8-a20f-9af9ffcff05c" + }, + "outputs": [], + "source": [ + "client.reindex(wait_for_completion=False,\n", + " source={\n", + " \"index\": \"search-movies\"\n", + " },\n", + " dest= {\n", + " \"index\": \"elser-movies\",\n", + " \"pipeline\": \"elser-v1-test\"\n", + " }\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "tUDGeY7e2-I2" + }, + "source": [ + "# Confirm documents are indexed with additional fields\n", + "\n", + "A successful API call in the previous step returns a task ID to monitor the job's progress.\n", + "Use the [task management API](https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html) to check progress.\n", + "You can also monitor this task using the **Trained Models** UI in Kibana, selecting the **Pipelines** tab under **ELSER**.\n", + "\n", + "Call the following, replacing `` with the task id returned in the previous step." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "2KXeXCc63WVw", + "outputId": "e8fee6dd-34a1-401d-c879-71fd54de3c90" + }, + "outputs": [], + "source": [ + "client.tasks.get(task_id='cxy4bU9ASFKpFgZUpa-jnA:19545263')" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "oCj3jHHML4Tn" + }, + "source": [ + "Inspect a new document to confirm that it now has an `\"ml\": {\"tokens\":...}` field that contains a list of new, additional terms.\n", + "These terms are the **text expansion** of the field(s) you targeted for ELSER inference.\n", + "ELSER essentially creates a tree of expanded terms to improve the semantic searchability of your documents.\n", + "We'll be able to search these documents using a `text_expansion` query.\n", + "\n", + "But first let's start with a simple keyword search, to see how ELSER delivers semantically relevant results out of the box." 
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "_KahQAbPPd9l"
+   },
+   "source": [
+    "# Keyword match\n",
+    "\n",
+    "## Successful match\n",
+    "\n",
+    "Before trying semantic search, let's run a simple keyword search as a baseline, and assume a user queries the data set and hits an exact match.\n",
+    "BM25 is perfect for exact keyword matches.\n",
+    "Imagine our user remembers a movie where a child's spinning top was a recurring image.\n",
+    "They search for `spinning top`, and because these exact words are used in the key scene description, we get a perfect hit.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "FsZkFhGaYnzD",
+    "outputId": "843c72f1-6a0c-43ce-c1e4-ad5e763ebc95"
+   },
+   "outputs": [],
+   "source": [
+    "response = client.search(\n",
+    "    index=\"elser-movies\",\n",
+    "    query={\n",
+    "        \"match\": {\n",
+    "            \"keyScene\": \"spinning top\"\n",
+    "        }\n",
+    "    }\n",
+    ")\n",
+    "for hit in response['hits']['hits']:\n",
+    "    title = hit['_source']['title']\n",
+    "    text = hit['_source']['keyScene']\n",
+    "    print(f\"\\nTitle: {title}\\nKey scene description: {text}\\n\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "Y01WHeOtbTZ-"
+   },
+   "source": [
+    "## Unsuccessful match\n",
+    "\n",
+    "Unfortunately, searches that rely on exact matches are brittle.\n",
+    "What if you can't remember the exact name of the thing you're searching for?\n",
+    "Who knows what a spinning top is called, anyway?\n",
+    "\n",
+    "Imagine the user can only think of the words `child toy` to describe this apparatus.\n",
+    "A match query won't find any relevant documents."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "osifkhqidjYw",
+    "outputId": "6b917df6-b0af-4947-9280-98f7b17f2ff9"
+   },
+   "outputs": [],
+   "source": [
+    "response = client.search(\n",
+    "    index=\"elser-movies\",\n",
+    "    query={\n",
+    "        \"match\": {\n",
+    "            \"keyScene\": \"child toy\"\n",
+    "        }\n",
+    "    }\n",
+    ")\n",
+    "hits = response['hits']['hits']\n",
+    "\n",
+    "if not hits:\n",
+    "    print(\"No matches found\")\n",
+    "else:\n",
+    "    for hit in hits:\n",
+    "        title = hit['_source']['title']\n",
+    "        text = hit['_source']['keyScene']\n",
+    "        print(f\"\\nTitle: {title}\\nKey scene description: {text}\\n\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "MPCVztOLeAk_"
+   },
+   "source": [
+    "So it turns out classical term matching strategies are very good if you know precisely what you're looking for, but they break down when a user has a hard time articulating what they're trying to find.\n",
+    "Here's where semantic search shines: it captures a user's intent or meaning better, without relying on brittle term matches.\n",
+    "\n",
+    "Traditional dense vector similarity strategies require you to generate embeddings for your data and then map queries into the same mathematical space as the data.\n",
+    "This works well, but it is time consuming and requires a lot of legwork.\n",
+    "The beauty of the Elastic Learned Sparse Encoder model is that it works out of the box, without the need to fine-tune it on your data.\n",
+    "\n",
+    "The Elastic Learned Sparse Encoder creates a set of weighted expanded terms and adds them to your documents, improving their semantic searchability.\n",
+    "The fields that you targeted for inference are now enriched with a range of relevant synonyms and related terms that increase the probability of a successful search."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "Zy5GT2xb38oz"
+   },
+   "source": [
+    "# Semantic search with the `text_expansion` query\n",
+    "\n",
+    "Let's test semantic search using the Elastic Learned Sparse Encoder and see if we can improve on our earlier unsuccessful search for `child toy`.\n",
+    "\n",
+    "To perform semantic search using the Elastic Learned Sparse Encoder, you need the following:\n",
+    "\n",
+    "- A `text_expansion` query\n",
+    "- Query text\n",
+    "  - In this example we use `child toy`\n",
+    "- The ELSER model ID"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "bAZRxja-5Q6X",
+    "outputId": "37a26a2c-4284-4e51-c34e-9a55edf77cb8"
+   },
+   "outputs": [],
+   "source": [
+    "response = client.search(\n",
+    "    index='elser-movies',\n",
+    "    size=3,\n",
+    "    query={\n",
+    "        \"text_expansion\": {\n",
+    "            \"ml.tokens\": {\n",
+    "                \"model_id\": \".elser_model_1\",\n",
+    "                \"model_text\": \"child toy\"\n",
+    "            }\n",
+    "        }\n",
+    "    }\n",
+    ")\n",
+    "\n",
+    "for hit in response['hits']['hits']:\n",
+    "    score = hit['_score']\n",
+    "    title = hit['_source']['title']\n",
+    "    text = hit['_source']['keyScene']\n",
+    "    print(f\"Score: {score}\\nTitle: {title}\\nKey scene description: {text}\\n\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "yYSJ7fnv5uWd"
+   },
+   "source": [
+    "Success! Out of the box, ELSER has taken a fuzzy but semantically relevant query and found the correct match.\n",
+    "Our user has found the movie they were looking for!"
+   ]
+  }
+ ],
+ "metadata": {
+  "colab": {
+   "provenance": []
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.7"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}

From 1d4bb900116ef126cc268f674b5c254538bec5fd Mon Sep 17 00:00:00 2001
From: Liam Thompson
Date: Fri, 4 Aug 2023 15:12:34 +0200
Subject: [PATCH 2/2] Help me

---
 notebook.ipynb | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 notebook.ipynb

diff --git a/notebook.ipynb b/notebook.ipynb
new file mode 100644
index 0000000..e69de29