From 1041ced272f269bfb644e5d0b023177bf2257fef Mon Sep 17 00:00:00 2001 From: Liam Thompson Date: Fri, 4 Aug 2023 14:06:05 +0200 Subject: [PATCH] Test action --- es-whisper.ipynb | 586 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 586 insertions(+) create mode 100644 es-whisper.ipynb diff --git a/es-whisper.ipynb b/es-whisper.ipynb new file mode 100644 index 0000000..b3e5089 --- /dev/null +++ b/es-whisper.ipynb @@ -0,0 +1,586 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "id": "87773ce7", + "metadata": {}, + "source": [ + "# Tutorial: Search audio transcriptions with Elasticsearch\n", + "\n", + "## What problem are we solving?\n", + "\n", + "Your organization likely has a lot of unstructured data, such as audio from recorded meetings, which is difficult to search.\n", + "Tools like Zoom and Teams have audio transcription features today, but they have two major limitations:\n", + "\n", + "- They are not very accurate, especially for technical terms and non-native English accents.\n", + "- They are not easily searchable outside of the meeting platform.\n", + "\n", + "This tutorial will show you how to use a state-of-the-art AI model to generate accurate transcriptions from audio files and sync them to an Elasticsearch index.\n", + "You'll be able to scale this approach up to keep track of all your organization's audio data, and search it from a single place.\n", + "This is a powerful way to make an important part of your organization's knowledge base more accessible.\n", + "You'll be able to use this tutorial as a blueprint for building search experiences for other types of unstructured data, such as images, video, and text.\n", + "\n", + "## What you'll learn\n", + "\n", + "This tutorial will walk you through the following steps:\n", + "\n", + "1. 
Generate transcriptions from an audio file using the OpenAI [Whisper](https://openai.com/blog/whisper/) model [API](https://platform.openai.com/docs/api-reference/audio) in Python.\n", + "2. Sync the transcriptions to an Elasticsearch index, using the official [Elasticsearch Python client](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#auth-apikey).\n", + "3. Query the index to retrieve transcriptions, using a hybrid search (vector-based semantic search + keyword search) strategy.\n", + "4. Use an Elastic Search UI to easily search the transcriptions.\n", + "5. 🎁 **Bonus**: We'll show you how to summarize your transcription results using the Hugging Face [BART model](https://huggingface.co/transformers/model_doc/bart.html#bartsummarizationpipeline).\n", + "Use this to get a quick overview of the contents of your audio files, and to find the most relevant ones.\n", + "We'll update the transcription documents in the Elasticsearch index with a `summary` field, making the summaries searchable.\n", + "\n", + "First things first: let's import the libraries we'll need.\n", + "\n", + "🏃🏽‍♀️ Run this notebook:\n", + "\n", + "- Locally using [jupyter](https://docs.jupyter.org/en/latest/install.html)\n", + "- Online using [Google Colab](https://colab.research.google.com/?hl=en)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "d0f7dafa", + "metadata": {}, + "source": [ + "## 🧰 Requirements\n", + "\n", + "For this example, you will need:\n", + "\n", + "- Python 3.6 or later\n", + "- An Elastic deployment\n", + " - We'll be using [Elastic Cloud](https://www.elastic.co/guide/en/cloud/current/ec-getting-started.html) for this example (available with a [free trial](https://cloud.elastic.co/registration?elektra=en-ess-sign-up-page))\n", + "- The [Elastic Python client](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/installation.html)\n", + "- The [OpenAI Python 
client](https://github.com/openai/openai-python)\n", + "- An OpenAI API key\n", + " - You can get one by [signing up for the OpenAI API](https://beta.openai.com/)\n", + "- (_Optional for bonus section_) The [`huggingface_hub` library](https://huggingface.co/docs/huggingface_hub/quick-start)\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "9cf3cbb5", + "metadata": {}, + "source": [ + "## Create Elastic Cloud deployment\n", + "\n", + "If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?fromURI=%2Fhome) for a free trial.\n", + "\n", + "- Go to the [Create deployment](https://cloud.elastic.co/deployments/create) page\n", + " - Select **Create deployment**" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "d7076a7a", + "metadata": {}, + "source": [ + "## Install packages and import modules\n", + "\n", + "To get started, we'll need to connect to our Elastic deployment using the Python client.\n", + "Because we're using an Elastic Cloud deployment, we'll use the **Cloud ID** to identify our deployment.\n", + "\n", + "First we need to `pip` install the following packages:\n", + "\n", + "- `elasticsearch`\n", + "- `openai`\n", + "- `huggingface-hub`\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "6e237928", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: elasticsearch in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (8.8.0)\n", + "Requirement already satisfied: elastic-transport<9,>=8 in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from elasticsearch) (8.4.0)\n", + "Requirement already satisfied: urllib3<2,>=1.26.2 in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from elastic-transport<9,>=8->elasticsearch) (1.26.16)\n", + "Requirement already satisfied: certifi in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from 
elastic-transport<9,>=8->elasticsearch) (2023.5.7)\n", + "\u001b[33mWARNING: You are using pip version 21.2.3; however, version 23.1.2 is available.\n", + "You should consider upgrading via the '/Users/liamthompson/.pyenv/versions/3.9.7/bin/python3.9 -m pip install --upgrade pip' command.\u001b[0m\n", + "Collecting openai\n", + " Downloading openai-0.27.8-py3-none-any.whl (73 kB)\n", + "\u001b[K |████████████████████████████████| 73 kB 6.2 MB/s eta 0:00:01\n", + "\u001b[?25hCollecting aiohttp\n", + " Downloading aiohttp-3.8.4-cp39-cp39-macosx_11_0_arm64.whl (338 kB)\n", + "\u001b[K |████████████████████████████████| 338 kB 28.9 MB/s eta 0:00:01\n", + "\u001b[?25hRequirement already satisfied: requests>=2.20 in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from openai) (2.31.0)\n", + "Requirement already satisfied: tqdm in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from openai) (4.65.0)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from requests>=2.20->openai) (3.1.0)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from requests>=2.20->openai) (1.26.16)\n", + "Requirement already satisfied: idna<4,>=2.5 in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from requests>=2.20->openai) (3.4)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from requests>=2.20->openai) (2023.5.7)\n", + "Collecting yarl<2.0,>=1.0\n", + " Downloading yarl-1.9.2-cp39-cp39-macosx_11_0_arm64.whl (62 kB)\n", + "\u001b[K |████████████████████████████████| 62 kB 5.6 MB/s eta 0:00:01\n", + "\u001b[?25hCollecting aiosignal>=1.1.2\n", + " Using cached aiosignal-1.3.1-py3-none-any.whl (7.6 kB)\n", + "Collecting frozenlist>=1.1.1\n", + " Downloading 
frozenlist-1.3.3-cp39-cp39-macosx_11_0_arm64.whl (35 kB)\n", + "Collecting attrs>=17.3.0\n", + " Using cached attrs-23.1.0-py3-none-any.whl (61 kB)\n", + "Collecting multidict<7.0,>=4.5\n", + " Downloading multidict-6.0.4-cp39-cp39-macosx_11_0_arm64.whl (29 kB)\n", + "Collecting async-timeout<5.0,>=4.0.0a3\n", + " Using cached async_timeout-4.0.2-py3-none-any.whl (5.8 kB)\n", + "Installing collected packages: multidict, frozenlist, yarl, attrs, async-timeout, aiosignal, aiohttp, openai\n", + "Successfully installed aiohttp-3.8.4 aiosignal-1.3.1 async-timeout-4.0.2 attrs-23.1.0 frozenlist-1.3.3 multidict-6.0.4 openai-0.27.8 yarl-1.9.2\n", + "\u001b[33mWARNING: You are using pip version 21.2.3; however, version 23.1.2 is available.\n", + "You should consider upgrading via the '/Users/liamthompson/.pyenv/versions/3.9.7/bin/python3.9 -m pip install --upgrade pip' command.\u001b[0m\n", + "Requirement already satisfied: huggingface-hub in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (0.15.1)\n", + "Requirement already satisfied: filelock in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from huggingface-hub) (3.12.2)\n", + "Requirement already satisfied: typing-extensions>=3.7.4.3 in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from huggingface-hub) (4.6.3)\n", + "Requirement already satisfied: fsspec in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from huggingface-hub) (2023.6.0)\n", + "Requirement already satisfied: packaging>=20.9 in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from huggingface-hub) (23.1)\n", + "Requirement already satisfied: tqdm>=4.42.1 in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from huggingface-hub) (4.65.0)\n", + "Requirement already satisfied: pyyaml>=5.1 in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from huggingface-hub) (6.0)\n", + "Requirement already 
satisfied: requests in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from huggingface-hub) (2.31.0)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from requests->huggingface-hub) (2023.5.7)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from requests->huggingface-hub) (1.26.16)\n", + "Requirement already satisfied: idna<4,>=2.5 in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from requests->huggingface-hub) (3.4)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/liamthompson/.pyenv/versions/3.9.7/lib/python3.9/site-packages (from requests->huggingface-hub) (3.1.0)\n", + "\u001b[33mWARNING: You are using pip version 21.2.3; however, version 23.1.2 is available.\n", + "You should consider upgrading via the '/Users/liamthompson/.pyenv/versions/3.9.7/bin/python3.9 -m pip install --upgrade pip' command.\u001b[0m\n" + ] + } + ], + "source": [ + "!pip install elasticsearch\n", + "!pip install --upgrade openai\n", + "!pip install huggingface-hub" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "cccf5bf5", + "metadata": {}, + "source": [ + "Next we need to import the modules we need." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "8ed40603", + "metadata": {}, + "outputs": [], + "source": [ + "from elasticsearch import Elasticsearch, helpers\n", + "import openai\n", + "import huggingface_hub # optional for step 5\n", + "from urllib.request import urlopen\n", + "import getpass\n", + "import requests" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "4c25c2a9", + "metadata": {}, + "source": [ + "## Transcribe audio file(s)\n", + "\n", + "We need some sample audio files to transcribe.\n", + "We're going to use a podcast interview with Brian Kernighan available in MP3 format at this [URL](https://op3.dev/e/https://cdn.changelog.com/uploads/podcast/484/the-changelog-484.mp3). \n", + "The interview is about 96 minutes long.\n", + "First let's download the file and save it locally.\n", + "In your organization you might have audio files stored in a cloud storage bucket, or in a database.\n", + "You can adapt the code below to read the audio file from your storage system." 
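One practical caveat before transcribing: the OpenAI audio endpoint enforces a file-size limit (25 MB at the time of writing), so a long recording like this ~96-minute MP3 may need to be split into segments first. Below is a minimal, self-contained sketch for computing segment boundaries in milliseconds; the actual audio splitting would be done with a tool such as `ffmpeg` or `pydub` (not shown here), and the helper name `segment_bounds` is our own.

```python
def segment_bounds(duration_ms: int, segment_ms: int):
    """Yield (start_ms, end_ms) windows covering a recording of duration_ms."""
    for start in range(0, duration_ms, segment_ms):
        yield start, min(start + segment_ms, duration_ms)

# A ~96-minute interview split into 10-minute segments -> 10 windows
bounds = list(segment_bounds(96 * 60_000, 10 * 60_000))
print(len(bounds), bounds[0], bounds[-1])
```

Each window could then be exported as its own file and transcribed separately, with the resulting texts concatenated before indexing.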
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "af1eef70", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Downloading file into /Users/liamthompson/notebook-tests\n" + ] + } + ], + "source": [ + "import os # use this to get the current working directory\n", + "\n", + "url = \"https://op3.dev/e/https://cdn.changelog.com/uploads/podcast/484/the-changelog-484.mp3\"\n", + "\n", + "# Download the file using the URL with the requests library\n", + "# File will be saved in the current working directory\n", + "\n", + "pwd = os.getcwd()\n", + "print(f\"Downloading file into {pwd}\")\n", + "\n", + "r = requests.get(url)\n", + "with open(\"kernighan.mp3\", \"wb\") as file:\n", + " file.write(r.content)\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "91962d3b", + "metadata": {}, + "source": [ + "## Transcribe audio file\n", + "\n", + "Now that we've got our sample audio file, let's transcribe it.\n", + "We'll use the [Whisper](https://openai.com/blog/whisper/) model, which is available via the OpenAI API.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "1e03999a", + "metadata": {}, + "outputs": [], + "source": [ + "openai.api_key = getpass.getpass(\"Enter your OpenAI API key: \")\n", + "\n", + "audio_file = open(\"kernighan.mp3\", \"rb\") # change this to the path of your audio file\n", + "\n", + "transcription = openai.Audio.transcribe(\"whisper-1\", audio_file)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "e1ab9c20", + "metadata": {}, + "source": [ + "Let's see what our transcription looks like:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "e2185211", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ 
"print(type(transcription))\n", + "\n", + "# save the transcription to a file\n", + "\n", + "with open(\"kernighan-transcription.json\", \"w\") as file:\n", + " file.write(str(transcription))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "91b466d6", + "metadata": {}, + "source": [ + "## Connect Elasticsearch client\n", + "\n", + "Cool, we have our transcription!\n", + "Let's connect our Elasticsearch Python client to our Elastic deployment, so we can sync the transcription to an index." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "38679016", + "metadata": {}, + "source": [ + "## Initialize the Elasticsearch client\n", + "\n", + "Now we can instantiate the Elasticsearch client.\n", + "First we prompt the user for their Cloud ID and password.\n", + "\n", + "🔐 NOTE: `getpass` enables us to securely prompt for credentials without echoing them to the terminal.\n", + "\n", + "Then we create a `client` object, an instance of the `Elasticsearch` class." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "145a1222", + "metadata": {}, + "outputs": [], + "source": [ + "# Found in the 'Manage Deployment' page\n", + "CLOUD_ID = getpass.getpass('Enter Elastic Cloud ID: ')\n", + "\n", + "# Password for the 'elastic' user generated by Elasticsearch\n", + "ELASTIC_PASSWORD = getpass.getpass('Enter Elastic password: ')\n", + "\n", + "# Create the client instance\n", + "client = Elasticsearch(\n", + " cloud_id=CLOUD_ID,\n", + " basic_auth=(\"elastic\", ELASTIC_PASSWORD)\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "555fbc67", + "metadata": {}, + "source": [ + "Confirm that the client has connected with this test."
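If you prefer a check that fails fast with a clear error message, the client also exposes `ping()`, which returns a boolean instead of raising on connection failure. A small sketch, assuming the `client` object created above (the helper name `check_connection` is our own):

```python
def check_connection(client):
    """Return the cluster name if reachable, else raise a descriptive error."""
    if not client.ping():  # ping() returns False rather than raising
        raise ConnectionError(
            "Could not reach Elasticsearch; check your Cloud ID and password."
        )
    return client.info()["cluster_name"]
```

Call `check_connection(client)` right after instantiating the client to surface credential problems before indexing anything.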
+ ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "92afc4a9", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'name': 'instance-0000000000', 'cluster_name': '9dd1e5c0b0d64796b8cf0746cf63d734', 'cluster_uuid': 'VeYvw6JhQcC3P-Q1-L9P_w', 'version': {'number': '8.9.0-SNAPSHOT', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': 'ac7d79178c3e57c935358453331efe9e9cc5104d', 'build_date': '2023-06-21T09:08:25.219504984Z', 'build_snapshot': True, 'lucene_version': '9.7.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0', 'transport_version': '8500019'}, 'tagline': 'You Know, for Search'}\n" + ] + } + ], + "source": [ + "print(client.info())" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "d0a03898", + "metadata": {}, + "source": [ + "Refer to https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#connect-self-managed-new to learn how to connect to a self-managed deployment.\n", + "\n", + "Read https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#auth-apikey to learn how to connect using API keys." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "86945aaf", + "metadata": {}, + "source": [ + "## Index the transcription into Elasticsearch\n", + "\n", + "Now we can create an index to store our transcriptions and index our first document.\n", + "\n", + "NOTE: We pass the transcription with the `document` parameter; the older `body` parameter is deprecated in recent versions of the Python client." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "c59aa463", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "ObjectApiResponse({'_index': 'transcriptions', '_id': '1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 2, 'failed': 0}, '_seq_no': 0, '_primary_term': 1})" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# client.indices.create(index=\"transcriptions\", ignore=400)\n", + "\n", + "client.index(index=\"transcriptions\", id=1, document=str(transcription))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "944cff74", + "metadata": {}, + "source": [ + "## Aside: Pretty printing Elasticsearch responses\n", + "\n", + "Your API calls will return hard-to-read nested JSON.\n", + "We'll create a little function called `pretty_response` to return nice, human-readable outputs from our examples."
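For comparison, Python's standard-library `json.dumps` already provides generic pretty-printing; the custom helper defined next goes further by extracting specific fields and wrapping long transcript lines. A quick sketch using a hypothetical response-shaped dict (the values are made up for illustration):

```python
import json

# A dict shaped like an Elasticsearch search response (hypothetical values)
response_like = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"text": "an example transcription snippet"}}
        ]
    }
}

# indent=2 turns the nested dict into readable, indented JSON
pretty = json.dumps(response_like, indent=2)
print(pretty)
```

This is handy for debugging raw responses, while `pretty_response` is tailored to the transcript search results shown later.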
+ ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "21c2a6fc", + "metadata": {}, + "outputs": [], + "source": [ + "def insert_newlines(string, every=64):\n", + " return '\\n'.join(string[i:i+every] for i in range(0, len(string), every))\n", + "\n", + "def pretty_response(response):\n", + " for hit in response['hits']['hits']:\n", + " text = hit['_source']['text']\n", + " highlight = hit['highlight']['text']\n", + " pretty_output = f\"\\nText: {text} \\n\\nHighlight: {highlight}\"\n", + " print(insert_newlines(pretty_output))\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "f160acd0", + "metadata": {}, + "source": [ + "## Query the index\n", + "\n", + "Now we can query the index to search our transcription.\n", + "Let's start with a simple keyword search.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "id": "5ea91bfb", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Text: You know, is he a standout in terms of just like once in \n", + "a generation kind of a software developer or Are there a lot of \n", + "people that you've seen that have been just as good as he was bu\n", + "t he happened to have that Nugget, you know, he had to be the ri\n", + "ght place the right time with the right idea and the right peopl\n", + "e. I think He's a singularity. 
I have never seen anybody else wh\n", + "o's in the same league as him You know, I've certainly met a lot\n", + " of programmers who are very good Yeah, and you know some of my \n", + "students sure the people I worked with at Bell Labs very good Bu\n", + "t I can is in a different universe entirely as far as I can tell\n", + " and it's a combination of a bunch of things I mean just being a\n", + "ble to write code very quickly that works Very very well done co\n", + "de but also this insight into solving the right problem in a The\n", + " right way and just doing that repeatedly over all kinds of diff\n", + "erent domains I've never seen anybody remotely like that in any \n", + "setting at all he You know one night he and Joe Condon and I we \n", + "had gotten a new typesetter at Bell Labs It was basically a devi\n", + "ce controlled by a very small computer inside a computer automat\n", + "ion naked mini if you wish to know I'll be no just a generic kin\n", + "d of mediocre 16-bit Computer and it came the typesetter came wi\n", + "th really awful software And so you couldn't figure out what was\n", + " going on. And of course, you didn't get source code You just go\n", + "t more at something that ran and so Ken and Joe and I were puzzl\n", + "ing over what to do with this thing And I late afternoon. 
I said\n", + " I'm going home for dinner I'll be back in a while and I came ba\n", + "ck at sort of seven or eight o'clock at night and Ken had writte\n", + "n a Disassembler for this thing so that he could see what the as\n", + "sembly language was so that he could then start to write well, o\n", + "f course now you write the assembler and then you And you know t\n", + "hat kind of thing where in a couple of hours He had built a fund\n", + "amental tool that was then our first toehold and understanding m\n", + "achine now, you know Writing a disassembler is not rocket scienc\n", + "e But on the other hand to put it together that quickly and accu\n", + "rately on the basis of very little information Now this is befor\n", + "e the internet when you couldn't just sort of go and Google for \n", + "what's the opcode set of this machine? You had to find manuals a\n", + "nd it's always kind of thing So now off scale and he could just \n", + "kept doing that over such a wide domain of things I mean we thin\n", + "k of Unix, but he did all this work on the chess machine where h\n", + "e had the first Master level chess computer. 
That was his softwa\n", + "re and he wrote a lot of the CAD tools that made it go as well A\n", + "nd you know He built a thing that was like the Sony Walkman with\n", + " an mp3 like encoding before anybody else did because he talked \n", + "to the people who knew how to do speech coding down the hall is \n", + "on and on and on and you've said before that Programming is not \n", + "just a science but also an art Which leads me to believe that fo\n", + "r some reason Ken was blessed with this art side of the of the s\n", + "cience So you can know how to program you can know how to progra\n", + "m well with less bugs But to be able to apply the thinking to a \n", + "problem set in the ways you described Ken What do you think you \n", + "know without describing his you know for lack of better terms ge\n", + "nius What do you think helped him have that mindset? Like what h\n", + "ow did he begin to solve a problem? Do you think? You know, I ac\n", + "tually don't know I suspect part of it is that he had just been \n", + "interested in all kinds of things And you know I didn't meet him\n", + " until he and I arrived he arrived at labs a couple years before\n", + " I did and Then we were in the same group for many years, but hi\n", + "s background I think originally was electrical engineering He wa\n", + "s much more of a hardware person. In fact than a software person\n", + " originally And perhaps that gave him a different perspective on\n", + " how things work or at least a broader Perspective. 
I don't know\n", + " about let's say his mathematical background But for example, yo\n", + "u mentioned this art and science he built a regular expression r\n", + "ecognizer, which is \n", + "\n", + "Highlight: ['You know, is he a standout in\n", + " terms of just like once in a generation kind of a soft\n", + "ware developer or']\n" + ] + } + ], + "source": [ + "response = client.search(index=\"transcriptions\",\n", + " query= {\n", + " \"match\": {\n", + " \"text\": \"generation\"\n", + " }\n", + " },\n", + " highlight={\n", + " \"fields\": {\n", + " \"text\": {}\n", + " }\n", + " })\n", + "pretty_response(response)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file