{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Getting Started with RedisVL\n",
    "`redisvl` is a versatile Python library with an integrated CLI, designed to enhance AI applications using Redis. This guide will walk you through the following steps:\n",
    "\n",
    "1. Defining an `IndexSchema`\n",
    "2. Preparing a sample dataset\n",
    "3. Creating a `SearchIndex` object\n",
    "4. Testing `rvl` CLI functionality\n",
    "5. Loading the sample data\n",
    "6. Building `VectorQuery` objects and executing searches\n",
    "7. Updating a `SearchIndex` object\n",
    "\n",
    "...and more!\n",
    "\n",
    "Prerequisites:\n",
    "- Ensure `redisvl` is installed in your Python environment.\n",
    "- Have a running instance of [Redis Stack](https://redis.io/docs/install/install-stack/) or [Redis Cloud](https://redis.com/try-free).\n",
    "\n",
    "_____"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define an `IndexSchema`\n",
    "\n",
    "The `IndexSchema` maintains crucial **index configuration** and **field definitions** to\n",
    "enable search with Redis. For ease of use, the schema can be constructed from a\n",
    "python dictionary or yaml file.\n",
    "\n",
    "### Example Schema Creation\n",
    "Consider a dataset with user information, including `job`, `age`, `credit_score`,\n",
    "and a 3-dimensional `user_embedding` vector.\n",
    "\n",
    "You must also decide on a Redis index name and key prefix to use for this\n",
    "dataset. Below are example schema definitions in both YAML and Dict format.\n",
    "\n",
    "**YAML Definition:**\n",
    "\n",
    "```yaml\n",
    "version: '0.1.0'\n",
    "\n",
    "index:\n",
    "  name: user_simple\n",
    "  prefix: user_simple_docs\n",
    "\n",
    "fields:\n",
    "    - name: user\n",
    "      type: tag\n",
    "    - name: credit_score\n",
    "      type: tag\n",
    "    - name: job\n",
    "      type: text\n",
    "    - name: age\n",
    "      type: numeric\n",
    "    - name: user_embedding\n",
    "      type: vector\n",
    "      attrs:\n",
    "        algorithm: flat\n",
    "        dims: 3\n",
    "        distance_metric: cosine\n",
    "        datatype: float32\n",
    "```\n",
    "> Store this in a local file, such as `schema.yaml`, for RedisVL usage."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Python Dictionary:**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "schema = {\n",
    "    \"index\": {\n",
    "        \"name\": \"user_simple\",\n",
    "        \"prefix\": \"user_simple_docs\",\n",
    "    },\n",
    "    \"fields\": [\n",
    "        {\"name\": \"user\", \"type\": \"tag\"},\n",
    "        {\"name\": \"credit_score\", \"type\": \"tag\"},\n",
    "        {\"name\": \"job\", \"type\": \"text\"},\n",
    "        {\"name\": \"age\", \"type\": \"numeric\"},\n",
    "        {\n",
    "            \"name\": \"user_embedding\",\n",
    "            \"type\": \"vector\",\n",
    "            \"attrs\": {\n",
    "                \"dims\": 3,\n",
    "                \"distance_metric\": \"cosine\",\n",
    "                \"algorithm\": \"flat\",\n",
    "                \"datatype\": \"float32\"\n",
    "            }\n",
    "        }\n",
    "    ]\n",
    "}"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sample Dataset Preparation\n",
    "\n",
    "Below, create a mock dataset with `user`, `job`, `age`, `credit_score`, and\n",
    "`user_embedding` fields. The `user_embedding` vectors are synthetic examples\n",
    "for demonstration purposes.\n",
    "\n",
    "For more information on creating real-world embeddings, refer to this\n",
    "[article](https://mlops.community/vector-similarity-search-from-basics-to-production/)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "data = [\n",
    "    {\n",
    "        'user': 'john',\n",
    "        'age': 1,\n",
    "        'job': 'engineer',\n",
    "        'credit_score': 'high',\n",
    "        'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes()\n",
    "    },\n",
    "    {\n",
    "        'user': 'mary',\n",
    "        'age': 2,\n",
    "        'job': 'doctor',\n",
    "        'credit_score': 'low',\n",
    "        'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes()\n",
    "    },\n",
    "    {\n",
    "        'user': 'joe',\n",
    "        'age': 3,\n",
    "        'job': 'dentist',\n",
    "        'credit_score': 'medium',\n",
    "        'user_embedding': np.array([0.9, 0.9, 0.1], dtype=np.float32).tobytes()\n",
    "    }\n",
    "]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ">As seen above, the sample `user_embedding` vectors are converted into bytes. Using the `NumPy`, this is fairly trivial."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a `SearchIndex`\n",
    "\n",
    "With the schema and sample dataset ready, instantiate a `SearchIndex`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from redisvl.index import SearchIndex\n",
    "\n",
    "index = SearchIndex.from_dict(schema)\n",
    "# or use .from_yaml('schema_file.yaml')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we also need to facilitate a Redis connection. There are a few ways to do this:\n",
    "\n",
    "- Create & manage your own client connection (recommended)\n",
    "- Provide a simple Redis URL and let RedisVL connect on your behalf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Bring your own Redis connection instance\n",
    "\n",
    "This is ideal in scenarios where you have custom settings on the connection instance or if your application will share a connection pool:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from redis import Redis\n",
    "\n",
    "client = Redis.from_url(\"redis://localhost:6379\")\n",
    "\n",
    "index.set_client(client)\n",
    "# optionally provide an async Redis client object to enable async index operations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Let the index manage the connection instance\n",
    "\n",
    "This is ideal for simple cases:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<redisvl.index.index.SearchIndex at 0x7f8670a51190>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "index.connect(\"redis://localhost:6379\")\n",
    "# optionally use an async client by passing use_async=True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the underlying index\n",
    "\n",
    "Now that we are connected to Redis, we need to run the create command."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "index.create(overwrite=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ">Note that at this point, the index has no entries. Data loading follows."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inspect with the `rvl` CLI\n",
    "Use the `rvl` CLI to inspect the created index and its fields:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[32m11:53:23\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m   Indices:\n",
      "\u001b[32m11:53:23\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m   1. user_simple\n"
     ]
    }
   ],
   "source": [
    "!rvl index listall"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "Index Information:\n",
      "╭──────────────┬────────────────┬──────────────────────┬─────────────────┬────────────╮\n",
      "│ Index Name   │ Storage Type   │ Prefixes             │ Index Options   │   Indexing │\n",
      "├──────────────┼────────────────┼──────────────────────┼─────────────────┼────────────┤\n",
      "│ user_simple  │ HASH           │ ['user_simple_docs'] │ []              │          0 │\n",
      "╰──────────────┴────────────────┴──────────────────────┴─────────────────┴────────────╯\n",
      "Index Fields:\n",
      "╭────────────────┬────────────────┬─────────┬────────────────┬────────────────╮\n",
      "│ Name           │ Attribute      │ Type    │ Field Option   │ Option Value   │\n",
      "├────────────────┼────────────────┼─────────┼────────────────┼────────────────┤\n",
      "│ user           │ user           │ TAG     │ SEPARATOR      │ ,              │\n",
      "│ credit_score   │ credit_score   │ TAG     │ SEPARATOR      │ ,              │\n",
      "│ job            │ job            │ TEXT    │ WEIGHT         │ 1              │\n",
      "│ age            │ age            │ NUMERIC │                │                │\n",
      "│ user_embedding │ user_embedding │ VECTOR  │                │                │\n",
      "╰────────────────┴────────────────┴─────────┴────────────────┴────────────────╯\n"
     ]
    }
   ],
   "source": [
    "!rvl index info -i user_simple"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Data to `SearchIndex`\n",
    "\n",
    "Load the sample dataset to Redis:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['user_simple_docs:d424b73c516442f7919cc11ed3bb1882', 'user_simple_docs:6da16f88342048e79b3500bec5448805', 'user_simple_docs:ef5a590ef85e4d4888fd8ebe79ae1e8c']\n"
     ]
    }
   ],
   "source": [
    "keys = index.load(data)\n",
    "\n",
    "print(keys)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ">By default, `load` will create a unique Redis \"key\" as a combination of the index key `prefix` and a UUID. You can also customize the key by providing direct keys or pointing to a specified `id_field` on load."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Upsert the index with new data\n",
    "Upsert data by using the `load` method again:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['user_simple_docs:9806a362604f4700b17513cc94fcf10d']\n"
     ]
    }
   ],
   "source": [
    "# Add more data\n",
    "new_data = [{\n",
    "    'user': 'tyler',\n",
    "    'age': 9,\n",
    "    'job': 'engineer',\n",
    "    'credit_score': 'high',\n",
    "    'user_embedding': np.array([0.1, 0.3, 0.5], dtype=np.float32).tobytes()\n",
    "}]\n",
    "keys = index.load(new_data)\n",
    "\n",
    "print(keys)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating `VectorQuery` Objects\n",
    "\n",
    "Next we will create a vector query object for our newly populated index. This example will use a simple vector to demonstrate how vector similarity works. Vectors in production will likely be much larger than 3 floats and often require Machine Learning models (i.e. Huggingface sentence transformers) or an embeddings API (Cohere, OpenAI). `redisvl` provides a set of [Vectorizers](https://www.redisvl.com/user_guide/vectorizers_04.html#openai) to assist in vector creation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "from redisvl.query import VectorQuery\n",
    "from jupyterutils import result_print\n",
    "\n",
    "query = VectorQuery(\n",
    "    vector=[0.1, 0.1, 0.5],\n",
    "    vector_field_name=\"user_embedding\",\n",
    "    return_fields=[\"user\", \"age\", \"job\", \"credit_score\", \"vector_distance\"],\n",
    "    num_results=3\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Executing queries\n",
    "With our `VectorQuery` object defined above, we can execute the query over the `SearchIndex` using the `query` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table><tr><th>vector_distance</th><th>user</th><th>age</th><th>job</th><th>credit_score</th></tr><tr><td>0</td><td>john</td><td>1</td><td>engineer</td><td>high</td></tr><tr><td>0</td><td>mary</td><td>2</td><td>doctor</td><td>low</td></tr><tr><td>0.0566299557686</td><td>tyler</td><td>9</td><td>engineer</td><td>high</td></tr></table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "results = index.query(query)\n",
    "result_print(results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using an Asynchronous Redis Client\n",
    "\n",
    "The `AsyncSearchIndex` class along with an async Redis python client allows for queries, index creation, and data loading to be done asynchronously. This is the\n",
    "recommended route for working with `redisvl` in production-like settings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from redisvl.index import AsyncSearchIndex\n",
    "from redis.asyncio import Redis\n",
    "\n",
    "client = Redis.from_url(\"redis://localhost:6379\")\n",
    "\n",
    "index = AsyncSearchIndex.from_dict(schema)\n",
    "await index.set_client(client)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table><tr><th>vector_distance</th><th>user</th><th>age</th><th>job</th><th>credit_score</th></tr><tr><td>0</td><td>john</td><td>1</td><td>engineer</td><td>high</td></tr><tr><td>0</td><td>mary</td><td>2</td><td>doctor</td><td>low</td></tr><tr><td>0.0566299557686</td><td>tyler</td><td>9</td><td>engineer</td><td>high</td></tr></table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# execute the vector query async\n",
    "results = await index.query(query)\n",
    "result_print(results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Updating a schema\n",
    "In some scenarios, it makes sense to update the index schema. With Redis and `redisvl`, this is easy because Redis can keep the underlying data in place while you change or make updates to the index configuration."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So for our scenario, let's imagine we want to reindex this data in 2 ways:\n",
    "- by using a `Tag` type for `job` field instead of `Text`\n",
    "- by using an `hnsw` vector index for the `user_embedding` field instead of a `flat` vector index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Modify this schema to have what we want\n",
    "\n",
    "index.schema.remove_field(\"job\")\n",
    "index.schema.remove_field(\"user_embedding\")\n",
    "index.schema.add_fields([\n",
    "    {\"name\": \"job\", \"type\": \"tag\"},\n",
    "    {\n",
    "        \"name\": \"user_embedding\",\n",
    "        \"type\": \"vector\",\n",
    "        \"attrs\": {\n",
    "            \"dims\": 3,\n",
    "            \"distance_metric\": \"cosine\",\n",
    "            \"algorithm\": \"hnsw\",\n",
    "            \"datatype\": \"float32\"\n",
    "        }\n",
    "    }\n",
    "])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "11:53:25 redisvl.index.index INFO   Index already exists, overwriting.\n"
     ]
    }
   ],
   "source": [
    "# Run the index update but keep underlying data in place\n",
    "await index.create(overwrite=True, drop=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table><tr><th>vector_distance</th><th>user</th><th>age</th><th>job</th><th>credit_score</th></tr><tr><td>0</td><td>john</td><td>1</td><td>engineer</td><td>high</td></tr><tr><td>0</td><td>mary</td><td>2</td><td>doctor</td><td>low</td></tr><tr><td>0.0566299557686</td><td>tyler</td><td>9</td><td>engineer</td><td>high</td></tr></table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Execute the vector query async\n",
    "results = await index.query(query)\n",
    "result_print(results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Check Index Stats\n",
    "Use the `rvl` CLI to check the stats for the index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Statistics:\n",
      "╭─────────────────────────────┬─────────────╮\n",
      "│ Stat Key                    │ Value       │\n",
      "├─────────────────────────────┼─────────────┤\n",
      "│ num_docs                    │ 4           │\n",
      "│ num_terms                   │ 0           │\n",
      "│ max_doc_id                  │ 4           │\n",
      "│ num_records                 │ 20          │\n",
      "│ percent_indexed             │ 1           │\n",
      "│ hash_indexing_failures      │ 0           │\n",
      "│ number_of_uses              │ 2           │\n",
      "│ bytes_per_record_avg        │ 1           │\n",
      "│ doc_table_size_mb           │ 0.00044632  │\n",
      "│ inverted_sz_mb              │ 1.90735e-05 │\n",
      "│ key_table_size_mb           │ 0.000138283 │\n",
      "│ offset_bits_per_record_avg  │ nan         │\n",
      "│ offset_vectors_sz_mb        │ 0           │\n",
      "│ offsets_per_term_avg        │ 0           │\n",
      "│ records_per_doc_avg         │ 5           │\n",
      "│ sortable_values_size_mb     │ 0           │\n",
      "│ total_indexing_time         │ 1.796       │\n",
      "│ total_inverted_index_blocks │ 11          │\n",
      "│ vector_index_sz_mb          │ 0.235603    │\n",
      "╰─────────────────────────────┴─────────────╯\n"
     ]
    }
   ],
   "source": [
    "!rvl stats -i user_simple"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cleanup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below we will clean up after our work. First, you can optionally flush all data from Redis associated with the index by\n",
    "using the `.clear()` method. This will leave the secondary index in place for future insertions or updates.\n",
    "\n",
    "But if you want to clean up everything, including the index, just use `.delete()`\n",
    "which will by default remove the index AND the underlying data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# (optionally) clear all data from Redis associated with the index\n",
    "await index.clear()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# but the index is still in place\n",
    "await index.exists()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "# remove / delete the index in its entirety\n",
    "await index.delete()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.8.13 ('redisvl2')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "9b1e6e9c2967143209c2f955cb869d1d3234f92dc4787f49f155f3abbdfb1316"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}