    Weaviate workshop

    Goals:

    What you will see:
    • Create a vector database with Weaviate,
    • Add data to the database, and
    • Interact with the data, including searching it and using LLMs with your data in Weaviate.
    You will learn today:

    Preparation

    Install the Weaviate Python client (only needed in environments that don't have it yet).

    !pip install -U weaviate-client

    Get the data

    We'll use a subset of the Jeopardy! quiz dataset:

    https://www.kaggle.com/datasets/tunguz/200000-jeopardy-questions

    Pre-processed version:

    https://raw.githubusercontent.com/databyjp/wv_demo_uploader/main/weaviate_datasets/data/jeopardy_1k.json

    Load (or download) the data, and preview it

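    import json
    import os
    
    import requests
    
    DATA_URL = "https://raw.githubusercontent.com/databyjp/wv_demo_uploader/main/weaviate_datasets/data/jeopardy_1k.json"
    DATA_FILE = "jeopardy_1k.json"
    
    def load_data():
        # Read the dataset from a local copy
        with open(DATA_FILE, "r", encoding="utf-8") as f:
            return f.read()
    
    def download_data():
        # Fetch the dataset from GitHub and fail loudly on HTTP errors
        response = requests.get(DATA_URL, timeout=30)
        response.raise_for_status()
        return response.text
    
    # Use the local file if it exists, otherwise download it
    json_data = load_data() if os.path.exists(DATA_FILE) else download_data()
    
    # Parse the JSON and preview the first record
    data = json.loads(json_data)
    print(type(data), len(data))
    print(json.dumps(data[0], indent=2))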
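Before importing, the raw keys may need renaming. Weaviate property names must start with a letter or underscore and contain only letters, digits, and underscores. This is a minimal sketch that assumes the records use keys such as `"Air Date"` (check the preview printed above). The helper names `to_property_name` and `prepare_record` are hypothetical and not part of the Weaviate client.

```python
import re

def to_property_name(key):
    """Convert a raw JSON key (e.g. "Air Date") into a Weaviate-safe camelCase name."""
    # Split on any run of characters that are not letters or digits
    words = [w for w in re.split(r"[^0-9A-Za-z]+", key) if w]
    if not words:
        raise ValueError(f"Cannot derive a property name from {key!r}")
    # First word lowercased, the rest capitalised: "Air Date" -> "airDate"
    name = words[0].lower() + "".join(w.capitalize() for w in words[1:])
    # Property names must not start with a digit, so prefix an underscore
    if name[0].isdigit():
        name = "_" + name
    return name

def prepare_record(record):
    """Return a copy of one record with every key renamed to a valid property name."""
    return {to_property_name(k): v for k, v in record.items()}
```

For example, `prepare_record({"Air Date": "2006-11-08", "Value": 800})` returns `{"airDate": "2006-11-08", "value": 800}`. camelCase is only a convention here. snake_case would also satisfy the naming rule.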

    Step 1: Create a Weaviate instance (database)

    We'll use Embedded Weaviate, a quick way to create a Weaviate database: the Python client downloads and starts a Weaviate server for you, with no separate installation.

    You can also use:

    • A free sandbox with Weaviate Cloud Services,
    • Open-source Weaviate, run directly on any platform with Docker, or
    • Kubernetes, for production deployments :)
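This is a minimal sketch of starting Embedded Weaviate. It assumes version 4 of `weaviate-client`, which provides `weaviate.connect_to_embedded()`. The wrapper name `start_embedded_weaviate` is hypothetical. The optional `headers` argument is how third-party API keys (for example an OpenAI key) can be passed to Weaviate.

```python
def start_embedded_weaviate(headers=None):
    """Start an Embedded Weaviate instance and return a connected client.

    Assumes weaviate-client v4. `headers` may carry third-party API keys,
    e.g. {"X-OpenAI-Api-Key": "<your key>"} (placeholder value).
    """
    # Imported here so this cell can be defined even before the package is installed
    import weaviate

    # Downloads the Weaviate binary if needed, starts it, and connects to it
    client = weaviate.connect_to_embedded(headers=headers)
    if not client.is_ready():
        client.close()
        raise RuntimeError("Embedded Weaviate did not become ready")
    return client
```

Usage: `client = start_embedded_weaviate()`, then call `client.close()` when you are finished so the embedded server shuts down cleanly.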

    Create a helper function, since we'll be working with JSON responses a lot.
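One possible shape for such a helper is a small pretty-printer, sketched here under that assumption. The name `json_print` is hypothetical. Besides printing, it returns the formatted text so the result can be reused or checked.

```python
import json

def json_print(data):
    """Pretty-print any JSON-serialisable object and return the formatted text."""
    text = json.dumps(data, indent=2)
    print(text)
    return text
```

For example, `json_print({"a": 1})` prints the object over three lines with two-space indentation.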