Zoumana KEITA

Code-along 2023-12-22: Fine-tuning GPT-3.5 with the OpenAI API

Some background context

Fine-tuning models lets you customize them for new tasks. By fine-tuning GPT-3.5, you can improve the accuracy of its responses, give it a particular tone of voice, have it talk about niche topics, and more. This first part of the two-part series covers how to use the OpenAI API and Python to get started fine-tuning GPT-3.5.

Data

This case study uses the Yahoo Non-Factoid Question Dataset, derived from Yahoo's Webscope L6 collection.

  • It has 87,361 questions and their corresponding answers.
  • Freely available from Hugging Face.

Main tasks

The main tasks include:

  • Loading the data from Hugging Face
  • Preprocessing the data for fine-tuning
  • Fine-tuning the GPT-3.5 model
  • Interacting with the fine-tuned model

Target audience

This case study would be of interest to:

  • AI and Machine Learning Enthusiasts
  • Data Scientists and Analysts
  • Academics and Students
  • Industry Professionals
  • Software Developers

Key takeaways:

  • Learn when fine-tuning large language models can be beneficial
  • Learn how to use the fine-tuning tools in the OpenAI API
  • Understand the fine-tuning workflow

Task 0: Installing and Importing Relevant Packages

The main packages that need to be installed are:

  • datasets: to load datasets from Hugging Face.
  • openai: to interact with OpenAI models and their built-in functions.
  • time: used to track the fine-tuning time.
  • random: to select random observations from the training data.
  • json: the format of the training and validation data.
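For reference, the json package comes in because OpenAI's chat fine-tuning data is stored as JSONL: one JSON object per line, each holding a list of chat messages. A minimal sketch of one training example (the question and answer shown are made up for illustration):

```python
import json

# One hypothetical training example in OpenAI's chat fine-tuning format
example = {
    "messages": [
        {"role": "system", "content": "You are a helpful Q&A assistant."},
        {"role": "user", "content": "Why does the sky appear blue?"},
        {"role": "assistant", "content": "Air molecules scatter blue light more than red light."},
    ]
}

# Each example becomes one line of the JSONL training file
line = json.dumps(example)
print(line)
```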

Instructions

Complete the following tasks to install the packages:

  • Upgrade pip using the --upgrade option with Python
  • Install the datasets package
  • Install version 0.28 of the openai package
%%bash 
python3 -m pip install --upgrade pip
pip -q install -U datasets
pip -q install openai==0.28
  • Note: Restart the kernel from the top-left tab by selecting
    • Run > Restart kernel
  • This ensures that all the changes take effect
  • Import the following packages
    • FineTuningJob and ChatCompletion from openai
    • load_dataset function from datasets
    • sleep from time
    • random
    • json
from openai import FineTuningJob, ChatCompletion
from datasets import load_dataset 
from time import sleep
import random 
import json

Task 1: Data Loading

In this section, you will load the yahoo_answers_qa dataset from Hugging Face using the load_dataset function.

Instructions

  • Acquire the train split of the yahoo_answers_qa data
yahoo_answers_qa = load_dataset("yahoo_answers_qa", split="train")
  • Check the features/columns and the total number of rows of the data
yahoo_answers_qa
  • From the above command, you will notice that there are 87,362 rows in the dataset; that much data can take a long time to process, especially during fine-tuning.
  • For simplicity's sake, let's use a subset of 150 rows from the previously loaded dataset.
    • Use the .select and range functions to select a subset of 150 rows
