Student Work

Fine-Tuning Open-Source Large Language Models for Generating Math Explanations

Public Deposited

Downloadable Content


The widely circulated memo “We Have No Moat” argues that open-source large language models (LLMs) with 7 billion parameters can rival models from large tech companies with 500 billion parameters. Open-source LLMs have also become more accessible and easier to fine-tune with the rise of open-source resources such as Hugging Face. Using prompt engineering and fine-tuning, the goal of this project was to find and evaluate open-source LLMs that could potentially match the performance of OpenAI’s GPT-3.5.

We aim to help ASSISTments, a non-profit organization focused on middle-school math education, develop open-source LLMs to transition from tedious and somewhat inaccurate hand-written explanations to streamlined, automatically generated ones. Open-source LLMs offer a more cost-effective option than GPT-3.5 and a more time-efficient option than writing explanations by hand. ASSISTments has already begun integrating LLMs into its website, and our focus was on improving the explanation-generating LLMs.

Leveraging a framework of prompt engineering and fine-tuning, we tested and evaluated the effectiveness of several models at writing accurate math explanations. During prompt engineering, we double-blinded the responses for each prompt before scoring them, which allowed us to assign scores in an unbiased manner. Through an iterative process, we saw up to 80% improvement with our best prompts compared to prompting the LLM with only a labeled question-answer pair. With fine-tuning, we were unable to significantly improve WizardMath’s mathematical reasoning, but fine-tuning was highly effective at producing consistently formatted answers, which made the explanations more readable than those of the base WizardMath. This framework was ultimately used to compare the performance of three LLMs in generating explanations for ASSISTments questions.
We found that the fine-tuned model improved on the base model by about 5%, while GPT-3.5 outperformed the base model by roughly 45%. Our results show promise for using LLMs to generate accurate and readable explanations. Furthermore, our fine-tuning and prompt engineering framework can be applied in other fields where LLMs are integrated, in order to optimize their performance.
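The double-blind scoring step described above can be sketched in a few lines: responses from the competing models are shuffled under a fixed seed so raters grade them without knowing which model produced which, and scores are mapped back to model names only afterwards. This is a minimal illustrative sketch, not code from the report; the model names and helper functions are assumptions.

```python
import random


def blind_responses(responses, seed=0):
    """Shuffle (model, response) pairs so raters see responses without
    model labels. Returns the blinded response list and the key needed
    to unblind the scores later. Seeding keeps the shuffle reproducible."""
    rng = random.Random(seed)
    items = list(responses.items())
    rng.shuffle(items)
    key = {i: model for i, (model, _) in enumerate(items)}
    blinded = [text for _, text in items]
    return blinded, key


def unblind_scores(scores, key):
    """Map per-position rater scores back to the model names."""
    return {key[i]: score for i, score in enumerate(scores)}


# Hypothetical example with three models, mirroring the comparison above.
responses = {
    "wizardmath-base": "Explanation A ...",
    "wizardmath-finetuned": "Explanation B ...",
    "gpt-3.5": "Explanation C ...",
}
blinded, key = blind_responses(responses, seed=42)
# Raters score the blinded list in order; only then do we unblind.
scores = unblind_scores([3, 5, 4], key)
print(sorted(scores))  # model names recovered; labels were hidden during grading
```

The seed lets the same blinding be reproduced across raters while still decoupling a response's position from its source model.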

  • This report represents the work of one or more WPI undergraduate students submitted to the faculty as evidence of completion of a degree requirement. WPI routinely publishes these reports on its website without editorial or peer review.
Creator
Publisher
Identifier
  • E-project-022824-174239
  • 117968
Keyword
Advisor
Year
  • 2024
Date created
  • 2024-02-28
Resource type
Major
Source
  • E-project-022824-174239
Rights statement

Relations

In Collection:

Permanent link to this page: https://digital.wpi.edu/show/cr56n536x