A reward-driven analysis of autoregressive natural language generation.

ALUKO, OLUWATOBILOBA TEMITOPE

2024/2025

Abstract

This thesis presents a reward-oriented approach to autoregressive natural language generation (NLG), introducing a decoding strategy that reframes beam search as a reward optimization process. In this framework, termed Reward Search, each candidate sequence (beam) acts as an autonomous decision unit that selects its next token by maximizing a composite payoff function. The payoff combines the language model's fluency, measured by log-probability, with external control signals, such as toxicity reduction or sentiment promotion, computed by a fine-tuned auxiliary reward model. Because these constraints are embedded directly in the decoding process, Reward Search enables fine-grained control over generation while preserving fluency and coherence. Empirical evaluations on detoxification and sentiment-controlled generation tasks show that the method reduces harmful or off-target outputs without sacrificing generation quality, offering a principled alternative to conventional decoding algorithms for safer and more controllable NLG systems.
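The decoding idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the exact payoff function, so the additive form (cumulative log-probability plus a weighted reward), the `weight` parameter, the joint ranking of candidates across beams, and the stand-ins `toy_lm` and `toy_reward` are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Tokens = Tuple[str, ...]
LanguageModel = Callable[[Tokens], Dict[str, float]]
RewardModel = Callable[[Tokens], float]


def toy_lm(prefix: Tokens) -> Dict[str, float]:
    """Hypothetical stand-in for a language model.

    Returns next-token log-probabilities; this toy ignores the prefix.
    """
    probs = {"awful": 0.5, "nice": 0.3, "fine": 0.2}
    return {tok: math.log(p) for tok, p in probs.items()}


def toy_reward(tokens: Tokens) -> float:
    """Hypothetical stand-in for an auxiliary reward model (higher is better).

    Penalizes every occurrence of the 'toxic' token "awful".
    """
    return -float(tokens.count("awful"))


def reward_search(
    lm: LanguageModel,
    reward: RewardModel,
    beam_width: int = 2,
    max_len: int = 3,
    weight: float = 1.0,
) -> List[Tuple[Tokens, float]]:
    """Beam search ranked by an assumed payoff: log-prob + weight * reward."""
    beams: List[Tuple[Tokens, float]] = [((), 0.0)]  # (tokens, cumulative log-prob)
    for _ in range(max_len):
        candidates: List[Tuple[Tokens, float]] = []
        for tokens, logp in beams:
            # Expand each beam with every possible next token.
            for tok, tok_logp in lm(tokens).items():
                candidates.append((tokens + (tok,), logp + tok_logp))
        # Rank by the composite payoff rather than by log-probability alone.
        candidates.sort(key=lambda c: c[1] + weight * reward(c[0]), reverse=True)
        beams = candidates[:beam_width]
    # Report each surviving beam together with its final payoff.
    return [(tokens, logp + weight * reward(tokens)) for tokens, logp in beams]
```

With `weight=0.0` the function reduces to ordinary beam search and follows the most probable token, while a positive `weight` lets the reward term steer the beams away from the penalized token. How the two terms are actually balanced and normalized in the thesis is not stated in this record.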

Degree programme: COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
Academic year: 2024
Supervisor: TRIPODI, ROCCO
Co-supervisor: PELILLO, MARCELLO
Appears in types: Laurea magistrale (master's degree thesis)

Files in this item:

File	Size	Format
Thesis Documentation.pdf (open access)	912.24 kB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14247/25802