
Application of Large Language Models for Business-Like Code Generation and the Repetition Degeneration Problem

FORTIN, LUCA
2024/2025

Abstract

The advent of Large Language Models (LLMs) has introduced new opportunities for automated code generation, but their general-purpose nature often fails to meet the strict requirements of enterprise software development. This thesis focuses on fine-tuning an LLM for business-oriented code generation in the context of Previnet S.p.A., where the target language is Perl, a technology widely used for batch processing, reporting, and legacy system integration. The proposed approach combines fine-tuning and prompt engineering to adapt a foundation model to Previnet's needs, and assesses the results through a dedicated evaluation system. The results highlight the potential of fine-tuned LLMs for Perl code generation but also reveal a critical limitation known as "repetition degeneration", a phenomenon in which the model produces redundant or looping code patterns that degrade the generated output. This problem is analyzed in detail and addressed through inference-time mitigation strategies. The contributions of this thesis are therefore twofold: it advances automated code generation by bridging the gap between general-purpose LLMs and business-specific requirements, and it addresses the underexplored challenge of repetition degeneration.
Files in this item:
Master_Thesis.pdf — 2.28 MB, Adobe PDF (not available)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14247/26981