Generative Artificial Intelligence for Digital Scholarly Editing: a case study on the use of Prompt Engineering for Text Encoding

Digital Scholarly Editing aims to create critical representations of historical documents by providing facsimiles, multiple analyses and functionalities. Text Encoding enriches editions by supplying annotations and facilitating the reuse of materials in different formats and contexts and by different users. It relies on the Text Encoding Initiative (TEI), a de facto standard that maps textual segments to markup elements to make the text machine-readable and computable. However, developing and maintaining Digital Scholarly Editions involves a considerable workload. This research therefore investigates whether Generative AI can be integrated into editorial workflows to enhance, facilitate and partly automate text encoding practices. It seeks to answer two research questions: 1. Is it possible to create a “universal prompt” that can be used to encode different textual genres? 2. Does the “universal prompt” work on different AI models? A heterogeneous corpus was encoded through the Gemini and Mistral APIs using prompt engineering strategies. The AI-generated encodings were quantitatively evaluated against gold standards to assess the models’ performance. While the results show that Generative AI has strong capabilities in producing valid structural and semantic markup, a “human in the loop” is still required for validation, correction and enrichment.


CECERE, ROSSELLA

2024/2025



Degree programme	DIGITAL AND PUBLIC HUMANITIES
Academic year	2024
Supervisor	FISCHER, FRANZ
Co-supervisor	MINELLO, GIORGIA
Appears in categories:	Master's thesis (Laurea magistrale)

Files in this item:

File	Size	Format
Rossella_Cecere_PDFA_thesis.pdf (embargoed until 20/03/2027)	13.54 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14247/27562