Generative Artificial Intelligence for Digital Scholarly Editing: a case study on the use of Prompt Engineering for Text Encoding
CECERE, ROSSELLA
2024/2025
Abstract
Digital Scholarly Editing aims to create critical representations of historical documents by providing facsimiles, multiple analyses and functionalities. Text encoding enriches editions by supplying annotations and by facilitating the reuse of the materials in different formats and contexts and by different users. It relies on the Text Encoding Initiative (TEI), a de facto standard that maps textual segments to markup elements to make the text machine-readable and computable. However, developing and maintaining Digital Scholarly Editions entails a considerable workload. This research therefore investigates whether Generative AI can be integrated into editorial workflows to enhance, facilitate and partly automate text encoding practices. It seeks to answer two research questions: (1) Is it possible to create a “universal prompt” that can be used to encode different textual genres? (2) Does the “universal prompt” work on different AI models? A heterogeneous corpus was encoded through the Gemini and Mistral APIs using prompt engineering strategies. The AI-generated encodings were quantitatively evaluated against gold standards to assess the models’ performance. While the results show that Generative AI has strong capabilities in producing valid structural and semantic markup, a “human in the loop” is still required for validation, correction and enrichment.

| File | Size | Format |
|---|---|---|
| Rossella_Cecere_PDFA_thesis.pdf (under embargo until 20/03/2027) | 13.54 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14247/27562