Assessing AI Renderings of Dialectal Language: A Methodological Proposal Based on Camilleri’s Sicilian
SARDO, SIMONA
2024/2025
Abstract
This thesis proposes a standardized, replicable strategy for evaluating how generative artificial intelligence systems translate dialects and other non‑standard varieties, using the Sicilian of Andrea Camilleri as a case study. It constructs a parallel dataset consisting of original Italian excerpts from Camilleri’s works, Stephen Sartarelli’s official English translations, and translations generated by two AI models (ChatGPT and Gemini). Building on this corpus, the research combines automatic comparison of AI outputs with human reception data and a customized MQM evaluation framework tailored to dialectal phenomena. The resulting annotated corpus functions both as an evaluation tool and as a prototype model that can be extended to future studies on dialect and non‑standard variety translation. By integrating qualitative annotation with quantitative techniques such as word embeddings and cosine similarity, the thesis identifies systematic patterns in AI translation errors and proposes a replicable standard for subsequent research in the field of Digital and Public Humanities.

| File | Size | Format |
|---|---|---|
| Assessing AI Renderings of Dialectal Language.pdf (not available) | 3.65 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14247/27564