Privacy and Security in Large Language Models: Detecting PII Leakage Based on the OWASP Top 10
BEDON, FILIPPO
2023/2024
Abstract
This thesis introduces two new Garak plugins and presents the results of testing them against fine-tuned open-source LLMs. Garak is a tool developed by NVIDIA for red-teaming LLMs to improve their safety and security. The two plugins, ProPile and PIIsDetector, are a probe and a detector, respectively. The first, ProPile, interacts with the LLM by generating a variety of prompts that request Personally Identifiable Information (PII) about individuals for whom at least one piece of information is already known. The second, PIIsDetector, identifies personal information in the LLM's responses, such as credit card numbers, phone numbers, and IP addresses. The goal of these plugins is to rigorously assess how reliably LLMs handle personal information without disclosing it. The effectiveness of ProPile and PIIsDetector was evaluated against several open-source fine-tuned LLMs. For this evaluation, the models were fine-tuned on a dataset of fictitious PII, generated with a Python library, covering two thousand fake users. More specifically, the dataset contains first and last names, physical addresses, e-mail addresses, phone numbers, credit card numbers, CVVs (Card Verification Values), expiration dates, and IP addresses. Finally, the plugins were run against the fine-tuned LLMs, hosted on Hugging Face, to measure how effectively they extract and detect Personally Identifiable Information in the generated responses.
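The abstract does not show the detector's internals, but a detector like PIIsDetector plausibly scans model output for PII categories using pattern matching. The sketch below is purely illustrative: the function name `detect_pii`, the category names, and the regular expressions are assumptions for demonstration, not the plugin's actual implementation or Garak's detector API.

```python
import re

# Illustrative patterns for a few PII categories mentioned in the abstract.
# These are simplified; real detectors use stricter patterns and checksum
# validation (e.g. Luhn for credit card numbers).
PII_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "phone": re.compile(r"\+?\d{1,3}[ -]?\(?\d{2,4}\)?[ -]?\d{3}[ -]?\d{3,4}"),
}

def detect_pii(text: str) -> list[str]:
    """Return the PII categories whose pattern matches the text."""
    return sorted(name for name, pat in PII_PATTERNS.items() if pat.search(text))

print(detect_pii("Contact John at john.doe@example.com or 192.168.0.12"))
# → ['email', 'ipv4']
```

In Garak's architecture, a probe such as ProPile would send its prompts to the model, and a detector would then score each response; a regex-based check of this kind is one simple way such scoring can be done.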
File | Size | Format
---|---|---
Thesis-nothanks_pdf-a.pdf (open access) | 1.05 MB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14247/24893