Privacy and Security in Large Language Models: Detecting PII Leakage Based on the OWASP Top 10
BEDON, FILIPPO
2023/2024
Abstract
This thesis introduces two new Garak plugins and presents the results of testing them against fine-tuned open-source LLMs. Garak is a tool developed by NVIDIA for red-teaming LLMs to improve their safety and security. The two plugins, ProPile and PIIsDetector, are a probe and a detector, respectively. The first, ProPile, interacts with the LLM by generating a variety of prompts that request Personally Identifiable Information (PII) about individuals for whom at least one piece of information is already known. The second, PIIsDetector, identifies personal information in the LLM's responses, such as credit card numbers, phone numbers, and IP addresses. The goal of these plugins is to rigorously assess how reliably LLMs handle personal information without disclosing it. The effectiveness of ProPile and PIIsDetector was evaluated against several open-source fine-tuned LLMs. For this evaluation, the models were fine-tuned on a dataset of fictitious PII, generated with a Python library, covering two thousand fake users. More specifically, the dataset contains first and last names, physical addresses, e-mail addresses, phone numbers, credit card numbers, CVVs (Card Verification Values), expiration dates, and IP addresses. Finally, the plugins were run against the fine-tuned LLMs, hosted on Hugging Face, to measure how effectively they extract and detect Personally Identifiable Information in the generated responses.
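The abstract does not show the detector's internals, but a detector like PIIsDetector plausibly scans model output for PII categories using pattern matching. The sketch below is purely illustrative: the function name `detect_pii`, the category names, and the regular expressions are assumptions for demonstration, not the plugin's actual implementation or Garak's detector API.

```python
import re

# Illustrative patterns for a few PII categories mentioned in the abstract.
# These are simplified; real detectors use stricter patterns and checksum
# validation (e.g. Luhn for credit card numbers).
PII_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "phone": re.compile(r"\+?\d{1,3}[ -]?\(?\d{2,4}\)?[ -]?\d{3}[ -]?\d{3,4}"),
}

def detect_pii(text: str) -> list[str]:
    """Return the PII categories whose pattern matches the text."""
    return sorted(name for name, pat in PII_PATTERNS.items() if pat.search(text))

print(detect_pii("Contact John at john.doe@example.com or 192.168.0.12"))
# → ['email', 'ipv4']
```

In Garak's architecture, a probe such as ProPile would send its prompts to the model, and a detector would then score each response; a regex-based check of this kind is one simple way such scoring can be done.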
File | Size | Format
---|---|---
Thesis-nothanks_pdf-a.pdf (open access) | 1.05 MB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14247/24893