
RAG Chatbot System

ASTRINO, PAOLO
2024/2025

Abstract

This thesis presents the RAG Chatbot System, a secure, local, document-based conversational AI platform designed to address the growing organizational need for intelligent document processing without compromising data privacy or regulatory compliance.

Problem Statement and Motivation
The exponential growth of digital content has made it increasingly difficult to efficiently access and use information stored in heterogeneous formats such as PDF, CSV, and JSON. Traditional keyword-based search methods often fail on complex queries, while cloud-based AI systems require sharing sensitive data with external services, creating significant barriers for regulated industries.

Approach and Architecture
The system implements a modular client-server architecture that combines Retrieval-Augmented Generation (RAG) with local document processing. The RAG approach integrates three core components: hybrid information retrieval (combining semantic embedding-based search with BM25 keyword search), context augmentation through consolidation of relevant passages and conversational history, and response generation via external LLM APIs while maintaining data confidentiality.

Implementation and Technologies
The implementation uses LangChain for RAG pipeline orchestration, HuggingFace for embedding generation (BAAI/bge-base-en-v1.5 model), Flask for the web interface, and standardized JSON communication. The system includes GPU acceleration for processing, secure credential management, and multi-format document support for heterogeneous collections.

Evaluation and Results
Systematic evaluation across three benchmark datasets (SQuAD, MS MARCO, Natural Questions) demonstrates the effectiveness of the hybrid approach. The optimal configuration (30% sparse, 70% dense) achieves an MRR of 0.805 on SQuAD with Recall@10 above 0.97, while MS MARCO reaches an MRR@10 of 0.250. GPU acceleration provides a 4.2× speedup when processing 1,000 document chunks. Qualitative evaluation via LLM-as-Judge reveals low hallucination rates (0.8% on SQuAD, 6.2% on MS MARCO) and high faithfulness scores (averaging 4.93 and 4.79, respectively).

Contributions and Extensibility
The system demonstrates an extensible architecture through the successful implementation of video generation capabilities, validating the modular approach for integrating new features. The video extension transforms RAG conversations into narrated content using automatically generated scripts and avatar synthesis via the HeyGen API.

Conclusions and Impact
This research contributes a practical solution for enterprise conversational AI that balances advanced functionality with rigorous security requirements. The system offers a local alternative to cloud services for organizations that need intelligent document processing while maintaining complete control over sensitive data, with an architecture ready for production deployment and enterprise scalability.

Keywords: Retrieval-Augmented Generation, local document processing, conversational AI, data security, modular architecture, hybrid retrieval, privacy-preserving AI.
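
The hybrid retrieval described in the approach can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the min-max normalization and weighted-score fusion are assumptions (the actual pipeline is orchestrated through LangChain, and its exact fusion strategy is not stated in the abstract), while the BM25/embedding combination, the BAAI/bge-base-en-v1.5 model, and the 30% sparse / 70% dense weighting come from the abstract.

    # Minimal sketch of hybrid retrieval: BM25 (sparse) and embedding (dense)
    # scores are rescaled and combined with a 30/70 weighting.
    # The fusion strategy is an illustrative assumption, not the thesis code.
    import numpy as np
    from rank_bm25 import BM25Okapi                         # pip install rank-bm25
    from sentence_transformers import SentenceTransformer   # pip install sentence-transformers

    documents = [
        "The capital of France is Paris.",
        "BM25 ranks documents with a bag-of-words scoring function.",
        "Dense retrieval encodes text into embedding vectors.",
    ]
    query = "How does dense retrieval work?"

    # Sparse side: BM25 over whitespace-tokenized chunks.
    bm25 = BM25Okapi([doc.lower().split() for doc in documents])
    sparse = np.array(bm25.get_scores(query.lower().split()))

    # Dense side: cosine similarity with normalized bge-base embeddings.
    encoder = SentenceTransformer("BAAI/bge-base-en-v1.5")
    doc_emb = encoder.encode(documents, normalize_embeddings=True)
    query_emb = encoder.encode([query], normalize_embeddings=True)[0]
    dense = doc_emb @ query_emb

    def min_max(x):
        """Rescale scores to [0, 1] so sparse and dense values are comparable."""
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    # Weighted fusion: 30% sparse, 70% dense (the configuration reported as optimal).
    hybrid = 0.3 * min_max(sparse) + 0.7 * min_max(dense)
    for rank, idx in enumerate(np.argsort(-hybrid), start=1):
        print(f"{rank}. ({hybrid[idx]:.3f}) {documents[idx]}")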
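
The "Flask for the web interface, and standardized JSON communication" description implies a small HTTP endpoint of roughly the following shape. The /chat route, the request/response field names, and the answer_query() placeholder are hypothetical; only the use of Flask with JSON payloads is taken from the abstract.

    # Hypothetical sketch of a JSON chat endpoint; route name and fields are assumptions.
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def answer_query(question: str) -> str:
        # Placeholder for the RAG pipeline (retrieve, augment context, generate answer).
        return f"Answer to: {question}"

    @app.route("/chat", methods=["POST"])
    def chat():
        payload = request.get_json(force=True)
        answer = answer_query(payload.get("question", ""))
        return jsonify({"answer": answer})

    if __name__ == "__main__":
        app.run(port=5000)

A client would POST {"question": "..."} and receive {"answer": "..."}, keeping the browser front end decoupled from the retrieval back end.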
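
For reference, the two retrieval metrics quoted in the evaluation, MRR and Recall@10, can be computed as below; the ranked relevance lists here are toy data, not results from the thesis.

    # Mean Reciprocal Rank and Recall@k over per-query relevance labels in rank order.
    def mean_reciprocal_rank(runs):
        """runs: list of per-query 0/1 relevance labels, ordered by retrieval rank."""
        total = 0.0
        for labels in runs:
            for rank, rel in enumerate(labels, start=1):
                if rel:
                    total += 1.0 / rank
                    break
        return total / len(runs)

    def recall_at_k(runs, relevant_counts, k=10):
        """Average fraction of each query's relevant passages found in the top k."""
        recalls = [
            sum(labels[:k]) / n if n else 0.0
            for labels, n in zip(runs, relevant_counts)
        ]
        return sum(recalls) / len(recalls)

    # Two toy queries: first relevant hit at rank 1 and at rank 2, respectively.
    runs = [[1, 0, 0, 0], [0, 1, 0, 0]]
    print(mean_reciprocal_rank(runs))       # (1/1 + 1/2) / 2 = 0.75
    print(recall_at_k(runs, [1, 1], k=10))  # both relevant passages retrieved -> 1.0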

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14247/26369