Efficient implementation of Treant: a robust decision tree learning algorithm
Girardini, Davide
2020/2021
Abstract
This thesis focuses on optimizing Treant, an existing algorithm for generating robust decision trees. Although Treant performs well from a machine learning point of view, its original implementation showed severe limitations on large datasets. The algorithm was originally written in Python, a very good language for fast prototyping that, like many other interpreted languages, can perform poorly on heavy numerical workloads when not supported by appropriate libraries. The code has been translated to C++, a compiled language, and parallelized with OpenMP, along with further optimizations to memory management and the choice of third-party libraries. A Python module has been generated from the C++ code to expose the efficient C++ classes so that they can be used as native Python classes. In this way, any Python user can exploit both the flexibility of Python and the performance of C++.
File | Type | Access | Size | Format |
---|---|---|---|---|
865919-1236420.pdf | Other attached material | Open access | 1.53 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14247/6280