An Automatic Tool for Labeling Web Search Missions Using Wikipedia Categories

Gezzele, Marco
2014/2015

Abstract

Mission detection aims to identify the sets of queries that users submit to a Web search engine in order to satisfy common information needs. It is generally easier to infer the search topic a user is interested in from the signals of several queries (i.e., a mission) than from each query in isolation. In this work, we present a system that automatically labels search missions previously discovered in a search engine log, using a set of predefined semantic categories provided by Wikipedia. These well-known categories have already been proposed in previous work because they cover almost every topic underlying search missions. Our solution consists of three steps. First, we extract the set of Wikipedia articles (i.e., entities) from each search mission using a state-of-the-art entity linking technique. To do so, we represent a search mission as a virtual text document made of the queries composing the mission and the text of the web page, if any, that the user clicked on in response to each query. Second, we retrieve the set of candidate Wikipedia categories that correspond to the entities extracted in the previous step. Finally, we rank the predefined target categories against these candidates using an unsupervised approach and assign the highest-ranked target category to each search mission. In our experiments, we use a dataset of 8,800 queries sampled from a real-world search engine log. These queries had already been manually grouped into individual search missions, which we use as input to our system. To evaluate the quality of the proposed solution, we conduct a user study in which participants manually judge the correctness of the labels assigned to the missions.
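The three steps described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the toy lexicon stands in for a real entity linker, the category tables stand in for Wikipedia's category graph, the target names ("Arts", "Technology", "Sports") are hypothetical, and simple vote counting is only one possible unsupervised ranking, not necessarily the one used by the author.

```python
from collections import Counter

# Hypothetical toy data; a real system would call an entity linker
# and query the Wikipedia category graph instead.
ENTITY_LEXICON = {
    "python": "Python (programming language)",
    "numpy": "NumPy",
    "louvre": "Louvre",
}
ENTITY_CATEGORIES = {
    "Python (programming language)": ["Programming languages", "Free software"],
    "NumPy": ["Python libraries", "Free software"],
    "Louvre": ["Art museums in Paris"],
}
# Hypothetical mapping from candidate categories to predefined targets.
CATEGORY_TO_TARGET = {
    "Programming languages": "Technology",
    "Python libraries": "Technology",
    "Free software": "Technology",
    "Art museums in Paris": "Arts",
}
TARGETS = ["Arts", "Technology", "Sports"]


def build_document(queries, clicked_texts=()):
    """Represent a mission as one virtual text document."""
    return " ".join(list(queries) + list(clicked_texts)).lower()


def link_entities(document):
    """Step 1 (stand-in): map tokens to entities via a lexicon lookup."""
    tokens = set(document.split())
    return sorted({ENTITY_LEXICON[t] for t in tokens if t in ENTITY_LEXICON})


def candidate_categories(entities):
    """Step 2: collect the categories of every linked entity."""
    return [c for e in entities for c in ENTITY_CATEGORIES.get(e, [])]


def rank_targets(candidates):
    """Step 3 (assumed scoring): rank targets by number of candidate votes."""
    votes = Counter(
        CATEGORY_TO_TARGET[c] for c in candidates if c in CATEGORY_TO_TARGET
    )
    return sorted(((t, votes[t]) for t in TARGETS), key=lambda p: (-p[1], p[0]))


def label_mission(queries, clicked_texts=()):
    """Return the top-ranked target, or None if no candidate matched."""
    document = build_document(queries, clicked_texts)
    ranking = rank_targets(candidate_categories(link_entities(document)))
    top_target, top_votes = ranking[0]
    return top_target if top_votes > 0 else None
```

For example, `label_mission(["python tutorial", "numpy array reshape"])` links two entities whose categories all map to the hypothetical "Technology" target. Returning `None` when nothing is linked avoids assigning an arbitrary label to a mission with no evidence.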
2014-03-11
Files in this item:
File: 810845-1170139.pdf
Access: open access
Type: Other attached material
Size: 3.77 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14247/374