This paper addresses the problem of named entity recognition (NER) in travel-related search queries. NER is an important step toward a richer understanding of user-generated inputs in information retrieval systems. NER in queries is challenging because queries offer minimal context and few structural clues. NER in restricted-domain queries is useful in vertical search applications, for example as a step following query classification in general search. This paper describes an efficient machine learning-based solution for the high-quality extraction of semantic entities from query inputs in a restricted-domain information retrieval setting. We apply a conditional random field (CRF) sequence model to travel-domain search queries. Our approach yields an overall F1 score of 86.4% on a held-out test set, outperforming a baseline CRF with standard features, which scores 82.0%. The resulting NER classifier is currently in use in a real-life travel search engine.
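To make the reported metric concrete, the following is a minimal sketch of how an F1 score can be computed for a sequence labeler on a single query. It assumes BIO-encoded tags and exact-match, entity-level scoring; the abstract does not state which F1 variant was used, and the label name `CITY`, the example query, and the function names are hypothetical illustrations rather than the paper's actual tag set or evaluation code.

```python
def extract_spans(tags):
    """Collect (label, start, end) entity spans from a BIO tag sequence."""
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        # Close the open span unless this tag continues it.
        if label is not None and tag != "I-" + label:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold_tags, pred_tags):
    """Exact-match entity-level F1 between gold and predicted tags."""
    gold = extract_spans(gold_tags)
    pred = extract_spans(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical travel query: "cheap flights paris to new york"
gold = ["O", "O", "B-CITY", "O", "B-CITY", "I-CITY"]
pred = ["O", "O", "B-CITY", "O", "B-CITY", "O"]  # "new york" cut short
score = entity_f1(gold, pred)
```

In this example the prediction recovers "paris" but truncates "new york", so one of two gold entities and one of two predicted entities match. Under exact-match scoring a partially correct span earns no credit, which is one reason short, context-poor queries are hard to score well on.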