2020

Gazetteer Generation for Neural Named Entity Recognition


Abstract

We present a method for generating gazetteers from the Wikidata knowledge graph and use the resulting lists to improve a neural NER system by adding an input feature that indicates whether a word is part of a name in the gazetteer. We show empirically that the approach yields performance gains in two distinct languages: English, a high-resource, word-based language, and Chinese, a high-resource, character-based language. We also apply the approach to a low-resource language, Russian, using a new annotated Russian NER corpus from Reddit tagged with four core and eleven extended types, and report a baseline score.
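To make the gazetteer input feature concrete, here is a minimal sketch of how a per-token membership flag could be computed. It is an illustration only, not the authors' implementation: the function names `build_gazetteer` and `gazetteer_flags`, the lowercased whitespace tokenization, and the single binary flag (rather than one flag per entity type) are all assumptions made for this example.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def build_gazetteer(names: Iterable[str]) -> Set[Tuple[str, ...]]:
    """Store each name as a tuple of lowercased whitespace tokens (assumed normalization)."""
    return {tuple(name.lower().split()) for name in names if name.strip()}


def gazetteer_flags(tokens: Sequence[str],
                    gazetteer: Set[Tuple[str, ...]]) -> List[int]:
    """Return 1 for each token covered by a full gazetteer name, else 0."""
    flags = [0] * len(tokens)
    if not gazetteer:
        return flags
    max_len = max(len(name) for name in gazetteer)
    lowered = [t.lower() for t in tokens]
    for start in range(len(lowered)):
        # Try every span starting here, up to the longest name in the gazetteer.
        for length in range(1, min(max_len, len(lowered) - start) + 1):
            if tuple(lowered[start:start + length]) in gazetteer:
                for i in range(start, start + length):
                    flags[i] = 1
    return flags


if __name__ == "__main__":
    gaz = build_gazetteer(["New York", "Barack Obama"])
    sentence = "Barack Obama visited New York today".split()
    print(list(zip(sentence, gazetteer_flags(sentence, gaz))))
```

In a neural tagger, a flag vector like this would typically be concatenated with each token's embedding before the encoder. A character-based language such as Chinese would need the same matching done over characters instead of whitespace tokens.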


Citation

@online{Song_2020,
  author = {Song, Chan Hee and Lawrie, Dawn and Finin, Tim and Mayfield, James},
  title = {Improving Neural Named Entity Recognition with Gazetteers},
  year = {2020},
  month = {Mar},
  eprinttype = {arXiv},
  eprint = {2003.03072v1},
  howpublished = {arXiv:2003.03072v1},
  url = {http://arxiv.org/abs/2003.03072v1}
}
