This repo includes:
- A Gazetteer of tokens and their named-entity (NE) tags, annotated by 3 domain experts
- A Corpus of 475,085 job titles crawled from LinkedIn, with NE tags prefixed using the BIOES scheme
- Title2Vec, a pre-trained job title embedding fine-tuned from ELMo. A checkpoint is available for download.
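
For readers unfamiliar with BIOES tagging, the sketch below shows one way to group tagged tokens back into entity spans. It is only an illustration: the function name `bioes_to_spans`, the example title, and the labels `RES` and `FUN` are assumptions made for this example and are not taken from the corpus files, whose exact format is not described here.

```python
from typing import List, Optional, Tuple


def bioes_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group tokens into (label, text) spans according to BIOES tags.

    B = begin, I = inside, E = end, S = single-token entity, O = outside.
    Malformed sequences (e.g. an I- tag with no preceding B-) are dropped.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    current: List[str] = []            # tokens of the entity being built
    current_label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            # Single-token entity: emit immediately.
            spans.append((label, token))
            current, current_label = [], None
        elif prefix == "B":
            # Start a new multi-token entity.
            current, current_label = [token], label
        elif prefix in ("I", "E") and current and label == current_label:
            current.append(token)
            if prefix == "E":
                # Entity is complete: emit and reset.
                spans.append((label, " ".join(current)))
                current, current_label = [], None
        else:
            # "O" or a malformed tag: discard any partial entity.
            current, current_label = [], None

    return spans


# Hypothetical example; tags are illustrative, not copied from the corpus.
tokens = ["senior", "software", "engineer"]
tags = ["S-RES", "B-FUN", "E-FUN"]
print(bioes_to_spans(tokens, tags))
```

Dropping malformed sequences keeps the sketch short; a stricter loader might raise an error instead so that annotation problems are not silently ignored.
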

Please cite the following paper when using IPOD:
@inproceedings{liu2020ipod,
title={IPOD: A Large-scale Industrial and Professional Occupation Dataset},
author={Liu, Junhua and Ng, Yung Chuen and Wood, Kristin L. and Lim, Kwan Hui},
booktitle={Proceedings of the 2020 ACM Conference on Computer Supported Cooperative Work and Social Computing Companion (CSCW'20)},
pages={323--328},
year={2020}
}