Learning to Contextually Aggregate Multi-Source Supervision for Sequence Labeling

Ouyu Lan∗, Xiao Huang∗, Bill Yuchen Lin, He Jiang, Liyuan Liu, and Xiang Ren

ACL, 2020

Paper Code Slides

Introduction


Sequence labeling is a fundamental framework for many natural language processing problems, and its performance in supervised settings depends heavily on the quality and quantity of annotations. In many cases, ground-truth labels are costly to obtain, while noisy annotations or annotations from other domains are readily available. Our intuition comes from the observation that different sources of supervision have different strengths. To model this, we need to (1) explicitly capture the unique traits of each source during training and (2) find the most suitable sources for generalizing the learned model to unseen sentences.

Method


We propose a novel framework, named Consensus Network (CONNET), for sequence labeling with multi-source supervision. We represent annotation patterns as source-specific biases over a shared behavior pattern, modeling annotator-invariant patterns and annotator-specific biases in a decoupled way. The former is captured by sharing part of the low-level model parameters in a multi-task learning schema. The latter are decoupled from the model as transformations applied on top of the tagging-model parameters, so that each transformation captures the unique strength of its annotator. With these decoupled source representations, we further learn an attention network that dynamically assigns the most suitable sources to each unseen sentence by composing a transformation that represents the agreement among sources (the consensus).
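
Below is a minimal PyTorch sketch of the two ideas above: decoupled source-specific transformations on top of a shared encoder, and an attention network that aggregates them into a consensus prediction. It assumes a BiLSTM encoder and a softmax tagging layer instead of a CRF for brevity; all module names, dimensions, and the linear form of the transformations are illustrative assumptions, not the exact configuration used in the paper.

import torch
import torch.nn as nn


class ConsensusTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, num_sources,
                 emb_dim=100, hidden_dim=200):
        super().__init__()
        # Annotator-invariant part: shared embedding + BiLSTM encoder.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim // 2,
                               batch_first=True, bidirectional=True)
        # Shared emission layer producing base tag scores.
        self.emission = nn.Linear(hidden_dim, num_tags)
        # Annotator-specific biases: one transformation of the tag
        # scores per supervision source, decoupled from the encoder.
        self.source_transforms = nn.ModuleList(
            [nn.Linear(num_tags, num_tags) for _ in range(num_sources)]
        )
        # Attention network scoring each source for a given sentence.
        self.attention = nn.Linear(hidden_dim, num_sources)

    def encode(self, tokens):
        # tokens: (batch, seq_len) -> hidden states and base tag scores.
        hidden, _ = self.encoder(self.embed(tokens))
        return hidden, self.emission(hidden)

    def forward_source(self, tokens, source_id):
        # Training: the shared encoder is updated together with the
        # transformation of the source that produced the labels.
        _, scores = self.encode(tokens)
        return self.source_transforms[source_id](scores)

    def forward_consensus(self, tokens):
        # Inference: weight the source-specific outputs by
        # sentence-level attention to form a consensus prediction.
        hidden, scores = self.encode(tokens)
        sent_repr = hidden.mean(dim=1)                           # (batch, hidden)
        weights = torch.softmax(self.attention(sent_repr), -1)   # (batch, sources)
        per_source = torch.stack(
            [t(scores) for t in self.source_transforms], dim=1
        )                                                        # (batch, sources, seq, tags)
        return (weights[:, :, None, None] * per_source).sum(dim=1)

Since the source transformations in this sketch are linear, weighting their outputs with the attention scores is equivalent to applying a single consensus transformation composed from the same weights.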

Experiments

Extensive experiments in two scenarios, learning with crowd annotations and unsupervised cross-domain adaptation, show that our model outperforms strong baseline methods on various tasks and with different encoders.

Part of the results on learning with crowd annotations.

Part of the results on unsupervised cross-domain adaptation.

To cite us

@inproceedings{lan2020learning,
    title={Learning to Contextually Aggregate Multi-Source Supervision for Sequence Labeling},
    author={Ouyu Lan and Xiao Huang and Bill Yuchen Lin and He Jiang and Liyuan Liu and Xiang Ren},
    booktitle={Proc. of ACL},
    year={2020},
    url={https://arxiv.org/abs/1910.04289}
}

Contact

If you have any questions about the paper or the code, please feel free to contact Ouyu Lan (lanouyu at gmail dot com) or Xiao Huang (huan183 at usc dot edu).
