Hybrid Data Augmentation for Citation Function Classification

dc.contributor.author: Zhang, Y.
dc.contributor.author: Wang, Y.
dc.contributor.author: Sheng, Q.Z.
dc.contributor.author: Mahmood, A.
dc.contributor.author: Zhang, W.E.
dc.contributor.author: Zhao, R.
dc.contributor.conference: International Joint Conference on Neural Networks (IJCNN) (18 Jun 2023 - 23 Jun 2023 : Gold Coast, Australia)
dc.date.issued: 2023
dc.description.abstract: The citation function generally signifies the purpose or reason underlying a citation within a scholarly paper or research article. Automatic citation function classification, a task in computational linguistics and information science, can therefore facilitate further applications in reference research, citation recommendation, and the evaluation of research activities. Based on a review of the state of the art, we identify two major constraints on the data for the citation function classification task: data imbalance and data sparsity. On the one hand, different types of citations are unevenly distributed within the scientific literature, which leads to data imbalance in real-world scenarios. On the other hand, citation function data are generally labeled by experts, which requires substantial human effort and results in a limited data scale. To this end, in this paper, we propose HybridDA, a two-stage model based on GPT-2 data augmentation and data retrieval, which synthesizes additional high-quality annotated citation function data to address both the data imbalance and data sparsity problems. We evaluate the proposed approach in an imbalanced setting and a low-resource setting. The experimental results in both settings demonstrate that our proposed model achieves competitive performance compared with the other baseline models.
dc.description.statementofresponsibility: Yang Zhang, Yufei Wang, Quan Z. Sheng, Adnan Mahmood, Wei Emma Zhang, Rongying Zhao
dc.identifier.citation: International Joint Conference on Neural Networks, 2023, vol.2023-June, pp.1-8
dc.identifier.doi: 10.1109/IJCNN54540.2023.10191695
dc.identifier.isbn: 9781665488679
dc.identifier.issn: 2161-4393
dc.identifier.orcid: Zhang, W.E. [0000-0002-0406-5974]
dc.identifier.uri: https://hdl.handle.net/2440/139660
dc.language.iso: en
dc.publisher: IEEE
dc.publisher.place: Online
dc.relation.grant: http://purl.org/au-research/grants/arc/DP200102298
dc.relation.ispartofseries: IEEE International Joint Conference on Neural Networks (IJCNN)
dc.rights: © 2023, IEEE
dc.source.uri: https://doi.org/10.1109/ijcnn54540.2023.10191695
dc.subject: Citation Function; Data Imbalance; Low Resource; Data Augmentation
dc.title: Hybrid Data Augmentation for Citation Function Classification
dc.type: Conference paper
pubs.publication-status: Published
