Automatic, Meta and Human Evaluation for Multimodal Summarization with Multimodal Output

Date

2024

Authors

Zhuang, H.
Zhang, W.E.
Xie, L.
Chen, W.
Yang, J.
Sheng, Q.Z.

Editors

Duh, K.
Gomez, H.
Bethard, S.

Type

Conference paper

Citation

Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2024), 2024 / Duh, K., Gomez, H., Bethard, S. (eds.), vol. 1, pp. 7761-7783

Statement of Responsibility

Haojie Zhuang, Wei Emma Zhang, Leon Xie, Weitong Chen, Jian Yang, Quan Z. Sheng

Conference Name

Conference of the North American Chapter of the Association for Computational Linguistics (16 Jun 2024 - 24 Jun 2024 : Hybrid and Mexico City)

Abstract

Multimodal summarization with multimodal output (MSMO) has attracted increasing research interest recently, as a multimodal summary can provide more comprehensive information than a text-only summary, effectively improving user experience and satisfaction. As one of the most fundamental components for the development of MSMO, evaluation is an emerging yet underexplored research topic. In this paper, we fill this gap and propose a research framework that studies three research questions of MSMO evaluation: (1) Automatic Evaluation: We propose a novel metric, mLLM-EVAL, which uses a multimodal Large Language Model for MSMO EVALuation. (2) Meta-Evaluation: We create a meta-evaluation benchmark dataset by collecting human-annotated scores for multimodal summaries. With our benchmark, we conduct a meta-evaluation analysis to assess the quality of different evaluation metrics and show the effectiveness of our proposed mLLM-EVAL. (3) Human Evaluation: To provide more objective and unbiased human annotations for meta-evaluation, we hypothesize and verify three types of cognitive biases in human evaluation. We also incorporate our findings into the human annotation process for the meta-evaluation benchmark. Overall, our research framework provides an evaluation metric, a human-annotated meta-evaluation benchmark dataset, and an analysis of cognitive biases in human evaluation, which we believe will serve as a valuable and comprehensive resource for the MSMO research community.
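
The abstract does not say how agreement between a metric and the human annotations is measured. One common choice in meta-evaluation is a rank correlation between the two sets of scores, and the following is a minimal sketch of that idea only, not the paper's procedure. The function names and the toy scores are assumptions made for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (norm_x * norm_y)


def spearman(metric_scores, human_scores):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(metric_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


# Hypothetical scores for five summaries (invented numbers, not from the paper).
metric_scores = [0.62, 0.48, 0.91, 0.55, 0.70]
human_scores = [3, 2, 5, 3, 4]
print(f"Spearman correlation: {spearman(metric_scores, human_scores):.3f}")
```

Under this setup, a metric whose ranking of summaries is closer to the human ranking gets a correlation nearer to 1, which is one way competing metrics could be compared on a shared benchmark.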

Description

Volume 1: Long Papers

Rights

©2024 Association for Computational Linguistics. Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License.