R-GAN: Exploring Human-like Way for Reasonable Text-to-Image Synthesis via Generative Adversarial Networks

Date

2021

Authors

Qiao, Y.
Chen, Q.
Deng, C.
Ding, N.
Qi, Y.
Tan, M.
Ren, X.
Wu, Q.

Type

Conference paper

Citation

Proceedings of the 29th ACM International Conference on Multimedia, 2021, pp.2085-2093

Statement of Responsibility

Yanyuan Qiao, Qi Chen, Chaorui Deng, Ning Ding, Yuankai Qi, Mingkui Tan, Xincheng Ren, Qi Wu

Conference Name

ACM Multimedia Conference (MM) (20 Oct 2021 - 24 Oct 2021 : Virtual online, China)

Abstract

Despite recent significant progress on generative models, context-rich text-to-image synthesis depicting multiple complex objects remains non-trivial. The main challenges lie in the ambiguous semantics of a complex description and the intricate scene of an image containing various objects with different positional relationships and diverse appearances. To address these challenges, we propose R-GAN, which generates reasonable images from a given text in a human-like way. Specifically, just as humans first find and settle the essential elements to create a simple sketch, we first capture a monolithic-structural text representation by building a scene graph that identifies the essential semantic elements. Based on this representation, we design a bounding box generator that estimates the layout, i.e., the position and size of each target object, followed by a shape generator that draws a fine-detailed shape for each object. Unlike previous work, which generates only coarse shapes blindly, we introduce a coarse-to-fine shape generator built on a shape knowledge base. Finally, to complete the image synthesis, we propose a multi-modal geometry-aware spatially-adaptive generator conditioned on the monolithic-structural text representation and a geometry-aware map of the shapes. Extensive experiments on the real-world MSCOCO dataset show the superiority of our method in terms of both quantitative and qualitative metrics.
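The staged pipeline the abstract describes (scene graph, then layout, then coarse-to-fine shapes, then final synthesis) can be sketched as below. This is a minimal illustrative sketch only: every function name, the toy scene-graph extraction, and the dummy layout values are assumptions for exposition, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    label: str
    box: tuple = None    # (x, y, w, h), filled in by the layout stage
    shape: str = None    # "coarse" or "fine" placeholder for a shape mask

def build_scene_graph(text):
    # Stand-in for the monolithic-structural text representation:
    # pull out the essential semantic elements (here, a toy vocabulary).
    known = {"dog", "frisbee", "grass"}
    return [SceneObject(w) for w in text.lower().split() if w in known]

def generate_layout(objects):
    # Bounding-box generator: estimate position and size per object
    # (dummy boxes here; the paper's generator predicts these).
    for i, obj in enumerate(objects):
        obj.box = (0.1 * i, 0.1 * i, 0.3, 0.3)
    return objects

def generate_shapes(objects, shape_kb):
    # Coarse-to-fine shape generator: a coarse mask is refined when an
    # exemplar can be retrieved from the shape knowledge base.
    for obj in objects:
        obj.shape = "fine" if obj.label in shape_kb else "coarse"
    return objects

def synthesize(text, shape_kb=frozenset({"dog", "frisbee"})):
    objects = generate_shapes(generate_layout(build_scene_graph(text)), shape_kb)
    # The final stage would condition an image generator on the text
    # representation and a geometry-aware map of the shapes; this sketch
    # stops at the structured intermediate result.
    return objects

result = synthesize("A dog catches a frisbee on the grass")
print([(o.label, o.shape) for o in result])
```

The point of the structure is that each stage consumes the previous stage's explicit output (graph, boxes, shapes) rather than generating pixels directly from text, which is what the abstract contrasts with prior one-shot approaches.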

Rights

© 2021 Association for Computing Machinery
