MVGFormer: Multi-view perspective with graph-guided transformer for cryo-ET segmentation

dc.contributor.author: Li, H.
dc.contributor.author: Li, X.
dc.contributor.author: Wang, H.
dc.contributor.author: Shi, J.
dc.contributor.author: Chen, H.
dc.contributor.author: Zhao, Y.
dc.contributor.author: Du, B.
dc.contributor.author: Barthelemy, J.
dc.contributor.author: Kihara, D.
dc.contributor.author: Shen, J.
dc.contributor.author: Xu, M.
dc.date.issued: 2026
dc.description.abstract: Cryo-Electron Tomography (cryo-ET) is a cutting-edge 3D imaging technology that enables detailed examination of biological macromolecular structures at near-atomic resolution. Recent deep learning applications to cryo-ET, such as cryo-ET segmentation, have drawn widespread interest for their potential to improve particle alignment, classification, and other tasks. However, current methods rely heavily on convolutional architectures, which prioritize local information while neglecting the global structural information inherent in cryo-ET data. Transformer-based models, known for their large receptive fields, have become the de facto design for 2D vision tasks owing to their ability to capture global information effectively. This ability is equally well suited to 3D tasks, given the complex nature of 3D objects. Building on this, we extend 2D vision transformers to 3D and propose a novel transformer-based framework for cryo-ET segmentation, named MVGFormer. MVGFormer introduces a multi-view perspective fusion transformer encoder, which captures rich global structural information from multiple perspectives using distinct positional embeddings. To enhance contextual awareness, we design a parallel context encoder that builds a visual graph to guide attention. We further introduce two complementary 3D decoders, multi-level feature fusion (MF) and parallel atrous convolutions (P3DA), which together capture multi-scale structural cues for precise segmentation. Furthermore, we introduce a view-masked self-supervised learning strategy that reinforces the effectiveness of the multi-view design and improves the model’s representation capability. To our knowledge, MVGFormer is the first transformer-based model for cryo-ET segmentation. We empirically evaluate MVGFormer on six cryo-ET datasets across three different tasks. Extensive experimental results demonstrate its superiority over state-of-the-art 3D segmentation methods.
dc.description.statementofresponsibility: Haoran Li, Xingjian Li, Huan Wang, Jiahua Shi, Huaming Chen, Yizhou Zhao, Bo Du, Johan Barthelemy, Daisuke Kihara, Jun Shen, Min Xu
dc.identifier.citation: Knowledge-Based Systems, 2026; 331:114810-1-114810-15
dc.identifier.doi: 10.1016/j.knosys.2025.114810
dc.identifier.issn: 0950-7051
dc.identifier.issn: 1872-7409
dc.identifier.orcid: Chen, H. [0000-0001-5678-472X]
dc.identifier.uri: https://hdl.handle.net/2440/149246
dc.language.iso: en
dc.publisher: Elsevier
dc.relation.grant: http://purl.org/au-research/grants/arc/IC220100028
dc.rights: © 2025 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
dc.source.uri: https://doi.org/10.1016/j.knosys.2025.114810
dc.subject: Cryo-electron tomography; volumetric image segmentation; deep learning
dc.title: MVGFormer: Multi-view perspective with graph-guided transformer for cryo-ET segmentation
dc.type: Journal article
pubs.publication-status: Published
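The abstract names a "parallel atrous convolutions" (P3DA) decoder. As a rough illustration of the underlying idea only — not the paper's implementation, whose details are not given here — the numpy sketch below runs a naive 3D dilated (atrous) convolution at several dilation rates in parallel, so the same small kernel covers progressively larger effective receptive fields. All function and variable names are hypothetical.

```python
import numpy as np

def dilated_conv3d(vol, kernel, dilation=1):
    """Naive 3D dilated (atrous) convolution with 'valid' padding.
    vol: (D, H, W) volume; kernel: (k, k, k) filter."""
    k = kernel.shape[0]
    span = (k - 1) * dilation + 1  # effective receptive field per axis
    D, H, W = vol.shape
    out = np.zeros((D - span + 1, H - span + 1, W - span + 1))
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # Sample the input with gaps of size `dilation`
                patch = vol[z:z + span:dilation,
                            y:y + span:dilation,
                            x:x + span:dilation]
                out[z, y, x] = np.sum(patch * kernel)
    return out

# Parallel branches: identical 3x3x3 averaging kernel, growing dilation
# rates, so each branch captures structure at a different scale.
vol = np.random.rand(16, 16, 16)
kernel = np.ones((3, 3, 3)) / 27.0
branches = [dilated_conv3d(vol, kernel, dilation=d) for d in (1, 2, 3)]
print([b.shape for b in branches])  # effective receptive fields: 3, 5, 7 voxels
```

In a real decoder the branch outputs would be padded to a common shape and fused (e.g. concatenated along a channel axis) before the final segmentation head; the loop-based convolution here is purely didactic.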

Files

Original bundle
Name: hdl_149246.pdf
Size: 10.1 MB
Format: Adobe Portable Document Format
Description: Published version
