Text
Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation
Relations amongst entities play a central role in image understanding. Due to the complexity of modeling ( subject , predicate , object ) relation triplets, it is crucial to develop a method that can not only recognize seen relations, but also generalize to unseen cases. Inspired by a previously proposed visual translation embedding model, or VTransE [1] , we propose a context-augmented translation embedding model that can capture both common and rare relations. The previous VTransE model maps entities and predicates into a low-dimensional embedding vector space where the predicate is interpreted as a translation vector between the embedded features of the bounding box regions of the subject and the object . Our model additionally incorporates the contextual information captured by the bounding box of the union of the subject and the object, and learns the embeddings guided by the constraint predicate ≈ union ( subject , object ) − subject − object . In a comprehensive evaluation on multiple challenging benchmarks, our approach outperforms previous translation-based models and comes close to or exceeds the state of the art across a range of settings, from small-scale to large-scale datasets, from common to previously unseen relations. It also achieves promising results for the recently introduced task of scene graph generation.
Barcode | Tipe Koleksi | Nomor Panggil | Lokasi | Status | |
---|---|---|---|---|---|
art141142 | null | Artikel | Gdg9-Lt3 | Tersedia namun tidak untuk dipinjamkan - No Loan |
Tidak tersedia versi lain