Zhengyuan Zhu

Object Difference Attention Mechanism

Paper Basic Information

Paper Recommendation Reason

Attention mechanisms have greatly advanced Visual Question Answering (VQA). Within an attention mechanism, attention allocation is crucial: it weights the objects in an image (such as image regions or bounding boxes) according to their importance for answering the question. Most existing work fuses image features and text features to compute the attention distribution, without comparing different image objects against each other. Yet discriminability, a key property of attention, depends on exactly such comparisons, which provide extra information for better attention allocation. To make attention discriminative over objects, the paper proposes Object-Difference Attention (ODA), which computes attention probabilities by applying difference operations between pairs of image objects. Experimental results show that the ODA-based VQA model achieves state-of-the-art results. In addition, a general form of relational attention is proposed: besides ODA, the paper introduces several other relational attention variants, and experiments show that these variants have advantages on different types of questions.

Object Difference Attention Mechanism: A Simple Relational Attention Mechanism in Visual Question Answering

Introduction

Paper Terminology

Paper Writing Motivation

As shown above, to answer the question "What is the tallest flower in the picture?", the model needs to attend not only to the potential answer, the rose, but also to the orchid.

Method to Solve the Problem

Rose Example

To answer "What is the tallest flower in the picture?", what steps are involved?

The correct answer emerges from the comparison process. Inspired by this example, a new type of attention mechanism is proposed: ODA computes the attention distribution over image objects by comparing each object with all other objects, under the guidance of the question.
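The comparison idea above can be sketched in a few lines of NumPy. This is a simplified illustration, not the paper's exact formulation: the projection vector `w` and the single-vector question feature `q` are assumptions, and the real model adds nonlinearities and learned fusion layers. The core step is the difference `(V_i - V_j)` between each object and every other object, modulated element-wise by the question.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def oda_attention(V, q, w):
    """Simplified Object-Difference Attention sketch.

    V : (N, d) features of N image objects
    q : (d,)   question feature (assumed single vector here)
    w : (d,)   learned projection to a scalar score (illustrative)
    """
    # Pairwise differences between every pair of objects: (N, N, d)
    diff = V[:, None, :] - V[None, :, :]
    # Question-guided relation (V_i - V_j) ⊙ q, accumulated over all j: (N, d)
    rel = (diff * q).sum(axis=1)
    # Project each object's relational feature to a score, then normalize
    alpha = softmax(rel @ w)          # attention distribution over objects
    # Attended visual feature: weighted sum of object features
    return alpha @ V, alpha

# Toy usage: three objects with 2-d features
V = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
q = np.ones(2)
w = np.ones(2)
context, alpha = oda_attention(V, q, w)
```

Because each object's score depends on its differences from all other objects, an object that stands out from the rest (like the tallest flower) can receive a distinctly higher weight than under plain fused attention.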

Model Details

The multi-head design can follow the multi-head attention in "Attention Is All You Need".

Extension: Relational Attention

Generalizing the $(V_i-V_j)\odot{Q}$ term of the model yields different types of relational attention mechanisms.
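One way to read this generalization: the difference operation is just one choice of pairwise relation, and swapping it for another relation gives a different member of the family. The sketch below makes the relation a pluggable function; the names and the alternative `sum_relation` are illustrative assumptions, not operators taken from the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def relational_attention(V, q, w, relation):
    """General relational attention: `relation` maps object features V
    and question q to a per-object relational feature (N, d)."""
    rel = relation(V, q)
    alpha = softmax(rel @ w)
    return alpha @ V, alpha

def difference_relation(V, q):
    """ODA's choice: (V_i - V_j) ⊙ q, summed over all other objects j."""
    diff = V[:, None, :] - V[None, :, :]      # (N, N, d)
    return (diff * q).sum(axis=1)

def sum_relation(V, q):
    """A hypothetical alternative relation: element-wise sum instead
    of difference, to show how the family is parameterized."""
    s = V[:, None, :] + V[None, :, :]
    return (s * q).sum(axis=1)

V = np.array([[1.0, 2.0], [3.0, 0.0], [0.0, 1.0]])
q = np.ones(2)
w = np.ones(2)
ctx_diff, a_diff = relational_attention(V, q, w, difference_relation)
ctx_sum, a_sum = relational_attention(V, q, w, sum_relation)
```

Different relations emphasize different cues, which is consistent with the paper's finding that the relational attention variants have advantages on different question types.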

Experimental Results Analysis

Datasets

Evaluation Metrics

Experimental Result Evaluation

Conclusion

Intuitively, the object-difference attention mechanism matches the way humans reason when answering questions about an image. A promising future direction is to build a world model from common-sense knowledge about the world, using prior knowledge to reduce both the computational load and the dependence on large amounts of labeled data.

References and Citations

