Zhengyuan Zhu

Visual Question Answering (VQA) Study Notes

Paper Basic Information

Paper Recommendation Reason

Visual Question Answering (VQA) has been a hot topic in computer vision and natural language processing in recent years. In VQA, an algorithm must answer text-based questions about images. Since the first VQA dataset was released in 2014, more datasets have followed and many algorithms have been proposed. In this review, we critically examine the current state of VQA in terms of problem formulation, existing datasets, evaluation metrics, and algorithms. In particular, we discuss the limitations of current datasets for properly training and evaluating VQA algorithms. We then exhaustively review existing VQA algorithms. Finally, we discuss possible directions for future research in VQA and image understanding.

Visual Question Answering: Datasets, Algorithms, and Future Challenges

Introduction

Research Value of VQA

VQA Datasets

VQA Evaluation Criteria

Multiple-choice tasks can be evaluated directly with accuracy. But how should open-ended tasks be evaluated?
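To make the two cases concrete, here is a minimal sketch in Python. The first function is plain accuracy for multiple-choice questions. The second is one widely used answer to the open-ended case: the consensus metric published with the VQA dataset, in which each question has several human answers and a prediction scores min(n / 3, 1), where n is the number of annotators who gave that answer. The function names and the simple normalization (lowercasing and trimming whitespace) are assumptions made for illustration; the official evaluation script applies more extensive answer normalization.

```python
def multiple_choice_accuracy(predictions, ground_truth):
    """Fraction of questions where the chosen option equals the correct option."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not predictions:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(predictions)


def consensus_accuracy(predicted, human_answers):
    """Consensus score min(n / 3, 1), where n counts matching human answers.

    Assumption: answers are compared after lowercasing and trimming
    whitespace only, which is a simplification for this sketch.
    """
    normalized = predicted.strip().lower()
    matches = sum(1 for a in human_answers if a.strip().lower() == normalized)
    return min(matches / 3.0, 1.0)


if __name__ == "__main__":
    # Hypothetical answers from ten annotators for a single question.
    humans = ["red", "red", "Red", "dark red", "red", "maroon",
              "red", "red", "crimson", "red"]
    print(consensus_accuracy("red", humans))      # full credit: at least 3 matches
    print(consensus_accuracy("maroon", humans))   # partial credit: 1 match
    print(multiple_choice_accuracy(["a", "b", "c", "d"], ["a", "b", "c", "a"]))
```

The consensus score gives partial credit when only one or two annotators agree with the prediction, which softens the penalty for questions that have more than one reasonable answer.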

VQA Algorithms

Existing algorithms generally include the following structures:

Baseline and Model Performance

Model Architecture Overview

VQA Still Has Many Problems

Although VQA has made great progress, existing algorithms still fall far short of human performance.

Existing problems include:

Conclusion

An algorithm that can answer any question about an image would be a milestone in artificial intelligence.

Promising Research Directions

References and Citations

