Zhengyuan Zhu
Video Captioning Based on Temporal Structure

Basic Paper Information

Paper Name: Describing Videos by Exploiting Temporal Structure

Paper Link: https://arxiv.org/pdf/1502.08029

Paper Source Code:

About Note Author:

Paper Recommendation Reason

This paper from the University of Montreal was published at ICCV 2015. Its main contribution is to exploit the temporal structure of video at two scales: a 3D-CNN captures local spatio-temporal information within short clips, while a temporal attention mechanism captures global structure by weighting frames across the whole video. Combining the two achieved state-of-the-art results in 2015. Another important contribution is the M-VAD movie clip description dataset, which has since become a widely used benchmark for video captioning.

Describing Videos by Exploiting Temporal Structure

Introduction to Video Captioning Task:

The task is to generate a single-sentence description of a video. One example is worth a thousand words:

  A monkey pulls a dog’s tail and is chased by the dog.

Earlier models in 2015:

LSTM-YT Model

Problems with Pre-2015 Models

Paper Ideas and Innovations

For each word generated by the Decoder, the model attends to specific frames in the video.
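The per-word frame selection described above can be sketched as soft attention over frame features. This is a minimal illustration, not the authors' code: it assumes an additive scoring function of the form `w . tanh(W h_prev + U v_i + b)`, and the function name `temporal_attention`, the toy matrices, and the 2-dimensional features are all hypothetical.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(M, v):
    return [dot(row, v) for row in M]

def temporal_attention(frame_feats, h_prev, W, U, w, b):
    """One decoding step: score every frame against the previous decoder
    state, normalize the scores, and return (weights, context vector)."""
    Wh = matvec(W, h_prev)
    scores = []
    for v in frame_feats:
        Uv = matvec(U, v)
        hidden = [math.tanh(a + c + d) for a, c, d in zip(Wh, Uv, b)]
        scores.append(dot(w, hidden))
    weights = softmax(scores)
    dim = len(frame_feats[0])
    # The context is the attention-weighted sum of the frame features.
    context = [sum(wt * v[k] for wt, v in zip(weights, frame_feats))
               for k in range(dim)]
    return weights, context

# Toy inputs (assumed values, chosen only for illustration).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_prev = [0.5, -0.5]
W = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 1.0]
b = [0.0, 0.0]

weights, context = temporal_attention(frames, h_prev, W, U, w, b)
```

The weights sum to 1 across frames and are recomputed at every decoding step, so each generated word can draw on a different part of the video.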

Model Architecture Design

Each convolutional layer is followed by a ReLU activation and local max-pooling; the dropout rate is set to 0.5.
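As a rough illustration of the three operations named above, the sketch below applies them to a one-dimensional list of values. It is an assumption-laden toy: the real model works on 3D feature volumes, the pooling window size of 2 is made up, where exactly dropout is applied is not stated in this note, and the helper names are hypothetical.

```python
import random

def relu(xs):
    # Clamp negative activations to zero.
    return [max(0.0, x) for x in xs]

def local_max_pool(xs, size=2):
    # Max over non-overlapping windows; a trailing partial window is kept.
    return [max(xs[i:i + size]) for i in range(0, len(xs), size)]

def dropout(xs, p=0.5, training=True, rng=None):
    # Inverted dropout: zero each value with probability p and scale the
    # survivors by 1 / (1 - p), so no rescaling is needed at test time.
    if not training or p == 0.0:
        return list(xs)
    rng = rng or random.Random(0)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]

conv_out = [-1.0, 2.0, 0.5, -3.0, 4.0, 1.0]  # stand-in for a conv output
activated = relu(conv_out)
pooled = local_max_pool(activated)
dropped = dropout(pooled, p=0.5)
```

With `training=False` the dropout helper returns its input unchanged, which mirrors the usual practice of disabling dropout at inference time.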

Experiment Details

Dataset

Youtube2Text: 1,970 YouTube video clips, each about 10 to 30 seconds long, containing a single activity and no dialogue. 1,200 clips are used for training, 100 for validation, and 670 for testing.
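The split sizes above add up to the full 1970 clips, which a short sketch can check. The clip identifiers and the contiguous, in-order partition are assumptions made for illustration; the note does not say how clips are assigned to each subset.

```python
def split_clips(clip_ids, n_train=1200, n_val=100, n_test=670):
    """Partition clip ids into train / validation / test by position."""
    expected = n_train + n_val + n_test
    if len(clip_ids) != expected:
        raise ValueError(f"expected {expected} clips, got {len(clip_ids)}")
    train = clip_ids[:n_train]
    val = clip_ids[n_train:n_train + n_val]
    test = clip_ids[n_train + n_val:]
    return train, val, test

# Hypothetical identifiers standing in for the real clip names.
clip_ids = [f"clip_{i:04d}" for i in range(1970)]
train, val, test = split_clips(clip_ids)
```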

M-VAD: 49,000 video clips from 92 movies; each clip is annotated with descriptive sentences.

Evaluation Metrics

Experimental Results

In the bar chart, each bar shows the attention weight assigned to a frame when generating the word of the matching color.

Model Comparison
