Note: the Scheduled Sampling paper (Bengio et al., 2015) is by Samy Bengio, not Yoshua Bengio.
Overview
In sequence-to-sequence (Seq2Seq) learning tasks, Scheduled Sampling can improve the performance of RNN models.
The input distribution the model sees at training time differs from the one it sees at inference time, which causes errors to accumulate during inference.
The conventional training method, Teacher Forcing, always feeds the ground-truth previous word to the decoder, which is exactly what produces this training/inference mismatch.
Scheduled Sampling alleviates this problem by feeding the decoder its own generated words as input with a certain probability during training.
Note that Scheduled Sampling is only applied during the training stage.
Algorithm Details
During training, when generating the $t$-th word, instead of always taking the ground-truth word $y_{t-1}$ as input, Scheduled Sampling takes the previously generated word $g_{t-1}$ with a certain probability.
Assume that in the $i$-th mini-batch, Scheduled Sampling defines a probability $\epsilon_i$ that controls the decoder input; $\epsilon_i$ decreases as $i$ increases.
There are three decay schedules. The linear decay is:
$$\text{Linear decay: } \epsilon_i = \max(\epsilon,\ k - c \cdot i)$$
where $\epsilon$ bounds the minimum value of $\epsilon_i$, and $k$ and $c$ control the offset and slope of the decay. The paper's other two schedules are exponential decay, $\epsilon_i = k^i$ (with $k < 1$), and inverse sigmoid decay, $\epsilon_i = k / (k + \exp(i/k))$ (with $k \ge 1$).
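As a small illustration (the function names and default constants below are my own choices, not from the source), each of the three schedules can be computed from the mini-batch index $i$ as follows:

```python
import math

def linear_decay(i, k=1.0, c=1e-4, eps_min=0.1):
    # epsilon_i = max(eps_min, k - c*i); k is the offset, c the slope of the decay.
    return max(eps_min, k - c * i)

def exponential_decay(i, k=0.99):
    # epsilon_i = k**i, with k < 1.
    return k ** i

def inverse_sigmoid_decay(i, k=500.0):
    # epsilon_i = k / (k + exp(i/k)), with k >= 1.
    return k / (k + math.exp(i / k))
```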
Note:
At time step $t$, Scheduled Sampling feeds the ground-truth word $y_{t-1}$ to the decoder with probability $\epsilon_i$, and the generated word $g_{t-1}$ with probability $1-\epsilon_i$.
Since $\epsilon_i$ decays as training proceeds, the decoder increasingly tends to use its own generated words as input.
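A minimal sketch of this per-step choice in a PyTorch-style training loop is shown below; the decoder interface (`decoder_step`, `embed`) and the tensor layout are assumptions made for illustration, not the source's actual model.

```python
import torch

def decode_with_scheduled_sampling(decoder_step, embed, targets, hidden, eps_i):
    """Decode `targets` (batch, seq_len); at each step the previous word fed to the
    decoder is the ground truth with probability eps_i, else the generated word."""
    batch_size, seq_len = targets.size()
    inputs = targets[:, 0]  # start from the <BOS> / first ground-truth token
    all_logits = []
    for t in range(1, seq_len):
        logits, hidden = decoder_step(embed(inputs), hidden)
        all_logits.append(logits)
        generated = logits.argmax(dim=-1)      # the model's own prediction
        ground_truth = targets[:, t]           # the reference word
        # Per-example coin flip: True -> feed ground truth (probability eps_i)
        use_gt = torch.rand(batch_size, device=targets.device) < eps_i
        inputs = torch.where(use_gt, ground_truth, generated)
    return torch.stack(all_logits, dim=1), hidden
```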
Implementation
Parameters
```python
parser.add_argument('--scheduled_sampling_start', type=int, default=0, help='at what epoch to start decay gt probability, -1 means never')
```
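The help string implies two behaviors: `-1` disables scheduled sampling entirely, and any non-negative value delays it until that epoch. A tiny helper expressing that check (the function name is mine, for illustration only) might look like:

```python
def scheduled_sampling_enabled(epoch, scheduled_sampling_start):
    # -1 means scheduled sampling is never used; otherwise it starts at that epoch.
    return scheduled_sampling_start >= 0 and epoch >= scheduled_sampling_start
```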
Assign scheduled sampling probability
```python
# scheduled sampling probability is min(epoch*0.01, 0.25)
```
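Only the comment survives here, so the snippet below is a hedged reconstruction of the assignment it describes (the function name is hypothetical). Note that the probability scheduled in this comment is the chance of feeding the *generated* word, i.e. $1-\epsilon_i$, so it grows with the epoch and is capped at 0.25:

```python
def scheduled_sampling_prob(epoch):
    # Probability of feeding the model's own previous word (1 - epsilon_i),
    # following the comment above: min(epoch * 0.01, 0.25).
    return min(epoch * 0.01, 0.25)

print(scheduled_sampling_prob(10))  # 0.1
print(scheduled_sampling_prob(40))  # 0.25 (the cap is reached at epoch 25)
```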