Change n to m for Y in Enc-Dec section #1942

Open · wants to merge 2 commits into base: main
8 changes: 4 additions & 4 deletions encoder-decoder.md
@@ -352,7 +352,7 @@ mapping.

Similar to RNN-based encoder-decoder models, the transformer-based
encoder-decoder models define a conditional distribution of target
-vectors \\(\mathbf{Y}_{1:n}\\) given an input sequence \\(\mathbf{X}_{1:n}\\):
+vectors \\(\mathbf{Y}_{1:m}\\) given an input sequence \\(\mathbf{X}_{1:n}\\):

$$
p_{\theta_{\text{enc}}, \theta_{\text{dec}}}(\mathbf{Y}_{1:m} | \mathbf{X}_{1:n}).
@@ -366,18 +366,18 @@ $$ f_{\theta_{\text{enc}}}: \mathbf{X}_{1:n} \to \mathbf{\overline{X}}_{1:n}. $$

The transformer-based decoder part then models the conditional
probability distribution of the target vector sequence
-\\(\mathbf{Y}_{1:n}\\) given the sequence of encoded hidden states
+\\(\mathbf{Y}_{1:m}\\) given the sequence of encoded hidden states
\\(\mathbf{\overline{X}}_{1:n}\\):

-$$ p_{\theta_{dec}}(\mathbf{Y}_{1:n} | \mathbf{\overline{X}}_{1:n}).$$
+$$ p_{\theta_{dec}}(\mathbf{Y}_{1:m} | \mathbf{\overline{X}}_{1:n}).$$

By Bayes' rule, this distribution can be factorized into a product of
conditional probability distributions of the target vector \\(\mathbf{y}_i\\)
given the encoded hidden states \\(\mathbf{\overline{X}}_{1:n}\\) and all
previous target vectors \\(\mathbf{Y}_{0:i-1}\\):

$$
-p_{\theta_{dec}}(\mathbf{Y}_{1:n} | \mathbf{\overline{X}}_{1:n}) = \prod_{i=1}^{n} p_{\theta_{\text{dec}}}(\mathbf{y}_i | \mathbf{Y}_{0: i-1}, \mathbf{\overline{X}}_{1:n}). $$
+p_{\theta_{dec}}(\mathbf{Y}_{1:m} | \mathbf{\overline{X}}_{1:n}) = \prod_{i=1}^{m} p_{\theta_{\text{dec}}}(\mathbf{y}_i | \mathbf{Y}_{0: i-1}, \mathbf{\overline{X}}_{1:n}). $$

The transformer-based decoder hereby maps the sequence of encoded hidden
states \\(\mathbf{\overline{X}}_{1:n}\\) and all previous target vectors
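This factorization is exactly what a single teacher-forced forward pass of an encoder-decoder model evaluates: one distribution per target position \\(i\\), each conditioned on \\(\mathbf{Y}_{0:i-1}\\) and \\(\mathbf{\overline{X}}_{1:n}\\). The following is a minimal sketch with 🤗 Transformers; the `t5-small` checkpoint and the sentence pair are illustrative choices, not taken from the post. Note that the product runs over the target length \\(m\\), not the input length \\(n\\), which is precisely the point of this change:

```python
# Illustrative sketch: evaluate log p(Y_{1:m} | X_{1:n}) as a sum of
# per-position conditionals. Checkpoint and sentences are arbitrary choices.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# X_{1:n}: the input sequence (n tokens after tokenization)
input_ids = tokenizer(
    "translate English to German: I love you.", return_tensors="pt"
).input_ids
# Y_{1:m}: the target sequence; m is independent of n
labels = tokenizer("Ich liebe dich.", return_tensors="pt").input_ids

with torch.no_grad():
    # Passing labels triggers teacher forcing: logits[:, i] parameterizes
    # p(y_i | Y_{0:i-1}, X̄_{1:n}), so logits has m positions, not n.
    logits = model(input_ids=input_ids, labels=labels).logits

log_probs = torch.log_softmax(logits, dim=-1)
# Pick out log p(y_i | ...) of the actual target token at each position i
token_log_probs = log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
# The product over i = 1..m becomes a sum in log space
print("log p(Y_{1:m} | X_{1:n}) =", token_log_probs.sum().item())
```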