Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

row/col stochastic matrix documentation #807

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
70 changes: 68 additions & 2 deletions src/reference-manual/transforms.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ constrained to be ordered, positive ordered, or simplexes. Matrices
may be constrained to be correlation matrices or covariance matrices.
This chapter provides a definition of the transforms used for each
type of variable. For examples of how to declare and define these
variables in a Stan program, see section
variables in a Stan program, see section
[Variable declaration](types.qmd#variable-declaration.section).

Stan converts models to C++ classes which define probability
Expand All @@ -39,7 +39,7 @@ the exact arithmetic:
rounded to the boundary. This may cause unexpected warnings or
errors, if in other parts of the code the boundary value is
invalid. For example, we may observe floating-point value 0 for
a variance parameter that has been declared to be larger than 0.
a variance parameter that has been declared to be larger than 0.
See more about [Floating point Arithmetic in Stan user's guide](../stan-users-guide/floating-point.qmd)).
- CmdStan stores the output to CSV files with 6 significant digits
accuracy by default, but the constraints are checked with 8
Expand Down Expand Up @@ -672,6 +672,72 @@ z_k
.
$$

## Stochastic Matrix {#stochastic-matrix-transform.section}

The `column_stochastic_matrix[N, M]` and `row_stochastic_matrix[N, M]` type in
Stan represents an \(N \times M\) matrix where each column (row) is a unit simplex
of dimension \(N\). In other words, each column (row) of the matrix is a vector
constrained to have non-negative entries that sum to one.

### Definition of a Stochastic Matrix {-}

A column stochastic matrix \(X \in \mathbb{R}^{N \times M}\) is defined such
that each column is a simplex. For column \(j\) (where \(1 \leq j \leq M\)):

$$
X_{i, j} \geq 0 \quad \text{for } 1 \leq i \leq N,
$$

and

$$
\sum_{i=1}^N X_{i, j} = 1.
$$

A row stochastic matrix is any matrix whose transpose is a column stochastic matrix
(i.e. the rows of the matrix are simplexes)


$$
X_{i, j} \geq 0 \quad \text{for } 1 \leq j \leq N,
$$

and

$$
\sum_{j=1}^N X_{i, j} = 1.
$$

This definition ensures that each column (row) of the matrix \(X\) lies on the
\(N-1\) dimensional unit simplex, similar to the `simplex[N]` type, but
extended across multiple columns(rows).

### Inverse Transform for Stochastic Matrix {-}

For the column and row stochastic matrices the inverse transform is the same
as simplex, but applied to each column (row).

### Absolute Jacobian Determinant for the Inverse Transform {-}

The Jacobian determinant of the inverse transform for each column \(j\) in
the matrix is given by the product of the diagonal entries \(J_{i,i,j}\) of
the lower-triangular Jacobian matrix. This determinant is calculated as:

$$
\left| \det J_j \right| = \prod_{i=1}^{N-1} \left( z_{i, j} (1 - z_{i, j}) \left( 1 - \sum_{i'=1}^{i-1} X_{i'j} \right) \right).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

J_j is unfortunate notation. We've tried to maintain a convention of using n in 1:N for indexing with lower case matching the name of the upper case bounds. So I'd take this to be J_m for m in 1:M. And similarly, convert the earlier thing to using n.

I've also tried to use commas between indexes everywhere---you're doing this in some places (like z) but not others (like X). Please make this consistently use commas.

$$

Thus, the overall Jacobian determinant for the entire `column_stochastic_matrix` and `row_stochastic_matrix`
is the product of the determinants for each column (row):

$$
\left| \det J \right| = \prod_{j=1}^{M} \left| \det J_j \right|.
$$

### Transform for Stochastic Matrix {-}

For the column and row stochastic matrices the transform is the same
as simplex, but applied to each column (row).

## Unit vector {#unit-vector.section}

Expand Down
77 changes: 77 additions & 0 deletions src/reference-manual/types.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -673,6 +673,83 @@ iterations, and in either case, with less dispersed parameter
initialization or custom initialization if there are informative
priors for some parameters.

### Stochastic Matrices {-}

A stochastic matrix is a matrix where each column, row, or both is a
unit simplex, meaning that each column (row) vector has non-negative
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the "both" description here as we're not providing that functionality for users (at least not yet!).

values that sum to 1. The following example is a \(3 \times 4\)
column-stochastic matrix.

$$
\begin{bmatrix}
0.2 & 0.5 & 0.1 & 0.3 \\
0.3 & 0.3 & 0.6 & 0.4 \\
0.5 & 0.2 & 0.3 & 0.3
\end{bmatrix}
$$

An example of a \(3 \times 4\) row-stochastic matrix is the following.

$$
\begin{bmatrix}
0.2 & 0.5 & 0.1 & 0.2 \\
0.2 & 0.1 & 0.6 & 0.1 \\
0.5 & 0.2 & 0.2 & 0.1
\end{bmatrix}
$$


In the examples above, each column (or row) sums to 1, making the matrices
valid `column_stochastic_matrix` and `row_stochastic_matrix` types.

bob-carpenter marked this conversation as resolved.
Show resolved Hide resolved
Column-stochastic matrices are often used in models where
each column represents a probability distribution across a
set of categories such as in multiple multinomial distributions,
factor models, transition matrices in Markov models,
or compositional data analysis.
They can also be used in situations where you need multiple simplexes
of the same dimensionality.

The `column_stochastic_matrix` and `row_stochastic_matrix` types are declared
with row and column sizes. For instance, a matrix `theta` with
3 rows and 4 columns, where each
column is a 3-simplex, is declared like a matrix with 3 rows and 4 columns.

```stan
column_stochastic_matrix[3, 4] theta;
```

A matrix `theta` with 3 rows and 4 columns, where each row is a 4-simplex,
is similarly declared as a matrix with 3 rows and 4 columns.

```stan
row_stochastic_matrix[3, 4] theta;
```

As with simplexes, `column_stochastic_matrix` and `row_stochastic_matrix`
variables are subject to validation, ensuring that each column (row)
satisfies the simplex constraints. This validation accounts for
floating-point imprecision, with checks performed up to a statically
specified accuracy threshold \(\epsilon\).

#### Stability Considerations {-}

In high-dimensional settings, `column_stochastic_matrix` and `row_stochastic_matrix`
types may require careful tuning of the inference
algorithms. To ensure stability:

- **Smaller Step Sizes:** In samplers like Hamiltonian Monte Carlo (HMC),
smaller step sizes can help maintain stability, especially in high dimensions.
- **Higher Target Acceptance Rates:** Setting higher target acceptance
rates can improve the robustness of the sampling process.
- **Longer Warmup Periods:** Increasing the warmup period allows the sampler
to better explore the parameter space before the actual sampling begins.
- **Tighter Optimization Tolerances:** For optimization-based inference,
tighter tolerances with more iterations can yield more accurate results.
- **Custom Initialization:** If prior information about the parameters is
available, custom initialization or less dispersed initialization can lead
to more efficient inference.

### Unit vectors {-}

A unit vector is a vector with a norm of one. For instance, $[0.5,
Expand Down