row/col stochastic matrix documentation #807

66 changes: 66 additions & 0 deletions src/reference-manual/transforms.qmd
@@ -672,6 +672,72 @@ z_k
.
$$

## Stochastic Matrix {#stochastic-matrix-transform.section}

The `column_stochastic_matrix[N, M]` and `row_stochastic_matrix[N, M]` type in
Contributor

We've used "col" everywhere else for column. Should this match?

Contributor Author

discussed irl, we are wishy washy with col vs column abbreviation, but I like column here so going to keep it

Stan represents an \(N \times M\) matrix where each column(row) is a unit simplex
of dimension \(N\). In other words, each column(row) of the matrix is a vector
constrained to have non-negative entries that sum to one.

### Definition of a Stochastic Matrix {-}

A column stochastic matrix \(X \in \mathbb{R}^{N \times M}\) is defined such
that for each column \(j\) (where \(1 \leq j \leq M\)):

$$
X_{ij} \geq 0 \quad \text{for } 1 \leq i \leq N,
$$

and

$$
\sum_{i=1}^N X_{ij} = 1.
$$

A row stochastic matrix is defined similarly but with the axis flipped such
Contributor

Just as easy to say a row stochastic matrix is any matrix whose transpose is a column stochastic matrix.

I would also say in words that a column stochastic matrix has columns that are simplexes, whereas the row version has rows that are simplexes.

I've tried to write matrix subscripts as "i, j" rather than "ij" to allow for multiple-character subscripts.

Contributor Author

Agree, can't believe I forgot to just say what this is in simple terms

that


$$
X_{ij} \geq 0 \quad \text{for } 1 \leq j \leq M,
$$

and

$$
\sum_{j=1}^M X_{ij} = 1.
$$

This definition ensures that each column(row) of the matrix \(X\) lies on the
Contributor

space between column and ( --- this also appears later

Just stick to $X$ to match the rest of our doc.

Contributor Author

Yes, I had a reason for doing that (the Eigen docs used column(row)), but it looks like they fixed it in their docs, so I'll fix it here as well

\(N-1\) dimensional unit simplex, similar to the `simplex[N]` type, but
extended across multiple columns(rows).

### Inverse Transform for Stochastic Matrix {-}

For the column and row stochastic matrices the inverse transform is the same
as simplex, but applied to each column(row).
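As a rough illustration of "the same as simplex, but applied to each column(row)", the following Python sketch assumes the stick-breaking simplex inverse transform from the preceding simplex section, with $z_k = \operatorname{logit}^{-1}(y_k - \log(N - k))$. The function names are hypothetical and are not part of Stan.

```python
import math


def inv_logit(u):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-u))


def simplex_from_unconstrained(y):
    """Map N-1 unconstrained values to an N-simplex by stick breaking.

    Assumes the stick-breaking transform; this is an illustration,
    not Stan's implementation.
    """
    n = len(y) + 1
    x = []
    remaining = 1.0  # length of the stick not yet assigned
    for k, y_k in enumerate(y, start=1):
        z_k = inv_logit(y_k - math.log(n - k))
        x_k = remaining * z_k
        x.append(x_k)
        remaining -= x_k
    x.append(remaining)  # the last entry takes whatever is left
    return x


def column_stochastic_from_unconstrained(Y):
    """Y is (N-1) x M (list of rows); returns an N x M list of rows."""
    m = len(Y[0])
    columns = [simplex_from_unconstrained([row[j] for row in Y]) for j in range(m)]
    return [list(r) for r in zip(*columns)]


def row_stochastic_from_unconstrained(Y):
    """Y is N x (M-1) (list of rows); each row becomes an M-simplex."""
    return [simplex_from_unconstrained(row) for row in Y]
```

With all unconstrained values set to zero, every column (or row) comes out as the uniform simplex, which is a convenient sanity check.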

### Absolute Jacobian Determinant for the Inverse Transform {-}

The Jacobian determinant of the inverse transform for each column \(j\) in
the matrix is given by the product of the diagonal entries \(J_{i,i,j}\) of
the lower-triangular Jacobian matrix. This determinant is calculated as:

$$
\left| \det J_j \right| = \prod_{i=1}^{N-1} \left( z_{ij} (1 - z_{ij}) \left( 1 - \sum_{i'=1}^{i-1} X_{i'j} \right) \right).
$$

Thus, the overall Jacobian determinant for the entire `column_stochastic_matrix` and `row_stochastic_matrix`
is the product of the determinants for each column(row):

$$
\left| \det J \right| = \prod_{j=1}^{M} \left| \det J_j \right|.
$$
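The two formulas above can be sketched numerically as follows. This assumes the stick-breaking parameterization implied by the $z_{ij}$ terms, works on the log scale so the product over columns becomes a sum, and uses hypothetical function names that are not part of Stan.

```python
import math


def inv_logit(u):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-u))


def log_abs_det_jacobian_column(y):
    """log |det J_j| for one column given its N-1 unconstrained values.

    Each diagonal entry is z * (1 - z) * (1 - sum of the earlier X entries),
    where the last factor is the stick length still remaining.
    """
    n = len(y) + 1
    remaining = 1.0
    total = 0.0
    for k, y_k in enumerate(y, start=1):
        z = inv_logit(y_k - math.log(n - k))
        total += math.log(z) + math.log1p(-z) + math.log(remaining)
        remaining -= remaining * z
    return total


def log_abs_det_jacobian(Y):
    """Sum of the per-column log determinants; Y is (N-1) x M (list of rows)."""
    m = len(Y[0])
    return sum(
        log_abs_det_jacobian_column([row[j] for row in Y]) for j in range(m)
    )
```

For a row stochastic matrix the same per-vector term would be accumulated over rows instead of columns.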

### Transform for Stochastic Matrix {-}

For the column and row stochastic matrices the transform is the same
as simplex, but applied to each column(row).

## Unit vector {#unit-vector.section}

76 changes: 76 additions & 0 deletions src/reference-manual/types.qmd
@@ -673,6 +673,82 @@ iterations, and in either case, with less dispersed parameter
initialization or custom initialization if there are informative
priors for some parameters.

### Stochastic Matrices {-}

A stochastic matrix is a matrix where each column, row, or both is a
unit simplex, meaning that each column(row) vector has non-negative
values that sum to 1. For example, a \(3 \times 4\)
column stochastic matrix will look like:
Contributor

I'd say:

"The following example is a $3 \times 4$ column-stochastic matrix."

Note that it's a period at the end this way and doesn't assert that a column stochastic matrix will look one way or another. Also note that when you use a noun compound like "column stochastic" as an adjective (here modifying "matrix") then it should be hyphenated.

Contributor Author

Also fixed Column-matrix below


$$
\begin{bmatrix}
0.2 & 0.5 & 0.1 & 0.3 \\
0.3 & 0.3 & 0.6 & 0.4 \\
0.5 & 0.2 & 0.3 & 0.3
\end{bmatrix}
$$

While a row stochastic matrix will look like:
Contributor

This isn't a complete sentence---it's just a free-floating clause. How about:

"An example of a row-stochastic matrix is the following."


$$
\begin{bmatrix}
0.2 & 0.5 & 0.1 & 0.2 \\
0.2 & 0.1 & 0.6 & 0.1 \\
0.5 & 0.2 & 0.2 & 0.1
\end{bmatrix}
$$


In this example, each column(row) sums to 1, making the matrix a
Contributor

"each row" --- this is a single example.

Contributor Author

I fixed this a bit to make it more clear we are talking about both of the examples

valid `column_stochastic_matrix` and `row_stochastic_matrix`.

bob-carpenter marked this conversation as resolved.
Column stochastic matrices are often used in models where
each column represents a probability distribution across a
set of categories, such as in multiple multinomial distributions,
transition matrices in Markov models, or compositional data analysis.
They can also be used in situations where multiple Dirichlet-distributed v
Contributor

I'd drop the Dirichlet comment here. You can just say they can be used whenever you need multiple simplexes of the same dimensionality.

The other big application here is factor models, so you should definitely mention those. In these models, a row of the row-stochastic matrix is, for example, something like the proportions of pollutants being emitted from a factory.

Contributor Author

Added factor model to the examples

ariables are required across different dimensions.
Contributor

premature line break

Contributor Author

Cut sentence


The `column_stochastic_matrix` and `row_stochastic_matrix` types are declared
with full dimensionality. For instance, a matrix `theta` with
Contributor

full dimensionality ---> row and column sizes

Contributor Author

Fixed

3 rows and 4 columns, where each
column is a 3-simplex, is declared as:

```stan
column_stochastic_matrix[3, 4] theta;
```

A matrix `theta` with
Contributor

Too many line breaks.

Contributor Author

Fixed

3 rows and 4 columns, where each
row is a 4-simplex, is declared as:

```stan
row_stochastic_matrix[3, 4] theta;
```

As with simplexes, `column(row)_stochastic_matrix` variables are subject to
Contributor

This is too hard to parse---just repeat, we have tons of space here. That is, separate out column_stochastic_matrix and row_stochastic_matrix.

validation, ensuring that each column(row) satisfies the simplex constraints.
This validation accounts for floating-point imprecision, with checks
performed up to a statically specified accuracy threshold \(\epsilon\).
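A minimal sketch of this kind of check, applied to the two example matrices above, might look like the following. The tolerance `1e-8` and the function names are assumptions made for illustration; they are not taken from this PR.

```python
def columns_are_simplexes(X, tol=1e-8):
    """True if every column of X (list of rows) is non-negative and sums to 1 within tol."""
    for col in zip(*X):
        if any(v < 0.0 for v in col) or abs(sum(col) - 1.0) > tol:
            return False
    return True


def rows_are_simplexes(X, tol=1e-8):
    """A matrix is row stochastic exactly when its transpose is column stochastic."""
    return columns_are_simplexes(list(zip(*X)), tol)


# The 3 x 4 column-stochastic example from the text.
col_example = [
    [0.2, 0.5, 0.1, 0.3],
    [0.3, 0.3, 0.6, 0.4],
    [0.5, 0.2, 0.3, 0.3],
]

# The 3 x 4 row-stochastic example from the text.
row_example = [
    [0.2, 0.5, 0.1, 0.2],
    [0.2, 0.1, 0.6, 0.1],
    [0.5, 0.2, 0.2, 0.1],
]
```

The tolerance matters because sums such as `0.1 + 0.6 + 0.3` are not guaranteed to equal `1.0` exactly in floating point.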

#### Stability Considerations {-}

In high-dimensional settings or when the matrix has many columns,
Contributor

It's not clear how "high-dimensional" and "has many columns" differ. Is high-dimensional the rows? I'd just say "high-dimensional" here.

`column_stochastic_matrix` types may require careful tuning of the inference
algorithms. To ensure stability:

- **Smaller Step Sizes:** In samplers like Hamiltonian Monte Carlo (HMC),
smaller step sizes can help maintain stability, especially in high dimensions.
- **Higher Target Acceptance Rates:** Setting higher target acceptance
rates can improve the robustness of the sampling process.
- **Longer Warmup Periods:** Increasing the warmup period allows the sampler
to better explore the parameter space before the actual sampling begins.
- **Tighter Optimization Tolerances:** For optimization-based inference,
tighter tolerances with more iterations can yield more accurate results.
- **Custom Initialization:** If prior information about the parameters is
available, custom initialization or less dispersed initialization can lead
to more efficient inference.

### Unit vectors {-}

A unit vector is a vector with a norm of one. For instance, $[0.5,