Commit

update
LucyMcGowan committed Sep 28, 2023
1 parent b3db1a0 commit 2677970
Showing 14 changed files with 40 additions and 35 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Package: pald
Title: Partitioned Local Depth for Community Structure in Data
Version: 0.0.2
Version: 0.0.3
Authors@R:
c(person("Katherine", "Moore", email = "[email protected]", role = c("aut"),
comment = c(ORCID = "0000-0001-6943-2416")),
@@ -20,7 +20,7 @@ Description: Implementation of the Partitioned Local Depth (PaLD)
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2
RoxygenNote: 7.2.3
Imports:
igraph,
graphics,
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,3 +1,7 @@
# pald 0.0.3

* Change output in `community_clusters` to be a data frame with two columns: `point` and `community`
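
Code that still reads the previous column name, `cluster`, will need updating: `$` on a data frame returns `NULL` for a missing column rather than raising an error, so the failure can be silent. The following is a minimal sketch of the adjustment. It uses a hand-built stand-in for the returned data frame; the point labels and the helper `community_labels()` are assumptions made for illustration and are not part of pald, while the membership values mirror the package's unit test.

```r
# Hand-built stand-in for the value of community_clusters(C) after this change.
# The point labels "1".."8" are an assumption; memberships mirror the unit test.
cc <- data.frame(
  point = as.character(1:8),
  community = c(1, 1, 1, 1, 2, 2, 2, 3)
)

# The old column name no longer exists, and `$` silently returns NULL.
is.null(cc$cluster)

# Hypothetical helper (not part of pald) that accepts either column name.
community_labels <- function(x) {
  if ("community" %in% names(x)) x$community else x$cluster
}

# Group the points by their community label.
groups <- split(cc$point, community_labels(cc))
groups
```

A helper like this is only useful while code must run against both versions; otherwise replacing `cluster` with `community` directly is simpler.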

# pald 0.0.2

* Allow non-symmetric matrices to be input
4 changes: 2 additions & 2 deletions R/pald_functions.R
@@ -462,7 +462,7 @@ plot_community_graphs <- function(c,
#'
#' @return A data frame with two columns:
#' * `point`: The points from cohesion matrix `c`
#' * `cluster`: The (community) cluster labels
#' * `community`: The community cluster labels
#'
#' @examples
#' D <- dist(exdata2)
@@ -475,7 +475,7 @@ community_clusters <- function(c) {
cl <- igraph::clusters(c_graphs$G_strong)$membership
data.frame(
point = names(cl),
cluster = cl
community = cl
)
}
#' Partitioned Local Depth (PaLD)
1 change: 1 addition & 0 deletions README.Rmd
@@ -110,6 +110,7 @@ Each time the function `pald()` is called, the matrix of cohesion values is re-c


## Cohesion Matrix

Cohesion reflects relationship strength from the perspective of relative position, see [@bmm22]. To begin PaLD analysis, we must first compute the matrix of cohesion values from the input distance matrix or `dist` object. Note that cohesion is not symmetric. The values, $C[x, w]$, in the cohesion matrix are interpretable probabilities which capture the strength of the alignment of $w$ to $x$. The sum of the cohesion matrix is always equal to $n/2$ (where $n$ is the number of data points).
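
The $n/2$ property can be checked numerically. The block below is a rough base-R sketch of cohesion as I understand the definition in the cited paper: for each pair $(x, y)$, the local focus is the set of points within $d(x, y)$ of either $x$ or $y$, and each point in the focus contributes to whichever of $x$ or $y$ it is closer to, with ties split evenly. The function `cohesion_sketch()` is a hypothetical helper written for this illustration. It assumes a symmetric distance matrix and is not the package's `cohesion_matrix()`.

```r
# Hypothetical helper: a sketch of cohesion for a symmetric distance matrix.
cohesion_sketch <- function(d) {
  d <- as.matrix(d)
  n <- nrow(d)
  C <- matrix(0, n, n, dimnames = dimnames(d))
  for (x in seq_len(n - 1)) {
    for (y in (x + 1):n) {
      dx <- d[, x]
      dy <- d[, y]
      # Local focus: points at least as close to x or to y as they are to each other.
      focus <- which(dx <= d[x, y] | dy <= d[x, y])
      # Weight 1 if closer to x, 0 if closer to y, 1/2 on ties.
      w_x <- (dx[focus] < dy[focus]) + 0.5 * (dx[focus] == dy[focus])
      C[x, focus] <- C[x, focus] + w_x / length(focus)
      C[y, focus] <- C[y, focus] + (1 - w_x) / length(focus)
    }
  }
  C / (n - 1)
}

set.seed(1)
pts <- matrix(rnorm(16), ncol = 2)  # 8 made-up points in the plane
C_sketch <- cohesion_sketch(dist(pts))

# Each pair contributes exactly 1 before the division by (n - 1),
# and there are n(n - 1)/2 pairs, so the total is n/2.
all.equal(sum(C_sketch), nrow(pts) / 2)
```

The sketch also illustrates why cohesion is not symmetric in general: the entry `C[x, w]` accumulates over partners of `x`, whereas `C[w, x]` accumulates over partners of `w`.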

```{r cohesion}
58 changes: 29 additions & 29 deletions README.md
@@ -55,15 +55,15 @@ nor optimization criteria are employed.
The only information extracted from the distance matrix are
within-triplet dissimilarity comparisons. As a result, outputs are
unaffected by monotone transformations of the collection of distances
(e.g., log<sub>2</sub>). Further, one may transform any measure of
similarity, *s*(*x*,*y*), to a measure of dissimilarity, *d*(*x*,*y*),
via any order-reversing monotone transformation, for instance by taking
*d*(*x*,*y*) = 1/(1+*s*(*x*,*y*)). This provides the user some
flexibility in the choice of dissimilarity (e.g., triangle inequality is
not required) and care should be taken at this stage.
(e.g., $\log_2$). Further, one may transform any measure of similarity,
$s(x, y)$, to a measure of dissimilarity, $d(x,y)$, via any
order-reversing monotone transformation, for instance by taking
$d(x, y) = 1/(1 + s(x, y))$. This provides the user some flexibility in
the choice of dissimilarity (e.g., triangle inequality is not required)
and care should be taken at this stage.
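
As a small illustration of both points, the sketch below converts made-up similarity scores to dissimilarities with $d = 1/(1 + s)$ and then checks that a monotone transformation ($\log_2$) leaves the ordering of the distances unchanged. The scores and the pair labels are invented for this example.

```r
# Made-up similarity scores for three pairs of items (larger = more alike).
sim <- c(ab = 0.9, ac = 0.2, bc = 0.5)

# Order-reversing transformation from the text: d = 1 / (1 + s).
d <- 1 / (1 + sim)

# The most similar pair becomes the closest pair.
names(which.max(sim)) == names(which.min(d))

# A monotone increasing transformation (here log2) preserves every
# comparison between distances, so the ranks are identical.
identical(rank(d), rank(log2(d)))
```

Because only comparisons between distances are used, any analysis that depends solely on these ranks would give the same result for `d` and `log2(d)`.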

The function `dist()` from the `stats` package converts an input data
frame (with *n* rows) into an *n* × *n* distance matrix. In Euclidean
frame (with $n$ rows) into an $n \times n$ distance matrix. In Euclidean
examples here, we will use the default Euclidean distance.

## A Small Example
@@ -85,6 +85,11 @@ par(mfrow = c(1, 2), pty = "s")

D <- dist(exdata1)
pald_results <- pald(D, emph_strong = 1, vertex.label.cex = 3)
```

<img src="man/figures/README-pald-1.png" width="100%" />

``` r

###

@@ -104,7 +109,7 @@ text(exdata1 + .23,
cex = .8)
```

<img src="man/figures/README-pald-1.png" width="100%" />
<img src="man/figures/README-pald-2.png" width="100%" />

The wrapper function `pald()` returns a list containing: the cohesion
matrix, local depths, (community) clusters, the threshold for
@@ -126,10 +131,10 @@ Cohesion reflects relationship strength from the perspective of relative
position, see (Berenhaut, Moore, and Melvin 2022). To begin PaLD
analysis, we must first compute the matrix of cohesion values from the
input distance matrix or `dist` object. Note that cohesion is not
symmetric. The values, *C*\[*x*,*w*\], in the cohesion matrix are
symmetric. The values, $C[x, w]$, in the cohesion matrix are
interpretable probabilities which capture the strength of the alignment
of *w* to *x*. The sum of the cohesion matrix is always equal to *n*/2
(where *n* is the number of data points).
of $w$ to $x$. The sum of the cohesion matrix is always equal to $n/2$
(where $n$ is the number of data points).

``` r
D <- dist(exdata1)
@@ -184,10 +189,9 @@ strong_threshold(C)
```

Pairs of points for which mutual cohesion (i.e.,
min {*C*<sub>*x*, *w*</sub>, *C*<sub>*w*, *x*</sub>}) is greater than
the above threshold are considered to be \`\`strongly cohesive.” The
thresholded and symmetrized cohesion matrix can be obtained using the
function ‘cohesion_strong.’
$\min\{C_{x, w}, C_{w, x}\}$) is greater than the above threshold are
considered to be "strongly cohesive." The thresholded and symmetrized
cohesion matrix can be obtained using the function `cohesion_strong`.
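
The thresholding step can be sketched in base R on a toy matrix. The cohesion values and the threshold below are made up for illustration; in the package the threshold comes from `strong_threshold(C)`. This is an approximation of the idea described above, not the implementation of `cohesion_strong()`, and it applies the same rule to the diagonal, which the package may handle differently.

```r
# Toy, made-up cohesion values (rows x, columns w); not output from pald.
C_toy <- matrix(
  c(0.30, 0.20, 0.02,
    0.10, 0.30, 0.05,
    0.03, 0.04, 0.40),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("a", "b", "c"), c("a", "b", "c"))
)
threshold <- 0.08  # hypothetical value chosen for this example

# Mutual cohesion: elementwise minimum of C and its transpose.
mutual <- pmin(C_toy, t(C_toy))

# Keep only entries strictly greater than the threshold.
strong <- mutual
strong[mutual <= threshold] <- 0
strong
```

Here only the pair (`a`, `b`) has mutual cohesion above the threshold, so it is the only off-diagonal tie that survives.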

``` r
round(cohesion_strong(C), 4)
@@ -209,11 +213,10 @@ round(cohesion_strong(C), 4)
The overall structure of the data can be observed via the networks
obtained from cohesion (referred to here as “community graphs”). The
community graph is a symmetric, weighted graph which is obtained from
symmetrizing the cohesion matrix (using
min {*C*<sub>*x*, *w*</sub>, *C*<sub>*w*, *x*</sub>}) and removing
self-loops. The “community cluster graph” is the subgraph consisting of
only the edges for which mutual cohesion greater than the above
threshold.
symmetrizing the cohesion matrix (using $\min\{C_{x, w}, C_{w, x}\}$)
and removing self-loops. The “community cluster graph” is the subgraph
consisting of only the edges for which mutual cohesion is greater than the
above threshold.

The connected components of the community cluster graph, `G_strong`, are
referred to as the (community) clusters of the data. Note that no
@@ -374,7 +377,7 @@ plot_community_graphs(
<img src="man/figures/README-lang-1.png" width="100%" />

One could alternatively use the wrapper function:
`pald(cognate_dist, emph_strong = 3, edge_width_factor = 30, vertex.label = lang_lab_subset, vertex.label.cex = .65, vertex.size = 3)`.
$\texttt{pald(cognate_dist, emph_strong = 3, edge_width_factor = 30, vertex.label = lang_lab_subset, vertex.label.cex = .65, vertex.size = 3)}$.
It will return a list containing: the cohesion matrix, local depths,
(community) clusters, the threshold for identifying strong ties, the
thresholded and symmetrized cohesion matrix, the community graph whose
@@ -390,7 +393,7 @@ cohesion) and can be found directly from the cohesion matrix.
library(igraph)
G_strong_lang <- community_graphs(C_lang)$G_strong
neighbors(G_strong_lang, "French")
#> + 8/87 vertices, named, from c8a0516:
#> + 8/87 vertices, named, from 8cc26e0:
#> [1] Italian Ladin Provencal Walloon
#> [5] French_Creole_C French_Creole_D Spanish Catalan

@@ -409,7 +412,7 @@ density, see discussion in (Berenhaut, Moore, and Melvin 2022). Note
that PaLD was able to detect the eight natural groups within the data
without the use of any additional inputs (e.g., number of clusters) nor
optimization criteria. Despite providing the “correct” number of
clusters (i.e., *k* = 8) both *k*-means and hierarchical clustering did
clusters (i.e., $k = 8$) both *k*-means and hierarchical clustering did
not give the desired result.

``` r
@@ -432,11 +435,6 @@ plot_community_graphs(
edge_width_factor = 2,
vertex.size = 5
)
```

<img src="man/figures/README-vary-d-1.png" width="100%" />

``` r
### The cluster vector is provided by `pald` and also may be computed via:
library(igraph)
cluster_graph <- community_graphs(C3)$G_strong
@@ -447,8 +445,10 @@ table(clusters(cluster_graph)$membership)
#> 40 40 60 20 20 20 20 20
```

<img src="man/figures/README-vary-d-1.png" width="100%" />

Here are the results for the data obtained from *k*-means and
hierarchical clustering when *k* = 8.
hierarchical clustering when $k = 8$.

``` r
par(mfrow = c(1, 2), pty = "s")
2 changes: 1 addition & 1 deletion man/community_clusters.Rd

Some generated files are not rendered by default.

Binary file modified man/figures/README-comm-1.png
Binary file modified man/figures/README-fig-2-1.png
Binary file modified man/figures/README-k-mean-1.png
Binary file modified man/figures/README-lang-1.png
Binary file modified man/figures/README-pald-1.png
Binary file added man/figures/README-pald-2.png
Binary file modified man/figures/README-rand-1.png
2 changes: 1 addition & 1 deletion tests/testthat/test-pald_functions.R
@@ -57,5 +57,5 @@ test_that("community_clusters works", {
D <- dist(exdata1)
C <- cohesion_matrix(D)
cc <- community_clusters(C)
expect_equal(cc$cluster, c(1, 1, 1, 1, 2, 2, 2, 3))
expect_equal(cc$community, c(1, 1, 1, 1, 2, 2, 2, 3))
})