forked from awslabs/graphstorm
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Doc] Graph Construction PR Refactor - PR1: GSProcessing Doc Refactor (awslabs#907)

*Issue #, if available:*

*Description of changes:*

Preview on read the doc: https://jalencato-graphstorm-doc.readthedocs.io/en/gsprocessing-awsinfra-doc/graph-construction/index.html

* Refactor the doc structure for existing GSProcessing doc.
* Rename the title as what we previously discussed in the reorg doc plan.
* Change a typo in the doc about `--model-name` to `--hf-model` in distributed set up.
* It is the first PR about the Doc Refactor.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

---------

Co-authored-by: Theodore Vasiloudis <[email protected]>
Co-authored-by: xiang song(charlie.song) <[email protected]>
1 parent d5b1e03, commit 46c1331
Showing 14 changed files with 148 additions and 52 deletions.
37 changes: 37 additions & 0 deletions
docs/source/graph-construction/gs-processing/aws-infra/index.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
================================================ | ||
Running distributed processing jobs on AWS Infra | ||
================================================ | ||
|
||
After successfully building the Docker image and pushing it to | ||
`Amazon ECR <https://docs.aws.amazon.com/ecr/>`_, | ||
you can now initiate GSProcessing jobs with AWS resources. | ||
|
||
We support running GSProcessing jobs on several AWS execution environments, including | ||
`Amazon SageMaker <https://docs.aws.amazon.com/sagemaker/>`_, | ||
`EMR Serverless <https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless.html>`_, and | ||
`EMR on EC2 <https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-what-is-emr.html>`_. | ||
|
||
|
||
Running distributed jobs on `Amazon SageMaker <https://docs.aws.amazon.com/sagemaker/>`_: | ||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
:titlesonly: | ||
|
||
amazon-sagemaker.rst | ||
|
||
Running distributed jobs on `EMR Serverless <https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless.html>`_: | ||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
:titlesonly: | ||
|
||
emr-serverless.rst | ||
|
||
Running distributed jobs on `EMR on EC2 <https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-what-is-emr.html>`_: | ||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
:titlesonly: | ||
|
||
emr.rst |
2 changes: 2 additions & 0 deletions
...-processing/usage/row-count-alignment.rst → ...cessing/aws-infra/row-count-alignment.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,5 @@ | ||
.. _row_count_alignment: | ||
|
||
Row count alignment | ||
=================== | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
============================== | ||
Distributed Graph Construction | ||
============================== | ||
|
||
Beyond single-machine graph construction, distributed graph construction offers enhanced scalability | ||
and efficiency. This process involves two main steps: GraphStorm Distributed Data Processing (GSProcessing) | ||
and GraphStorm Distributed Data Partitioning (GPartition). The documentation for GPartition will be released soon. | ||
|
||
The following sections provide guidance on performing distributed graph construction. | ||
The first section details the execution environment setup for GSProcessing. | ||
The second section offers examples of how to draft a configuration file for a GSProcessing job. | ||
The third section explains how to deploy your GSProcessing job with AWS infrastructure. | ||
The final section provides a quick-start example for GSProcessing. | ||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
:glob: | ||
|
||
prerequisites/index.rst | ||
input-configuration.rst | ||
aws-infra/index.rst | ||
example.rst |
2 changes: 1 addition & 1 deletion
...cessing/developer/input-configuration.rst → ...ion/gs-processing/input-configuration.rst
8 changes: 6 additions & 2 deletions
...-processing/developer/developer-guide.rst → ...cessing/prerequisites/developer-guide.rst
26 changes: 26 additions & 0 deletions
docs/source/graph-construction/gs-processing/prerequisites/index.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
=============================================== | ||
Distributed GraphStorm Processing | ||
=============================================== | ||
|
||
GraphStorm Distributed Data Processing (GSProcessing) allows you to process | ||
and prepare massive graph data for training with GraphStorm. GSProcessing takes | ||
care of generating unique IDs for nodes, using them to encode edge structure files, | ||
processing individual features, and preparing the data to be passed into the distributed | ||
partitioning and training pipeline of GraphStorm. | ||
|
||
We use PySpark to achieve horizontal parallelism, allowing us to scale to graphs with billions of nodes and edges. | ||
|
||
.. warning:: | ||
GraphStorm currently only supports running GSProcessing on AWS infrastructure, including `Amazon SageMaker <https://docs.aws.amazon.com/sagemaker/>`_, `EMR Serverless <https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless.html>`_, and `EMR on EC2 <https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-what-is-emr.html>`_. | ||
|
||
The following sections outline essential prerequisites and provide a detailed guide to using | ||
GSProcessing. | ||
The first section provides an introduction to GSProcessing, how to install it locally, and a quick example of its input configuration. | ||
The second section demonstrates how to set up GSProcessing for distributed processing, enabling scalable and efficient processing using AWS resources. | ||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
:titlesonly: | ||
|
||
gs-processing-getting-started.rst | ||
distributed-processing-setup.rst |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
================== | ||
Graph Construction | ||
================== | ||
|
||
GraphStorm offers various methods to build graphs on both a single machine and distributed clusters. | ||
|
||
.. toctree:: | ||
:maxdepth: 2 | ||
:glob: | ||
|
||
gs-processing/index.rst |