diff --git a/content/contributor/content/practice/cli.md b/content/contributor/content/practice/cli.md index fa5b8a21..88de6457 100644 --- a/content/contributor/content/practice/cli.md +++ b/content/contributor/content/practice/cli.md @@ -13,9 +13,9 @@ You can use [`httpie`](https://github.com/httpie/httpie) or format responses wit This is what we'll do: 1. Query the status of the content server. -2. Locate and download the [snapshot]({{< relref "snapshots" >}}) for a list of entities. +2. Locate and download a daily [snapshot]({{< relref "snapshots" >}}) with a list of entities. 3. Obtain the manifest for an entity. -4. Download some of the entity's files. +4. Download one of the entity's files. Let's begin by querying the server status using `/about`: @@ -23,133 +23,82 @@ Let's begin by querying the server status using `/about`: curl "https://peer.decentraland.org/about" ``` -```json +```js { "healthy": true, - "acceptingUsers": true, - "bff": { - "commitHash": "1a2ff915a216191ecc6ef85f3822f0809fe16f3c", - "healthy": true, - "protocolVersion": "1.0_0", - "publicUrl": "/bff", - "userCount": 12 - }, - "comms": { - "commitHash": "ad06af3339484411eb639e402462fd2715ce80f6", - "healthy": true, - "protocol": "v3" - }, "content": { - "commitHash": "d6e5d094ef81e2d5630eba9f3eaba2bdc21f5b13", "healthy": true, - "publicUrl": "https://peer-ec1.decentraland.org/content/", - "version": "5.1.9" - }, - "lambdas": { - "commitHash": "b80be3b95e0e8d5b4c9295e7f89fc6f17f7d8a8d", - "healthy": true, - "publicUrl": "https://peer-ec1.decentraland.org/lambdas/", - "version": "5.1.10" - }, - "configurations": { - "globalScenesUrn": [], - "networkId": 1, - "realmName": "hephaestus", - "scenesUrn": [] + "version": "6.5.0", }, + // ... more server information (feature flags, versions, paths, etc.) } ``` Looks like the server is up and running normally (`"healthy": true`), and is giving us information about the version it implements for each feature set, plus some configuration options of the instance. -We're interested in downloading some content, so let's explore the available [snapshots]({{< relref "snapshots" >}}) to get our hands on some identifiers via `/content/snapshot`. +We're interested in downloading some content, so let's explore the available [snapshots]({{< relref "snapshots" >}}) to get our hands on some identifiers via `/content/snapshots`. 
```bash -curl "https://peer.decentraland.org/content/snapshot" +curl "https://peer.decentraland.org/content/snapshots" ``` - -```json -{ - "hash": "bafybeibtfyox4mho6nfxtfibvol3h6mvnsduslofnwuej33eag2yqgw5bi", - "lastIncludedDeploymentTimestamp": 1671631795620, - "entities": { - "scene": { - "hash": "bafybeiadxlvdmgmzhvhrgbgzumfdp52i7ubzmhq5sxfyfiivcjuxyddl54", - "lastIncludedDeploymentTimestamp": 1671631464866 - }, - "profile": { - "hash": "bafybeib6mk6p2ymod7lmsw6ttsacbmkgmvwpn23jmdkpkfdzvgvp7lzaeu", - "lastIncludedDeploymentTimestamp": 1671631795620 - }, - "wearable": { - "hash": "bafybeidcl3kmsnufkvl7tlmeussajiqiqqzsq3uto7m5wrzhstnix5f6vu", - "lastIncludedDeploymentTimestamp": 1671630538830 - }, - "store": { - "hash": "bafybeidq2yivbl5wlozds4okldqlb6uesqeflb7k6hyygyr7xjx63h5eta", - "lastIncludedDeploymentTimestamp": 1671604715626 +```js +[ + { + "hash": "bafybeia6qoum64psaooiqo3f45i6hykfwx723uc236waub3gng2naof224", + "timeRange": { + "initTimestamp": 1689120000000, + "endTimestamp": 1689206400000 }, - "emote": { - "hash": "bafkreialb5j2vhhes3dorynuyvkbvh3nxddjtadmupaukr4da7peyvlfny", - "lastIncludedDeploymentTimestamp": 1671621566675 - } - } -} + "replacedSnapshotHashes": [], + "numberOfEntities": 981, + "generationTimestamp": 1689219353866 + }, + // ...more snapshot files +] ``` -There we have the [file identifiers](({{< relref "filesystem#identifiers" >}})) for each [snapshot]({{< relref "snapshots" >}}) file. Say we want to look at some of the available emotes, and let's take the `entities.emote.hash` field to download the file from the `/content/contents` endpoint. +Each item in the array describes a [snapshot]({{< relref "snapshots" >}}). Let's grab a `hash` and download the file from the `/content/contents` endpoint. ```bash -curl "https://peer.decentraland.org/content/contents/bafkreialb5j2vhhes3dorynuyvkbvh3nxddjtadmupaukr4da7peyvlfny" > emotes.snapshot +curl "https://peer.decentraland.org/content/contents/bafybeia6qoum64psaooiqo3f45i6hykfwx723uc236waub3gng2naof224" > snapshot ``` {{< info >}} -You can experiment with the `profile` or global snapshots, but bear in mind that those are very large files. +You can experiment with larger snapshots, as in the [advanced python example](https://github.com/decentraland/documentation/blob/main/content/contributor/content/practice/snapshots.py). To try this out interactively, you probably want one of the smaller ones. {{< /info >}} We can check that we downloaded the right file in a format we know, by looking at the first line: ```bash -head -n1 emotes.snapshot +head -n1 snapshot ``` ``` ### Decentraland json snapshot ``` -Great! Now we have a local copy of the current set of `emote` entities. Let's take the first one listed (the second line in the file): +Great! Now we have a local summary of all entities that were captured in that snapshot. 
Let's take the first one listed (the second line in the file), a `profile`: ```bash -# You can omit jq and get the raw unformatted JSON: -head -n+2 emotes.snapshot | jq +tail -n+2 snapshot | head -n1 ``` -```json +```js { - "entityType": "emote", - "entityId": "bafkreigcreq7rv6b2wf4zc4fsnif43ziwb4q46v4qhsewpf7gbsyxew3om", + "entityId": "bafkreif7hjremkxlvixyxoxnoo7bdcnf7qqp245sjb2pag2nk3n6o6yc4c", + "entityType": "profile", "pointers": [ - "urn:decentraland:matic:collections-v2:0xc304f10579a499c967291c014365304207c59a62:0" + "0x271cdb3b1c792c336c4b2bdc52c4f415d0046b92" ], - "localTimestamp": 1667798266960, "authChain": [ - { - "type": "SIGNER", - "payload": "0xa8c7d5818a255a1856b31177e5c96e1d61c83991", - "signature": "" - }, - { - "type": "ECDSA_EPHEMERAL", - "payload": "Decentraland Login\r\nEphemeral address: 0x8cbbCC5B597981cf0c2B254089cf2d7c90943961\r\nExpiration: 2022-11-16T07:18:01.235Z", - "signature": "0x97d38594dad0388eb20cfc3ef2e2a692858f1a809594ba43b7bca9171aecff09218adeb949a72b60f8b2d4281f85f348dae5224a2f750b4b22fe7b6bf392f6941c" - }, - { - "type": "ECDSA_SIGNED_ENTITY", - "payload": "bafkreigcreq7rv6b2wf4zc4fsnif43ziwb4q46v4qhsewpf7gbsyxew3om", - "signature": "0xe0037fc5ddb00d0befae6b29214ed3a846fbd2982c70edf6d556ef813efe78cc7a64b3f0669a50d6aa1d12cbc3b5cc75adc833c6b3ee2c444d34de48227726321b" - } - ] + // See https://docs.decentraland.org/contributor/auth/authchain/ + ], + "entityTimestamp": 1689120135624 } ``` +{{< info >}} +Since snapshots expire and entities are replaced, the identifiers in this article won't work. Follow along in your command line to get real, active file IDs. +{{< /info >}} + This is information we could save. We'll use the `entityId` to download the entity's JSON manifest, but persisting the listed [pointer]({{< relref "pointers" >}}) is a good idea if we want to locate this entity and any updated versions in the future. We also have the [authentication chain]({{< relref "../entities#ownership" >}}) used to sign this entity, and we could validate the listed signatures to verify the authenticity of any related files we download. @@ -159,76 +108,33 @@ Let's get our hands on the entity manifest. 
Remember, the `entityId` is the [file identifier]({{< relref "filesystem#identifiers" >}}) for the entity's manifest:

```bash
-curl "https://peer.decentraland.org/content/contents/bafkreigcreq7rv6b2wf4zc4fsnif43ziwb4q46v4qhsewpf7gbsyxew3om"
+curl "https://peer.decentraland.org/content/contents/bafkreif7hjremkxlvixyxoxnoo7bdcnf7qqp245sjb2pag2nk3n6o6yc4c"
```

-```json
+```js
{
  "version": "v3",
-  "type": "emote",
-  "image": "image.png",
-  "thumbnail": "thumbnail.png",
-  "pointers": [ "urn:decentraland:matic:collections-v2:0xc304f10579a499c967291c014365304207c59a62:0" ],
-  "timestamp": 1667798235182,
+  "type": "profile",
+  "pointers": [
+    "0x271cdb3b1c792c336c4b2bdc52c4f415d0046b92"
+  ],
+  "timestamp": 1689120135624,
  "content": [
    {
-      "file": "thumbnail.png",
-      "hash": "bafkreiajc6gwp4ldcnah7jrv4ligmuokb2ssn2yq3rmlx3erzvak42vjoe"
-    },
-    {
-      "file": "male/PassedOut.glb",
-      "hash": "bafkreigdc2ytkyfqvdggodgkh4u4wae7x6b2q45mxqv3vfzh4xt6k772y4"
+      "file": "body.png",
+      "hash": "bafybeibzaqkirz7fk474xvyhurho5xviphs7anawaceb6gscuigia4x33u"
    },
-    {
-      "file": "female/PassedOut.glb",
-      "hash": "bafkreigdc2ytkyfqvdggodgkh4u4wae7x6b2q45mxqv3vfzh4xt6k772y4"
-    },
-    {
-      "file": "image.png",
-      "hash": "bafkreihxvd736yotitatwqyhl7fmrpzjprsah32kwf63djwdr5xepak54y"
-    }
+    // ...more files
  ],
  "metadata": {
-    "id": "urn:decentraland:matic:collections-v2:0xc304f10579a499c967291c014365304207c59a62:0",
-    "name": "Passed Out",
-    "description": "Sometimes you need to sleep it off",
-    "collectionAddress": "0xc304f10579a499c967291c014365304207c59a62",
-    "rarity": "legendary",
-    "i18n": [
-      { "code": "en", "text": "Passed Out" }
-    ],
-    "emoteDataADR74": {
-      "category": "poses",
-      "representations": [
-        {
-          "bodyShapes": [ "urn:decentraland:off-chain:base-avatars:BaseMale" ],
-          "mainFile": "male/PassedOut.glb",
-          "contents": [ "male/PassedOut.glb" ]
-        },
-        {
-          "bodyShapes": [ "urn:decentraland:off-chain:base-avatars:BaseFemale" ],
-          "mainFile": "female/PassedOut.glb",
-          "contents": [ "female/PassedOut.glb" ]
-        }
-      ],
-      "tags": [ "sleep", "mvmf22", "emote", "canessa", "passed", "out", "beer" ],
-      "loop": true
-    },
-    "metrics": {
-      "triangles": 0,
-      "materials": 0,
-      "textures": 0,
-      "meshes": 0,
-      "bodies": 0,
-      "entities": 1
-    }
+    // ...avatars and other information
  }
}
```

-This is all the information World Explorers use to animate avatars with this emote. If we're interested in getting one of the packaged files, we can continue to use the `/content/contents` endpoint.
+This `profile` entity has all the information World Explorers use to render and animate a player. If we're interested in getting one of the packaged files, we can continue to use the `/content/contents` endpoint.

-If we look at the `thumbnail` field, we can see that the internal file name is `thumbnail.png`. This is listed in `content` array of files, where we can match the `name` to a `hash` that identifies the file. Let's get it:
+If we look at the `content` field, we can see the `hash` of the file internally called `body.png`. Let's get it:

```bash
-curl "https://peer.decentraland.org/content/contents/bafkreiajc6gwp4ldcnah7jrv4ligmuokb2ssn2yq3rmlx3erzvak42vjoe" > thumbnail.png
+curl "https://peer.decentraland.org/content/contents/bafybeibzaqkirz7fk474xvyhurho5xviphs7anawaceb6gscuigia4x33u" > body.png
```

We can open this `png` file in an image viewer or web browser, and check out the author's work. Nice!
\ No newline at end of file diff --git a/content/contributor/content/practice/python.md b/content/contributor/content/practice/python.md index 868a3436..90042f08 100644 --- a/content/contributor/content/practice/python.md +++ b/content/contributor/content/practice/python.md @@ -4,101 +4,98 @@ bookhidden: true url: "/contributor/content/practice/python" --- -This practice shows how to write a simple program that downloads and analyzes some content. +This practice shows how to write a simple program that downloads and analyzes some content using the [snapshots]({{< relref "snapshots" >}}) provided by content servers. -We'll be using the Decentraland Foundation's instance at `peer.decentraland.org`, and Python 3 as our language of choice. You can find the full script [in this gist](https://gist.github.com/slezica/bbe58316c9cf09c22099eade87bcd49c). +{{< info >}} +You can find the [full script](https://github.com/decentraland/documentation/blob/main/content/contributor/content/practice/snapshots_mini.py) in GitHub, along with a [more advanced example](https://github.com/decentraland/documentation/blob/main/content/contributor/content/practice/snapshots.py). +{{< /info >}} + +We'll be using the Decentraland Foundation's server at `peer.decentraland.org`, and Python 3 as our language of choice. This is what we'll do: 1. Query the status of the content server. -2. Locate and download the [snapshot]({{< relref "snapshots" >}}) for a list of entities. -3. Print the ID of all entities that were deployed after a certain date. - -Let's begin our script with some preparations. We'll use standard library modules only, but in real practice you'll probably want a more comfortable HTTP client (like the [requests](https://github.com/psf/requests) library). - -```python3 -#!/usr/bin/env python3 +2. Select and download a [snapshot]({{< relref "snapshots" >}}) with a list of entities. +3. Print the type and ID of all referenced entities. -import sys -import json -import urllib.request +Let's begin our script with some preparations. We'll use standard library modules only, but in real code you'll probably want a more comfortable HTTP client (like the [requests](https://github.com/psf/requests) library). -def http_get(url): - headers = { - "User-Agent": "urllib" # Important! If empty, 403 Forbidden - } +```py +# Make an HTTP GET request, return a file-like HTTP response. +def fetch(path): + url = f"https://peer.decentraland.org/{path}" + headers = { "User-Agent": "urllib" } # important on some servers (if empty, 403 Forbidden) request = urllib.request.Request(url, headers=headers) response = urllib.request.urlopen(request) - content = response.read().decode('utf-8') - return content + return response ``` -Our simple helper makes an HTTP `GET` request, and decodes the response body. Nothing fancy. We'll be using the `json` module to parse some responses, such as the one from `/about` we're starting with: +Our simple helper makes an HTTP `GET` request, and returns the file-like response object. Nothing fancy. Let's use it to hit the `/about` endpoint and check the server's status: -```python3 +```py # Check the server status: -about = json.loads(http_get('https://peer.decentraland.org/about')) +about = json.load(fetch('about')) if not about['healthy']: - print("Server is not healthy!") + print("Server not healthy!") sys.exit(1) ``` -If we get past this point, the server is up and running (we got a `200` response) and reports being operational. 
We can request the current set of snapshots (which comes in JSON format): +If we get past this point, the server is up and running (we got a `200` response) and reports being operational. We can request the current set of snapshots (which comes in JSON array format): -```python3 +```py # Get the list of snapshots: -snapshots = json.loads(http_get('https://peer.decentraland.org/content/snapshot')) +all_snapshots = json.load(fetch('content/snapshots')) ``` -Let's obtain the identifier for the emote snapshot file, as we did in the manual practice above: +Snapshot files (especially for the longer time ranges) can be very large. For a quick experiment, let's grab the smallest in the list by `numberOfEntities`: -```python3 -# Obtain the file identifier tor the emote snapshot, and download it: -emote_snapshot_hash = snapshots['entities']['emote']['hash'] -emote_snapshot_url = f'https://peer.decentraland.org/content/contents/{emote_snapshot_hash}' -emote_snapshot = http_get(emote_snapshot_url) +```py +# Take the smallest snapshot, in terms of included entities: +snapshot = min(all_snapshots, key=lambda s: s['numberOfEntities']) ``` -It's important to note that snapshot files are potentially huge, so buffering the entire content might be a bad idea. We happen to know that the emote snapshot is tiny, so we're not going to worry about memory. +To download the content, we need the `hash` field of `snapshot`. We get the file URL by appending it to the content root: -Let's split the snapshot into lines, and check for the correct header: - -```python3 -emote_snapshot_lines = emote_snapshot.split('\n') - -# Check the header: -if emote_snapshot_lines[0] != '### Decentraland json snapshot': - print("Invalid snapshot header!") - sys.exit(1) +```py +# Request the file from the content API: +response = fetch('content/contents/' + snapshot['hash']) ``` -The rest of the lines in this list are JSON documents describing entities. We have decided that we only care about items deployed after an arbitrary date, so let's go through the list and print the relevant entity identifiers: +The file we selected is small enough to buffer in memory, but let's pretend we don't know that and stream it. The first line is the snapshot header, and every line after that contains a JSON object. -```python3 -emote_min_timestamp = 1667798160000 +Let's check the header, always a good idea: -for line in emote_snapshot_lines[1:]: - if len(line) == 0: - break # the snapshot can end with a newline +```py +# Verify the snapshot header: +header = response.readline().decode('utf-8').strip() - emote = json.loads(line) - - if emote['localTimestamp'] >= emote_min_timestamp: - print(emote['entityId']) +if header != '### Decentraland json snapshot': + print("Invalid snapshot header: " + header) + sys.exit(1) ``` -Note the `break` in the loop: snapshot files can (and often will) end with an empty line, which we must be prepared to handle. +Now we can process all entities in the snapshot, reading the response line by line. For our humble purposes, _process_ means printing the entity type and ID: + +```py +# Read and decode all items, one JSON per line: +for line in response: + item = json.loads(line) + print(item['entityType'], item['entityId']) +``` -Running this script at the moment of writing outputs a list of 68 results, beginning with... 
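+
+As noted in the [snapshots]({{< relref "snapshots" >}}) documentation, these files can end with an empty line, which `json.loads` would reject. If you want to be defensive about it, a small variation of the loop (a sketch, not part of the full script) skips blank lines before parsing:
+
+```py
+for line in response:
+    if not line.strip():
+        continue  # tolerate a trailing empty line
+    item = json.loads(line)
+    print(item['entityType'], item['entityId'])
+```
+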
+This loop will start streaming, parsing and printing lines like these until it's done with the snapshot: ``` -bafkreigcreq7rv6b2wf4zc4fsnif43ziwb4q46v4qhsewpf7gbsyxew3om -bafkreidk3hyw3sq7frwc6qtv3cp3xq3jx5ogcznla7ru4yznhtbayx5no4 -bafkreiacjqf7uzt7isdsbtqrwlvfjajrit4vye6kjoy647dggki6gfv7by -bafkreickzvceg2w7ac73ir3gaybt4okxzntqn4nd7rx3jhilwnlibqiz7e +profile bafkreic36qmzyprs6whkpuxbeiif4no6kvdrr2tfpichbx2fzfz5py6eyv +scene bafkreibr5xfujqrp5q3o4s73vm2yljlcp7cucqgugssnarsuclxv4emlmy +profile bafkreid7khr5wnkialba44rsslffi633rh3lvctad5oa5vjoe6wa7s4c5a +wearable bafkreihlqcb7jgubomyidikpwpqhgzbagltk5m4rgbjdvzydxmoka7bg4i ``` -Cheers! We've systematically explored the available emotes by leveraging the content API. \ No newline at end of file +Cheers! We've used the snapshot system to explore some of the available content in Decentraland. + +Remember you can find the [full script](https://github.com/decentraland/documentation/blob/main/content/contributor/content/practice/snapshots_mini.py) in GitHub, along with a [more advanced example](https://github.com/decentraland/documentation/blob/main/content/contributor/content/practice/snapshots.py). + diff --git a/content/contributor/content/practice/snapshots.py b/content/contributor/content/practice/snapshots.py new file mode 100644 index 00000000..c2b0ef65 --- /dev/null +++ b/content/contributor/content/practice/snapshots.py @@ -0,0 +1,86 @@ +#!/usr/bin/env python3 + +import sys +import json +import urllib.request +from datetime import datetime + +# Decentraland Content Indexing Example + +# See: +# - Simpler version: https://github.com/decentraland/documentation/blob/main/content/contributor/content/practice/snapshots_mini.py +# - Guide for it: https://docs.decentraland.org/contributor/content/practice/python +# - Snapshot documentation https://docs.decentraland.org/contributor/content/snapshots +# - Entity documentation: https://docs.decentraland.org/contributor/content/entities + + +def fetch(path): + url = f"https://peer.decentraland.org/{path}" # any compatible content server will do + headers = { "User-Agent": "urllib" } # important on some servers (if empty, 403 Forbidden) + + request = urllib.request.Request(url, headers=headers) + response = urllib.request.urlopen(request) + + return response + + +# Check the server status: +about = json.load(fetch('about')) + +if not about['healthy']: + print("server: not healthy!") + sys.exit(1) + +# Get the list of currently active snapshots: +print("Fetching current snapshots...") +all_snapshots = json.load(fetch('content/snapshots')) + +print(f"Found {len(all_snapshots)} snapshots\n") + +# Sort them, newest to oldest, so we can process entities in a convenient order: +all_snapshots.sort(key=lambda s: s['timeRange']['initTimestamp'], reverse=True) + +# NOTE: +# Normally, if we were keeping an up-to-date entity index, we'd want to skip files +# we already downloaded, including those replaced by later snapshots. + +# Index all entities! 
+seen_pointers = set() + +for snapshot in all_snapshots: + # Extract relevant properties: + hash = snapshot['hash'] + init_dt = datetime.fromtimestamp(snapshot['timeRange']['initTimestamp'] / 1000) + end_dt = datetime.fromtimestamp(snapshot['timeRange']['endTimestamp'] / 1000) + n_days = (end_dt - init_dt).days + n_entities = snapshot['numberOfEntities'] + + # Show some information about the snapshot: + print(f"Snapshot {hash}") + print(f" {n_days} days, {n_entities} entities ({init_dt} to {end_dt})") + + print(" requesting file...") + response = fetch(f"content/contents/{hash}") + + # Verify the snapshot header: + header = response.readline().decode('utf-8').strip() + + if header != '### Decentraland json snapshot': + print(" error: invalid snapshot header: " + header) + sys.exit(1) + + # Read all entities, one JSON per line: + print(f" processing entities...") + + for line in response: + item = json.loads(line) + + if any(pointer in seen_pointers for pointer in item['pointers']): + continue # skip if we already found a more entity for this pointer + + seen_pointers.update(item['pointers']) + + print(f" done ({len(seen_pointers)} accumulated entities)\n") + +# Done! +print(f"Finished with {len(seen_pointers)} total entities") diff --git a/content/contributor/content/practice/snapshots_mini.py b/content/contributor/content/practice/snapshots_mini.py new file mode 100644 index 00000000..6a198d9b --- /dev/null +++ b/content/contributor/content/practice/snapshots_mini.py @@ -0,0 +1,53 @@ +#!/usr/bin/env python3 + +import sys +import json +import urllib.request + +# Decentraland Content Indexing Example + +# See: +# - Guide to this script: https://docs.decentraland.org/contributor/content/practice/python +# - More complex version: https://github.com/decentraland/documentation/blob/main/content/contributor/content/practice/snapshots.py +# - Snapshot documentation https://docs.decentraland.org/contributor/content/snapshots +# - Entity documentation: https://docs.decentraland.org/contributor/content/entities + + +def fetch(path): + url = f"https://peer.decentraland.org/{path}" + headers = { "User-Agent": "urllib" } # important on some servers (if empty, 403 Forbidden) + + request = urllib.request.Request(url, headers=headers) + response = urllib.request.urlopen(request) + + return response + + +# Check the server status: +about = json.load(fetch('about')) + +if not about['healthy']: + print("Server not healthy!") + sys.exit(1) + +# Get the list of snapshots: +all_snapshots = json.load(fetch('content/snapshots')) + +# Take the smallest snapshot, in terms of included entities: +snapshot = min(all_snapshots, key=lambda s: s['numberOfEntities']) + +# Request the file from the content API: +response = fetch('content/contents/' + snapshot['hash']) + +# Verify the snapshot header: +header = response.readline().decode('utf-8').strip() + +if header != '### Decentraland json snapshot': + print("Invalid snapshot header: " + header) + sys.exit(1) + +# Read and decode all items, one JSON per line: +for line in response: + item = json.loads(line) + print(item['entityType'], item['entityId'], item['pointers']) + diff --git a/content/contributor/content/snapshots.md b/content/contributor/content/snapshots.md index 06a35ac6..e7029ef7 100644 --- a/content/contributor/content/snapshots.md +++ b/content/contributor/content/snapshots.md @@ -7,11 +7,48 @@ weight: 5 Content servers will periodically compile summaries of the active entities they are hosting, called _snapshots_. 
They are regular [files]({{< relref "filesystem" >}}) and can be downloaded using their identifier.

-Separate snapshots are created for each entity type, as well as a global one for all content. Clients that want to systematically discover content can use these files as the starting point.
+Snapshots are created on a daily, weekly, monthly and yearly basis. Each contains the set of active entities that changed since the prior snapshot for that range.

-You can play around with snapshots in the [practice]({{< relref "practice" >}}) section.
+Snapshots will contain conflicting versions of the same entities (i.e. different [manifest files]({{< relref "entities#properties" >}}) associated with the same pointer) as they are updated. When scanning them, clients should keep the version in the most recent snapshot. Since content servers are allowed to delete inactive files, stale entity versions are not guaranteed to be available for download.

-## Format
+When a new snapshot _replaces_ older ones (e.g. a weekly snapshot that combines a series of daily ones), its metadata indicates which prior files are replaced so clients don't need to download them.
+
+The full set of active entities can be discovered by combining all the available snapshots (more on this below), keeping the most recent entity referenced by each [pointer]({{< relref "pointers" >}}) discovered along the way.
+
+You can experiment with snapshots using working code in the [practice]({{< relref "practice" >}}) section.
+
+
+## Discovering Snapshots {#discover}
+
+To locate the current set of snapshots, use the [`snapshots` endpoint](https://decentraland.github.io/catalyst-api-specs/#tag/Content-Server/operation/getSnapshots). The response contains an array of items with these fields:
+
+| Field | Value |
+| ----- | --- |
+| `generationTimestamp` | The Unix UTC timestamp (in milliseconds) when this snapshot was created.
+| `hash` | The snapshot [file]({{< relref "filesystem" >}}).
+| `numberOfEntities` | The number of entries in the snapshot file.
+| `replacedSnapshotHashes` | An array with the `hash` of any snapshots replaced by this one.
+| `timeRange.initTimestamp` | The Unix UTC timestamp (in milliseconds) for the beginning of the snapshot range.
+| `timeRange.endTimestamp` | The Unix UTC timestamp (in milliseconds) for the end of the snapshot range.
+
+For example:
+
+```json
+{
+  "generationTimestamp": 1684979298844,
+  "hash": "bafybeiflmm46nr4vv2h3wuzbx3pukcz7ju4fhbfzt6yxmoo533uktlgru4",
+  "numberOfEntities": 12345,
+  "replacedSnapshotHashes": [ "bafybeicw6x75ieaxfwynekbyhpcsgctpjkt6cb4j6oa7s57qjj6e4b5phd" ],
+  "timeRange": {
+    "initTimestamp": 1684281600000,
+    "endTimestamp": 1684886400000
+  }
+}
+```
+
+## Downloading Snapshots {#download}
+
+Using the `hash` field of a snapshot, clients can download the associated file, which contains the entities created or updated in that time range.

Snapshot files begin with this exact line:
@@ -26,7 +63,7 @@ After that, each line is a JSON document describing an [entity]({{< relref "enti
| `entityId` | The immutable identifier for this [entity]({{< relref "entities" >}}).
| `entityType` | One of `scene`, `profile`, `wearable`, `emote`, `store` or `outfits`.
| `pointers` | An array of [pointers]({{< relref "pointers" >}}) that resolve (or used to resolve) to this entity.
-| `localTimestamp` | The Unix UTC timestamp when this entity was uploaded.
+| `entityTimestamp` | The Unix UTC timestamp (in milliseconds) when this entity was uploaded.
| `authChain` | The [authentication chain]({{< relref "entities#ownership" >}}) for this entity. A typical entry looks like this: @@ -36,7 +73,7 @@ A typical entry looks like this: "entityId": "bafkreigrvaqynmiglpvewwhn2yd63q5dvagrrt5jbhimzvbrn5kimj5zne", "entityType": "wearable", "pointers": ["urn:decentraland:matic:collections-v2:0x11a6879861f36cbad632a4e7226816a16139fb33:0"], - "localTimestamp": 1671117456129, + "entityTimestamp": 1671117456129, "authChain": [ // ... authentication chain payloads and signatures ] @@ -44,30 +81,55 @@ A typical entry looks like this: ``` {{< info >}} -If you intend to parse a snapshot line by line, remember to skip the first one with the header, and be ready to handle an empty line at the end of the file. +If you intend to parse a snapshot line by line, remember to skip (or better still, validate) the first one with the header, and be ready to handle an empty line at the end of the file. {{< /info >}} -## Dowloading Snapshots -To locate the current set of snapshots, use the [`/snapshot`](https://decentraland.github.io/catalyst-api-specs/#tag/Content-Server/operation/getActiveEntities) endpoint. The response contains a reference to a global snapshot file for all entities, as well as individual ones for every entity type. Each entry lists the Unix UTC timestamp at the time of creation. For example: +### Starting an Entity Index {#index-start} -```json -{ - "hash": "bafybeiasjraajptih2ffc64hwnie2a7fbysalij7modahswdd54zrnsr4u", - "lastIncludedDeploymentTimestamp": 1671294282247, - - "entities": { - "wearable": { - "hash": "bafybeihfpwdtow7qickcnryunu3smb4twrqlrmhbvzj5f25xvaxudiayyy", - "lastIncludedDeploymentTimestamp": 1671294282247 - }, - // ... same for emote, profile, etc - }, -} +Clients that want to index the entire set of active entities should process all currently available snapshots, and keep the most recent [entity]({{< relref "entities" >}}) for each [pointer]({{< relref "pointers" >}}). + +The simplest strategy is to process snapshots in reverse-chronological order (i.e. most recent first), ignoring pointers that have already been discovered, in order to keep the reference to the latest entity. + +In pseudo-code: + +```py +# Download the current set of snapshots, and sort them from newest to oldest: +snapshots = get_snapshots() +snapshots.sort('timeRange.initTimestamp', DESCENDING) + +seen_pointers = set() + +# Process snapshots, keeping the newest entity for each pointer: +for snapshot in snapshots: + items = get_snapshot_items(snapshot) + + for item in items: + if any(pointer in seen_pointers for pointer in item.pointers): + discard(item) + else: + keep(item) + seen_pointers.update(item.pointers) ``` +Since individual entities can be referenced by multiple pointers (as is commonly the case with [scenes]({{< relref "entity-types/scenes" >}})), all of them must be checked before choosing to keep or discard the item. + {{< info >}} -The global and profile snapshot files are enormous. You probably don't want to download and save them locally. +Snapshot files for the longer time ranges can be very large. For development and experimentation purposes that don't require indexing the entire entity set, using the smaller snapshots is recommended. The resulting set of entities will be incomplete but valid. 
{{< /info >}}
+### Updating an Entity Index {#index-update}
+
+Clients maintaining an up-to-date entity index can make periodic calls to the [`snapshots`](https://decentraland.github.io/catalyst-api-specs/#tag/Content-Server/operation/getSnapshots) endpoint, and determine whether to download each file by considering:
+
+* Was the snapshot identified by `hash` already downloaded?
+* Is `hash` in the `replacedSnapshotHashes` list of another snapshot that was already downloaded?
+* Is the `timeRange` relevant for current purposes?
+
+If any new snapshots must be processed, the same strategy as above can be used to update an existing dataset (see the sketch at the end of this page).
+
+
+## Examples
+
+In the [practice]({{< relref "practice" >}}) section, you'll find code examples that work with the snapshot system.
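+
+As a complement to those examples, here is a minimal sketch of the update check described in [Updating an Entity Index](#index-update) above. It assumes the snapshot fields shown earlier, plus two hypothetical bookkeeping sets a client would keep between runs (`indexed_hashes` and `replaced_by_indexed`); it is an illustration, not a definitive implementation:
+
+```py
+def snapshots_to_process(all_snapshots, indexed_hashes, replaced_by_indexed):
+    # all_snapshots: the array returned by the `snapshots` endpoint
+    # indexed_hashes: hashes of snapshot files already downloaded and processed
+    # replaced_by_indexed: union of the `replacedSnapshotHashes` of those files
+    return [
+        s for s in all_snapshots
+        if s['hash'] not in indexed_hashes        # not downloaded on a previous run
+        and s['hash'] not in replaced_by_indexed  # not superseded by one we processed
+        # (a real client might also check that s['timeRange'] is relevant)
+    ]
+```
+
+Any snapshots returned by a check like this can then be downloaded and merged into the index using the same pointer-based strategy shown above.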