feat: list participants first seen from/to #232
Signed-off-by: Miroslav Bajtoš <[email protected]>
I ran the query for 2024Q3 manually; it took 647ms to complete. List of first-seen participant addresses (19,395 items): https://gist.github.com/bajtos/0fb8985e3f5928c5c31ff7a28e1260ab
Query:
WITH participant_first_seen AS (
  SELECT participant_id, MIN(day) AS first_seen
  FROM daily_participants
  GROUP BY participant_id
)
SELECT participant_address, first_seen::TEXT
FROM participant_first_seen
LEFT JOIN participants ON participant_id = id
WHERE first_seen >= '2024-07-01' AND first_seen <= '2024-09-30'
ORDER BY first_seen ASC, participant_address ASC;
I don't understand why this is "first seen". We want to get a list of all participant addresses seen in a time range. It doesn't matter whether they have been seen before that time range as well. Therefore I think first seen doesn't make sense, correct me if I'm wrong. What we actually want is a deduplicated list of participants seen in that time span, right?
Here is my understanding:
The query I implemented finds all participant addresses we have seen during 2024Q3 for the first time. I.e. addresses that are most likely new wallets created by Station Desktop or manually by Station Core operators.
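To make the difference between the two readings concrete, here is a minimal sketch using an in-memory SQLite database. The table and column names follow the query in this PR; the sample rows and the addresses `0xold` and `0xnew` are made up for illustration, and SQLite stands in for the real PostgreSQL database (so the `::TEXT` cast is omitted).

```python
import sqlite3

# Hypothetical sample data: SQLite stands in for the real PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE participants (id INTEGER PRIMARY KEY, participant_address TEXT);
    CREATE TABLE daily_participants (day TEXT, participant_id INTEGER);
    INSERT INTO participants VALUES (1, '0xold'), (2, '0xnew');
    INSERT INTO daily_participants VALUES
        ('2024-01-15', 1),  -- 0xold first appeared in January...
        ('2024-07-10', 1),  -- ...and came back in July
        ('2024-08-01', 2);  -- 0xnew first appeared in Q3
""")

# Reading 1: addresses seen for the FIRST time within the range.
FIRST_SEEN = """
    WITH participant_first_seen AS (
        SELECT participant_id, MIN(day) AS first_seen
        FROM daily_participants
        GROUP BY participant_id
    )
    SELECT participant_address
    FROM participant_first_seen
    LEFT JOIN participants ON participant_id = id
    WHERE first_seen >= ? AND first_seen <= ?
    ORDER BY first_seen ASC, participant_address ASC
"""

# Reading 2: deduplicated addresses seen at any point within the range.
SEEN_IN_RANGE = """
    SELECT DISTINCT participant_address
    FROM daily_participants
    LEFT JOIN participants ON participant_id = id
    WHERE day >= ? AND day <= ?
    ORDER BY participant_address ASC
"""

def addresses(query, date_from, date_to):
    """Run one of the queries above and return the list of addresses."""
    return [row[0] for row in conn.execute(query, (date_from, date_to))]

first_seen = addresses(FIRST_SEEN, "2024-07-01", "2024-09-30")
seen = addresses(SEEN_IN_RANGE, "2024-07-01", "2024-09-30")
print(first_seen)  # ['0xnew']
print(seen)        # ['0xnew', '0xold']
```

The returning participant `0xold` appears only under the second reading, which is exactly the case debated in this thread.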
As I understand our goal, this does matter. If a person installed Station Desktop in January 2024, did not run it afterwards, and then started running their Station again in July, their address is not a candidate for a new wallet created in 2024Q3. Of course, one can argue that the list produced by my query is not final, and we still need to check each address to confirm that it has no on-chain interactions before 2024Q3. In that light, your proposal works too. I see two benefits of my solution:
One more thought: since our current plan is to run this query only once per quarter, I am unsure whether it makes sense to implement it in our REST API. The execution cost is also a concern: it is already over 600ms and will grow linearly with the number of new participants. @juliangruber please let me know what you prefer:
Ok, I understand now! I assumed we wanted to know about participants onboarded through Spark for all of its duration, and not only since Q3 2024. Where does this date range come from?
Ah, great call. We want to show our impact in our application for FIL-RetroPGF Round 2. I thought that means impact made in Q3. I double-checked the requirements, and we need to show impact in Q2 & Q3. Quoting from filecoin-project/community#714:
I re-ran the query for April to September 2024 and found 37,231 addresses. See https://gist.github.com/bajtos/3b68c2f9a654bcc755fc5f428dfd37ba
Thanks for finding this out! I agree that since we want to know the impact since Q2, it doesn't make sense to include addresses seen before Q2 👍
What do you think about repackaging this PR as a CLI? Since it's an expensive query and we don't know whether anyone besides us needs it, the risk of exposing it as a REST endpoint is higher than the reward.
Great idea! Since the next step in space-meridian/roadmap#170 is to check whether a participant has transactions before Spark/Station, would you mind taking my query and placing it into the tool you will build for this next step? Things to consider if you want to run this query from your machine:
Considering the complexity, it may be better to run this query via the REST API? We can limit access to this REST API by requiring an authorization header.
Isn't this the same complexity we have for all other CLIs though?
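As a rough illustration of the CLI idea discussed above, the argument handling could look like the sketch below. This is not the tool that was eventually built: the `--from`/`--to` flag names and the `parse_args` and `build_query` helpers are hypothetical, and `$1`/`$2` are PostgreSQL positional parameters that would need adapting to whatever database driver is used. Connecting to the database is deliberately left out.

```python
import argparse
from datetime import date

# The query from this PR, with the date range turned into parameters.
QUERY = """
WITH participant_first_seen AS (
  SELECT participant_id, MIN(day) AS first_seen
  FROM daily_participants
  GROUP BY participant_id
)
SELECT participant_address, first_seen::TEXT
FROM participant_first_seen
LEFT JOIN participants ON participant_id = id
WHERE first_seen >= $1 AND first_seen <= $2
ORDER BY first_seen ASC, participant_address ASC;
"""

def parse_args(argv=None):
    """Parse and validate the (hypothetical) --from/--to flags."""
    parser = argparse.ArgumentParser(
        description="List participants first seen within a date range"
    )
    # date.fromisoformat rejects malformed dates such as '2024-09-3'.
    parser.add_argument("--from", dest="date_from",
                        type=date.fromisoformat, required=True)
    parser.add_argument("--to", dest="date_to",
                        type=date.fromisoformat, required=True)
    args = parser.parse_args(argv)
    if args.date_from > args.date_to:
        parser.error("--from must not be later than --to")
    return args

def build_query(args):
    """Return the SQL text and its positional parameters."""
    return QUERY, (args.date_from.isoformat(), args.date_to.isoformat())

# Example invocation with an explicit argument list (Q2 and Q3 of 2024).
args = parse_args(["--from", "2024-04-01", "--to", "2024-09-30"])
sql, params = build_query(args)
print(params)  # ('2024-04-01', '2024-09-30')
```

In a real command-line tool, `parse_args()` would be called without arguments so that it reads `sys.argv`. Passing the dates as parameters rather than interpolating them into the SQL text keeps the query safe to run with user-supplied input.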