Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

api: update shlink_link to add OS and Hardware information to click… #2648

Merged
merged 2 commits into from
Sep 17, 2024

Conversation

theofficialgman
Copy link
Collaborator

… user agent

Also unify all click commands to use shlink_link. Extended shlink_link to parse an optional 3rd input argument when clicking is desirable regardless of whether analytics are enabled (or the user preference is set)

@Botspot requesting your thoughts.

@Botspot
Copy link
Owner

Botspot commented Sep 16, 2024

Please explain the purpose for the override. 100% of fresh installs will already be clicking an install analytics link, unless the user goes out of their way to clone the repo first, change settings, and then run the install script, at which point I think we should respect their wishes.

Having forced analytics that ignore user preference seems like a scandal waiting to happen.

@theofficialgman
Copy link
Collaborator Author

theofficialgman commented Sep 16, 2024

Please explain the purpose for the override. 100% of fresh installs will already be clicking an install analytics link

It's required for fresh installs as they haven't run the runonce to create the analytics preferences file. Without it, no link would be clicked. View the diff and that would be clear.

There are some alternatives to doing what I did here but as of right now this is no functional change to how the current install script works.

@Botspot
Copy link
Owner

Botspot commented Sep 16, 2024

Good point. OK then why did it take this long to notice install numbers were impossibly low?

Can this line just be changed from:

if [ "$(cat "${DIRECTORY}/data/settings/Enable analytics" 2>/dev/null)" == 'Yes' ]

to

if [ "$(cat "${DIRECTORY}/data/settings/Enable analytics" 2>/dev/null)" != 'No' ]

@theofficialgman
Copy link
Collaborator Author

Good point. OK then why did it take this long to notice install numbers were impossibly low?

Read the diff again. The install numbers are correct. We have it (right now) so that the "Enable analytics" file is not checked for the install script specifically. Since I wanted to de-duplicate the code I added the functionality to keep that.

Can this line just be changed from:

if [ "$(cat "${DIRECTORY}/data/settings/Enable analytics" 2>/dev/null)" == 'Yes' ]

to

if [ "$(cat "${DIRECTORY}/data/settings/Enable analytics" 2>/dev/null)" != 'No' ]

Yes that should work and that is one of the options I was alluding to.

@theofficialgman theofficialgman force-pushed the click-track-hardware-software branch 3 times, most recently from ad560c1 to a7b0ee1 Compare September 16, 2024 22:34
@theofficialgman
Copy link
Collaborator Author

Ok that should be fine now once the CI passes (I forgot to edit the commit message correctly a couple of times).

@theofficialgman
Copy link
Collaborator Author

theofficialgman commented Sep 16, 2024

Do you have any other suggestions for additional (user anonymous) information to include that might be interesting or helpful? We can always extend it in the future.

This is the kind of output we can expect as of right now
shlinkio/shlink-web-component#434 (comment)

Depending on how shlink's web interface evolves it might be better to further parse that info client side (eg: split up OS name and version).

I basically want this information from our users so we can best cater to them. For example, if we find that if a high percentage of our users use Armbian or Ubuntu (eg: 10%+) we might decide to focus more effort there (eg: add CI for those systems).

@theofficialgman theofficialgman marked this pull request as draft September 16, 2024 22:41
@Botspot
Copy link
Owner

Botspot commented Sep 17, 2024

Do you have any other suggestions for additional (user anonymous) information to include that might be interesting or helpful? We can always extend it in the future.

I thought of one addition to potentially make: estimating interaction with pi-apps. A count of the number of files in data/status should do, although somehow status files created by refresh_pkgapp_status would need to be accounted for. So maybe counting the number of logfiles would be more precise, but as of now those get deleted after 2 weeks.

This could answer if users tend to use pi-apps more on certain operating systems or hardware.

… user agent

Also unify all click commands to use `shlink_link`. Modify `shlink_link` to check for the presence of `No` in the `Enable analytics` preference file to disable clicks instead of the absence of `Yes`

Also make `shlink_link` return immediately and fork the process
@theofficialgman
Copy link
Collaborator Author

This could answer if users tend to use pi-apps more on certain operating systems or hardware.

Or we could just send the devices machine-id (or a non-reversable but deterministic and unique hash of it, like an sha1sum/sha256sum). Then you could simply count the clicks that include that same string.

@Botspot
Copy link
Owner

Botspot commented Sep 17, 2024

This could answer if users tend to use pi-apps more on certain operating systems or hardware.

Or we could just send the devices machine-id (or a non-reversable but deterministic and unique hash of it, like an sha1sum/sha256sum). Then you could simply count the clicks that include that same string.

IIRC, that would fail to distinguish cloned sd cards, which happens a lot in classrooms where one Pi is configured and then its ISO is flashed to all other Pies.

@theofficialgman
Copy link
Collaborator Author

theofficialgman commented Sep 17, 2024

IIRC, that would fail to distinguish cloned sd cards, which happens a lot in classrooms where one Pi is configured and then its ISO is flashed to all other Pies.

The image is not supposed to have a machine-id present. That is on the creator of the image to do. A dbus service generates the machine-id on boot whenever one is missing.

We could generate our own unique ID (as an addition) on install but that has the downside of getting re-generated if the user ever uninstalls and re-installs pi-apps (while using the machine-id doesn't).

[ ! -f "${DIRECTORY}/pi-apps/data/pi-apps-id" ] && dbus-uuidgen > "${DIRECTORY}/pi-apps/data/pi-apps-id"

@theofficialgman
Copy link
Collaborator Author

IIRC, that would fail to distinguish cloned sd cards, which happens a lot in classrooms where one Pi is configured and then its ISO is flashed to all other Pies.

Also if the image creator fails to do what I wrote above, I'm not opposed to that happening. That itself is valuable insight.

@Botspot
Copy link
Owner

Botspot commented Sep 17, 2024

I think most Pi classrooms have a teacher who sets up a master pi, then clones the sd card to all the other desks. They will all have the same machine-id.
In fact, if Pi-Apps is installed, any ID stored on disk by pi-apps would also be the same.
Then if the teacher asked everybody to use pi-apps to install, say, Scratch, we would get 20 identical requests from the same IP address.

So why not instead just develop some metric for quantity of interactions with pi-apps and include that in the user-agent? I see little value in attempting to single out individual users.

@theofficialgman
Copy link
Collaborator Author

theofficialgman commented Sep 17, 2024

I think most Pi classrooms have a teacher who sets up a master pi, then clones the sd card to all the other desks. They will all have the same machine-id.

I just explained this. What didn't you get? The image creator (the teacher in your example) is supposed to delete the /etc/machine-id and /var/lib/dbus/machine-id files when creating the master image. Once someone flashes that image to an SD card and it boots the system will generate its own own unique machine-id.

Even if they don't do that, I'm ok with that because it tells us that either
A: there is a single user being extremely active installing the same thing
or B: there are multiple users using the same machine-id

In fact, if Pi-Apps is installed, any ID stored on disk by pi-apps would also be the same.

True. We could use another commonly available ID (in addition to machine-id) that I mentioned before in discord. A hash of the device serial number if available . It is unique for each device (determined by the manufacturer, generally stored in a CPU register) and can never be changed by the user (generally... unless the user has customized the bootstack specifically to edit it).

More detail about it:

Open firmware [Device Tree] is already using the serial-number property for passing the
device's serial number from the bootloader to the kernel.

The serial number is a string that somewhat represents the device's serial
number. It might come from some form of storage (e.g. an eeprom) and be
programmed at factory-time by the manufacturer or come from identification
bits available in e.g. the SoC (note that the soc_id property in the SoC bus
should hold a full account of those bits).

The serial number is taken as-is from the bootloader, so it is up to the
bootloader to define where the serial number comes from and what length it
should be.

I see little value in attempting to single out individual users.

Actually this is where most of the value comes from. Singling out individual users allows us to determine true percentages for hardware and operating system diversity (eg: there are 5,000 id/serials logged in a day and 20% of them are running PiOS Bookworm 64bit on a Pi5, 20% PiOS Bookworm on Pi4, 10% PiOS Bookworm 32bit on Pi5, etc).

There are a LOT of questions that could be answered if we had a way to identify hardware in clicks.

How many individual users are there?
What software do users install together?
What groups open the store the most (eg: pi users but on what OSs)?
What groups install the most software?
How long is the average pi-apps users active for?
Are users hopping between distros?
Are users manually upgrading their distro between major versions (machine-id doesn't change when doing a move from bullseye->bookworm, serial never changes. if the user flashed bookworm the machine-id would change but serial wouldn't)?

Those are just some metrics that could be determined if that information (machine-id/serial) was logged all without compromising any individual user anonymity.

@Botspot
Copy link
Owner

Botspot commented Sep 17, 2024

Your reasons do make good sense. If this is something you value enough to get done, then by all means feel free to do it.

…shed) serial-number

the machine-id and serial-number are on-way hashed and cannot be reversed
@theofficialgman theofficialgman marked this pull request as ready for review September 17, 2024 20:14
@theofficialgman theofficialgman merged commit 4bb0d5e into master Sep 17, 2024
3 checks passed
@theofficialgman theofficialgman deleted the click-track-hardware-software branch September 17, 2024 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants