
Centralizing Logs and Meeting Transcriptions for Easier Searchability #21

Open
moul opened this issue Jul 28, 2023 · 14 comments

@moul
Member

moul commented Jul 28, 2023

Our aim is to aggregate and archive our dialogues from multiple platforms, such as Signal, Discord, Google Meet, and Zoom, to make them more accessible and searchable.

For private chats, I recommend a two-phase approach to ensure sensitive information remains confidential before making them publicly available.


Please share your suggestions here.

@waymobetta

waymobetta commented Jul 28, 2023

I had a conversation with Valeh yesterday about the usage of Fireflies. I am awaiting access to our account for transcription services for Zoom/Google Meet.

When you say a two-phase approach, can you elaborate on your thoughts? I would think storing any sensitive information in GitHub would be a bad idea, even in a private repo. I suppose we could encrypt files with AES-256 or something similar and distribute the secret key internally among the team, but this still seems insecure should the key get leaked or lost. Perhaps I am misunderstanding, though.

Do we have a sort of template or experience with doing this so far that we can borrow from and expand on?

@MichaelFrazzy

@waymobetta if Google Meet or Zoom have any issues, let me know and we can look into a separate speech-to-text AI model like Whisper.

@zivkovicmilos
Member

> @waymobetta if Google Meet or Zoom have any issues, let me know and we can look into a separate speech-to-text AI model like Whisper.

@MichaelFrazzy I think we have tried two tools so far:

None of them caught on because, as I understood it, they weren't providing good transcripts or enough value.

@MichaelFrazzy

@zivkovicmilos makes sense; it likely wouldn't make much of a difference unless there are specific issues we are trying to solve. Whisper costs 36 cents per hour of audio and would (I think) let me set up a fairly closed-loop instance, but even that is a stretch if there are no open issues with the current system.

@waymobetta

waymobetta commented Aug 2, 2023

I was speaking with others (@Ticojohnny, @ValehTehranchi, @MichaelFrazzy) about our current process for this and discovered that @ccomben currently handles most of the summarization of these conversations and turns them into more coherent thoughts in "The More You Gno" newsletter.

We were thinking that Signal may not be the best platform for developer conversations, given that its inherent privacy features make what we are trying to do (collecting conversation history in an automated fashion) a bit tricky. I looked into a bot, but concluded that the easiest way to store the entire history is a manual process: copying and pasting the history myself, posting it into a private repo, holding a review period for redactions, and finally moving it to the public meetings repo. This is inefficient and error-prone, however.
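The redaction step in that manual process is the easiest part to partially automate. A minimal Python sketch, assuming redaction means masking obvious identifiers (emails and phone numbers) before the human review; the patterns and the `redact` helper are illustrative, deliberately aggressive (they may also catch dates or long numbers), and no substitute for the review period.

```python
import re

# Illustrative first-pass patterns only: names, keys, and context-specific
# secrets will slip through, so the human review period is still required.
# The phone pattern is aggressive and may also mask dates or long numbers.
REDACTION_PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def redact(text: str) -> tuple[str, int]:
    """Replace each match with a placeholder tag; return the text and match count."""
    total = 0
    for label, pattern in REDACTION_PATTERNS:
        text, n = pattern.subn(f"[REDACTED-{label}]", text)
        total += n
    return text, total


if __name__ == "__main__":
    sample = "Ping me at alice@example.com or +1 555 010 9999 before Friday."
    clean, count = redact(sample)
    print(clean)
    print(f"{count} redaction(s)")
```

Reporting the number of redactions gives reviewers a quick signal of which logs need the closest look.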

This issue may warrant a bit more discussion before we get into the actual technical work of implementing it, as the effort may not yield much return on the time invested. I am happy to discuss ways of working around this to make it a reality, though I would ask that we discuss our approach further so we don't waste too much time determining the best strategy.

@MichaelFrazzy

@waymobetta great to know, thank you. The Signal bot does seem tricky, I just checked and without an official Signal API we'd need to build it 100% from scratch and hope Signal doesn't block the request (or worst case my data scrapers).

Otherwise, we'd have to use a community REST API like this one: https://github.com/bbernhard/signal-cli-rest-api. It lists receiving messages as a feature, but I'm not sure it would let us easily move messages off Signal into a document.
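A minimal Python sketch of how received messages could be turned into a document, assuming the REST API exposes a `GET /v1/receive/<number>` endpoint that returns signal-cli's JSON envelopes (`envelope.sourceName`, `envelope.timestamp` in milliseconds, `envelope.dataMessage.message`). These field names, the `localhost:8080` address, and the helper names are assumptions to check against the project's README, not a verified integration.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumption: a signal-cli-rest-api container listening locally with a
# registered number; verify the receive endpoint in the project's README.
API_BASE = "http://localhost:8080"


def fetch_envelopes(number: str) -> list:
    """Pull pending messages from the REST API (hypothetical helper)."""
    with urllib.request.urlopen(f"{API_BASE}/v1/receive/{number}") as resp:
        return json.load(resp)


def to_markdown(envelopes: list) -> str:
    """Turn text messages into Markdown bullets, skipping everything else."""
    lines = []
    for item in envelopes:
        env = item.get("envelope", {})
        data = env.get("dataMessage") or {}
        text = data.get("message")
        if not text:
            continue  # receipts, typing indicators, attachment-only messages
        ts = datetime.fromtimestamp(env.get("timestamp", 0) / 1000, tz=timezone.utc)
        sender = env.get("sourceName") or env.get("source", "unknown")
        lines.append(f"- {ts:%Y-%m-%d %H:%M} UTC **{sender}**: {text}")
    return "\n".join(lines)
```

Skipping envelopes without `dataMessage.message` drops receipts and typing indicators, which would otherwise clutter the exported log.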

If we do end up with some type of automatic conversation transfer as a result of this discussion, I at least have the bones of a localized summarization AI model ready to go. It currently summarizes conversation memory to generate future responses and to expand the AI's context beyond token limits, but I could modify it to turn the summaries themselves into the output. That part should be quick, but overall it would be a pretty large task to build everything for Signal, so it's definitely worth considering other alternatives/platforms to cut down on manual interpretation/transcription.

@moul
Member Author

moul commented Aug 2, 2023

Maybe less hacky:

  1. install Signal Desktop
  2. script that reads the local message database (sqlite/sqlcipher iirc)

I have an old laptop that could do the job.
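A rough sketch of step 2 in Python, assuming the database has already been opened and unlocked with a SQLCipher-capable driver (not shown). The table and column names (`messages.conversationId`, `sent_at` in milliseconds, `type`, `body`, `conversations.name`) are assumptions about Signal Desktop's schema and should be checked against the actual file; `export_conversation` is a hypothetical helper name.

```python
import sqlite3
from datetime import datetime, timezone

# Assumed schema (verify against your own db.sqlite):
#   conversations(id, name, ...)
#   messages(conversationId, sent_at, type, body, ...)
QUERY = """
SELECT m.sent_at, m.type, m.body
FROM messages AS m
JOIN conversations AS c ON c.id = m.conversationId
WHERE c.name = ? AND m.body IS NOT NULL
ORDER BY m.sent_at
"""


def export_conversation(conn, group_name: str) -> str:
    """Render one conversation as Markdown, oldest message first."""
    rows = conn.execute(QUERY, (group_name,)).fetchall()
    out = [f"# {group_name}", ""]
    for sent_at, msg_type, body in rows:
        ts = datetime.fromtimestamp(sent_at / 1000, tz=timezone.utc)
        who = "me" if msg_type == "outgoing" else "them"  # assumed type values
        out.append(f"- {ts:%Y-%m-%d %H:%M} UTC ({who}): {body}")
    return "\n".join(out)


if __name__ == "__main__":
    # Demo on an unencrypted in-memory stand-in that mimics the assumed schema.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE conversations (id TEXT, name TEXT);
        CREATE TABLE messages (conversationId TEXT, sent_at INTEGER, type TEXT, body TEXT);
        INSERT INTO conversations VALUES ('c1', 'gno-dev');
        INSERT INTO messages VALUES ('c1', 1690502400000, 'outgoing', 'first');
        INSERT INTO messages VALUES ('c1', 1690502460000, 'incoming', 'second');
    """)
    print(export_conversation(conn, "gno-dev"))
```

Keeping the query separate from the decryption step means the same export can be tested against a plain SQLite copy before touching the real database.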

@MichaelFrazzy

@moul If it lets us just pull from the local message database that's even better! Wasn't sure how far they took the privacy claims haha

@waymobetta

Just found an article detailing how this can be done: https://vmois.dev/query-signal-desktop-messages-sqlite/. Apparently the decryption key is stored in plaintext in config.json, wow..
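Following that article's approach, a sketch of the unlock step in Python: read the hex `key` field from `config.json` and format it as a SQLCipher raw-key pragma (`x'...'` passes the key bytes directly instead of deriving them from a passphrase). The per-OS data directories are assumed defaults, the helper names are hypothetical, and newer Signal Desktop releases may no longer keep the key in plaintext, so treat this as a sketch for the version the article describes.

```python
import json
import platform
from pathlib import Path


def signal_dir() -> Path:
    """Assumed default Signal Desktop data directory per OS; adjust as needed."""
    home = Path.home()
    system = platform.system()
    if system == "Darwin":
        return home / "Library" / "Application Support" / "Signal"
    if system == "Windows":
        return home / "AppData" / "Roaming" / "Signal"
    return home / ".config" / "Signal"


def load_key(config_path: Path) -> str:
    """Read the plaintext hex database key from config.json's "key" field."""
    with open(config_path, encoding="utf-8") as f:
        key = json.load(f)["key"]
    int(key, 16)  # raises ValueError if the value is not a hex string
    return key


def key_pragma(hex_key: str) -> str:
    """Format the key with SQLCipher's raw-key syntax: PRAGMA key = "x'...'";"""
    return f"PRAGMA key = \"x'{hex_key}'\";"
```

The returned statement would be executed first on a connection to `sql/db.sqlite` inside that directory (path assumed), using a SQLCipher-enabled Python binding rather than the standard `sqlite3` module, which cannot read encrypted databases.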

@MichaelFrazzy

> Just found an article detailing how this can be done: https://vmois.dev/query-signal-desktop-messages-sqlite/. Apparently the decryption key is stored in plaintext in config.json, wow..

No way, that's hilarious! How very secure of them. 🤣 I mean it makes sense to store it locally... but plaintext within json definitely should make life easier haha

@thehowl
Member

thehowl commented Aug 2, 2023

@MichaelFrazzy just a note, in Signal's defense: storing the encryption key separately from the database makes sense, but if you store it as anything other than plaintext, either it's because you have a user passcode or password, or otherwise adding further encryption is probably just a gimmick (because the desktop client needs to be able to decrypt it just as easily).

It's a gimmick because the "encryption" you could do is probably at best AES encryption with a hardcoded key in the program's source; but if you're an attacker with half a brain, you know how to decompile the program and get the key in half an hour :)

Really, the "security" solution on Signal's side would just be to run in a sandbox (i.e. Windows Store, Snap; for Mac, possibly the Mac App Store version is sandboxed? It's been a long time since I used one). And even then, other applications are kept out of Signal's data only if they themselves run in a sandbox.

The fact that a userland program can access any other program's data is an unfortunate consequence of the programming and security models of all desktop OSes, which we are only recently addressing with sandboxes like Snap and the Windows Store. On mobile, luckily, the first iPhone tackled this from day one with the App Store.

@waymobetta

I wonder why they elected not to use Mac's Keychain for storing the key. I don't know much about desktop app development so perhaps this is not possible, though I thought this was one of the purposes of Keychain.

@moul moul changed the title Store logs of conversations on this repo Centralizing Logs and Meeting Transcriptions for Easier Searchability Oct 14, 2023
@moul
Member Author

moul commented Oct 14, 2023

Bump.


Background:
Need centralized log storage. Using GitHub can make Gno-related searches efficient and simplify potential migrations.

Challenges:

  1. Text Logs: Untapped logs from Signal.
  2. Voice Transcriptions: Need to transcribe our meetings.

Solutions:

  1. Tools:

    • Check out Fireflies for audio transcription.
    • Scripts to integrate Signal logs.
    • Use a buffer for initial drafts due to quality concerns and sensitive info. Finalize after review.
  2. Hiring:

    • Bring in a technical writer.
  3. Community Help:

    • Utilize platforms like game-of-realms for assistance.
  4. Switch Platforms:

    • Consider moving from Signal to Mattermost.
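The "buffer" idea in the tools list could be as simple as a private drafts folder plus an explicit approval marker. A minimal Python sketch, assuming a hypothetical convention where a reviewer changes the first line of each Markdown draft from `status: draft` to `status: approved`; only approved files get copied into the public repo's checkout. Folder names and helpers are illustrative.

```python
import shutil
from pathlib import Path

# Hypothetical convention: the reviewer flips the first line of a draft
# from "status: draft" to "status: approved" once redactions are done.
APPROVED = "status: approved"


def is_approved(text: str) -> bool:
    """A draft is publishable only if its first non-blank line marks approval."""
    if not text.strip():
        return False
    first_line = text.lstrip().splitlines()[0]
    return first_line.strip().lower() == APPROVED


def publish(drafts: Path, public: Path) -> list[str]:
    """Copy approved Markdown drafts from the private buffer to the public checkout."""
    public.mkdir(parents=True, exist_ok=True)
    published = []
    for path in sorted(drafts.glob("*.md")):
        if is_approved(path.read_text(encoding="utf-8")):
            shutil.copy2(path, public / path.name)
            published.append(path.name)
    return published
```

Requiring an explicit marker means a forgotten or half-reviewed transcript stays private by default, which is the safer failure mode for sensitive logs.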

@MichaelFrazzy

MichaelFrazzy commented Oct 14, 2023

Sounds great, this could likely be handled similarly to the other data collection bots.

For meeting notes, it makes a ton of sense to just have a single database file.

For the contribution logs, on the other hand, would you like us to store all contributions in a single .md doc, as opposed to one .md doc per repo, with a separate script then creating a centralized summary/profile database from them? Otherwise, I'm sure we could keep things to a single database/.md file; I'd just have to create different sections within that one log to separate raw data from summarized user profile data.
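To illustrate the single-file option with separate sections, a minimal Python sketch that takes per-repo contribution entries (a hypothetical `repo -> [(author, description)]` shape; no collection bot is shown) and renders one Markdown log with a raw-data section followed by a per-contributor summary.

```python
from collections import Counter


def build_log(contributions: dict[str, list[tuple[str, str]]]) -> str:
    """Render one Markdown log: raw entries per repo, then a contributor summary."""
    lines = ["# Contribution log", "", "## Raw data", ""]
    totals = Counter()
    for repo in sorted(contributions):
        lines.append(f"### {repo}")
        for author, desc in contributions[repo]:
            lines.append(f"- @{author}: {desc}")
            totals[author] += 1
        lines.append("")
    lines += ["## Summary by contributor", ""]
    # Most active contributors first; ties broken alphabetically.
    for author, count in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"- @{author}: {count} contribution(s)")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_log({
        "gno": [("moul", "fix vm"), ("thehowl", "docs")],
        "awesome-gno": [("moul", "add link")],
    }))
```

Generating the summary from the raw section in the same pass keeps the two from drifting apart; the per-repo variant would simply write each `### repo` block to its own file instead.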

Also, if we ever happen to have issues with Fireflies, I've had a lot of luck with Whisper. We should be able to combine either one with the AI being worked on, too, if there is ever a reason to use it outside of meetings.

Labels
None yet
Projects
Status: Backlog
Development

No branches or pull requests

5 participants