This repository has been archived by the owner on Apr 8, 2024. It is now read-only.

Change S4 sign-up process to use LA’s magic wormhole server (not fURL) #497

Open
Liz315 opened this issue Apr 10, 2017 · 13 comments

Liz315 (Member) commented Apr 10, 2017

Change S4 sign-up process to use LA’s magic wormhole server (not fURL). See: https://github.com/gridsync/gridsync/blob/master/docs/invite.rst

The current signup process involves tahoe configuration parameters emailed to the user. The shortcomings of this approach are:

  • email is essentially cleartext end-to-end making it vulnerable to snooping in transit
  • email is essentially cleartext at the recipient storage system making it vulnerable to snooping after delivery (by the operators of the storage system as well as anyone who can successfully attack or compel them)
  • the configuration parameters are long-lived
    • if they are disclosed to an attacker at any point in time (eg, many years after the email was sent) then the attacker can use them to gain access
  • the configuration parameters are many-use
    • if they are disclosed to an attacker then the attacker can use them to gain access on top of access held by the legitimate user (and on top of any other attackers)

It should be noted, though, that the cleartext tahoe configuration transferred this way only grants access to use the storage server. It does not convey the ability to read or write any data the legitimate user has uploaded to that storage server (doing so requires the caps for that data - those caps are not part of this exchange).

A magic-wormhole-enabled signup, in which the tahoe configuration is conveyed to the user through a wormhole, addresses some of these shortcomings.

  • The wormhole code is short-lived. We can set the duration of validity to whatever we like. After that period, it becomes worthless. If it is disclosed after this period, there are no consequences.
  • The wormhole code is single-use.
    • After the legitimate user exercises it, it becomes worthless. If it is disclosed after this, there are no consequences.
    • If it is disclosed before the legitimate user exercises it, the legitimate user will be unable to exercise it which is a positive signal that an attack has been conducted. The legitimate user can be issued a new wormhole code and all resources associated with the previous code can be reclaimed. Since the legitimate user never exercised the code, no legitimate data is associated with those resources.
  • We can choose to convey the wormhole code to the user more securely than via cleartext email. To be fair, we could choose to convey the tahoe configuration more securely as well (for example, putting it into the HTTPS response to the signup request) but the small size and easy pronounceability of the wormhole code lends itself to these alternate transfer mechanisms better than does the complete tahoe configuration.

However, the use of wormhole codes involves operation of a wormhole rendezvous server. There are some operational concerns with doing so. These are discussed in the wormhole documentation itself.
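The two security properties claimed above (short-lived, single-use) can be sketched in a few lines. This is a minimal illustration, not any real magic-wormhole API; the CodeRecord class, VALIDITY_SECONDS constant, and example code strings are all assumptions:

```python
# Sketch of the single-use, short-lived semantics described above.
# All names here are hypothetical; no real magic-wormhole API is implied.

VALIDITY_SECONDS = 600  # the issuer can set the validity window to anything


class CodeRecord:
    def __init__(self, code, issued_at):
        self.code = code
        self.issued_at = issued_at
        self.used = False

    def redeem(self, attempt, now):
        """Succeed at most once, only with the right code, only in the window."""
        if self.used:
            return False  # single-use: worthless after the first exercise
        if now - self.issued_at > VALIDITY_SECONDS:
            return False  # short-lived: worthless after expiry
        if attempt != self.code:
            return False
        self.used = True
        return True


record = CodeRecord("7-crossover-clockwork", issued_at=0)
assert record.redeem("7-crossover-clockwork", now=10)      # legitimate use works
assert not record.redeem("7-crossover-clockwork", now=20)  # replay is worthless
```

Note that a failed redemption of an unused code would, per the discussion above, also be the signal to trash the record and issue a replacement.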

exarkun (Contributor) commented Apr 19, 2017

From discussion with @meejah and @crwood, here's a rough outline of the steps:

  • Signup does Stripe stuff successfully.
  • Signup server talks to Wormhole server to create a wormhole code and send off the GridSync signup blob.
  • Signup server renders wormhole code into the web page.
  • Signup server sets a secure cookie identifying the subscription identifier.
  • User launches GridSync and pastes wormhole code in.
  • GridSync picks up config blob and configures a tahoe client with it.
  • Wormhole server notifies Signup server the blob was picked up and Signup server records this.
  • Happiness.
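The happy path above can be sketched with stand-in objects. Everything here is hypothetical (FakeStripe, FakeWormholeServer, and the signup function are illustration only, not the real Stripe or wormhole-server APIs):

```python
# Happy-path sketch of the signup steps above, with fakes standing in for
# the external services. All names are assumptions for illustration.

class FakeStripe:
    def charge(self, payment_details):
        return "sub-123"  # subscription identifier from the billing provider


class FakeWormholeServer:
    def __init__(self):
        self.payloads = {}

    def create(self, payload):
        code = "1-foo-bar"             # allocated wormhole code
        self.payloads[code] = payload  # GridSync signup blob queued for pickup
        return code


def signup(stripe, wormhole_server, records, payment_details, blob):
    sub_id = stripe.charge(payment_details)  # Stripe stuff succeeds
    code = wormhole_server.create(blob)      # code created, blob sent off
    records[sub_id] = code                   # signup server records the pairing
    return {
        "page": "Your wormhole code is %s" % (code,),  # rendered into the page
        "cookie": sub_id,                    # secure cookie: subscription id
    }


records = {}
response = signup(FakeStripe(), FakeWormholeServer(), records, {"card": "..."}, b"config")
assert response["cookie"] == "sub-123"
```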

Failures can happen, though.

  • Signup server crashes after talking to Stripe but before creating wormhole.
  • Signup server crashes after creating wormhole but before sending code to the user.
  • Signup server crashes after sending code to the user but before recording that the config blob is picked up.
  • Wormhole server crashes after the wormhole is created but before the config blob is picked up.
  • Wormhole server crashes after the config blob is picked up but before the Signup server records this.
  • User web browser crashes after wormhole is created but before client receives it or the secure cookie.
  • User web browser crashes after receiving the wormhole code but before the user copies it into GridSync.
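One way to think about the crash cases above is that, on restart, the server should inspect whatever it last durably recorded and resume from there. A minimal sketch, with record field names that are assumptions chosen to mirror the failure list:

```python
# Crash-recovery sketch for the failure list above: map the last durably
# recorded state to the next action. Field names are hypothetical.

def next_action(record):
    if "stripe_charge" not in record:
        return "restart-signup"      # crashed before (or during) Stripe
    if "wormhole_code" not in record:
        return "create-wormhole"     # crashed after Stripe, before the wormhole
    if not record.get("code_delivered_to_user"):
        return "resend-code"         # crashed before the user saw the code
    if not record.get("blob_picked_up"):
        return "await-pickup"        # wormhole exists; still waiting on GridSync
    return "done"


assert next_action({}) == "restart-signup"
assert next_action({"stripe_charge": "ch_1"}) == "create-wormhole"
```

The browser-crash cases are covered by "resend-code": as long as the code was not yet picked up, re-rendering (or regenerating) it is safe.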

exarkun (Contributor) commented Apr 19, 2017

Spiffy extras:

  • Update the post-Stripe page including the wormhole code once the config blob is picked up.
  • Put some prose on that page that tells the user how long the code is going to be valid for so they can plan accordingly.
  • Put a "generate new code" button onto that page after the wormhole code expires without being picked up. This wipes the previous instance and starts again from scratch.

exarkun (Contributor) commented Apr 27, 2017

  • A button to copy the wormhole code to the clipboard.

exarkun (Contributor) commented May 3, 2017

More failure modes. In addition to the signup or wormhole servers crashing, they might also lose their connection to each other, which amounts to the same thing. The connection must remain open until the exchange is complete, or the configuration will most likely fail to reach GridSync.

exarkun (Contributor) commented May 3, 2017

Revised signup flow:

  • New user submits signup form
  • Stripe processes payment details
  • Signup handler submits new subscription details to subscription-manager. The subscription is created in the pending state.
  • Signup handler creates a new magic-wormhole
  • Signup handler submits wormhole code to subscription-manager
  • Signup handler responds with an account-equivalent cookie and the wormhole code

At this point a few things can happen.

  • Time might pass until the wormhole code has expired.
  • The user might reload the page displaying the wormhole code.
  • The user might connect to the wormhole and pick up the configuration data.
  • An attacker might connect to the wormhole, misguess the secret, and trash it.
  • An attacker might connect to the wormhole, correctly guess the secret, and pick up the configuration data.

If time passes and the wormhole code expires, the only possible interaction which leads to any state change is the user reloading the page displaying the wormhole code.

If the user reloads the page displaying the wormhole code,

  • the server uses the account-equivalent cookie to look up the old wormhole code. If the old wormhole was marked as having delivered its payload, some kind of error is returned at this point. Otherwise (timeout or pending unused wormhole), proceed.
  • the server trashes the old wormhole associated with that code (if it still exists)
  • the server makes a new wormhole
  • the server submits the new wormhole code to the subscription-manager
  • the server responds with the new wormhole code

At this point, the user is back to the same state they were in after they initially completed signup.
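The reload rules above can be sketched as a single handler. The SubscriptionRecord shape, the fake wormhole server, and the exception name are all assumptions for illustration:

```python
# Sketch of the page-reload flow described above. All names and record
# shapes here are hypothetical.

class AlreadyDelivered(Exception):
    """The old wormhole already delivered its payload; no new code allowed."""


def reload_code(subscriptions, wormhole_server, cookie):
    sub = subscriptions[cookie]             # account-equivalent cookie lookup
    if sub["delivered"]:
        raise AlreadyDelivered()            # payload picked up: error out
    if sub["code"] is not None:
        wormhole_server.trash(sub["code"])  # reclaim the old wormhole if it exists
    new_code = wormhole_server.create(sub["blob"])
    sub["code"] = new_code                  # submit new code to subscription-manager
    return new_code


class FakeWormholeServer:
    def __init__(self):
        self.live = {}
        self.counter = 0

    def create(self, blob):
        self.counter += 1
        code = "%d-foo-bar" % self.counter
        self.live[code] = blob
        return code

    def trash(self, code):
        self.live.pop(code, None)


subs = {"cookie-1": {"code": None, "blob": b"config", "delivered": False}}
server = FakeWormholeServer()
first = reload_code(subs, server, "cookie-1")
second = reload_code(subs, server, "cookie-1")  # old one trashed, new one issued
assert first != second and first not in server.live
```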

If the user connects to the wormhole and picks up the configuration data:

  • the server receives notification from the wormhole server that the wormhole is done (the data was sent)
  • the server updates subscription-manager to mark the wormhole as having delivered its payload. the subscription is also marked as active.
  • the subscription-converger notices the new active subscription and provisions resources for it.

The subscription is now considered active and further signup-related interactions are not expected to take place (and none are allowed which make further state changes).

If an attacker connects to the wormhole, mis-guesses the secret, and trashes it:

  • the server notices the wormhole is trashed
  • the server updates subscription-manager to mark the wormhole as trashed

At this point, the user can re-load their page to get a new wormhole code and try again.

If an attacker connects to the wormhole, correctly guesses the secret, and picks up the configuration data:

  • the server notices the wormhole has delivered its data
  • the server updates subscription-manager to mark the wormhole as having delivered its payload. the subscription is also marked as active.
  • the subscription-converger notices the new active subscription and provisions resources for it.
  • the user eventually complains that signup is broken
  • an admin updates subscription-manager to mark the subscription as stolen.
  • subscription-converger notices the subscription is no longer active and de-provisions resources associated with it.
  • an admin tells subscription manager to derive a new subscription from the stolen one (billing, contact, account-equivalent cookie, etc. details are carried over; tahoe-lafs secrets are re-generated). the new subscription is put into the pending state.

At this point, the user is back to the same state they would be in if they lost their wormhole code or let it expire. They can reload the page to get a new wormhole code and try again.
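The lifecycle in this comment amounts to a small state machine. A sketch, with state and event names that are assumptions chosen to mirror the prose:

```python
# Sketch of the subscription lifecycle described above: pending -> active on
# payload delivery, active -> stolen by admin action, and a fresh pending
# subscription derived from a stolen one. Names are hypothetical.

TRANSITIONS = {
    ("pending", "payload-delivered"): "active",  # wormhole delivered the blob
    ("active", "marked-stolen"): "stolen",       # admin response to complaint
}


def transition(state, event):
    new = TRANSITIONS.get((state, event))
    if new is None:
        raise ValueError("no transition from %r on %r" % (state, event))
    return new


def derive_replacement(stolen):
    # billing/contact/cookie details carried over; tahoe-lafs secrets regenerated
    return dict(stolen, state="pending", tahoe_secrets="fresh-secrets")


sub = {"state": "pending", "cookie": "c1", "tahoe_secrets": "old-secrets"}
sub["state"] = transition(sub["state"], "payload-delivered")  # attacker picked it up
sub["state"] = transition(sub["state"], "marked-stolen")
replacement = derive_replacement(sub)
assert replacement["state"] == "pending" and replacement["cookie"] == "c1"
assert replacement["tahoe_secrets"] != "old-secrets"
```

Making every state change an explicit transition like this is also what lets the subscription-converger react purely to recorded state, as described above.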

exarkun (Contributor) commented May 4, 2017

Practically speaking, the bits of the above flow that involve creating a wormhole should actually move out of the signup webserver. It effectively needs its own convergence loop so that it can react to database state changes and recover from restarts. The database will end up being used as an RPC mechanism so that the signup webserver can spit a wormhole code out at the user.

exarkun (Contributor) commented Feb 13, 2018

Going in to some detail about that whole "move out of the signup webserver" comment:

  • The user gets a session cookie when they land on the signup form page.
  • Subscription form gets submitted to signup web server
  • Signup web server issues request to billing subscription service (Stripe, Chargebee, whatever) to create billing subscription
  • Signup web server issues request to (internal, s4) subscription manager to create state necessary to update the deployment
    • The subscription is created in the pending state (along with the rest of its details)
  • The signup web server responds to the user that provisioning is taking place.
    • Either the response includes a link to pick up the wormhole code or some ajax/whatever keeps a channel open to the user so the wormhole code can be pushed to them when it's ready.
  • The wormhole invite agent eventually notices there's a new subscription in the pending state
    • Initially, it probably just polls the subscription manager a few times a minute for such things.
    • Later we can have an event-driven mechanism, perhaps pub/sub through subscription manager or maybe just a direct notification of a new subscription from the signup web server to the wormhole invite agent.
  • The wormhole invite agent allocates a wormhole code
    • The wormhole invite agent submits the wormhole code back to the subscription manager where it is recorded with the subscription.
  • The signup web server notices the wormhole code is allocated and makes it available to the user (pushes it down an ajax response or saves it to be served up in response to a subsequent http request or something - depends on UI/UX we want and how much web dev we want to do).
  • The wormhole invite agent opens a wormhole with the code
    • When the wormhole fully opens, the tahoe/gridsync/etc config is sent down it.
    • When the wormhole closes after a successful send, the wormhole invite agents notifies the subscription manager of the state change. The wormhole code is dropped and the subscription state goes to active.
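The "poll the subscription manager a few times a minute" step could look roughly like this. The subscription-manager interface shown (pending_without_code, set_code) is an assumption, not a real API:

```python
# Sketch of one iteration of the wormhole invite agent's polling loop.
# The FakeSubscriptionManager interface is hypothetical.

class FakeSubscriptionManager:
    def __init__(self):
        self.subs = {}  # subscription id -> wormhole code (None while pending)

    def add_pending(self, sub_id):
        self.subs[sub_id] = None

    def pending_without_code(self):
        return [s for s, code in self.subs.items() if code is None]

    def set_code(self, sub_id, code):
        self.subs[sub_id] = code


def poll_once(manager, allocate_code):
    """One pass of the 'few times a minute' loop."""
    for sub_id in manager.pending_without_code():
        code = allocate_code()          # agent allocates a wormhole code
        manager.set_code(sub_id, code)  # recorded with the subscription


manager = FakeSubscriptionManager()
manager.add_pending("sub-1")
poll_once(manager, allocate_code=lambda: "1-foo-bar")
assert manager.subs["sub-1"] == "1-foo-bar"
```

Because the loop only acts on subscriptions without a recorded code, rerunning it after an agent restart is harmless.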

The interactions here where the subscription manager database is used as RPC between the wormhole invite agent and the signup web server do not make me particularly happy. Probably the thing to do instead is something like:

  • After the signup web server creates the subscription in the subscription manager, it issues an RPC to the wormhole invite agent giving it the subscription id.
  • The wormhole invite agent grabs just that subscription and does its wormhole task with it.
  • The wormhole invite agent responds to the RPC with the allocated wormhole code.

This is in addition to keeping the wormhole code in the database and polling that database (because we still want to be able to recover from process restarts). It will significantly reduce the polling interval required to provide a good experience. It is more complexity and it is basically an optimization ... but it seems like a necessary optimization. We don't want the user waiting for minutes at the signup web form. However, if we go back to using email then potentially this is less of a concern since waiting a minute or two for the signup email is a more familiar experience. But that does require that we start sending emails again...
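The hybrid described here (direct RPC for latency, database polling for restart recovery) can be sketched as a fast path with a fallback. No real RPC library or schema is implied; everything here is a stand-in:

```python
# Sketch of the RPC-with-database-fallback hybrid described above.
# rpc_call and db_lookup are hypothetical callables.

def get_wormhole_code(rpc_call, db_lookup, sub_id, attempts=3):
    try:
        return rpc_call(sub_id)   # fast path: direct RPC to the invite agent
    except ConnectionError:
        pass                      # agent restarting? fall back to the database
    for _ in range(attempts):     # slow path: poll what was durably recorded
        code = db_lookup(sub_id)
        if code is not None:
            return code
    return None                   # caller keeps the user informed / retries


def broken_rpc(sub_id):
    raise ConnectionError("agent unavailable")


db = {"sub-1": "1-foo-bar"}
assert get_wormhole_code(broken_rpc, db.get, "sub-1") == "1-foo-bar"
```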

Overall, the whole process here is pretty complicated. There are a lot of moving parts and a lot of steps to get through the whole thing. One simplification would be to keep the wormhole invite agent inside the signup web server, where it is now. We could presumably still have process restart recovery via the exact same mechanism (ask the database) but without all of the extra RPC. This is probably something to consider. It throws a monkey-wrench into the implementation plan for the wormhole invite agent (to use Haskell, because the signup web server is Python; possibly the whole thing should be ported to Haskell?), but the monolith is still probably less complexity than this orchestra of microservices.

meejah (Contributor) commented Feb 20, 2018

What are the other options?

e.g. I imagine if it was "use email", a lot of the above would be substantially the same (except the "notify the user" part becomes "send email" instead of "render a magic-wormhole code on a web page")?

(I guess all I'm saying is: it's not obvious to me what a "way simpler" thing looks like?)

exarkun (Contributor) commented Feb 20, 2018

"Use email to deliver the wormhole code" would be a significant simplification, I think. That removes a lot of the interaction between the signup web server and the wormhole invite agent. The signup web server, after a successful POST to the backend, could just say "Okay, you signed up; check your email" and be done. The wormhole invite agent could send the email when it allocates the code. It still needs to persist it with the subscription manager, but at least it doesn't have to get it back to the signup web server.

This isn't the direction I had been thinking but it's definitely worth considering.

Another simplification would be offering the user a download directly from the signup web server. This would go something like:

  • Subscription form gets submitted to signup web server
  • Signup web server issues request to billing subscription service (Stripe, Chargebee, whatever) to create billing subscription
  • Signup web server issues request to (internal, s4) subscription manager to create state necessary to update the deployment
  • Signup web server issues a response to the user including a link to some kind of artifact that includes their subscription details (furl, etc).
    • From UX perspective, this would be a pre-configured GridSync. There are technical challenges with this, though.
    • Another option would be small data file with the subscription details that can be loaded in to GridSync (there's already a "backup account" option in GridSync, if we can write that format then GridSync can already load it up, I think)

This avoids any new cross-process interactions on the backend (signup server still has to populate subscription manager but that's done already). If we make the download time-limited (or even single-use) then we retain security properties that at least superficially resemble those of magic-wormhole.
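A time-limited, single-use download link could be implemented with an HMAC token, which needs no new backend state beyond a used-token set. This is a sketch only; the secret, TTL, and token layout are assumptions:

```python
import hashlib
import hmac

# Sketch of a time-limited, single-use download token, loosely mirroring the
# wormhole code's security properties. All values here are illustrative.

SECRET = b"signup-server-only-secret"  # assumption: held only by the server


def make_token(sub_id, now, ttl=3600):
    expiry = int(now) + ttl
    msg = ("%s:%d" % (sub_id, expiry)).encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return "%s:%d:%s" % (sub_id, expiry, sig)


def redeem_token(token, now, used):
    sub_id, expiry, sig = token.rsplit(":", 2)
    msg = ("%s:%s" % (sub_id, expiry)).encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # forged or corrupted link
    if int(now) > int(expiry):
        return None  # time-limited, like a wormhole code
    if token in used:
        return None  # single-use, like a wormhole code
    used.add(token)
    return sub_id


used = set()
token = make_token("sub-123", now=1000)
assert redeem_token(token, now=1500, used=used) == "sub-123"
assert redeem_token(token, now=1500, used=used) is None  # second use fails
```

Unlike a wormhole exchange, though, the artifact still transits an HTTPS response, so the PAKE-style guess-detection property discussed earlier in the thread is lost.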

crwood (Member) commented Feb 21, 2018

I alluded to some of these points in an earlier email, but for the sake of transparency (and since the conversation is now happening here -- which is great!), here are some of the reasons why I think providing "a pre-configured Gridsync" (which I interpret here to mean a downloadable binary distribution with the customer-specific fURL already burned into the package) would be a Bad Idea:

  • It arguably violates the Principle of Least Authority: Customers would suddenly need to trust their service provider not to ship malicious binaries to them, whereas with the current model, customers can at least choose to build the application on their own from source (or install it, e.g., from a future Debian repo) and sign up for S4 independently. Indeed, requiring the customer to additionally depend on the provider to build and ship "custom" applications for their machines -- as well-intentioned as that may be -- arguably removes one of the primary "selling points" of using Tahoe-LAFS-based storage/S4 in the first place: namely, that the storage provider need only be depended upon to ensure the availability of ciphertext. Consider, also, how violating this principle could make Least Authority (the company) a target for a new host of attacks (since pwning your build infrastructure makes it possible to pwn your customers), including, potentially, legally mandated ones.

  • It removes the additional authenticity/integrity checks afforded by PGP signatures: If every downloaded application is being custom built such that the resultant artifacts differ, there is no quick and easy way, say, for Customer-1 to verify that their downloaded application functions identically to that of Customer-2, or that it hasn't otherwise been tampered with, etc. PGP sucks, yes (and I grant that there are some workarounds here -- like signing hashes of the files that wouldn't change, I suppose) but SSL/TLS and the hot mess of certificate authorities arguably shouldn't be the only way to verify the authenticity/integrity of a downloaded package.

  • It excludes "vanilla" Tahoe-LAFS users: There are plenty of reasons why S4 customers might prefer to use the standard Tahoe-LAFS CLI over Gridsync (e.g., for tahoe backup-centric use-cases). I'd argue that we should strive to support these users (especially considering that the next version of Tahoe-LAFS will seamlessly support grid-invites over magic-wormhole via the tahoe invite set of commands -- which might open up new opportunities for new customers); interoperability and giving customers the choice of which client to use is arguably preferable to client lock-in (and might even help to encourage the development of new and better tahoe clients in the future).

  • It will require maintaining and/or paying for additional hardware: we'll need a Mac to make/ship Apple Disk Image (.dmg) files, and, if/when the time comes, a Windows box/VPS to repack MSI/NSIS/Inno Setup installers. These will need to be integrated into the existing horde of microservices in some way that doesn't suck, will further increase attack surface, and will probably be a huge pain to set up and maintain.

All that being said, I, at least, would be strongly in favor of maintaining a wormhole-centric setup flow: setting aside the already-sunk costs, the security properties are great and the overall configuration UX is at least considerably better than it was before (plus exposing users to a wormhole code on first-run importantly helps to familiarize them with using the same mechanism later for adding additional devices, sharing magic-folders, etc.). Beyond that, it's also something that, as far as I know, is wholly unique to S4 and was well-received in past user-testing (after participants got over the initial conceptual hurdles and understood how it actually worked..). Nevertheless, I recognize that @exarkun's time is both limited and valuable as-is, such that sending the invite codes over email (rather than relaying back to the signup server) sounds to me like a reasonable sacrifice or trade-off to consider.

Failing that, Gridsync can load the received/downloaded configuration settings as a file: the (unencrypted) "recovery file" format is just a simple JSON dump of the settings received previously through the wormhole (with the addition of an optional "rootcap" field that gets added later and wouldn't apply here). I like this option considerably less than wormhole-over-email, however, as it introduces additional steps for the user, potentially increases the risk of exposure, and results in a clunkier or more confusing UX (since they'll be asked to export another file shortly after loading the one they just did and may mistakenly think that the second one -- which actually contains their freshly-generated and very-important rootcap -- is unnecessary if they keep the first). There might be other ways around this, however... When I first started hacking on Gridsync, I experimented with a gridsync:// URI format (that I originally attempted to document here) which, I hoped, would provide "one-click" access to various tahoe resources. It obviously didn't pan out (IIRC, custom URIs required some convoluted Windows Registry stuff and/or administrator access to register), but it would be pretty nice if a user could just click a gridsync:// (or tahoe:// or lafs://) "link" in their browser to have their already-installed tahoe client join that grid (or magic-folder).. Perhaps I should revisit this..

meejah (Contributor) commented Feb 21, 2018

I definitely agree that training users to do anything besides "get the software from The One True Place" should be avoided.

How about something like this? This changes the interactions after Subscription Manager has made the "pending" reply. All lines are some kind of RPC (e.g. could be HTTP requests). The Subscription Manager here is the only thing that modifies the subscription database -- it syncs before sending the "pending" back, and syncs after (and before?) allocating the wormhole from the agent. Ties the "wormhole invite agent" in, and doesn't use the database for pub-sub (because Subscription Manager explicitly calls the Wormhole Agent).

(Hmm, trouble pasting files?)

meejah (Contributor) commented Feb 21, 2018

seqdiag {
        "client" -> "web server" [label="GET"];
        "web server" -> "subscription manager" [label="deploy", leftnote="set cookie"];
        "web server" <- "subscription manager" [label="pending: 'id'", rightnote="sync db"];
        "client" <- "web server" [label="doing stuff"];

        === Arbitrary time could pass (e.g. user finally downloads software and clicks 'ready for code' or something) ===

        "client" -> "web server" [label="get code"];
        "web server" -> "subscription manager" [label="get code: 'id'"];
        "subscription manager" -> "wormhole invite agent" [label="alloc", leftnote="sync db"];
        "subscription manager" <- "wormhole invite agent" [label="got code: 1-foo-bar", leftnote="sync db?"];
        "web server" <- "subscription manager" [label="got code: 1-foo-bar"];
        "client" <- "web server" [label="got code: 1-foo-bar"];

        === Client downloads GridSync ===

        "gridsync" -> "wormhole invite agent" [label="open wormhole: 1-foo-bar"];
        "wormhole invite agent" -> "subscription manager" [label="wormhole opened", leftnote="sync db"];
        "wormhole invite agent" <- "subscription manager" [label="JSON"];                                                                          
        "gridsync" <- "wormhole invite agent" [label="deliver JSON"];
}

meejah (Contributor) commented Feb 21, 2018

To build: pip install seqdiag, then seqdiag the_above_file.diag should spit out a similarly named PNG file. Which github won't let me attach :/
