Add support for extensions like `citext` #250

jgaskins · 2022-04-20T03:14:25Z

This PR actually adds a few different things:

The concept of handling extensions
An API to register those extensions to prepare new connections for them
Connection-specific decoders, delegating to the global decoder map
citext as one of those extensions, including a decoder for it and a PG::CIText Crystal type

The CIText type could just return a string, but I thought it might be confusing to get a case-insensitive string out of the database only for it to do case-sensitive checking later. This type offers the case-insensitive equality checks in Crystal code and also allows you to get the raw string out.

After working on this, I discovered #89 which is quite similar to this PR, but is 5 years old and likely significantly out of date by this point. This is just a draft because it’s still unpolished, but I’m using a monkeypatched version in one of my apps (email addresses are stored as citext) and it seems to work.

This commit also adds support for registering extensions that prepare connections for using thos extensions. In this case that preparation involves registering a decoder on the connection, since extensions in different Postgres databases will most likely have different oids.

jgaskins · 2022-04-20T03:16:34Z

src/pg/extensions/citext.cr

+
+  struct CIText
+    def initialize(text : String)
+      @text = text.downcase


This downcase might be leftover from an earlier version of this code and may not be necessary. If we don’t need to downcase, that’ll save us a heap allocation.

straight-shoota · 2022-04-20T10:03:08Z

I haven't looked into the implementation of the extension mechanism and compared it to #89.

But I noticed some issues with text representation in CIText:

#hash should have the same behaviour as #==, i.e. two strings that compare equally must have the same hash value. The hash needs to reflect case-insensitivity.
A more serious issue: There is not just a single case insensitive behaviour, but different sets of folding rules for mapping upper and lower case characters to each other. In Postgres, these rules are represented as collations. The case-insensitive behaviour of citext values is defined by the collation. But the Crystal implementation doesn't reflect that. It only uses String#compare(other, case_insensitive: true) which has a fixed, generic unicode collation (there are limited options for other folding algorithms). As a result, comparing two CIText strings in Crystal could have a different result than comparing the same two strings in the database. That would be wildly confusing. The implementation should either have identical behaviour (which is the goal of using CIText instead of String in the first place), or be clear about that it isn't identical. It would be quite complex to implement this correctly, and I don't even see a good reason for that. So I'd rather keep it dumb and make that explicit. Just using String type should be fine. It should be clear that comparing that type is case sensitive.

jgaskins · 2022-04-20T19:24:14Z

#hash should have the same behaviour as #==, i.e. two strings that compare equally must have the same hash value. The hash needs to reflect case-insensitivity.

Ah, right, that's why I had the downcase in there. I knew I had a good reason for it.

It would be quite complex to implement this correctly, and I don't even see a good reason for that. So I'd rather keep it dumb and make that explicit.

I considered the differences in collation, but I've never seen anyone specify a non-default collation in 13 years of using Postgres. I assume that anyone using them understands they introduce weird inconsistencies between DB and application code in addition to complicating queries. For everyone else, it should work as expected. If collations got more usage, I'd fully agree with you that not matching the behavior exactly would likely be too misleading.

I don't really have a strong opinion either way on the CIText type, to be clear, but I added it to show that I explored the option. I use it in one of my own apps for email addresses and it does okay there. I prefer it to calling downcase everywhere I need to compare that string with another. To be fair, I could also use a converter inside my app to do this, but it's convenient not to need to.

jwoertink · 2023-09-13T00:26:38Z

Hey, just ran in to possibly needing this... I'm wanting to use a citext[], but currently it fails on decoding. Any chance we can look at reviving it @jgaskins ?

jgaskins added 2 commits April 19, 2022 22:58

Allow for converting CIText to String

a494bdc

jgaskins commented Apr 20, 2022

View reviewed changes

jgaskins added 2 commits April 19, 2022 23:18

Avoid unnecessary heap allocation

e446aee

CIText#to_s can return the string we already have

8966b77

jwoertink mentioned this pull request Sep 12, 2023

Support Array of citext citext[] column luckyframework/avram#970

Open

jwoertink mentioned this pull request Sep 13, 2023

Adding support for citext[] arrays. luckyframework/avram#972

Draft

will force-pushed the master branch from cafe135 to cafeeb2 Compare October 31, 2024 12:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for extensions like `citext` #250

Add support for extensions like `citext` #250

jgaskins commented Apr 20, 2022

jgaskins Apr 20, 2022

straight-shoota commented Apr 20, 2022

jgaskins commented Apr 20, 2022

jwoertink commented Sep 13, 2023

Add support for extensions like citext #250

Are you sure you want to change the base?

Add support for extensions like citext #250

Conversation

jgaskins commented Apr 20, 2022

jgaskins Apr 20, 2022

Choose a reason for hiding this comment

straight-shoota commented Apr 20, 2022

jgaskins commented Apr 20, 2022

jwoertink commented Sep 13, 2023

Add support for extensions like `citext` #250

Add support for extensions like `citext` #250