Use bloom filter for quicker lookup #21

Open
patrickdemooij9 opened this issue Feb 23, 2022 · 0 comments

Currently we do a text search to see if the URL exists in our table. Most of the time this is wasted work, because the URL isn't listed in the database. We should use a bloom filter (https://benwendt.ca/articles/a-bloom-filter-in-c/) to determine whether a URL could be in the database, and only then do the SQL request. A bloom filter can give false positives but never false negatives, so a "no" from the filter means we can safely skip the query. This way we won't waste processing power/memory on requests that we shouldn't do anything with.
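
To make the idea concrete, here is a rough, language-agnostic sketch in Python (not the project's own language, and not a proposed implementation). The names `BloomFilter`, `find_redirect` and `query_database` are hypothetical, and the 1% false-positive target is an assumption. The sizing uses the standard formulas, which come to roughly 9.6 bits per stored URL at 1%.

```python
import hashlib
import math


class BloomFilter:
    """Plain Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        n = max(1, expected_items)
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.size = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.hash_count = max(1, round(self.size / n * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.hash_count)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def find_redirect(url, bloom, query_database):
    """Only hit the database when the filter says the URL might be present.

    `query_database` is a hypothetical callable standing in for the SQL lookup.
    """
    if not bloom.might_contain(url):
        return None  # definitely not stored, so skip the SQL request
    return query_database(url)  # may still return None on a false positive
```

The filter would be filled once from all stored redirect URLs. Every incoming URL is then checked in memory first, and the database is only queried for the small share of URLs the filter cannot rule out.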

Things to take into account:

  • What to do with deletions. Do we just pull all redirects and recreate our bloom filter? Or do we use a more extensive bloom filter that keeps a count per position instead of a single bit (a counting bloom filter)?
  • The same applies when a URL is changed.
  • When do we initialize the bloom filter, and is it really worth it compared to just doing the SQL lookup? (check memory)
  • How does this work with regex lookups? I assume we can't match regexes with the bloom filter, so we should check whether the bloom filter is worth using at all if we still have to do a regex lookup every time.
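
For the deletion and URL-change questions, one option is a counting bloom filter. The sketch below is only an illustration in Python under assumed parameters (`size=10_000`, `hash_count=7`), and the class and method names are hypothetical. It swaps each bit for a counter so an entry can be removed again, at the cost of more memory per slot.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter variant with one counter per slot, so items can be removed."""

    def __init__(self, size: int = 10_000, hash_count: int = 7):
        self.size = size
        self.hash_count = hash_count
        self.counters = [0] * size  # a counter per slot instead of a single bit

    def _positions(self, item: str):
        # Double hashing: derive all positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.hash_count)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item: str) -> bool:
        """Remove an item that was added earlier.

        Returns False without changing anything if the item cannot be present.
        Removing an item that was never added but happens to be a false
        positive would corrupt the filter, so callers should only remove
        URLs they know were stored.
        """
        positions = self._positions(item)
        if any(self.counters[pos] == 0 for pos in positions):
            return False
        for pos in positions:
            self.counters[pos] -= 1
        return True

    def might_contain(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))

    def rename(self, old_item: str, new_item: str) -> None:
        # A changed URL is a removal of the old value plus an add of the new one.
        self.remove(old_item)
        self.add(new_item)
```

Rebuilding a plain filter from all redirects on every change is simpler and uses less memory. The counting variant only pays off if redirects change often enough that rebuilding becomes a cost. Neither variant helps with regex redirects, which would still need their own lookup.
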
patrickdemooij9 mentioned this issue Mar 10, 2022