Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ordering of UTIL autopull logic needs a tweak to let IANA list be dominant #1325

Open
dnsguru opened this issue May 19, 2021 · 18 comments · Fixed by #1347
Open

Ordering of UTIL autopull logic needs a tweak to let IANA list be dominant #1325

dnsguru opened this issue May 19, 2021 · 18 comments · Fixed by #1347
Assignees

Comments

@dnsguru
Copy link
Member

dnsguru commented May 19, 2021

@cpu can we revisit the AUTOPULL Bot?

There is a modification needed - I was investimagating the issue in #1176 and think that we need a review and change in the logic for what makes the autopull so that if there is something in the IANA TLD list (which is basically a list of the root TLDs) but not in the autopull JSON that is sourced from the contracting / cancels NOR in the greater PSL, that the TLD gets added/included.

Example: .WED - in IANA list and active TLD, but not in PSL
.WED was a TLD that was picked up and now operates under EBERO, which is a program at ICANN that has an assigned provider continue operations for TLDs - helping keep TLDs operational when they close due to whatever circumstance.

The TLD appears programatically in the JSON as the other cancelled TLDs, but the TLD still remains in the root zone.

By letting the IANA list be the override, it would catch intances of new IDN TLDs for ccTLDs and include them, plus any fresh ccTLD Delegations if they occur.

These may or may not have the typical meta information that nTLDs do, but inclusion vs not having them is preferred. These could be included in-line with the alphabetized nTLDs, or aggregated at the end of the nTLDs in a small section of their own inside the #ICANN section.

@dnsguru dnsguru mentioned this issue May 19, 2021
5 tasks
@dnsguru dnsguru changed the title Ordering of UTL autopull logic needs a tweak to let IANA list be dominant Ordering of UITL autopull logic needs a tweak to let IANA list be dominant May 19, 2021
@cpu
Copy link
Contributor

cpu commented May 19, 2021

@cpu can we revisit the AUTOPULL Bot?

👋 @dnsguru Hey there - of course.

What you're proposing sounds reasonable to me. My memory on the specifics is a little vague but I believe the original version of the bot tooling that I made for github.com/zmap/zlint already looks at the tlds-alpha-by-domain.txt file you linked and uses it to augment the gTLD JSON. It shouldn't be too bad to adopt more of this logic tailored for the PSL.

Is there a timeline you'd like to see these adjustments made by? It will take me a little bit of time to page all of these details back into memory and I'm not sure I'll have a chance before the weekend at the earliest.

@dnsguru dnsguru changed the title Ordering of UITL autopull logic needs a tweak to let IANA list be dominant Ordering of UTIL autopull logic needs a tweak to let IANA list be dominant May 19, 2021
@dnsguru
Copy link
Member Author

dnsguru commented May 19, 2021

@cpu this is important but not urgent - I can invent a deadline if it helps motivation or bias for action.

Separate area of attention, while you'd be under the hood: Testing autopull in presence of "Stubs"
During the time when you were updating the UTIL to autopull, There had been a request #1159 for some stub zones (akin to what we see inside of ccTLDs) from another request that I'd deferred the requestor to place in the PRIVATE section which I would have liked to have seen the impact of if those were placed near their entry in the automation included section.

Typically we would treat these like .AERO or .PRO's registry-associated-managed stubs akin to ccTLDs as mentioned, but due to the timing of the request, I'd asked them to put those in the PRIVATE section, but really because .РУС was a 2012 round nTLD, it is in the section that is generated by the autopull.

Can we, as part of testing the iana txt in the current issue you are reading, test registry associated stubs?

@cpu
Copy link
Contributor

cpu commented May 20, 2021

@cpu this is important but not urgent - I can invent a deadline if it helps motivation or bias for action.

I will see what I can do 👍

Separate area of attention, while you'd be under the hood: Testing autopull in presence of "Stubs"
...
Can we, as part of testing the iana txt in the current issue you are reading, test registry associated stubs?

Would you mind filing a separate issue for that and tagging me (or assigning if possible)? I'm open to the idea but I don't know that I'll have time to do both at once this week.

@dnsguru
Copy link
Member Author

dnsguru commented May 21, 2021 via email

@cpu
Copy link
Contributor

cpu commented Jun 2, 2021

Haven't forgotten about this, just overestimated my ability to be at a computer after hours in the summer :-) Hoping to get going soon but also happy to hand this off if someone else is ready to take it on immediately, just shout so we don't duplicate work.

@dnsguru
Copy link
Member Author

dnsguru commented Jun 2, 2021 via email

dnsguru added a commit that referenced this issue Jun 6, 2021
Only change is to add a URL to the IANA TLD txt file and some comment lines to add context.

Doesn't solve #1325 but helps identify the resource to use.
@dnsguru
Copy link
Member Author

dnsguru commented Jun 6, 2021

@cpu I made PR #1347 for the IANA txt file - I think I oversimplified the matter ...

Looks like there would be some extra complexity for the following reasons:

  • IANA TXT is all upper case for TLDs (solveable by normalizing case)

IANA TXT WOULD include, but ICANN JSON WOULD NOT include:

  • EBERO TLDs still in the root but listed in the JSON as cancelled
  • New ccTLD entries to the root
  • New IDN ccTLD entries to the root

Some of the challenges that come with this are:

  • The IANA TXT names will not have the other information that would typically accompany the nTLDs.
  • The IANA TXT IDNs are contained as the A-Label (Punycode) vs using the U-Label (raw UTF8) - the code will need to be smart about testing these against the commented A-Label for matches or perform the IDNA conversions or there will be false positives.

Without this going full "Matryoshka" and expanding the scope too widely on the request to add the IANA TLDs which might be missing or impacting the ICANN section beyond the nTLD items that this autopull process was designed for (re: ccTLDs/IDNccTLDs).

ICANN have recently announced that they will auction .WED (the EBERO TLD that triggered this request). That might update it in the JSON (or not, this is new ground). It appears between that and the ccTLDs that might be missing, we might cross streams with the existing legacy stuff, which has all kinds of tech debt on reviewing the non-2012 round stuff that this update would change.

At the core of the request here, we want to list names that:

  • ARE NOT in the last public_suffix_list.dat; and,
  • ARE in the IANA TXT (Root zone, simplified); and,
  • ARE not present, or are listed as cancelled in the ICANN JSON (gTLDs under contract, may be superset of IANA TXT)

For these,
I propose we add a small section. This could be either just above the newGTLDs header or just after the above the end of the ICANN section, which ever is easier to output based upon the process)

(so just above either // newGTLDs or // ===END ICANN DOMAINS=== line)

I'll make up a called the Jothan Frakes Islands (.JF) and let's for example imagine it has an IDN also of XN--EXAMPLEJF, and include the .WED in the example below. (On the "IDNccTLD" let's set aside the A-Label vs U-Label challenge and just list it as the A-Label)

New Section:

// Additional IANA TLDs
// The following were included by automated review of the IANA TLDs LIST
// while pricessing the ICANN JSON File of contracted gTLDs.
// Recently added ccTLD or IDN ccTLDs, EBERO TLDs or other TLDs from
// IANA's PSL https://data.iana.org/TLD/tlds-alpha-by-domain.txt
// will appear here for inclusion until relocated within the dat file. 

// jf
jf

// xn--examplejf
xn--examplejf

// wed
wed

This expanded the scope a little but also hopefully shrunk it while increasing the coverage.

I am also not opposed to this being a separate automation, but thought it best to keep this in the same tool because it represents a minimal element of additional processing.

Hope this is helpful, and @cpu thank you ... and hit me up with any questions or clarifications.

@cpu
Copy link
Contributor

cpu commented Jun 6, 2021

@dnsguru Thanks, your last comment is very helpful. I've started working on this. I think it'll take a little bit of restructuring but should be achievable.

One immediate question, when you called out the core of the request you said:

At the core of the request here, we want to list names that:

ARE NOT in the last public_suffix_list.dat; and,
ARE in the IANA TXT (Root zone, simplified); and,
ARE not present, or are listed as cancelled in the ICANN JSON (gTLDs under contract, may be superset of IANA TXT)

For that last point when you say "cancelled in the ICANN JSON", do you mean entries with "contractTerminated" : true, or something else?

@dnsguru
Copy link
Member Author

dnsguru commented Jun 6, 2021

Made this to illstrate the various cases here, hopefully it adds clarity vs being more confusing.

  Delegated TLD (g or cc) Contracted, PreDelegation Cancelled, Delegated Cancelled, PreDelegation IETF/Special Delegated ccTLD Delegated IDN ccTLD
Entry in last PSL YES EITHER, See Note YES, See Note YES, See Note EITHER, See Note NO NO
Entry in ICANN JSON YES YES YES YES EITHER N/A N/A
JSON : "contractTerminated" : true No No YES YES NO N/A N/A
In IANA TLD TXT YES   YES No YES YES YES
               
Notes Status Quo If missing, Add Entry Currently: Remove from PSL; Revise to: Leave in PSL, change commented line Remove from PSL (Status Quo) If missing, Add Entry If missing, Add Entry If missing, Add Entry - Need to convert from punycode to UTF-8 and add that
               
Comment Line Change:     // [tld] : [Date Cancelled]; EBERO TLD Active   // [tld] : [run date] // [tld] : [run date] // [A-tld] : [run date] (note [A-tld] is the A-Label/Punycode)

@cpu
Copy link
Contributor

cpu commented Jun 6, 2021

Thanks, let me noodle with the code and see where I end up.

@dnsguru
Copy link
Member Author

dnsguru commented Jun 6, 2021 via email

@cpu
Copy link
Contributor

cpu commented Jun 6, 2021

I updated the grid in the comment above to help be more clear

Great, ty!

Here are a few updates from my side after a bit of hacking.

  1. I think it's time to make a proper Go module for this work instead of a stand-alone .go file with a main. The primary reason is that the script now has a need for an external dependency. Specifically we need golang.org/x/net/idna for converting the a-labels to unicode. Overall I think the Go ecosystem has been trying to push people away from the more adhoc go run somefile.go type usage to modules so this feels somewhat inevitable.
  2. On that note, I think it's worth adding another dependency to this tools module above the IDNA dep. I think it makes sense to pull in github.com/weppos/publicsuffix-go. Previously the script didn't need to parse the .dat file and I was able to get away with cutting out the gTLD section based on section delimiters and then just checking if a freshly templated rendering of that one section resulted in a diff or not. With this new logic we need to be able to determine if an entry is present in another section. Rather than invent another Go PSL DAT parser I'd rather use the LoadString interface from Weppos' lib here and then query the list that way. I don't think it will be an issue to depend on his lib for .dat updates while the lib depends on this repo for its built-in .dat.
  3. Cancelled gTLD comments:

// [tld] : [Date Cancelled]; EBERO TLD Active

There's unfortunately no date field corresponding to the contractTerminated field in the JSON as far as I can tell. I don't think we can include that in a comment, just that it has been cancelled.

  1. ccTLD delegation comments:

// [tld] : [run date]

As you noticed, the IANA tlds-alpha-by-domain.txt data source has no metadata (including the run date). I think the only comment we can add is with the original a-label form of the TLD if we rendered the unicode version into the .dat.

Overall this is a pretty significant lift so the diff isn't going to be a little tweak of the script as much as an overhaul. I'm happy to keep plugging at it but I wanted to make sure to set expectations in terms of the volume of code that will need to be reviewed.

@dnsguru
Copy link
Member Author

dnsguru commented Jun 7, 2021 via email

dnsguru added a commit that referenced this issue Jun 7, 2021
…1347)

* newgtlds.go : update Autopull - Added URL for the IANA PSL for #1325

Only change is to add a URL to the IANA TLD txt file and some comment lines to add context.

Doesn't solve #1325 but helps identify the resource to use.

* Update tools/newgtlds.go

Co-authored-by: Daniel McCarney <[email protected]>

Co-authored-by: Daniel McCarney <[email protected]>
@dnsguru dnsguru reopened this Jun 7, 2021
@dnsguru
Copy link
Member Author

dnsguru commented Jun 7, 2021

Was autoclosed by #1347 ; re-opening

@dnsguru dnsguru mentioned this issue Aug 12, 2021
@dnsguru
Copy link
Member Author

dnsguru commented Aug 29, 2022

@cpu could this get a revisit

@cpu
Copy link
Contributor

cpu commented Aug 30, 2022

@dnsguru Sorry, I'm not available to commit to this work right now.

@weppos
Copy link
Member

weppos commented Sep 9, 2022

@dnsguru I need some time to look into the conversation and parse all the comments. As the logic is in Go, I may be able to work on it.

Perhaps it could be beneficial if you and I have a quick chat, so that you help me with a short gist of the required changes, without me having to go through the whole conversation and construct it. That will probably help me to speed up the requirement gathering phase, and jump straight to the implementation phase.

@dnsguru
Copy link
Member Author

dnsguru commented Sep 9, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants