fortinet_get_system_ha_status.textfsm parse error for non "OK" HA Health Status #1859

Open
pnpestov opened this issue Oct 1, 2024 · 28 comments


@pnpestov

pnpestov commented Oct 1, 2024

ISSUE TYPE
  • Template Issue with error and raw data
TEMPLATE USING
#
# FG Version: 5.6, 6.0, 6.2, 6.4, 7.0
# HW        : varied
#
Value HA_HEALTH (\S+|.*)
Value MODEL (\S+)
Value HA_MODE ([\S\s]+)
Value HA_GROUP (\S+)
Value CLUSTER_UPTIME ([\S\s]+)
Value CLUSTER_STATE_CHANGED_TIME ([\S\s]+)
Value HA_SESSION_PICKUP_STATUS (\S+)
Value HA_SESSION_PICKUP_DELAY (\S+)
Value HA_OVERRIDE_STATUS (\S+)
Value HA_MASTER_UNIT_NAME (\S+)
Value HA_SLAVE_UNIT_NAME (\S+)
Value HA_MASTER_UNIT_SERIAL (\S+)
Value HA_SLAVE_UNIT_SERIAL (\S+)
Value HA_MASTER_UNIT_INDEX (\S+)
Value HA_SLAVE_UNIT_INDEX (\S+)

Start
  ^HA\s+Health\s+Status:\s+${HA_HEALTH}
  ^HA\s+Health\s+Status:$$ -> UnhealthyStatus
  ^Model:\s+${MODEL}
  ^Mode:\s+${HA_MODE}
  ^Group:\s+${HA_GROUP}
  ^Debug:\s+\d+
  ^Cluster\s+Uptime:\s+${CLUSTER_UPTIME}
  ^Cluster\s+state\s+change\s+time:\s+${CLUSTER_STATE_CHANGED_TIME}
  ^(Master|Primary)\s+selected\s+using:
  ^\s*\<\S+
  ^ses_pickup:\s+${HA_SESSION_PICKUP_STATUS},\s+ses_pickup_delay=${HA_SESSION_PICKUP_DELAY}
  ^override:\s+${HA_OVERRIDE_STATUS}
  ^Configuration\s+Status: -> Configuration_Status
  # Catch old 6.0_noha with no "Configuraton Status"
  ^System\s+Usage\s+stats: ->  System_Usage_stats
  ^. -> Error "in-Start"

UnhealthyStatus
  # semicolon necessary to anchor
  ^${HA_HEALTH};$$
  ^Model:\s+${MODEL} -> Start

Configuration_Status
  ^System\s+Usage\s+stats: ->  System_Usage_stats
  ^\s*\S+\([\S\s]+\):\s\S+$$
  ^. -> Error "in-Configuration_Status"

System_Usage_stats
  ^HBDEV\s+stats: -> HBDEV_MONDEV_stats
  ^\s*\S+\([\S\s]+\):$$
  #^\s*\S+:\s+
  ^\s*sessions=
  ^. -> Error "in-System_Usage_stats"

HBDEV_MONDEV_stats
  # Combine stats, no MONDEV in older FW's
  ^\s*\S+\([\S\s]+\):$$
  ^\s*\S+:\s.+rx.+tx.+$$
  ^MONDEV\s+stats:
  ^(Master|Primary)\s*:\s+${HA_MASTER_UNIT_NAME}\s*,\s+${HA_MASTER_UNIT_SERIAL},\s+(HA\s+cluster\s+index|cluster\s+index)\s+=\s+${HA_MASTER_UNIT_INDEX}
  ^(Slave|Secondary)\s*:\s+${HA_SLAVE_UNIT_NAME}\s*,\s+${HA_SLAVE_UNIT_SERIAL},\s+(|HA)\s*cluster\s+index\s+=\s+${HA_SLAVE_UNIT_INDEX}
  ^number\s+of\s+vcluster:\s+\d+
  ^vcluster\s+\d+:
  ^(Master|Slave|Primary|Secondary)\s*:\s+\S+,\s+(operating\s+cluster\s+index|HA\s+operating\s+index)\s+=\s+\d+ -> Record
  ^\s*$$
  ^. -> Error "in-HBDEV_MONDEV_stats"
SAMPLE COMMAND OUTPUT
HA Health Status:
    WARNING: FGT40XXXXXXXXXXX has hbdev down;
    WARNING: FGT40YYYYYYYYYYY has hbdev down;
Model: FortiGate-40F
Mode: HA A-P
Group: 28
Debug: 0
Cluster Uptime: 22 days 1:17:14
Cluster state change time: 2024-09-16 13:03:51
Primary selected using:
    <2024/09/16 13:03:51> FGT40XXXXXXXXXXX is selected as the primary because its override priority is larger than peer member FGT40YYYYYYYYYYY.
    <2024/09/11 12:13:30> FGT40XXXXXXXXXXX is selected as the primary because it's the only member in the cluster.
    <2024/09/09 16:15:43> FGT40XXXXXXXXXXX is selected as the primary because it's the only member in the cluster.
ses_pickup: enable, ses_pickup_delay=disable
override: enable
Configuration Status:
    FGT40XXXXXXXXXXX(updated 2 seconds ago): in-sync
    FGT40YYYYYYYYYYY(updated 4 seconds ago): in-sync
System Usage stats:
    FGT40XXXXXXXXXXX(updated 2 seconds ago):
        sessions=207, average-cpu-user/nice/system/idle=0%/0%/0%/100%, memory=34%
    FGT40YYYYYYYYYYY(updated 4 seconds ago):
        sessions=45, average-cpu-user/nice/system/idle=0%/0%/0%/100%, memory=32%
HBDEV stats:
    FGT40XXXXXXXXXXX(updated 2 seconds ago):
        lan2: physical/00, down, rx-bytes/packets/dropped/errors=2177576918/6099638/0/0, tx=2220309718/6099753/0/0
        lan3: physical/1000auto, up, rx-bytes/packets/dropped/errors=2679778543/8081773/0/0, tx=3837812376/10670881/0/0
    FGT40YYYYYYYYYYY(updated 4 seconds ago):
        lan2: physical/00, down, rx-bytes/packets/dropped/errors=2220306078/6099743/0/0, tx=2177589056/6099672/0/0
        lan3: physical/1000auto, up, rx-bytes/packets/dropped/errors=3837917664/10670867/0/0, tx=2679410353/8081758/0/0
MONDEV stats:
    FGT40XXXXXXXXXXX(updated 2 seconds ago):
        lan1: physical/100auto, up, rx-bytes/packets/dropped/errors=77582763190/213796490/0/0, tx=208890789274/233774398/0/0
        wan: physical/100auto, up, rx-bytes/packets/dropped/errors=228208451525/255366667/0/0, tx=91861524305/227500063/0/0
    FGT40YYYYYYYYYYY(updated 4 seconds ago):
        lan1: physical/100auto, up, rx-bytes/packets/dropped/errors=403570587/3308440/0/0, tx=11595688/43842/0/0
        wan: physical/100auto, up, rx-bytes/packets/dropped/errors=41029418/507583/0/0, tx=86212/874/0/0
Primary     : ftg-fw-a       , FGT40XXXXXXXXXXX, HA cluster index = 1
Secondary   : ftg-fw-b       , FGT40YYYYYYYYYYY, HA cluster index = 0
number of vcluster: 1
vcluster 1: work 169.254.0.2
Primary: FGT40XXXXXXXXXXX, HA operating index = 0
Secondary: FGT40YYYYYYYYYYY, HA operating index = 1
SUMMARY

Version: FortiGate-40F v7.0.15,build0632,240401 (GA.M)

Traceback (most recent call last):
  File "C:\Users\Admin\Scripts_py\Netmiko\script.py", line 127, in <module>
    command_parsed = parse_output(platform = "fortinet", command = "get system ha status", data = output)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\ntc_templates\parse.py", line 77, in parse_output
    cli_table.ParseCmd(data, attrs)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\clitable.py", line 282, in ParseCmd
    self.table = self._ParseCmdItem(self.raw, template_file=template_files[0])
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\clitable.py", line 315, in _ParseCmdItem
    for record in fsm.ParseText(cmd_input):
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 895, in ParseText
    self._CheckLine(line)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 944, in _CheckLine
    if self._Operations(rule, line):
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 1021, in _Operations
    raise TextFSMError('Error: %s. Rule Line: %s. Input Line: %s.'
textfsm.parser.TextFSMError: Error: "in-Start". Rule Line: 37. Input Line:     WARNING: FGT40XXXXXXXXXXX has hbdev down; .
STEPS TO REPRODUCE

Put the cluster into a non-"OK" HA Health Status so that the status is printed on more than one line. For example, take down the lan1 (HA monitor interface) or lan2 (HA heartbeat interface) link on the slave node.

EXPECTED RESULTS

Get the current value of HA Health Status
parsed_sample:

  • ha_health:
    "WARNING: FGT40FYYYYYYYYYY has mondev down"
    or
  • ha_health:
    "WARNING: FGT40XXXXXXXXXXX has hbdev down"
    "WARNING: FGT40YYYYYYYYYYY has hbdev down"
    or other options in combination
    and continue executing the script
ACTUAL RESULTS
Traceback (most recent call last):
  File "C:\Users\Admin\Scripts_py\Netmiko\script.py", line 127, in <module>
    command_parsed = parse_output(platform = "fortinet", command = "get system ha status", data = output)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\ntc_templates\parse.py", line 77, in parse_output
    cli_table.ParseCmd(data, attrs)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\clitable.py", line 282, in ParseCmd
    self.table = self._ParseCmdItem(self.raw, template_file=template_files[0])
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\clitable.py", line 315, in _ParseCmdItem
    for record in fsm.ParseText(cmd_input):
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 895, in ParseText
    self._CheckLine(line)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 944, in _CheckLine
    if self._Operations(rule, line):
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 1021, in _Operations
    raise TextFSMError('Error: %s. Rule Line: %s. Input Line: %s.'
textfsm.parser.TextFSMError: Error: "in-Start". Rule Line: 37. Input Line:     WARNING: FGT40XXXXXXXXXXX has hbdev down; .
@mjbear
Contributor

mjbear commented Oct 1, 2024

@pnpestov
Might you also have a device with mondev that is unhealthy so the test data for mondev can be fixed?

https://github.com/networktocode/ntc-templates/blob/master/tests/fortinet/get_system_ha_status/fortinet_get_system_ha_status_7.0_unhealthy.raw

@mjbear
Contributor

mjbear commented Oct 1, 2024

@pnpestov
Might there be invisible trailing characters at the end of the HA Health Status: line?

The parser is failing in the first in-Start State which means it didn't transition into the UnhealthyStatus State.

@pnpestov
Author

pnpestov commented Oct 1, 2024

https://github.com/networktocode/ntc-templates/blob/master/ntc_templates/templates/fortinet_get_system_ha_status.textfsm

test data: https://github.com/networktocode/ntc-templates/tree/master/tests/fortinet/get_system_ha_status

Starting from the 6.4 branch, there is incorrect output in the tests. You can look at the 6.2 branch, everything is correct there.

@pnpestov
Author

pnpestov commented Oct 1, 2024

@pnpestov Might you also have a device with mondev that is unhealthy so the test data for mondev can be fixed?

https://github.com/networktocode/ntc-templates/blob/master/tests/fortinet/get_system_ha_status/fortinet_get_system_ha_status_7.0_unhealthy.raw

Unfortunately, all the devices are in production and I can't pull the cable out of the lan1 port. But you can simply indent each WARNING line with four spaces, as in my last command output. I think it would look like this:

HA Health Status:
    WARNING: FGT40FYYYYYYYYYY has mondev down;
Model: FortiGate-40F
Mode: HA A-P
Group: 172
Debug: 0
Cluster Uptime: 63 days 22:15:42
Cluster state change time: 2024-02-11 15:25:27
Primary selected using:
    <2024/02/11 15:25:27> FGT40FXXXXXXXXXX is selected as the primary because the value 0 of link-failure + pingsvr-failure is less than peer member FGT40FYYYYYYYYYY.
ses_pickup: enable, ses_pickup_delay=disable
override: enable
Configuration Status:
    FGT40FXXXXXXXXXX(updated 0 seconds ago): in-sync
    FGT40FYYYYYYYYYY(updated 0 seconds ago): in-sync
System Usage stats:
    FGT40FXXXXXXXXXX(updated 0 seconds ago):
        sessions=768, average-cpu-user/nice/system/idle=0%/0%/0%/99%, memory=35%
    FGT40FYYYYYYYYYY(updated 0 seconds ago):
        sessions=634, average-cpu-user/nice/system/idle=0%/0%/0%/100%, memory=31%
HBDEV stats:
    FGT40FXXXXXXXXXX(updated 0 seconds ago):
        lan2: physical/1000auto, up, rx-bytes/packets/dropped/errors=9997131732/27616386/0/0, tx=10080077920/27616652/0/0
        lan3: physical/1000auto, up, rx-bytes/packets/dropped/errors=11772621099/36693306/0/0, tx=26151306122/60128423/0/0
    FGT40FYYYYYYYYYY(updated 0 seconds ago):
        lan2: physical/1000auto, up, rx-bytes/packets/dropped/errors=10080077920/27616652/0/0, tx=9997131732/27616386/0/0
        lan3: physical/1000auto, up, rx-bytes/packets/dropped/errors=26151777728/60128423/0/0, tx=11771044717/36693306/0/0
MONDEV stats:
    FGT40FXXXXXXXXXX(updated 0 seconds ago):
        lan1: physical/100auto, up, rx-bytes/packets/dropped/errors=535463275509/3388288017/0/0, tx=3023591767050/4114831127/0/0
        wan: physical/100auto, up, rx-bytes/packets/dropped/errors=3314385262333/4439875482/0/0, tx=768352772861/3445252569/0/0
    FGT40FYYYYYYYYYY(updated 0 seconds ago):
        lan1: physical/00, down, rx-bytes/packets/dropped/errors=0/0/0/0, tx=0/0/0/0
        wan: physical/100auto, up, rx-bytes/packets/dropped/errors=15792718293/245544650/0/0, tx=0/0/0/0
Primary : FGT-fw-a, FGT40FXXXXXXXXXX, HA cluster index = 1
Secondary : FGT-fw-b, FGT40FYYYYYYYYYY, HA cluster index = 0
number of vcluster: 1
vcluster 1: work 169.254.0.2
Primary: FGT40FXXXXXXXXXX, HA operating index = 0
Secondary: FGT40FYYYYYYYYYY, HA operating index = 1

@pnpestov
Author

pnpestov commented Oct 1, 2024

@pnpestov Might there be invisible trailing characters at the end of the HA Health Status: line?

The parser is failing in the first in-Start State which means it didn't transition into the UnhealthyStatus State.

No, I don't think so. Unless you count the line-feed character '\n'.

@mjbear
Contributor

mjbear commented Oct 1, 2024

@pnpestov
Totally understand the point about devices in production. You need a lab device, haha. All good.

So far I haven't detected an issue other than the leading white space on the WARNING lines.

If you'd like, try the changes I've made to my feature branch for this.
(Or you could clone my fork and switch to that feature branch. Pick your poison, right? 😉)

@pnpestov
Author

pnpestov commented Oct 1, 2024

@mjbear
I applied the new changes, but the result is the same.

@pnpestov
Author

pnpestov commented Oct 1, 2024

@mjbear
This is from a Netmiko connection. I saved the output as a list of characters:
['H', 'A', ' ', 'H', 'e', 'a', 'l', 't', 'h', ' ', 'S', 't', 'a', 't', 'u', 's', ':', ' ', '\n', ' ', ' ', ' ', ' ', 'W', 'A', 'R', 'N', 'I', 'N', 'G', ':', ' ', 'F', 'G', 'T', '4', '0', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', ' ', 'h', 'a', 's', ' ', 'h', 'b', 'd', 'e', 'v', ' ', 'd', 'o', 'w', 'n', ';', ' ', '\n', ' ', ' ', ' ', ' ', 'W', 'A', 'R', 'N', 'I', 'N', 'G', ':', ' ', 'F', 'G', 'T', '4', '0', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', ' ', 'h', 'a', 's', ' ', 'h', 'b', 'd', 'e', 'v', ' ', 'd', 'o', 'w', 'n', ';', ' ', '\n', .....

Start
  ^HA\s+Health\s+Status:\s+ -> UnhealthyStatus
  ^HA\s+Health\s+Status:\s+${HA_HEALTH}  
  ^Model:\s+${MODEL}
  ^Mode:\s+${HA_MODE}
  ^Group:\s+${HA_GROUP}
  ^Debug:\s+\d+
  ^Cluster\s+Uptime:\s+${CLUSTER_UPTIME}
  ^Cluster\s+state\s+change\s+time:\s+${CLUSTER_STATE_CHANGED_TIME}
  ^(Master|Primary)\s+selected\s+using:
  ^\s*\<\S+
  ^ses_pickup:\s+${HA_SESSION_PICKUP_STATUS},\s+ses_pickup_delay=${HA_SESSION_PICKUP_DELAY}
  ^override:\s+${HA_OVERRIDE_STATUS}
  ^Configuration\s+Status: -> Configuration_Status
  # Catch old 6.0_noha with no "Configuraton Status"
  ^System\s+Usage\s+stats: ->  System_Usage_stats
  ^. -> Error "in-Start"

UnhealthyStatus
  # semicolon necessary to anchor
  ^\s+${HA_HEALTH};
  ^Model:\s+${MODEL} -> Start

With this template I receive
WARNING: FGT40YYYYYYYYYYY has hbdev down
but with the OK status the template returns an empty value :-). There is no error in this case and the script continues to work.

@mjbear
Contributor

mjbear commented Oct 1, 2024

^HA\s+Health\s+Status:\s+ -> UnhealthyStatus
^HA\s+Health\s+Status:\s+${HA_HEALTH}

Are you indicating you reordered those two lines?

Ah, HA_HEALTH is too loose of a regex or not anchored well enough.
Value HA_HEALTH (\S+|.*)

^HA\s+Health\s+Status:\s+${HA_HEALTH}

Receive WARNING: FGT40YYYYYYYYYYY has hbdev down but with the OK status, the template returns a void :-). There is no error in this case and the script continues to work

Which is likely why you're mentioning void (or null).

➡️ Edit:
HA_HEALTH has to be loose to capture that warning line.
Though I did anchor the "healthy" status line as such ^HA\s+Health\s+Status:\s+${HA_HEALTH}$$

What's wild though is that the existing test data functions fine (no changes to their yaml test output).
I need to modify the capture group ... I have an idea. Implemented and pushed to my feature branch.

@pnpestov
Author

pnpestov commented Oct 1, 2024

Are you indicating you reordered those two lines?

Yes

@pnpestov
Author

pnpestov commented Oct 1, 2024

Your recent changes result in the same error, but the content of the line is as follows: "Rule Line: 37. Input Line: HA Health Status: ."

Traceback (most recent call last):
  File "C:\Users\Pestov.P\Scripts_py\Netmiko\netbox_diagnose_lldprx_shop_v5.py", line 129, in <module>
    command_parsed = parse_output(platform = "fortinet", command = "get system ha status", data = output)
  File "C:\Users\Pestov.P\AppData\Roaming\Python\Python310\site-packages\ntc_templates\parse.py", line 77, in parse_output
    cli_table.ParseCmd(data, attrs)
  File "C:\Users\Pestov.P\AppData\Roaming\Python\Python310\site-packages\textfsm\clitable.py", line 282, in ParseCmd
    self.table = self._ParseCmdItem(self.raw, template_file=template_files[0])
  File "C:\Users\Pestov.P\AppData\Roaming\Python\Python310\site-packages\textfsm\clitable.py", line 315, in _ParseCmdItem
    for record in fsm.ParseText(cmd_input):
  File "C:\Users\Pestov.P\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 895, in ParseText
    self._CheckLine(line)
  File "C:\Users\Pestov.P\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 944, in _CheckLine
    if self._Operations(rule, line):
  File "C:\Users\Pestov.P\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 1021, in _Operations
    raise TextFSMError('Error: %s. Rule Line: %s. Input Line: %s.'
textfsm.parser.TextFSMError: Error: "in-Start". Rule Line: 37. Input Line: HA Health Status: .

@mjbear
Contributor

mjbear commented Oct 1, 2024

Your recent changes result in the same error, but the content of the line is as follows: "Rule Line: 37. Input Line: HA Health Status: ."

Ugh.
It appears I may have taken things a few steps back. 🤦‍♂️ 😖

Edit:
Please provide the healthy raw output in a code block too. Thank you! 👏

@pnpestov
Author

pnpestov commented Oct 2, 2024

Ugh.
It appears I may have taken things a few steps back. 🤦‍♂️ 😖

Not quite. The OK state is still processed correctly; the error shown above appears only when the status is not OK.

@mjbear
Contributor

mjbear commented Oct 2, 2024

@pnpestov
I'm perplexed since the test data in ntc-templates doesn't generate any modifications or fail with an error.

I decided to change a few items (so best copy the entire template to be safe).

  • reverted the "healthy" ha status from anchoring at end of line $$
  • added flexibility of white space in the rule that state transitions to UnhealthyStatus state
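To make the white space point concrete, here is a small sketch using only Python's `re` module (TextFSM's `$$` corresponds to a regex `$`). The sample line, with its single trailing space, is taken from the character dump earlier in this thread. The `\s*` variant is an assumption about what "flexibility of white space" looks like, not necessarily the exact rule on the feature branch.

```python
import re

# First line of the unhealthy output as seen in the character dump:
# note the single trailing space after the colon.
header_line = "HA Health Status: "

strict_rule = re.compile(r"^HA\s+Health\s+Status:$")       # end anchored directly after the colon
flexible_rule = re.compile(r"^HA\s+Health\s+Status:\s*$")  # assumed variant tolerating trailing white space

strict_hit = strict_rule.match(header_line) is not None
flexible_hit = flexible_rule.match(header_line) is not None

print("strict rule matches:  ", strict_hit)
print("flexible rule matches:", flexible_hit)
```

If the strict rule reports no match while the flexible one does, trailing white space alone is enough to keep the parser from entering the UnhealthyStatus state.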

Please give this a shot.
If that still errors out, please take the CLI output and put it in a plain text file (ex: Notepad, Notepad++, VS Code). Then attach that plain text file to this issue thread.

🤔 There has to be some sort of white space I'm missing within this GH issue thread.

Edit: Gah, this has issues too. Hmmmf
Edit2: Well maybe not, something got weird there with textfsm.nornir.tech, but it did parse fine on my local machine (as it has been).

@pnpestov
Author

pnpestov commented Oct 2, 2024

I'll try it now.
For your information, with the template shown below both the OK and the WARNING statuses are parsed without error. However, only one WARNING line is returned, not both as I would like.

Start
  ^HA\s+Health\s+Status:\s+${HA_HEALTH}
  ^    ${HA_HEALTH};
  # ^HA\s+Health\s+Status:$$ -> UnhealthyStatus
  ^Model:\s+${MODEL}
  ^Mode:\s+${HA_MODE}
  ^Group:\s+${HA_GROUP}
  ^Debug:\s+\d+
  ^Cluster\s+Uptime:\s+${CLUSTER_UPTIME}
  ^Cluster\s+state\s+change\s+time:\s+${CLUSTER_STATE_CHANGED_TIME}
  ^(Master|Primary)\s+selected\s+using:
  ^\s*\<\S+
  ^ses_pickup:\s+${HA_SESSION_PICKUP_STATUS},\s+ses_pickup_delay=${HA_SESSION_PICKUP_DELAY}
  ^override:\s+${HA_OVERRIDE_STATUS}
  ^Configuration\s+Status: -> Configuration_Status
  # Catch old 6.0_noha with no "Configuraton Status"
  ^System\s+Usage\s+stats: ->  System_Usage_stats
  ^. -> Error "in-Start"

Apparently the ${HA_HEALTH} value is overwritten each time a rule matches.

@pnpestov
Author

pnpestov commented Oct 2, 2024

Your last template works correctly for the OK status. For the WARNING status it returns nothing in the desired variable, but no error occurs.
I saved the output to a file in Notepad++ and saw CRLF line endings there which, as far as I know, correspond to the characters \r\n.
Example.txt
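A quick way to check whether the CRLF endings themselves matter is to split the raw text the way Python's `str.splitlines()` does and look at what remains on each line. This is a sketch with a hand-built sample (the `\r\n` endings are added here for illustration). That TextFSM splits its input with `splitlines()` is my understanding of the parser and worth verifying against the installed version.

```python
# Hand-built sample: lines from this issue with CRLF endings added for illustration.
raw = (
    "HA Health Status: \r\n"
    "    WARNING: FGT40XXXXXXXXXXX has hbdev down; \r\n"
    "Model: FortiGate-40F\r\n"
)

lines = raw.splitlines()  # treats "\r\n" as a single line break
for line in lines:
    print(repr(line))     # repr() makes trailing spaces visible

has_cr = any("\r" in line for line in lines)
trailing_ws = [line for line in lines if line != line.rstrip()]

print("carriage returns left over:", has_cr)
print("lines with trailing white space:", len(trailing_ws))
```

In this sample the carriage returns disappear after splitting, while the trailing spaces stay, so the spaces are the more likely thing for an end-of-line anchor to trip over.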

@pnpestov
Author

pnpestov commented Oct 2, 2024

@mjbear
I thought about it a little. I suspect the string type of the HA_HEALTH variable does not suit us, since the value can always span multiple lines. A good solution would be a list-type variable. I tweaked the template slightly:

Value List HA_HEALTH (\S+|.*)
Value MODEL (\S+)
Value HA_MODE ([\S\s]+)
Value HA_GROUP (\S+)
Value CLUSTER_UPTIME ([\S\s]+)
Value CLUSTER_STATE_CHANGED_TIME ([\S\s]+)
Value HA_SESSION_PICKUP_STATUS (\S+)
Value HA_SESSION_PICKUP_DELAY (\S+)
Value HA_OVERRIDE_STATUS (\S+)
Value HA_MASTER_UNIT_NAME (\S+)
Value HA_SLAVE_UNIT_NAME (\S+)
Value HA_MASTER_UNIT_SERIAL (\S+)
Value HA_SLAVE_UNIT_SERIAL (\S+)
Value HA_MASTER_UNIT_INDEX (\S+)
Value HA_SLAVE_UNIT_INDEX (\S+)

Start
  ^HA\s+Health\s+Status:\s+${HA_HEALTH}
  ^\s+${HA_HEALTH};
  # ^HA\s+Health\s+Status:$$ -> UnhealthyStatus

I get the following HA_HEALTH values in the output:
with OK
['OK']
with WARNING (hbdev)
['', 'WARNING: FGT40FTK20072059 has hbdev down', 'WARNING: FGT40FTK20076253 has hbdev down']

@mjbear
Contributor

mjbear commented Oct 2, 2024

@mjbear I thought a little. I dare to assume that the string type of the HA_HEALTH variable is not suitable for us, since there is always a possibility that the value may be multi-string. A good solution would be to use a list type variable. Slightly tweaked the template

Value List HA_HEALTH (\S+|.*)

@pnpestov
Ah, you have a clustered pair of Fortinet devices and both just happened to have warning(s).
Good catch on the multi-line health statuses. I like it.

['', 'WARNING: FGT40FTK20072059 has hbdev down', 'WARNING: FGT40FTK20076253 has hbdev down']

The (\S+|.*) is still too loose in my opinion since .* means zero or more characters.

So far our test data shows we have either OK or WARNING: some other text. This means at a minimum there would be one group of non-white space strings. That's why there's the pair of empty single quotes '' at the beginning of that list (in your example).

^HA\s+Health\s+Status:$$ -> UnhealthyStatus
Gotta figure out why this rule isn't matching as that's the core to the issue you're seeing.
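The empty first list item can be reproduced outside TextFSM. The sketch below mimics a `Value List` with plain `re`: it applies the two capturing rules from the tweaked template to the sample lines and appends every capture. The `collect` helper and the stricter `(\S+.*)` pattern are illustrative assumptions, not code from the template.

```python
import re

# Lines as they appear in the sample output, including trailing spaces.
sample_lines = [
    "HA Health Status: ",
    "    WARNING: FGT40XXXXXXXXXXX has hbdev down; ",
    "    WARNING: FGT40YYYYYYYYYYY has hbdev down; ",
    "Model: FortiGate-40F",
]


def collect(value_regex):
    """Mimic a `Value List`: append every capture produced by the two rules."""
    status_rule = re.compile(r"^HA\s+Health\s+Status:\s+" + value_regex)
    warning_rule = re.compile(r"^\s+" + value_regex + ";")
    collected = []
    for line in sample_lines:
        # The first rule that matches wins, as within a TextFSM state.
        hit = status_rule.match(line) or warning_rule.match(line)
        if hit:
            collected.append(hit.group(1))
    return collected


loose = collect(r"(\S+|.*)")   # pattern currently in the template
strict = collect(r"(\S+.*)")   # requires at least one non-white space character

print(loose)
print(strict)
```

With the loose pattern the `.*` branch matches nothing after the trailing space and contributes an empty string; requiring at least one non-white space character removes that entry.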

@mjbear
Contributor

mjbear commented Oct 2, 2024

@pnpestov
I did push changes to the branch on my fork, but I have a feeling if you test it there will still be a failure. 😞

@pnpestov
Author

pnpestov commented Oct 2, 2024

^HA\s+Health\s+Status:$$ -> UnhealthyStatus
Gotta figure out why this rule isn't matching as that's the core to the issue you're seeing.

It seems to me that the line
^HA\s+Health\s+Status:\s+${HA_HEALTH}
preempts it, since each line ends with the characters \r\n. This is confirmed by the first item of the list: ''

@pnpestov
Author

pnpestov commented Oct 2, 2024

I did push changes to the branch on my fork, but I have a feeling if you test it there will still be a failure.

I checked, there is no error, but the variable also outputs [] in the case of WARNING

@mjbear
Contributor

mjbear commented Oct 2, 2024

^HA\s+Health\s+Status:$$ -> UnhealthyStatus
Gotta figure out why this rule isn't matching as that's the core to the issue you're seeing.

It seems to me that the line ^HA\s+Health\s+Status:\s+${HA_HEALTH} interrupts him.

Depends on which capture group is in use (please read my note below).

Since each line ends with characters \r\n . It is confirmed that the first item of the list is ''

The situation you describe is why I made the capture group require at least one non-white space character and then the .* pattern (which would be zero or more of any character).

Value List HA_HEALTH (\S+(?:.*))

I did push changes to the branch on my fork, but I have a feeling if you test it there will still be a failure.

I checked, there is no error, but the variable also outputs []

I don't expect any change in behavior even if we were to make the non-capturing group (?:.*) optional with that last question mark. 👉 You could give this a try if you like.
Value List HA_HEALTH (\S+(?:.*)?)
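The expectation that the trailing question mark changes nothing can be checked with plain `re`, since `.*` already matches zero characters. This is a sketch: the `capture` helper is illustrative, and the third sample line is a hypothetical single-line form of the warning.

```python
import re

prefix = r"^HA\s+Health\s+Status:\s+"
required = re.compile(prefix + r"(\S+(?:.*))")
optional = re.compile(prefix + r"(\S+(?:.*)?)")

samples = [
    "HA Health Status: OK",
    "HA Health Status: ",  # unhealthy header line with its trailing space
    "HA Health Status: WARNING: FGT40XXXXXXXXXXX has hbdev down;",  # hypothetical
]


def capture(pattern, line):
    """Return the captured group, or None when the rule does not match."""
    hit = pattern.match(line)
    return hit.group(1) if hit else None


results = [(capture(required, s), capture(optional, s)) for s in samples]
for line, pair in zip(samples, results):
    print(repr(line), "->", pair)
```

Both patterns give the same capture for every sample, and neither matches the bare header line, so that line has to be picked up by a separate transition rule.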

@pnpestov
Author

pnpestov commented Oct 2, 2024

I don't expect any change in behavior even if we were to make the non-capturing group (?:.*) optional with that last question mark. 👉 You could give this a try if you like.
Value List HA_HEALTH (\S+(?:.*)?)

I get the same [] at the output

@mjbear
Contributor

mjbear commented Oct 3, 2024

@pnpestov
I certainly would like to solve this.
(For what it's worth I won't be able to check on this until tomorrow evening.)

So far the template changes I made were run against the test data you provided as well as the existing test data. Those tests parsed successfully.

I haven't seen what you have configured/coded so I have an unfounded suspicion that somehow another copy of ntc-templates is being accidentally referenced.
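One way to rule out a stray copy is to ask Python where it would import the packages from. The sketch below uses only the standard library; `locate_package` is a hypothetical helper name, and the `NTC_TEMPLATES_DIR` environment variable is, to my knowledge, how ntc-templates can be pointed at a different template directory (treat that as an assumption to verify).

```python
import importlib.util
import os


def locate_package(name):
    """Return the file a package would be imported from, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


for package in ("ntc_templates", "textfsm"):
    print(package, "->", locate_package(package))

# Assumption: a custom template directory may be configured through this variable.
print("NTC_TEMPLATES_DIR ->", os.environ.get("NTC_TEMPLATES_DIR"))
```

If the printed path is not the checkout that contains the edited template, the changes are not the ones being tested.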

Hopefully you can humor me by setting up a basic script (see below 👇) that should tell us for sure if it's a template syntax issue. (That is - for testing right now don't have Netmiko do the parsing via use_textfsm=True.)

🙏 I hope this helps identify the issue. 🙏

Netmiko to retrieve text, but manually parse with TextFSM

Below you'll find an untested script (don't have Fortinet gear) modified from this example.

import textfsm
from netmiko import ConnectHandler

device = {
  'device_type': 'fortinet',
  'host':   'X.X.X.X',
  'username': 'test',
  'password': 'password',
  # 'port' : 22,          # optional, defaults to 22
  # 'secret': 'secret',     # optional, defaults to ''
}

net_connect = ConnectHandler(**device)
output = net_connect.send_command("get system ha status")
net_connect.disconnect()  # close the SSH session once the raw text is retrieved

# Open the template under test explicitly so no other copy is picked up.
with open('/tmp/fortinet_get_system_ha_status.textfsm') as template:
    re_table = textfsm.TextFSM(template)  # initialize textfsm object

fsm_results = re_table.ParseText(output)  # parse output text with textfsm object

# ParseText returns a list of rows, each row being a list of values.
# The column names are in re_table.header, so pair them with each row
# to build one dict per record.
results = [dict(zip(re_table.header, row)) for row in fsm_results]

print(results)

Parsing with content in local files

And if you'd like to parse the raw output against the template, update the paths to reflect your local system and give it a try. (json.dumps solution for prettier output.)

import json
# from pprint import pprint

import textfsm

with open('/tmp/fortinet_example.txt') as fh:
    raw = fh.read()

# The template is read when the TextFSM object is created,
# so the file can be closed right afterwards.
with open('/tmp/fortinet_get_system_ha_status.textfsm') as tpl_fh:
    tfsm_parser = textfsm.TextFSM(tpl_fh)

parsed = tfsm_parser.ParseText(raw)

# pprint(parsed)
print(json.dumps(parsed, indent=4))
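The file-based script prints bare rows, which are harder to compare with the dict output of the Netmiko script. A minimal sketch of pairing rows with column names follows. The shortened `header` and the single row are typed in by hand from values in this issue; in the real script the names would come from `tfsm_parser.header`.

```python
import json

# Hand-typed subset of the template's columns and one matching row (illustrative only).
header = ["HA_HEALTH", "MODEL", "HA_MODE", "HA_GROUP"]
rows = [[["OK"], "FortiGate-40F", "HA A-P", "28"]]

# One dict per parsed row, keyed by column name.
records = [dict(zip(header, row)) for row in rows]

print(json.dumps(records, indent=4))
```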

@pnpestov
Author

pnpestov commented Oct 3, 2024

@mjbear
I ran the scripts you provided against different HA_HEALTH statuses using https://raw.githubusercontent.com/mjbear/ntc-templates/refs/heads/fortinet_get_sys_ha_hbdev_issue1859/ntc_templates/templates/fortinet_get_system_ha_status.textfsm Here are the results:

Netmiko
HA Health Status: OK

[{'HA_HEALTH': ['OK'], 'MODEL': 'FortiGate-40F', 'HA_MODE': 'HA A-P', 'HA_GROUP': '28', 'CLUSTER_UPTIME': '23 days 19:13:5', 'CLUSTER_STATE_CHANGED_TIME': '2024-10-02 20:32:30', 'HA_SESSION_PICKUP_STATUS': 'enable', 'HA_SESSION_PICKUP_DELAY': 'disable', 'HA_OVERRIDE_STATUS': 'enable', 'HA_MASTER_UNIT_NAME': 'GTA1-fw-a', 'HA_SLAVE_UNIT_NAME': 'GTA1-fw-b', 'HA_MASTER_UNIT_SERIAL': 'FGT40FTK20072059', 'HA_SLAVE_UNIT_SERIAL': 'FGT40FTK20076253', 'HA_MASTER_UNIT_INDEX': '1', 'HA_SLAVE_UNIT_INDEX': '0'}]

HA Health Status:
WARNING: FGT40FTK20072059 has hbdev down;
WARNING: FGT40FTK20076253 has hbdev down;

[{'HA_HEALTH': [], 'MODEL': 'FortiGate-40F', 'HA_MODE': 'HA A-P', 'HA_GROUP': '28', 'CLUSTER_UPTIME': '23 days 19:14:20', 'CLUSTER_STATE_CHANGED_TIME': '2024-10-02 20:32:30', 'HA_SESSION_PICKUP_STATUS': 'enable', 'HA_SESSION_PICKUP_DELAY': 'disable', 'HA_OVERRIDE_STATUS': 'enable', 'HA_MASTER_UNIT_NAME': 'GTA1-fw-a', 'HA_SLAVE_UNIT_NAME': 'GTA1-fw-b', 'HA_MASTER_UNIT_SERIAL': 'FGT40FTK20072059', 'HA_SLAVE_UNIT_SERIAL': 'FGT40FTK20076253', 'HA_MASTER_UNIT_INDEX': '1', 'HA_SLAVE_UNIT_INDEX': '0'}]

From file
HA Health Status: OK

[
    [
        [
            "OK"
        ],
        "FortiGate-40F",
        "HA A-P",
        "28",
        "23 days 19:18:14",
        "2024-10-02 20:32:30",
        "enable",
        "disable",
        "enable",
        "GTA1-fw-a",
        "GTA1-fw-b",
        "FGT40FTK20072059",
        "FGT40FTK20076253",
        "1",
        "0"
    ]
]

HA Health Status:
WARNING: FGT40FTK20072059 has hbdev down;
WARNING: FGT40FTK20076253 has hbdev down;

[
    [
        [
            "WARNING: FGT40FTK20072059 has hbdev down",
            "WARNING: FGT40FTK20076253 has hbdev down"
        ],
        "FortiGate-40F",
        "HA A-P",
        "28",
        "23 days 19:19:57",
        "2024-10-02 20:32:30",
        "enable",
        "disable",
        "enable",
        "GTA1-fw-a",
        "GTA1-fw-b",
        "FGT40FTK20072059",
        "FGT40FTK20076253",
        "1",
        "0"
    ]
]

Software versions:
bcrypt==3.2.2
certifi==2022.9.24
cffi==1.15.0
charset-normalizer==2.1.1
configparser==5.3.0
cryptography==37.0.2
future==0.18.2
idna==3.4
ipaddress==1.0.23
Jinja2==3.1.2
MarkupSafe==2.1.1
multidict==6.0.2
netmiko==4.3.0
ntc_templates==7.1.0
numpy==1.22.4
packaging==23.2
pandas==1.4.2
paramiko==2.11.0
prettytable==3.11.0
pycparser==2.21
PyNaCl==1.5.0
pynetbox==7.3.3
pyserial==3.5
python-dateutil==2.8.2
python-dotenv==0.21.0
pytz==2022.1
PyYAML==6.0
pyzabbix==1.3.1
requests==2.28.1
scp==0.14.4
six==1.16.0
tenacity==8.0.1
textfsm==1.1.3
tk==0.1.0
urllib3==1.26.12
wcwidth==0.2.13
yarl==1.8.1
The scripts and files used are attached.
Downloads.zip

@mjbear
Contributor

mjbear commented Oct 3, 2024

@pnpestov
I couldn't help but check GH before I get going today.

[{'HA_HEALTH': [], 'MODEL': 'FortiGate-40F', ... snipped ...

I suspect this indicates there's some character that apparently isn't white space at the end of the "HA Health Status:" line when Fortinet is in an unhealthy state.

HA Health Status:
WARNING: FGT40FTK20072059 has hbdev down;
WARNING: FGT40FTK20076253 has hbdev down;

[
    [
        [
            "WARNING: FGT40FTK20072059 has hbdev down",
            "WARNING: FGT40FTK20076253 has hbdev down"
        ],
        "FortiGate-40F",
... snipped ...
    ]
]

This is good in that it shows us the template is functional on text in a file (where some terminal chars may have been lost) and points to there being extra characters.
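To look for such extra characters directly, the raw Netmiko string can be scanned for anything that is not printable ASCII. This is a sketch: `hidden_characters` is a hypothetical helper, and the "dirty" line is a made-up example containing a tab and an ANSI escape sequence, not data from the device.

```python
def hidden_characters(line):
    """List (index, repr) for every character outside printable ASCII (space through tilde)."""
    return [(i, repr(ch)) for i, ch in enumerate(line) if not (" " <= ch <= "~")]


clean = "HA Health Status: "        # header line as shown in the character dump
dirty = "HA Health Status:\t\x1b[K"  # made-up example with a tab and an escape sequence

print(hidden_characters(clean))
print(hidden_characters(dirty))
```

Running this over `output.splitlines()` in the Netmiko script would show whether the header line carries anything beyond the trailing space.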

Instead of a \s*, we could go with .* and still anchor it at the end of the line.
^HA\s+Health\s+Status:.*$$ -> UnhealthyStatus

Note: If the regex on the previous line doesn't work you can try making that above rule look like ^HA\s+Health\s+Status: -> UnhealthyStatus ... but that rule must be after the one that captures the healthy status.
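The ordering constraint follows from rules in a state being tried top to bottom, with the first match winning. The sketch below imitates that with plain `re`. The `first_match` helper and the capture pattern `(\S+.*)` are illustrative assumptions rather than the exact rules on the feature branch.

```python
import re

CAPTURE = ("capture HA_HEALTH", re.compile(r"^HA\s+Health\s+Status:\s+(\S+.*)"))
TRANSITION = ("-> UnhealthyStatus", re.compile(r"^HA\s+Health\s+Status:"))


def first_match(rules, line):
    """Return the label of the first rule whose regex matches the line."""
    for label, regex in rules:
        if regex.match(line):
            return label
    return None


healthy = "HA Health Status: OK"
unhealthy = "HA Health Status: "

good_order = [CAPTURE, TRANSITION]  # healthy capture first, loose transition second
bad_order = [TRANSITION, CAPTURE]   # loose transition swallows the healthy line

for name, rules in (("good", good_order), ("bad", bad_order)):
    print(name, "| healthy:", first_match(rules, healthy),
          "| unhealthy:", first_match(rules, unhealthy))
```

With the loose transition rule first, the healthy line never reaches the capturing rule, which is why the unanchored variant has to come second.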

✨ I've updated that feature branch on my fork of ntc-templates if you're feeling in the mood for another test.
(Apologies for this back-and-forth with "try this". 😅)

@pnpestov
Author

pnpestov commented Oct 3, 2024

✨ I've updated that feature branch on my fork of ntc-templates if you're feeling in the mood for another test.
(Apologies for this back-and-forth with "try this". 😅)

Yes, I'm up for it! Besides, we haven't run out of ideas yet, and at the very least this ticket improves the template.
I checked it; the output is still [].
Unfortunately, I don't have much time today, so I can't actively test new changes.
