Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug #70570: json_encode and var_dump ignore simplexml cdata #12163

Closed
wants to merge 1 commit into from

Conversation

nielsdos
Copy link
Member

@nielsdos nielsdos commented Sep 9, 2023

A sanity check for the behaviour would be nice :)
Sending to master because of potential BC break.

XML_CDATA_SECTION_NODE was simply not handled, handle it the same way as we handle text nodes for consistency reasons.
We introduce the php_sxe_is_inclusive_text_node() helper for text-like nodes. In DOM parlance, these 2 types are called inclusive text nodes. Unfortunately, the fact that we handle CData the same as text now has a technical BC break.

Previously this XML:

<?xml version="1.0" encoding="UTF-8"?>
<container>
    <C><![CDATA[hello]]><foo/><![CDATA[world]]></C>
</container>

resulted in this var_dump output:

object(SimpleXMLElement)#1 (1) {
  ["C"]=>
  object(SimpleXMLElement)#2 (1) {
    ["foo"]=>
    object(SimpleXMLElement)#3 (0) {
    }
  }
}

However, after this patch (as text is not an element and thus handled specially) we get the following output:

object(SimpleXMLElement)#1 (1) {
  ["C"]=>
  string(10) "helloworld"
}

XML_CDATA_SECTION_NODE was simply not handled, handle it the same way as
we handle text nodes for consistency reasons.
We introduce the php_sxe_is_inclusive_text_node() helper for text-like
nodes. In DOM parlance, these 2 types are called inclusive text nodes.
Unfortunately, the fact that we handle CData the same as text now has a
technical BC break.

Previously this XML:
```
<?xml version="1.0" encoding="UTF-8"?>
<container>
    <C><![CDATA[hello]]><foo/><![CDATA[world]]></C>
</container>
```

resulted in this var_dump output:
```
object(SimpleXMLElement)#1 (1) {
  ["C"]=>
  object(SimpleXMLElement)#2 (1) {
    ["foo"]=>
    object(SimpleXMLElement)#3 (0) {
    }
  }
}
```

However, after this patch (as text is not an element and thus handled
specially) we get the following output:
```
object(SimpleXMLElement)#1 (1) {
  ["C"]=>
  string(10) "helloworld"
}
```
@nielsdos
Copy link
Member Author

Converting to draft, as name may not be set and therefore getName() may do a null access. (I think)

@nielsdos nielsdos marked this pull request as draft September 10, 2023 16:43
Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it strange that text nodes are not displayed in the first place. I wonder what the rationale for that is.

@nielsdos
Copy link
Member Author

A general solution for the inconsistencies in simplexml would be better, and would likely need to go through an RFC because it can have BC impact.

@nielsdos nielsdos closed this Nov 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants