
Tutorial 4: Fragments definition

Ilya Lakhin edited this page Oct 17, 2013 · 4 revisions

Before the syntax parsing stage, the parser selects simple syntactic Fragments enclosed between pairs of specific tokens, and uses these fragments to cache parsing results during the syntax parsing stage. Such fragments can be, for example, code blocks between the "{" and "}" tokens in C++, or a function definition between the keywords "def" and "end" in Ruby. Most programming languages have such pairs. The main property of a Fragment is that its meaning in the context of the language's syntax (or, more specifically, the syntax rule that can be applied to it) should be invariant to its content.
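Selecting fragments between a pair of tokens is essentially bracket matching. As a minimal sketch (not Papa Carlo's implementation), assuming the source has already been split into tokens, a stack of unmatched opening tokens is enough to pair each closing token with its opener. The `FragmentSelector` class and its `select` method are hypothetical names used only for illustration:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, not Papa Carlo's code: selects the fragments enclosed
// by one pair of tokens, using a stack of unmatched opening tokens.
public class FragmentSelector {

    // A fragment is identified by the positions of its opening and closing tokens.
    record Fragment(int begin, int end) {}

    static List<Fragment> select(List<String> tokens, String open, String close) {
        List<Fragment> fragments = new ArrayList<>();
        Deque<Integer> openers = new ArrayDeque<>();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            if (token.equals(open)) {
                openers.push(i);
            } else if (token.equals(close) && !openers.isEmpty()) {
                // The innermost unmatched opener pairs with this closer.
                fragments.add(new Fragment(openers.pop(), i));
            }
        }
        // Fragments are listed in the order of their closing tokens (inner first).
        return fragments;
    }

    public static void main(String[] args) {
        // Assumed pre-tokenized C++-like input: void f() { if (x) { y; } }
        List<String> tokens = List.of(
            "void", "f", "(", ")", "{", "if", "(", "x", ")", "{", "y", ";", "}", "}");
        System.out.println(select(tokens, "{", "}"));
        // prints [Fragment[begin=9, end=12], Fragment[begin=4, end=13]]
    }
}
```

Because the opening and closing tokens are parameters, the same routine also handles keyword pairs such as `def`/`end`, as long as no other construct shares the closing token.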

Let's illustrate this with an example. Here is a Java snippet:

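```java
import java.util.HashMap;
import java.util.Map;

public class FibCalculator {
    // Memo table seeded with the base cases fib(0) = 0 and fib(1) = 1.
    private static final Map<Integer, Integer> memoized = new HashMap<>(Map.of(0, 0, 1, 1));

    public static int fibonacci(int fibIndex) {
        if (memoized.containsKey(fibIndex)) {
            return memoized.get(fibIndex);
        } else {
            int answer = fibonacci(fibIndex - 1) + fibonacci(fibIndex - 2);
            memoized.put(fibIndex, answer);
            return answer;
        }
    }
}
```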

There are four code fragments between the curly braces "{" and "}": the class body, the method body, the `if` branch of the conditional statement, and its `else` branch. Each of these fragments keeps its type regardless of its internal content. The method body remains a method body even if the programmer adds another statement to it, or even writes something syntactically incorrect in it. Of course, the meaning of nested fragments may change when their parent fragment changes, but that does not matter here.

To summarise, Papa Carlo needs the developer to define code Fragments with the following properties:

  1. They can be identified simply as a code sequence between two tokens.
  2. Their syntactic meaning is invariant to their internal content, at least in most cases.
  3. The code is normally expected to contain many such fragments.
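Property 2 is what lets parsing results be reused per fragment. The sketch below is a hypothetical illustration, not Papa Carlo's cache: it assumes a fragment's parse result is determined by the syntax rule applied to it (its kind, which property 2 keeps stable) together with its text, so a cache keyed on that pair only calls the parser for fragments whose content is new. `FragmentCache` is an invented name, and `String::length` stands in for a real fragment parser:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: caches per-fragment parse results keyed by (kind, text).
// Assumes the kind (the syntax rule applied to the fragment) stays stable under
// edits, so kind plus text is taken to determine the parse result.
public class FragmentCache<R> {

    record Key(String kind, String text) {}

    private final Map<Key, R> results = new HashMap<>();
    private final Function<String, R> parser; // stand-in for a real fragment parser
    private int parseCount = 0;

    FragmentCache(Function<String, R> parser) {
        this.parser = parser;
    }

    R parse(String kind, String text) {
        // Only a fragment whose (kind, text) pair is new reaches the parser.
        return results.computeIfAbsent(new Key(kind, text), key -> {
            parseCount++;
            return parser.apply(key.text());
        });
    }

    int parseCount() {
        return parseCount;
    }

    public static void main(String[] args) {
        FragmentCache<Integer> cache = new FragmentCache<>(String::length);
        cache.parse("block", "{ return answer; }");
        cache.parse("block", "{ memoized.put(fibIndex, answer); }");
        // After an edit elsewhere, the unchanged fragment is a cache hit.
        cache.parse("block", "{ return answer; }");
        System.out.println(cache.parseCount()); // prints 2
    }
}
```

Property 3 is what makes this worthwhile: the more fragments the code contains, the smaller the share of the source that has to be parsed again after a typical edit.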

Fortunately, most modern programming languages meet these requirements.

Defining fragments is very simple. This is how it is done in the JSON parser example:

```scala
private def contextualizer = {
  val contextualizer = new Contextualizer

  import contextualizer._

  trackContext("[", "]").allowCaching
  trackContext("{", "}").allowCaching

  contextualizer
}
```

Calling the `allowCaching` method here indicates that fragments of this kind may be bound to cached parts of the resulting Abstract Syntax Tree. The caching process is described in more detail in the next topic.

Fragments can also change token flags. For example, a fragment of a specific kind can be marked to be ignored entirely during syntax parsing, which is useful for defining code comments:

```scala
trackContext("/*", "*/").forceSkip.topContext
trackContext("\"", "\"").topContext
```

This defines two fragment types: the first for code comments such as `/* comments to be ignored */`, and the second for string literals (potentially multiline strings). The `forceSkip` method indicates that all tokens inside the fragment should be ignored, and the `topContext` method indicates that no fragments can be nested inside it. For example, in `{ x = "abc } def"; }` the first closing curly brace lies inside a string literal, so it must not be treated as the end of the curly-brace fragment.
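The effect of the two flags can be sketched with a small scanner. This is a hypothetical illustration, not the `Contextualizer` API: the `Rule` record with its `skip` and `top` fields stands in for `forceSkip` and `topContext`, and the input is assumed to be pre-tokenized. Closing tokens are checked before opening ones, so a delimiter that both opens and closes a context (like `"`) works, and inside a `top` context only that context's own closing token is recognized:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of context tracking with two flags:
// skip = tokens inside are hidden from the syntax parser (like forceSkip),
// top  = no context may be opened inside this one (like topContext).
public class ContextScanner {

    record Rule(String open, String close, boolean skip, boolean top) {}

    record Fragment(Rule rule, int begin, int end) {}

    static List<Fragment> scan(List<String> tokens, List<Rule> rules) {
        List<Fragment> fragments = new ArrayList<>();
        Deque<Fragment> stack = new ArrayDeque<>(); // open contexts, end = -1
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            Fragment current = stack.peek();
            if (current != null && token.equals(current.rule().close())) {
                stack.pop();
                fragments.add(new Fragment(current.rule(), current.begin(), i));
            } else if (current == null || !current.rule().top()) {
                // Outside a top context, a token may open a new context.
                for (Rule rule : rules) {
                    if (token.equals(rule.open())) {
                        stack.push(new Fragment(rule, i, -1));
                        break;
                    }
                }
            }
        }
        return fragments; // contexts still open at the end are not reported
    }

    // Tokens the syntax parser would see: everything outside skip fragments.
    static List<String> parserTokens(List<String> tokens, List<Rule> rules) {
        boolean[] hidden = new boolean[tokens.size()];
        for (Fragment f : scan(tokens, rules)) {
            if (f.rule().skip()) {
                for (int i = f.begin(); i <= f.end(); i++) {
                    hidden[i] = true;
                }
            }
        }
        List<String> visible = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!hidden[i]) {
                visible.add(tokens.get(i));
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            new Rule("{", "}", false, false),
            new Rule("/*", "*/", true, true),
            new Rule("\"", "\"", false, true));
        // Tokens of: { x = "abc } def"; }
        List<String> tokens = List.of("{", "x", "=", "\"", "abc", "}", "def", "\"", ";", "}");
        for (Fragment f : scan(tokens, rules)) {
            System.out.println(f.rule().open() + " " + f.begin() + ".." + f.end());
        }
        // prints " 3..7 and then { 0..9
        System.out.println(parserTokens(List.of("a", "/*", "b", "}", "*/", "c"), rules));
        // prints [a, c]
    }
}
```

The brace fragment ends at the last token (index 9), not at the `}` inside the string, and the comment together with the stray brace inside it never reaches the parser.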

Like the other parsing stages, the fragmentation stage uses a caching approach, so the computational effort the parser needs to update the fragment system is proportional to the changes the end user makes to the source code. In practice, building fragments is quite fast. Moreover, the parser can detect invalid fragments (with a missing opening or closing token) and keep the valid ones intact. For example, in this modified Java snippet:

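```java
import java.util.HashMap;
import java.util.Map;

public class FibCalculator {
    // Memo table seeded with the base cases fib(0) = 0 and fib(1) = 1.
    private static final Map<Integer, Integer> memoized = new HashMap<>(Map.of(0, 0, 1, 1));

    public static int fibonacci(int fibIndex) {
        if (memoized.containsKey(fibIndex)) {
            return memoized.get(fibIndex);
        } else {
            int answer = fibonacci(fibIndex - 1) + fibonacci(fibIndex - 2);
            memoized.put(fibIndex, answer);
            return answer;
damaged line
    }
}
```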

the parser can still track the fragments of the class body, the method body, and even the `if` branch of the conditional statement.
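One way to keep valid fragments after such an edit is to retain the fragments from the previous pass and invalidate only those whose boundary token was removed. The sketch below is a hypothetical illustration of that idea, not Papa Carlo's algorithm: positions are indices in the list of brace tokens of the original snippet (0 to 7), and the deleted token 5 is the brace that closed the `else` branch. For simplicity it shifts absolute indices, which touches every fragment; an implementation whose cost tracks the size of the edit would need relative positions or a similar structure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: fragments from the previous pass are kept, and deleting
// a token invalidates only the fragments that used it as a boundary.
public class IncrementalFragments {

    record Fragment(String name, int begin, int end) {}

    // Removes the token at `index` and returns the fragments that are still valid.
    static List<Fragment> deleteToken(List<Fragment> fragments, int index) {
        List<Fragment> valid = new ArrayList<>();
        for (Fragment f : fragments) {
            if (f.begin() == index || f.end() == index) {
                continue; // lost its opening or closing token: now invalid
            }
            // Tokens after the deleted one move one position to the left.
            int begin = f.begin() > index ? f.begin() - 1 : f.begin();
            int end = f.end() > index ? f.end() - 1 : f.end();
            valid.add(new Fragment(f.name(), begin, end));
        }
        return valid;
    }

    public static void main(String[] args) {
        // Brace tokens of the original FibCalculator snippet, numbered 0..7.
        List<Fragment> fragments = List.of(
            new Fragment("class body", 0, 7),
            new Fragment("method body", 1, 6),
            new Fragment("if branch", 2, 3),
            new Fragment("else branch", 4, 5));
        // The damaged line removed brace 5, which closed the else branch.
        for (Fragment f : deleteToken(fragments, 5)) {
            System.out.println(f.name() + " " + f.begin() + ".." + f.end());
        }
        // prints: class body 0..6, method body 1..5, if branch 2..3
    }
}
```

A plain stack-based rescan of the damaged code would instead pair the remaining braces differently and report the class body as unclosed, which is why keeping the previous pairing is attractive here.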