Skip to content

A javac-like compiler built over Flex & Bison software tools. It takes in a Java source code file and emits its JVM bytecode & .class files.

License

Notifications You must be signed in to change notification settings

Elzawawy/java-bytecode-generator

Repository files navigation

Yet Another Java ByteCode Generator

(A javac-like Compiler)

Overview

A java bytecode generator is a compiler built over the famous tools Flex & Bison to take in any java source code (a subset only of Java Lang is supported currently) and emits its equivalent bytecode. This is a project developed for the PLT (Programming Language Translation) course at Faculty of Engineering, Alexandria University in Spring 2020 Offering.

The goal is to practice techniques of constructing semantics rules to generate java bytecode. Generated bytecode must follow Standard bytecode instructions defined in [Java Virtual Machine Specification].(http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html)

Our workflow included the following steps:

  1. Understanding the Flex & Bison tools we are going to build upon.
  2. Understanding the Semantic Actions/Rules asscoiated with a Java CFG.
  3. Understanding the Java Bytecode we are required to generate as an intermediate code represenattion.
  4. Building the Java bytecode generator upon the building blocks we understood.

Running & Testing

Input: file that includes java sourcecode that follows the subset of grammar mentioned in next section.
Output: .class file that can be run now on JVM.

To run the program, use the script run.sh as follows:

./run.sh file_name

where filename, can be any file that contains java source code.

For testing purposes while development use this,

./run.sh --debug file_name

or

./run.sh -d file_name

where file_name in this case is a file that exists in the test-cases folder. The extra argument will ensure we keep all our test files from the test-cases folder and picking the suitable unit test case for the part you're developing.

Subset of Java grammar supported (most recent)

Before using our compiler, you must consider looking at this section to understand the input that we're accepting. These isn't a full java grammar but a subset of it. It covers the requirements that were imposed on us by the project to be delivered. We make the natural assumption that input programs must follow the below rules and not just any java code. However, we develop the code to be extensible easily to the rest of the Java set of rules.

method_body: 
            %empty
            |statement_list
            ;

statement_list: 
                statement
                |statement_list statement
                ;

statement:  
        declaration 
        |if 
        |while 
        |assignment
        ;

declaration: primitive_type IDENTIFIER ';';

primitive_type: 
                INT 
                |FLOAT
                ;

if: 
    IF '(' boolean_expression ')'
    '{' statement_list '}' 
    ELSE '{' statement_list '}'
    ;

while: 
        WHILE '(' boolean_expression ')'
        '{' statement_list '}'
        ;

assignment: IDENTIFIER '=' expression ';';

expression:
            INT_NUM
            |FLOAT_NUM
            |IDENTIFIER
            |expression ARITH_OP expression
            |'(' expression ')'
            ;

boolean_expression: 
                    TRUE 
                    |FALSE
                    |expression BOOL_OP expression
                    |expression REL_OP expression

For interested readers: What are Flex & Bison?

Flex and bison are modern replacements for the classic lex and yacc that were both developed at Bell Laboratories in the 1970s. Flex and bison are tools designed for writers of compilers and interpreters, although they are also useful for many applications that will interest noncompiler writers. Any application that looks for patterns in its input or has an input or command language is a good candidate for flex and bison. Furthermore, they allow for rapid application prototyping, easy modification, and simple maintenance of programs.

To stimulate your imagination, here are a few things people have used flex and bison, or their predecessors lex and yacc, to develop:

  • The desktop calculator bc
  • The tools eqn and pic, typesetting preprocessors for mathematical equations and complex pictures
  • Many other “domain-specific languages” targeted for a particular application
  • PCC, the Portable C Compiler used with many Unix systems
  • Flex itself
  • A SQL database language translator

For interested readers: Translation of Productions to Semantic Actions/Rules.

Each production is associated with a semantic actions. This semantic actions will be generated when the production is matched by the parser. The semantic actions/rules we developed in this case is ones that produce teh equivalent java bytecode to the following statement matched by parser.

We explain next two non-terminals and their semantic actions as example to our idea and the rest can be found and traced in the code itself. Then we move to a further step into inermediate code generaton which is essential, namely, Backpatching.

Example 1: Method Body

The generated code consists of a header,footer alongside with the statment list that contains the rest of the code,where the header appends the equivalent java byte-code for the class,the main method and sets the stack and locals size. The footer generetes a return from the method.

Example 2: IF

  • Sends the true list of the boolean expression and the first marker instruction index to the backpatching function.
  • Sends the false list of the boolean expression and the second marker instrcution index to the backpatching function.
  • Perform the merging of lists logic

More Into Code Generation: Backpatching

When generating code for boolean expressions and flow-of-control statements is that of matching a jump instruction with the target of the jump. For example, the translation of the boolean expression B in if ( B ) S, the target of the jump when B is false will not be declared until S is examined. In a one-pass translation, the problem is solved by passing labels as inherited attributes to where the relevant jump instructions were generated. But a separate pass is then needed to bind labels to addresses.

Backpatching counters the problem of two-passes translation by having lists of jumps are passed as synthesized attributes. Specifically, when a jump is generated, the target of the jump is temporarily left unspecified. Each such jump is put on a list of jumps whose labels are to be filled in when the proper label can be determined.

For interested readers: Design & Code Anatomy

  • java_lexical_analyzer.l ---> The Flex input file which the tool generates from a Lexical Analayzer for our Java subset.

  • java_parser.y ---> The Bison input file which the tool generates from a Parser for our Java subset.

  • semantic_actions_utils.h ---> The file that contains all utility functions that are used in the intermediate code generation and semantic actions for each production.

  • Makefile ---> Defines rules for bash commands that is needed to build,test and run the code.

  • run.sh ---> Defines a simple way to execute Make rules easily.


References

  • Compilers, Principles, Techniques, & Tools, Second Edition by Alfred V. Aho & Monica S. Lam & Ravi Sethi & Jeffrey D. Ullman
  • Engineering a Compiler, Second Edition by Keith D. Cooper & Linda Torczon.

Understanding Flex & Bison

Understanding Java Bytecode

Made with ❤️