Overview

Graphlr: indexing antlr3 generated Java AST through a Neo4j graph

3 Comments

While working on my Sonar fork which allows to simulate refactoring without actually touching source files I have once again realized what a PITA it is to traverse the antlr-generated Abstract Syntax Tree (AST) for Java. The mechanism is absolutely cool, no doubt. But the final AST representation is not intuitive, and the corresponding traversal code always looks ugly.

While intensively working with Neo4j, I just asked myself: wouldn’t it be nice to use it as index for the Java-AST? You only need to jump to a relevant node and still can use the classic AST-traversal to get details out of it. Or you could wire the whole AST through an accompanying graph and thus use the graph to traverse the whole AST.

So, here comes Graphlr as the first result. It allows traversing the antlr3 generated Java AST using Neo4j. It’s of course open source and available through the same licence antlr is. Graphlr is nothing more than an antlr grammar/parser/lexer file based on the original one for Java and extending it with, well, code building the Neo4j graph. The graph is a sort of index, if you like, allowing quick jumps into the code spots you’re interested in.

Let’s look at an example, though I assume that you’re familiar with antlr, especially antlr3.

package org.pbit.graphlr;
 
import java.util.List;
 
import org.antlr.runtime.ANTLRFileStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.tree.Tree;
 
 
public class GraphlrTest {
    public static void main(String[] args) throws Exception {
        try {
            CharStream input = new ANTLRFileStream("/data/workspace/Graphlr/src/org/pbit/graphlr/GraphlrTest.java");
            GraphlrJavaLexer lex = new GraphlrJavaLexer(input);
            CommonTokenStream tokens = new CommonTokenStream(lex);
            GraphlrJavaParser parser = new GraphlrJavaParser(tokens);
            parser.compilationUnit();
 
            //find compilation units
            List<Tree> l = parser.runCypher("start ret=node:node_auto_index(type = 'class') return ret");
            for (Tree t : l) {
                System.out.println("Compilation unit found: " + t.getText()); 
            }
 
            //find methods of a particular class
            l = parser.runCypher("start n=node:node_auto_index(name = 'GraphlrTest') match n-[:IMPLEMENTS]->ret return ret");
            for (Tree t : l) {
                System.out.println("GraphlrTest implements: " + t.getText()); 
            }
        } catch(Throwable t) {
            t.printStackTrace();
        }
    }
 
    protected void dummy () {}
}
 
final class Dummy {
 
}

It’s very simple. First, it still works and feels like antlr3. So you use the generated classes to lex and parse your class (I’m doing it with the one including the test, so you can see the relation). Then you start with the compilation unit. From here, you can use available Neo4j nodes as well as indexes to traverse the AST. In my case, I use Cypher.

There are two traversals finding all the top level classes as well as all methods of the main class. That’s basically it.

So, there is still work to be done. Why I started this implementation is the main reason to extend the Neo4j based AST index to allow direct jumps into method calls, variable declaration etc. It’s a bit more tricky with antlr, but still doable.

If you want to check it out or maybe to contribute, feel free to get it from GitHub: https://github.com/pavlobaron/graphlr

Feedback is very welcome.

Kommentare

  • T. Parr

    20. August 2012 von T. Parr

    Hi Pavlo, have you thought how this would look with parse trees? ANTLR v4 does just parse trees not ASTs. so from rule

    stat : ‘return’ expr ‘;’ ;

    we’d get

    (stat ‘return’ (expr 100) ‘;’)

    for input “return 100;”. Not sure what the relationship links would be.

    Ter

    • Pavlo Baron

      27. August 2012 von Pavlo Baron

      Terence,

      sorry for the late reply – have been on holidays ­čÖé

      No, I didn’t think about it yet. I need to have a deeper look at v4. Even more, there are very many possible ways in a Java source file to express linkable entities – calls, declarations and whatnot. I didn’t play with this part yet, need to invest some time into it.

      pb

  • Stefan

    11. April 2015 von Stefan

    Hi Pavlo,
    How much effort would it be to update Graphlr and to build a reverse Graphlr to generate the code from the graph?

    Thanks, Stefan

Comment

Your email address will not be published. Required fields are marked *