ANTLR 3 - Customizing token numbers
> I'm not sure if lexers support vocabulary importing (I know
> parsers do), but if they do then you should be able to do it that
> way -- make a tokens file and import it into the lexer. Worth a
> try, anyway :)
Ahh, of course, I should have tried that.
And happily, it works! But I found the following caveats.
There is a bug that occurs when ANTLR imports and then exports a backslash. So if I have
parser grammar FooParser;
lexer grammar Foo;
// In Foo2.tokens
// In the generated Foo.tokens
'\\'=31 // Added by ANTLR
This causes a syntax error when compiling the parser. There also seems to be a second bug in ANTLRWorks: after the syntax error, it keeps repeating the same error every time you try to Generate Code, until you quit and restart the program.
By the way, I found that
Seems to work as a single backslash.
There is another important caveat: ANTLR cannot handle "holes" (unused numbers in the list of tokens) when importing tokens into the parser. You must start numbering tokens at 4 and continue upward with consecutive integers. The problem is that the token names array in your parser, called tokenNames, will not contain empty elements for the unused numbers, so if your tokens are
then your token array will be
public static readonly string[] tokenNames = new string[]
Therefore, token name lookups will not work correctly.
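To make the mismatch concrete, here is a small Python sketch of the behaviour described above. It is not ANTLR's actual code: the token names ID and INT, their numbers, and the helper functions are made up for illustration. The only things taken from ANTLR 3 are the four reserved entries at indices 0 to 3 and the fact that names are packed with no empty slots.

```python
# Sketch only: mimics a packed tokenNames array, not ANTLR's real generator.
# Types 0-3 are reserved in ANTLR 3, which is why user tokens start at 4.
RESERVED = ["<invalid>", "<EOR>", "<DOWN>", "<UP>"]


def packed_token_names(tokens):
    """Pack names in order of token type, leaving no slot for unused numbers."""
    ordered = sorted(tokens.items(), key=lambda item: item[1])
    return RESERVED + [name for name, _ in ordered]


def lookup(names, token_type):
    """Index the array by token type, as a name lookup would."""
    if 0 <= token_type < len(names):
        return names[token_type]
    return None  # stands in for an out-of-range error


# Hypothetical vocabulary with a hole: type 5 is unused.
tokens_with_hole = {"ID": 4, "INT": 6}
names = packed_token_names(tokens_with_hole)

print(lookup(names, 4))  # ID  (correct)
print(lookup(names, 5))  # INT (wrong: 5 is not a token at all)
print(lookup(names, 6))  # None (INT's real type falls off the end)
```

Every token numbered after the hole ends up one slot too early, so the lookup returns either the wrong name or nothing.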
On the plus side, you do not have to define all tokens in your .tokens file; ANTLR can add any additional tokens you define in the lexer and will number them correctly.
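Since only the hand-written .tokens file needs to be gap-free, it may be worth checking it before feeding it to ANTLR. Below is a rough Python sketch of such a check. The function name find_holes and the sample contents are my own invention, and it assumes each line has the form name=number, like the '\\'=31 line above.

```python
def find_holes(tokens_text, first_type=4):
    """Return the unused token numbers between first_type and the highest one used."""
    used = set()
    for line in tokens_text.splitlines():
        # Split on the LAST '=' so that a literal such as '=' still parses.
        _name, sep, number = line.strip().rpartition("=")
        if sep and number.strip().isdigit():
            used.add(int(number))
    if not used:
        return []
    return [n for n in range(first_type, max(used) + 1) if n not in used]


# Hypothetical file contents, for illustration only.
sample = "ID=4\nINT=6\n'='=7\n"
print(find_holes(sample))  # [5]
```

An empty list means the numbering starts at 4 and is consecutive. Two names sharing one number (a named token and its literal, say) are counted once, so they are not reported as a problem.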
P.S. I'm using the C# target; perhaps YMMV for Java etc.