
Visual Basic 6.0 ANTLR4 Grammar Specification

Overview

This document describes the ANTLR4 grammar specification for Visual Basic 6.0 used by the ProLeap VB6 Parser. The grammar is derived from the official Visual Basic 6.0 language reference and has been tested with MSDN VB6 statements and several Visual Basic 6.0 code repositories.

📝 Note: This grammar is provided for reference purposes. VB6Parse uses its own custom parsing implementation but references this ANTLR4 specification for completeness and comparative analysis.

Grammar File Location

The complete grammar specification can be found at:
VisualBasic6.g4

Grammar Structure

The ANTLR4 grammar is organized into several major sections that correspond to the structure of Visual Basic 6.0 source files.

Module Rules

The top-level grammar rule defines the structure of a VB6 module (class, form, or standard module):

startRule
   : module EOF
   ;

module
   : WS? NEWLINE* (moduleHeader NEWLINE+)? 
     moduleReferences? NEWLINE* 
     controlProperties? NEWLINE* 
     moduleConfig? NEWLINE* 
     moduleAttributes? NEWLINE* 
     moduleOptions? NEWLINE* 
     moduleBody? NEWLINE* WS?
   ;

Module Components

  • moduleHeader: VERSION line (e.g., VERSION 1.0 CLASS)
  • moduleReferences: Object/library references
  • controlProperties: Form control definitions (for .frm files)
  • moduleConfig: BEGIN/END configuration blocks
  • moduleAttributes: Attribute statements
  • moduleOptions: Option Base, Option Explicit, Option Compare, etc.
  • moduleBody: The actual code (functions, subs, declarations)
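
To make the moduleHeader rule concrete, here is a minimal Rust sketch that reads the version number and optional CLASS marker from a VERSION line. It assumes the two common layouts (VERSION 5.00 at the top of form files, VERSION 1.0 CLASS in class modules); ModuleHeader and parse_module_header are hypothetical names, not part of the grammar or of VB6Parse's API.

```rust
/// Hypothetical result of matching a VERSION header line.
#[derive(Debug, PartialEq)]
pub struct ModuleHeader {
    pub major: u32,
    pub minor: u32,
    pub is_class: bool,
}

/// Parses lines such as `VERSION 5.00` or `VERSION 1.0 CLASS`.
/// Keywords are matched case-insensitively; anything else yields `None`.
pub fn parse_module_header(line: &str) -> Option<ModuleHeader> {
    let mut words = line.split_whitespace();
    if !words.next()?.eq_ignore_ascii_case("VERSION") {
        return None;
    }
    // The version number is written as <major>.<minor>.
    let (major, minor) = words.next()?.split_once('.')?;
    let is_class = match words.next() {
        None => false,
        Some(w) if w.eq_ignore_ascii_case("CLASS") => true,
        Some(_) => return None,
    };
    // Reject trailing tokens after the optional CLASS marker.
    if words.next().is_some() {
        return None;
    }
    Some(ModuleHeader {
        major: major.parse().ok()?,
        minor: minor.parse().ok()?,
        is_class,
    })
}
```

Standard modules (.bas) normally start with an Attribute VB_Name line instead of a VERSION line, which is why moduleHeader is optional in the module rule.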

Control Properties

Form files (.frm) contain control property definitions that are parsed using specialized rules:

controlProperties
   : WS? BEGIN WS cp_ControlType WS cp_ControlIdentifier WS? NEWLINE+
     cp_Properties+ 
     END NEWLINE*
   ;

cp_SingleProperty
   : WS? implicitCallStmt_InStmt WS? EQ WS? '$'? 
     cp_PropertyValue FRX_OFFSET? NEWLINE+
   ;

cp_NestedProperty
   : WS? BEGINPROPERTY WS ambiguousIdentifier 
     (LPAREN INTEGERLITERAL RPAREN)? (WS GUID)? NEWLINE+ 
     (cp_Properties+)? 
     ENDPROPERTY NEWLINE+
   ;
🎯 Key Feature: The grammar supports nested properties (BEGINPROPERTY/ENDPROPERTY blocks) and references to binary data in external resource files (such as .frx) via FRX_OFFSET markers.
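
The cp_SingleProperty rule allows a value followed by an FRX_OFFSET, as in Picture = "Form1.frx":01A2. The sketch below splits such a line into name, value, and optional offset. It assumes the offset is written as a colon plus hexadecimal digits directly after a quoted file name; split_property is a hypothetical helper, not a VB6Parse function.

```rust
/// Splits `Name = Value` or `Name = "file.frx":HHHH` into its parts.
/// Returns (name, value, optional FRX offset). Hypothetical helper.
pub fn split_property(line: &str) -> Option<(&str, &str, Option<u32>)> {
    let (name, rest) = line.split_once('=')?;
    let name = name.trim();
    let rest = rest.trim();
    // cp_SingleProperty allows an optional `$` before the value.
    let rest = rest.strip_prefix('$').unwrap_or(rest);
    if name.is_empty() || rest.is_empty() {
        return None;
    }
    // Assumed offset form: a colon plus hex digits after a closing quote.
    if let Some(idx) = rest.rfind(':') {
        let (value, offset) = (&rest[..idx], &rest[idx + 1..]);
        if value.ends_with('"')
            && !offset.is_empty()
            && offset.chars().all(|c| c.is_ascii_hexdigit())
        {
            let off = u32::from_str_radix(offset, 16).ok()?;
            return Some((name, value, Some(off)));
        }
    }
    Some((name, rest, None))
}
```

A colon inside a quoted caption (for example "a:b") is left alone, because the text after it is not a bare run of hex digits.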

Block Statements

The blockStmt rule enumerates the statements that can appear inside a code block. Representative alternatives, grouped by category:

Control Flow

  • doLoopStmt
  • forEachStmt
  • forNextStmt
  • ifThenElseStmt
  • selectCaseStmt
  • whileWendStmt
  • withStmt

File I/O

  • closeStmt
  • getStmt
  • inputStmt
  • lineInputStmt
  • openStmt
  • printStmt
  • putStmt
  • writeStmt

File System

  • chDirStmt
  • chDriveStmt
  • filecopyStmt
  • killStmt
  • mkdirStmt
  • nameStmt
  • rmdirStmt

Error Handling

  • errorStmt
  • onErrorStmt
  • resumeStmt
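
As a rough sketch of how these groups could be recognized from a statement's leading keyword, the function below maps a line to one of the four categories. StmtCategory and categorize are hypothetical; matching is case-insensitive, and anything outside the lists above (assignments, calls, On x GoTo) returns None.

```rust
/// The statement groups listed above (hypothetical enum).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum StmtCategory {
    ControlFlow,
    FileIo,
    FileSystem,
    ErrorHandling,
}

/// Guesses a statement's category from its leading keyword(s).
pub fn categorize(stmt: &str) -> Option<StmtCategory> {
    let lower = stmt.trim().to_ascii_lowercase();
    let mut words = lower.split_whitespace();
    let first = words.next()?;
    let second = words.next().unwrap_or("");
    let category = match first {
        "do" | "for" | "if" | "select" | "while" | "with" => StmtCategory::ControlFlow,
        "close" | "get" | "input" | "open" | "print" | "put" | "write" => StmtCategory::FileIo,
        // `Line Input #n, var` begins with two keywords.
        "line" if second == "input" => StmtCategory::FileIo,
        "chdir" | "chdrive" | "filecopy" | "kill" | "mkdir" | "name" | "rmdir" => {
            StmtCategory::FileSystem
        }
        "error" | "resume" => StmtCategory::ErrorHandling,
        // `On Error ...` and `On Local Error ...`; `On x GoTo ...` is not matched.
        "on" if second == "error" || second == "local" => StmtCategory::ErrorHandling,
        _ => return None,
    };
    Some(category)
}
```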

Declarations

The moduleBodyElement rule lists the constructs that can appear at module level:

moduleBodyElement
   : moduleBlock
   | moduleOption
   | declareStmt        // External API declarations
   | enumerationStmt    // Enum definitions
   | eventStmt          // Event declarations
   | functionStmt       // Function definitions
   | propertyGetStmt    // Property Get
   | propertySetStmt    // Property Set
   | propertyLetStmt    // Property Let
   | subStmt            // Subroutine definitions
   | typeStmt           // User-defined types
   | macroIfThenElseStmt  // Conditional compilation
   ;
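
Most moduleBodyElement alternatives can be told apart by the keyword that follows any visibility modifier. The sketch below does exactly that; MemberKind and member_kind are hypothetical names, and moduleBlock, moduleOption, and conditional compilation are deliberately not covered.

```rust
/// Module-level members from moduleBodyElement (hypothetical enum).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum MemberKind {
    Declare,
    Enum,
    Event,
    Function,
    PropertyGet,
    PropertyLet,
    PropertySet,
    Sub,
    Type,
}

/// Identifies the member a line starts, skipping leading modifiers.
pub fn member_kind(line: &str) -> Option<MemberKind> {
    let lower = line.trim().to_ascii_lowercase();
    let mut words = lower
        .split(|c: char| c.is_whitespace() || c == '(')
        .filter(|w| !w.is_empty())
        .skip_while(|w| matches!(*w, "public" | "private" | "friend" | "global" | "static"));
    let kind = match words.next()? {
        "declare" => MemberKind::Declare,
        "enum" => MemberKind::Enum,
        "event" => MemberKind::Event,
        "function" => MemberKind::Function,
        "sub" => MemberKind::Sub,
        "type" => MemberKind::Type,
        // Property procedures need the second keyword: Get, Let, or Set.
        "property" => match words.next()? {
            "get" => MemberKind::PropertyGet,
            "let" => MemberKind::PropertyLet,
            "set" => MemberKind::PropertySet,
            _ => return None,
        },
        _ => return None,
    };
    Some(kind)
}
```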

Statement Types

Control Flow Statements

If-Then-Else

ifThenElseStmt
   : IF WS ifConditionStmt WS THEN WS blockStmt 
     (WS ELSE WS blockStmt)?                      // Single-line form
   | ifBlockStmt ifElseIfBlockStmt* ifElseBlockStmt? 
     END_IF                                       // Block form
   ;
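
The two ifThenElseStmt alternatives differ only in what follows THEN on the same line. A minimal sketch of that check, assuming whitespace-separated tokens and ignoring string literals (is_single_line_if is a hypothetical helper):

```rust
/// Some(true) for the single-line form, Some(false) for the block form,
/// None when the line has no THEN keyword. Splits on whitespace only, so a
/// string literal containing the word Then is not handled.
pub fn is_single_line_if(line: &str) -> Option<bool> {
    let mut words = line.split_whitespace();
    // Advance past the THEN keyword (matched case-insensitively).
    words.find(|w| w.eq_ignore_ascii_case("then"))?;
    // Only a comment (' or Rem) may follow THEN in the block form.
    Some(match words.next() {
        None => false,
        Some(w) => !(w.starts_with('\'') || w.eq_ignore_ascii_case("rem")),
    })
}
```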

Select Case

selectCaseStmt
   : SELECT WS CASE WS valueStmt NEWLINE+ 
     sC_Case* 
     END_SELECT
   ;

sC_Case
   : CASE WS sC_Cond NEWLINE+ (block NEWLINE+)?
   ;

sC_Cond
   : ELSE                                         // Case Else
   | sC_CondExpr (WS? COMMA WS? sC_CondExpr)*
   ;
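
In VB6, each expression in a Case list is a single value, a lo To hi range, or Is followed by a comparison, and only the first matching Case arm runs. The sketch below models sC_Cond over integers under those rules; all type and function names are hypothetical.

```rust
/// Comparison operators allowed after `Case Is` (hypothetical names).
#[derive(Clone, Copy)]
pub enum CmpOp { Lt, Le, Gt, Ge, Eq, Ne }

/// One sC_CondExpr: a value, a `lo To hi` range, or `Is <op> value`.
pub enum CaseExpr {
    Value(i64),
    Range(i64, i64),
    Is(CmpOp, i64),
}

/// One sC_Cond: `Case Else` or a comma-separated expression list.
pub enum CaseCond {
    Else,
    Exprs(Vec<CaseExpr>),
}

fn expr_matches(e: &CaseExpr, x: i64) -> bool {
    match *e {
        CaseExpr::Value(v) => x == v,
        CaseExpr::Range(lo, hi) => lo <= x && x <= hi,
        CaseExpr::Is(op, v) => match op {
            CmpOp::Lt => x < v,
            CmpOp::Le => x <= v,
            CmpOp::Gt => x > v,
            CmpOp::Ge => x >= v,
            CmpOp::Eq => x == v,
            CmpOp::Ne => x != v,
        },
    }
}

/// Index of the first arm whose list contains a match; Case Else always matches.
pub fn select_case(arms: &[CaseCond], x: i64) -> Option<usize> {
    arms.iter().position(|arm| match arm {
        CaseCond::Else => true,
        CaseCond::Exprs(exprs) => exprs.iter().any(|e| expr_matches(e, x)),
    })
}
```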

Loops

// Do...Loop variants
doLoopStmt
   : DO NEWLINE+ (block NEWLINE+)? LOOP
   | DO WS (WHILE | UNTIL) WS valueStmt NEWLINE+ 
     (block NEWLINE+)? LOOP
   | DO NEWLINE+ (block NEWLINE+) 
     LOOP WS (WHILE | UNTIL) WS valueStmt
   ;

// For...Next
forNextStmt
   : FOR WS iCS_S_VariableOrProcedureCall typeHint? 
     (WS asTypeClause)? WS? EQ WS? valueStmt 
     WS TO WS valueStmt (WS STEP WS valueStmt)? NEWLINE+ 
     (block NEWLINE+)? 
     NEXT (WS ambiguousIdentifier typeHint?)?
   ;

// For Each...Next
forEachStmt
   : FOR WS EACH WS ambiguousIdentifier typeHint? 
     WS IN WS valueStmt NEWLINE+ 
     (block NEWLINE+)? 
     NEXT (WS ambiguousIdentifier)?
   ;
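
The doLoopStmt alternatives differ in where the condition is tested: a condition after DO may skip the body entirely, while one after LOOP runs the body at least once. The sketch below simulates both placements with a counter body, and also lists the counter values of a For...Next loop with integer bounds. All names are hypothetical.

```rust
/// Where a Do...Loop tests its condition (hypothetical names).
#[derive(Clone, Copy)]
pub enum TestAt { Top, Bottom }

/// While continues while the condition holds; Until stops once it holds.
#[derive(Clone, Copy)]
pub enum TestKind { While, Until }

/// Simulates a Do...Loop whose body is `i = i + 1` and returns how many
/// times the body ran. The caller must supply a condition that ends the loop.
pub fn run_do_loop(at: TestAt, kind: TestKind, start: i64, cond: impl Fn(i64) -> bool) -> u32 {
    let keep_going = |i: i64| match kind {
        TestKind::While => cond(i),
        TestKind::Until => !cond(i),
    };
    let mut i = start;
    let mut runs = 0;
    // A top-tested loop may skip the body entirely.
    if let TestAt::Top = at {
        if !keep_going(i) {
            return 0;
        }
    }
    loop {
        i += 1; // the loop body
        runs += 1;
        if !keep_going(i) {
            return runs;
        }
    }
}

/// Counter values of `For i = start To end Step step`, with the bounds
/// computed once up front. A positive step runs while i <= end, a negative
/// step while i >= end. VB6 treats Step 0 like a positive step, so the loop
/// would never end when start <= end; this sketch rejects it instead.
pub fn for_next_values(start: i64, end: i64, step: i64) -> Option<Vec<i64>> {
    if step == 0 {
        return None;
    }
    let mut values = Vec::new();
    let mut i = start;
    while (step > 0 && i <= end) || (step < 0 && i >= end) {
        values.push(i);
        i += step;
    }
    Some(values)
}
```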

File Operations

Open Statement

openStmt
   : OPEN WS valueStmt WS FOR WS 
     (APPEND | BINARY | INPUT | OUTPUT | RANDOM) 
     (WS ACCESS WS (READ | WRITE | READ_WRITE))? 
     (WS (SHARED | LOCK_READ | LOCK_WRITE | LOCK_READ_WRITE))? 
     WS AS WS valueStmt 
     (WS LEN WS? EQ WS? valueStmt)?
   ;
💡 Design Note: The openStmt rule covers VB6's file modes (Append, Binary, Input, Output, Random), access types, and locking options in a single rule.
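
To show how the optional clauses of openStmt line up, the sketch below models them as a struct and renders VB6 source in the order the rule requires. The types and to_vb6 are hypothetical, and the file number is always rendered as #n even though the grammar accepts any valueStmt there.

```rust
/// Hypothetical model of the openStmt options; names mirror the grammar tokens.
#[derive(Clone, Copy)]
pub enum Mode { Append, Binary, Input, Output, Random }
#[derive(Clone, Copy)]
pub enum Access { Read, Write, ReadWrite }
#[derive(Clone, Copy)]
pub enum Lock { Shared, LockRead, LockWrite, LockReadWrite }

pub struct OpenStmt<'a> {
    pub path: &'a str,
    pub mode: Mode,
    pub access: Option<Access>,
    pub lock: Option<Lock>,
    pub file_number: u32,
    pub rec_len: Option<u32>,
}

impl OpenStmt<'_> {
    /// Emits: For mode, then Access, then the lock, then As #n, then Len.
    pub fn to_vb6(&self) -> String {
        let mode = match self.mode {
            Mode::Append => "Append",
            Mode::Binary => "Binary",
            Mode::Input => "Input",
            Mode::Output => "Output",
            Mode::Random => "Random",
        };
        // Embedded quotes are doubled, as in VB6 string literals.
        let mut s = format!("Open \"{}\" For {}", self.path.replace('"', "\"\""), mode);
        if let Some(a) = self.access {
            s.push_str(match a {
                Access::Read => " Access Read",
                Access::Write => " Access Write",
                Access::ReadWrite => " Access Read Write",
            });
        }
        if let Some(l) = self.lock {
            s.push_str(match l {
                Lock::Shared => " Shared",
                Lock::LockRead => " Lock Read",
                Lock::LockWrite => " Lock Write",
                Lock::LockReadWrite => " Lock Read Write",
            });
        }
        s.push_str(&format!(" As #{}", self.file_number));
        if let Some(n) = self.rec_len {
            s.push_str(&format!(" Len = {}", n));
        }
        s
    }
}
```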

Variable Operations

Variable Declaration

variableStmt
   : (DIM | STATIC | visibility) WS 
     (WITHEVENTS WS)? variableListStmt
   ;

variableSubStmt
   : ambiguousIdentifier typeHint? 
     (WS? LPAREN WS? (subscripts WS?)? RPAREN WS?)? 
     (WS asTypeClause)?
   ;
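
A variableListStmt is a comma-separated list of variableSubStmt entries, and commas inside array bounds must not split the list. The sketch below shows both steps for simple declarations; VarDecl, split_decl_list, and parse_var are hypothetical, and the leading Dim/Static/visibility keyword and WithEvents are assumed to be stripped already.

```rust
/// One parsed variableSubStmt (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct VarDecl<'a> {
    pub name: &'a str,
    pub type_hint: Option<char>,
    pub subscripts: Option<&'a str>,
    pub as_type: Option<&'a str>,
}

/// Splits a declaration list on commas outside parentheses.
pub fn split_decl_list(list: &str) -> Vec<&str> {
    let (mut parts, mut depth, mut start) = (Vec::new(), 0i32, 0);
    for (i, c) in list.char_indices() {
        match c {
            '(' => depth += 1,
            ')' => depth -= 1,
            ',' if depth == 0 => {
                parts.push(list[start..i].trim());
                start = i + 1;
            }
            _ => {}
        }
    }
    parts.push(list[start..].trim());
    parts
}

/// Parses `name[hint][(subscripts)][ As type]`, e.g. `names(1 To 10) As String`.
pub fn parse_var(decl: &str) -> Option<VarDecl<'_>> {
    let decl = decl.trim();
    // Identifier: letters, digits, and underscores.
    let end = decl
        .find(|c: char| !(c.is_alphanumeric() || c == '_'))
        .unwrap_or(decl.len());
    if end == 0 {
        return None;
    }
    let (name, mut rest) = decl.split_at(end);
    // Optional type-declaration suffix such as `%` or `$`.
    let type_hint = rest.chars().next().filter(|c| "%&!#@$".contains(*c));
    if let Some(c) = type_hint {
        rest = &rest[c.len_utf8()..];
    }
    rest = rest.trim_start();
    // Optional array bounds; nested parentheses are tracked by depth.
    let mut subscripts = None;
    if rest.starts_with('(') {
        let mut depth = 0;
        let close = rest.char_indices().find_map(|(i, c)| {
            match c {
                '(' => depth += 1,
                ')' => depth -= 1,
                _ => {}
            }
            (depth == 0).then_some(i)
        })?;
        subscripts = Some(rest[1..close].trim());
        rest = rest[close + 1..].trim_start();
    }
    // Optional `As <type>` clause; anything else is rejected.
    let as_type = match rest.get(..3) {
        _ if rest.is_empty() => None,
        Some(kw) if kw.eq_ignore_ascii_case("as ") => Some(rest[3..].trim()),
        _ => return None,
    };
    Some(VarDecl { name, type_hint, subscripts, as_type })
}
```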

Let/Set Statements

letStmt
   : (LET WS)? implicitCallStmt_InStmt WS? 
     (EQ | PLUS_EQ | MINUS_EQ) WS? valueStmt   // += and -= are accepted, although VB6 has no compound assignment
   ;

setStmt
   : SET WS implicitCallStmt_InStmt WS? EQ WS? valueStmt
   ;

Error Handling

onErrorStmt
   : (ON_ERROR | ON_LOCAL_ERROR) WS 
     (GOTO WS valueStmt COLON? | RESUME WS NEXT)
   ;

errorStmt
   : ERROR WS valueStmt
   ;

resumeStmt
   : RESUME (WS (NEXT | ambiguousIdentifier))?
   ;
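
onErrorStmt accepts three practical forms: GoTo with a label to jump to a handler, GoTo 0 to disable error handling, and Resume Next to continue with the statement after the one that failed. A minimal sketch that classifies them, with hypothetical names:

```rust
/// Effect of an onErrorStmt (hypothetical names).
#[derive(Debug, PartialEq)]
pub enum OnError {
    /// `On Error GoTo 0` turns off error handling in the procedure.
    Disable,
    /// `On Error GoTo <label>` jumps to a handler.
    GoTo(String),
    /// `On Error Resume Next` continues with the next statement.
    ResumeNext,
}

/// Parses `On [Local] Error GoTo x` and `On [Local] Error Resume Next`.
pub fn parse_on_error(stmt: &str) -> Option<OnError> {
    let words: Vec<&str> = stmt.split_whitespace().collect();
    let is = |i: usize, kw: &str| words.get(i).map_or(false, |w| w.eq_ignore_ascii_case(kw));
    if !is(0, "on") {
        return None;
    }
    // ON_LOCAL_ERROR adds a `Local` keyword before `Error`.
    let i = if is(1, "local") { 2 } else { 1 };
    if !is(i, "error") {
        return None;
    }
    if is(i + 1, "resume") && is(i + 2, "next") && words.len() == i + 3 {
        return Some(OnError::ResumeNext);
    }
    if is(i + 1, "goto") && words.len() == i + 3 {
        // The grammar allows an optional trailing colon after the target.
        let target = words[i + 2].trim_end_matches(':');
        return Some(if target == "0" {
            OnError::Disable
        } else {
            OnError::GoTo(target.to_string())
        });
    }
    None
}
```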

Expressions

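The grammar parses expressions with a single left-recursive valueStmt rule. ANTLR4 gives earlier alternatives of a left-recursive rule higher precedence, so the order of the alternatives below encodes operator precedence: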

valueStmt
   : literal                          // Literals
   | implicitCallStmt_InStmt         // Function/variable references
   | LPAREN WS? valueStmt WS? RPAREN // Parenthesized expressions
   | NEW WS valueStmt                // Object instantiation
   | valueStmt WS? POW WS? valueStmt // Exponentiation
   | MINUS WS? valueStmt             // Unary minus
   | PLUS WS? valueStmt              // Unary plus
   | valueStmt WS? MULT WS? valueStmt  // Multiplication
   | valueStmt WS? DIV WS? valueStmt   // Division
   | valueStmt WS? INTDIV WS? valueStmt // Integer division
   | valueStmt WS? MOD WS? valueStmt   // Modulo
   | valueStmt WS? PLUS WS? valueStmt  // Addition
   | valueStmt WS? MINUS WS? valueStmt // Subtraction
   | valueStmt WS? AMPERSAND WS? valueStmt // String concatenation
   | valueStmt WS? EQ WS? valueStmt    // Equality
   | valueStmt WS? NEQ WS? valueStmt   // Inequality
   | valueStmt WS? LT WS? valueStmt    // Less than
   | valueStmt WS? GT WS? valueStmt    // Greater than
   | valueStmt WS? LEQ WS? valueStmt   // Less than or equal
   | valueStmt WS? GEQ WS? valueStmt   // Greater than or equal
   | valueStmt WS? LIKE WS? valueStmt  // Pattern matching
   | valueStmt WS? IS WS? valueStmt    // Object comparison
   | NOT WS? valueStmt                 // Logical NOT
   | valueStmt WS? AND WS? valueStmt   // Logical AND
   | valueStmt WS? OR WS? valueStmt    // Logical OR
   | valueStmt WS? XOR WS? valueStmt   // Logical XOR
   | valueStmt WS? EQV WS? valueStmt   // Logical equivalence
   | valueStmt WS? IMP WS? valueStmt   // Logical implication
   ;
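
The precedence encoded by that ordering can be checked with a small recursive-descent evaluator that has one function per precedence level. The sketch below covers only ^, unary signs, *, /, +, and - over floating-point numbers; it illustrates the ordering and is not how VB6Parse evaluates anything. Tok, eval, and the helper functions are hypothetical.

```rust
/// Hypothetical token type: a number or a one-character operator.
#[derive(Clone, Copy, Debug)]
enum Tok { Num(f64), Op(char) }

type R = Result<f64, String>;

/// Evaluates ^, unary -/+, *, /, binary +/-, and parentheses.
/// Not supported: \, Mod, &, comparisons, and logical operators.
pub fn eval(src: &str) -> Result<f64, String> {
    let tokens = tokenize(src)?;
    let mut pos = 0;
    let value = additive(&tokens, &mut pos)?;
    if pos != tokens.len() {
        return Err(format!("unexpected token at index {pos}"));
    }
    Ok(value)
}

fn tokenize(src: &str) -> Result<Vec<Tok>, String> {
    let mut out = Vec::new();
    let mut rest = src.trim_start();
    while let Some(c) = rest.chars().next() {
        if "+-*/^()".contains(c) {
            out.push(Tok::Op(c));
            rest = &rest[1..];
        } else if c.is_ascii_digit() || c == '.' {
            let end = rest
                .find(|ch: char| !(ch.is_ascii_digit() || ch == '.'))
                .unwrap_or(rest.len());
            let n = rest[..end].parse::<f64>().map_err(|e| e.to_string())?;
            out.push(Tok::Num(n));
            rest = &rest[end..];
        } else {
            return Err(format!("unsupported character {c:?}"));
        }
        rest = rest.trim_start();
    }
    Ok(out)
}

fn peek(t: &[Tok], pos: usize) -> Option<char> {
    if let Some(Tok::Op(c)) = t.get(pos) { Some(*c) } else { None }
}

// One left-associative precedence level; `next` parses the tighter level.
fn binary(t: &[Tok], pos: &mut usize, ops: &str, next: fn(&[Tok], &mut usize) -> R) -> R {
    let mut lhs = next(t, pos)?;
    while let Some(op) = peek(t, *pos).filter(|c| ops.contains(*c)) {
        *pos += 1;
        let rhs = next(t, pos)?;
        lhs = match op { '+' => lhs + rhs, '-' => lhs - rhs, '*' => lhs * rhs, _ => lhs / rhs };
    }
    Ok(lhs)
}

// Loosest level: binary + and -. Next level: * and /.
fn additive(t: &[Tok], pos: &mut usize) -> R { binary(t, pos, "+-", multiplicative) }
fn multiplicative(t: &[Tok], pos: &mut usize) -> R { binary(t, pos, "*/", unary) }

// Unary signs bind more loosely than ^, so -2 ^ 2 is -(2 ^ 2).
fn unary(t: &[Tok], pos: &mut usize) -> R {
    match peek(t, *pos) {
        Some('-') => { *pos += 1; Ok(-unary(t, pos)?) }
        Some('+') => { *pos += 1; unary(t, pos) }
        _ => power(t, pos),
    }
}

// ^ is left-associative, so 2 ^ 3 ^ 2 is (2 ^ 3) ^ 2; 2 ^ -1 is accepted.
fn power(t: &[Tok], pos: &mut usize) -> R {
    let mut lhs = primary(t, pos)?;
    while peek(t, *pos) == Some('^') {
        *pos += 1;
        let sign = if peek(t, *pos) == Some('-') { *pos += 1; -1.0 } else { 1.0 };
        lhs = lhs.powf(sign * primary(t, pos)?);
    }
    Ok(lhs)
}

fn primary(t: &[Tok], pos: &mut usize) -> R {
    let tok = t.get(*pos).copied();
    *pos += 1;
    match tok {
        Some(Tok::Num(n)) => Ok(n),
        Some(Tok::Op('(')) => {
            let v = additive(t, pos)?;
            if peek(t, *pos) != Some(')') {
                return Err("expected ')'".to_string());
            }
            *pos += 1;
            Ok(v)
        }
        _ => Err(format!("expected a value at index {}", *pos - 1)),
    }
}
```

Two cases are worth noting: -2 ^ 2 evaluates to -4 because exponentiation binds tighter than unary minus, and 2 ^ 3 ^ 2 evaluates to 64 because ANTLR4 treats binary alternatives as left-associative unless they are marked <assoc=right>.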

Lexer Rules

The grammar defines lexer rules for VB6 tokens including keywords, operators, and literals.

Keywords

The grammar defines a lexer token for each VB6 keyword, covering data types, control flow, file operations, and visibility modifiers. A representative subset:

DIM
PUBLIC
PRIVATE
STATIC
CONST
IF
THEN
ELSE
ELSEIF
END
FOR
NEXT
DO
LOOP
WHILE
UNTIL
SELECT
CASE
FUNCTION
SUB
PROPERTY
GET
SET
LET
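
VB6 keywords are case-insensitive, so a lexer must treat Dim, dim, and DIM as the same token. The sketch below does a linear, case-insensitive lookup over the keywords listed above; KEYWORDS and match_keyword are hypothetical names.

```rust
/// The keyword tokens listed above, in canonical upper-case spelling.
const KEYWORDS: [&str; 24] = [
    "DIM", "PUBLIC", "PRIVATE", "STATIC", "CONST", "IF", "THEN", "ELSE",
    "ELSEIF", "END", "FOR", "NEXT", "DO", "LOOP", "WHILE", "UNTIL",
    "SELECT", "CASE", "FUNCTION", "SUB", "PROPERTY", "GET", "SET", "LET",
];

/// Returns the canonical keyword for `word`, or `None` for identifiers.
pub fn match_keyword(word: &str) -> Option<&'static str> {
    KEYWORDS.iter().copied().find(|k| k.eq_ignore_ascii_case(word))
}
```

A linear scan keeps the sketch short; a production lexer would more likely use a match on the lower-cased word or a hash map.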

Literals

literal
   : COLORLITERAL      // Color literals (&H00FF00&)
   | DATELITERAL       // Date literals (#1/1/2000#)
   | DOUBLELITERAL     // Double-precision floats
   | FILENUMBER        // File numbers (#1)
   | INTEGERLITERAL    // Integers
   | STRINGLITERAL     // String literals
   | TRUE              // Boolean True
   | FALSE             // Boolean False
   | NOTHING           // Nothing keyword
   | NULL              // Null keyword
   ;
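
The literal alternatives can usually be distinguished by their first and last characters. The sketch below classifies a literal's text that way; LiteralKind and classify_literal are hypothetical. It returns the raw unsigned value of a hex literal, whereas VB6 types a hex literal of up to four digits without the & suffix as a signed Integer (so &HFFFF is -1), and it ignores leading signs.

```rust
/// Literal categories from the `literal` rule (hypothetical enum).
#[derive(Debug, PartialEq)]
pub enum LiteralKind { Hex(u32), Date, Double, FileNumber, Integer, String, Boolean, Nothing, Null }

pub fn classify_literal(text: &str) -> Option<LiteralKind> {
    let t = text.trim();
    let upper = t.to_ascii_uppercase();
    if let Some(hex) = upper.strip_prefix("&H") {
        // Hex digits with an optional trailing `&` (Long) suffix.
        let digits = hex.strip_suffix('&').unwrap_or(hex);
        return u32::from_str_radix(digits, 16).ok().map(LiteralKind::Hex);
    }
    // `#...#` is a date; `#` followed only by digits is a file number.
    if t.len() >= 2 && t.starts_with('#') && t.ends_with('#') {
        return Some(LiteralKind::Date);
    }
    if let Some(n) = t.strip_prefix('#') {
        let is_num = !n.is_empty() && n.chars().all(|c| c.is_ascii_digit());
        return is_num.then_some(LiteralKind::FileNumber);
    }
    if t.len() >= 2 && t.starts_with('"') && t.ends_with('"') {
        return Some(LiteralKind::String);
    }
    match upper.as_str() {
        "TRUE" | "FALSE" => return Some(LiteralKind::Boolean),
        "NOTHING" => return Some(LiteralKind::Nothing),
        "NULL" => return Some(LiteralKind::Null),
        _ => {}
    }
    if !t.is_empty() && t.chars().all(|c| c.is_ascii_digit()) {
        Some(LiteralKind::Integer)
    } else if t.starts_with(|c: char| c.is_ascii_digit() || c == '.') && t.parse::<f64>().is_ok() {
        Some(LiteralKind::Double)
    } else {
        None
    }
}
```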

Type Hints

VB6 supports single-character type declaration suffixes:

  • % - Integer
  • & - Long
  • ! - Single
  • # - Double
  • @ - Currency
  • $ - String
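
The suffix table maps directly onto a lookup function. In this sketch, type_for_suffix and split_type_hint are hypothetical helpers that return VB6 type names as strings:

```rust
/// Maps a type-declaration suffix to the VB6 type it declares.
pub fn type_for_suffix(c: char) -> Option<&'static str> {
    match c {
        '%' => Some("Integer"),
        '&' => Some("Long"),
        '!' => Some("Single"),
        '#' => Some("Double"),
        '@' => Some("Currency"),
        '$' => Some("String"),
        _ => None,
    }
}

/// Splits `total#` into ("total", Some("Double")); identifiers without a
/// suffix come back unchanged with `None`.
pub fn split_type_hint(ident: &str) -> (&str, Option<&'static str>) {
    match ident.chars().last().and_then(|c| type_for_suffix(c).map(|t| (c, t))) {
        Some((c, t)) => (&ident[..ident.len() - c.len_utf8()], Some(t)),
        None => (ident, None),
    }
}
```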

Usage Notes

⚠️ Important: This ANTLR4 grammar is provided for reference and comparative analysis. VB6Parse does not use ANTLR4 but instead implements a custom parser in Rust for better performance and control over the parsing process.

Differences from VB6Parse Implementation

| Aspect | ANTLR4 Grammar | VB6Parse |
|---|---|---|
| Parser generator | ANTLR4 (Java-based tool) | Custom hand-written parser (Rust) |
| Tree | ANTLR4 parse tree (an AST must be derived separately) | CST (Concrete Syntax Tree) via rowan |
| Whitespace | Explicit WS tokens in grammar | Preserved in CST automatically |
| Error recovery | ANTLR4 built-in recovery | Custom error handling with ParseResult |
| Performance | JVM overhead (Java target) | Native Rust performance |
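
The whitespace row is where the two approaches differ most in practice: a lossless tree keeps every character, so the original source can be reproduced exactly. The std-only sketch below illustrates that round-trip property with a hypothetical Token type that keeps whitespace and ' comments as tokens; it does not use rowan and is not VB6Parse's API.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Kind { Whitespace, Comment, Word, Punct }

/// A token that keeps its exact source text (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Token<'a> {
    pub kind: Kind,
    pub text: &'a str,
}

fn is_word(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Splits one line into tokens without dropping any characters. String
/// literals are not handled, so a ' inside quotes would start a comment.
pub fn lex_line(line: &str) -> Vec<Token<'_>> {
    let mut out = Vec::new();
    let mut rest = line;
    while let Some(c) = rest.chars().next() {
        let (kind, len) = if c == '\'' {
            (Kind::Comment, rest.len()) // a comment runs to end of line
        } else if c.is_whitespace() {
            (Kind::Whitespace, rest.find(|ch: char| !ch.is_whitespace()).unwrap_or(rest.len()))
        } else if is_word(c) {
            (Kind::Word, rest.find(|ch: char| !is_word(ch)).unwrap_or(rest.len()))
        } else {
            (Kind::Punct, c.len_utf8())
        };
        out.push(Token { kind, text: &rest[..len] });
        rest = &rest[len..];
    }
    out
}

/// Concatenating the token text reproduces the input exactly.
pub fn to_source(tokens: &[Token<'_>]) -> String {
    tokens.iter().map(|t| t.text).collect()
}
```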

Why Not Use ANTLR4?

VB6Parse uses a custom parser implementation for several reasons:

  • Rust: Native Rust API instead of a Java API
  • Memory Efficiency: Fine-grained control over allocations
  • CST Preservation: Full source fidelity including whitespace and comments
  • Error Recovery: Custom error handling tailored to VB6 parsing needs
  • Integration: Builds with standard Rust tooling (Cargo) and needs no JVM at runtime
  • Incremental Parsing: Potential for future incremental reparsing optimizations

Grammar Reference Value

Despite not being used directly, this ANTLR4 grammar specification is valuable for:

  • Understanding the complete VB6 language syntax
  • Cross-referencing VB6Parse implementation against a formal specification
  • Identifying edge cases and language features that need testing
  • Serving as documentation for VB6 language constructs
  • Comparative analysis between different parsing approaches

Further Reading