Let’s Write a Lexer in PlayBASIC

October 12, 2025

 

Logo

Introduction

Welcome back, PlayBASIC coders!

In this live session, I set out to build something every programming language and tool needs — a lexer (or lexical scanner). If you’ve never written one before, don’t worry — this guide walks through the whole process step by step.

A lexer’s job is simple: it scans through a piece of text and classifies groups of characters into meaningful types — things like words, numbers, and whitespace. These little building blocks are called tokens, and they form the foundation for everything that comes next in a compiler or interpreter.

So, let’s dive in and build one from scratch in PlayBASIC.


Starting with a Simple String

We begin with a test string — just a small bit of text containing words, spaces, and a number:

s$ = "   1212123323      This is a message number"
Print s$

This gives us something to analyze. The plan is to loop through this string character by character, figure out what each character represents, and then group similar characters together.

In PlayBASIC, strings are 1-indexed, which means the first character is at position 1 (not 0 like in some other languages). So our loop will run from 1 to the length of the string.


Stepping Through Characters

The core of our lexer is a simple `For/Next` loop that moves through each character:

For lp = 1 To Len(s$)
    ThisCHR = Mid(s$, lp)
Next

At this stage, we’re just reading characters — no classification yet.

The next question is: how do we know what type of character we’re looking at?


Detecting Alphabetical Characters

We start by figuring out if a character is alphabetical. The simplest way is by comparing ASCII values:

If ThisCHR >= Asc("A") And ThisCHR <= Asc("Z")
    ; Uppercase
EndIf

If ThisCHR >= Asc("a") And ThisCHR <= Asc("z")
    ; Lowercase
EndIf

That works, but it’s messy to write out in full every time. So let’s clean it up by rolling it into a helper function:

Function IsAlphaCHR(ThisCHR)
    State = (ThisCHR >= Asc("a") And ThisCHR <= Asc("z")) Or _
            (ThisCHR >= Asc("A") And ThisCHR <= Asc("Z"))
EndFunction State

Now we can simply check:

If IsAlphaCHR(ThisCHR)
    Print Chr$(ThisCHR)
EndIf

That already gives us all the letters from our string — but one at a time.

To make it more useful, we’ll start grouping consecutive letters into words.


Grouping Characters into Words

Instead of reacting to each character individually, we look ahead to find where a run of letters ends. This is done with a nested loop:

If IsAlphaCHR(ThisCHR)
    For ChrLP = lp To Len(s$)
        If Not IsAlphaCHR(Mid(s$, ChrLP)) Then Exit
        EndPOS = ChrLP
    Next
    ThisWord$ = Mid$(s$, lp, (EndPOS - lp) + 1)
    Print "Word: " + ThisWord$
    lp = EndPOS
EndIf

Now our lexer can detect whole words — groups of letters treated as a single unit.

That’s the first real step toward tokenization.


Detecting Whitespace

The next type of token is whitespace — spaces and tabs.

We’ll build another helper function:

Function IsWhiteSpace(ThisCHR)
    State = (ThisCHR = Asc(" ")) Or (ThisCHR = 9)
EndFunction State

Then use the same nested-loop pattern:

If IsWhiteSpace(ThisCHR)
    For ChrLP = lp To Len(s$)
        If Not IsWhiteSpace(Mid(s$, ChrLP)) Then Exit
        EndPOS = ChrLP
    Next
    WhiteSpace$ = Mid$(s$, lp, (EndPOS - lp) + 1)
    Print "White Space: " + Str$(Len(WhiteSpace$))
    lp = EndPOS
EndIf

Now we can clearly see which parts of the string are spaces and how many characters each whitespace block contains.


Detecting Numbers

Finally, let’s detect numeric characters using another helper:

Function IsNumericCHR(ThisCHR)
    State = (ThisCHR >= Asc("0")) And (ThisCHR <= Asc("9"))
EndFunction State

And apply it just like before:

If IsNumericCHR(ThisCHR)
    For ChrLP = lp To Len(s$)
        If Not IsNumericCHR(Mid(s$, ChrLP)) Then Exit
        EndPOS = ChrLP
    Next
    Number$ = Mid$(s$, lp, (EndPOS - lp) + 1)
    Print "Number: " + Number$
    lp = EndPOS
EndIf

Now we can identify three types of tokens:

Words (alphabetical groups)

Whitespace (spaces and tabs)

Numbers (digits)


Defining a Token Structure

Up to this point, our program just prints what it finds.

Let’s store these tokens properly by defining a typed array.

Type tToken
    TokenType
    Value$
    Position
EndType
Dim Tokens(1000) As tToken

We’ll also define some constants for readability:

Constant TokenTYPE_WORD        = 1
Constant TokenTYPE_NUMERIC     = 2
Constant TokenTYPE_WHITESPACE  = 4

As we detect tokens, we add them to the array:

Tokens(TokenCount).TokenType = TokenTYPE_WORD
Tokens(TokenCount).Value$    = ThisWord$
TokenCount++

Do the same for whitespace and numbers, and our lexer now builds a real list of tokens as it runs.


Displaying Tokens by Type

To visualize the result, we can print each token in a different colour:

For lp = 0 To TokenCount - 1
    Select Tokens(lp).TokenType
        Case TokenTYPE_WORD:       c = $00FF00 ; green
        Case TokenTYPE_NUMERIC:    c = $0000FF ; blue
        Case TokenTYPE_WHITESPACE: c = $000000 ; black
        Default:                   c = $FF0000
    EndSelect

    Ink c
    Print Tokens(lp).Value$
Next

When we run this version, we see numbers printed in blue, words in green, and whitespace appearing as black gaps — exactly how a simple syntax highlighter or compiler front-end might visualize tokenized text.


Wrapping Up

And that’s it — our first lexer!

It reads through a line of text, classifies what it finds, and records each token type for later use.

The same process underpins many systems:

Compilers use it as the first step in parsing code.

Adventure games might use it to process typed player commands.

Expression evaluators or script interpreters rely on it to break down formulas and logic.

The big takeaway? A lexer doesn’t have to be complicated.

This simple approach — scanning text, detecting groups, and tagging them — is the heart of it. Once you understand that, you can expand it to handle symbols, punctuation, operators, and beyond.

If you’d like to see more about extending this lexer or turning it into a parser, let me know in the comments — or check out the full live session on YouTube.

Links:

  • PlayBASIC,com
  • Learn to basic game programming (on Amazon)
  • Learn to code for beginners (on Amazon)