Let’s Write a Lexer in PlayBASIC

October 12, 2025

 

Introduction

Welcome back, PlayBASIC coders!

In this live session, I set out to build something every programming language and tool needs — a lexer (or lexical scanner). If you’ve never written one before, don’t worry — this guide walks through the whole process step by step.

A lexer’s job is simple: it scans through a piece of text and classifies groups of characters into meaningful types — things like words, numbers, and whitespace. These little building blocks are called tokens, and they form the foundation for everything that comes next in a compiler or interpreter.

So, let’s dive in and build one from scratch in PlayBASIC.


Starting with a Simple String

We begin with a test string — just a small bit of text containing words, spaces, and a number:

s$ = "   1212123323      This is a message number"
Print s$

This gives us something to analyze. The plan is to loop through this string character by character, figure out what each character represents, and then group similar characters together.

In PlayBASIC, strings are 1-indexed, which means the first character is at position 1 (not 0 like in some other languages). So our loop will run from 1 to the length of the string.


Stepping Through Characters

The core of our lexer is a simple `For/Next` loop that moves through each character:

For lp = 1 To Len(s$)
    ; Mid() (no $) returns the ASCII code of the character at position lp
    ThisCHR = Mid(s$, lp)
Next

At this stage, we’re just reading characters — no classification yet.

The next question is: how do we know what type of character we’re looking at?


Detecting Alphabetical Characters

We start by figuring out if a character is alphabetical. The simplest way is by comparing ASCII values:

If ThisCHR >= Asc("A") And ThisCHR <= Asc("Z")
    ; Uppercase
EndIf

If ThisCHR >= Asc("a") And ThisCHR <= Asc("z")
    ; Lowercase
EndIf

That works, but it’s messy to write out in full every time. So let’s clean it up by rolling it into a helper function:

Function IsAlphaCHR(ThisCHR)
    State = (ThisCHR >= Asc("a") And ThisCHR <= Asc("z")) Or _
            (ThisCHR >= Asc("A") And ThisCHR <= Asc("Z"))
EndFunction State

Now we can simply check:

If IsAlphaCHR(ThisCHR)
    Print Chr$(ThisCHR)
EndIf

That already gives us all the letters from our string — but one at a time.

To make it more useful, we’ll start grouping consecutive letters into words.


Grouping Characters into Words

Instead of reacting to each character individually, we look ahead to find where a run of letters ends. This is done with a nested loop:

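If IsAlphaCHR(ThisCHR)
    ; Scan forward from lp to find the last letter in this run
    For ChrLP = lp To Len(s$)
        If Not IsAlphaCHR(Mid(s$, ChrLP)) Then Exit
        EndPOS = ChrLP
    Next
    ; Copy the whole run out of the string as one word
    ThisWord$ = Mid$(s$, lp, (EndPOS - lp) + 1)
    Print "Word: " + ThisWord$
    ; Move the outer loop to the end of the run so it isn't scanned twice
    lp = EndPOS
EndIf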

Now our lexer can detect whole words — groups of letters treated as a single unit.

That’s the first real step toward tokenization.
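
If you want to check the look-ahead logic away from PlayBASIC, here is a minimal sketch of the same idea in Python. It is only an illustration: `is_alpha` and `scan_run` are names made up for this example, and Python strings are 0-indexed, so the positions are one lower than in the PlayBASIC code.

```python
def is_alpha(ch):
    """ASCII letters only, mirroring what IsAlphaCHR tests."""
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z")


def scan_run(text, start, predicate):
    """Return the index of the last character of the run starting at start.

    Assumes predicate(text[start]) is already true, just like the
    PlayBASIC loop, which only runs after the first character matched.
    """
    end = start
    for pos in range(start, len(text)):
        if not predicate(text[pos]):
            break
        end = pos
    return end


text = "This is a message"
end = scan_run(text, 0, is_alpha)
word = text[0:end + 1]  # same idea as Mid$(s$, lp, (EndPOS - lp) + 1)
print("Word: " + word)
```

Passing the test in as `predicate` means the same scanning routine could serve letters, whitespace, or digits.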


Detecting Whitespace

The next type of token is whitespace — spaces and tabs.

We’ll build another helper function:

Function IsWhiteSpace(ThisCHR)
    State = (ThisCHR = Asc(" ")) Or (ThisCHR = 9)
EndFunction State

Then use the same nested-loop pattern:

If IsWhiteSpace(ThisCHR)
    For ChrLP = lp To Len(s$)
        If Not IsWhiteSpace(Mid(s$, ChrLP)) Then Exit
        EndPOS = ChrLP
    Next
    WhiteSpace$ = Mid$(s$, lp, (EndPOS - lp) + 1)
    Print "White Space: " + Str$(Len(WhiteSpace$))
    lp = EndPOS
EndIf

Now we can clearly see which parts of the string are spaces and how many characters each whitespace block contains.


Detecting Numbers

Finally, let’s detect numeric characters using another helper:

Function IsNumericCHR(ThisCHR)
    State = (ThisCHR >= Asc("0")) And (ThisCHR <= Asc("9"))
EndFunction State

And apply it just like before:

If IsNumericCHR(ThisCHR)
    For ChrLP = lp To Len(s$)
        If Not IsNumericCHR(Mid(s$, ChrLP)) Then Exit
        EndPOS = ChrLP
    Next
    Number$ = Mid$(s$, lp, (EndPOS - lp) + 1)
    Print "Number: " + Number$
    lp = EndPOS
EndIf

Now we can identify three types of tokens:

  • Words (alphabetical groups)
  • Whitespace (spaces and tabs)
  • Numbers (digits)
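
Put together, the three checks form one scanning loop. The sketch below is a Python rendering of that loop, not the PlayBASIC source from the session; the function names and the `CLASSES` table are assumptions made for the example. Characters that match none of the three tests are skipped, which is also what the PlayBASIC version does when no `If` block fires.

```python
def is_alpha(ch):
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z")


def is_space(ch):
    return ch == " " or ch == "\t"


def is_digit(ch):
    return "0" <= ch <= "9"


# Label and test for each token class, checked in this order.
CLASSES = [("Word", is_alpha), ("White Space", is_space), ("Number", is_digit)]


def lex(text):
    """Split text into (label, run) pairs, one pair per run of like characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        for label, predicate in CLASSES:
            if predicate(text[pos]):
                end = pos
                # Advance to the first character that no longer matches.
                while end < len(text) and predicate(text[end]):
                    end += 1
                tokens.append((label, text[pos:end]))
                pos = end
                break
        else:
            # No class matched (punctuation, symbols): skip the character.
            pos += 1
    return tokens


tokens = lex("   1212123323      This is a message number")
for label, run in tokens:
    print(label + ": " + repr(run))
```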


Defining a Token Structure

Up to this point, our program just prints what it finds.

Let’s store these tokens properly by defining a typed array.

Type tToken
    TokenType
    Value$
    Position
EndType
Dim Tokens(1000) As tToken

We’ll also define some constants for readability:

Constant TokenTYPE_WORD        = 1
Constant TokenTYPE_NUMERIC     = 2
Constant TokenTYPE_WHITESPACE  = 4

As we detect tokens, we add them to the array:

Tokens(TokenCount).TokenType = TokenTYPE_WORD
Tokens(TokenCount).Value$    = ThisWord$
TokenCount++

Do the same for whitespace and numbers, and our lexer now builds a real list of tokens as it runs.
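
The snippet above fills in the type and value but never sets the `Position` field. As a rough sketch of a complete record, here is the same structure in Python with all three fields stored. The `add_token` helper is hypothetical, and storing the 1-based start of each run in `position` is my assumption about how that field is meant to be used.

```python
from dataclasses import dataclass

# Same values as the PlayBASIC constants.
TOKEN_WORD = 1
TOKEN_NUMERIC = 2
TOKEN_WHITESPACE = 4


@dataclass
class Token:
    token_type: int
    value: str
    position: int  # 1-based start of the run (assumed meaning of Position)


tokens = []


def add_token(token_type, value, position):
    """Append one token and return the new token count."""
    tokens.append(Token(token_type, value, position))
    return len(tokens)


# Tokens for the text "12 ab", added by hand to show the layout.
add_token(TOKEN_NUMERIC, "12", 1)
add_token(TOKEN_WHITESPACE, " ", 3)
add_token(TOKEN_WORD, "ab", 4)
print(tokens[2])
```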


Displaying Tokens by Type

To visualize the result, we can print each token in a different colour:

For lp = 0 To TokenCount - 1
    Select Tokens(lp).TokenType
        Case TokenTYPE_WORD:       c = $00FF00 ; green
        Case TokenTYPE_NUMERIC:    c = $0000FF ; blue
        Case TokenTYPE_WHITESPACE: c = $000000 ; black
        Default:                   c = $FF0000
    EndSelect

    Ink c
    Print Tokens(lp).Value$
Next

When we run this version, we see numbers printed in blue, words in green, and whitespace appearing as black gaps — exactly how a simple syntax highlighter or compiler front-end might visualize tokenized text.


Wrapping Up

And that’s it — our first lexer!

It reads through a line of text, classifies what it finds, and records each token type for later use.

The same process underpins many systems:

  • Compilers use it as the first step in parsing code.
  • Adventure games might use it to process typed player commands.
  • Expression evaluators or script interpreters rely on it to break down formulas and logic.

The big takeaway? A lexer doesn’t have to be complicated.

This simple approach — scanning text, detecting groups, and tagging them — is the heart of it. Once you understand that, you can expand it to handle symbols, punctuation, operators, and beyond.

If you’d like to see more about extending this lexer or turning it into a parser, let me know in the comments — or check out the full live session on YouTube.

Links:

  • PlayBASIC.com
  • Learn to basic game programming (on Amazon)
  • Learn to code for beginners (on Amazon)




    Taming Memory in PlayBasic with the AMA Library

    August 11, 2025

     

    When you’re writing games or tools in PlayBasic, performance isn’t just about the flashy stuff you see on screen. Behind the scenes, the way you manage memory can make or break your frame rate — and your sanity.

    That’s where my Array Memory Allocation (AMA) library comes in. It’s a home-grown system that manages all your allocations inside a single, giant array. Think of it like having a huge storage unit that you divide into smaller lockers for your stuff, instead of renting a new storage unit every time you buy a box of cables.


    The Problem with Dynamic Memory

    PlayBasic, like most high-level languages, can allocate arrays and memory chunks on the fly. That’s fine for occasional use, but when you’re doing hundreds or thousands of small allocations in a game loop, it can become painfully slow.

    The original inspiration for AMA came from some old DarkBasic code I wrote years ago. It worked, but it had some ugly performance quirks — I’m talking seconds-long delays for just a few hundred allocations. Not great when you’re trying to keep your game running at 60 FPS.


    The AMA Approach

    The AMA library flips the normal approach on its head:

  • One Big Array – Instead of lots of little allocations, everything lives inside a single giant array.
  • Chunk Management – The big array is treated like a heap of variable-sized blocks.
  • Minimal Shuffling – When you free memory, the space is just marked as available. If things get too fragmented, a defrag routine tidies it up.

    This lets AMA skip the expensive “create a new array” step over and over, because the big array already exists; we’re just reassigning parts of it.
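
To make the idea concrete, here is a toy allocator written in Python that follows the three points above. It is not the AMA library's code: the `ArrayHeap` class, its method names, and the first-fit search are all assumptions chosen to keep the example short.

```python
class ArrayHeap:
    """Toy allocator that hands out ranges of one pre-allocated list."""

    def __init__(self, size):
        self.data = [0] * size
        # Each block is [start, length, in_use], kept in address order.
        self.blocks = [[0, size, False]]

    def alloc(self, length):
        """First-fit search: return the start index, or -1 if nothing fits."""
        for i, (start, size, used) in enumerate(self.blocks):
            if not used and size >= length:
                self.blocks[i] = [start, length, True]
                if size > length:
                    # Keep the unused tail as a new free block.
                    self.blocks.insert(i + 1, [start + length, size - length, False])
                return start
        return -1

    def free(self, start):
        """Mark the block as available; no data is moved."""
        for block in self.blocks:
            if block[0] == start and block[2]:
                block[2] = False
                return True
        return False

    def defrag(self):
        """Slide used blocks to the front; return {old_start: new_start}."""
        moved = {}
        packed = []
        cursor = 0
        for start, size, used in self.blocks:
            if used:
                if start != cursor:
                    self.data[cursor:cursor + size] = self.data[start:start + size]
                moved[start] = cursor
                packed.append([cursor, size, True])
                cursor += size
        if cursor < len(self.data):
            # Everything after the packed blocks becomes one free block.
            packed.append([cursor, len(self.data) - cursor, False])
        self.blocks = packed
        return moved


heap = ArrayHeap(16)
a = heap.alloc(4)
b = heap.alloc(4)
c = heap.alloc(4)
heap.data[c] = 99
heap.free(b)
print(heap.alloc(6))   # two separate 4-cell gaps, so this fails
print(heap.defrag())   # used blocks slide down and the gaps merge
print(heap.alloc(6))   # now there is room
```

Because `defrag` moves blocks, callers need the returned mapping to update any start indices they are holding.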


    Why AMA still matters (even in PlayBASIC)

    PlayBASIC supports pointers, so it’s fair to ask whether AMA is needed at all. It remains useful for several reasons:

  • Cross-dialect portability: The AMA pattern is directly applicable to BASIC dialects that don’t support pointers, array-passing, or dynamic array creation. The article’s goal is to share ideas usable across those environments.
  • Shared container and serialization: A single heap-like container makes it easy to share, snapshot, or serialize many small data blocks as one contiguous structure.
  • Deterministic behavior and profiling: A manual allocator gives predictable allocation behavior and makes fragmentation/debug visualization simpler.
  • Centralized debug & visualization: Heatmaps, allocation stats, and defrag animations are naturally easier when all data lives in one array.
  • Performance guarantees: Even with pointer support, avoiding repeated allocations and deallocations (and garbage collection or VM overhead, if present) can be a win, especially on constrained runtimes.


    Seeing It in Action

    I’ve built in a color-coded heatmap so you can literally see what the allocator is doing:

  • Green = Free space
  • White = Large free chunks
  • Other colors = Allocated blocks

    When you watch it run, you can see allocations, frees, and defrags happening in real time at 20 FPS, even with 2,000 allocations and 66MB of data in pure PlayBasic code.


    The Performance Payoff

    In testing, AMA crushed the old brute-force method:

  • Old method – ~25 seconds for 1,000 allocations (ouch)
  • AMA method – Real-time allocation & defrag without breaking a sweat

    The magic here is using a sorted list for quick free-space lookups and only moving data when absolutely necessary. That combination delivers a big net gain without overcomplicating things.
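
One way a sorted list can speed up free-space lookups is sketched below in Python. The details are assumptions, not taken from the AMA source: the list holds `(size, start)` pairs sorted by size, and a binary search finds the smallest free chunk that is big enough. `take_best_fit` is a made-up name.

```python
import bisect


def take_best_fit(free_chunks, length):
    """Remove and return the start of the smallest chunk that fits, or None.

    free_chunks is a list of (size, start) tuples kept sorted by size.
    """
    # (length, -1) sorts before every real chunk of the same size.
    i = bisect.bisect_left(free_chunks, (length, -1))
    if i == len(free_chunks):
        return None
    size, start = free_chunks.pop(i)
    if size > length:
        # Put the unused remainder back, keeping the list sorted.
        bisect.insort(free_chunks, (size - length, start + length))
    return start


free_chunks = [(4, 40), (8, 0), (16, 64), (64, 128)]
print(take_best_fit(free_chunks, 6))
print(free_chunks)
```

The lookup itself is a binary search, so it never has to walk every free chunk to find one that fits.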


    Next Steps

    I’m looking at squeezing even more speed out of the library by improving the copy routines — unrolling loops, copying larger words/blocks, or generating specialized copy code where beneficial. Every little gain adds up when you’re chasing performance.

    Final Thought: Memory management might not be as flashy as a new shader or sprite effect, but when your game runs smoothly, you’ll be glad you gave it some love.


    Is XOR Decryption in PlayBASIC as Fast as Assembly?

    July 07, 2025

     

    Every now and then, a forum question pops up that really catches my attention — and this one did just that. A PlayBASIC user recently asked:

    > "Is using XOR decryption when loading media from memory in PlayBASIC as fast as doing it in assembly?"

    At first, I was a little puzzled. Why? Because the function in question is written in assembly — it's already doing exactly what the user thought might be a separate optimization path. So, let's unpack what's really going on behind the scenes when you XOR encrypted media in memory using PlayBASIC.


    🔐 XOR Media Loading: A Quick Recap

    Years ago, PlayBASIC added support for loading media directly from memory. Earlier versions relied on external packer tools to encrypt and wrap media, but these days, you can load and decode encrypted content entirely from within your program.

    The basic workflow is:

    1. Load your file into memory.
    2. Call the `XORMemory` function with a key.
    3. The content is decrypted and ready to use.

    You can use any XOR key you like. While XOR encryption is relatively simple and easily reversible, it’s still useful for basic protection against casual asset ripping.
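
The reason one function handles both directions is that XOR undoes itself: applying the same key twice gives back the original bytes. Here is a small Python sketch of that round trip. It does not use PlayBASIC's `XORMemory` or its parameters; `xor_bytes`, the key, and the sample data are all made up for the example.

```python
def xor_bytes(data, key):
    """XOR each byte of data with a repeating key and return new bytes."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


key = b"\x5a\xc3\x19\x7e"               # arbitrary 4-byte key
original = b"pretend this is an image file"

encrypted = xor_bytes(original, key)     # done once, when packing the media
decrypted = xor_bytes(encrypted, key)    # done at load time, same call

print(encrypted != original)
print(decrypted == original)
```

It also shows why this only deters casual ripping: anyone who has the key, or can guess part of the original file, can reverse it just as easily.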


    🧠 What Happens Internally?

    When you call `XORMemory`, PlayBASIC doesn’t interpret the data — it pushes the work down to the engine’s internal rendering system. Specifically, it uses the XOR ink mode inside the `Box` drawing function.

    This function writes color data onto a surface by XOR’ing it with the existing pixels. Here’s what makes it cool: that surface isn’t necessarily a visible screen — it's just treated as raw memory.

    To decrypt, the engine:

  • Creates a temporary 32-bit image buffer (must be 32-bit to handle raw data correctly).
  • Loads the encrypted file data into that buffer.
  • Applies the XOR key using the `Box` command in XOR mode.
  • Copies the result back to memory.

    That’s it.


    💥 But Is It Fast?

    Yes. Very fast — because under the hood, this process is powered by raw MMX assembly.

    When the engine detects MMX support, it uses MMX instructions to process 64 bits (two 32-bit pixels) at a time:

  • Data is loaded into MMX registers.
  • XOR is performed at the hardware level.
  • Results are written back immediately.

    Here’s the inner loop in plain terms:

  • Load two pixels from memory.
  • Load XOR key into a register.
  • XOR them.
  • Write them back.
  • Repeat in a tight loop.

    We’re talking near cycle-per-pixel speeds here, which is hardware-level performance. If MMX isn’t available, it gracefully falls back to optimized C code. Either way, you’re getting a performance-optimized routine.
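
To picture the "two pixels at a time" step, the Python sketch below XORs a buffer in 64-bit chunks using a 32-bit key repeated twice. It only illustrates the data flow. It is not the engine's MMX routine and will not be fast in Python; the function name and the little-endian layout are assumptions.

```python
def xor_64bit_chunks(data, key32):
    """XOR data with a 32-bit key, handling 8 bytes (two pixels) per step."""
    if len(data) % 8 != 0:
        raise ValueError("length must be a multiple of 8 bytes")
    # Repeat the key so one 64-bit value covers two 32-bit pixels.
    key64 = int.from_bytes(key32.to_bytes(4, "little") * 2, "little")
    out = bytearray(len(data))
    for offset in range(0, len(data), 8):
        pair = int.from_bytes(data[offset:offset + 8], "little")  # load two pixels
        out[offset:offset + 8] = (pair ^ key64).to_bytes(8, "little")  # XOR, write back
    return bytes(out)


pixels = bytes(range(16))                 # four made-up 32-bit pixels
scrambled = xor_64bit_chunks(pixels, 0x11223344)
restored = xor_64bit_chunks(scrambled, 0x11223344)
print(restored == pixels)
```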


    🕰 Legacy Notes

    Older machines or systems using 16-bit display modes may encounter issues unless you force a 32-bit surface. That’s why the engine explicitly creates a 32-bit buffer in the decoding routine — it ensures consistent behavior across different environments.

    Also worth noting: drawing directly to the screen (especially in older systems where the screen buffer lives in VRAM) would be very slow due to the read/write overhead. But modern systems (e.g., Windows 10/11) emulate these surfaces in system memory, allowing direct blending without penalty.


    ✅ Final Thoughts

    So, to answer the original question:

    Yes — XOR decryption in PlayBASIC is as fast as it can be. It’s literally done in machine code.

    This is just one example of how PlayBASIC leans on low-level optimizations to make higher-level features accessible and fast. You get the convenience of a BASIC command, but the performance of assembly behind the scenes.


    Got more technical questions?

    Join the conversation on the forums, or check out the help files for more info about ink modes, memory banks, and low-level drawing operations.


    Tags:

    `#PlayBASIC` `#GameDev` `#Encryption` `#Assembly` `#MMX` `#XOR` `#RetroCoding` `#Performance`