hexo-sam-reader: Building a SAM Text-to-Speech Plugin for Hexo
Intro
Motivation
I wanted a text-to-speech engine to read these blog posts aloud, on one condition: it had to be low-latency and do all of its computation locally. The reasoning behind this? I do not own a server! They are extremely expensive.
A very wise person once said, when resources for modern solutions become infeasibly expensive to you, rely on old solutions. Who said that? I did, but that part does not matter.
Thus, SAM came to mind: the Software Automatic Mouth[1] is a speech synthesiser originally written in 1982 by Mark Barton of Don’t Ask Software for the Commodore 64. Christian Schiffler (@discordier) ported SAM to JavaScript as SamJs[2], preserving the original’s distinctive robotic phoneme engine in a looooong ca. 3000-line browser-compatible library. SAM does not stream audio from a server! It runs entirely in the browser, synthesising speech from raw text and playing it through the Web Audio API[3]. There is therefore no latency beyond local computation. Helpful, considering the state of my internet after moving in! (It is awful.)
The product is hexo-sam-reader, a Hexo plugin that adds a SAM-powered TTS reader widget to any blog post. This document traces the full development from a hardcoded prototype to a published npm package. In the future I plan to have a smoother-sounding voice, but I would like to talk about SAM first!
Scope of This Document
This post covers the evolution of hexo-sam-reader across two distinct stages: the sam-v1 prototype (a looong Hexo helper script) and the published hexo-sam-reader package (an npm plugin for any Hexo user). It examines the text processing pipeline, the playback engine, the configuration refactor that removed all hardcoding, and the documentation process.
Phase I: The sam-v1 Prototype
The Helper
The project began as a single file: sam-reader.js, ca. 600 lines, dropped directly into the Hexo blog’s scripts/ directory. It registered itself as a Hexo EJS helper using hexo.extend.helper.register() and returned an HTML string containing the widget markup, inline CSS, and the entire client-side JavaScript.
The registration was:
hexo.extend.helper.register("sam_reader", function () {
This is the Hexo helper pattern: a function that receives the current page context via this.page and returns raw HTML. The conditional if (!page.sam) return '' ensures the widget only renders on posts with sam: true in their front matter.
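A minimal sketch of that pattern, not the prototype's actual source: the helper gates on the front matter flag and returns an HTML string. The placeholder markup is made up for illustration, and the registration is guarded so the snippet can also be run outside a Hexo scripts/ file.

```javascript
// Sketch of a prototype-style helper: gate on front matter, return raw HTML.
// The markup below is a stand-in; the real widget is far longer.
function samReaderHelper() {
  const page = this.page || {};
  if (!page.sam) return ""; // no `sam: true` in front matter, so render nothing
  return '<div class="sam-reader-widget"><!-- controls, CSS, script --></div>';
}

// Inside a Hexo scripts/ file the `hexo` global exists; elsewhere this is skipped.
if (typeof hexo !== "undefined") {
  hexo.extend.helper.register("sam_reader", samReaderHelper);
}
```

A template would then call <%- sam_reader() %> wherever the widget should appear.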
Hardcoded Values
Every configurable value was baked into the source. The colour scheme was embedded in CSS literal strings:
.sam-reader-widget {
The voice parameters were fixed in the HTML markup:
<label
The abbreviation list was hardcoded inside the cleanForSam() function as a sequence of individual regex replacements:
text = text.replace(/\bNL\b/g, "Netherlands");
Eighteen abbreviations, each a separate text.replace() call. Adding a new one meant editing the helper source. The content selector was hardcoded as .mypage. The SAM library path was hardcoded as /js/sam.js. The pause duration was var PAUSE_MS = 400. The chunk length limit was 200.
This worked for my blog specifically. However, could it work for anyone else’s? No.
How the Prototype Functioned
Despite the hardcoding, the prototype established the complete processing pipeline that survived into the published package:
- Conditional rendering: Check page.sam in front matter.
- Widget injection: Return inline HTML with controls (Play, Pause, Stop), a progress bar, voice setting sliders, and a status indicator.
- Async SAM loading: Dynamically create a <script> tag pointing to sam.js, initialise on load.
- Text extraction: Clone the post’s DOM subtree, remove non-readable elements, walk the tree to produce segments.
- Text cleaning: Apply regex transformations to produce SAM-compatible ASCII text.
- Chunking: Split cleaned text into chunks of at most 200 characters, respecting sentence and comma boundaries.
- Playback: Feed each chunk to SamJs, convert to a Web Audio buffer, play sequentially with pauses between sections.
The prototype was functional. The question was whether it could become distributable.
Phase II: Package Initialisation
The npm Structure
At 10:51 on 2026-04-02, the first commit (0e7d294) created the package structure:
hexo-sam-reader/
The package.json declared the package with standard npm metadata:
{
The files array is critical for npm distribution: it declares which files are included when the package is installed. Without it, npm includes nearly everything in the package directory (bar whatever .npmignore or .gitignore excludes), potentially shipping development artifacts. With it, the installed package contains only index.js, the lib/ directory, and the assets/ directory (which holds the bundled sam.js library), alongside the files npm always packs, such as package.json and the README.
Splitting the Monolith
The prototype’s single file was decomposed into three modules with distinct responsibilities.
index.js: Registration and Configuration
The entry point handles Hexo plugin registration. Hexo automatically loads the entry point (here index.js) of any installed package whose name starts with hexo- and which is listed as a dependency in the site’s package.json:
/* global hexo */
Two Object.assign() calls perform the configuration merge. The first merges the user’s sam_reader block from _config.yml over the plugin defaults. The second merges the user’s style sub-object over the default colour scheme. The merge order — defaults first, user config second — ensures user values override defaults while preserving any defaults the user did not specify.
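The second merge is needed because Object.assign() is shallow: without it, a user who overrides one colour would replace the whole style object and lose every other default. A minimal sketch of the two-step merge follows. The default values for the non-style keys are the ones quoted in this post; the style keys (background, accent) and the name resolveConfig are hypothetical, chosen only for illustration.

```javascript
// Defaults quoted in the post; the style keys below are hypothetical.
const DEFAULTS = {
  front_matter_key: "sam",
  content_selector: ".mypage",
  asset_path: "/js/hexo-sam-reader/sam.js",
  pause_ms: 400,
  chunk_max_length: 200,
  abbreviations: {},
};
const DEFAULT_STYLE = { background: "#111111", accent: "#00ff66" };

function resolveConfig(userConfig) {
  const user = userConfig || {};
  // First merge: the user's top-level keys override the plugin defaults.
  const config = Object.assign({}, DEFAULTS, user);
  // Second merge: the style sub-object is merged separately, because the
  // first merge is shallow and would otherwise discard unspecified colours.
  config.style = Object.assign({}, DEFAULT_STYLE, user.style);
  return config;
}

const config = resolveConfig({ pause_ms: 250, style: { accent: "#ff0000" } });
console.log(config.pause_ms, config.content_selector, config.style);
```

In the plugin itself, the argument would be hexo.config.sam_reader.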
lib/generator.js: Virtual Asset Serving
The generator module solves a distribution problem. In the prototype, sam.js had to be manually placed in the blog’s source/js/ directory. The generator eliminates this by serving the bundled sam.js as a virtual Hexo asset:
1 | ; |
This is Hexo’s generator API: return an object with a path (the URL path) and a data function (the content). The data function returns a readable stream rather than loading the entire 3,276-line file into memory. When Hexo builds the site, the generator creates a virtual file at /js/hexo-sam-reader/sam.js (by default) without the user copying anything.
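A sketch of such a generator, under stated assumptions: createSamGenerator, the registration name sam_reader_asset and the bundledPath parameter are names I made up here, and stripping the leading slash from the route path is my own normalisation rather than something taken from the plugin source.

```javascript
const fs = require("node:fs");

// Returns a Hexo generator function that exposes one virtual file.
function createSamGenerator(config, bundledPath) {
  return function samAssetGenerator() {
    return {
      // Route path relative to the site root (leading slash stripped: an assumption).
      path: config.asset_path.replace(/^\//, ""),
      // A function, so the file is only opened (and streamed) when Hexo writes the route.
      data: () => fs.createReadStream(bundledPath),
    };
  };
}

// Inside a Hexo plugin the `hexo` global exists; elsewhere this is skipped.
if (typeof hexo !== "undefined") {
  hexo.extend.generator.register(
    "sam_reader_asset",
    createSamGenerator({ asset_path: "/js/hexo-sam-reader/sam.js" }, "assets/sam.js")
  );
}
```

Because data is a function rather than a string, nothing is read from disk until the build actually emits the file.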
lib/helper.js: The Widget Renderer
The helper module contains the widget HTML, CSS, and client-side JavaScript. It was extracted from the prototype with one structural change: instead of registering itself, it exports a factory function that receives the hexo instance and returns the helper function:
1 | ; |
The closure over hexo gives the helper access to hexo.config.sam_reader without global state. The front matter key is now configurable: instead of checking page.sam, it checks page[config.front_matter_key], allowing users to use any key they prefer.
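The factory shape can be sketched like this. It is an illustration only: createHelper and the placeholder markup are hypothetical, and the real helper merges the full configuration rather than just the front matter key.

```javascript
// Factory: receives the hexo instance, returns the helper function.
// In a CommonJS plugin this would be exported via module.exports.
function createHelper(hexo) {
  return function samReader() {
    // Read configuration through the closure instead of a global.
    const config = Object.assign({ front_matter_key: "sam" }, hexo.config.sam_reader);
    const page = this.page || {};
    if (!page[config.front_matter_key]) return "";
    return '<div class="sam-reader-widget"><!-- widget markup --></div>';
  };
}

// Usage with a stand-in hexo object whose user picked a different key.
const fakeHexo = { config: { sam_reader: { front_matter_key: "read_aloud" } } };
const helper = createHelper(fakeHexo);
console.log(helper.call({ page: { read_aloud: true } }) !== ""); // widget rendered
console.log(helper.call({ page: { sam: true } }) === "");        // old key ignored
```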
Phase III: The Configuration Refactor
The Architectural Shift
Commit cfc40c2 (“feat: remove hardcoding from the library, push to _config.yml instead”) was the most significant change in the project’s history. Executed fifty minutes after the initial commit, it transformed hexo-sam-reader from a personal script into a distributable plugin.
The core insight was that every value a user might want to change should live in _config.yml, not in source code. This applied to thirteen categories of configuration:
| Category | Prototype | Package |
|---|---|---|
| Front matter key | Hardcoded sam | config.front_matter_key |
| Content selector | Hardcoded .mypage | config.content_selector |
| Asset path | Hardcoded /js/sam.js | config.asset_path |
| Voice speed | Hardcoded 72 | config.speed |
| Voice pitch | Hardcoded 64 | config.pitch |
| Voice mouth | Hardcoded 128 | config.mouth |
| Voice throat | Hardcoded 128 | config.throat |
| Pause duration | Hardcoded 400 | config.pause_ms |
| Chunk max length | Hardcoded 200 | config.chunk_max_length |
| Abbreviations | 18 hardcoded regexes | config.abbreviations |
| Skip selectors | Hardcoded list | config.skip_selectors |
| Widget colours | 13 hardcoded hex values | config.style.* |
| Font family | Hardcoded string | config.style.font_family |
Abbreviation Handling: Case Sensitivity
The most nuanced part of the refactor was the abbreviation system. In the prototype, each abbreviation was a separate text.replace() call with manually chosen flags (/g for case-sensitive, /gi for case-insensitive). The package needed a general mechanism.
The solution uses the casing of the abbreviation key to determine match behaviour:
var keys = Object.keys(ABBREVIATIONS);
If the key is entirely uppercase (SSH, CLI, API), the regex uses the g flag only — case-sensitive matching. If the key contains any lowercase character (LaTeX, CMake, libssh), the regex uses gi — case-insensitive matching. The rationale: an all-caps acronym like SSH should not match ssh in a URL path or code snippet where the casing is meaningful, but a mixed-case term like LaTeX should match regardless of how the author capitalised it.
The key is also escaped with a regex-safe replacement before being passed to the RegExp constructor, so that metacharacters in a configured key (a dot or a plus sign, say) are matched literally instead of being interpreted as regex syntax.
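The rule is easier to see in a small standalone sketch. This is a reconstruction from the description above, not the plugin's code: escapeRegExp and applyAbbreviations are names I picked, and the word-boundary anchors (\b) are carried over from the prototype's regexes.

```javascript
// Escape regex metacharacters so a configured key is matched literally.
function escapeRegExp(source) {
  return source.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function applyAbbreviations(text, abbreviations) {
  for (const key of Object.keys(abbreviations)) {
    // All-caps key: case-sensitive ("g"). Any lowercase letter: case-insensitive ("gi").
    const allCaps = key === key.toUpperCase();
    const pattern = new RegExp("\\b" + escapeRegExp(key) + "\\b", allCaps ? "g" : "gi");
    // A replacer function avoids "$" in the spoken form being treated specially.
    text = text.replace(pattern, () => abbreviations[key]);
  }
  return text;
}

console.log(
  applyAbbreviations("Use SSH, not ssh. I like latex.", { SSH: "S S H", LaTeX: "lay tek" })
);
// prints "Use S S H, not ssh. I like lay tek."
```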
Configuration in Practice
On my blog, the _config.yml block declares thirty abbreviations, thirteen style properties, and five voice parameters:
sam_reader:
A user with a different theme would only need to change the selectors and colours. A user writing in a domain with different acronyms would only need to change the abbreviations. The voice parameters remain adjustable via sliders at runtime; the _config.yml values set the initial positions.
The Trade-Off: Batteries Not Included
The prototype shipped with eighteen hardcoded abbreviations: every acronym I used in my posts. The published package ships with an empty abbreviation map. This was a deliberate choice. Distributing my personal abbreviation list as defaults would cause incorrect speech for users who do not write about ICPC or use FTXUI. An empty default forces users to declare their own terms, which is the correct behaviour for a general-purpose plugin.
The same logic applies to the content selector. The prototype hardcoded .mypage, which is specific to the theme I use. The package defaults to .mypage but documents the override for other themes (e.g., .e-content for the default Landscape theme). The default is a suggestion, not an assumption.
Phase IV: The Text Processing Pipeline
Overview
The text processing pipeline converts a rendered HTML blog post into an array of SAM-compatible text chunks. It operates in four stages:
- Extraction (getPostSegments()): DOM walking to produce raw text segments with pause markers.
- Cleaning (cleanForSam()): Regex transformations to produce ASCII text SAM can pronounce.
- Table reading (readTable()): Structured extraction of tabular data with column headers.
- Chunking (buildChunks()): Splitting cleaned text into chunks within SAM’s character limit.
Text Extraction: Walking the DOM
The extraction function clones the post’s content container, removes non-readable elements, and walks the DOM tree to produce an array of segments. Each segment is either a string (speakable text) or null (a pause marker):
function getPostSegments() {
Several design decisions are embedded here. The clone operation prevents the extraction from modifying the visible page. Code blocks (pre, .highlight) are excluded because SAM cannot meaningfully pronounce source code. Headings receive pause markers on both sides, creating audible section breaks. Block-level elements receive trailing pauses, producing natural breathing points in the speech.
The skip selector list is extensible via EXTRA_SKIP (from config.skip_selectors), allowing users to exclude custom widgets, advertisement banners, or other non-content elements without modifying the plugin source.
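To make the segment format concrete, here is a simplified walker. It is a sketch under assumptions, not getPostSegments() itself: it skips elements by tag name during the walk instead of cloning and removing them by selector, and the tag sets are my own guesses at a reasonable minimum. It only relies on nodeType, tagName, textContent and childNodes, so it works on real DOM nodes and on plain objects alike.

```javascript
const TEXT_NODE = 3;
const ELEMENT_NODE = 1;
const HEADINGS = new Set(["H1", "H2", "H3", "H4", "H5", "H6"]);
const BLOCKS = new Set(["P", "LI", "BLOCKQUOTE"]);   // assumed block-level set
const SKIPPED = new Set(["PRE", "SCRIPT", "STYLE"]); // assumed skip set

function getSegments(root) {
  const segments = [];
  let buffer = "";
  // Push accumulated inline text as one speakable segment.
  const flush = () => {
    const text = buffer.replace(/\s+/g, " ").trim();
    if (text) segments.push(text);
    buffer = "";
  };
  // A pause is a null entry; flush first so text stays in reading order.
  const pause = () => { flush(); segments.push(null); };

  (function walk(node) {
    if (node.nodeType === TEXT_NODE) { buffer += node.textContent; return; }
    if (node.nodeType !== ELEMENT_NODE || SKIPPED.has(node.tagName)) return;
    const heading = HEADINGS.has(node.tagName);
    if (heading) pause();                                // pause before a heading
    for (const child of node.childNodes) walk(child);
    if (heading || BLOCKS.has(node.tagName)) pause();    // trailing pause
  })(root);

  flush();
  return segments;
}
```

For a heading followed by a paragraph and a code block, this yields a pause, the heading text, a pause, the paragraph text, and a final pause, with the code block dropped.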
Text Cleaning: The Regex Pipeline
The cleanForSam() function applies over forty regex transformations in a specific order. The ordering matters: operator replacements must occur before angle bracket stripping, and abbreviation replacements must occur before non-ASCII removal.
Stage 1: Remove unparseable content.
// URLs
Stage 2: Replace operators with spoken equivalents.
text = text.replace(/>=/g, " greater than or equal to ");
The lookahead and lookbehind assertions on > and < prevent false positives inside HTML tags. Without them, a stray <div> remnant would become “less than div greater than”.
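As an illustration of the idea, the sketch below only treats > and < as operators when they are surrounded by whitespace. The exact assertions in cleanForSam() may differ; the whitespace condition and the name speakOperators are my assumptions.

```javascript
function speakOperators(text) {
  return text
    // Two-character operators first, so ">" and "<" alone never see them.
    .replace(/>=/g, " greater than or equal to ")
    .replace(/<=/g, " less than or equal to ")
    // Lone ">" or "<" counts as an operator only with whitespace on both sides.
    .replace(/(?<=\s)>(?=\s)/g, "greater than")
    .replace(/(?<=\s)<(?=\s)/g, "less than")
    // Collapse the extra spaces introduced above.
    .replace(/\s+/g, " ");
}

console.log(speakOperators("if a >= b and c < d"));
// prints "if a greater than or equal to b and c less than d"
console.log(speakOperators("<div>")); // prints "<div>" (left for the later strip stage)
```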
Stage 3: Handle slashes contextually.
text = text.replace(/ \/ /g, " OR "); // " / " → "OR"
A spaced slash (/) typically indicates alternatives (“Linux / macOS”) and reads naturally as “or”. An unspaced slash (“TCP/IP”) is a literal separator and reads as “slash”.
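The two cases have to run in this order, otherwise the unspaced rule would swallow the spaced one. A minimal sketch (speakSlashes is a hypothetical name, and the " slash " wording follows the description above):

```javascript
function speakSlashes(text) {
  return text
    .replace(/ \/ /g, " OR ")    // spaced slash: alternatives
    .replace(/\//g, " slash ");  // any remaining slash: literal separator
}

console.log(speakSlashes("Linux / macOS")); // prints "Linux OR macOS"
console.log(speakSlashes("TCP/IP"));        // prints "TCP slash IP"
```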
Stage 4: Strip code artifacts and apply abbreviations.
text = text.replace(/[{}()\[\]<>]/g, " ");
After stripping, the configured abbreviations are applied using the case-sensitivity rule described in Phase III.
Stage 5: Remove non-ASCII characters.
text = text.replace(/[\x80-\xFF]/g, " ");
SAM was designed for 7-bit ASCII English. Any character outside the printable ASCII range (0x20–0x7E) would produce either garbage phonemes or a synthesis error. The pipeline aggressively strips everything SAM cannot handle.
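One detail worth spelling out: the character class [\x80-\xFF] on its own only covers U+0080 to U+00FF, so characters such as "→" (U+2192) or curly quotes fall outside it and need their own handling. A single negated class over the printable range catches everything at once. The sketch below is one way to express that, not necessarily how the plugin does it:

```javascript
// Replace every character outside printable ASCII (0x20-0x7E) with a space,
// then collapse the runs of spaces that this leaves behind.
function stripNonAscii(text) {
  return text.replace(/[^\x20-\x7E]/g, " ").replace(/ +/g, " ").trim();
}

console.log(stripNonAscii("café → ok")); // prints "caf ok"
```

Note that this also turns tabs and newlines into spaces, which is harmless for text that is about to be spoken.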
Table Reading: Row-Wise with Headers
Tables require special handling. Naive text extraction of a table produces an incoherent sequence of cell values without context. The readTable() function reads tables row-wise, prepending each cell’s value with its column header:
function readTable(table) {
Given this table:
| Language | Year |
|---|---|
| C | 1972 |
| Python | 1991 |
The function produces: “Language C, Year 1972. Language Python, Year 1991.” This is intelligible when spoken aloud, unlike the flat extraction “C 1972 Python 1991” which loses all structure.
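The row-wise reading can be sketched in two small pieces. readRows does the actual formatting on plain arrays; the readTable wrapper shown here is my own guess at the DOM side and assumes the first row of the table holds the headers.

```javascript
// Format rows as "Header value, Header value." sentences.
function readRows(headers, rows) {
  return rows
    .map((cells) =>
      cells
        .map((cell, i) => (headers[i] ? headers[i] + " " + cell : cell))
        .join(", ") + "."
    )
    .join(" ");
}

// DOM adapter (browser only): assumes the first <tr> is the header row.
function readTable(table) {
  const rows = Array.from(table.querySelectorAll("tr"), (tr) =>
    Array.from(tr.querySelectorAll("th, td"), (cell) => cell.textContent.trim())
  );
  const [headers = [], ...body] = rows;
  return readRows(headers, body);
}

console.log(readRows(["Language", "Year"], [["C", "1972"], ["Python", "1991"]]));
// prints "Language C, Year 1972. Language Python, Year 1991."
```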
Chunking: Sentence and Comma Boundaries
SAM has practical limits on input length. The buildChunks() function splits text into chunks of at most CHUNK_MAX characters (default 200), respecting natural language boundaries:
function buildChunks(segments, maxLen) {
The two-pass approach is deliberate. The first pass splits on sentence endings (., !, ? followed by whitespace, using a lookbehind assertion). If a single sentence still exceeds the character limit, the second pass splits on commas. This produces chunks that align with natural speech prosody: SAM pauses between chunks, and those pauses coincide with where a human would pause.
Consecutive null pause markers are compressed to a single pause, preventing excessive silence from nested block elements.
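A compact sketch of the two-pass idea follows. It is a reconstruction from the description, with some choices of my own: sentences are greedily packed together up to the limit, a leading pause is dropped, and a comma-free run that is still too long is passed through unchanged. The helper splitToFit is a hypothetical name.

```javascript
// Split `text` on `pattern`, then greedily re-join pieces up to maxLen.
function splitToFit(text, maxLen, pattern) {
  const out = [];
  let current = "";
  for (const piece of text.split(pattern)) {
    const candidate = current ? current + " " + piece : piece;
    if (candidate.length <= maxLen) {
      current = candidate;
    } else {
      if (current) out.push(current);
      current = piece;
    }
  }
  if (current) out.push(current);
  return out;
}

function buildChunks(segments, maxLen) {
  const chunks = [];
  for (const segment of segments) {
    if (segment === null) {
      // Compress consecutive pauses (and drop a leading one).
      if (chunks.length > 0 && chunks[chunks.length - 1] !== null) chunks.push(null);
      continue;
    }
    // Pass 1: sentence boundaries. Pass 2 (only if needed): comma boundaries.
    for (const group of splitToFit(segment.trim(), maxLen, /(?<=[.!?])\s+/)) {
      if (group.length <= maxLen) chunks.push(group);
      else chunks.push(...splitToFit(group, maxLen, /(?<=,)\s+/));
    }
  }
  return chunks;
}

console.log(buildChunks(["One. Two.", null, null, "Three, four, five"], 10));
// prints [ 'One. Two.', null, 'Three,', 'four, five' ]
```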
Phase V: The Playback Engine
Web Audio API Integration
The playback engine converts text chunks into audible speech using the Web Audio API. For each chunk, it instantiates a new SamJs object with the current slider values, generates a Float32Array of audio samples, and plays them through an AudioBufferSourceNode:
function playChunk(index) {
The function is recursive: each chunk’s onended callback invokes playChunk(index + 1). This creates a sequential playback chain without blocking the main thread. The AudioContext is created lazily on first play, complying with browser autoplay policies that require audio contexts to be initialised from user gesture handlers.
The sample rate of 22050 Hz matches SAM’s native output frequency. The buf32() method returns a Float32Array of PCM samples that map directly into a single-channel AudioBuffer.
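The chain can be sketched with the audio context and the synthesiser passed in as parameters, which keeps the example independent of the page. The signatures below are mine, not the plugin's: in the browser, audioCtx would be a real AudioContext and synthesise would wrap a SamJs instance's buf32() call with the current slider values.

```javascript
const SAM_SAMPLE_RATE = 22050; // SAM's output rate, as stated above

// Copy PCM samples into a mono AudioBuffer and play it once.
function playSamples(audioCtx, samples, onDone) {
  const buffer = audioCtx.createBuffer(1, samples.length, SAM_SAMPLE_RATE);
  buffer.getChannelData(0).set(samples);
  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  source.connect(audioCtx.destination);
  source.onended = onDone; // fires when this chunk finishes playing
  source.start();
  return source;
}

// Play chunks[index], then chain to the next one via the onended callback.
function playChunk(audioCtx, synthesise, chunks, index, pauseMs) {
  if (index >= chunks.length) return; // done
  const next = () => playChunk(audioCtx, synthesise, chunks, index + 1, pauseMs);
  if (chunks[index] === null) {
    setTimeout(next, pauseMs); // a null entry is a timed pause
    return;
  }
  playSamples(audioCtx, synthesise(chunks[index]), next);
}
```

State handling (pause, stop) is deliberately left out here; the sketch only shows how one chunk hands over to the next.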
The State Machine
Playback state is managed through three boolean flags (playing, paused, stopped) and two resource references (currentSource, pauseTimer):
function setButtons(state) {
The state transitions are:
- Ready → Playing: User clicks Play. Segments are extracted, cleaned, chunked. playChunk(0) starts the chain.
- Playing → Paused: User clicks Pause. stopAudio() halts the current chunk. currentChunk is preserved. Clicking Play resumes from the same index.
- Playing → Stopped: User clicks Stop. stopped = true causes playChunk() to exit on its next invocation. Progress resets to zero.
- Paused → Playing: User clicks Play. paused = false, then playChunk(currentChunk) resumes.
- Playing → Done: The chunk index exceeds chunks.length. Progress bar fills to 100%.
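The widget itself tracks this with boolean flags. Purely to make the transitions visible at a glance, the sketch below swaps in a lookup table instead. The event names (play, pause, stop, finish) are mine, and only the five transitions listed above are encoded; any other event leaves the state unchanged.

```javascript
// Transition table: state -> event -> next state.
const TRANSITIONS = {
  ready: { play: "playing" },
  playing: { pause: "paused", stop: "stopped", finish: "done" },
  paused: { play: "playing" },
};

function nextState(state, event) {
  const row = TRANSITIONS[state];
  return (row && row[event]) || state; // unknown events are ignored
}

let state = "ready";
for (const event of ["play", "pause", "play", "finish"]) {
  state = nextState(state, event);
}
console.log(state); // prints "done"
```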
Voice settings (Speed, Pitch, Mouth, Throat) are read from the sliders at the moment each chunk is synthesised, not when playback starts. This means adjusting a slider mid-playback takes effect on the next chunk. To hear the change immediately, the user can pause and resume.
Progress Tracking
Progress is tracked by counting speakable chunks (non-null entries) rather than all entries:
function countSpeakable() {
This prevents pauses from inflating the total count and making the progress bar advance non-uniformly. A post with many headings (and therefore many pause markers) reports the same total as a post of equivalent text length without headings.
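A small sketch of that calculation, assuming the chunk array is passed in explicitly (the widget's own countSpeakable() takes no arguments) and with progressPercent as a hypothetical helper name:

```javascript
// Count only entries that will actually be spoken.
function countSpeakable(chunks) {
  return chunks.filter((chunk) => chunk !== null).length;
}

// Percentage of speakable chunks that lie before `currentIndex`.
function progressPercent(chunks, currentIndex) {
  const total = countSpeakable(chunks);
  if (total === 0) return 0;
  const spoken = countSpeakable(chunks.slice(0, currentIndex));
  return Math.round((spoken / total) * 100);
}

const chunks = [null, "a", null, "b", "c"];
console.log(countSpeakable(chunks));     // prints 3
console.log(progressPercent(chunks, 2)); // prints 33
```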
Phase VI: Documentation and Publication
The README
Commit d6ed17b (1:14 PM) added a 224-line README covering installation, usage, configuration, architecture, and credits. The documentation was written with a specific goal: a user should be able to install and configure the plugin without reading the source code.
The “Inside out” section described the three-component architecture:
- Generator (lib/generator.js) serves the bundled sam.js library as a virtual Hexo asset.
- Helper (lib/helper.js) renders the widget HTML/CSS/JS when <%- sam_reader() %> is called in a template.
- Client-side, the widget extracts text, cleans it, chunks it, and plays it via the Web Audio API.
The README included a configuration block with every option documented, a table of text cleaning transformations, examples of abbreviation configuration with the case-sensitivity rules explained, and a styling example showing how to create a green-themed widget with only five overrides.
Version History
The project went through three versions in a single day:
- v1.0.0 (10:51 AM): Initial commit. Functional but hardcoded.
- v1.0.1 (11:44 AM): Configuration refactor. All hardcoding removed. Abbreviation system generalised. Fifty minutes of development.
- v1.0.2 (7:25 PM): Documentation complete. README published. pnpm support added. Attribution comments added. Live demo link included.
The version numbering only loosely follows semver. The jump from 1.0.0 to 1.0.1 carried a backwards-compatible feature addition (configuration support), which strict semver would label a minor bump (1.1.0) rather than a patch. The jump to 1.0.2 marked documentation and metadata patches.
Commit History
The full ten-commit history tells the development story concisely:
0e7d294 init: desire to publish package
The pattern is typical of a rapid development sprint: initial implementation, immediate typo cleanup, the substantial feature commit, a version bump, documentation in bulk, then small fixes and a final version bump.
Architecture Summary
The complete data flow from Hexo build to audible speech:
hexo generate
Conclusion
hexo-sam-reader was built in a single day. The prototype existed before that day, but the transformation from a personal script to a distributable package — the modularisation, the configuration framework, the virtual asset serving, the documentation — was completed in ten commits across eight and a half hours.
The technical constraints shaped the design. SAM operates on 7-bit ASCII, so the text pipeline strips everything else. SAM has practical input length limits, so the chunker splits on sentence and comma boundaries. SAM runs synchronously in the browser, so the playback engine chains chunks through onended callbacks to avoid blocking the main thread. The Web Audio API requires user gesture initialisation, so the AudioContext is created lazily on first play.
The configuration refactor was the pivotal decision. A plugin with hardcoded colours and abbreviations is a personal script. A plugin that reads its configuration from _config.yml, merges user overrides with sensible defaults, and lets users specify their own abbreviation maps and colour schemes is a package that other people can use.
The source is available at github.com/trintlermint/hexo-sam-reader. Install with npm install hexo-sam-reader, add sam: true to a post, and call <%- sam_reader() %> in your theme’s sidebar partial.
- 1. SAM (Software Automatic Mouth) was created by Mark Barton of Don't Ask Software in 1982, originally for the Commodore 64. It was one of the first commercially available speech synthesisers for home computers.
- 2. SamJs v0.3.0 by Christian Schiffler (discordier). A JavaScript port of SAM that preserves the original phoneme engine. Source: github.com/discordier/sam.
- 3. The Web Audio API provides the AudioContext, AudioBuffer, and AudioBufferSourceNode interfaces used for PCM playback. SAM outputs audio at 22050 Hz, which is passed directly to AudioContext.createBuffer().