hexo-sam-reader is a Hexo plugin that adds a SAM (Software Automatic Mouth) text-to-speech reader widget to blog posts. This document traces its development, from configuring SAM TTS for my own blog posts to publishing the plugin on npm.

Intro

Motivation

I wanted a text-to-speech engine to read these blog posts aloud, on one condition: it had to be low-latency and compute everything locally. The reasoning behind this? I do not own a server! They are extremely expensive.

A very wise person once said, when resources for modern solutions become infeasibly expensive to you, rely on old solutions. Who said that? I did, but that part does not matter.

Thus, SAM came to mind: the Software Automatic Mouth^[1] is a speech synthesiser originally written in 1982 by Mark Barton of Don’t Ask Software for the Commodore 64. Christian Schiffler (@discordier) ported SAM to JavaScript as SamJs^[2], preserving the original’s distinctive robotic phoneme engine in a looooong ca. 3000-line browser-compatible library. SAM does not stream audio from a server! It runs entirely in the browser, synthesising speech from raw text and playing it through the Web Audio API^[3]. Therefore there is no latency beyond local computation. Helpful, considering the state of my internet after moving in! (It is awful.)

The product is hexo-sam-reader, a Hexo plugin that adds a SAM-powered TTS reader widget to any blog post. This document traces the full development from a hardcoded prototype to a published npm package. In the future I plan to have a smoother-sounding voice, but I would like to talk about SAM first!

Scope of This Document

This post covers the evolution of hexo-sam-reader across two distinct phases: the sam-v1 prototype (a looong Hexo helper script) and the published hexo-sam-reader package (an npm plugin that any Hexo user can install). It examines the text processing pipeline, the playback engine, the configuration refactor that removed all hardcoding, and the documentation process.

Phase I: The sam-v1 Prototype

The Helper

The project began as a single file: sam-reader.js, ca. 600 lines, dropped directly into the Hexo blog’s scripts/ directory. It registered itself as a Hexo EJS helper using hexo.extend.helper.register() and returned an HTML string containing the widget markup, inline CSS, and the entire client-side JavaScript.

The registration was:

hexo.extend.helper.register("sam_reader", function () {
  var page = this.page;
  if (!page || !page.sam) return "";

  var safeTitle = (page.title || "")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");

  return `
<div class="meta-widget sam-reader-widget" id="sam-reader" data-post-title="${safeTitle}">
  <!-- 500+ lines of HTML, CSS, and JS -->
</div>`;
});

This is the Hexo helper pattern: a function that receives the current page context via this.page and returns raw HTML. The conditional if (!page || !page.sam) return "" ensures the widget only renders on posts with sam: true in their front matter.

Hardcoded Values

Every configurable value was baked into the source. The colour scheme was embedded in CSS literal strings:

.sam-reader-widget {
  background: #000;
  border: 1px dashed #924a41;
  font-family: DOS, SimHei, Monaco, Menlo, Consolas, "Courier New", monospace;
}
.sam-reader-widget .sam-title {
  color: #c08179;
}
.sam-reader-widget .sam-controls button {
  background: #352b42;
  color: #c08179;
  border: 1px dashed #924a41;
}

The voice parameters were fixed in the HTML markup:

<label
  >Speed <input type="range" id="sam-speed" min="20" max="200" value="72"
/></label>
<label
  >Pitch <input type="range" id="sam-pitch" min="0" max="255" value="64"
/></label>
<label
  >Mouth <input type="range" id="sam-mouth" min="0" max="255" value="128"
/></label>
<label
  >Throat <input type="range" id="sam-throat" min="0" max="255" value="128"
/></label>

The abbreviation list was hardcoded inside the cleanForSam() function as a sequence of individual regex replacements:

text = text.replace(/\bNL\b/g, "Netherlands");
text = text.replace(/\bKIT\b/g, "K I T");
text = text.replace(/\bTUD\b/g, "T U D");
text = text.replace(/\bICPC\b/g, "I C P C");
text = text.replace(/\bTAPC\b/g, "T A P C");
text = text.replace(/\bBAPC\b/g, "B A P C");
text = text.replace(/\bNWERC\b/g, "N W E R C");
text = text.replace(/\bSSH\b/g, "S S H");
text = text.replace(/\bJOSS\b/g, "J O S S");
text = text.replace(/\bCLI\b/g, "C L I");
text = text.replace(/\bAPI\b/g, "A P I");
text = text.replace(/\bLaTeX\b/gi, "lay-tech");
text = text.replace(/\bCMake\b/gi, "see-make");
text = text.replace(/\be\.g\./g, "for example");
text = text.replace(/\bi\.e\./g, "that is");

The prototype had eighteen such abbreviations in total (fifteen are shown above), each a separate text.replace() call. Adding a new one meant editing the helper source. The content selector was hardcoded as .mypage. The SAM library path was hardcoded as /js/sam.js. The pause duration was var PAUSE_MS = 400. The chunk length limit was 200.

This worked for my blog specifically. However, could it work for anyone else’s? No.

How the Prototype Functioned

Despite the hardcoding, the prototype established the complete processing pipeline that survived into the published package:

Conditional rendering: Check page.sam in front matter.
Widget injection: Return inline HTML with controls (Play, Pause, Stop), a progress bar, voice setting sliders, and a status indicator.
Async SAM loading: Dynamically create a <script> tag pointing to sam.js, initialise on load.
Text extraction: Clone the post’s DOM subtree, remove non-readable elements, walk the tree to produce segments.
Text cleaning: Apply regex transformations to produce SAM-compatible ASCII text.
Chunking: Split cleaned text into chunks of at most 200 characters, respecting sentence and comma boundaries.
Playback: Feed each chunk to SamJs, convert to a Web Audio buffer, play sequentially with pauses between sections.

The prototype was functional. The question was whether it could become distributable.

Phase II: Package Initialisation

The npm Structure

At 10:51 on 2026-04-02, the first commit (0e7d294) created the package structure:

hexo-sam-reader/
├── package.json
├── index.js
├── lib/
│   ├── generator.js
│   └── helper.js
└── assets/
    └── sam.js

The package.json declared the package with standard npm metadata:

{
  "name": "hexo-sam-reader",
  "version": "1.0.0",
  "description": "SAM (Software Automatic Mouth) text-to-speech reader widget for Hexo blog posts",
  "main": "index.js",
  "files": ["lib/", "assets/", "index.js"],
  "keywords": [
    "hexo",
    "hexo-plugin",
    "sam",
    "tts",
    "text-to-speech",
    "voice",
    "reader",
    "accessibility"
  ],
  "author": "Niladri Adhikary (trintlermint)",
  "license": "MIT",
  "engines": { "node": ">=14" }
}

The files array is critical for npm distribution: it declares which files go into the published tarball, and therefore what users receive on install. Without it, npm includes nearly everything in the project directory, potentially shipping development artifacts. With it, the installed package contains only index.js, the lib/ directory, and the assets/ directory (which holds the bundled sam.js library), plus the files npm always includes, such as package.json.

Splitting the Monolith

The prototype’s single file was decomposed into three modules with distinct responsibilities.

index.js: Registration and Configuration

The entry point handles Hexo plugin registration. Hexo automatically loads any package whose name starts with hexo- and that is listed as a dependency in the site’s package.json, using the package’s main entry (here index.js):

/* global hexo */
"use strict";

const path = require("path");

var defaultStyle = {
  background: "#000",
  border_color: "#924a41",
  text_color: "#c08179",
  button_bg: "#352b42",
  button_hover_bg: "#924a41",
  button_active_bg: "#493aa5",
  button_active_border: "#867ade",
  progress_bg: "#252525",
  progress_bar: "#867ade",
  progress_border: "#3a3a3a",
  status_color: "#bbb",
  config_accent: "#867ade",
  font_family: "DOS, SimHei, Monaco, Menlo, Consolas, 'Courier New', monospace",
};

hexo.config.sam_reader = Object.assign(
  {
    front_matter_key: "sam",
    content_selector: ".mypage",
    asset_path: "/js/hexo-sam-reader",
    speed: 72,
    pitch: 64,
    mouth: 128,
    throat: 128,
    pause_ms: 400,
    chunk_max_length: 200,
    abbreviations: {},
    skip_selectors: "",
    style: {},
  },
  hexo.config.sam_reader,
);

hexo.config.sam_reader.style = Object.assign(
  {},
  defaultStyle,
  hexo.config.sam_reader.style || {},
);

hexo.extend.helper.register("sam_reader", require("./lib/helper")(hexo));
hexo.extend.generator.register(
  "sam_reader_assets",
  require("./lib/generator")(hexo),
);

Two Object.assign() calls perform the configuration merge. The first merges the user’s sam_reader block from _config.yml over the plugin defaults. The second merges the user’s style sub-object over the default colour scheme. The merge order — defaults first, user config second — ensures user values override defaults while preserving any defaults the user did not specify.
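
Object.assign() copies only top-level properties, which is why the second merge is needed: after the first merge, a user-supplied style object replaces the default one wholesale. The following is a minimal sketch of that behaviour with made-up values; the defaults, defaultStyle and userConfig objects here are illustrative stand-ins, not the plugin's full option set.

```javascript
// Illustrative values only; the real plugin defines many more keys.
const defaultStyle = { background: "#000", text_color: "#c08179" };
const defaults = { speed: 72, pitch: 64, style: {} };

// What a user might put under sam_reader: in _config.yml.
const userConfig = { speed: 90, style: { background: "#002200" } };

// First merge: top-level keys only. Object.assign is shallow, so `style`
// is now exactly the user's object and text_color is missing from it.
const merged = Object.assign({}, defaults, userConfig);

// Second merge: fill the style sub-object back up with the defaults.
merged.style = Object.assign({}, defaultStyle, merged.style || {});

console.log(merged);
```

In this sketch the user overrides speed and one colour; pitch and text_color fall back to the defaults.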

lib/generator.js: Virtual Asset Serving

The generator module solves a distribution problem. In the prototype, sam.js had to be manually placed in the blog’s source/js/ directory. The generator eliminates this by serving the bundled sam.js as a virtual Hexo asset:

"use strict";

const path = require("path");
const fs = require("fs");

module.exports = function (hexo) {
  return function () {
    const config = hexo.config.sam_reader;
    const assetPath = config.asset_path.replace(/^\//, "").replace(/\/$/, "");
    const samFile = path.join(__dirname, "..", "assets", "sam.js");

    return {
      path: assetPath + "/sam.js",
      data: function () {
        return fs.createReadStream(samFile);
      },
    };
  };
};

This is Hexo’s generator API: return an object with a path (the URL path) and a data function (the content). The data function returns a readable stream rather than loading the entire 3,276-line file into memory. When Hexo builds the site, the generator creates a virtual file at /js/hexo-sam-reader/sam.js (by default) without the user copying anything.
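
The two replace() calls on asset_path are easy to skim past, so here is a small sketch of what they do in isolation. The function names normaliseAssetPath and samRoute are hypothetical, introduced only for this illustration.

```javascript
// Hypothetical helper mirroring the two replace() calls in the generator:
// strip one leading and one trailing slash so the route is relative.
function normaliseAssetPath(assetPath) {
  return assetPath.replace(/^\//, "").replace(/\/$/, "");
}

// Build the route the generator would hand to Hexo.
function samRoute(assetPath) {
  return normaliseAssetPath(assetPath) + "/sam.js";
}

console.log(samRoute("/js/hexo-sam-reader")); // js/hexo-sam-reader/sam.js
console.log(samRoute("assets/tts/"));         // assets/tts/sam.js
```

Either spelling of the path in _config.yml (with or without surrounding slashes) ends up as the same relative route.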

lib/helper.js: Widget Rendering

The helper module contains the widget HTML, CSS, and client-side JavaScript. It was extracted from the prototype with one structural change: instead of registering itself, it exports a factory function that receives the hexo instance and returns the helper function:

"use strict";

module.exports = function (hexo) {
  return function () {
    var page = this.page;
    var config = hexo.config.sam_reader;
    var key = config.front_matter_key;

    if (!page || !page[key]) return "";

    // ... widget rendering
  };
};

The closure over hexo gives the helper access to hexo.config.sam_reader without global state. The front matter key is now configurable: instead of checking page.sam, it checks page[config.front_matter_key], allowing users to use any key they prefer.
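
To see the factory pattern work outside of Hexo, the sketch below calls such a helper by hand. The fakeHexo object and the placeholder markup are assumptions made for the example; Hexo itself supplies the real instance and calls helpers with the template context as this.

```javascript
// Factory: receives the hexo instance once, returns the helper Hexo will call.
function createHelper(hexo) {
  return function () {
    const page = this.page;
    const key = hexo.config.sam_reader.front_matter_key;
    if (!page || !page[key]) return "";
    return '<div class="sam-reader-widget"></div>'; // placeholder markup
  };
}

// Stand-in for the real hexo object; only the config is needed here.
const fakeHexo = { config: { sam_reader: { front_matter_key: "tts" } } };
const helper = createHelper(fakeHexo);

// Simulate Hexo invoking the helper for two different posts.
const withKey = helper.call({ page: { title: "A", tts: true } });
const withoutKey = helper.call({ page: { title: "B", sam: true } });

console.log(withKey !== "", withoutKey === "");
```

With front_matter_key set to tts, only the post carrying tts: true gets a widget; the post that still uses sam: true renders nothing.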

Phase III: The Configuration Refactor

The Architectural Shift

Commit cfc40c2 (“feat: remove hardcoding from the library, push to _config.yml instead”) was the most significant change in the project’s history. Executed fifty minutes after the initial commit, it transformed hexo-sam-reader from a personal script into a distributable plugin.

The core insight was that every value a user might want to change should live in _config.yml, not in source code. This applied to thirteen categories of configuration:

Category	Prototype	Package
Front matter key	Hardcoded `sam`	`config.front_matter_key`
Content selector	Hardcoded `.mypage`	`config.content_selector`
Asset path	Hardcoded `/js/sam.js`	`config.asset_path`
Voice speed	Hardcoded `72`	`config.speed`
Voice pitch	Hardcoded `64`	`config.pitch`
Voice mouth	Hardcoded `128`	`config.mouth`
Voice throat	Hardcoded `128`	`config.throat`
Pause duration	Hardcoded `400`	`config.pause_ms`
Chunk max length	Hardcoded `200`	`config.chunk_max_length`
Abbreviations	18 hardcoded regexes	`config.abbreviations`
Skip selectors	Hardcoded list	`config.skip_selectors`
Widget colours	13 hardcoded hex values	`config.style.*`
Font family	Hardcoded string	`config.style.font_family`

Abbreviation Handling: Case Sensitivity

The most nuanced part of the refactor was the abbreviation system. In the prototype, each abbreviation was a separate text.replace() call with manually chosen flags (/g for case-sensitive, /gi for case-insensitive). The package needed a general mechanism.

The solution uses the casing of the abbreviation key to determine match behaviour:

var keys = Object.keys(ABBREVIATIONS);
for (var a = 0; a < keys.length; a++) {
  var k = keys[a];
  var flags = k === k.toUpperCase() ? "g" : "gi";
  text = text.replace(
    new RegExp(
      "\\b" + k.replace(/[.*+?^\/\\|()[\]{}]/g, "\\$&") + "\\b",
      flags,
    ),
    ABBREVIATIONS[k],
  );
}

If the key is entirely uppercase (SSH, CLI, API), the regex uses the g flag only — case-sensitive matching. If the key contains any lowercase character (LaTeX, CMake, libssh), the regex uses gi — case-insensitive matching. The rationale: an all-caps acronym like SSH should not match ssh in a URL path or code snippet where the casing is meaningful, but a mixed-case term like LaTeX should match regardless of how the author capitalised it.

The key is also escaped with a regex-safe replacement before being passed to the RegExp constructor, preventing regex metacharacters in the configuration from being interpreted as pattern syntax.
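
A standalone sketch of the same idea follows, so the casing rule can be tried outside the widget. The names escapeRegExp and applyAbbreviations are hypothetical, and the escape set used here is the commonly seen one (it also covers $), which is slightly broader than the listing above.

```javascript
// Escape regex metacharacters so a key is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
}

function applyAbbreviations(text, abbreviations) {
  for (const key of Object.keys(abbreviations)) {
    // All-caps keys match case-sensitively; anything else ignores case.
    const flags = key === key.toUpperCase() ? "g" : "gi";
    const pattern = new RegExp("\\b" + escapeRegExp(key) + "\\b", flags);
    text = text.replace(pattern, abbreviations[key]);
  }
  return text;
}

const abbr = { SSH: "S S H", LaTeX: "lay-tech" };
console.log(applyAbbreviations("Use SSH, not ssh, to push latex files.", abbr));
// Use S S H, not ssh, to push lay-tech files.
```

One thing worth checking in any pattern built this way: \b only matches between a word character and a non-word character. A key that ends in punctuation, such as "e.g.", therefore does not match when it is followed by a space, because neither the full stop nor the space is a word character.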

Configuration in Practice

On my blog, the _config.yml block declares thirty abbreviations, thirteen style properties, and five voice parameters. An abridged version:

sam_reader:
  front_matter_key: sam
  content_selector: ".mypage"
  speed: 72
  pitch: 64
  mouth: 128
  throat: 128
  pause_ms: 400
  chunk_max_length: 200
  abbreviations:
    NL: "Netherlands"
    DE: "Germany"
    KIT: "K I T"
    ICPC: "I C P C"
    TAPC: "T A P C"
    SSH: "S S H"
    CLI: "C L I"
    API: "A P I"
    HTML: "H T M L"
    CSS: "C S S"
    FTXUI: "F T X U I"
    LaTeX: "lay-tech"
    CMake: "see-make"
    libssh: "lib S S H"
    "e.g.": "for example"
    "i.e.": "that is"
  style:
    background: "#000"
    border_color: "#924a41"
    text_color: "#c08179"
    button_bg: "#352b42"
    progress_bar: "#867ade"
    config_accent: "#867ade"
    font_family: "DOS, SimHei, Monaco, Menlo, Consolas, 'Courier New', monospace"

A user with a different theme would only need to change the selectors and colours. A user writing in a domain with different acronyms would only need to change the abbreviations. The voice parameters remain adjustable via sliders at runtime; the _config.yml values set the initial positions.

The Trade-Off: Batteries Not Included

The prototype shipped with eighteen hardcoded abbreviations: every acronym I used in my posts. The published package ships with an empty abbreviation map. This was a deliberate choice. Distributing my personal abbreviation list as defaults would cause incorrect speech for users who do not write about ICPC or use FTXUI. An empty default forces users to declare their own terms, which is the correct behaviour for a general-purpose plugin.

The same logic applies to the content selector. The prototype hardcoded .mypage, which is specific to the theme I use. The package defaults to .mypage but documents the override for other themes (e.g., .e-content for the default Landscape theme). The default is a suggestion, not an assumption.

Phase IV: The Text Processing Pipeline

Overview

The text processing pipeline converts a rendered HTML blog post into an array of SAM-compatible text chunks. It operates in four stages:

Extraction (getPostSegments()): DOM walking to produce raw text segments with pause markers.
Cleaning (cleanForSam()): Regex transformations to produce ASCII text SAM can pronounce.
Table reading (readTable()): Structured extraction of tabular data with column headers.
Chunking (buildChunks()): Splitting cleaned text into chunks within SAM’s character limit.

Text Extraction: Walking the DOM

The extraction function clones the post’s content container, removes non-readable elements, and walks the DOM tree to produce an array of segments. Each segment is either a string (speakable text) or null (a pause marker):

function getPostSegments() {
  var segments = [];

  // read the post title first
  var postTitle = widget.getAttribute("data-post-title") || "";
  if (postTitle) {
    segments.push(cleanForSam(postTitle) + ".");
    segments.push(null); // pause after title
  }

  var el = document.querySelector(CONTENT_SEL);
  if (!el) return segments;
  var clone = el.cloneNode(true);

  // remove non-readable elements
  var baseSel =
    "pre, script, style, .highlight, img, svg, " +
    ".sam-reader-widget, .alert, figure, .article-footer-copyright, " +
    "noscript, iframe, video, audio, canvas, .gist";
  var skipSel = EXTRA_SKIP ? baseSel + ", " + EXTRA_SKIP : baseSel;
  var remove = clone.querySelectorAll(skipSel);
  for (var i = 0; i < remove.length; i++) {
    remove[i].parentNode.removeChild(remove[i]);
  }

  function walk(node) {
    if (node.nodeType === 3) {
      // text node
      var t = node.textContent;
      if (t && t.trim()) segments.push(cleanForSam(t));
      return;
    }
    if (node.nodeType !== 1) return;
    var tag = node.tagName;

    if (tag === "TABLE") {
      var tableText = readTable(node);
      if (tableText) segments.push(cleanForSam(tableText));
      return;
    }

    if (/^H[1-6]$/.test(tag)) {
      segments.push(null);
      var hText = node.textContent.trim();
      if (hText) segments.push(cleanForSam(hText) + ".");
      segments.push(null);
      return;
    }

    var children = node.childNodes;
    for (var c = 0; c < children.length; c++) {
      walk(children[c]);
    }

    if (/^(P|DIV|BLOCKQUOTE|LI|SECTION|ARTICLE|HR)$/.test(tag)) {
      segments.push(null);
    }
  }

  walk(clone);
  return segments;
}

Several design decisions are embedded here. The clone operation prevents the extraction from modifying the visible page. Code blocks (pre, .highlight) are excluded because SAM cannot meaningfully pronounce source code. Headings receive pause markers on both sides, creating audible section breaks. Block-level elements receive trailing pauses, producing natural breathing points in the speech.

The skip selector list is extensible via EXTRA_SKIP (from config.skip_selectors), allowing users to exclude custom widgets, advertisement banners, or other non-content elements without modifying the plugin source.

Text Cleaning: The Regex Pipeline

The cleanForSam() function applies over forty regex transformations in a specific order. The ordering matters: operator replacements must occur before angle bracket stripping, and abbreviation replacements must occur before non-ASCII removal.

Stage 1: Remove unparseable content.

// URLs
text = text.replace(/https?:\/\/[^\s]+/g, "");
// protocol-like patterns (van://-, ---://-, etc.)
text = text.replace(/[a-zA-Z0-9_-]*:\/\/[^\s]*/g, "");
// email addresses
text = text.replace(/[\w.-]+@[\w.-]+/g, "");
// arrow functions and code constructs
text = text.replace(/\([^)]*=>\s*\{[^}]*\}[^)]*\)/g, "");

Stage 2: Replace operators with spoken equivalents.

text = text.replace(/>=/g, " greater than or equal to ");
text = text.replace(/<=/g, " less than or equal to ");
text = text.replace(/!=/g, " not equal to ");
text = text.replace(/===/g, " strictly equals ");
text = text.replace(/==/g, " equals ");
text = text.replace(/(?<![<>])>(?![<>])/g, " greater than ");
text = text.replace(/(?<![<>])<(?![<>])/g, " less than ");

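The lookbehind and lookahead assertions on > and < skip any angle bracket that sits directly next to another one, so sequences such as >>, << and <> are not read out as comparisons; they are left for the bracket stripping in Stage 4. A lone bracket is always spoken, so a literal <div> written in prose still becomes “less than div greater than”. Real HTML tags never reach this stage, because the input comes from DOM text nodes.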
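
The effect of the assertions is easiest to see on a few inputs. This is a reduced sketch using four of the replacements above; the function name speakOperators and the final whitespace clean-up are additions for the example, not part of the plugin.

```javascript
function speakOperators(text) {
  return text
    .replace(/>=/g, " greater than or equal to ")
    .replace(/<=/g, " less than or equal to ")
    // Only brackets with no angle bracket on either side are spoken.
    .replace(/(?<![<>])>(?![<>])/g, " greater than ")
    .replace(/(?<![<>])<(?![<>])/g, " less than ")
    // Added for this example only: tidy the doubled spaces.
    .replace(/\s+/g, " ")
    .trim();
}

console.log(speakOperators("a >= b")); // a greater than or equal to b
console.log(speakOperators("a < b"));  // a less than b
console.log(speakOperators("x >> 2")); // x >> 2
```

The doubled bracket in the last input is left alone because each > has another > beside it.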

Stage 3: Handle slashes contextually.

text = text.replace(/ \/ /g, " OR "); // " / " → "OR"
text = text.replace(/\//g, " slash "); // "/" → "slash"

A spaced slash (/) typically indicates alternatives (“Linux / macOS”) and reads naturally as “or”. An unspaced slash (“TCP/IP”) is a literal separator and reads as “slash”.
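
The order of the two replacements matters: the spaced form has to be handled first, otherwise the general rule would consume every slash. A small sketch, with speakSlashes as a hypothetical wrapper name:

```javascript
function speakSlashes(text) {
  return text
    .replace(/ \/ /g, " OR ")    // spaced slash: alternatives
    .replace(/\//g, " slash ");  // any remaining slash: literal separator
}

console.log(speakSlashes("Linux / macOS")); // Linux OR macOS
console.log(speakSlashes("TCP/IP"));        // TCP slash IP
```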

Stage 4: Strip code artifacts and apply abbreviations.

text = text.replace(/[{}()\[\]<>]/g, " ");
text = text.replace(/[*_~#]+/g, "");
text = text.replace(/\[\^\w+\]/g, ""); // footnote markers like [^1]

After stripping, the configured abbreviations are applied using the case-sensitive logic described in Phase III.

Stage 5: Remove non-ASCII characters.

text = text.replace(/[\x80-\xFF]/g, " ");
text = text.replace(/[\u0100-\uFFFF]/g, " ");
text = text.replace(/[€£¥©®™°±×÷=+|\\^~&]/g, " ");
text = text.replace(/[^\x20-\x7E]/g, " ");

SAM was designed for 7-bit ASCII English. Any character outside the printable ASCII range (0x20–0x7E) would produce either garbage phonemes or a synthesis error. The pipeline aggressively strips everything SAM cannot handle.
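
A practical consequence is that accented letters and currency symbols are dropped rather than transliterated. The sketch below uses only the final catch-all replacement; stripNonAscii is a hypothetical name, and the whitespace collapsing is added here just to make the output readable.

```javascript
function stripNonAscii(text) {
  return text
    .replace(/[^\x20-\x7E]/g, " ") // anything outside printable ASCII
    .replace(/\s+/g, " ")          // example-only: collapse the gaps left behind
    .trim();
}

// \u00e9 is "é" and \u20ac is the euro sign.
console.log(stripNonAscii("Caf\u00e9 costs 3 \u20ac today")); // Caf costs 3 today
```

Words that depend on such characters may need an entry in the abbreviation map to be pronounced sensibly.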

Table Reading: Row-Wise with Headers

Tables require special handling. Naive text extraction of a table produces an incoherent sequence of cell values without context. The readTable() function reads tables row-wise, prepending each cell’s value with its column header:

function readTable(table) {
  var headers = [];
  var ths = table.querySelectorAll("thead th, thead td, tr:first-child th");
  for (var i = 0; i < ths.length; i++) {
    headers.push(ths[i].textContent.trim());
  }
  var rows = table.querySelectorAll("tbody tr, tr");
  var text = "";
  for (var r = 0; r < rows.length; r++) {
    var cells = rows[r].querySelectorAll("td");
    if (cells.length === 0) continue; // skip header row
    var rowParts = [];
    for (var c = 0; c < cells.length; c++) {
      var label = c < headers.length ? headers[c] : "";
      var val = cells[c].textContent.trim();
      if (label && val) rowParts.push(label + " " + val);
      else if (val) rowParts.push(val);
    }
    if (rowParts.length > 0) text += rowParts.join(", ") + ". ";
  }
  return text;
}

Given this table:

Language	Year
C	1972
Python	1991

The function produces: “Language C, Year 1972. Language Python, Year 1991.” This is intelligible when spoken aloud, unlike the flat extraction “C 1972 Python 1991” which loses all structure.
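
The same row-wise logic can be tried without a browser by feeding it plain arrays instead of DOM nodes. This is a sketch under that assumption; readRows and its two parameters are hypothetical and not part of the plugin.

```javascript
// headers: array of column names; rows: array of arrays of cell strings.
function readRows(headers, rows) {
  let text = "";
  for (const cells of rows) {
    const parts = [];
    cells.forEach(function (raw, c) {
      const label = c < headers.length ? headers[c] : "";
      const val = String(raw).trim();
      if (label && val) parts.push(label + " " + val); // "Year 1972"
      else if (val) parts.push(val);                   // no header for this column
    });
    if (parts.length > 0) text += parts.join(", ") + ". ";
  }
  return text;
}

const spoken = readRows(["Language", "Year"], [["C", "1972"], ["Python", "1991"]]);
console.log(spoken.trim()); // Language C, Year 1972. Language Python, Year 1991.
```

Empty cells are skipped, so a sparse row produces a shorter sentence rather than a dangling header.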

Chunking: Sentence and Comma Boundaries

SAM has practical limits on input length. The buildChunks() function splits text into chunks of at most CHUNK_MAX characters (default 200), respecting natural language boundaries:

function buildChunks(segments, maxLen) {
  var result = [];
  for (var s = 0; s < segments.length; s++) {
    if (segments[s] === null) {
      if (result.length === 0 || result[result.length - 1] !== null) {
        result.push(null);
      }
      continue;
    }
    var text = segments[s];
    if (!text) continue;

    // first pass: split on sentence boundaries
    var parts = text.split(/(?<=[.!?])\s+/);
    var current = "";
    for (var i = 0; i < parts.length; i++) {
      var p = parts[i].trim();
      if (!p) continue;
      if (current.length + p.length + 1 > maxLen && current.length > 0) {
        result.push(current.trim());
        current = p;
      } else {
        current += (current ? " " : "") + p;
      }
    }
    if (current.trim()) result.push(current.trim());
  }

  // second pass: split remaining long chunks on commas
  var final = [];
  for (var j = 0; j < result.length; j++) {
    if (result[j] === null) {
      if (final.length === 0 || final[final.length - 1] !== null) {
        final.push(null);
      }
      continue;
    }
    if (result[j].length <= maxLen) {
      final.push(result[j]);
    } else {
      var subparts = result[j].split(/,\s*/);
      var sub = "";
      for (var k = 0; k < subparts.length; k++) {
        if (sub.length + subparts[k].length + 2 > maxLen && sub.length > 0) {
          final.push(sub.trim());
          sub = subparts[k];
        } else {
          sub += (sub ? ", " : "") + subparts[k];
        }
      }
      if (sub.trim()) final.push(sub.trim());
    }
  }

  // strip leading/trailing pauses
  while (final.length > 0 && final[0] === null) final.shift();
  while (final.length > 0 && final[final.length - 1] === null) final.pop();

  return final;
}

The two-pass approach is deliberate. The first pass splits on sentence endings (., !, ? followed by whitespace, using a lookbehind assertion). If a single sentence still exceeds the character limit, the second pass splits on commas. This produces chunks that align with natural speech prosody: SAM pauses between chunks, and those pauses coincide with where a human would pause.

Consecutive null pause markers are compressed to a single pause, preventing excessive silence from nested block elements.
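
To make the first pass concrete, here is a reduced sketch that packs the sentences of a single string, leaving out pause markers and the comma pass. The name packSentences and the sample text are assumptions for the example.

```javascript
// Greedily pack whole sentences into chunks of at most maxLen characters.
function packSentences(text, maxLen) {
  const chunks = [];
  let current = "";
  for (const part of text.split(/(?<=[.!?])\s+/)) {
    const p = part.trim();
    if (!p) continue;
    if (current && current.length + p.length + 1 > maxLen) {
      chunks.push(current); // adding p would overflow: close this chunk
      current = p;
    } else {
      current += (current ? " " : "") + p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const sample = "First sentence. Second one is here! A third? Yes.";
console.log(packSentences(sample, 32));
// [ 'First sentence.', 'Second one is here! A third?', 'Yes.' ]
```

Note that a single sentence longer than maxLen passes through this step unsplit, which is the case the comma pass exists to handle.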

Phase V: The Playback Engine

Web Audio API Integration

The playback engine converts text chunks into audible speech using the Web Audio API. For each chunk, it instantiates a new SamJs object with the current slider values, generates a Float32Array of audio samples, and plays them through an AudioBufferSourceNode:

function playChunk(index) {
  if (stopped || index >= chunks.length) {
    progressBar.style.width = stopped ? "0%" : "100%";
    statusEl.textContent = stopped ? "Stopped" : "Done!";
    setButtons("ready");
    playing = false;
    paused = false;
    stopped = false;
    currentChunk = 0;
    return;
  }
  if (paused) return;

  currentChunk = index;
  updateProgress();

  // null = silent pause
  if (chunks[index] === null) {
    pauseTimer = setTimeout(function () {
      pauseTimer = null;
      if (!stopped && !paused) playChunk(index + 1);
    }, PAUSE_MS);
    return;
  }

  if (!audioCtx)
    audioCtx = new (window.AudioContext || window.webkitAudioContext)();

  var sam = new SamJs({
    speed: parseInt(speedInput.value, 10),
    pitch: parseInt(pitchInput.value, 10),
    mouth: parseInt(mouthInput.value, 10),
    throat: parseInt(throatInput.value, 10),
  });

  var buf32;
  try {
    buf32 = sam.buf32(chunks[index]);
  } catch (e) {
    playChunk(index + 1);
    return;
  }
  if (!buf32 || buf32.length === 0) {
    playChunk(index + 1);
    return;
  }

  var audioBuffer = audioCtx.createBuffer(1, buf32.length, 22050);
  audioBuffer.getChannelData(0).set(buf32);

  var source = audioCtx.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioCtx.destination);
  currentSource = source;

  source.onended = function () {
    currentSource = null;
    if (!stopped && !paused) {
      playChunk(index + 1);
    }
  };
  source.start();
}

The function is recursive: each chunk’s onended callback invokes playChunk(index + 1). This creates a sequential playback chain without blocking the main thread. The AudioContext is created lazily on first play, complying with browser autoplay policies that require audio contexts to be initialised from user gesture handlers.
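
The chaining can be illustrated without any audio by injecting the two asynchronous steps. In this sketch, speak and wait are stand-ins (assumptions for the example) for the AudioBufferSourceNode and setTimeout paths; they complete synchronously here so that the order of events is easy to inspect.

```javascript
// speak(text, done): play one chunk, call done() when it ends.
// wait(done): the silent pause for a null marker, call done() afterwards.
function playAll(chunks, speak, wait, onFinished) {
  function step(index) {
    if (index >= chunks.length) return onFinished();
    const next = function () { step(index + 1); };
    if (chunks[index] === null) wait(next);
    else speak(chunks[index], next);
  }
  step(0);
}

// Fakes that just record what happened, in order.
const log = [];
playAll(
  ["Title.", null, "Body text."],
  function (text, done) { log.push("say:" + text); done(); },
  function (done) { log.push("pause"); done(); },
  function () { log.push("done"); }
);

console.log(log); // [ 'say:Title.', 'pause', 'say:Body text.', 'done' ]
```

In the browser each done() would fire from onended or a timer, so control returns to the event loop between chunks.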

The sample rate of 22050 Hz matches SAM’s native output frequency. The buf32() method returns a Float32Array of PCM samples that map directly into a single-channel AudioBuffer.

The State Machine

Playback state is managed through three boolean flags (playing, paused, stopped) and two resource references (currentSource, pauseTimer):

function setButtons(state) {
  btnPlay.disabled = state === "playing";
  btnPause.disabled = state !== "playing";
  btnStop.disabled = state === "ready";
  btnPlay.classList.toggle("active", state === "playing");
  btnPause.classList.toggle("active", state === "paused");
}

function stopAudio() {
  if (pauseTimer) {
    clearTimeout(pauseTimer);
    pauseTimer = null;
  }
  if (currentSource) {
    try {
      currentSource.stop();
    } catch (e) {}
    currentSource = null;
  }
}

The state transitions are:

Ready → Playing: User clicks Play. Segments are extracted, cleaned, chunked. playChunk(0) starts the chain.
Playing → Paused: User clicks Pause. stopAudio() halts the current chunk. currentChunk is preserved. Clicking Play resumes from the same index.
Playing → Stopped: User clicks Stop. stopped = true causes playChunk() to exit on its next invocation. Progress resets to zero.
Paused → Playing: User clicks Play. paused = false, then playChunk(currentChunk) resumes.
Playing → Done: The chunk index reaches chunks.length. Progress bar fills to 100%.

Voice settings (Speed, Pitch, Mouth, Throat) are read from the sliders at the moment each chunk is synthesised, not when playback starts. This means adjusting a slider mid-playback takes effect on the next chunk. To hear the change immediately, the user can pause and resume.
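
The listed transitions can also be written down as a lookup table, which makes it easy to check that an event is ignored in states where it does not apply. This is only an alternative view for illustration: the plugin itself uses boolean flags, and the state names, event names and nextState function below are hypothetical. Stopped and Done are both modelled as a return to ready, matching the setButtons("ready") call in playChunk().

```javascript
// Only the transitions described in the list above are modelled.
const TRANSITIONS = {
  ready:   { play: "playing" },
  playing: { pause: "paused", stop: "ready", finished: "ready" },
  paused:  { play: "playing" },
};

function nextState(state, event) {
  const next = TRANSITIONS[state] && TRANSITIONS[state][event];
  return next || state; // events that do not apply leave the state unchanged
}

let state = "ready";
for (const event of ["pause", "play", "pause", "play", "finished"]) {
  state = nextState(state, event);
}
console.log(state); // ready
```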

Progress Tracking

Progress is tracked by counting speakable chunks (non-null entries) rather than all entries:

function countSpeakable() {
  var n = 0;
  for (var i = 0; i < chunks.length; i++) {
    if (chunks[i] !== null) n++;
  }
  return n;
}

function updateProgress() {
  var total = countSpeakable();
  var done = countSpoken();
  var pct = total > 0 ? (done / total) * 100 : 0;
  progressBar.style.width = pct + "%";
  statusEl.textContent = done + " / " + total;
}

This prevents pauses from inflating the total count and making the progress bar advance non-uniformly. A post with many headings (and therefore many pause markers) reports the same total as a post of equivalent text length without headings.
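
The countSpoken() helper is not shown in this post. A plausible reading, and it is only a guess, is that it counts the speakable chunks before the current index. The sketch below makes that assumption explicit and passes the chunk array and index as parameters instead of using the widget's shared variables.

```javascript
function countSpeakable(chunks) {
  return chunks.filter(function (c) { return c !== null; }).length;
}

// Assumed behaviour: speakable chunks strictly before the current index.
function countSpoken(chunks, currentIndex) {
  return chunks.slice(0, currentIndex).filter(function (c) { return c !== null; }).length;
}

function progressPercent(chunks, currentIndex) {
  const total = countSpeakable(chunks);
  return total > 0 ? (countSpoken(chunks, currentIndex) / total) * 100 : 0;
}

const demoChunks = ["Title.", null, "One.", null, "Two.", "Three."];
console.log(progressPercent(demoChunks, 3)); // 50
```

Here two of the four speakable chunks lie before index 3, so the bar sits at half regardless of the two pause markers.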

Phase VI: Documentation and Publication

The README

Commit d6ed17b (1:14 PM) added a 224-line README covering installation, usage, configuration, architecture, and credits. The documentation was written with a specific goal: a user should be able to install and configure the plugin without reading the source code.

The “Inside out” section described the three-component architecture:

Generator (lib/generator.js) serves the bundled sam.js library as a virtual Hexo asset.
Helper (lib/helper.js) renders the widget HTML/CSS/JS when <%- sam_reader() %> is called in a template.
Client-side, the widget extracts text, cleans it, chunks it, and plays it via the Web Audio API.

The README included a configuration block with every option documented, a table of text cleaning transformations, examples of abbreviation configuration with the case-sensitivity rules explained, and a styling example showing how to create a green-themed widget with only five overrides.

Version History

The project went through three versions in a single day:

v1.0.0 (10:51 AM): Initial commit. Functional but hardcoded.
v1.0.1 (11:44 AM): Configuration refactor. All hardcoding removed. Abbreviation system generalised. Fifty minutes of development.
v1.0.2 (7:25 PM): Documentation complete. README published. pnpm support added. Attribution comments added. Live demo link included.

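The version numbers use the semver format, but both releases were published as patch bumps. Strictly, semver reserves the patch position for backwards-compatible bug fixes, so the configuration support added in 1.0.1 would have warranted a minor bump (1.1.0); the jump to 1.0.2, covering documentation and metadata, fits the patch level.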

Commit History

The full ten-commit history tells the development story concisely:

0e7d294  init: desire to publish package
3b5d5c2  chore: fix typos and inconcise comments
cfc40c2  feat: remove hardcoding from the library, push to _config.yml instead
ec40955  version: update to v1.0.1
d6ed17b  docs: publish README documentation describing process, with asset sam.png
f0dba3f  docs: fix readme toc link
f795c36  update JS functions to add author and credit
47d93d6  docs: add pnpm, improve usage guidelines (css wrapper)
f52e824  Merge branch 'main'
3651782  update to v1.0.2

The pattern is typical of a rapid development sprint: initial implementation, immediate typo cleanup, the substantial feature commit, a version bump, documentation in bulk, then small fixes and a final version bump.

Architecture Summary

The complete data flow from Hexo build to audible speech:

hexo generate
    │
    ├── index.js
    │   ├── Merge _config.yml with defaults (Object.assign)
    │   ├── Register helper: sam_reader()
    │   └── Register generator: sam_reader_assets
    │
    ├── lib/generator.js
    │   └── Serve assets/sam.js at /js/hexo-sam-reader/sam.js
    │
    └── lib/helper.js (called per post where sam: true)
        └── Returns HTML string:
            ├── Widget markup (controls, progress bar, sliders)
            ├── Inline CSS (colours from config.style)
            └── Inline IIFE JavaScript:
                │
                ├── Load sam.js dynamically
                ├── On click Play:
                │   ├── getPostSegments()
                │   │   ├── Clone content DOM
                │   │   ├── Remove skip elements
                │   │   ├── Walk tree → segments[]
                │   │   │   ├── Text nodes → cleanForSam(text)
                │   │   │   ├── Tables → readTable() → cleanForSam()
                │   │   │   ├── Headings → null, text, null
                │   │   │   └── Block elements → trailing null
                │   │   └── Return [string | null, ...]
                │   ├── buildChunks(segments, 200)
                │   │   ├── Split by sentences (.!?)
                │   │   ├── Split by commas if still too long
                │   │   └── Compress consecutive nulls
                │   └── playChunk(0)
                │       ├── null → setTimeout(PAUSE_MS) → next
                │       └── string → SamJs.buf32() → AudioBuffer → play → onended → next
                │
                ├── On click Pause: stopAudio(), preserve index
                └── On click Stop: reset all state

Conclusion

hexo-sam-reader was built in a single day. The prototype existed before that day, but the transformation from a personal script to a distributable package — the modularisation, the configuration framework, the virtual asset serving, the documentation — was completed in ten commits across eight and a half hours.

The technical constraints shaped the design. SAM operates on 7-bit ASCII, so the text pipeline strips everything else. SAM has practical input length limits, so the chunker splits on sentence and comma boundaries. SAM runs synchronously in the browser, so the playback engine chains chunks through onended callbacks to avoid blocking the main thread. The Web Audio API requires user gesture initialisation, so the AudioContext is created lazily on first play.

The configuration refactor was the pivotal decision. A plugin with hardcoded colours and abbreviations is a personal script. A plugin that reads its configuration from _config.yml, merges user overrides with sensible defaults, and lets users specify their own abbreviation maps and colour schemes is a package that other people can use.

The source is available at github.com/trintlermint/hexo-sam-reader. Install with npm install hexo-sam-reader, add sam: true to a post, and call <%- sam_reader() %> in your theme’s sidebar partial.

1. SAM (Software Automatic Mouth) was created by Mark Barton of Don't Ask Software in 1982, originally for the Commodore 64. It was one of the first commercially available speech synthesisers for home computers.
2. SamJs v0.3.0 by Christian Schiffler (discordier). A JavaScript port of SAM that preserves the original phoneme engine. Source: github.com/discordier/sam.
3. The Web Audio API provides the AudioContext, AudioBuffer, and AudioBufferSourceNode interfaces used for PCM playback. SAM outputs audio at 22050 Hz, which is passed directly to AudioContext.createBuffer().

This blog is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

hexo-sam-reader: Building a SAM Text-to-Speech Plugin for Hexo