Text Module

Text encoding and decoding utilities for JV-Link data processing.

JV-Link returns data encoded in Shift-JIS (code page 932), which is the standard encoding for Japanese text in legacy Windows applications. This module provides functions to decode Shift-JIS to .NET strings and encode back for transmission.

Caching: Frequently decoded and encoded strings are cached in memory to speed up processing of large data volumes. Each cache (encode and decode) holds at most 1024 entries, which prevents unbounded memory growth.

Environment Variables:

  • XANTHOS_DISABLE_TEXT_CACHE - Set to any non-empty value to disable caching. Useful for memory-constrained environments or debugging.
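The caching rules described in this module (a 1024-entry limit per cache, a 512-unit size cutoff for cacheable inputs, and the environment-variable switch) can be sketched in Python as follows. This is an illustration of the documented behavior, not the module's own code: the names `BoundedCache`, `cache_enabled`, and `get_or_add` are hypothetical, and the eviction policy is an assumption because the documentation does not state one.

```python
import os

MAX_ENTRIES = 1024       # per-cache limit stated in the documentation
MAX_CACHEABLE_LEN = 512  # longer inputs bypass the cache


def cache_enabled() -> bool:
    """Caching is off when XANTHOS_DISABLE_TEXT_CACHE holds any non-empty value."""
    return not os.environ.get("XANTHOS_DISABLE_TEXT_CACHE")


class BoundedCache:
    """Dict-backed cache that never holds more than `max_entries` items."""

    def __init__(self, max_entries: int = MAX_ENTRIES) -> None:
        self.max_entries = max_entries
        self._items = {}

    def get_or_add(self, key, compute):
        # Bypass: caching disabled or input too large, so nothing is stored.
        if not cache_enabled() or len(key) > MAX_CACHEABLE_LEN:
            return compute(key)
        if key in self._items:
            return self._items[key]
        value = compute(key)
        if len(self._items) >= self.max_entries:
            # ASSUMPTION: evict the oldest insertion; the real policy is undocumented.
            self._items.pop(next(iter(self._items)))
        self._items[key] = value
        return value

    def clear(self) -> None:
        self._items.clear()

    def __len__(self) -> int:
        return len(self._items)
```

A decode cache in this sketch would be keyed by the raw bytes, for example `BoundedCache().get_or_add(raw, lambda b: b.decode("cp932"))`, while an encode cache would be keyed by the string.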

Functions and values

clearCaches ()

Full Usage: clearCaches ()

Parameters:
    () : unit

Clears both encode and decode caches.

Use this function to free memory when processing is complete or when switching between different data sources. The caches will be rebuilt automatically as new data is processed.

This function is thread-safe. A call that races with concurrent encode/decode operations is harmless: those operations simply repopulate the cache.

decodeShiftJis bytes

Full Usage: decodeShiftJis bytes

Parameters:
    bytes : byte[] - The Shift-JIS encoded byte array. May be null or empty.

Returns: string The decoded string with trailing null characters removed. Returns String.Empty for null or empty input.

Decodes a Shift-JIS encoded byte array to a .NET string.

This function first attempts strict Shift-JIS decoding. If that fails (invalid byte sequences), it falls back to lenient Shift-JIS, then strict UTF-8, then lenient UTF-8 as a last resort.
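The first two stages of this fallback chain, together with the trailing-null trimming, can be sketched in Python. This is an illustration under assumptions, not the module's implementation: `decode_shift_jis_sketch` is a hypothetical name, and Python's `cp932` codec stands in for .NET code page 932. The UTF-8 stages are omitted because the documentation does not say when a lenient Shift-JIS result is rejected.

```python
def decode_shift_jis_sketch(data):
    """Strict code page 932 decode, falling back to a lenient decode."""
    if not data:  # covers both None and an empty byte string
        return ""
    try:
        text = data.decode("cp932")  # strict: raises on invalid sequences
    except UnicodeDecodeError:
        # Lenient: undecodable bytes become U+FFFD instead of raising.
        text = data.decode("cp932", errors="replace")
    # Fixed-width records are often null-padded; drop the padding.
    return text.rstrip("\x00")
```

In this sketch a well-formed record such as `"テスト".encode("cp932") + b"\x00\x00"` decodes to the three katakana characters, while a record truncated in the middle of a two-byte character still yields a string, with a replacement character marking the damage.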

Results are cached for performance unless:

  • XANTHOS_DISABLE_TEXT_CACHE is set
  • The byte array exceeds 512 bytes

ArgumentException Never thrown; fallback ensures a result.

decodeShiftJisBstrBytesIfNeeded text

Full Usage: decodeShiftJisBstrBytesIfNeeded text

Parameters:
    text : string

Returns: string

Decodes JV-Link Shift-JIS bytes that were mistakenly marshalled as a BSTR string.

Some JV-Link COM APIs populate out-parameters using Shift-JIS bytes, but the COM marshaller may expose them to .NET as a UTF-16 string without decoding. This function attempts to recover the original bytes from common BSTR layouts and decode them as Shift-JIS.

If the input already looks like readable Japanese text, it is returned as-is.
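To make the recovery idea concrete, here is a Python sketch of two plausible BSTR layouts. Both layouts are assumptions: the documentation only says "common BSTR layouts" without naming them, and it does not describe how the function chooses between them or how it judges text to be readable, so that decision logic is left out. The function names are hypothetical.

```python
def recover_packed(text: str) -> str:
    """Layout A (assumed): two raw bytes packed into each UTF-16 code unit."""
    # Re-encoding as little-endian UTF-16 yields the original byte sequence.
    raw = text.encode("utf-16-le", errors="surrogatepass")
    return raw.decode("cp932", errors="replace").rstrip("\x00")


def recover_widened(text: str) -> str:
    """Layout B (assumed): each raw byte widened to its own UTF-16 code unit."""
    # Latin-1 maps code points 0x00-0xFF straight back to bytes and raises
    # UnicodeEncodeError if the text contains anything above 0xFF.
    raw = text.encode("latin-1")
    return raw.decode("cp932", errors="replace").rstrip("\x00")
```

For example, if the Shift-JIS bytes of テスト were reinterpreted as UTF-16 code units, `recover_packed` would return the original three characters; if each byte had instead been widened to a separate character, `recover_widened` would do the same.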

encodeShiftJis text

Full Usage: encodeShiftJis text

Parameters:
    text : string - The string to encode. May be null or empty.

Returns: byte array The Shift-JIS encoded byte array. Returns an empty array for null or empty input.

Encodes a .NET string to a Shift-JIS byte array.

Results are cached for performance unless:

  • XANTHOS_DISABLE_TEXT_CACHE is set
  • The string exceeds 512 characters

EncoderFallbackException Thrown if the string contains characters that cannot be encoded in Shift-JIS.
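The strict encoding behavior can be illustrated with a short Python sketch. It is an analogy rather than the module's code: `encode_shift_jis_sketch` is a hypothetical name, Python's `cp932` codec stands in for .NET code page 932, and Python's `UnicodeEncodeError` plays the role of `EncoderFallbackException`.

```python
def encode_shift_jis_sketch(text):
    """Strictly encode text as code page 932; unencodable input raises."""
    if not text:  # covers both None and the empty string
        return b""
    # No error handler is supplied, so a character outside the code page
    # (an emoji, for instance) raises UnicodeEncodeError.
    return text.encode("cp932")
```

Callers that may receive text from outside JV-Link would typically catch the exception and decide whether to reject or sanitize the input, since silently dropping characters could corrupt a request.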

looksGarbledJvText text

Full Usage: looksGarbledJvText text

Parameters:
    text : string

Returns: bool

Heuristically detects obviously garbled (mojibake) text returned by JV-Link.

This check is intentionally conservative: it only flags strings containing a high proportion of private-use or control characters, which should not appear in normal Japanese explanations.
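A heuristic of this kind can be sketched in Python using Unicode general categories. The threshold of 0.3, the exemption for tab and line-break characters, and the name `looks_garbled_sketch` are all assumptions made for illustration; the documentation does not give the actual cutoff or rules.

```python
import unicodedata


def looks_garbled_sketch(text, threshold=0.3):
    """Flag text whose share of private-use/control characters is high."""
    if not text:
        return False
    suspicious = 0
    for ch in text:
        if ch in "\r\n\t":
            continue  # ASSUMPTION: ordinary whitespace controls are fine
        # "Co" = private use, "Cc" = control
        if unicodedata.category(ch) in ("Co", "Cc"):
            suspicious += 1
    # ASSUMPTION: 0.3 is an illustrative cutoff, not the library's value.
    return suspicious / len(text) >= threshold
```

Using a ratio rather than flagging any single suspicious character keeps the check conservative: one stray control character in an otherwise readable string does not trip it.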

normalizeJvText text

Full Usage: normalizeJvText text

Parameters:
    text : string - The text to normalize. May be null or empty.

Returns: string The normalized string with fullwidth digits and letters converted to ASCII. Returns the input unchanged if null or empty.

Normalizes JV-Link text by converting fullwidth characters to ASCII equivalents.

JV-Link data often contains fullwidth (全角) characters that should be normalized for consistent processing:

  • Fullwidth digits (０-９) → ASCII digits (0-9)
  • Fullwidth uppercase (Ａ-Ｚ) → ASCII uppercase (A-Z)
  • Fullwidth lowercase (ａ-ｚ) → ASCII lowercase (a-z)

Also applies Unicode normalization form KC (NormalizationForm.FormKC), which performs compatibility decomposition followed by canonical composition.
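The normalization can be sketched in Python with an explicit fullwidth mapping followed by NFKC. The name `normalize_sketch` and the order of the two steps are assumptions; the documentation does not say which step runs first. Fullwidth digits and letters sit exactly 0xFEE0 code points above their ASCII counterparts, which is what the table relies on.

```python
import unicodedata

# Fullwidth digits (U+FF10-FF19), uppercase (U+FF21-FF3A) and lowercase
# (U+FF41-FF5A) mapped to ASCII by subtracting the fixed offset 0xFEE0.
_FULLWIDTH_TO_ASCII = {
    cp: cp - 0xFEE0
    for block in (range(0xFF10, 0xFF1A), range(0xFF21, 0xFF3B), range(0xFF41, 0xFF5B))
    for cp in block
}


def normalize_sketch(text):
    """Map fullwidth digits/letters to ASCII, then apply NFKC."""
    if not text:
        return text  # None or empty input is returned unchanged
    # ASSUMPTION: explicit mapping first, NFKC second.
    return unicodedata.normalize("NFKC", text.translate(_FULLWIDTH_TO_ASCII))
```

NFKC on its own already folds fullwidth alphanumerics to ASCII, so the explicit table mainly documents intent. NFKC also changes other compatibility characters, for example widening halfwidth katakana such as ｶ to カ, which matters if downstream code compares normalized and raw strings.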
