Hardening RUT acceptance: strict mode and safe processing

A permissive Chilean RUT validator is a foothold. When the acceptance layer treats "looks like a RUT" as "is a valid RUT", it lets placeholders bypass identity checks, duplicate accounts pile up, and analytics pipelines accumulate garbage silently. The hardening recipe is short and explicit: strict mode at the boundary, and safe-mode helpers everywhere downstream where you cannot guarantee the input has already been vetted.

The acceptance pipeline in three lines#

TypeScript

import { isRutLike, validate } from "rut.ts";
 
export const acceptRutStrict = (input: unknown): string | null =>
  typeof input === "string" && isRutLike(input) && validate(input, { strict: true })
    ? input
    : null;

Type guard, shape check, strict semantic gate — composed into a single zero-dependency function. Drop this into every API endpoint that accepts a RUT, and the rest of the post explains why each layer is load-bearing.

Threat model#

Real production systems encounter several categories of RUT abuse, none of them theoretical.

Placeholder RUTs are the most common. Values like 11.111.111-1, 22.222.222-2, and 99.999.999-9 have valid Modulo 11 verifier digits — bots, test scripts, and users who want to skip identity checks all reach for them. The plain validate() call returns true for all of them.

Encoding games are subtler. A string that passes a naive regex can contain unicode dashes (‐, –, —) instead of the ASCII hyphen, zero-width characters between digits, or trailing whitespace that breaks an exact match against a stored canonical form.

Oversized input is the denial-of-service angle. A hand-rolled validator built on an unbounded regex ([0-9]+…) can be driven into catastrophic backtracking — ReDoS — by a crafted, very long string, stalling the event loop for every other request on the box. rut.ts caps input length before any parsing or regex runs (a fixed 64-character security bound; a real RUT is ~9 characters), so oversized input is rejected up front instead of burning CPU in the matcher. validate('9'.repeat(1_000_000)) returns false immediately.

Type confusion happens at API boundaries. "rut": 12345678 in a JSON body is a number. Calling .toUpperCase() or .replace() on it without a type guard throws. So does null, undefined, or an array value.

Case and verifier confusion round out the picture. The letter verifier K is sometimes entered as k, and whether that passes or fails depends on where normalization runs relative to validation. These categories combine: a single submission using k, a unicode dash, and a placeholder body defeats any validator that does not address all three layers.

The acceptance pipeline#

The reliable pattern is three layers applied in order, each cheap enough that the cost is negligible.

The first layer is a type check. Reject anything that is not a string. This protects every downstream call from type errors and ensures your validator never touches null, undefined, or a numeric body parsed from JSON.

The second layer is a shape check using isRutLike. It confirms the string has RUT structure — digits, optional dot grouping, a hyphen, a verifier character — without computing the checksum. Encoding garbage and clearly malformed strings are eliminated before the heavier check runs.

The third layer is the semantic gate: validate with { strict: true }. This verifies the Modulo 11 checksum and rejects repeated-digit placeholders. A value that passes all three layers is a real, canonical, non-placeholder RUT.

TypeScript

import { isRutLike, validate } from "rut.ts";
 
export function acceptRutStrict(input: unknown): string | null {
  if (typeof input !== "string") return null;
  if (!isRutLike(input)) return null;
  if (!validate(input, { strict: true })) return null;
  return input;
}

Each rejection is a null return rather than a throw. The function is designed for contexts where invalid input is expected and handled without exception overhead. When you need an exception, throw at the call site, not inside the gate.

Return the branded type, not a bare string#

The gate above returns string | null, which throws away information the moment it returns: the caller gets back a string that looks no different from the unvalidated one. isValidRut() is the same strict check expressed as a type guard — inside the if, the value is narrowed to rut.ts's branded Rut type, and you can hand that back so the type system itself carries "this was validated" forward.

TypeScript

import { isValidRut } from "rut.ts";
import type { Rut } from "rut.ts";
 
export function acceptRut(input: unknown): Rut | null {
  return isValidRut(input, { strict: true }) ? input : null;
}

Now a function that persists or charges against a RUT can declare (rut: Rut), and TypeScript rejects any path that tries to pass a string that never went through the gate. The acceptance boundary stops being a convention you have to remember and becomes a compile-time error when you forget it — which is exactly the posture you want for an identity value.

Safe-mode helpers for downstream code#

Once a RUT clears the acceptance pipeline, clean() and decompose() normalize it for storage or display. In those code paths the input is trusted, and throwing on invalid input is reasonable — the value should never have reached that point.

But not all code paths start from a trusted source. Log parsers, message-queue consumers, ETL pipelines, and reporting jobs often receive RUT-shaped strings from external systems that were not validated by your acceptance layer. In those contexts you want normalization that degrades gracefully rather than throwing.

All rut.ts decomposition helpers accept { throwOnError: false }. With that option, an invalid or malformed input returns null instead of raising an exception. The caller branches on null and continues processing other records.

TypeScript

import { clean, decompose } from "rut.ts";
 
export function safeNormalize(input: string) {
  const cleaned = clean(input, { throwOnError: false });
  if (!cleaned) return null;
  return decompose(cleaned, { throwOnError: false });
}

The pattern is: attempt to clean, short-circuit on null, then decompose. If either step fails the function returns null and the caller decides whether to skip the record, log a warning, or increment a counter — no try/catch needed for expected failure modes.

Logging without leaking PII#

RUT is personally identifiable information under Ley 19.628, Chile's data protection law. That means logging a raw RUT in error messages, request traces, or analytics events is a compliance issue, not just a hygiene preference.

The practical rule: never log the raw value. If you need to correlate a log entry with a specific RUT for debugging, mask it to the verifier digit plus the last two digits of the body — something like ••••••78-5. That pattern is unique enough to find the record in a database but reveals nothing on its own. If you'd rather not hand-roll the redaction, rut.ts ships mask(): mask('12.345.678-5') returns '12.***.***-5'. It keeps the leading group and verifier instead of the trailing digits, but the goal is the same — a correlatable token that leaks nothing.

The library is built to not work against you here. Every rut.ts helper that throws raises a typed InvalidRutError whose message is the constant Invalid RUT input — the offending value is never interpolated into it. So even an unhandled exception that bubbles into your error tracker carries no national ID. Catch it by class or code, never by message text:

TypeScript

import { clean, InvalidRutError } from "rut.ts";
 
try {
  return clean(userInput); // default (throwing) mode
} catch (err) {
  if (err instanceof InvalidRutError) {
    // err.code === "INVALID_RUT"; err.message leaks nothing
    metrics.increment("rut.invalid");
  }
  return null;
}

Error reporters like Sentry require explicit configuration. The two safe approaches are scrubbing the field at the SDK layer using beforeSend, or stripping the value before it is serialized into an error object. Relying on Sentry's default scrubbing is not sufficient: the defaults target passwords and credit card numbers, not RUTs. A raw rut property on any error object will be captured.

The same principle applies to structured logging. A logger that serializes the full request body will include the RUT. Mask before passing values to any logging call.

Pitfalls#

Forgetting strict and accepting placeholders. validate('11.111.111-1') returns true. The Modulo 11 algorithm does not distinguish real bodies from repeated-digit sequences. Without strict: true, your signup form, your identity verification flow, and your deduplication logic will all see 11.111.111-1 as a valid RUT and create real records for it.

Client-only validation. The client-side check is a UX aid. Any caller with curl or a browser DevTools console can send an arbitrary body. Re-validate with strict: true on every server endpoint that accepts a RUT, regardless of what the client already checked.

Throwing in code paths where invalid input is expected by design. Log scanners, queue consumers, and ETL jobs process input from systems that may not have validated it. Using the default throwing behavior in those contexts means one malformed record halts processing of everything that follows it. Use { throwOnError: false } and branch on null.

Caching invalid normalizations. If you cache the result of clean() keyed on the raw input string, and the raw input is invalid, you will store a null value in the cache and serve it to subsequent callers. Run validate() first; only cache the cleaned result when validation passes.

The acceptance pipeline in three lines#

Threat model#

The acceptance pipeline#

Return the branded type, not a bare string#

Safe-mode helpers for downstream code#

Logging without leaking PII#

Pitfalls#

Further reading#