The right way to store Chilean RUTs in your database
Pick a canonical layout for Chilean RUTs (Postgres, MySQL, SQLite), normalize at the write boundary with rut.ts, and add UNIQUE + CHECK to block duplicates.
Storing Chilean RUTs as the user typed them (Postgres, MySQL, SQLite — same problem) — dots, dashes, lowercase k, leading zeros, whitespace — is how you get duplicate customers and broken joins. The same person arrives as 12.345.678-5, 12345678-5, and 12345678k, and a naïve equality check treats every one as a different identity. The bug is not in the query; it is in the decision to tolerate multiple representations. Normalize once at the write boundary and pick a canonical column layout up front. Everything downstream becomes simpler: lookups are point reads, uniqueness constraints work as expected, and no ad-hoc REPLACE() gymnastics appear in SQL to paper over the inconsistency.
How should I store a Chilean RUT in the database?#
Pick Layout B — a single column, cleaned to digits-plus-verifier (123456785) — for most product databases. Normalize at the write boundary with clean() from rut.ts, add a UNIQUE index, and a Postgres CHECK constraint for defense-in-depth. Reach for Layout C (body + verifier columns) only when the body-only query pattern is frequent.
| Layout | What it stores | Pick when |
|---|---|---|
| A | Formatted (12.345.678-5) | Almost never — only when the column is purely for display |
| B | Cleaned digits (123456785) | Default for product DBs; one index, predictable joins |
| C | body + verifier columns | Analytics or joins against government datasets keyed on the body |
The three layouts (and which to pick)#
Layout A — single column, formatted stores the RUT as it appears on a Chilean ID card: 12.345.678-5. It is human-readable, but a poor choice for joins and indexes. Every writer must format the value identically, or you end up with the same multi-representation problem you were trying to avoid.
Layout B — single column, cleaned stores only digits plus the verifier character, no punctuation: 123456785. This is the recommended layout for most applications. It is compact, sorts predictably, requires a single index, and integrates cleanly with any ORM or raw SQL equality lookup. The verifier is appended directly to the body digits, so the value is self-contained and re-formattable at the display layer whenever needed.
Layout C — two columns, body and verifier splits the RUT into a body column (e.g. 12345678) and a verifier column ('5' or 'K', one character). This layout pays off when you frequently query the body without the verifier — common in analytics pipelines and joins against government datasets that key on the body alone.
For most product databases, Layout B is the right starting point. Reach for Layout C only when the body-only query pattern is frequent enough to justify managing two columns.
Normalize on write with clean() and decompose()#
Before any RUT value reaches an INSERT or UPDATE, it must pass the strict acceptance gate. Call validate(input, { strict: true }) and reject the request if it returns false. This blocks malformed values, bad check digits, and placeholder patterns like 11.111.111-1 that are technically self-consistent but never correspond to a real identity.
clean(input, { throwOnError: false }) strips all punctuation and uppercases the verifier character, producing the canonical digit string for Layout B. decompose(input, { throwOnError: false }) returns { body, verifier } for Layout C. The throwOnError: false option returns null on failure instead of throwing, so you can branch on the result without a try/catch.
import { clean, decompose, validate } from "rut.ts";
export function canonicalizeRut(input: string): string | null {
if (!validate(input, { strict: true })) return null;
return clean(input, { throwOnError: false }); // e.g. "123456785"
}
export function splitRut(input: string) {
if (!validate(input, { strict: true })) return null;
return decompose(input, { throwOnError: false }); // { body, verifier }
}Keep these helpers at the service boundary. The service layer owns the validation decision; the repository layer receives values that are already canonical.
Indexes and constraints#
Always add a UNIQUE index on the canonical RUT column. Duplicate customers are expensive to clean up, and the constraint is cheap insurance. On PostgreSQL, pair it with a CHECK constraint for defense-in-depth:
ALTER TABLE customers
ADD CONSTRAINT customers_rut_format
CHECK (rut ~ '^[0-9]{7,8}[0-9K]$');This catches accidental writes that bypass the application layer — direct psql sessions, migration scripts, or a service that skips the validation helper. The regex is simple and anchored; it will not affect write performance.
The K verifier must be stored uppercase. clean() already uppercases it unconditionally — do not use CITEXT to paper over a lowercase k. It costs comparison performance on every index scan and masks an application bug that should be fixed at the source.
For Layout C, index (body, verifier) as a composite unique index. If your database supports generated columns, a generated verifier = calculate_verifier(body) column prevents a body and verifier from different RUTs from being stored in the same row. In PostgreSQL, a trigger achieves the same effect when generated columns are not available.
Migrating existing dirty data#
If you are retrofitting normalization onto an existing table, do not apply the UNIQUE constraint first. Canonicalize the data, detect collisions, and resolve them before constraining.
A migration script that calls validate() and clean() over every existing row will flag values that do not survive strict validation — malformed entries that were never real RUTs. Once every surviving row has a canonical value, run a collision detection query:
SELECT rut_canonical, COUNT(*) AS n
FROM customers
GROUP BY rut_canonical
HAVING COUNT(*) > 1;A collision means two formatted strings like 12.345.678-5 and 12345678-5 canonicalize to the same identity. The library cannot decide which row to keep — that is a business decision. Pick a survivor based on data completeness or recency, merge dependent records (orders, addresses, documents) to the survivor's primary key, then delete the duplicates. Only after that step is the table safe to constrain.
Pitfalls#
Storing a lowercase k. Some input sources produce the verifier as lowercase k. clean() returns uppercase unconditionally — trust it and do not second-guess the casing in SQL after the fact.
Trimming punctuation with SQL REPLACE(). A REPLACE(rut, '.', '') chain in SQL does not handle unicode dashes (‐, –, —) or non-breaking whitespace. These characters are rare but not impossible in data that arrived from a copy-paste or an external API. Normalize in the application layer with clean(), which handles the full range of separators the library is designed for, then write the result to the database. rut.ts ships zero runtime dependencies, so wiring it into a repository layer adds nothing to your transitive package graph.
Joining a formatted column against a cleaned column. If one table stores 12.345.678-5 and another stores 123456785, an equality join returns zero rows. No error is raised and tests that use matching literal values in fixtures will not catch it. The silent failure can persist for months in production. The fix is to normalize both sides to the same canonical form before any join runs — or, better, to normalize at write time so the mismatch never exists in the first place.
Letting two sources of truth drift. A RUT that lives in the customers table and also in a denormalized event payload stored as JSON will drift if the payload was written before normalization was enforced. Establish a single canonical source and treat any other occurrence as derived. When a discrepancy appears, always trust the source of truth and correct the derived copy.
Further reading#
- Install rut.ts —
pnpm add rut.ts, zero runtime dependencies - Quick start — wire
validate+cleaninto a repository in five minutes - Security guide — PII handling and Ley 19.628 obligations
clean()referencedecompose()reference- Validate Chilean RUT in NestJS with class-validator
- Hardening RUT acceptance: strict mode and safe processing — what to enforce before a RUT reaches the write boundary
- Migrating from rut.js to rut.ts in 5 minutes — canonicalizing legacy data during a library swap