elibaum.com

on the incomplete redaction of sensitive information

24 Nov 2024

You wake up, dazed, in a dusty cubicle. Silence, save a dull throbbing in your head. Foggy memories of a suburban office park trickle into your consciousness. Across the floor are strewn thousands of sheets of papers – you reach for a few, straining to focus your vision – all identical, a letter, corporate stationary, the signature line redacted with heavy black ink. You become suddenly convinced, inexplicably, that if you could decipher this hidden name, everything would be made clear. You could leave this place, return to… where, exactly?1

alt text Image credit Gemini

Looking closely, you realize the censor has been sloppy, perhaps in a rush to vacate this haunted place of commerce. An arm pokes out of the right corner of the blackened rectangle, a beckoning clue. That must be a K, an X? No, a Y – no leg to match. And the left side isn’t fully redacted, either. There’s the edge of a full-height vertical bar. That would belong to the first letter – it must be one of:

B D E F H I K L M N P R

Ah, no, not an M. Elsewhere on the page you see the slight slant in this font’s capital.

Rifling through a desk drawer you find some standard-issue office supplies, including a micrometer. That’ll come in handy. You scribble down the clues you’ve amassed so far using a very-nearly dead ballpoint pen:

1. A Last Name, written in all capitals
2. Ends in a "Y"
3. Starts with one of "B D E F H I K L N P R"
4. Probably about 8-12 letters long.

There are just too many names this could be – there’s no way you could narrow this down any further, is there?

A pang of hunger in your stomach. When was the last time you ate? Perhaps in the supply closet… but alas, just more office supplies. Blank papers. Decomposing styrofoam coffee cups. And mysteriously, a moving box labeled 2010 U.S. Census in loopy cursive. That’s strange – has 2010 even occurred yet? You seem to remember Reagan being president.

Inside the box, neat stacks of papers, it must be thousands of sheets. You peer at one brittle sheet: FINANCIAL CHARACTERISTICS FOR HOUSING UNITS WITHOUT A MORTGAGE. A shriek down the hallway. A rusty office chair, surely. You blink twice. Tossing the paper aside, you extract a thick packet entitled Frequently Occurring Surnames from the 2010 Census. Much more useful. But you don’t have all day. There must be a hundred thousand surnames here.

Bringing the faded packet out into the main office floor, the omnipresent gray light, you collapse into a lopsided chair. A smashed computer terminal before you, glass shards on the desk. But wait - is it broken? A dim line pulses, weak, like the breaths of a sick animal. Shielding the screen with your hand you can just make out some text:

Python 3.10.2 (v3.10.2:a58ebcc701, Jan 13 2022, 14:50:16) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> _

How inviting, unexpected, the cursor. Three arrows promising to propel you forwards, towards, towards…

You get to work. There are far too many names in the Census data to consider by hand, but a simple computer program can make quick work of them. Grabbing the micrometer and a fresh copy of the letter you begin to take measurements.

The redacted name is about 1190 thou wide. Elsewhere you measure that letters range from I (only 60 thou) up to M (180 thou), with about 20 thou of kerning space. You can’t find every letter of the alphabet – not a Q in sight – but presumably the font is relatively consistent, so you make a few estimates. Laboriously you punch a program into the computer.

#!/usr/bin/env python
import math
import pandas as pd

MYSTERY_LEN_PX = 1191

FLAT_LEFT = "ERIPDFHKLBN"

# includes one kerning space ~ 20 thou
WIDTHS = {
    "A": 117, "B": 120, "C": 125, "D": 120, "E":  95, "F":  84, "G": 130,
    "H": 100, "I":  60, "J": 105, "K": 112, "L": 128, "M": 180, "N": 100,
    "O": 150, "P": 105, "Q": 150, "R": 112, "S":  93, "T": 128,"U": 131,
    "V": 117, "W": 180, "X": 117, "Y": 128, "Z": 125,
}

# last letter doesn't have kerning after it. per measurement, "I" is
# about 1/3 letter and 2/3 space.
SPACE_WIDTH = round(WIDTHS["I"] * 2/3)
MYSTERY_LEN_PX -= SPACE_WIDTH

max_letters = math.ceil(MYSTERY_LEN_PX / WIDTHS["I"])
min_letters = math.floor(MYSTERY_LEN_PX / WIDTHS["M"])

df = pd.read_csv("~/Downloads/names/names.csv", header=1)

names = df[df.columns[0]].dropna().tolist()

L = filter(lambda x:
    (len(x) < max_letters and len(x) > min_letters and
       x.endswith('Y') and any(x.startswith(s) for s in FLAT_LEFT)),
    names)

C = map(lambda name: (name, sum(WIDTHS[k] for k in name)), L)

result = list(C)
result.sort(key=lambda x: abs(MYSTERY_LEN_PX - x[1]))

print("Target:", MYSTERY_LEN_PX)
for i, r in enumerate(result):
    n, L = r
    diff = L - MYSTERY_LEN_PX
    print(f"{n:16} {L:4} {diff:3}")

    if i+1 >= 100:
        break

By the power of deus ex machina, you realize the census data is in fact already loaded onto this computer’s hard drive. With the program keyed in, breath held, you press ENTER to run the program. A door slams shut upstairs. A few anxious seconds, and output begins to appear, slowly, one line at a time. You strain to recognize any names. Nothing clicks. Who are these people?

Target: 1151 thou
HUNTZBERRY       1151   0
ROTENBERRY       1152   1
NAKONECZNY       1152   1
BOURJOLLY        1152   1
FOXWORTHY        1149  -2
NEMIROVSKY       1147  -4
BEREZOVSKY       1147  -4
KOSLOWSKY        1146  -5
RUTKOWSKY        1146  -5
LANDAVERRY       1146  -5
DEMONTINEY       1156   5
HUMENANSKY       1156   5
PALANISAMY       1145  -6
ROMANOSKY        1142  -9
KAMENETSKY       1160   9
BRAUDAWAY        1142  -9
FORTENBURY       1160   9
BROADAWAY        1161  10
NOSWORTHY        1141 -10
LEVANDUSKY       1141 -10
DILLAHUNTY       1140 -11
KINSWORTHY       1163  12
LEVENDOSKY       1138 -13
DOMBROSKY        1165  14
HOCKENBERY       1137 -14
BOBROWSKY        1165  14
KOSTELECKY       1166  15
HOLTSBERRY       1166  15
ECHEAGARAY       1136 -15
DERRYBERRY       1134 -17
PENDLEBURY       1134 -17
LOUNSBERRY       1169  18
...

You suddenly feel very sleepy. Just a quick nap, a̷n̴d̴ ̴t̴h̸i̷n̴g̵s̵ ̴w̷i̵l̵l̷ m̵̜̎̎̾a̴͔͒͘̕k̴̤͊͗̓ë̶̝́̈́͌ ̴̤̅̄̽̕m̵̧̖͓͖̂̑ó̵̯̬̥̱̌̕r̸̭̋̎͊͛e̶͍̾̈́́ ̷͔͖̜̔͌̇͜s̷̟̲͂͊e̴͈̩͙̜͛n̶͔̗̜͙̅͘s̷̞̊̒̽̓e̵̪̓̄̊.


Updates to follow in a few weeks.

  1. Immediately after writing this paragraph, I whacked my head on my kitchen cabinet. This is what they call method writing