ROME / MEMIT (Rank-One Model Editing / Mass-Editing Memory In a Transformer)

Experimental

Safety Mechanism

Targeted techniques for editing specific factual associations stored in a transformer's parameters — surgically modifying what a model "knows" without retraining, by intervening on the MLP layers that function as key-value memory stores.

ROME (Meng et al., 2022) treats MLP layers in transformers as key-value stores: the first MLP weight matrix maps its input to a 'key', and the second maps that key to a 'value' encoding the factual association. To edit a fact, ROME computes a rank-one update to the second (value) matrix that produces the desired output for the target key while minimizing perturbation to other keys, measured against a covariance estimate of the keys the layer typically sees. MEMIT extends this to batch editing, modifying thousands of facts simultaneously by distributing each edit's residual across several consecutive mid-layer MLPs. Implementation proceeds in two steps: identify the 'critical layers' for a fact via causal tracing (activation patching), then apply the targeted weight updates at those layers; both steps are sketched below.
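The first step, causal tracing, can be sketched in a few dozen lines. The version below is a minimal illustration, assuming GPT-2 loaded through the HuggingFace transformers library; the prompt, target token, noise scale, and single corrupted sample are all illustrative simplifications (the paper averages over many noise samples and scales the noise to the embedding distribution), not the reference implementation.

```python
# Minimal causal tracing (activation patching) sketch. Assumes GPT-2 from
# HuggingFace transformers; the prompt, target, and noise scale are illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "The Eiffel Tower is located in the city of"
ids = tok(prompt, return_tensors="pt").input_ids
target_id = tok(" Paris").input_ids[0]             # expected next token
subject_len = len(tok("The Eiffel Tower").input_ids)
subject_pos = list(range(subject_len))             # token span of the subject

def p_target(logits):
    return torch.softmax(logits[0, -1], dim=-1)[target_id].item()

with torch.no_grad():
    # Clean run: cache the residual stream after every block.
    clean = model(ids, output_hidden_states=True)
    clean_hidden = clean.hidden_states             # embeddings + one per layer

    # Corrupted input: Gaussian noise on the subject's embeddings
    # (single sample here; the paper averages over many).
    noised = model.transformer.wte(ids).clone()
    noised[0, subject_pos] += 0.5 * torch.randn_like(noised[0, subject_pos])

def patched_p(layer, pos):
    """Restore the clean hidden state at one (layer, position) during a
    corrupted run and report how much of the clean prediction returns."""
    def hook(module, args, output):
        output[0][0, pos] = clean_hidden[layer + 1][0, pos]  # +1 skips embeddings
        return output
    handle = model.transformer.h[layer].register_forward_hook(hook)
    with torch.no_grad():
        out = model(inputs_embeds=noised)
    handle.remove()
    return p_target(out.logits)

# High recovery at a (layer, position) marks it as causally important;
# ROME reports a peak at mid-layer MLPs on the subject's last token.
last = subject_pos[-1]
for layer in range(model.config.n_layer):
    print(f"layer {layer:2d}: p(' Paris') = {patched_p(layer, last):.4f}")
```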
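The second step, the rank-one update, has a closed form. Below is a minimal NumPy sketch under the key-value framing above; the function name, toy dimensions, and random covariance C are assumptions for illustration (ROME estimates C from hidden states sampled over a text corpus and obtains the target value v* by gradient descent on the new fact), not the reference implementation.

```python
import numpy as np

def rank_one_edit(W, C, k_star, v_star):
    """ROME-style rank-one update of the value (down-projection) matrix.

    Solves: minimize ||W_hat K - V||_F subject to W_hat @ k_star = v_star,
    where C ~ E[k k^T] is an uncentered covariance of keys standing in for
    the preserved associations K. Closed form:

        W_hat = W + (v* - W k*) (C^{-1} k*)^T / (k*^T C^{-1} k*)
    """
    c_inv_k = np.linalg.solve(C, k_star)           # C^{-1} k*
    residual = v_star - W @ k_star                 # what the edit must add
    update = np.outer(residual, c_inv_k) / (k_star @ c_inv_k)
    return W + update                              # rank-one perturbation

# Toy demonstration with random shapes (d_mlp keys -> d_model values).
rng = np.random.default_rng(0)
d_mlp, d_model = 64, 32
W = rng.normal(size=(d_model, d_mlp))
A = rng.normal(size=(d_mlp, d_mlp))
C = A @ A.T + 1e-3 * np.eye(d_mlp)                 # random SPD stand-in for
                                                   # ROME's estimated key covariance
k_star = rng.normal(size=d_mlp)                    # key for the edited subject
v_star = rng.normal(size=d_model)                  # value encoding the new fact

W_hat = rank_one_edit(W, C, k_star, v_star)
assert np.allclose(W_hat @ k_star, v_star)         # the edited fact is stored
print(np.linalg.matrix_rank(W_hat - W))            # -> 1
```

The covariance term is what keeps the edit local: key-space directions that other stored associations use heavily receive proportionally smaller perturbations.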

Why Does This Exist?

The most direct current technique for removing specific factual associations from model parameters — demonstrates that targeted knowledge editing is possible

ROME's causal tracing methodology reveals where facts are stored in the network — contributing to the interpretability research program as a byproduct of editing

If you can identify and edit specific facts, you can potentially identify which claims are parametric vs. inferred — a step toward mapping the model's knowledge boundaries

ROME's causal tracing finds that factual associations are spatially localized, concentrated in mid-layer MLPs at the subject's final token: empirical evidence that knowledge can be modular, informing the design of architectures with explicit knowledge boundaries