Namespacing

We like our abstractions like our offshore banks: leaky.

Entity ID namespaces are a security mechanism related to the Aleph search index.

Aleph allows the user (via mappings or the API) to create arbitrary entity IDs. It is unusual for entity IDs to be controlled by the user rather than the system, but this makes it possible to generate bulk data outside Aleph and then load entities into the system as a continuous stream.

The problem is that user-controlled entity IDs increase the chance of conflicts in the search index.

Namespacing works around this by making each entity ID consist of two parts: one controlled by the client, the other controlled by the system. The second part of the ID is called its signature:

entity_id.a40a29300ac6bb79dd2f911e77bbda7a3b502126

The signature is generated as hmac(entity_id, dataset_id). This guarantees that the combined ID is specific to a dataset, without needing an (expensive) index lookup of each ID first. It can also be generated on the client or the server without compromising isolation.
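To make the scheme concrete, here is a minimal sketch that uses only the Python standard library. It assumes the dataset identifier is used as the HMAC key, the entity ID as the message, and SHA-1 as the digest (which yields a 40-character hex string like the one shown above). The names `signature` and `combined_id` are hypothetical helpers, not the library's own code.

```python
import hashlib
import hmac

SEP = "."


def signature(dataset_id: str, entity_id: str) -> str:
    """Hex HMAC of the entity ID, keyed by the dataset identifier (assumed layout)."""
    key = dataset_id.encode("utf-8")
    msg = entity_id.encode("utf-8")
    # SHA-1 is an assumption; it produces a 40-character hex digest.
    return hmac.new(key, msg, hashlib.sha1).hexdigest()


def combined_id(dataset_id: str, entity_id: str) -> str:
    """Join the client-controlled ID and the signature with the separator."""
    return SEP.join((entity_id, signature(dataset_id, entity_id)))


# The same client-side ID yields different combined IDs in different datasets.
id_a = combined_id("dataset_a", "john-smith")
id_b = combined_id("dataset_b", "john-smith")
```

Because the signature depends only on the two inputs, a client and a server that agree on the dataset identifier arrive at the same combined ID independently.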

class followthemoney.namespace.Namespace(name=None)

Namespaces are used to partition entity IDs into different units, which traditionally represent a dataset, collection or source.

See module docstring for details.

SEP = '.'
apply(proxy, shallow=False)

Rewrite an entity proxy so all IDs mentioned are limited to the namespace.

classmethod make(name)
classmethod parse(entity_id)

Split up an entity ID into the plain ID and the namespace signature. If either part is missing, return None instead.
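The splitting step can be sketched as follows. This is an illustration under assumptions, not the library's implementation: it splits on the last separator and returns a `(plain_id, signature)` tuple, with `None` standing in for whichever part is missing.

```python
from typing import Optional, Tuple

SEP = "."


def parse(entity_id: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Split an ID on the last separator into (plain_id, signature)."""
    if not entity_id:
        return (None, None)
    plain_id, sep, sig = entity_id.rpartition(SEP)
    if not sep:
        # No separator at all: treat the whole string as an unsigned plain ID.
        return (entity_id, None)
    # Empty halves (e.g. a trailing separator) are reported as missing.
    return (plain_id or None, sig or None)
```

Splitting on the last separator means a plain ID that itself contains a dot keeps everything before the final dot.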

sign(entity_id)

Apply a namespace signature to an entity ID, removing any previous namespace marker.
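A rough sketch of re-signing, again with standard-library code only. It assumes that everything after the last separator is a previous signature and that the HMAC is keyed by the namespace name with SHA-1; `signature` and `sign` here are hypothetical stand-ins for the methods documented on this page.

```python
import hashlib
import hmac

SEP = "."


def signature(namespace: str, entity_id: str) -> str:
    """Hex HMAC-SHA1 of the entity ID, keyed by the namespace name (assumed)."""
    return hmac.new(
        namespace.encode("utf-8"), entity_id.encode("utf-8"), hashlib.sha1
    ).hexdigest()


def sign(namespace: str, entity_id: str) -> str:
    """Drop any existing signature suffix, then sign for the given namespace."""
    # Assumption: text after the last separator is an old signature.
    plain_id = entity_id.rsplit(SEP, 1)[0]
    return SEP.join((plain_id, signature(namespace, plain_id)))


signed_a = sign("dataset_a", "john-smith")
moved_to_b = sign("dataset_b", signed_a)
```

Under this assumption, signing is idempotent within a namespace, and moving an entity between namespaces replaces the signature rather than stacking a second one. The simplification has a cost: an unsigned plain ID that contains the separator would be truncated.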

signature(entity_id)

Generate a namespace-specific signature.

classmethod strip(entity_id)
verify(entity_id)

Check if the signature matches the current namespace.
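Verification amounts to recomputing the signature from the plain part of the ID and comparing it with the suffix. The sketch below assumes HMAC-SHA1 keyed by the namespace name; the helper names are hypothetical. It uses `hmac.compare_digest` for a constant-time comparison.

```python
import hashlib
import hmac

SEP = "."


def signature(namespace: str, entity_id: str) -> str:
    """Hex HMAC-SHA1 of the entity ID, keyed by the namespace name (assumed)."""
    return hmac.new(
        namespace.encode("utf-8"), entity_id.encode("utf-8"), hashlib.sha1
    ).hexdigest()


def verify(namespace: str, entity_id: str) -> bool:
    """Return True only if the ID carries a signature valid for this namespace."""
    plain_id, sep, sig = entity_id.rpartition(SEP)
    if not sep or not plain_id or not sig:
        return False  # Unsigned or malformed ID.
    expected = signature(namespace, plain_id)
    # Compare as bytes so non-ASCII input cannot raise a TypeError.
    return hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8"))


signed = SEP.join(("john-smith", signature("dataset_a", "john-smith")))
```

With these definitions, `verify("dataset_a", signed)` succeeds, while the same ID checked against another namespace, or an unsigned ID, is rejected.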