Entity and schema API

The core interfaces of followthemoney are simple: each running instance of the library has a Model singleton, which holds a set of Schema definitions (e.g. Person). Each schema defines a set of properties (e.g. name, birthDate) which give meaning to how values can be associated with entities of a given schema.

The model is also used to instantiate entity proxies - objects that allow the creation and use of entity data, based on the rules defined by an associated schema.

Example

For an illustration of how these objects interact, imagine the following script:

# Load the standard instance of the model
from followthemoney import model

## Schema metadata
# Access a schema metadata object
schema = model.get('Person')

# Access a property metadata object
prop = schema.get('birthDate')

# You can also import the type registry that lets you access type info easily:
from followthemoney.types import registry
assert prop.type == registry.date

## Working with entities and entity proxies
# Next, let's instantiate a proxy object for a new Person entity:
entity = model.make_entity(schema)

# First, you'll want to assign an ID to the entity. You can do this directly:
entity.id = 'john-smith'

# Or you can use a hashing function to make a safe ID:
entity.make_id('John Smith', '1979')

# Now, let's assign this entity a birthDate property (see above):
entity.add(prop, '1979-08-23')

# You can also assign properties by name:
entity.add('firstName', 'John')
entity.add('lastName', 'Smith')
entity.add('name', 'John Smith')

# Adding a property value will perform some validation:
entity.add('nationality', 'Atlantis')
assert not entity.has('nationality')
entity.add('nationality', 'Germani', fuzzy=True)
assert 'de' == entity.first('nationality')

# Let's make a second entity, this time for a passport:
passport_entity = model.make_entity('Passport')
passport_entity.make_id(entity.id, 'C716818')
passport_entity.add('number', 'C716818')

# Entities can link to other entities like this:
passport_entity.add('holder', entity)
# Which is the same as:
passport_entity.add('holder', entity.id)

# Finally, you can turn the contents of the entity proxy into a plain dictionary
# that is suitable for JSON serialization or storage in a database:
data = entity.to_dict()
assert data.get('id') == entity.id

# If you want to turn this back into an entity proxy:
entity2 = model.get_proxy(data)
assert entity2 == entity

The library offers a much more complex set of operations - but entity proxies, schemata, properties, and the model are the key elements to understand.

Entity proxy

The entity proxy is a wrapper object for FtM data. It can be used as a factory in order to build an entity, or as a simple abstraction to query the properties of an existing entity.

class followthemoney.proxy.EntityProxy(model, data, key_prefix=None, cleaned=True)

A wrapper object for an entity, with utility functions for the introspection and manipulation of its properties.

This is the main working object in the library, used to generate, validate and emit data.

add(prop, values, cleaned=False, quiet=False, fuzzy=False)

Add the given value(s) to the property if they are valid for the type of the property.

Parameters
  • prop – can be given as a name or an instance of Property.

  • values – either a single value, or a list of values to be added.

  • cleaned – whether the data is already normalised; if False, values are normalised before being added.

  • quiet – a reference to a non-existent property will return an empty list instead of raising an error.

  • fuzzy – when normalising the data, should fuzzy matching be allowed.

property caption

The user-facing label to be used for this entity. This checks a list of properties defined by the schema (caption) and returns the first available value. If no caption is available, return the schema label.
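The lookup order described above can be sketched in plain Python. This is a simplified stand-in, not the library's code: the function name pick_caption and the dictionary-based entity are hypothetical, and followthemoney is not imported.

```python
from typing import Dict, List, Sequence


def pick_caption(
    properties: Dict[str, List[str]],
    caption_props: Sequence[str],
    schema_label: str,
) -> str:
    """Return the first available value among the caption properties."""
    for name in caption_props:
        values = properties.get(name, [])
        if values:
            return values[0]
    # No caption property has a value: fall back to the schema label.
    return schema_label


# Hypothetical property data for a Person entity.
person = {"lastName": ["Smith"], "name": ["John Smith"]}
named = pick_caption(person, ["name", "lastName"], "Person")
unnamed = pick_caption({}, ["name", "lastName"], "Person")
```

Here named is taken from the name property because it comes first in the caption list, while unnamed falls back to the schema label.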

clone()

Make a deep copy of the current entity proxy.

context

If the input dictionary for the entity proxy contains fields other than id, schema or properties, they will be kept in here and re-added upon serialization.
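As a rough illustration of this behaviour, the sketch below separates the three core fields from everything else and merges them again on serialisation. The helper names split_context and serialise are hypothetical and the dictionaries are made-up sample data; the library's internals may differ.

```python
from typing import Any, Dict, Tuple

# The three fields that the entity proxy interprets itself.
CORE_FIELDS = ("id", "schema", "properties")


def split_context(data: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate the core entity fields from any additional context fields."""
    core = {key: data[key] for key in CORE_FIELDS if key in data}
    context = {key: value for key, value in data.items() if key not in CORE_FIELDS}
    return core, context


def serialise(core: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    """Re-add the context fields next to the core fields."""
    return {**context, **core}


data = {
    "id": "john-smith",
    "schema": "Person",
    "properties": {"name": ["John Smith"]},
    "dataset": "demo",  # not a core field, so it ends up in the context
}
core, context = split_context(data)
```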

property countries

Get the set of all country-type values set on the entity.

property country_hints

Some property types, such as phone numbers and IBAN codes imply a country that may be associated with the entity. This list can be used for a more generous matching approach than the actual country values.

edgepairs()

Return all the possible pairs of values for the edge source and target if the schema allows for an edge representation of the entity.
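Conceptually, the pairs are the Cartesian product of the source values and the target values. The following sketch assumes both are plain lists of entity IDs; the function name edge_pairs and the sample IDs are hypothetical.

```python
from itertools import product
from typing import Iterator, List, Tuple


def edge_pairs(sources: List[str], targets: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield every (source, target) combination of the two value lists."""
    yield from product(sources, targets)


# Hypothetical edge-like entity with two source values and one target value.
pairs = list(edge_pairs(["owner-1", "owner-2"], ["asset-1"]))
```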

first(prop, quiet=False)

Get only the first value set for the property, in no particular order.

Parameters
  • prop – can be given as a name or an instance of Property.

  • quiet – a reference to a non-existent property will return an empty list instead of raising an error.

Returns

A value, or None.

classmethod from_dict(model, data, cleaned=True)

Instantiate a proxy based on the given model and serialised dictionary.

Use followthemoney.model.Model.get_proxy() instead.

get(prop, quiet=False)

Get all values of a property.

Parameters
  • prop – can be given as a name or an instance of Property.

  • quiet – a reference to a non-existent property will return an empty list instead of raising an error.

Returns

A list of values.

get_type_inverted(matchable=False)

Return all the values of the entity arranged into a mapping with the group name of their property type. These groups include countries, addresses, emails, etc.
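The inversion can be pictured as regrouping values from "by property" to "by type group". The sketch below is only an approximation under stated assumptions: the PROP_GROUPS table is a hypothetical stand-in for the type information that the real model attaches to each property, and the sorting of the output is a choice made here for readability.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical lookup from property name to the group name of its type.
PROP_GROUPS = {
    "nationality": "countries",
    "country": "countries",
    "email": "emails",
    "name": "names",
}


def invert_by_type(properties: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Group all values by the group name of their property type."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for prop, values in properties.items():
        group = PROP_GROUPS.get(prop)
        if group is not None:
            groups[group].update(values)
    # Sort the values so that the result is deterministic.
    return {group: sorted(values) for group, values in groups.items()}


inverted = invert_by_type(
    {"nationality": ["de"], "country": ["de", "fr"], "name": ["John Smith"]}
)
```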

get_type_values(type_, matchable=False)

All values of a particular type associated with the entity. For example, this lets you return all countries linked to an entity, rather than manually checking each property to see if it contains countries.

Parameters
  • type_ – The type object to be searched.

  • matchable – Whether to return only property values marked as matchable.

has(prop, quiet=False)

Check to see if the given property has at least one value set.

Parameters
  • prop – can be given as a name or an instance of Property.

  • quiet – a reference to a non-existent property will return an empty list instead of raising an error.

Returns

a boolean.

id

A unique identifier for this entity, usually a hashed natural key, a UUID, or a very simple slug. Can be signed using a Namespace.

iterprops()

Iterate across all the properties for which a value is set in the proxy (but do not return their values).

itervalues()

Iterate across all values in the proxy one by one, each given as a tuple of the property and the value.

key_prefix

When using make_id() to generate a natural key for this entity, the prefix will be added to the ID as a salt to make it easier to keep IDs unique across datasets. This is somewhat redundant following the introduction of Namespace.

make_id(*parts)

Generate a (hopefully unique) ID for the given entity, composed of the given components, and the key_prefix defined in the proxy.
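The idea of a hashed natural key can be sketched as follows. This is an assumption-laden illustration rather than the library's implementation: the function name make_id_sketch, the use of SHA-1, and the way parts are encoded are all choices made here, so the resulting IDs will not necessarily match those produced by followthemoney.

```python
import hashlib
from typing import Any, Optional


def make_id_sketch(*parts: Any, key_prefix: Optional[str] = None) -> Optional[str]:
    """Hash the key prefix and all non-empty parts into a hex digest."""
    digest = hashlib.sha1()
    if key_prefix:
        # The prefix acts as a salt that separates IDs from different datasets.
        digest.update(key_prefix.encode("utf-8"))
    used = False
    for part in parts:
        if part is None or part == "":
            continue
        digest.update(str(part).encode("utf-8"))
        used = True
    # Without any usable part there is nothing to derive an ID from.
    return digest.hexdigest() if used else None


plain_id = make_id_sketch("John Smith", "1979")
salted_id = make_id_sketch("John Smith", "1979", key_prefix="dataset-a")
```

The same parts always yield the same ID, which is what makes the key "natural", while a different key_prefix yields a different ID for otherwise identical parts.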

merge(other)

Merge another entity proxy into this one. This will try and find the common schema between both entities and then add all property values from the other entity into this one.

property names

Get the set of all name-type values set on the entity.

pop(prop, quiet=True)

Remove all the values from the given property and return them.

Parameters
  • prop – can be given as a name or an instance of Property.

  • quiet – a reference to a non-existent property will return an empty list instead of raising an error.

Returns

a list of values, possibly empty.

property properties

Return a mapping of the properties and set values of the entity.

remove(prop, value, quiet=True)

Remove a single value from the given property. If it is not there, no action takes place.

Parameters
  • prop – can be given as a name or an instance of Property.

  • value – will not be cleaned before checking.

  • quiet – a reference to a non-existent property will return an empty list instead of raising an error.

schema

The schema definition for this entity, which implies the properties that can be set on it.

set(prop, values, cleaned=False, quiet=False)

Replace the values of the property with the given value(s).

Parameters
  • prop – can be given as a name or an instance of Property.

  • values – either a single value, or a list of values to be added.

  • cleaned – whether the data is already normalised; if False, values are normalised before being set.

  • quiet – a reference to a non-existent property will return an empty list instead of raising an error.

to_dict()

Serialise the proxy into a dictionary with the defined properties, ID, schema and any contextual values that were handed in initially. The resulting dictionary can be used to make a new proxy, and it is commonly written to disk or a database.

to_full_dict(matchable=False)

Return a serialised version of the entity with inverted type groups mixed in. See get_type_inverted().

triples(qualified=True)

Serialise the entity into a set of RDF triple statements. The statements include the property values, an RDF#type definition that refers to the entity schema, and a SKOS#prefLabel with the entity caption.

Schema management

class followthemoney.schema.Schema(model, name, data)

A type definition for a class of entities that have certain properties.

Schemata are arranged in a multi-rooted hierarchy: each schema can have multiple parent schemata from which it inherits all of their properties. A schema can also have descendant child schemata, which, in turn, add further properties. Schemata are usually accessed via the model, which holds all available definitions.

abstract

Do not store or emit entities of this type; it is used only for inheritance.

caption

Mark a set of properties to be used for the entity’s caption. They will be checked in order and the first existent value will be used.

descendants

Inverse of schemata, all derived child types of this schema and their children.

property description

A longer description of the semantics of the schema.

edge

Flag to indicate if this schema should be represented by an edge (rather than a node) when the data is converted into a property graph.

edge_directed

Flag to indicate if the edge should be presented as directed to the user, e.g. by showing an arrow at the target end of the edge.

property edge_label

Description label for edges derived from entities of this schema.

extends

Direct parent schemata of this schema.

featured

Mark a set of properties as important, i.e. they should be shown first, or in an abridged view of the entity. In Aleph, these properties are included in tabular entity listings.

generate()

While loading the schema, this function will validate and load the hierarchy, properties, and flags of the definition.

generated

Entities with this type are generated by the system - for example, via ingest-file. The user should not be offered an option to create them in the interface.

get(name)

Retrieve a property defined for this schema by its name.

hidden

Hide this schema in listings.

is_a(other)

Check if the schema or one of its parents is the same as the given candidate other.

property label

User-facing name of the schema.

matchable

Try to perform fuzzy matching. Fuzzy similarity search does not make sense for entities which have a lot of similar names, such as land plots, assets etc.

property matchable_schemata

Return the set of schemata with which it makes sense to compare this schema. For example, it makes sense to compare a legal entity with a company, but it does not make sense to compare a car and a person.

name

Machine-readable name of the schema, used for identification.

names

All names of schemata.

property plural

Name of the schema to be used in plural constructions.

properties

The full list of properties defined for the entity, including those inherited from parent schemata.

required

Mark a set of properties as required. This is applied only when an entity is created by the user - bulk created entities will slip through even if they are technically invalid.

schemata

All parents of this schema (including indirect parents and the schema itself).

property sorted_properties

All properties of the schema in the order in which they should be shown to the user (alphabetically, with captions and featured properties first).

property source_prop

The entity property to be used as an edge source.

property target_prop

The entity property to be used as an edge target.

to_dict()

Return schema metadata, including all properties, in a serializable form.

uri

RDF identifier for this schema when it is transformed to a triple term.

validate(data)

Validate a dictionary against the given schema. This will also drop keys which are not valid as properties.

class followthemoney.property.Property(schema, name, data)

A definition of a value-holding field on a schema. Properties define the field type and other possible constraints. They also serve as entity to entity references.

RESERVED = ['id', 'caption', 'schema', 'schemata']

Invalid property names.

property description

A longer description of the semantics of this property.

generate()

Setup method used when loading the model in order to build out the reverse links of the property.

hidden

This property should not be shown or mentioned in the user interface.

property label

User-facing title for this property.

matchable

Whether this property should be used for matching and cross-referencing.

name

Machine-readable name for this property.

qname

Qualified property name, which also includes the schema name.

range

If the property is of type entity, the set of valid schema to be added in this property can be constrained. For example, an asset can be owned, but a person cannot be owned.

reverse

When a property points to another schema, a stub reverse property is added as a place to store metadata to help display the link in inverted views.

schema

The schema which the property is defined for. This is always the most abstract schema that has this property, not the possible child schemata that inherit it.

specificity(value)

Return a measure of how precise the given value is.

stub

When a property points to another schema, a reverse property is added for various administrative reasons. These properties are, however, not real and cannot be written to. That’s why they are marked as stubs and adding values to them will raise an exception.

to_dict()

Return property metadata in a serializable form.

type

The data type for this property.

uri

RDF term for this property (i.e. the predicate URI).

validate(data)

Validate that the data should be stored.

Since the types system doesn’t really have validation, this currently tries to normalize the value to see if it passes strict parsing.

class followthemoney.model.Model(path)

A collection of all the schemata available in followthemoney. The model provides some helper functions to find schemata, properties or to instantiate entity proxies based on the schema metadata.

common_schema(left, right)

Select the most narrow of two schemata.

When indexing data from a dataset, an entity may be declared as a LegalEntity in one query, and as a Person in another. This function will select the most specific of two schemata offered. In the example, that would be Person.
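The selection rule can be illustrated with a toy hierarchy. Everything in this sketch is hypothetical: the SCHEMATA table is a heavily reduced stand-in for the real model, the function name narrower_schema is made up, and raising ValueError for unrelated schemata is an assumption about how such a case might be handled.

```python
from typing import Dict, Set

# Hypothetical, reduced hierarchy: each schema name maps to the set of all its
# parents, including indirect parents and the schema itself.
SCHEMATA: Dict[str, Set[str]] = {
    "Thing": {"Thing"},
    "LegalEntity": {"Thing", "LegalEntity"},
    "Person": {"Thing", "LegalEntity", "Person"},
    "Vehicle": {"Thing", "Vehicle"},
}


def narrower_schema(left: str, right: str) -> str:
    """Return the more specific of two schemata on the same inheritance path."""
    if right in SCHEMATA[left]:
        return left  # left descends from right, so left is narrower
    if left in SCHEMATA[right]:
        return right
    raise ValueError(f"No common schema: {left} and {right}")


chosen = narrower_schema("LegalEntity", "Person")
```

The result is the same regardless of argument order, because the check only depends on which schema appears among the other's parents.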

generate()

Loading the model is a weird process because the schemata reference each other in complex ways, so the generation process cannot be fully run as schemata are being instantiated. Hence this process needs to be called once all schemata are loaded to finalise dereferencing the schemata.

get(name)

Get a schema object based on a schema name. If the input is already a schema object, it will just be returned.

get_proxy(data, cleaned=True)

Create an entity proxy to reflect the entity data in the given dictionary. If cleaned is disabled, all property values are fully re-validated and normalised. Use this if handling input data from an untrusted source.

get_qname(qname)

Get a property object based on a qualified name (i.e. schema:property).
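A qualified name in the schema:property form can be taken apart as shown below. This is a minimal sketch of the naming convention only; split_qname is a hypothetical helper and does not look anything up in a model.

```python
from typing import Tuple


def split_qname(qname: str) -> Tuple[str, str]:
    """Split a qualified name into its schema name and property name."""
    schema_name, _, prop_name = qname.partition(":")
    if not schema_name or not prop_name:
        raise ValueError(f"Not a qualified property name: {qname!r}")
    return schema_name, prop_name


schema_name, prop_name = split_qname("Person:birthDate")
```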

get_type_schemata(type_)

Return all the schemata which have a property of the given type.

make_entity(schema, key_prefix=None)

Instantiate an empty entity proxy of the given schema type.

make_mapping(mapping, key_prefix=None)

Parse a mapping that applies (tabular) source data to the model.

map_entities(mapping, key_prefix=None)

Given a mapping, yield a series of entities from the data source.

properties

All properties defined in the model.

schemata

A mapping with all schemata, organised by their name.

to_dict()

Return metadata for all schemata and properties, in a serializable form.

Helper utilities

followthemoney.helpers.entity_filename(proxy, base_name=None, extension=None)

Derive a safe filename for the given entity.

followthemoney.helpers.inline_names(entity, related)

Attempt to solve a weird UI problem. Imagine we are showing a list of payments between a sender and a beneficiary to a user. They may now conduct a search for a term present in the sender or recipient name, but there will be no result, because the name is only indexed with the parties, but not in the payment. This is part of a partial work-around to that.

This is really bad in theory, but really useful in practice. Shoot me.

followthemoney.helpers.name_entity(entity)

If an entity has multiple names, pick the most central one and set all the others as aliases. This is awkward given that names are not special and may not always be the caption.

followthemoney.helpers.remove_checksums(proxy)

When accepting entities via a web API, it would constitute a security risk to allow a user to submit checksum-type properties. These can be traded in for access to said files if they exist in the underlying content-addressed storage. It seems safest to just remove all checksums from entities when they are untrusted user input.

followthemoney.helpers.remove_prefix_date_values(values)

See remove_prefix_dates.

followthemoney.helpers.remove_prefix_dates(entity)

If an entity has multiple values for a date field, you may want to remove all those that are prefixes of others. For example, if a Person has both a birthDate of 1990 and of 1990-05-01, we’d want to drop the mention of 1990.
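The prefix rule itself is easy to express on a plain list of date strings. The sketch below is an independent illustration (drop_prefix_dates is a hypothetical name, and sorting the result is a choice made here); it does not operate on entity proxies.

```python
from typing import Iterable, List


def drop_prefix_dates(values: Iterable[str]) -> List[str]:
    """Keep only the dates that are not a prefix of a longer date."""
    unique = set(values)
    kept = [
        value
        for value in unique
        if not any(other != value and other.startswith(value) for other in unique)
    ]
    return sorted(kept)


dates = drop_prefix_dates(["1990", "1990-05-01", "1985"])
```

In this sample, 1990 is dropped because 1990-05-01 is more precise, while 1985 is kept because no other value extends it.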

followthemoney.helpers.simplify_provenance(proxy)

If there are multiple dates given for some of the provenance fields, we can logically conclude which one is the most meaningful.

followthemoney.util.get_entity_id(obj: Any) -> Optional[str]

Given an entity-ish object, try to get the ID.

followthemoney.util.key_bytes(key: Any) -> bytes

Convert the given data to a value appropriate for hashing.

followthemoney.util.merge_context(left: Dict[followthemoney.util.K, followthemoney.util.V], right: Dict[followthemoney.util.K, followthemoney.util.V]) -> Dict[followthemoney.util.K, List[followthemoney.util.V]]

When merging two entities, make lists of all the duplicate context keys.
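The shape of the result, a dictionary of lists, can be illustrated with a small stand-alone function. This is a sketch under assumptions, not the library's code: merge_context_sketch is a hypothetical name, and dropping None values and duplicates is a choice made here.

```python
from typing import Any, Dict, List


def merge_context_sketch(
    left: Dict[str, Any], right: Dict[str, Any]
) -> Dict[str, List[Any]]:
    """Combine two context dictionaries into one dictionary of value lists."""
    merged: Dict[str, List[Any]] = {}
    for source in (left, right):
        for key, value in source.items():
            # Accept both single values and lists of values.
            values = value if isinstance(value, list) else [value]
            bucket = merged.setdefault(key, [])
            for item in values:
                if item is not None and item not in bucket:
                    bucket.append(item)
    return merged


combined = merge_context_sketch(
    {"dataset": "a", "updated": "2020"},
    {"dataset": "b", "updated": "2020"},
)
```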