Property types

Each property defined on a given schema has one of the data types below. The data types are all string-based, but define the semantics of the property.

Type API

class followthemoney.types.common.PropertyType

Base class for all property types.

caption(value: str) str

Return a label for the given property value. This is often the same as the value, but for types like countries or languages, it would return the label, while other values like phone numbers can be formatted to be nicer to read.

clean(text: Any, **kwargs) Optional[str]

Create a clean version of a value of the type, suitable for storage in an entity proxy.

clean_text(text: str, **kwargs) Optional[str]

Specific types can apply their own cleaning routines here (this is called by clean after the value has been converted to a string and null values have been filtered).

compare(left: str, right: str) float

Comparisons are a float between 0 and 1. They can assume that the given data is cleaned, but not normalised.

compare_safe(left: Optional[str], right: Optional[str]) float

Compare, but support None values on either side of the comparison.

compare_sets(left: Sequence[str], right: Sequence[str], func: Callable[[Sequence[float]], float] = <built-in function max>) float

Compare two sets of values and select the highest-scored result.
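The set comparison described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's code: the `compare` function is a hypothetical stand-in that scores exact string equality, and an empty input is assumed to score 0.0.

```python
from itertools import product
from typing import Callable, Sequence


def compare(left: str, right: str) -> float:
    # Hypothetical stand-in for a type-specific comparison:
    # 1.0 for identical strings, 0.0 otherwise.
    return 1.0 if left == right else 0.0


def compare_sets(
    left: Sequence[str],
    right: Sequence[str],
    func: Callable[[Sequence[float]], float] = max,
) -> float:
    # Score every left/right pair, then reduce the scores with func.
    scores = [compare(l, r) for l, r in product(left, right)]
    if not scores:
        # Assumption: with no pairs to compare there is no evidence of similarity.
        return 0.0
    return func(scores)


best = compare_sets(["gb", "us"], ["fr", "us"])
worst = compare_sets(["gb", "us"], ["fr", "us"], func=min)
```

With the default `max`, one matching pair is enough for a high score; passing `min` instead would require every pair to match.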

country_hint(value: str) Optional[str]

Determine if the given value allows us to infer a country that it may be related to (e.g. using a country prefix on a phone number or IBAN).

group: Optional[str] = None

Groups are used to invert all the properties of an entity that have a given type into a single list before indexing them. This way, in Aleph, you can query for countries:gb instead of having to make a set of filters like properties.jurisdiction:gb OR properties.country:gb OR ....
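The inversion into groups can be pictured roughly as follows. The property-to-group mapping and the function name are hypothetical and only serve to illustrate the idea; the real mapping is derived from the model.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical mapping from property name to the group of its type.
PROPERTY_GROUPS: Dict[str, Optional[str]] = {
    "jurisdiction": "countries",
    "country": "countries",
    "phone": "phones",
    "notes": None,  # types without a group are not inverted
}


def invert_by_group(properties: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # Collect the values of all properties that share a type group.
    groups = defaultdict(set)
    for prop, values in properties.items():
        group = PROPERTY_GROUPS.get(prop)
        if group is not None:
            groups[group].update(values)
    return {group: sorted(values) for group, values in groups.items()}


entity = {
    "jurisdiction": ["gb"],
    "country": ["gb", "fr"],
    "phone": ["+38760183628"],
    "notes": ["free text"],
}
inverted = invert_by_group(entity)
```

A single lookup in `inverted["countries"]` then covers both `jurisdiction` and `country`.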

join(values: Sequence[str]) str

Helper function for converting multi-valued FtM data into formats that allow only a single value per field (e.g. CSV). This is not fully reversible and should be used as a last option.

label: Optional[str] = None

A name for this type to be shown to users.

matchable: bool = True

Matchable types allow properties to be compared with each other in order to assess entity similarity. While it makes sense to compare names, countries or phone numbers, the same isn’t true for raw JSON blobs or descriptive text snippets.

max_size: Optional[int] = None

Some types have overall size limitations in place in order to avoid generating entities that are very large (upstream ElasticSearch has a 100MB document limit). Once the total size of all properties of this type has exceeded the given limit, an entity will refuse to add further values.

name: str = 'any'

A machine-facing, variable safe name for the given type.

node_id(value: str) str

Return an ID suitable to identify this entity as a typed node in a graph representation of some FtM data. It’s usually the same as the RDF form.

node_id_safe(value: Optional[str]) Optional[str]

Wrapper for node_id to handle None values.

pivot: bool = False

Pivot property types are like a stronger form of matchable types: they will be used when value-based lookups are used to find commonalities between entities. For example, pivot-typed properties are used to show all the other entities that mention the same phone number, email address or name as the one currently seen by the user.

plural: Optional[str] = None

A plural name for this type which can be used in appropriate places in a user interface.

rdf(value: str) rdflib.term.Identifier

Return an RDF term to represent the given value - either a string literal, or a URI reference.

specificity(value: Optional[str]) float

Return a score for how specific the given value is. This can be used as a weighting factor in entity comparisons in order to rate matching property values by how specific they are. For example: a longer address is considered to be more specific than a short one, a full date more specific than just a year number, etc.

to_dict() Dict[str, Any]

Return a serialisable description of this data type.

validate(text: Any, **kwargs) bool

Returns a boolean to indicate if the given value is a valid instance of the type.

class followthemoney.types.common.EnumType(*args)

Enumerated type properties are used for types which have a defined set of possible values, like languages and countries.

caption(value)

Given a code value, return the label that should be shown to a user.

clean_text(code, guess=False, **kwargs)

All code values are cleaned to be lowercase and trailing whitespace is removed.

property names

Return a mapping from property values to their labels in the current locale.

to_dict()

When serialising the model to JSON, include all values.

validate(code, **kwargs)

Make sure that the given code value is one of the supported set.
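To make the interplay of cleaning, validation and captions concrete, here is a toy enumerated type. `EnumSketch` is a hypothetical class written for this example and is not the library's `EnumType`; it strips whitespace on both sides, which is a simplification.

```python
from typing import Dict, Optional


class EnumSketch:
    """Toy enumerated type; not the library's EnumType."""

    def __init__(self, names: Dict[str, str]) -> None:
        # Mapping from code value to human-readable label.
        self.names = names

    def clean_text(self, code: str) -> Optional[str]:
        # Normalise case and surrounding whitespace, then check membership.
        code = code.strip().lower()
        return code if code in self.names else None

    def validate(self, code: str) -> bool:
        # A code is valid if it survives cleaning.
        return self.clean_text(code) is not None

    def caption(self, code: str) -> str:
        # Fall back to the raw code when no label is known.
        return self.names.get(code, code)


languages = EnumSketch({"deu": "German", "eng": "English"})
cleaned = languages.clean_text(" DEU ")
```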

class followthemoney.types.registry.Registry

This registry keeps the processing helpers for all property types in the system. They are instantiated as singletons when the system is first loaded. The registry can be used to get a type, which can itself then clean, validate or format values of that type.

add(clazz: Type[followthemoney.types.common.PropertyType]) None

Add a singleton class.

get(name: Union[str, followthemoney.types.common.PropertyType]) Optional[followthemoney.types.common.PropertyType]

For a given property type name, get its type object. This can also be used via getattr, e.g. registry.phone.

get_types(names: List[Union[str, followthemoney.types.common.PropertyType]]) List[followthemoney.types.common.PropertyType]

Get the type objects for a list of type names.

Address

address: Addresses

A geographic address used to describe a location of a residence or post box. There is no specified order for the sub-parts of an address (e.g. street, city, postal code), and we should consider introducing an Address schema type to retain fidelity in cases where address parts are specified.

group: addresses

see followthemoney.types.common.PropertyType.group

matchable: True

see followthemoney.types.common.PropertyType.matchable

pivot: True

see followthemoney.types.common.PropertyType.pivot

Checksum

checksum: Checksums

Content hashes calculated using SHA1. Checksum references are used by document-typed entities in Aleph to refer to raw data in the archive (e.g. the document from which the entity is extracted).

Unfortunately, this has some security implications: in order to avoid people getting access to documents for which they know the checksum, properties of this type are scrubbed when submitted via the normal API. Checksums can only be defined by uploading a document to be ingested.

group: checksums

see followthemoney.types.common.PropertyType.group

matchable: True

see followthemoney.types.common.PropertyType.matchable

pivot: True

see followthemoney.types.common.PropertyType.pivot
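For reference, a SHA1 content hash of the kind described above can be computed with Python's standard `hashlib` module. The function name is made up for this example.

```python
import hashlib


def content_checksum(data: bytes) -> str:
    # SHA1 hex digest: 40 lowercase hexadecimal characters.
    return hashlib.sha1(data).hexdigest()


checksum = content_checksum(b"hello world")
empty_checksum = content_checksum(b"")
```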

Country

country: Countries

Properties to define countries and territories. This is completely descriptive and needs to deal with data from many origins, so we support a number of unusual and controversial designations (e.g. the Soviet Union, Transnistria, Somaliland, Kosovo).

group: countries

see followthemoney.types.common.PropertyType.group

matchable: True

see followthemoney.types.common.PropertyType.matchable

Code value  Label
ac          Ascension Island
ad          Andorra
ae          United Arab Emirates
af          Afghanistan
ag          Antigua & Barbuda
ai          Anguilla
al          Albania
am          Armenia
ao          Angola
aq          Antarctica
ar          Argentina
as          American Samoa
at          Austria
au          Australia
aw          Aruba
ax          Åland Islands
az          Azerbaijan
az-nk       Nagorno-Karabakh
ba          Bosnia & Herzegovina
bb          Barbados
bd          Bangladesh
be          Belgium
bf          Burkina Faso
bg          Bulgaria
bh          Bahrain
bi          Burundi
bj          Benin
bl          St. Barthélemy
bm          Bermuda
bn          Brunei
bo          Bolivia
bq          Caribbean Netherlands
br          Brazil
bs          Bahamas
bt          Bhutan
bv          Bouvet Island
bw          Botswana
by          Belarus
bz          Belize
ca          Canada
cc          Cocos (Keeling) Islands
cd          Congo - Kinshasa
cf          Central African Republic
cg          Congo - Brazzaville
ch          Switzerland
ci          Côte d’Ivoire
ck          Cook Islands
cl          Chile
cm          Cameroon
cn          China
cn-xz       Tibet
co          Colombia
cp          Clipperton Island
cr          Costa Rica
csxx        Serbia and Montenegro
cu          Cuba
cv          Cape Verde
cw          Curaçao
cx          Christmas Island
cy          Cyprus
cy-trnc     Northern Cyprus
cz          Czech Republic
dd          East Germany
de          Germany
dg          Diego Garcia
dj          Djibouti
dk          Denmark
dm          Dominica
do          Dominican Republic
dz          Algeria
ea          Ceuta & Melilla
ec          Ecuador
ee          Estonia
eg          Egypt
eh          Western Sahara
er          Eritrea
es          Spain
et          Ethiopia
eu          European Union
ez          Eurozone
fi          Finland
fj          Fiji
fk          Falkland Islands
fm          Micronesia
fo          Faroe Islands
fr          France
ga          Gabon
gb          United Kingdom
gb-nir      Northern Ireland
gb-sct      Scotland
gb-wls      Wales
gd          Grenada
ge          Georgia
ge-ab       Abkhazia
gf          French Guiana
gg          Guernsey
gg-srk      Sark
gh          Ghana
gi          Gibraltar
gl          Greenland
gm          Gambia
gn          Guinea
gp          Guadeloupe
gq          Equatorial Guinea
gr          Greece
gs          South Georgia & South Sandwich Islands
gt          Guatemala
gu          Guam
gw          Guinea-Bissau
gy          Guyana
hk          Hong Kong SAR China
hm          Heard & McDonald Islands
hn          Honduras
hr          Croatia
ht          Haiti
hu          Hungary
ic          Canary Islands
id          Indonesia
ie          Ireland
il          Israel
im          Isle of Man
in          India
io          British Indian Ocean Territory
iq          Iraq
ir          Iran
is          Iceland
it          Italy
je          Jersey
jm          Jamaica
jo          Jordan
jp          Japan
ke          Kenya
kg          Kyrgyzstan
kh          Cambodia
ki          Kiribati
km          Comoros
kn          St. Kitts & Nevis
kp          North Korea
kr          South Korea
kw          Kuwait
ky          Cayman Islands
kz          Kazakhstan
la          Laos
lb          Lebanon
lc          St. Lucia
li          Liechtenstein
lk          Sri Lanka
lr          Liberia
ls          Lesotho
lt          Lithuania
lu          Luxembourg
lv          Latvia
ly          Libya
ma          Morocco
mc          Monaco
md          Moldova
md-pmr      Transnistria
me          Montenegro
mf          St. Martin
mg          Madagascar
mh          Marshall Islands
mk          North Macedonia
ml          Mali
mm          Myanmar (Burma)
mn          Mongolia
mo          Macao SAR China
mp          Northern Mariana Islands
mq          Martinique
mr          Mauritania
ms          Montserrat
mt          Malta
mu          Mauritius
mv          Maldives
mw          Malawi
mx          Mexico
my          Malaysia
mz          Mozambique
na          Namibia
nc          New Caledonia
ne          Niger
nf          Norfolk Island
ng          Nigeria
ni          Nicaragua
nl          Netherlands
no          Norway
np          Nepal
nr          Nauru
nu          Niue
nz          New Zealand
om          Oman
pa          Panama
pe          Peru
pf          French Polynesia
pg          Papua New Guinea
ph          Philippines
pk          Pakistan
pl          Poland
pm          St. Pierre & Miquelon
pn          Pitcairn Islands
pr          Puerto Rico
ps          Palestinian Territories
pt          Portugal
pw          Palau
py          Paraguay
qa          Qatar
qo          Outlying Oceania
re          Réunion
ro          Romania
rs          Serbia
ru          Russia
rw          Rwanda
sa          Saudi Arabia
sb          Solomon Islands
sc          Seychelles
sd          Sudan
se          Sweden
sg          Singapore
sh          St. Helena
si          Slovenia
sj          Svalbard & Jan Mayen
sk          Slovakia
sl          Sierra Leone
sm          San Marino
sn          Senegal
so          Somalia
so-som      Somaliland
sr          Suriname
ss          South Sudan
st          São Tomé & Príncipe
suhh        Soviet Union
sv          El Salvador
sx          Sint Maarten
sy          Syria
sz          Eswatini
ta          Tristan da Cunha
tc          Turks & Caicos Islands
td          Chad
tf          French Southern Territories
tg          Togo
th          Thailand
tj          Tajikistan
tk          Tokelau
tl          Timor-Leste
tm          Turkmenistan
tn          Tunisia
to          Tonga
tr          Turkey
tt          Trinidad & Tobago
tv          Tuvalu
tw          Taiwan
tz          Tanzania
ua          Ukraine
ug          Uganda
um          U.S. Outlying Islands
un          United Nations
us          United States
uy          Uruguay
uz          Uzbekistan
va          Vatican City
vc          St. Vincent & Grenadines
ve          Venezuela
vg          British Virgin Islands
vi          U.S. Virgin Islands
vn          Vietnam
vu          Vanuatu
wf          Wallis & Futuna
ws          Samoa
x-so        South Ossetia
xa          Pseudo-Accents
xb          Pseudo-Bidi
xk          Kosovo
ye          Yemen
yt          Mayotte
yucs        Yugoslavia
za          South Africa
zm          Zambia
zr          Zaire
zw          Zimbabwe
zz          Global

Date

date: Dates

A date or time stamp. This is based on ISO 8601, but meant to allow for different degrees of precision by specifying a prefix. This means that 2021, 2021-02, 2021-02-16, 2021-02-16T21, 2021-02-16T21:48 and 2021-02-16T21:48:52 are all valid values, with an implied precision.

The timezone is always expected to be UTC and cannot be specified otherwise. There is no support for calendar weeks (2021-W7) or date ranges (2021-2024).

group: dates

see followthemoney.types.common.PropertyType.group

matchable: True

see followthemoney.types.common.PropertyType.matchable
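The prefix rule for dates can be checked with a regular expression. The sketch below is an assumption about how such a check might look, not the library's validator: it only tests the shape of the value (so 2021-13-45 would pass) and derives the implied precision from the length of the string.

```python
import re
from typing import Optional

# ISO 8601 prefixes from year precision down to second precision.
DATE_RE = re.compile(
    r"\d{4}(-\d{2}(-\d{2}(T\d{2}(:\d{2}(:\d{2})?)?)?)?)?"
)
PRECISIONS = {
    4: "year",
    7: "month",
    10: "day",
    13: "hour",
    16: "minute",
    19: "second",
}


def date_precision(value: str) -> Optional[str]:
    # Return the implied precision, or None if the value is not a valid prefix.
    if DATE_RE.fullmatch(value) is None:
        return None
    return PRECISIONS[len(value)]


precision = date_precision("2021-02-16T21:48")
week = date_precision("2021-W7")
```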

Domain

domain: Domains

DNS domain names. Not really used and might be thrown out.

matchable: True

see followthemoney.types.common.PropertyType.matchable

E-Mail Address

email: E-Mail Addresses

Internet mail address (e.g. user@example.com). These are notoriously hard to validate, but we use an irresponsibly simple rule and hope for the best.

pattern: ^[^@\s]+@[^@\s]+\.\w+$

group: emails

see followthemoney.types.common.PropertyType.group

matchable: True

see followthemoney.types.common.PropertyType.matchable

pivot: True

see followthemoney.types.common.PropertyType.pivot
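The pattern listed for this type can be tried out directly with Python's `re` module. The helper name is made up for this example; the pattern itself is the one given above.

```python
import re

# Pattern copied from the type description.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.\w+$")


def looks_like_email(value: str) -> bool:
    # True if the value matches the deliberately simple pattern.
    return EMAIL_RE.match(value) is not None


ok = looks_like_email("user@example.com")
no_tld = looks_like_email("user@example")
```

As the description warns, the rule is permissive: a value such as `a@b.c` is accepted even though it is unlikely to be a deliverable address.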

Entity

entity: Entities

A reference to another entity via its ID. This is how entities in FtM become a graph: by pointing at each other using References.

Entity IDs can either be namespaced or plain, depending on the context. When setting properties of this type, you can pass in an entity proxy or a dict of the entity; the ID will then be extracted and stored.

pattern: ^[0-9a-zA-Z]([0-9a-zA-Z\.\-]*[0-9a-zA-Z])?$

group: entities

see followthemoney.types.common.PropertyType.group

matchable: True

see followthemoney.types.common.PropertyType.matchable

pivot: True

see followthemoney.types.common.PropertyType.pivot
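A rough sketch of the ID handling described for this type, using the pattern listed above. The function name is hypothetical, only dicts and plain strings are handled (entity proxies are left out), and the assumption is that an entity dict carries its ID under the `id` key.

```python
import re
from typing import Any, Optional

# Pattern copied from the type description.
ENTITY_ID_RE = re.compile(r"^[0-9a-zA-Z]([0-9a-zA-Z\.\-]*[0-9a-zA-Z])?$")


def extract_entity_id(value: Any) -> Optional[str]:
    # Assumption: an entity dict carries its ID under the "id" key.
    if isinstance(value, dict):
        value = value.get("id")
    if not isinstance(value, str):
        return None
    # IDs must start and end with an alphanumeric character.
    return value if ENTITY_ID_RE.match(value) else None


from_dict = extract_entity_id({"id": "a1b2c3.d4"})
rejected = extract_entity_id("-abc")
```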

HTML

html: HTMLs

Properties that contain raw hypertext markup (HTML).

User interfaces rendering properties of this type need to take extreme care not to allow attacks such as cross-site scripting. It is recommended to perform server-side sanitisation, or to not render this property at all.

max_size: 31457280

see followthemoney.types.common.PropertyType.max_size

IBAN

iban: IBANs

An international bank account number, as defined in ISO 13616. IBANs are managed by SWIFT and used in the European SEPA payment system.

A notable aspect of IBANs is that they share a country prefix and validation mechanism, but the specific length of an IBAN is dependent on the country code defined in the first two characters: NO8330001234567 and CY21002001950000357001234567 are both valid values.

group: ibans

see followthemoney.types.common.PropertyType.group

matchable: True

see followthemoney.types.common.PropertyType.matchable

pivot: True

see followthemoney.types.common.PropertyType.pivot
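The shared validation mechanism is the ISO 13616 mod-97 check, which can be written in a few lines. This sketch is not the library's implementation: the function names are made up, and the per-country length rules mentioned above are deliberately left out.

```python
from typing import Optional


def iban_checksum_ok(iban: str) -> bool:
    # Remove spaces and normalise case before checking.
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move the country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # int(ch, 36) maps 0-9 to 0-9 and A-Z to 10-35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def iban_country(iban: str) -> Optional[str]:
    # The first two characters carry the country code.
    prefix = iban.strip()[:2]
    if len(prefix) == 2 and prefix.isalpha():
        return prefix.lower()
    return None


norway_ok = iban_checksum_ok("NO8330001234567")
cyprus_ok = iban_checksum_ok("CY21002001950000357001234567")
country = iban_country("NO8330001234567")
```

The country prefix is also the kind of information a `country_hint` could draw on.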

Identifier

identifier: Identifiers

Used for registration numbers and other codes assigned by an authority to identify an entity. This might include tax identifiers and statistical codes.

Since identifiers are high-value criteria when comparing two entities, numbers should only be modelled as identifiers if they are long enough to be meaningful. Four- or five-digit industry classifiers create more noise than value.

group: identifiers

see followthemoney.types.common.PropertyType.group

matchable: True

see followthemoney.types.common.PropertyType.matchable

pivot: True

see followthemoney.types.common.PropertyType.pivot

IP-Address

ip: IP-Addresses

Internet protocol addresses. This supports both addresses used by the protocol versions 4 (e.g. 192.168.1.143) and 6 (e.g. 0:0:0:0:0:ffff:c0a8:18f).

group: ips

see followthemoney.types.common.PropertyType.group

matchable: True

see followthemoney.types.common.PropertyType.matchable

pivot: True

see followthemoney.types.common.PropertyType.pivot
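Both address families can be parsed with Python's standard `ipaddress` module. The helper below is a made-up name for this example and shows one way to tell the two versions apart.

```python
import ipaddress
from typing import Optional


def ip_version(value: str) -> Optional[int]:
    # Return 4 or 6 for a valid address, None for anything else.
    try:
        return ipaddress.ip_address(value).version
    except ValueError:
        return None


v4 = ip_version("192.168.1.143")
v6 = ip_version("0:0:0:0:0:ffff:c0a8:18f")
# The IPv6 example is the IPv4-mapped form of the IPv4 example.
mapped = ipaddress.ip_address("0:0:0:0:0:ffff:c0a8:18f").ipv4_mapped
```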

Nested data

json: None

An encoded JSON object. This is used to store raw HTTP headers for documents and some other edge cases. It’s a really bad idea and we should try to get rid of JSON properties.

Language

language: Languages

A human written language. This list is arbitrarily limited for some weird upstream technical reasons, but we’ll happily accept pull requests for additional languages once there is a specific need for them to be supported.

group: languages

see followthemoney.types.common.PropertyType.group

Code value  Label
afr         Afrikaans
ara         Arabic
aze         Azerbaijani
bel         Belarusian
bos         Bosnian
bul         Bulgarian
cat         Catalan
ces         Czech
dan         Danish
deu         German
ell         Greek
eng         English
est         Estonian
fas         Persian
fil         Filipino
fin         Finnish
fra         French
heb         Hebrew
hin         Hindi
hrv         Croatian
hun         Hungarian
hye         Armenian
ind         Indonesian
isl         Icelandic
ita         Italian
jpn         Japanese
kan         Kannada
kat         Georgian
kaz         Kazakh
kir         Kyrgyz
kor         Korean
lav         Latvian
lit         Lithuanian
ltz         Luxembourgish
mkd         Macedonian
mlt         Maltese
mon         Mongolian
msa         Malay
mya         Burmese
nep         Nepali
nld         Dutch
nor         Norwegian
pol         Polish
por         Portuguese
ron         Romanian
rus         Russian
slk         Slovak
slv         Slovenian
spa         Spanish
sqi         Albanian
srp         Serbian
swa         Swahili
swe         Swedish
tgk         Tajik
tgl         Tagalog
tuk         Turkmen
tur         Turkish
ukr         Ukrainian
urd         Urdu
uzb         Uzbek
zho         Chinese

MIME-Type

mimetype: MIME-Types

A MIME media type is a specification of a content type on a network. Each MIME type is assigned by IANA and consists of two parts: the type and sub-type. Common examples are: text/plain, application/json and application/pdf.

MIME type properties do not contain parameters as used in HTTP headers, like charset=UTF-8.

group: mimetypes

see followthemoney.types.common.PropertyType.group
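When a MIME type is taken from an HTTP header, its parameters need to be dropped first. A minimal sketch, with a made-up function name and the assumption that values are lower-cased:

```python
from typing import Optional


def strip_mime_parameters(value: str) -> Optional[str]:
    # Keep only "type/subtype"; drop parameters such as "; charset=UTF-8".
    mime = value.split(";", 1)[0].strip().lower()
    kind, sep, subtype = mime.partition("/")
    if not sep or not kind or not subtype:
        # Not of the form type/subtype.
        return None
    return mime


html = strip_mime_parameters("text/html; charset=UTF-8")
```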

Name

name: Names

A name used for a person or company. This is assumed to be as complete a name as available - when a first name, family name or patronymic are given separately, these are stored to string-type properties instead.

No validation rules apply, and things having multiple names must be considered a perfectly ordinary case.

group: names

see followthemoney.types.common.PropertyType.group

matchable: True

see followthemoney.types.common.PropertyType.matchable

pivot: True

see followthemoney.types.common.PropertyType.pivot

Number

number: Numbers

A numeric value, like the size of a piece of land, or the value of a contract. Since all property values in FtM are strings, this is also a string and there is no specified format (e.g. 1,000.00 vs. 1.000,00).

In the future we might want to enable annotations for format, units, or even to introduce a separate property type for monetary values.

Phone number

phone: Phone numbers

A phone number in E.164 format. This means that phone numbers always include an international country prefix (e.g. +38760183628). The cleaning and validation functions for this type try to be smart about this by accepting a list of countries as an argument in order to add the number prefix.

When adding a property of this type to an entity, any country-type properties defined for the entity are considered for validation. That means that adding a phone number to an entity before adding a country can have a different validation outcome from doing the two operations the other way around. Always define the country first.

group: phones

see followthemoney.types.common.PropertyType.group

matchable: True

see followthemoney.types.common.PropertyType.matchable

pivot: True

see followthemoney.types.common.PropertyType.pivot
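The following toy example shows why the country context matters for phone numbers. It is a heavily simplified illustration, not the library's cleaning logic: the calling-code table and the function name are hypothetical, and real number parsing involves far more national formatting rules than stripping leading zeros.

```python
from typing import Optional, Sequence

# Hypothetical, tiny table of calling codes; real data is far larger.
CALLING_CODES = {"ba": "387", "de": "49", "gb": "44"}


def to_e164(number: str, countries: Sequence[str] = ()) -> Optional[str]:
    digits = "".join(ch for ch in number if ch.isdigit())
    if not digits:
        return None
    if number.strip().startswith("+"):
        # Already carries an international prefix.
        return "+" + digits
    for country in countries:
        code = CALLING_CODES.get(country)
        if code is not None:
            # Simplification: drop leading zeros and prepend the calling code.
            return "+" + code + digits.lstrip("0")
    # Without a country hint a national number cannot be completed.
    return None


with_country = to_e164("060 183 628", ["ba"])
without_country = to_e164("060 183 628")
```

The same national number is usable with a country hint and rejected without one, which is why the country should be defined first.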

Label

string: Labels

A simple string property with no additional semantics.

Text

text: Texts

Longer text fragments, such as descriptions or document text. Unlike string properties, it might make sense to treat properties of this type as full-text search material.

max_size: 31457280

see followthemoney.types.common.PropertyType.max_size

Topic

topic: Topics

Topics define a controlled vocabulary of terms applicable to some entities, such as companies and people. They describe categories of journalistic interest which may apply to the given entity, for example if a given person is a criminal or a politician.

Besides the informative value, topics are ultimately supposed to bear fruit in the context of graph-based data analysis, where they would enable queries such as "find all paths between a government procurement award and a politician".

group: topics

see followthemoney.types.common.PropertyType.group

Code value            Label
asset.frozen          Frozen asset
corp.offshore         Offshore
corp.shell            Shell company
crime                 Crime
crime.boss            Criminal leadership
crime.cyber           Cybercrime
crime.fin             Financial crime
crime.fraud           Fraud
crime.terror          Terrorism
crime.theft           Theft
crime.traffick        Trafficking
crime.traffick.drug   Drug trafficking
crime.traffick.human  Human trafficking
crime.war             War crimes
ctx.poi               Person of interest
ctx.sanctioned        Sanctioned entity
fin                   Financial services
fin.adivsor           Financial advisor
fin.bank              Bank
fin.fund              Fund
gov                   Government
gov.igo               Intergovernmental organization
gov.muni              Municipal government
gov.national          National government
gov.soe               State-owned enterprise
gov.state             State government
mil                   Military
pol.party             Political party
pol.union             Union
rel                   Religion
role.acct             Accountant
role.act              Activist
role.civil            Civil servant
role.diplo            Diplomat
role.journo           Journalist
role.judge            Judge
role.lawyer           Lawyer
role.pep              Politician
role.rca              Associate
role.spy              Spy

URL

url: URLs

A uniform resource locator (URL). This will perform some normalisation on the URL so that it’s sure to be using valid encoding/quoting, and to make sure the URL has a scheme (e.g. ‘http’, ‘https’, …).

group: urls

see followthemoney.types.common.PropertyType.group

matchable: True

see followthemoney.types.common.PropertyType.matchable

pivot: True

see followthemoney.types.common.PropertyType.pivot
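The kind of normalisation described for URLs can be approximated with `urllib.parse` from the standard library. This is a sketch under stated assumptions, not the library's behaviour: the function name is made up, and `http` is assumed as the default scheme for input that has none.

```python
from typing import Optional
from urllib.parse import quote, urlsplit, urlunsplit


def normalise_url(value: str, default_scheme: str = "http") -> Optional[str]:
    value = value.strip()
    if not value:
        return None
    if "://" not in value:
        # Assumption: scheme-less input is treated as a web address.
        value = f"{default_scheme}://{value}"
    try:
        parts = urlsplit(value)
    except ValueError:
        return None
    if not parts.netloc:
        return None
    # Quote unsafe characters in the path; keep existing percent-escapes.
    path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


fixed = normalise_url("example.com/a b")
unchanged = normalise_url("https://example.com/x?q=1")
```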