Torrus Architecture

Configuration processing

The XML configuration is compiled into the database representation at the operator's manual request.

The compiled configuration is not a one-to-one representation of the XML version: all templates are expanded. The XML can be restored from the database with the snapshot utility.

A template defines a piece of configuration which can be used in multiple places. Templates can be nested.

The configuration consists of multiple XML files. They are processed in the order specified in the tree configuration. Each new file is treated as additive information on top of the existing tree.

The XML configuration compiler validates all the mandatory parameters.

Data storage

Three types of data stores are used in Torrus:

Git for storing the tree data;
Redis for storing run-time data, sending notifications, and locking;

Tree configuration

The configuration consists of multiple trees. A tree consists of nodes, and each node is of type "leaf" or "subtree". Subtrees contain child subtrees or leaves, and a leaf does not contain any child elements. Each node has an arbitrary number of parameters. Some parameters can prohibit recursion, but most parameters are calculated by traversing the tree upwards until a value is found.
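The upward lookup described above can be sketched as follows. This is a minimal illustration, not the Torrus implementation: the in-memory `NODES` dictionary, the parameter names, and the `no_inherit` argument are all hypothetical stand-ins.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical in-memory tree keyed by node path.
# Parameter names ("period", "domain", "comment") are placeholders.
NODES: Dict[str, dict] = {
    "/": {"parent": None, "params": {"period": "300", "comment": "top"}},
    "/Routers/": {"parent": "/", "params": {"domain": "example.net"}},
    "/Routers/rtr01/": {"parent": "/Routers/", "params": {}},
    "/Routers/rtr01/inbytes": {"parent": "/Routers/rtr01/", "params": {}},
}


def get_param(
    nodes: Dict[str, dict],
    path: str,
    name: str,
    no_inherit: FrozenSet[str] = frozenset(),
) -> Optional[str]:
    """Return the value of a parameter for a node, walking up the tree.

    Parameters listed in no_inherit (an assumption standing in for
    "parameters that prohibit recursion") are only looked up on the
    node itself.
    """
    current: Optional[str] = path
    while current is not None:
        node = nodes[current]
        if name in node["params"]:
            return node["params"][name]
        if name in no_inherit:
            return None  # recursion prohibited: do not consult parents
        current = node["parent"]
    return None  # reached the top of the tree without finding a value


if __name__ == "__main__":
    print(get_param(NODES, "/Routers/rtr01/inbytes", "period"))
    print(get_param(NODES, "/Routers/rtr01/inbytes", "comment",
                    frozenset({"comment"})))
```

The leaf defines neither parameter itself, so the first lookup climbs to the top of the tree, while the second stops immediately because the parameter is marked as non-inheritable.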

Each node has a path within a tree. A subtree path always ends with a slash, and a leaf path ends with a word character. The top of the tree is identified by a single slash. Node names in the path may contain alphanumeric characters, dashes, and underscores.

Each node is identified by a token. A node token is a 40-character SHA-1 checksum calculated from the tree name, followed by a colon, and the path.
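A token computation along these lines might look like the sketch below. It assumes that the checksum is taken over the UTF-8 bytes of the string `TREE:PATH` and rendered as lower-case hexadecimal; the function name `node_token` is hypothetical, and the exact byte encoding used by Torrus is not stated here.

```python
import hashlib


def node_token(tree: str, path: str) -> str:
    """Return a 40-character SHA-1 hex digest of "TREE:PATH".

    Assumption: the input is encoded as UTF-8 and the digest is
    written in lower-case hexadecimal.
    """
    source = f"{tree}:{path}"
    return hashlib.sha1(source.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # The tree name "main" and the path are illustrative only.
    print(node_token("main", "/Routers/rtr01/inbytes"))
```

Because the tree name is part of the hashed string, the same path in two different trees yields two different tokens.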

`ConfigTree` objects

The ConfigTree Perl module provides an API for accessing the configuration trees, as well as other types of data. Each data element is referred to by a token, as follows:

Tree node token is a 40-character SHA-1 checksum of a node as described above.

Tokenset name starts with letter S. The rest is an arbitrary sequence of word characters.

The special token SS is reserved for the list of tokensets. Tokenset parameters are also inherited from the parameters of this token.

View and monitor names must be unique, and must start with a lower case letter.

Git storage

A Torrus instance uses a number of Git branches for each tree configuration. The XML compiler is the only writer, and consumers only read from the Git repositories. Both writing and reading are done directly on local Git repositories, so working directories are not needed. If the writer and reader are on the same host, they use the same Git repository. Otherwise, the writer pushes its commits to a remote repository, and the reader pulls from it. The reader sets an exclusive lock on the repository for the duration of fetching and merging, so that other readers do not try to pull at the same time.

Each tree has its own set of branches. The TREE_configtree branch contains a full hierarchy of objects, so all parameters that are defined in the input XML are retrievable. Typically the Web UI renderer consumes this data.

The TREE_srcfiles branch contains XML files that are used by the XML compiler. While processing the XML files, it adds them into this branch in order to track the changes in XML sources.

There are also a number of agent branches, one per agent instance (collectors and monitors are typical agents): TREE_DAEMON_INSTANCE. Each such branch contains only the information and parameters needed by the agent, so that the agent process can start and update its data as fast as possible.

A Git reference refs/heads/TREE_agents_ref is used to indicate the commit in TREE_configtree branch that corresponds to the current heads in agent branches. This reference is moved when the agent branches are updated.

The TREE_agent_tokens branch records which agent branches contain which tokens. It is needed for deleting tokens from agent branches when they are deleted from the configtree branch. The branch contains a two-level 256-way directory hierarchy with one JSON file per token, and each file contains an array of the agent branch names where that token is used.

The TREE_configtree branch has subdirectories as follows. The directories nodes and children contain JSON files named after the node tokens, arranged into a two-level 256-way tree structure. For example, the file for token b7ba0d88a0b14a4e6c3c61f5446aa619a537098f is stored as b7/ba/0d88a0b14a4e6c3c61f5446aa619a537098f.

nodes/: the JSON content defines the node's type, name, parent's token, and parameters.
children/: for each subtree node, there is a JSON file defining a hash with child tokens as keys and "1" as values.
srcrefs: a JSON hash representing dependencies of nodes on XML files. It's a two-level hash: the first key is the source file name, the second key is the token of the topmost dependent node, and the value is "1".
srcglobaldeps: a JSON hash representing the XML source files which define parameter properties, definitions, or templates. The keys are file names, and values are "1".
srcrev: A file containing a JSON scalar referring to the commit in XML sources.
srcincludes: A file containing a JSON hash with source XML files as keys and arrays of included file names as values. The order in the array is the same as the order of "include" statements in the XML file. The key __ROOT__ indicates the XML files where the compilation started. Every source XML file is listed here, and those which do not include other files, have empty arrays.
nodeid/: two-level 256-way directory structure. Each file is a SHA-1 digest of nodeid value. The content of the file is a JSON array of the nodeid value and node token.
nodeidpx/: nodeid prefix searching database. Each nodeid value is split by standard delimiters (two consecutive slashes), and each resulting prefix is used to build a key in this hierarchy. Two-level 256-way directory structure is built from SHA-1 digests of these keys. The directories contain zero-length files representing the SHA-1 digests of nodeid values.
definitions/: each file is named after a definition, and its content is a JSON scalar containing the definition value.
other/: JSON objects for views, monitors, and actions definitions. Files are named after the view, monitor, or action name, and the content is a hash with parameters defining each object. The following special files are JSON hashes of object names and "1" as values of corresponding types: __VIEWS__, __MONITORS__, __ACTIONS__.
paramprops: a single JSON file defining parameter properties in two-level hash.
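The two-level 256-way layout used by nodes/ and children/ can be illustrated with a small helper. The function names below are hypothetical; the splitting rule (first two hex digits, next two hex digits, then the remainder) follows the example token given above.

```python
HEX_DIGITS = set("0123456789abcdef")


def token_relpath(token: str) -> str:
    """Map a 40-character token to its two-level 256-way relative path."""
    if len(token) != 40 or not set(token) <= HEX_DIGITS:
        raise ValueError(f"not a 40-character lower-case hex token: {token!r}")
    # Two hex digits give 256 possible directory names at each level.
    return f"{token[0:2]}/{token[2:4]}/{token[4:]}"


def node_file(token: str) -> str:
    """Path of the node JSON file inside the configtree branch."""
    return "nodes/" + token_relpath(token)


def children_file(token: str) -> str:
    """Path of the children JSON file for a subtree node."""
    return "children/" + token_relpath(token)


if __name__ == "__main__":
    print(node_file("b7ba0d88a0b14a4e6c3c61f5446aa619a537098f"))
```

Splitting on the leading digits of a SHA-1 digest spreads the files roughly evenly across directories, which keeps each Git tree object small.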

The srcdef structure is mainly required for recursive deletion of nodes if a corresponding XML file is changed or deleted.

Each Git commit refers to a complete and consistent tree structure. If the compiler finds an error, it does not create a new commit, and rolls back to the latest HEAD.

The JSON files within nodes hierarchy are hashes with the following keys and values:

is_subtree: 1 for subtree, 0 for a leaf.
parent: token of the parent node, or empty string if this is the top of the tree.
path: the full node name. Subtree names must end with a slash, and leaf names should end with alphanumeric characters.
params: hash with parameter names and values.
vars: hash with variable values (used in setvar, iftrue and iffalse XML statements).
src: optional hash of source XML file names as keys and "1" as values. It's only defined on the topmost node that is affected by a given XML. If an XML file updates a previously defined node, the src content is copied from the nearest parent where it's defined.
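As an illustration of the node record layout, the sketch below decodes a JSON document and checks it against the keys listed above. The sample record, the `check_node` helper, and the decision to treat `vars` and `src` as optional are assumptions made for this example; whether `is_subtree` is stored as a JSON number or a string is not stated here, so both are accepted.

```python
import json
from typing import List

# Illustrative node record; the parameter name and value are placeholders.
SAMPLE = """
{
  "is_subtree": 1,
  "parent": "",
  "path": "/",
  "params": {"comment": "top of the tree"},
  "vars": {}
}
"""


def check_node(record: dict) -> List[str]:
    """Return a list of problems found in a decoded node record."""
    problems = [
        f"missing key: {key}"
        for key in ("is_subtree", "parent", "path", "params")
        if key not in record
    ]
    if problems:
        return problems
    is_subtree = int(record["is_subtree"]) == 1
    path = record["path"]
    if is_subtree and not path.endswith("/"):
        problems.append("subtree path must end with a slash")
    if not is_subtree and path.endswith("/"):
        problems.append("leaf path must not end with a slash")
    if path == "/" and record["parent"] != "":
        problems.append("top of the tree must have an empty parent")
    return problems


if __name__ == "__main__":
    print(check_node(json.loads(SAMPLE)))
```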

The JSON files within other are hashes with the following keys and values:

params: hash with parameter names and values.

The agent branches contain JSON files named after the node tokens, arranged into a two-level 256-way tree structure. Each daemon that needs quick access to a subset of leaf nodes (primarily the collector, and also the monitor) retrieves the node configurations from this structure. The instance number is a 4-digit lower-case hexadecimal number. The JSON files are hashes defining all parameter values needed by the daemon. These files are populated by the XML compiler after the tree is processed.

An optional searchdb branch is used for indexing the node parameters in order to provide the search in GUI. It consists of the following directories:

words/TREE/ contains zero-length files in the following hierarchy: KEYWORD/TOKEN/PARAM. If a keyword is matched in the subtree or leaf name, the file name is __NODENAME__.
wordsglobal/ is the same as above, but for global search. In addition, a file called __TREENAME__ contains a JSON scalar with the tree name where this token is defined.
tokens/ is a two-level 256-way hierarchy of directories based on token IDs. These directories contain zero-length files named after keywords.
configtree_ref/ is a directory containing files named after the config tree names. Each file is a JSON scalar indicating the commit ID in corresponding configtree branch.

Redis database

Redis is an in-memory database, supporting key/value hashes and linear arrays, with periodic saving to disk storage. Torrus keeps all run-time and dynamic information in Redis.

All Redis keys that are used within a single Torrus installation are prefixed with a configurable prefix ("torrus:" by default), thus allowing multiple Torrus installations to use the same Redis instance. Further in this document, the prefix is omitted for easier reading.
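The prefixing rule can be sketched with a small helper. The function name `redis_key` and the example prefix "lab:" are hypothetical; only the default prefix "torrus:" comes from the text above.

```python
DEFAULT_PREFIX = "torrus:"


def redis_key(name: str, prefix: str = DEFAULT_PREFIX) -> str:
    """Build the full Redis key for a key name as written in this document."""
    return prefix + name


if __name__ == "__main__":
    # Two installations sharing one Redis instance get disjoint key spaces.
    print(redis_key("githeads"))
    print(redis_key("githeads", prefix="lab:"))
```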

gitlock:REPOPATH -- this key is used as a mutex that protects a local Git repository from simultaneous initialization by multiple processes. Before accessing the repository in writer mode, the writer sets this Redis key to the current UNIX timestamp.
writer:REPOPATH -- this is a hash representing active Torrus::ConfigTree::Writer objects. Each key is the process ID, and values are the UNIX timestamps when the writer objects were created. Entries older than 24 hours are automatically removed. This hash aims to prevent Git garbage collector from running while there are active compiler processes.
githeads -- this is a hash containing commit numbers written by the compiler. The keys are branch names, and the values are the Git commit numbers of corresponding tops of the branches. The consumer process compares this with the current known commit and pulls the updates if needed.
agent_flush -- a hash with branch names as keys and value of 1 indicating that a corresponding agent needs to re-read its full configuration. The agent deletes the corresponding entry after reading.
tsets:TREE -- hash of tokenset names as keys and "1" as values.
tset:TREE:TSET -- hash of tokenset members. Tokens are the keys, and the values indicate the origins. Currently known origins are "static" and "monitor".
tsetparam:TREE:TSET -- a hash of tokenset parameters.
users -- a hash containing users and groups, as described below.
acl -- a hash containing the access privileges for groups, as described below.
monitor_alarms:TREE is a hash that keeps alarm status information from previous runs of Monitor, with the keys and values as described below.
scheduler_stats:TREE is a hash which stores the runtime statistics of Scheduler tasks. Each key has the structure TYPE:TASKNAME:INSTANCE:PERIOD:OFFSET#VARIABLE, and the value is a number representing the current value of the variable. Depending on the purpose of the variable, the number is floating point or integer.
serviceid_params is a hash containing properties for each Service ID (exported collector information, usually stored in an SQL database). The keys are Service IDs, and values are JSON hashes describing the properties. Known parameters are: trees, token, dstype, units.
serviceid_tokens is a hash with tokens as keys and Service ID as values.
snmp_failures:TREE -- a hash listing SNMP failures in the collector, as described below.
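The comparison that a consumer performs against githeads can be sketched as a pure function. Here the hash is represented as a plain dictionary, as it might look after being read with the Redis HGETALL command; the function name `branches_to_pull`, the branch names, and the commit values are illustrative assumptions.

```python
from typing import Dict, List


def branches_to_pull(githeads: Dict[str, str], known: Dict[str, str]) -> List[str]:
    """Return the branches whose published head differs from the known one.

    githeads: branch name -> commit published by the compiler.
    known:    branch name -> commit the consumer has already processed.
    """
    return sorted(
        branch
        for branch, commit in githeads.items()
        if known.get(branch) != commit
    )


if __name__ == "__main__":
    published = {
        "main_configtree": "1f0c9a",       # shortened, illustrative values
        "main_collector_0001": "77ab03",
    }
    seen = {"main_configtree": "1f0c9a"}
    print(branches_to_pull(published, seen))
```

A branch that the consumer has never seen is treated the same way as one whose head has moved, so a freshly started consumer pulls everything.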

PubSub channels:

treecommits:TREE -- the value of every new Git commit in TREE_configtree branch is published to this channel.

`users` contents

ua:UID:ATTR => VALUE

User attributes, such as cn (Common name) or userPassword, are stored here. For each user, there is a record consisting of the attribute uid, with the value equal to the user identifier.

uA:UID => ATTR,...

Comma-separated list of attribute names for the given user.

gm:UID => GROUP,...

For each user ID, stores the comma-separated list of groups it belongs to.

ga:GROUP:ATTR => VALUE

Group attributes, such as group description.

gA:GROUP => ATTR,...

Comma-separated list of attribute names for the given group.

G: => GROUP,...

List of all groups.
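Reading these entries could look like the sketch below, which works on a plain dictionary standing in for the Redis hash. The helper names and the sample user "jdoe" are hypothetical; only the field naming scheme is taken from the description above.

```python
from typing import Dict, List

# Sample content of the users hash; all values are illustrative.
USERS: Dict[str, str] = {
    "ua:jdoe:uid": "jdoe",
    "ua:jdoe:cn": "John Doe",
    "uA:jdoe": "uid,cn",
    "gm:jdoe": "admin,staff",
    "ga:admin:description": "Administrators",
    "gA:admin": "description",
    "G:": "admin,staff",
}


def split_list(value: str) -> List[str]:
    """Split a comma-separated list, ignoring empty items."""
    return [item for item in value.split(",") if item]


def user_attributes(users: Dict[str, str], uid: str) -> Dict[str, str]:
    """Collect all attributes of a user via the uA: index entry."""
    names = split_list(users.get(f"uA:{uid}", ""))
    return {
        name: users[f"ua:{uid}:{name}"]
        for name in names
        if f"ua:{uid}:{name}" in users
    }


def user_groups(users: Dict[str, str], uid: str) -> List[str]:
    """Groups that the user belongs to."""
    return split_list(users.get(f"gm:{uid}", ""))


def all_groups(users: Dict[str, str]) -> List[str]:
    """All known groups, from the G: entry."""
    return split_list(users.get("G:", ""))


if __name__ == "__main__":
    print(user_attributes(USERS, "jdoe"))
    print(user_groups(USERS, "jdoe"))
```

The upper-case entries (uA:, gA:, G:) act as indexes, so attributes can be enumerated without scanning every field of the hash.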

`acl` contents

GROUP:OBJECT:PRIVILEGE => 1

The entry exists if and only if the group members have this privilege over the given object. The most common privilege is DisplayTree, where the object is the tree name.
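A privilege check over this hash can be sketched as follows. The `has_privilege` function and the sample group and tree names are hypothetical; the caller is assumed to already know which groups the user belongs to.

```python
from typing import Dict, Iterable


def has_privilege(
    acl: Dict[str, str],
    groups: Iterable[str],
    obj: str,
    privilege: str,
) -> bool:
    """True if any of the groups holds the privilege over the object.

    Only the existence of the GROUP:OBJECT:PRIVILEGE entry matters.
    """
    return any(f"{group}:{obj}:{privilege}" in acl for group in groups)


if __name__ == "__main__":
    acl = {"staff:main:DisplayTree": "1"}  # illustrative content
    print(has_privilege(acl, ["admin", "staff"], "main", "DisplayTree"))
    print(has_privilege(acl, ["guests"], "main", "DisplayTree"))
```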

`monitor_alarms` contents

MNAME:TOKEN => T_SET:T_EXPIRES:STATUS:T_LAST_CHANGE [:ESCALATION[:ESCALATION...]]

The key consists of the monitor name and the leaf token. In the value, T_SET is the time when the alarm was raised. If two subsequent runs of Monitor raise the same alarm, T_SET does not change. T_EXPIRES is the timestamp until which the entry should still be kept after the alarm is cleared. STATUS is 1 if the alarm is active, and 0 otherwise. T_LAST_CHANGE is the timestamp of the last status change. Any following values are optional escalation times, present if escalation events were fired.

If STATUS is 1, the record is kept regardless of timestamps. If STATUS is 0, and the current time is more than T_EXPIRES, the record is not reliable and may be deleted by Monitor.
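The value format and the retention rule can be sketched as below. This is an illustration only: the `AlarmRecord` class and both function names are hypothetical, and all fields are assumed to be integers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlarmRecord:
    t_set: int
    t_expires: int
    status: int
    t_last_change: int
    escalations: List[int] = field(default_factory=list)


def parse_alarm(value: str) -> AlarmRecord:
    """Parse T_SET:T_EXPIRES:STATUS:T_LAST_CHANGE[:ESCALATION...]."""
    parts = value.split(":")
    if len(parts) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(parts)}")
    numbers = [int(part) for part in parts]  # assumption: integer fields
    return AlarmRecord(numbers[0], numbers[1], numbers[2], numbers[3],
                       numbers[4:])


def may_delete(record: AlarmRecord, now: int) -> bool:
    """A cleared alarm may be dropped once T_EXPIRES has passed."""
    return record.status == 0 and now > record.t_expires


if __name__ == "__main__":
    rec = parse_alarm("1700000000:1700003600:0:1700000900:300")
    print(rec)
    print(may_delete(rec, now=1700007200))
```

An active alarm is never eligible for deletion, whatever its timestamps say.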

`serviceid_params` contents

a: => SERVICEID,...

Lists all known service IDs.

t:TREE => SERVICEID,...

Lists service IDs exported by a given datasource tree.

p:SERVICEID:PARAM => VALUE

Parameter value for a given service ID. Mandatory parameters are: tree, token, dstype. Optional: units.

P:SERVICEID => PARAM,...

List of parameter names for a service ID.
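Collecting the parameters of one service ID and checking the mandatory ones might be done as in the sketch below. The dictionary stands in for the stored entries, and the function names, the service ID "cust42_in", and the parameter values are all hypothetical.

```python
from typing import Dict, List

MANDATORY = ("tree", "token", "dstype")


def service_params(entries: Dict[str, str], serviceid: str) -> Dict[str, str]:
    """Gather p:SERVICEID:PARAM values using the P:SERVICEID name list."""
    names = [n for n in entries.get(f"P:{serviceid}", "").split(",") if n]
    return {
        name: entries[f"p:{serviceid}:{name}"]
        for name in names
        if f"p:{serviceid}:{name}" in entries
    }


def missing_mandatory(params: Dict[str, str]) -> List[str]:
    """Names of mandatory parameters that are absent."""
    return [name for name in MANDATORY if name not in params]


if __name__ == "__main__":
    entries = {
        "a:": "cust42_in",
        "t:main": "cust42_in",
        "P:cust42_in": "tree,token,dstype,units",
        "p:cust42_in:tree": "main",
        "p:cust42_in:token": "b7ba0d88a0b14a4e6c3c61f5446aa619a537098f",
        "p:cust42_in:dstype": "COUNTER",
        "p:cust42_in:units": "bytes",
    }
    params = service_params(entries, "cust42_in")
    print(params)
    print(missing_mandatory(params))
```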

`snmp_failures` contents

c:counter => number

A counter with a name. Known names: unreachable, deleted, mib_errors.

h:hosthash => failure:timestamp

SNMP host failure information. Hosthash is a concatenation of hostname, UDP port, and SNMP community, separated by "|". Known failures: unreachable, deleted. Timestamp is the UNIX time of the event.

m:TOKEN => timestamp

MIB failures (noSuchObject, noSuchInstance, and endOfMibView) for a given host, with the tree path of their occurrence and the UNIX timestamp.

M:hosthash => number

Count of MIB failures per SNMP host.
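The hosthash construction and the host failure value can be sketched as follows. The function names and the sample host are hypothetical, and the timestamp is assumed to be an integer number of seconds.

```python
from typing import Tuple


def hosthash(hostname: str, port: int, community: str) -> str:
    """Concatenate hostname, UDP port, and SNMP community with "|"."""
    return "|".join([hostname, str(port), community])


def parse_host_failure(value: str) -> Tuple[str, int]:
    """Split a failure:timestamp value into its two parts."""
    failure, timestamp = value.rsplit(":", 1)
    return failure, int(timestamp)  # assumption: integer UNIX time


if __name__ == "__main__":
    key = "h:" + hosthash("rtr01.example.net", 161, "public")
    print(key)
    print(parse_host_failure("unreachable:1700000000"))
```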

Search and indexing service

Searching within trees is implemented in a standalone service, consisting of two parts:

1. a daemon that subscribes to the treecommits:* channels and updates its database after every commit;
2. a RESTful API service for retrieving the search results.
