doc/dev: architecture update and improvements

author: Aleš Mrázek <ales.mrazek@nic.cz> 2024-05-13 13:17:15 +0200
committer: Aleš Mrázek <ales.mrazek@nic.cz> 2024-07-02 14:07:48 +0200
commit: d320ecd21239786a7ef7793a0edab41c058dbcca (patch)
tree: 446e8624c140512e3aa2fd98918d8fa0075dbc32 /doc
parent: doc: architecture schemas improvements (diff)
5 files changed, 73 insertions, 28 deletions
diff --git a/doc/dev/architecture-gc.rst b/doc/dev/architecture-gc.rst
index b57c857c..38e8f5b5 100644
--- a/doc/dev/architecture-gc.rst
+++ b/doc/dev/architecture-gc.rst
@@ -1,12 +1,12 @@
-*****************
-``kres-cache-gc``
-*****************
+********
+cache-gc
+********
 
-The garbage collector is a simple component which keeps the shared cache from overfilling.
-Every second it estimates cache usage and if over 80%, records get deleted in order to free 10%.  (Parameters can be configured.)
-
-The freeing happens in a few passes.  First all items are classified by their estimated usefulness, in a simple way based on remaining TTL, type, etc.
-From this histogram it's computed which "level of usefulness" will become the threshold, so that roughly the planned total size gets freed.
-Then all items are passed to collect the set of keys to delete, and finally the deletion is performed.
-As longer transactions can cause issues in LMDB, all passes are split into short batches.
+The garbage collector is a simple component that keeps the shared cache from filling up.
+Every second it estimates the cache usage and if it is over 80%, it deletes records to free up 10%.
+These parameters are configurable.
 
+The freeing happens in a few passes. First all items are classified by their estimated usefulness, in a simple way based on remaining TTL, type, etc.
+From this histogram, it's calculated which "level of usefulness" will become the threshold, so that roughly the planned total size will be freed.
+Then all items are passed to collect the set of keys to be deleted, and finally the deletion is performed.
+Since longer transactions can cause problems in the LMDB cache, all passes are split into short batches.
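
The multi-pass freeing described in ``architecture-gc.rst`` above can be sketched roughly as follows. This is an illustrative model only, not the actual ``cache-gc`` implementation: the cache is a plain dictionary rather than LMDB, and the ``usefulness`` scoring, the item fields ``ttl`` and ``size``, and all function names are assumptions made for the example.

```python
from collections import Counter
from itertools import islice


def usefulness(item):
    """Hypothetical score: remaining TTL in minutes, capped at level 9."""
    return min(item["ttl"] // 60, 9)


def pick_threshold(items, bytes_to_free):
    """Pass 1: build a size histogram per usefulness level, then find the
    lowest level whose cumulative size reaches the planned amount."""
    hist = Counter()
    for item in items.values():
        hist[usefulness(item)] += item["size"]
    freed = 0
    for level in sorted(hist):
        freed += hist[level]
        if freed >= bytes_to_free:
            return level
    return max(hist, default=-1)


def collect_keys(items, threshold):
    """Pass 2: collect the keys of all items at or below the threshold."""
    return [key for key, item in items.items() if usefulness(item) <= threshold]


def delete_in_batches(items, keys, batch_size=2):
    """Pass 3: delete in short batches (standing in for short transactions).
    Returns the number of batches used."""
    iterator = iter(keys)
    batches = 0
    while batch := list(islice(iterator, batch_size)):
        for key in batch:
            del items[key]
        batches += 1
    return batches


cache = {
    "a": {"ttl": 10, "size": 100},
    "b": {"ttl": 120, "size": 100},
    "c": {"ttl": 600, "size": 100},
    "d": {"ttl": 30, "size": 100},
}
threshold = pick_threshold(cache, bytes_to_free=150)
doomed = collect_keys(cache, threshold)
batches_used = delete_in_batches(cache, doomed, batch_size=1)
```

Because the threshold is chosen per level, the amount actually freed only approximates the plan: here 200 bytes are freed for a target of 150.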
diff --git a/doc/dev/architecture-kresd.rst b/doc/dev/architecture-kresd.rst
index 783fbb8a..aac043b0 100644
--- a/doc/dev/architecture-kresd.rst
+++ b/doc/dev/architecture-kresd.rst
@@ -1,3 +1,3 @@
-*********
-``kresd``
-*********
-\ No newline at end of file
+*****
+kresd
+*****
+\ No newline at end of file
diff --git a/doc/dev/architecture-manager.rst b/doc/dev/architecture-manager.rst
index 4e3371a9..989b4f67 100644
--- a/doc/dev/architecture-manager.rst
+++ b/doc/dev/architecture-manager.rst
@@ -1,6 +1,6 @@
-****************
-``kres-manager``
-****************
+*******
+manager
+*******
 
 The manager is a component written in Python and a bit of C used for native extension modules. The main goal of the manager is to ensure the system is set up according to a given configuration, provide a user-friendly interface. Performance is only secondary to correctness.
 
diff --git a/doc/dev/architecture-pl.rst b/doc/dev/architecture-pl.rst
new file mode 100644
index 00000000..d5fd80f0
--- /dev/null
+++ b/doc/dev/architecture-pl.rst
@@ -0,0 +1,25 @@
+*************
+policy-loader
+*************
+
+The ``policy-loader`` is a new special kresd instance ensuring that configured policies are loaded into the rules database where they are made available to all running kresd workers. 
+If the policies are loaded successfully, the ``policy-loader`` exits automatically, otherwise it exits with an error code that is detected by Supervisor.
+
+
+The ``policy-loader`` is only triggered when policy-related configuration changes, or when the resolver is cold started.
+This eliminates the need to restart all running kresd workers if only the policies have changed.
+Running kresd workers are merely notified of changes in the rules database through their control socket, using the ``kr_rules_reset()`` function.
+The policies are all configuration options located under the ``views``, ``local-data`` and ``forward`` sections.
+
+
+The kresd workers are only fully restarted when a relevant configuration change is made to them (everything else outside the policies), or when the resolver is cold started.
+The same applies to the kresd canary process, which is always run before the kresd workers to validate the new configuration.
+The manager always waits for the ``policy-loader`` to finish before working with other processes.
+
+
+The resolver's cold start
+-------------------------
+
+First, the ``policy-loader`` is started and the manager waits for the policies to finish loading into the rules database.
+Then the kresd canary process is started to validate the configuration, and then all the kresd workers are started.
+The resolver will not start if any of the operations fail.
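
The rules in ``architecture-pl.rst`` above for deciding which processes need to run can be summarized in a small decision function. This is a sketch under stated assumptions, not the manager's actual code: ``plan_actions`` and the action strings are hypothetical names, and only the three policy section names come from the text.

```python
# Configuration sections that the text counts as "policies".
POLICY_SECTIONS = {"views", "local-data", "forward"}


def plan_actions(changed_sections, cold_start=False):
    """Return the ordered list of (hypothetical) actions for a config change."""
    changed = set(changed_sections)
    policies_changed = bool(changed & POLICY_SECTIONS)
    workers_changed = bool(changed - POLICY_SECTIONS)

    plan = []
    # The policy-loader always runs first, and the manager waits for it.
    if cold_start or policies_changed:
        plan.append("run policy-loader")
    if cold_start:
        plan += ["start canary", "start workers"]
    elif workers_changed:
        # The canary validates the new configuration before workers restart.
        plan += ["run canary", "restart workers"]
    elif policies_changed:
        # Workers keep running; they are only told to reload the rules.
        plan.append("notify workers of rules change")
    return plan
```

For example, changing only ``forward`` yields a ``policy-loader`` run followed by a notification, with no worker restart.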
diff --git a/doc/dev/architecture.rst b/doc/dev/architecture.rst
index 79084e5b..dbc4eea8 100644
--- a/doc/dev/architecture.rst
+++ b/doc/dev/architecture.rst
@@ -2,42 +2,61 @@
 System architecture
 *******************
 
-Knot Resolver is split into several components, namely the manager, ``kresd`` and the garbage collector. In addition to these custom components, we also rely on `supervisord <http://supervisord.org/>`_.
+Knot Resolver consists of several independent components that are managed by the ``manager`` which combines them into one functional unit.
+The components are: ``kresd`` the resolving daemon, ``cache-gc`` the cache garbage collector, and ``policy-loader`` which loads configured policy rules.
+In addition to these custom components, we also rely on `supervisord <http://supervisord.org/>`_, which handles the actual process management.
 
 .. image:: ../architecture-schema.svg
     :width: 100%
-    :alt: Diagram showing process tree and contol relationship between Knot Resolver components. Supervisord is a parent to all processes, namely manager, kresd instances and gc. Manager on the other hand controls every other component and what it does.
+    :alt: Diagram showing the process tree and control relationships between Knot Resolver components.
+          Supervisord is a parent to all processes, namely manager, kresd instances and gc.
+          Manager on the other hand controls every other component and what it does.
 
-
-There are two different control structures in place. Semantically, the manager controls every other component in Knot Resolver. It processes configuration and passes it onto every other component. As a user you will always interact with the manager (or kresd). At the same time though, the manager is not the root of the process hierarchy, Supervisord sits at the top of the process tree and runs everything else.
+There are two different control structures in place.
+Semantically, the manager controls every other component in Knot Resolver.
+It processes configuration and passes it to each component.
+As a user you will always interact with the manager.
+At the same time though, the manager is not the root of the process hierarchy;
+supervisord sits at the top of the process tree and runs everything else.
 
 .. note::
-    The rationale for this inverted process hierarchy is mainly stability. Supervisord sits at the top because it is a reliable and stable software we can depend upon. It also does not process user input and its therefore shielded from data processing bugs. This way, any component in Knot Resolver can crash and restart without impacting the rest of the system.
+    The reason for this inverted process hierarchy is mainly stability.
+    Supervisord is at the top because it is reliable and stable software that we can rely on.
+    In addition, it does not process user input and is therefore shielded from data processing errors.
+    This way, any component in Knot Resolver itself can recover from potential crashes without affecting the rest of the system.
 
 
 Knot Resolver startup
 =====================
 
-The inverted process hierarchy complicates Resolver's launch procedure. You might notice it when reading manager's logs just after start. What happens on cold start is:
+The inverted process hierarchy makes the resolver startup procedure a bit more complicated.
+You may notice this when reading the manager's logs immediately after startup.
+
+What happens on cold start is:
 
-1. Manager starts, reads its configuration and generates new supervisord configuration. Then, it starts supervisord by using ``exec``.
-2. Supervisord loads it's configuration, loads our extensions and start a new instance of manager.
-3. Manager starts again, this time as a child of supervisord. As this is desired state, it loads the configuration again and commands supervisord that it should start new instances of ``kresd``.
+1. The manager starts, reads its configuration and generates a new supervisord configuration.
+   Then, it starts supervisord with the ``exec`` syscall, which causes supervisord to *replace* the manager process.
+2. Supervisord loads its configuration, loads our custom extensions and starts a new instance of the manager.
+3. The manager starts again, this time as a child of supervisord.
+   Since this is the desired state, it loads the configuration again and instructs supervisord to start the other components of the resolver.
 
 
 Failure handling
 ================
 
-Knot Resolver is designed to handle failures automatically. Anything except for supervisord will automatically restart. If a failure is irrecoverable, all processes will stop and nothing will be left behind in a half-broken state. While a total failure like this should never happen, it is possible and you should not rely on single instance of Knot Resolver for a highly-available system.
+Knot Resolver is designed to handle failures automatically.
+Everything except supervisord is automatically restarted after a failure.
+If a failure is unrecoverable, all processes are killed and nothing is left behind in a half-broken state.
+While a total failure like this should not happen, it is possible and you should not rely on specific instances of Knot Resolver in a highly-available system.
 
 .. note::
-    The ability to restart most of the components without downtime means, that Knot Resolver is able to transparently apply updates while running.
+    The ability to restart most of the components without downtime means that Knot Resolver can transparently apply updates while running.
 
 
 Individual components
 =====================
 
-You can learn more about architecture of individual Resolver components in the following chapters.
+Learn more about the architecture of each component in the following chapters:
 
 .. toctree::
     :titlesonly:
@@ -46,3 +65,4 @@ You can learn more about architecture of individual Resolver components in the f
     architecture-manager
     architecture-kresd
     architecture-gc
+    architecture-pl
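
Step 1 of the cold-start sequence in ``architecture.rst`` above relies on ``exec`` *replacing* the calling process rather than spawning a child. The following minimal sketch illustrates that property on a POSIX system. It is not taken from the manager: it uses the Python interpreter as a stand-in for both the manager and supervisord, and ``run_exec_demo`` is a hypothetical helper.

```python
import subprocess
import sys

# Stand-in for supervisord: the program that takes over the process.
REPLACEMENT = "import os; print(os.getpid())"

# Stand-in for the manager: prints its PID, then replaces itself via exec.
LAUNCHER = (
    "import os, sys\n"
    "print(os.getpid(), flush=True)\n"
    f"os.execv(sys.executable, [sys.executable, '-c', {REPLACEMENT!r}])\n"
)


def run_exec_demo():
    """Run the launcher and return (pid_before_exec, pid_after_exec)."""
    result = subprocess.run(
        [sys.executable, "-c", LAUNCHER],
        capture_output=True,
        text=True,
        check=True,
    )
    before, after = result.stdout.split()
    return int(before), int(after)


pid_before, pid_after = run_exec_demo()
```

On POSIX both PIDs are the same, because the new program image runs inside the existing process. Windows emulates ``exec`` by creating a new process, so the PIDs would differ there.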