..
.. Vortex Link
..
.. This software and documentation are Copyright 2010 to 2018 ADLINK
.. Technology Limited, its affiliated companies and licensors. All rights
.. reserved.
..
.. Licensed under the ADLINK Software License Agreement Rev 2.7 2nd October
.. 2014 (the "License"); you may not use this file except in compliance with
.. the License.
.. You may obtain a copy of the License at:
..     docs/LICENSE.html
..
.. See the License for the specific language governing permissions and
.. limitations under the License.
..

.. raw:: html

.. role:: red

.. raw:: html

.. role:: green

.. _`Load balancing Fault tolerance`:

################################
Load balancing & Fault tolerance
################################

**********
Redundancy
**********

For both fault tolerance and load balancing purposes, we may need to deploy
several Vortex Link service replicas in the subsystems. Indeed:

+ We don't want the Vortex Link services to be single points of failure.
  Replicating the services allows the replicas to act as fallbacks if one of
  the services fails.

+ We may need to balance the traffic load across several services and hosts.

.. figure:: ./images/LB_FT.png
   :alt: Services redundancy
   :width: 40em
   :align: center

   **Services redundancy**

.. _cluster_id:

*********************
Clusters & Cluster ID
*********************

The set of replicated services deployed to serve a subsystem is called a
cluster. Services deployed in a cluster need to behave differently than when
deployed standalone. They also need to know which other services are part of
the same cluster. Consequently, each cluster must be identified in the system
by a unique cluster id string.

By default, a service considers itself to be standalone and not part of any
cluster. To activate redundancy and indicate to the service that it is
supposed to be part of a cluster, the ``link.cluster.id`` property must be
set. This property also indicates to the service which cluster it belongs to,
so all services of the same cluster must be configured with the same cluster
id.

Example:

.. code-block:: properties

   link.cluster.id="PublicLink"

All services in a cluster must be configured with the same service level
(see :ref:`config_link_serviceLevel`).
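For instance, two replicas serving the same subsystem could both be started
with the following minimal configuration (the cluster id and service level
values below are purely illustrative):

.. code-block:: properties

   # Identical on every replica of the same cluster
   link.cluster.id="PublicLink"
   link.serviceLevel=20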
In a system, it is not mandatory to replicate all the services in all the
subsystems. A "standalone" service deployed alone in a subsystem is able to
interoperate with replicas deployed in other subsystems.

Note also that a "standalone" service offers a shorter discovery and route
establishment time than a clustered service. So if you know that a service
will not be replicated, it is better to deploy it as "standalone" (i.e. not
setting its cluster id).

.. figure:: ./images/Clusters.png
   :alt: Clusters
   :width: 40em
   :align: center

   **Clusters**

.. _automatic_cluster_id:

Automatic cluster id
====================

It is possible to let Vortex Link services automatically determine which
services belong to the same cluster. When automatic cluster id is activated,
Vortex Link services consider all services deployed in the same IP
subnetwork to belong to the same cluster. To do so, each Vortex Link service
uses the broadcast address of its network interface as the cluster id. If
both the UDP and TCP transports are activated but configured to use
different network interfaces, the UDP network interface is used to determine
the cluster id string.

For example, assume a Vortex Link service configured to use the following
network interface:

.. code-block:: bash

   eth0      Link encap:Ethernet  HWaddr 01:23:45:67:89:AB
             inet addr:192.168.2.2  Bcast:192.168.2.255  Mask:255.255.255.0
             inet6 addr: fe80::21d:92ff:fede:499b/64 Scope:Link
             UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

If this service is also configured with automatic cluster id, it will use
the string ``"192.168.2.255"`` as its cluster id.

To activate automatic cluster id, the ``link.cluster.id`` property must be
set to the value ``"auto"``.

Example:

.. code-block:: properties

   link.cluster.id="auto"

**WARNING:** when you deploy different clusters on different LANs, you must
check that those LANs don't have the same broadcast address. If they do, you
must not use the ``"auto"`` value for cluster ids. Similarly, if you want to
test the deployment of different clusters in the same LAN, you must not use
the ``"auto"`` value for cluster ids.

.. _load_balancing_plugins:

**********************
Load balancing plugins
**********************

Many different policies could be applied to balance the data flows across
the service replicas of a cluster. Some policies are more appropriate for
some use cases, and other policies for others. For this reason, the load
balancing decisions are made by plugins.

The ``link.loadBalancing.pluginClass`` property can be used to configure
which plugin implementation should be used by the Vortex Link service. This
property must be set to the fully qualified class name of the plugin
implementation to use. All services of a same cluster must be configured to
use the same plugin implementation.

Example:

.. code-block:: properties

   link.loadBalancing.pluginClass=vortex.lb.plugins.PerParticipantHashPlugin

Vortex Link services are delivered with two plugin implementations:

+ `PerParticipantHashPlugin`_
+ `PerWriterHashPlugin`_

By default, Vortex Link services use the `PerParticipantHashPlugin`_.

PerParticipantHashPlugin
========================

**Fully qualified name:** ``vortex.lb.plugins.PerParticipantHashPlugin``

This is the default plugin. It ensures that all data coming from the same
user Participant (thus typically the same application) is routed by the same
replica in the cluster. The election of the replica that will route all the
data flow from a Participant is made by a Rendezvous Hashing algorithm,
based on the Participant identifier (GUID).

Benefits and drawbacks of this plugin:

+ When user applications use the TCP transport, this implementation induces
  fewer TCP connections than the PerWriterHashPlugin, and therefore less
  resource consumption on both the application and service sides.

+ This implementation offers a less even balancing than the
  PerWriterHashPlugin.

.. figure:: ./images/PerPartPlugin.png
   :alt: PerParticipantHashPlugin
   :width: 30em
   :align: center

   **PerParticipantHashPlugin**

PerWriterHashPlugin
===================

**Fully qualified name:** ``vortex.lb.plugins.PerWriterHashPlugin``

This plugin ensures that all data coming from the same user DataWriter is
routed by the same replica. The election of the replica that will route all
the data flow from a DataWriter is made by a Rendezvous Hashing algorithm,
based on the DataWriter identifier (GUID).

Benefits and drawbacks of this plugin:

+ This implementation offers a more even balancing than the
  PerParticipantHashPlugin.

+ When user applications use the TCP transport, this implementation may
  induce more TCP connections than the PerParticipantHashPlugin, and
  therefore more resource consumption on both the application and service
  sides.

.. figure:: ./images/PerWriterPlugin.png
   :alt: PerWriterHashPlugin
   :width: 27em
   :align: center

   **PerWriterHashPlugin**
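As a sketch, a cluster that should balance its traffic per DataWriter could
have each of its replicas configured as follows (the cluster id value is
purely illustrative; the plugin class name is the one listed above):

.. code-block:: properties

   # Same cluster id and same plugin implementation on every replica
   link.cluster.id="PublicLink"
   link.loadBalancing.pluginClass=vortex.lb.plugins.PerWriterHashPlugin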
***************
Fault tolerance
***************

When a Vortex Link service fails in a cluster (it crashes, is terminated by
an administrator, ...), all the data flows that were routed by this service
will be rebalanced to the other services of the cluster. This switchover
will take some time (see `Optimizing fault detection time`_). During this
time:

+ On best-effort communications, all samples emitted during the switchover
  will be lost and so not received by the concerned DataReaders.

+ On reliable communications, no sample will be lost and the concerned
  DataReaders will receive all samples after the switchover. The DataWriter
  may block in its write operation until the switchover is finished
  (depending on whether its reliability queue becomes full).

Peers configuration for services
================================

As described in :ref:`How to deploy and configure Vortex Link?`: when
services use unicast communications to communicate with each other, they
need to be configured with initial peers.

When using fault tolerance, we want the system to keep working even if some
services fail. So we need to make sure that:

+ The services can connect to the remote clusters even if some of the
  services of those clusters have failed.

+ Whenever a service is connected to one of the replicas of a remote cluster
  and this replica fails, the service connects to another replica of the
  remote cluster.

For this purpose, we need to configure the services with all the locators of
the replicas (or at least part of the replicas) of the cluster they should
connect to, so that if some of the replicas fail, the service can still
connect to one of them. As we don't necessarily want the service to connect
to all of the replicas of the remote cluster at the same time, we can use
locator groups (see :ref:`Locators group`).

Example of connection of Vortex Link to a group of 2 replicas on different
hosts:

.. code-block:: properties

   -Dlink.tcp.peers=[65.65.65.65:7400,80.80.80.80:7400]

Peers configuration for applications
====================================

As described in :ref:`How to deploy and configure Vortex Link?`: when
applications use unicast communications to communicate with Vortex Link
services, they need to be configured with initial peers.

When using fault tolerance, we want the system to keep working even if some
services fail. So we need to make sure that:

+ The application can connect to a cluster even if some of the services of
  the cluster have failed.

+ Whenever an application is connected to one of the replicas of a cluster
  and this replica fails, the application connects to another replica of the
  cluster.

For this purpose, we need to configure the applications with all the
locators of the replicas (or at least part of the replicas) of the cluster
they should connect to, so that if some of the replicas fail, the
application can still connect to one of them. As we don't necessarily want
the application to connect to all of the replicas of the cluster at the same
time, we can use locator groups (see :ref:`Locators group`).

Example with Vortex Café connecting to a group of 2 replicas on different
hosts:

.. code-block:: properties

   -Dddsi.discovery.tcp.peers=[65.65.65.65:7400,80.80.80.80:7400]

Optimizing fault detection time
===============================

During a switchover, what usually takes the most time is the fault
detection, i.e. the time the other services of the cluster take to detect
that one service in the cluster has failed.

This fault detection time can be minimized by configuring the Vortex Link
services lease duration with the help of the following properties:

+ ``ddsi.discovery.participant.leaseDuration`` (time in seconds, default 10
  seconds)

  This property configures the amount of time other services should wait,
  when they do not receive any message from a given service, before
  considering that service as dead. The smaller this duration, the faster
  the switchover will be.

+ ``ddsi.discovery.participant.advertisePeriod`` (time in seconds, default
  2.5 seconds)

  This property configures the period used by services to advertise
  themselves to other services or applications. It must be set to a duration
  at least 2 to 3 times shorter than the duration configured in
  ``ddsi.discovery.participant.leaseDuration``.

For more details about those configuration properties, please refer to the
Vortex Café documentation.
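As an illustrative sketch only, a cluster where a failed replica should be
detected within about 2 seconds instead of the default 10 could configure
every one of its Vortex Link services with values such as:

.. code-block:: properties

   # Consider a silent service dead after 2 seconds (default: 10)
   ddsi.discovery.participant.leaseDuration=2
   # Keep the advertise period at least 2 to 3 times shorter than the
   # lease duration (default: 2.5)
   ddsi.discovery.participant.advertisePeriod=0.5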
Fault tolerance configuration examples
======================================

Indirect LAN to LAN
-------------------

.. figure:: ./images/LB_FT_Example_IndirectLan2Lan.png
   :alt: Indirect LAN to LAN with redundancy
   :width: 45em
   :height: 30em
   :align: center

**Vortex Link service A**:

.. code-block:: properties

   link.cluster.id="Link"
   link.network.interface=eth0
   link.externalNetworkAddresses=70.70.70.1
   link.serviceLevel=20

**Vortex Link service B**:

.. code-block:: properties

   link.cluster.id="Link"
   link.network.interface=eth0
   link.externalNetworkAddresses=70.70.70.2
   link.serviceLevel=20

**Vortex Link services 1A & 1B**:

.. code-block:: properties

   link.cluster.id="LAN1"
   link.network.interface=eth0
   link.tcp.peers=[70.70.70.1:7400,70.70.70.2:7400]
   link.externalNetworkAddresses=none

**Vortex Link services 2A & 2B**:

.. code-block:: properties

   link.cluster.id="LAN2"
   link.network.interface=eth0
   link.tcp.peers=[70.70.70.1:7400,70.70.70.2:7400]
   link.externalNetworkAddresses=none

**App 1 & 2**: *(nothing to configure)*

**App 3**: *(nothing to configure)*

**App 4** (using Vortex Café):

.. code-block:: properties

   ddsi.network.udp.enabled=false
   ddsi.discovery.tcp.peers=[70.70.70.1:7400,70.70.70.2:7400]
   ddsi.discovery.tcp.port=7400
   ddsi.discovery.externalNetworkAddresses=none