declearn 2.4.0
Released: 18/03/2024
These release notes can also be found on our website and on the source GitLab repo.
Important notice:
DecLearn 2.4 derogates to SemVer by revising some of the major DecLearn component APIs.
This is mitigated in two ways:
- No changes relate to the main process setup code, meaning that end-users that do not use custom components (aggregator, optimodule, metric, etc.) should not see any difference, and their code will work as before (as an illustration, our examples' code remains unchanged).
- Key methods that were deprecated in favor of new API ones are kept for two more minor versions, and are still tested to work as before.
Any end-user encountering issues due to the released or planned evolution of DecLearn is invited to contact us via GitLab, GitHub or e-mail so that we can provide with assistance and/or update our roadmap so that changes do not hinder the usability of DecLearn 2.x for research and applications.
New version policy (and future roadmap)
As noted above, v2.4 does not fully abide by SemVer rules. In the future, more partially-breaking changes and API revisions may be introduced, incrementally paving the way towards the next major release, while trying as much as possible not to break end-user code.
To avoid unforeseen incompatibilities and cryptic bugs from arsing, from this version onward, the server and clients are expected and verified to use the same major.minor
version of DecLearn. This policy may be updated in the future, e.g. to specify that clients may have a newer minor version than the server (and most probably not the other way around).
To avoid unhappy surprises, we are starting to maintain a public roadmap on our GitLab. Although it may change, it should provide interested users (notably those that are interested in developing custom components or processes on top of DecLearn) with a way to anticipate changes, and voice any concerns or advice they might have.
Revise all aggregation APIs
Revise the overall design for aggregation and introduce Aggregate
API
This release introduces the Aggregate
API, which is based on an abstract base dataclass acting as a template for data structures that require sharing across peers and aggregation.
The declearn.utils.Aggregate
ABC acts as a shared ancestor providing with a base API and shared backend code to define data structures that:
- are serializable to and deserializable from JSON, and may therefore be preserved across network communications
- are aggregatable into an instance of the same structure
- use summation as the default aggregation rule for fields, which is overridable by redefining the
default_aggregate
method - can implement custom
aggregate_<field.name>
methods to override the default summation rule - implement a
prepare_for_secagg
method that- enables defining which fields merely require sum-aggregation and need encryption when using SecAgg, and which fields are to be preserved in cleartext (and therefore go through the usual default or custom aggregation methods)
- can be made to raise a
NotImplementedError
when SecAgg cannot be achieved on a data structure
This new ABC currently has three main children:
AuxVar
: replaces plain dict forOptimizer
auxiliary variablesMetricState
: replaces plain dict forMetric
intermediate statesModelUpdates
: replaces sharing of updates asVector
andn_steps
Each of this is defined jointly with another (pre-existing, revised) API for components that (a) produce Aggregate
data structures based on some input data and/or computations; (b) produce some output results based on a received Aggregate
structure, meant to result from the aggregation of multiple peers' produced data.
Revise Aggregator
API, introducing ModelUpdates
The Aggregator
API was revised to make use of the new ModelUpdates
data structure (inheriting Aggregate
).
Aggregator.prepare_for_sharing
pre-processes an inputVector
containing raw model updates and an integer indicating the number of local SGD steps into aModelUpdates
structure.Aggregator.finalize_updates
receives aModelUpdates
resulting from the aggregation of peers' instances, and performs final computations to produce aVector
of aggregated model updates.- The legacy
Aggregator.aggregate
method is deprecated (but still works).
Revise auxiliary variables for Optimizer
, introducing AuxVar
The OptiModule
API (and, consequently, Optimizer
) was revised as to the design and signature of auxiliary variables related methods, to make use of the new AuxVar
data structure (inheriting Aggregate
).
OptiModule.collect_aux_var
now emits eitherNone
or anAuxVar
instance (the precise type of which is module-dependent), instead of a mere dict.OptiModule.process_aux_var
now expects a proper-typeAuxVar
instance that already aggregates clients' data, externalizing the aggregation rules to theAuxVar
subtypes, while keeping the finalization logic part of theOptiModule
subclasses.Optimizer.collect_aux_var
therefore emits a{name: aux_var}
dict.Optimizer.process_aux_var
therefore expects a{name: aux_var}
dict, rather than having distinct signatures on the client and server sides.- It is now expected that server-side components will send the same data to all clients, rather than allow sending client-wise values.
The backend code of ScaffoldClientModule
and ScaffoldServerModule
was heavily revised to alter the distribution of information and computations:
- Client-side modules are now the sole owners of their local state, and send sum-aggregatable updates to the server, that are therefore SecAgg-compatible.
- The server consequently shares the same information with all clients, namely the current global state.
- To keep track of the (possibly growing with time) number of unique client, clients generate a random uuid that is sent with their state updates and preserved in cleartext when SecAgg is used.
- As a consequence, the server component knows which clients contributed to a given round, but receives an aggregate of local updates rather than the client-wise state values.
Revise Metric
API, introducing ModelState
The Metric
API was revised to make use of the new MetricState
data structure (inheriting Aggregate
).
Metric.build_initial_states
generates a "zero-state"MetricState
instance (it replaces the previously-private_build_states
method that returned a dict).Metric.get_states
returns a (Metric-type-dependent)MetricState
instance, instead of a mere dict.Metric.set_states
assigns an incomingMetricState
into the instance, that may be finalized into results using the unchangedget_result
method.- The legacy
Metric.agg_states
is deprecated, in favor ofset_states
(but it still works).
Revise backend communications and messaging APIs
This release introduces some important backend changes to the communication and messaging APIs of DecLearn, resulting in more robust code (that is also easier to test and maintain), more efficient message parsing (possibly-costly de-serialization is now delayed to a time posterior to validity verification) and the extensibility of application messages, enabling to easily define and use custom message structures in downstream applications.
The most important API change is that network communication endpoints now return SerializedMessage
instances rather than Message
ones.
New declearn.communication.api.backend
submodule
- Introduce a new
ActionMessage
minimal API under itsactions
submodule, that defines hard-coded, lightweight and easy-to-parse data structures designed to convey information and content across network communications agnostic to the content's nature. - Revise and expose the
MessagesHandler
util, that now builds on theActionMessage
API to model remote calls and answer them. - Move the
declearn.communication.messaging.flags
submodule todeclearn.communication.api.backend.flags
.
New declearn.messaging
submodule
- Revise the
Message
API to make it extendable, with automated type-registration of subclasses by default. - Introduce
SerializedMessage
as a wrapper for received messages, that parses the exactMessage
subtype (enabling logic tests and message filtering) but delays actual content de-serialization andMessage
object recovery (enabling to prevent undue resources use for unwanted messages that end up being discarded). - Move most existing
Message
subclasses to the new submodule, for retro-compatibility purposes. In DecLearn 3.0 these will probably be re-dispatched to make it clear that concrete messages only make sense in the context of specific multi-agent processes. - Drop backend-oriented
Message
subclasses that are replaced with the newActionMessage
backbone structures. - Deprecate the
declearn.communication.messaging
submodule, that is temporarily maintained, re-exporting moved contents as well as deprecated message types (which are bound to be rejected if sent).
Revise NetworkClient
and NetworkServer
- Have message-receiving methods return
SerializedMessage
instances rather than finalized de-serializedMessage
ones. - Quit sending and expecting 'data_info' with registration requests.
- Rename
NetworkClient.check_message
intorecv_message
(keep the former as an alias, with aDeprecationWarning
). - Improve the use of (optional) timeouts when sending or expecting messages and overall exceptions handling:
NetworkClient.recv_message
may either raise aTimeoutError
(in case of timeout) orRuntimeError
(in case of rejection).NeworkServer.send_messages
andbroadcast_message
quietly stop waiting for clients to collect messages after the (opt.) timeout delay has passed. Messages may still be collected.NetworkServer.wait_for_messages
no longer accepts a timeout.NetworkServer.wait_for_messages_with_timeout
implements the possibility to setup a timeout. It returns both received client replies and a list of clients that failed to answer.- All timeouts can now be specified as float values (which is mostly useful for testing purposes or simulated environments).
- Add a
heartbeat
instantiation parameter, with a default value of 1 second, that is passed to the underlyingMessagesHandler
. In simulated contexts (including tests), setting a low heartbeat can cut runtime down significantly.
New declearn.communication.utils
submodule
Introduce the declearn.communication.utils
submodule, and move existing declearn.communication
utils to it. Keep re-exporting them from the parent module to preserve code compatibility.
Add verify_client_messages_validity
and verify_server_message_validity
as part of the new submodule, that refactor some backend code from orchestration classes related to the filtering and type-checking of exchanged messages.
Usability updates
A few minor changes were made in hope that they can improve DecLearn usability for end-users.
Record and save training losses
The Model
API was updated so that Model.compute_batch_gradients
now records the computed batch-averaged model loss as a float value in an internal buffer, and the newly-introduced Model.collect_training_losses
method enables getting all stored values (and purging the buffer on the way).
The FederatedClient
was consequently updated to collect and export training loss values at the end of each and every training round when a Checkpointer
is attached to it (otherwise, values are purged from memory but not recorded to disk).
Add a verbosity option for FederatedClient
and TrainingManager
.
FederatedClient
and TrainingManager
now both accept a verbose: bool=True
instantiation keyword argument, that changes:
- (a) the default logger verbosity level: if
logger=None
is also passed, the default logger will have 'info'-level verbosity ifverbose
anddeclearn.utils.LOGGING_LEVEL_MAJOR
-level ifnot verbose
, so that only evaluation metrics and errors are logged. - (b) the optional display of a progressbar when conducting training or evaluation rounds; if
not verbose
, no progressbar is used.
Add 'TrainingManager.(train|evaluate)_under_constraints' to the API.
These public methods enable running training or evaluation rounds without relying on Message
structures to specify parameters nor collect results.
Modularize 'run_as_processes' input specs.
The declearn.utils.run_as_processes
util was modularized to that routines can be specified in various ways. Previously, they could only be passed as (func, args)
tuples. Now, they can either be passed as (func, args)
, (func, kwargs)
or (func, args, kwargs)
, where args
is still a tuple of positional arguments, and kwargs
is a dict of keyword ones.
Other changes
Fix redundant sharing of model weights with clients
FederatedServer
now keeps track of clients having received the latest global model weights, and avoids sending them redundantly with future training (or evaluation) requests. To achieve this, TrainRequest
and EvaluationRequest
now support setting their weights
field to None
.
Update TensorFlow supported versions
Older TensorFlow versions (v2.5 to 2.10 included) were improperly marked as supported in spite of TensorflowOptiModule
requiring at least version 2.11 to work (due to changes of the Keras Optimizer API). This has been corrected.
The latest TensorFlow version (v2.16) introduces backward-breaking changes, due to the backend swap from Keras 2 to Keras 3. Our backend code was updated to both add support for this newer Keras backend, and preserve existing support.
Note that at the moment, the CI does not support TensorFlow above 2.13, due to newer versions not being compatible with Python 3.8. As such, our code will be tested to remain backward-compatible. Forward compatibility has been (and will keep being) tested locally with a newer Python version.
Deprecate declearn.dataset.load_from_json
As the save_to_json
and from_to_json
methods were removed from the declearn.dataset.Dataset
API in DecLearn 2.3.0, there is no longer a guarantee that this function works (save with InMemoryDataset
).
As a consequence, this function should have been deprecated, and has now been documented as such planned for removal in DecLearn 2.6 and/or 3.0.