Releases: tony-framework/TonY

Release TonY 0.4.9

13 Oct 07:20
7622a6b

What's Changed

  • Close all resources when the client is killed by @zuston in #600
  • Task failure handling mechanism: missed-heartbeat-failure is consistent with other failures by @zuston in #607
  • Pass secret keys from AM to containers to support Hadoop encryption by @helloworld1 in #605
  • Make untrackedTaskFailed volatile by @zuston in #608
  • Release TonY v0.4.9 by @plliao in #609

Full Changelog: v0.4.8...v0.4.9
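
The "Make untrackedTaskFailed volatile" change (#608) relies on a standard Java memory-model guarantee: a write to a volatile field by one thread is visible to later reads by other threads. The following is a minimal sketch of that guarantee, not TonY's code. The class name, the methods, and the monitor thread are hypothetical; only the field name comes from the release note.

```java
public class VolatileFlagSketch {
    // volatile: a write by one thread is guaranteed to be visible to
    // subsequent reads of this field by other threads.
    private volatile boolean untrackedTaskFailed = false;

    public void markFailed() {
        untrackedTaskFailed = true;
    }

    public boolean hasFailed() {
        return untrackedTaskFailed;
    }

    /**
     * Starts a hypothetical monitor thread that spins until the flag flips,
     * sets the flag from the calling thread, and reports whether the monitor
     * observed the change within the given timeout.
     */
    public static boolean monitorSeesFailure(long timeoutMs) {
        VolatileFlagSketch state = new VolatileFlagSketch();
        Thread monitor = new Thread(() -> {
            while (!state.hasFailed()) {
                Thread.yield();
            }
        });
        monitor.setDaemon(true); // never keep the JVM alive for this demo thread
        monitor.start();
        state.markFailed();
        try {
            monitor.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !monitor.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("monitor observed failure: " + monitorSeesFailure(2000));
    }
}
```

Without volatile, the JVM is allowed to keep the monitor thread reading a stale value indefinitely, which is the class of bug this kind of change typically guards against.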

Release TonY 0.4.8

29 Sep 16:46
d0f3176

Changes in this release:
TonyClient to create FileSystem from Path to support fully qualified HDFS path (#598)
Prevent loss of root cause due to resetting the final state (#599)
Rename tony.worker.timeout to tony.task.executor.execution-timeout-ms (#596)
Set job failed when runtime is not healthy (#597)
Speed up CI test when AM crashed (#594)
Fix invalid checkstyle suppressions on Windows (#590)
Remove task from heart beat monitor when container finished (#588)
Refactor tensorflow related class to tony (#583)
Remove tensorflow package (#582)
Catch unknown exception when retrieving task metrics (#581)
Make task executor's heart beat max failed number consistent with AM max-missed-heartbeats conf (#580)
Update TonY to include CII Best Practices badge (#576)
Introduce container allocation timeout (#575)
Clean up local tmp files in client (#574)
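
Because #596 renames a configuration key, job configurations that still set tony.worker.timeout may need updating. The sketch below shows one way to migrate the key in a java.util.Properties object. It is an illustration only: the class and method are hypothetical, the release notes do not say whether TonY still honors the old key, and the sketch assumes the value carries over unchanged.

```java
import java.util.Properties;

public class ConfKeyMigrationSketch {
    // Key names taken from the 0.4.8 release note (#596).
    public static final String OLD_KEY = "tony.worker.timeout";
    public static final String NEW_KEY = "tony.task.executor.execution-timeout-ms";

    /**
     * Returns a copy of conf in which the old key is removed and its value
     * is moved to the new key, unless the new key is already set.
     * Assumption: the value needs no unit conversion.
     */
    public static Properties migrate(Properties conf) {
        Properties out = new Properties();
        out.putAll(conf);
        String oldValue = out.getProperty(OLD_KEY);
        out.remove(OLD_KEY);
        if (oldValue != null && !out.containsKey(NEW_KEY)) {
            out.setProperty(NEW_KEY, oldValue);
        }
        return out;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(OLD_KEY, "60000");
        Properties migrated = migrate(conf);
        System.out.println(NEW_KEY + " = " + migrated.getProperty(NEW_KEY));
    }
}
```

An explicitly set new key takes precedence over the old one, so a configuration that already uses the new name is left untouched apart from dropping the stale entry.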

Release TonY 0.4.7

16 Jul 19:29
b1cb617

Change list:

TonyClient ignores connection errors when sending the AM stop signal, to prevent app failure (#522)
Remove AMRM credentials on task executor (#527)
Introduce generic interface to support multiple frameworks (#529)
Introduce new registerCallbackInfo rpc endpoint (#530)
Introduce standalone runtime type (#533)
Support Horovod (#524)
[Runtime] Make job fast fail when conf is illegal (#535)
[Horovod-Runtime] Introduce custom Horovod driver script in debug mode (#540)
Re-enable tensorboard port reuse (#541)
Prevent the running containers from stopping when AM crash (#549)
Support sidecar tensorboard (#546)
Update jquery version to 3.5.0 (#556)
[Horovod] Using user-defined python exec path to start built-in Horovod driver (#555)
Make TonyTask implement the Comparable interface (#551)
Allow specifying a sidecar job type whose task failures are ignored (#558)
Specify sidecar tensorboard with sidecar job type (#561)
Add estimator implementation for MNIST (#560)
Allow specifying extra startup options for the sidecar tensorboard (#564)
Separate the AM and TaskExecutor runtime interfaces (#562)
Throw an exception when the GPU resource is not found on the cluster (#565)
Introduce pluggable runtime provider (#566)
Compatible with Hadoop 2.6.0-cdh5.11.0 (#571)

Release TonY 0.4.6, including a number of enhancements

29 Mar 06:42
4c21527

Change List:
403182d Make TonY client log layout more organized (#474)
8470968 [MINOR] Support specifying a timeout for the AM waiting for the client's stop signal (#518)
a0e39ea [MINOR] Ignore connection errors when updating task info to prevent app failure (#517)
b3f96f1 When registrationTimeoutMS is below 0, AM will wait forever (#520)
f000db4 Fast fail when container launch failed and not in stop.on.failure.jobtypes (#516)
346b086 [MINOR] Setting diagnostic msg to Yarn (#519)
06aef04 Reserve evaluator host spec in TF_CONFIG cluster, only when in evaluator process (#515)
c7407db Evaluator should be standalone with training cluster in TF (#512)
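
The registrationTimeoutMS rule from #520 (a value below 0 means the AM waits forever) can be expressed as a small predicate. This is a sketch under stated assumptions rather than TonY's implementation: the class and method names are hypothetical, and treating a timeout of exactly 0 as "expire immediately" is an assumption the release note does not cover.

```java
public class RegistrationTimeoutSketch {
    /**
     * Returns true when the wait for task registration should be abandoned.
     * A negative timeout disables the check, so the caller waits forever.
     */
    public static boolean hasTimedOut(long startMs, long nowMs, long registrationTimeoutMs) {
        if (registrationTimeoutMs < 0) {
            return false; // below 0: never time out
        }
        return nowMs - startMs >= registrationTimeoutMs;
    }

    public static void main(String[] args) {
        long start = 0L;
        // With a negative timeout the predicate stays false however long we wait.
        System.out.println(hasTimedOut(start, 1_000_000L, -1L));
        // With a 5-second timeout, 6 seconds of waiting is too long.
        System.out.println(hasTimedOut(start, 6_000L, 5_000L));
    }
}
```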

Fix: TonY requests YARN config "yarn.io/gpu" on non-GPU clusters

24 Feb 19:13
5e679c1

Release v0.4.5 fixed issue #500 and addressed issue #450

Fault tolerance to missing resource paths

02 Feb 19:36
6b097b4

Proceed without failing if container resource paths do not exist

If some container resource paths do not exist, TonY skips them and continues with the resource paths that are available instead of failing.
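
The skip-and-continue behavior can be sketched as a filter over candidate paths. The example below is illustrative only: it checks the local filesystem with java.nio, whereas the release notes do not say which filesystem API TonY uses for this, and the class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResourcePathSketch {
    /**
     * Returns only the candidate paths that exist. Missing paths are
     * reported on stderr and skipped instead of aborting the whole run.
     */
    public static List<Path> existingPaths(List<String> candidates) {
        List<Path> found = new ArrayList<>();
        for (String candidate : candidates) {
            Path path = Paths.get(candidate);
            if (Files.exists(path)) {
                found.add(path);
            } else {
                System.err.println("Skipping missing resource path: " + path);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList(
            System.getProperty("java.io.tmpdir"),   // normally exists
            "hypothetical/missing/resource.zip");   // assumed not to exist
        System.out.println("usable paths: " + existingPaths(candidates));
    }
}
```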

Bug fixes

14 Dec 22:40
6a79e50

Bug fix: AM Retry prints - “Task was null! Nothing to schedule”

12 Nov 21:14
4b4d3bd

Fixed AM Retry printing “Task was null! Nothing to schedule”, which was caused by accumulated container requests.

Bump up hadoop version to 2.10.0

11 Nov 23:03
a45ff8c

Bump up hadoop version to 2.10.0

Bug fix: Exception in AM thread that causes TonY to hang

05 Nov 00:28
c059623

Handle the "NoMethodError" exception in the AM thread caused by an incompatible Avro version.