Merge pull request #41165 from kubernetes/dev-1.28

Official 1.28 Release Docs
kubernetes · Aug 15, 2023 · 0180cc4
2 parents beb7d1c + e05d2b7
commit 0180cc4

Showing 86 changed files with 10,270 additions and 24,721 deletions.
diff --git a/api-ref-assets/api/swagger.json b/api-ref-assets/api/swagger.json
diff --git a/api-ref-assets/config/fields.yaml b/api-ref-assets/config/fields.yaml
@@ -88,6 +88,7 @@
   - fields:
     - nominatedNodeName
     - hostIP
+    - hostIPs
     - startTime
     - phase
     - message
@@ -99,6 +100,7 @@
     - initContainerStatuses
     - containerStatuses
     - ephemeralContainerStatuses
+    - resourceClaimStatuses
     - resize
 
 - definition: io.k8s.api.core.v1.Container
@@ -137,6 +139,7 @@
     - livenessProbe
     - readinessProbe
     - startupProbe
+    - restartPolicy
   - name: Security Context
     fields:
     - securityContext
@@ -228,6 +231,7 @@
     fields:
     - terminationMessagePath
     - terminationMessagePolicy
+    - restartPolicy
   - name: Debugging
     fields:
     - stdin
@@ -393,9 +397,14 @@
     fields:
     - selector
     - manualSelector
-  - name: Alpha level
+  - name: Beta level
     fields:
     - podFailurePolicy
+  - name: Alpha level
+    fields:
+    - backoffLimitPerIndex
+    - maxFailedIndexes
+    - podReplacementPolicy
 
 - definition: io.k8s.api.batch.v1.JobStatus
   field_categories:
@@ -411,6 +420,10 @@
   - name: Beta level
     fields:
     - ready
+  - name: Alpha level
+    fields:
+    - failedIndexes
+    - terminating
 
 - definition: io.k8s.api.batch.v1.CronJobSpec
   field_categories:

diff --git a/api-ref-assets/config/toc.yaml b/api-ref-assets/config/toc.yaml
@@ -153,7 +153,7 @@ parts:
     version: v1alpha1
   - name: SelfSubjectReview
     group: authentication.k8s.io
-    version: v1beta1
+    version: v1
 - name: Authorization Resources
   chapters:
   - name: LocalSubjectAccessReview
@@ -168,9 +168,6 @@ parts:
   - name: SubjectAccessReview
     group: authorization.k8s.io
     version: v1
-  - name: SelfSubjectReview
-    group: authentication.k8s.io
-    version: v1alpha1
   - name: ClusterRole
     group: rbac.authorization.k8s.io
     version: v1
@@ -218,7 +215,7 @@ parts:
     version: v1
   - name: ValidatingAdmissionPolicy
     group: admissionregistration.k8s.io
-    version: v1alpha1
+    version: v1beta1
     otherDefinitions:
     - ValidatingAdmissionPolicyList
     - ValidatingAdmissionPolicyBinding

diff --git a/content/en/blog/_posts/2022-09-14-pod-has-network-condition.md b/content/en/blog/_posts/2022-09-14-pod-has-network-condition.md
@@ -23,6 +23,13 @@ per container characteristics like image size or payload) can utilize the
 the `PodHasNetwork` condition to optimize the set of actions performed when pods
 repeatedly fail to come up.
 
+### Updates for Kubernetes 1.28
+
+The `PodHasNetwork` condition has been renamed to `PodReadyToStartContainers`.
+Alongside that change, the feature gate `PodHasNetworkCondition` has been replaced by
+`PodReadyToStartContainersCondition`. You need to set `PodReadyToStartContainersCondition`
+to true in order to use the new feature in v1.28.0 and later.
+
 ### How is this different from the existing Initialized condition reported for pods?
 
 The kubelet sets the status of the existing `Initialized` condition reported in

diff --git a/content/en/docs/concepts/architecture/mixed-version-proxy.md b/content/en/docs/concepts/architecture/mixed-version-proxy.md
@@ -0,0 +1,84 @@
+---
+reviewers:
+- jpbetz
+title: Mixed Version Proxy
+content_type: concept
+weight: 220
+---
+
+<!-- overview -->
+{{< feature-state state="alpha"  for_k8s_version="v1.28" >}}
+
+Kubernetes {{< skew currentVersion >}} includes an alpha feature that lets an
+{{< glossary_tooltip text="API Server" term_id="kube-apiserver" >}}
+proxy resource requests to other _peer_ API servers. This is useful when there are multiple
+API servers running different versions of Kubernetes in one cluster (for example, during a long-lived
+rollout to a new release of Kubernetes).
+
+This enables cluster administrators to configure highly available clusters that can be upgraded
+more safely, by directing resource requests (made during the upgrade) to the correct kube-apiserver.
+That proxying prevents users from seeing unexpected 404 Not Found errors that stem
+from the upgrade process.
+
+This mechanism is called the _Mixed Version Proxy_. 
+
+## Enabling the Mixed Version Proxy
+Ensure that the `UnknownVersionInteroperabilityProxy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
+is enabled when you start the {{< glossary_tooltip text="API Server" term_id="kube-apiserver" >}}:
+
+```shell
+# Placeholders in <angle brackets> must be replaced with real values.
+# Required: the feature gate, --peer-ca-file, the --proxy-client-* pair,
+# and the --requestheader-* flags. --requestheader-allowed-names can be set
+# to blank to allow any Common Name.
+# Optional: --peer-advertise-ip and --peer-advertise-port.
+kube-apiserver \
+  --feature-gates=UnknownVersionInteroperabilityProxy=true \
+  --peer-ca-file=<path to kube-apiserver CA cert> \
+  --proxy-client-cert-file=<path to aggregator proxy cert> \
+  --proxy-client-key-file=<path to aggregator proxy key> \
+  --requestheader-client-ca-file=<path to aggregator CA cert> \
+  --requestheader-allowed-names=<valid Common Names to verify proxy client cert against> \
+  --peer-advertise-ip=<IP of this kube-apiserver that should be used by peers to proxy requests> \
+  --peer-advertise-port=<port of this kube-apiserver that should be used by peers to proxy requests>
+# …and other flags as usual
+```
+
+### Proxy transport and authentication between API servers {#transport-and-authn}
+
+* The source kube-apiserver reuses the [existing APIserver client authentication flags](https://kubernetes.io/docs/tasks/extend-kubernetes/configure-aggregation-layer/#kubernetes-apiserver-client-authentication) `--proxy-client-cert-file` and `--proxy-client-key-file` to present its identity that will be verified by its peer (the destination kube-apiserver). The destination API server verifies that peer connection based on the configuration you specify using the `--requestheader-client-ca-file` command line argument. 
+
+* To authenticate the destination server's serving certs, you must configure a certificate authority bundle by specifying the `--peer-ca-file` command line argument to the **source** API server.
+
+### Configuration for peer API server connectivity
+
+To set the network location of a kube-apiserver that peers will use to proxy requests, use the 
+`--peer-advertise-ip` and `--peer-advertise-port` command line arguments to kube-apiserver or specify 
+these fields in the API server configuration file.
+If these flags are unspecified, peers will use the value from either `--advertise-address` or
+`--bind-address` command line argument to the kube-apiserver. If those, too, are unset, the host's default interface is used.
+
+## Mixed version proxying
+
+When you enable mixed version proxying, the [aggregation layer](/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/) 
+loads a special filter that does the following:
+
+* When a resource request reaches an API server that cannot serve that API (either because it is at a version pre-dating the introduction of the API or the API is turned off on the API server), the API server attempts to send the request to a peer API server that can serve the requested API. It does so by identifying API groups / versions / resources that the local server doesn't recognise, and trying to proxy those requests to a peer API server that is capable of handling the request.
+* If the peer API server fails to respond, the _source_ API server responds with a 503 ("Service Unavailable") error.
+
+### How it works under the hood
+
+When an API Server receives a resource request, it first checks which API servers can serve the requested resource. This check happens using the internal [`StorageVersion` API].
+
+* If the resource is known to the API server that received the request (ex: `GET /api/v1/pods/some-pod`), the request is handled locally. 
+
+* If there is no internal `StorageVersion` object found for the requested resource (ex: `GET /my-api/v1/my-resource`) and the configured APIService specifies proxying to an extension API server, that proxying happens following the usual
+[flow](/docs/tasks/extend-kubernetes/configure-aggregation-layer/) for
+extension APIs.
+
+* If a valid internal `StorageVersion` object is found for the requested resource (ex: `GET /batch/v1/jobs`) and the API server trying to handle the request (the _handling API server_) has the `batch` API disabled, then the _handling API server_ fetches the peer API servers that do serve the relevant API group / version / resource (`api/v1/batch` in this case) using the information in the fetched `StorageVersion` object. The _handling API server_ then proxies the request to one of the matching peer kube-apiservers that are aware of the requested resource.
+  * If there is no peer known for that API group / version / resource, the handling API server passes the request to its own handler chain, which should eventually return a 404 ("Not Found") response.
+  * If the handling API server has identified and selected a peer API server, but that peer fails
+    to respond (for reasons such as network connectivity issues, or a data race between the request
+    being received and a controller registering the peer's info into the control plane), then the handling
+    API server responds with a 503 (“Service Unavailable”) error.
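
The decision flow described in the new "How it works under the hood" section can be sketched roughly as follows. This is an illustrative model only, not kube-apiserver code: the names `route_request`, `Decision`, `served_locally`, `storage_versions`, and `peer_is_reachable` are hypothetical, the `StorageVersion` lookup is modeled as a plain dictionary, and picking the first matching peer is an assumption (the text only says "one of the matching peer kube-apiservers").

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Decision:
    """Outcome of routing one request (hypothetical type for illustration)."""
    action: str                   # "local", "aggregation", "proxy", "not_found", "unavailable"
    status: Optional[int] = None  # HTTP status, when the text names one
    peer: Optional[str] = None    # chosen peer, when the request is proxied


def route_request(
    resource: str,
    served_locally: Set[str],
    storage_versions: Dict[str, List[str]],
    peer_is_reachable: Callable[[str], bool],
) -> Decision:
    """Model the routing steps listed in the doc text.

    storage_versions maps a group/version/resource string to the peers that
    serve it; this stands in for the internal StorageVersion objects.
    """
    # Resource known to the receiving API server: handle locally.
    if resource in served_locally:
        return Decision("local")

    # No StorageVersion object: fall through to the usual aggregation-layer
    # flow (assuming an APIService is configured for an extension API server).
    if resource not in storage_versions:
        return Decision("aggregation")

    peers = storage_versions[resource]
    # No peer known: the local handler chain should eventually return 404.
    if not peers:
        return Decision("not_found", status=404)

    # Assumption: take the first matching peer.
    peer = peers[0]
    if not peer_is_reachable(peer):
        # Selected peer fails to respond: 503 Service Unavailable.
        return Decision("unavailable", status=503, peer=peer)
    return Decision("proxy", peer=peer)


if __name__ == "__main__":
    decision = route_request(
        "batch/v1/jobs",
        served_locally={"v1/pods"},
        storage_versions={"batch/v1/jobs": ["apiserver-b"]},
        peer_is_reachable=lambda peer: True,
    )
    print(decision.action, decision.peer)
```

The sketch keeps the 404 and 503 cases separate on purpose: per the text, a missing peer is resolved by the local handler chain, while an unresponsive peer is reported by the handling API server itself.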
diff --git a/content/en/docs/concepts/architecture/nodes.md b/content/en/docs/concepts/architecture/nodes.md
@@ -571,9 +571,9 @@ the feature is Beta and is enabled by default.
 Metrics `graceful_shutdown_start_time_seconds` and `graceful_shutdown_end_time_seconds`
 are emitted under the kubelet subsystem to monitor node shutdowns.
 
-## Non Graceful node shutdown {#non-graceful-node-shutdown}
+## Non-graceful node shutdown handling {#non-graceful-node-shutdown}
 
-{{< feature-state state="beta" for_k8s_version="v1.26" >}}
+{{< feature-state state="stable" for_k8s_version="v1.28" >}}
 
 A node shutdown action may not be detected by kubelet's Node Shutdown Manager,
 either because the command does not trigger the inhibitor locks mechanism used by
@@ -617,11 +617,7 @@ During a non-graceful shutdown, Pods are terminated in the two phases:
 
 ## Swap memory management {#swap-memory}
 
-{{< feature-state state="alpha" for_k8s_version="v1.22" >}}
-
-Prior to Kubernetes 1.22, nodes did not support the use of swap memory, and a
-kubelet would by default fail to start if swap was detected on a node. In 1.22
-onwards, swap memory support can be enabled on a per-node basis.
+{{< feature-state state="beta" for_k8s_version="v1.28" >}}
 
 To enable swap on a node, the `NodeSwap` feature gate must be enabled on
 the kubelet, and the `--fail-swap-on` command line flag or `failSwapOn`
@@ -638,29 +634,40 @@ specify how a node will use swap memory. For example,
 
 ```yaml
 memorySwap:
-  swapBehavior: LimitedSwap
+  swapBehavior: UnlimitedSwap
 ```
 
-The available configuration options for `swapBehavior` are:
-
-- `LimitedSwap`: Kubernetes workloads are limited in how much swap they can
-  use. Workloads on the node not managed by Kubernetes can still swap.
-- `UnlimitedSwap`: Kubernetes workloads can use as much swap memory as they
+- `UnlimitedSwap` (default): Kubernetes workloads can use as much swap memory as they
   request, up to the system limit.
+- `LimitedSwap`: The utilization of swap memory by Kubernetes workloads is subject to limitations. Only Pods of Burstable QoS are permitted to employ swap.
 
 If configuration for `memorySwap` is not specified and the feature gate is
 enabled, by default the kubelet will apply the same behaviour as the
-`LimitedSwap` setting.
+`UnlimitedSwap` setting.
+
+With `LimitedSwap`, Pods that do not fall under the Burstable QoS classification (i.e.
+`BestEffort`/`Guaranteed` QoS Pods) are prohibited from utilizing swap memory.
+To maintain the aforementioned security and node
+health guarantees, these Pods are not permitted to use swap memory when `LimitedSwap` is
+in effect. 
+
+Prior to detailing the calculation of the swap limit, it is necessary to define the following terms:
+* `nodeTotalMemory`: The total amount of physical memory available on the node.
+* `totalPodsSwapAvailable`: The total amount of swap memory on the node that is available for use by Pods (some swap memory may be reserved for system use).
+* `containerMemoryRequest`: The container's memory request.
+
+Swap limitation is configured as:
+`(containerMemoryRequest / nodeTotalMemory) * totalPodsSwapAvailable`.
 
-The behaviour of the `LimitedSwap` setting depends if the node is running with
-v1 or v2 of control groups (also known as "cgroups"):
+It is important to note that, for containers within Burstable QoS Pods, it is possible to
+opt-out of swap usage by specifying memory requests that are equal to memory limits.
+Containers configured in this manner will not have access to swap memory.
 
-- **cgroupsv1:** Kubernetes workloads can use any combination of memory and
-  swap, up to the pod's memory limit, if set.
-- **cgroupsv2:** Kubernetes workloads cannot use swap memory.
+Swap is supported only with **cgroup v2**; cgroup v1 is not supported.
 
 For more information, and to assist with testing and provide feedback, please
-see [KEP-2400](https://github.com/kubernetes/enhancements/issues/2400) and its
+see the blog post about [Kubernetes 1.28: NodeSwap graduates to Beta1](/blog/2023/07/18/swap-beta1-1.28-2023/),
+[KEP-2400](https://github.com/kubernetes/enhancements/issues/4128) and its
 [design proposal](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md).
 
 ## {{% heading "whatsnext" %}}
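
As a worked illustration of the `LimitedSwap` formula added in this hunk, `(containerMemoryRequest / nodeTotalMemory) * totalPodsSwapAvailable`, the following sketch computes a per-container swap limit in bytes. It is not the kubelet's implementation: the function name `swap_limit_bytes`, its parameters, and the floor rounding are assumptions made for the example. The two early returns mirror the rules stated in the text (only Burstable QoS Pods may use swap, and a container whose memory request equals its memory limit opts out).

```python
from typing import Optional

GIB = 1024 ** 3  # bytes per GiB


def swap_limit_bytes(
    container_memory_request: int,
    node_total_memory: int,
    total_pods_swap_available: int,
    qos_class: str = "Burstable",
    container_memory_limit: Optional[int] = None,
) -> int:
    """Return the swap limit for one container under LimitedSwap, in bytes.

    All sizes are in bytes. Rounding down is an assumption of this sketch.
    """
    # BestEffort and Guaranteed Pods are not permitted to use swap.
    if qos_class != "Burstable":
        return 0
    # Request equal to limit means the container opts out of swap.
    if container_memory_limit is not None and container_memory_limit == container_memory_request:
        return 0
    if node_total_memory <= 0:
        raise ValueError("node_total_memory must be positive")
    # (containerMemoryRequest / nodeTotalMemory) * totalPodsSwapAvailable,
    # computed with integer arithmetic to avoid floating-point error.
    return container_memory_request * total_pods_swap_available // node_total_memory


if __name__ == "__main__":
    # A 1 GiB request on a 16 GiB node with 4 GiB of swap available to Pods.
    limit = swap_limit_bytes(1 * GIB, 16 * GIB, 4 * GIB)
    print(limit // (1024 ** 2), "MiB")  # prints "256 MiB"
```

In this example the container requests 1/16 of node memory, so it receives 1/16 of the 4 GiB of Pod-available swap.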

diff --git a/content/en/docs/concepts/extend-kubernetes/api-extension/custom-resources.md b/content/en/docs/concepts/extend-kubernetes/api-extension/custom-resources.md
@@ -209,7 +209,7 @@ Aggregated APIs offer more advanced API features and customization of other feat
 
 | Feature | Description | CRDs | Aggregated API |
 | ------- | ----------- | ---- | -------------- |
-| Validation | Help users prevent errors and allow you to evolve your API independently of your clients. These features are most useful when there are many clients who can't all update at the same time. | Yes.  Most validation can be specified in the CRD using [OpenAPI v3.0 validation](/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#validation).  Any other validations supported by addition of a [Validating Webhook](/docs/reference/access-authn-authz/admission-controllers/#validatingadmissionwebhook-alpha-in-1-8-beta-in-1-9). | Yes, arbitrary validation checks |
+| Validation | Help users prevent errors and allow you to evolve your API independently of your clients. These features are most useful when there are many clients who can't all update at the same time. | Yes.  Most validation can be specified in the CRD using [OpenAPI v3.0 validation](/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#validation). The [CRDValidationRatcheting](/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#validation-ratcheting) feature gate allows failing validations specified using OpenAPI to be ignored if the failing part of the resource was unchanged.  Any other validations supported by addition of a [Validating Webhook](/docs/reference/access-authn-authz/admission-controllers/#validatingadmissionwebhook-alpha-in-1-8-beta-in-1-9). | Yes, arbitrary validation checks |
 | Defaulting | See above | Yes, either via [OpenAPI v3.0 validation](/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#defaulting) `default` keyword (GA in 1.17), or via a [Mutating Webhook](/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook) (though this will not be run when reading from etcd for old objects). | Yes |
 | Multi-versioning | Allows serving the same object through two API versions. Can help ease API changes like renaming fields. Less important if you control your client versions. | [Yes](/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning) | Yes |
 | Custom Storage | If you need storage with a different performance mode (for example, a time-series database instead of key-value store) or isolation for security (for example, encryption of sensitive information, etc.) | No | Yes |

diff --git a/content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md b/content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md
@@ -147,6 +147,22 @@ The general workflow of a device plugin includes the following steps:
    runtime configurations for accessing the allocated devices. The kubelet passes this information
    to the container runtime.
 
+   An `AllocateResponse` contains zero or more `ContainerAllocateResponse` objects. In these, the
+   device plugin defines modifications that must be made to a container's definition to provide
+   access to the device. These modifications include:
+
+   * annotations
+   * device nodes
+   * environment variables
+   * mounts
+   * fully-qualified CDI device names
+
+   {{< note >}}
+   The processing of the fully-qualified CDI device names by the Device Manager requires
+   the `DevicePluginCDIDevices` feature gate to be enabled. This was added as an alpha feature in
+   v1.28.
+   {{< /note >}}
+
 ### Handling kubelet restarts
 
 A device plugin is expected to detect kubelet restarts and re-register itself with the new
@@ -195,7 +211,7 @@ of the device allocations during the upgrade.
 
 ## Monitoring device plugin resources
 
-{{< feature-state for_k8s_version="v1.15" state="beta" >}}
+{{< feature-state for_k8s_version="v1.28" state="stable" >}}
 
 In order to monitor resources provided by device plugins, monitoring agents need to be able to
 discover the set of devices that are in-use on the node and obtain metadata to describe which
@@ -312,7 +328,7 @@ below:
 
 ### `GetAllocatableResources` gRPC endpoint {#grpc-endpoint-getallocatableresources}
 
-{{< feature-state state="beta" for_k8s_version="v1.23" >}}
+{{< feature-state state="stable" for_k8s_version="v1.28" >}}
 
 GetAllocatableResources provides information on resources initially available on the worker node.
 It provides more information than kubelet exports to APIServer.
@@ -338,16 +354,6 @@ message AllocatableResourcesResponse {
 }
 ```
 
-Starting from Kubernetes v1.23, the `GetAllocatableResources` is enabled by default.
-You can disable it by turning off the `KubeletPodResourcesGetAllocatable`
-[feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
-
-Preceding Kubernetes v1.23, to enable this feature `kubelet` must be started with the following flag:
-
-```
---feature-gates=KubeletPodResourcesGetAllocatable=true
-```
-
 `ContainerDevices` do expose the topology information declaring to which NUMA cells the device is
 affine. The NUMA cells are identified using an opaque integer ID, whose value is consistent with
 what device plugins report
@@ -381,8 +387,6 @@ Support for the `PodResourcesLister service` requires `KubeletPodResources`
 [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to be enabled.
 It is enabled by default starting with Kubernetes 1.15 and is v1 since Kubernetes 1.20.
 
-
-
 ### `Get` gRPC endpoint {#grpc-endpoint-get}
 
 {{< feature-state state="alpha" for_k8s_version="v1.27" >}}