Commit

Merge pull request #41165 from kubernetes/dev-1.28
Official 1.28 Release Docs
reylejano authored Aug 15, 2023
2 parents beb7d1c + e05d2b7 commit 0180cc4
Showing 86 changed files with 10,270 additions and 24,721 deletions.
31,486 changes: 7,820 additions & 23,666 deletions api-ref-assets/api/swagger.json

Large diffs are not rendered by default.

15 changes: 14 additions & 1 deletion api-ref-assets/config/fields.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,7 @@
- fields:
- nominatedNodeName
- hostIP
- hostIPs
- startTime
- phase
- message
Expand All @@ -99,6 +100,7 @@
- initContainerStatuses
- containerStatuses
- ephemeralContainerStatuses
- resourceClaimStatuses
- resize

- definition: io.k8s.api.core.v1.Container
Expand Down Expand Up @@ -137,6 +139,7 @@
- livenessProbe
- readinessProbe
- startupProbe
- restartPolicy
- name: Security Context
fields:
- securityContext
Expand Down Expand Up @@ -228,6 +231,7 @@
fields:
- terminationMessagePath
- terminationMessagePolicy
- restartPolicy
- name: Debugging
fields:
- stdin
Expand Down Expand Up @@ -393,9 +397,14 @@
fields:
- selector
- manualSelector
- name: Alpha level
- name: Beta level
fields:
- podFailurePolicy
- name: Alpha level
fields:
- backoffLimitPerIndex
- maxFailedIndexes
- podReplacementPolicy

- definition: io.k8s.api.batch.v1.JobStatus
field_categories:
Expand All @@ -411,6 +420,10 @@
- name: Beta level
fields:
- ready
- name: Alpha level
fields:
- failedIndexes
- terminating

- definition: io.k8s.api.batch.v1.CronJobSpec
field_categories:
Expand Down
7 changes: 2 additions & 5 deletions api-ref-assets/config/toc.yaml
Expand Up @@ -153,7 +153,7 @@ parts:
version: v1alpha1
- name: SelfSubjectReview
group: authentication.k8s.io
version: v1beta1
version: v1
- name: Authorization Resources
chapters:
- name: LocalSubjectAccessReview
Expand All @@ -168,9 +168,6 @@ parts:
- name: SubjectAccessReview
group: authorization.k8s.io
version: v1
- name: SelfSubjectReview
group: authentication.k8s.io
version: v1alpha1
- name: ClusterRole
group: rbac.authorization.k8s.io
version: v1
Expand Down Expand Up @@ -218,7 +215,7 @@ parts:
version: v1
- name: ValidatingAdmissionPolicy
group: admissionregistration.k8s.io
version: v1alpha1
version: v1beta1
otherDefinitions:
- ValidatingAdmissionPolicyList
- ValidatingAdmissionPolicyBinding
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,13 @@ per container characteristics like image size or payload) can utilize the
the `PodHasNetwork` condition to optimize the set of actions performed when pods
repeatedly fail to come up.

### Updates for Kubernetes 1.28

The `PodHasNetwork` condition has been renamed to `PodReadyToStartContainers`.
Alongside that change, the feature gate `PodHasNetworkCondition` has been replaced by
`PodReadyToStartContainersCondition`. You need to set `PodReadyToStartContainersCondition`
to true in order to use the new feature in v1.28.0 and later.

### How is this different from the existing Initialized condition reported for pods?

The kubelet sets the status of the existing `Initialized` condition reported in
Expand Down
84 changes: 84 additions & 0 deletions content/en/docs/concepts/architecture/mixed-version-proxy.md
@@ -0,0 +1,84 @@
---
reviewers:
- jpbetz
title: Mixed Version Proxy
content_type: concept
weight: 220
---

<!-- overview -->
{{< feature-state state="alpha" for_k8s_version="v1.28" >}}

Kubernetes {{< skew currentVersion >}} includes an alpha feature that lets a
{{< glossary_tooltip text="API Server" term_id="kube-apiserver" >}}
proxy resource requests to other _peer_ API servers. This is useful when there are multiple
API servers running different versions of Kubernetes in one cluster (for example, during a long-lived
rollout to a new release of Kubernetes).

This enables cluster administrators to configure highly available clusters that can be upgraded
more safely, by directing resource requests (made during the upgrade) to the correct kube-apiserver.
That proxying prevents users from seeing unexpected 404 Not Found errors that stem
from the upgrade process.

This mechanism is called the _Mixed Version Proxy_.

## Enabling the Mixed Version Proxy
Ensure that the `UnknownVersionInteroperabilityProxy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
is enabled when you start the {{< glossary_tooltip text="API Server" term_id="kube-apiserver" >}}:

```shell
# Required for this feature: the feature gate, --peer-ca-file, --proxy-client-cert-file,
# --proxy-client-key-file and --requestheader-client-ca-file.
# --requestheader-allowed-names can be set to blank to allow any Common Name.
# Optional for this feature: --peer-advertise-ip and --peer-advertise-port.
kube-apiserver \
  --feature-gates=UnknownVersionInteroperabilityProxy=true \
  --peer-ca-file="<path to kube-apiserver CA cert>" \
  --proxy-client-cert-file="<path to aggregator proxy cert>" \
  --proxy-client-key-file="<path to aggregator proxy key>" \
  --requestheader-client-ca-file="<path to aggregator CA cert>" \
  --requestheader-allowed-names="<valid Common Names to verify proxy client cert against>" \
  --peer-advertise-ip="<IP of this kube-apiserver that should be used by peers to proxy requests>" \
  --peer-advertise-port="<port of this kube-apiserver that should be used by peers to proxy requests>"
  # …and other flags as usual
```

### Proxy transport and authentication between API servers {#transport-and-authn}

* The source kube-apiserver reuses the [existing APIserver client authentication flags](https://kubernetes.io/docs/tasks/extend-kubernetes/configure-aggregation-layer/#kubernetes-apiserver-client-authentication) `--proxy-client-cert-file` and `--proxy-client-key-file` to present its identity that will be verified by its peer (the destination kube-apiserver). The destination API server verifies that peer connection based on the configuration you specify using the `--requestheader-client-ca-file` command line argument.

* To authenticate the destination server's serving certs, you must configure a certificate authority bundle by specifying the `--peer-ca-file` command line argument to the **source** API server.

### Configuration for peer API server connectivity

To set the network location of a kube-apiserver that peers will use to proxy requests, use the
`--peer-advertise-ip` and `--peer-advertise-port` command line arguments to kube-apiserver or specify
these fields in the API server configuration file.
If these flags are unspecified, peers will use the value from either `--advertise-address` or
`--bind-address` command line argument to the kube-apiserver. If those, too, are unset, the host's default interface is used.
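
The fallback order described above can be sketched as a small function. This is an illustrative model only, not kube-apiserver code: the function name and parameters are hypothetical, and the text says only that the value comes from "either" `--advertise-address` or `--bind-address`, so trying `--advertise-address` first is an assumption.

```python
from typing import Optional


def peer_advertised_ip(
    peer_advertise_ip: Optional[str],
    advertise_address: Optional[str],
    bind_address: Optional[str],
    default_interface_ip: str,
) -> str:
    """Pick the IP that peers would use to reach this API server.

    Hypothetical helper. The precedence between advertise_address and
    bind_address is an assumption; the documentation does not state it.
    """
    for candidate in (peer_advertise_ip, advertise_address, bind_address):
        if candidate:  # skip unset (None) or empty values
            return candidate
    # Nothing was configured: fall back to the host's default interface.
    return default_interface_ip
```

For example, calling `peer_advertised_ip(None, None, "10.0.0.5", "192.168.1.2")` selects the bind address, because neither of the earlier values is set.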

## Mixed version proxying

When you enable mixed version proxying, the [aggregation layer](/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/)
loads a special filter that does the following:

* When a resource request reaches an API server that cannot serve that API (either because it is at a version pre-dating the introduction of the API or the API is turned off on the API server) the API server attempts to send the request to a peer API server that can serve the requested API. It does so by identifying API groups / versions / resources that the local server doesn't recognise, and then trying to proxy those requests to a peer API server that is capable of handling the request.
* If the peer API server fails to respond, the _source_ API server responds with a 503 ("Service Unavailable") error.

### How it works under the hood

When an API Server receives a resource request, it first checks which API servers can serve the requested resource. This check happens using the internal [`StorageVersion` API].

* If the resource is known to the API server that received the request (ex: `GET /api/v1/pods/some-pod`), the request is handled locally.

* If there is no internal `StorageVersion` object found for the requested resource (ex: `GET /my-api/v1/my-resource`) and the configured APIService specifies proxying to an extension API server, that proxying happens following the usual
[flow](/docs/tasks/extend-kubernetes/configure-aggregation-layer/) for
extension APIs.

* If a valid internal `StorageVersion` object is found for the requested resource (ex: `GET /batch/v1/jobs`) and the API server trying to handle the request (the _handling API server_) has the `batch` API disabled, then the _handling API server_ fetches the peer API servers that do serve the relevant API group / version / resource (`batch/v1/jobs` in this case) using the information in the fetched `StorageVersion` object. The _handling API server_ then proxies the request to one of the matching peer kube-apiservers that are aware of the requested resource.
* If there is no peer known for that API group / version / resource, the handling API server passes the request to its own handler chain, which should eventually return a 404 ("Not Found") response.
* If the handling API server has identified and selected a peer API server, but that peer fails
to respond (for reasons such as network connectivity issues, or a data race between the request
being received and a controller registering the peer's info into the control plane), then the handling
API server responds with a 503 (“Service Unavailable”) error.
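
The decision flow above can be modelled as a short function. This is a simplified sketch for illustration, not the actual kube-apiserver filter: `ResourceRequest`, its fields, and `route` are hypothetical names. Picking the first peer in the list is an assumption (the text only says "one of the matching peers"), as is the fall-through to the local handler chain when no `StorageVersion` object and no extension APIService exist.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResourceRequest:
    """Hypothetical summary of what the handling API server knows."""
    served_locally: bool                 # the resource is known to this server
    storage_version_found: bool          # an internal StorageVersion object exists
    extension_apiservice: bool = False   # an APIService proxies to an extension server
    peers: List[str] = field(default_factory=list)  # peers that serve the resource
    peer_responds: bool = True           # whether the selected peer answers


def route(req: ResourceRequest) -> str:
    """Return a label describing how the request would be handled."""
    if req.served_locally:
        return "handled locally"
    if not req.storage_version_found:
        if req.extension_apiservice:
            return "proxied to extension API server"
        # Assumption: with nothing else to try, the local handler chain answers.
        return "local handler chain (404 expected)"
    if not req.peers:
        return "local handler chain (404 expected)"
    if not req.peer_responds:
        return "503 Service Unavailable"
    # Assumption: choose the first matching peer.
    return f"proxied to peer {req.peers[0]}"
```

For instance, a request for a resource with a `StorageVersion` object and one known peer, `ResourceRequest(False, True, peers=["apiserver-b"])`, is routed to that peer as long as it responds.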
47 changes: 27 additions & 20 deletions content/en/docs/concepts/architecture/nodes.md
Expand Up @@ -571,9 +571,9 @@ the feature is Beta and is enabled by default.
Metrics `graceful_shutdown_start_time_seconds` and `graceful_shutdown_end_time_seconds`
are emitted under the kubelet subsystem to monitor node shutdowns.

## Non Graceful node shutdown {#non-graceful-node-shutdown}
## Non-graceful node shutdown handling {#non-graceful-node-shutdown}

{{< feature-state state="beta" for_k8s_version="v1.26" >}}
{{< feature-state state="stable" for_k8s_version="v1.28" >}}

A node shutdown action may not be detected by kubelet's Node Shutdown Manager,
either because the command does not trigger the inhibitor locks mechanism used by
Expand Down Expand Up @@ -617,11 +617,7 @@ During a non-graceful shutdown, Pods are terminated in the two phases:

## Swap memory management {#swap-memory}

{{< feature-state state="alpha" for_k8s_version="v1.22" >}}

Prior to Kubernetes 1.22, nodes did not support the use of swap memory, and a
kubelet would by default fail to start if swap was detected on a node. In 1.22
onwards, swap memory support can be enabled on a per-node basis.
{{< feature-state state="beta" for_k8s_version="v1.28" >}}

To enable swap on a node, the `NodeSwap` feature gate must be enabled on
the kubelet, and the `--fail-swap-on` command line flag or `failSwapOn`
Expand All @@ -638,29 +634,40 @@ specify how a node will use swap memory. For example,

```yaml
memorySwap:
swapBehavior: LimitedSwap
swapBehavior: UnlimitedSwap
```

The available configuration options for `swapBehavior` are:

- `LimitedSwap`: Kubernetes workloads are limited in how much swap they can
use. Workloads on the node not managed by Kubernetes can still swap.
- `UnlimitedSwap`: Kubernetes workloads can use as much swap memory as they
- `UnlimitedSwap` (default): Kubernetes workloads can use as much swap memory as they
request, up to the system limit.
- `LimitedSwap`: The utilization of swap memory by Kubernetes workloads is subject to limitations. Only Pods of Burstable QoS are permitted to employ swap.

If configuration for `memorySwap` is not specified and the feature gate is
enabled, by default the kubelet will apply the same behaviour as the
`LimitedSwap` setting.
`UnlimitedSwap` setting.

With `LimitedSwap`, Pods that do not fall under the Burstable QoS classification (i.e.
`BestEffort`/`Guaranteed` QoS Pods) are prohibited from utilizing swap memory.
To maintain the aforementioned security and node
health guarantees, these Pods are not permitted to use swap memory when `LimitedSwap` is
in effect.

Prior to detailing the calculation of the swap limit, it is necessary to define the following terms:
* `nodeTotalMemory`: The total amount of physical memory available on the node.
* `totalPodsSwapAvailable`: The total amount of swap memory on the node that is available for use by Pods (some swap memory may be reserved for system use).
* `containerMemoryRequest`: The container's memory request.

Swap limitation is configured as:
`(containerMemoryRequest / nodeTotalMemory) * totalPodsSwapAvailable`.

The behaviour of the `LimitedSwap` setting depends if the node is running with
v1 or v2 of control groups (also known as "cgroups"):
It is important to note that, for containers within Burstable QoS Pods, it is possible to
opt-out of swap usage by specifying memory requests that are equal to memory limits.
Containers configured in this manner will not have access to swap memory.
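
As a worked illustration of the `LimitedSwap` rules above, the following sketch combines the QoS restriction, the opt-out for containers whose memory request equals their limit, and the swap limit formula. It is not kubelet code: the function name and signature are hypothetical, all sizes are assumed to be in bytes, and rounding the result down to a whole byte is an assumption.

```python
from typing import Optional


def limited_swap_limit_bytes(
    qos_class: str,
    container_memory_request: int,
    container_memory_limit: Optional[int],
    node_total_memory: int,
    total_pods_swap_available: int,
) -> int:
    """Estimate the swap a container may use under LimitedSwap (0 means none)."""
    if node_total_memory <= 0:
        raise ValueError("node_total_memory must be positive")
    # Only Pods of Burstable QoS are permitted to use swap.
    if qos_class != "Burstable":
        return 0
    # A memory request equal to the memory limit opts the container out of swap.
    if container_memory_limit is not None and container_memory_request == container_memory_limit:
        return 0
    # (containerMemoryRequest / nodeTotalMemory) * totalPodsSwapAvailable,
    # computed with integer arithmetic to avoid floating-point error.
    return container_memory_request * total_pods_swap_available // node_total_memory
```

With a 4 GiB request on a node that has 64 GiB of memory and 8 GiB of swap available to Pods, the formula gives (4 / 64) * 8 GiB = 0.5 GiB of swap for that container.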

- **cgroupsv1:** Kubernetes workloads can use any combination of memory and
swap, up to the pod's memory limit, if set.
- **cgroupsv2:** Kubernetes workloads cannot use swap memory.
Swap is supported only with **cgroup v2**; cgroup v1 is not supported.

For more information, and to assist with testing and provide feedback, please
see [KEP-2400](https://github.com/kubernetes/enhancements/issues/2400) and its
see the blog post about [Kubernetes 1.28: NodeSwap graduates to Beta1](/blog/2023/07/18/swap-beta1-1.28-2023/),
[KEP-2400](https://github.com/kubernetes/enhancements/issues/4128) and its
[design proposal](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md).

## {{% heading "whatsnext" %}}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -209,7 +209,7 @@ Aggregated APIs offer more advanced API features and customization of other feat

| Feature | Description | CRDs | Aggregated API |
| ------- | ----------- | ---- | -------------- |
| Validation | Help users prevent errors and allow you to evolve your API independently of your clients. These features are most useful when there are many clients who can't all update at the same time. | Yes. Most validation can be specified in the CRD using [OpenAPI v3.0 validation](/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#validation). Any other validations supported by addition of a [Validating Webhook](/docs/reference/access-authn-authz/admission-controllers/#validatingadmissionwebhook-alpha-in-1-8-beta-in-1-9). | Yes, arbitrary validation checks |
| Validation | Help users prevent errors and allow you to evolve your API independently of your clients. These features are most useful when there are many clients who can't all update at the same time. | Yes. Most validation can be specified in the CRD using [OpenAPI v3.0 validation](/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#validation). The [CRDValidationRatcheting](/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#validation-ratcheting) feature gate allows failing validations specified using OpenAPI to be ignored if the failing part of the resource was unchanged. Any other validations are supported by the addition of a [Validating Webhook](/docs/reference/access-authn-authz/admission-controllers/#validatingadmissionwebhook-alpha-in-1-8-beta-in-1-9). | Yes, arbitrary validation checks |
| Defaulting | See above | Yes, either via [OpenAPI v3.0 validation](/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#defaulting) `default` keyword (GA in 1.17), or via a [Mutating Webhook](/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook) (though this will not be run when reading from etcd for old objects). | Yes |
| Multi-versioning | Allows serving the same object through two API versions. Can help ease API changes like renaming fields. Less important if you control your client versions. | [Yes](/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning) | Yes |
| Custom Storage | If you need storage with a different performance mode (for example, a time-series database instead of key-value store) or isolation for security (for example, encryption of sensitive information, etc.) | No | Yes |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,22 @@ The general workflow of a device plugin includes the following steps:
runtime configurations for accessing the allocated devices. The kubelet passes this information
to the container runtime.

An `AllocateResponse` contains zero or more `ContainerAllocateResponse` objects. In these, the
device plugin defines modifications that must be made to a container's definition to provide
access to the device. These modifications include:

* annotations
* device nodes
* environment variables
* mounts
* fully-qualified CDI device names

{{< note >}}
The processing of the fully-qualified CDI device names by the Device Manager requires
the `DevicePluginCDIDevices` feature gate to be enabled. This was added as an alpha feature in
v1.28.
{{< /note >}}

### Handling kubelet restarts

A device plugin is expected to detect kubelet restarts and re-register itself with the new
Expand Down Expand Up @@ -195,7 +211,7 @@ of the device allocations during the upgrade.

## Monitoring device plugin resources

{{< feature-state for_k8s_version="v1.15" state="beta" >}}
{{< feature-state for_k8s_version="v1.28" state="stable" >}}

In order to monitor resources provided by device plugins, monitoring agents need to be able to
discover the set of devices that are in-use on the node and obtain metadata to describe which
Expand Down Expand Up @@ -312,7 +328,7 @@ below:

### `GetAllocatableResources` gRPC endpoint {#grpc-endpoint-getallocatableresources}

{{< feature-state state="beta" for_k8s_version="v1.23" >}}
{{< feature-state state="stable" for_k8s_version="v1.28" >}}

GetAllocatableResources provides information on resources initially available on the worker node.
It provides more information than kubelet exports to APIServer.
Expand All @@ -338,16 +354,6 @@ message AllocatableResourcesResponse {
}
```

Starting from Kubernetes v1.23, the `GetAllocatableResources` is enabled by default.
You can disable it by turning off the `KubeletPodResourcesGetAllocatable`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/).

Preceding Kubernetes v1.23, to enable this feature `kubelet` must be started with the following flag:

```
--feature-gates=KubeletPodResourcesGetAllocatable=true
```

`ContainerDevices` do expose the topology information declaring to which NUMA cells the device is
affine. The NUMA cells are identified using an opaque integer ID, whose value is consistent with
what device plugins report
Expand Down Expand Up @@ -381,8 +387,6 @@ Support for the `PodResourcesLister service` requires `KubeletPodResources`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to be enabled.
It is enabled by default starting with Kubernetes 1.15 and is v1 since Kubernetes 1.20.



### `Get` gRPC endpoint {#grpc-endpoint-get}

{{< feature-state state="alpha" for_k8s_version="v1.27" >}}
Expand Down