Setting MetadataScripts startup to false may cause the google-startup-scripts service to enter a failed state #107

Open
action opened this issue Apr 19, 2021 · 4 comments

Comments


action commented Apr 19, 2021

Problem:
After disabling startup scripts in the instance config, the google-startup-scripts service may enter a failed state after rebooting the associated VM.

Expectation:
The google-startup-scripts service should not enter a failed state after booting, because startup scripts are disabled in the instance config.


Snippet of detected failure:

$ systemctl --failed --all
  UNIT                           LOAD   ACTIVE SUB    DESCRIPTION
● google-startup-scripts.service loaded failed failed Google Compute Engine Startup Scripts

Take a look at the service's status:

$ sudo systemctl status google-startup-scripts.service
● google-startup-scripts.service - Google Compute Engine Startup Scripts
   Loaded: loaded (/lib/systemd/system/google-startup-scripts.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2021-04-19 18:32:44 UTC; 3min 6s ago
  Process: 3016 ExecStart=/usr/bin/google_metadata_script_runner startup (code=exited, status=2)
 Main PID: 3016 (code=exited, status=2)

Apr 19 18:32:44 aa-qa-6080-gcp0 systemd[1]: Starting Google Compute Engine Startup Scripts...
Apr 19 18:32:44 aa-qa-6080-gcp0 google_metadata_script_runner[3016]: startup scripts disabled in instance config
Apr 19 18:32:44 aa-qa-6080-gcp0 systemd[1]: google-startup-scripts.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Apr 19 18:32:44 aa-qa-6080-gcp0 systemd[1]: google-startup-scripts.service: Failed with result 'exit-code'.
Apr 19 18:32:44 aa-qa-6080-gcp0 systemd[1]: Failed to start Google Compute Engine Startup Scripts.

Inspect contents of /etc/default/instance_configs.cfg:

$ cat /etc/default/instance_configs.cfg | tail -8
#
# Disable user supplied startup/shutdown scripts from running on
# the engine.
#
[MetadataScripts]
shutdown = false
startup = false
# END ANSIBLE MANAGED BLOCK

The service log reports that it failed with result 'exit-code'. Running the script runner by hand shows the exit status:

$ sudo google_metadata_script_runner startup
startup scripts disabled in instance config

$ echo $?
2
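
The same facts can be read non-interactively from the unit's properties. The following is a minimal sketch, assuming a systemd-based image; `unit_property` is a hypothetical helper name, and it only parses the `KEY=VALUE` lines that `systemctl show` prints.

```shell
#!/bin/sh
# unit_property KEY
# Reads KEY=VALUE lines (the format printed by `systemctl show`) on stdin
# and prints the value of the first line whose key matches KEY.
unit_property() {
  awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Example invocation (requires systemd, so it is left as a comment):
#   systemctl show --property=ActiveState --property=ExecMainStatus \
#     google-startup-scripts.service | unit_property ExecMainStatus
```

`ActiveState` and `ExecMainStatus` are standard systemd unit properties; the second carries the exit status of the main process, which is the value echoed by `$?` above.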

Details of the google-guest-agent package:

$ dpkg-query --status google-guest-agent
Package: google-guest-agent
Status: install ok installed
Priority: optional
Section: devel
Installed-Size: 23901
Maintainer: Ubuntu Developers <[email protected]>
Architecture: amd64
Version: 20201217.02-0ubuntu1~18.04.0
Replaces: gce-compute-image-packages (<< 20191115)
Depends: libc6 (>= 2.4)
Breaks: gce-compute-image-packages (<< 20191115), python3-google-compute-engine
Description: Google Compute Engine Guest Agent
 Contains the guest agent and metadata script runner binaries.
Built-Using: golang-1.13 (= 1.13.8-1ubuntu1~18.04.2)
Homepage: https://github.com/GoogleCloudPlatform/guest-agent

Please let me know if there is any additional information I can provide that will be helpful to reproduce, diagnose, or address the issue.

Our expectation was that by following the instructions (found here: https://github.com/GoogleCloudPlatform/guest-agent#configuration) to disable startup scripts, the associated services would continue to execute gracefully. It was an unexpected result to find the google-startup-scripts service in a failed state.

Contributor

hopkiw commented Apr 19, 2021

This is by design. Please take note of the output message when you invoke the startup script runner: "startup scripts disabled in instance config".

Can you share what the impact is of having this service in ActiveState=failed?

Author

action commented Apr 19, 2021

Thank you for the quick response and explaining that this might be expected behavior.

My team finds it a little odd for a service to be in a failed state and for a failed state to be an expected state for a service. Our general expectation is that a failed service means that attention is required. We expect that no services should be in a failed state on our system when things are configured successfully and operating as expected.

The impact of having a service in a failed state is that our process for certifying that our product is running successfully on GCP is resulting in an error.

Contributor

hopkiw commented Apr 19, 2021

Yes, I understand it is not intuitive. With systemd, it is not necessarily expected that every service will succeed; in many situations where the condition features of systemd units are insufficient, services are simply allowed to fail. System administrators are expected to define for themselves which services must succeed and what to do when one fails.
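
One way a health check could encode such a policy is to treat a short allowlist of units as acceptable failures. This is only a sketch under assumptions: `unexpected_failures` is a hypothetical helper, and the allowlist contents are up to whoever runs the check.

```shell
#!/bin/sh
# unexpected_failures ALLOWLIST
# Reads lines in the format of `systemctl --failed --no-legend --plain`
# on stdin (unit name in the first column) and prints every unit name
# that is not in the space-separated ALLOWLIST.
unexpected_failures() {
  awk -v allow="$1" '
    BEGIN { n = split(allow, a, " "); for (i = 1; i <= n; i++) ok[a[i]] = 1 }
    NF && !($1 in ok) { print $1 }
  '
}

# Example invocation (requires systemd, so it is left as a comment):
#   systemctl --failed --no-legend --plain \
#     | unexpected_failures "google-startup-scripts.service"
```

An empty result would mean that only allowlisted units are in a failed state. The trade-off is that a genuine failure of an allowlisted unit is hidden too.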

We will consider changing the exit behavior in this scenario, but it takes time to roll out such changes. In the meantime, when you add the instance config entry, you can also disable the startup scripts service, which should resolve the issue for you immediately.
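
A minimal sketch of that interim workaround, assuming a systemd-based image and the config path shown earlier in this thread. The function names `startup_disabled` and `disable_unit_if_configured` are hypothetical and not part of the guest agent; the `CFG` override exists only to make the sketch easy to try against a copy of the file.

```shell
#!/bin/sh
# Assumed defaults: the config path and unit name from this thread.
CFG="${CFG:-/etc/default/instance_configs.cfg}"
UNIT="google-startup-scripts.service"

# startup_disabled FILE
# Succeeds if the [MetadataScripts] section of FILE sets startup = false.
startup_disabled() {
  awk '
    /^\[/ { in_section = ($0 == "[MetadataScripts]") }
    in_section && /^[ \t]*startup[ \t]*=[ \t]*false[ \t]*$/ { found = 1 }
    END { exit(found ? 0 : 1) }
  ' "$1"
}

# Disable the unit so it is not started at boot, and clear any failed
# state left over from an earlier boot.
disable_unit_if_configured() {
  if startup_disabled "$CFG"; then
    sudo systemctl disable "$UNIT"
    sudo systemctl reset-failed "$UNIT"
  fi
}
```

Calling `disable_unit_if_configured` from the same provisioning step that writes the config entry keeps the two settings together. If startup scripts are turned back on later, the unit would need `systemctl enable` again.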

Author

action commented Apr 19, 2021

Thank you for addressing my questions and concerns. Your feedback was helpful and informative. Appreciate you.

We look forward to the exit behavior changing for this scenario. We understand that changes can take time, and we will keep an eye on this issue to stay informed.

In the meantime, I will have my team look into how we can mitigate the issue on our end, including, as you suggested, disabling the google-startup-scripts service.

patelne pushed a commit to patelne/guest-agent that referenced this issue Feb 17, 2022