Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot build numpy in specific situation #60

Closed
rmartin16 opened this issue Sep 21, 2024 · 15 comments · Fixed by #65
Closed

Cannot build numpy in specific situation #60

rmartin16 opened this issue Sep 21, 2024 · 15 comments · Fixed by #65
Labels
bug Something isn't working

Comments

@rmartin16
Copy link
Member

rmartin16 commented Sep 21, 2024

Describe the bug

When the number of characters for the parent file path for the mobile-forge repo is sufficiently low, the numpy build fails.

File "/opt/homebrew/Cellar/[email protected]/3.11.9_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/subprocess.py", line 548, in run
with Popen(*popenargs, **kwargs) as process:
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/[email protected]/3.11.9_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/subprocess.py", line 1026, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/opt/homebrew/Cellar/[email protected]/3.11.9_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/subprocess.py", line 1955, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: '/Users/marko/Documents/Repos/mobileforge2/build/cp311/numpy/1.26.2/venv3.11-ios_13_0_arm64_iphoneos/cross/bin/ninja'

ERROR Backend subprocess exited when trying to invoke get_requires_for_build_wheel
--------------------------------------------------------------------------------
<<< Return code: 1
********************************************************************************
Failed build: numpy 1.26.2 for iphoneos 13.0 on arm64
********************************************************************************
Traceback (most recent call last):
  File "/Users/marko/Documents/Repos/mobileforge2/src/forge/build.py", line 296, in build
    self._build()
  File "/Users/marko/Documents/Repos/mobileforge2/src/forge/build.py", line 563, in _build
    self.cross_venv.run(
  File "/Users/marko/Documents/Repos/mobileforge2/src/forge/cross.py", line 375, in run
    return subprocess.run(logfile, *args, **self.cross_kwargs(kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marko/Documents/Repos/mobileforge2/src/forge/subprocess.py", line 49, in run
    raise stdlib_subprocess.CalledProcessError(return_code, args)
subprocess.CalledProcessError: Command '(['python', '-m', 'build', '--no-isolation', '--wheel', '--outdir', '/Users/marko/Documents/Repos/mobileforge2/dist'],)' returned non-zero exit status 1.

Steps to reproduce

Clone mobile-forge in to /tmp and attempt to build numpy.

Expected behavior

Numpy can build regardless of where mobile-forge exists in the file system.

The underlying problem is the shebang that pip creates for ninja which is used by numpy's build system. Numpy invokes ninja with this file:

> cat mobile-forge/build/cp311/numpy/1.26.2/venv3.11-ios_13_0_arm64_iphoneos/cross/bin/ninja

#!/Users/user/github/beeware/mobile-forge/build/cp311/numpy/1.26.2/venv3.11-ios_13_0_arm64_iphoneos/cross/bin/python3.11
# -*- coding: utf-8 -*-
import re
import sys
from ninja import ninja
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(ninja())

The problem here is that venv3.11-ios_13_0_arm64_iphoneos/cross/bin/python3.11 is not a binary; it is another python script that eventually calls os.execv() for venv3.11-ios_13_0_arm64_iphoneos/build/bin/python3.11....and this python3.11 is a binary.

pip creates this shebang when installing ninja. As you can see there, if the shebang is long enough, it creates a shebang that has /bin/sh run venv3.11-ios_13_0_arm64_iphoneos/build/bin/python3.11. This avoids this issue altogether.

pip does a lot of introspection to determine what to use for the shebang....and a lot of the possible values come from sysconfig. Given how much crossenv is messing with sysconfig, this could possibly be considered a bug in crossenv.

Screenshots

No response

Environment

  • Operating System: Sonoma
  • Python version: 3.11
  • Software versions:

Logs

numpy-1.26.2-cp311-ios_13_0_x86_64_iphonesimulator.log
numpy-1.26.2-cp311-ios_13_0_arm64_iphonesimulator.log
numpy-1.26.2-cp311-ios_13_0_arm64_iphoneos.log

Additional context

No response

@rmartin16 rmartin16 added the bug Something isn't working label Sep 21, 2024
@freakboy3742
Copy link
Member

Confirmed that I can reproduce this - and, interestingly, I also see the "forge failure causes shell exit" bug in the /tmp-based environment.

@freakboy3742
Copy link
Member

I've resolved the "forge failure causes shell exit" problem - but now I'm seeing this build numpy failure on every build... still trying to understand what is going on here. It appears to be a discrepancy between what happens when you invoke /path/to/ninja at a venv prompt, and what happens when you run subprocess.run(["/path/to/ninja"]) in a python shell in the same venv.

@rmartin16
Copy link
Member Author

rmartin16 commented Sep 24, 2024

My thought was it's because subprocess.run() is assuming a binary at some point to invoke; it appears to respect the shebang when the first index of the command list for run() isn't a binary itself....but it doesn't then keep recursively reading the shebangs down this chain. For instance, I think if you add shell=true, then it will handle this non-binary shebang.

@freakboy3742
Copy link
Member

shell=True doesn't appear to work, because it assumes that the script being referenced is a shell script, so it breaks because the script content is actually Python code.

The problem appears to be a problem with the OS-level execv call, which is rejecting the initial binary because the shebang is script, not an actual binary... but I have no idea why this was working just 2 weeks ago. My only assumption is that some dependency in the chain that leads to ninja being installed has changed... but I can't work out what dependency that would be. pip, wheel and ninja itself have all been stable for ~2 months; the only package that has changes is setuptools... but setuptools shouldn't be involved in either building or installing ninja.

@rmartin16
Copy link
Member Author

Perhaps just for my own edification, how do you know this wasn't happening two weeks ago? I guess more to the point, are you confident that testing was using a short enough parent file path to mobile-forge for the issue to have had a chance to present?

My impression is this has been a latent issue that I just happened to finally encounter.

@freakboy3742
Copy link
Member

I know it was working 2 weeks ago because I was able to publish numpy versions using this script; I'm now seeing this problem in every build, not just the /tmp ones.

However.... those builds were done prior to the cross rename... and the older name was considerably longer...

@freakboy3742
Copy link
Member

Yup. It's the path rewriting thing you flagged.

Prior to the cross rename, the code worked because I was building in /Users/rkm/beeware/mobile-forge/build/cp312/numpy/1.26.2/venv3.12-ios_13_0_arm64_iphoneos/venv3.12-ios_13_0_arm64_iphoneos/bin/python3.12 (138 chars); it now uses /Users/rkm/beeware/mobile-forge/build/cp312/numpy/1.26.2/venv3.12-ios_13_0_arm64_iphoneos/cross/bin/python3.12 (111 chars), so it no longer uses the exec shim, causing the build to fail.

/me swears extensively.

So - there's a fix needed in crossenv to ensure that the exec shim is always used; with an alternate (more permanent) fix upstream in pip as well.

@freakboy3742
Copy link
Member

(also, put another tally in the "RKM, just pay attention to what Russell Martin says, and your life will be a lot easier" column... :-) )

@rmartin16
Copy link
Member Author

haha, careful! I can't vouch for everything I say... FWIW, I am still curious if this is a valid file path for crossenv to put in to sysconfig; while we've only seen an issue when the file path is put in to a shebang, perhaps other use cases also assume it to be directly executable or something.

@freakboy3742
Copy link
Member

The path that is in the shebang is the "host Python" - however, the "host Python" can't be an executable (at least, not easily), because it's a shell wrapper around the build Python. The only way that you could make it an actual executable is by writing a binary wrapper that does what a shell script does... which is possible, but not really nice to work with.

Given that the script works fine in every context except os.execv(), I think it's not an unreasonable value for the shebang.

I've pushed an update to my branch of crossenv that patches the problem. I'm working on a bug report and patch for distutils (which will eventually turn in to a vendored patch in pip).

@freakboy3742
Copy link
Member

The patch for distlib is now available: pypa/distlib#231

This patch has also been incorporated into the patch for pip: pypa/pip#12962

Once the patch for pip has been landed, the crossenv modification will no longer be required; however, there will be a need to modify mobile-forge ensure that the version of pip is up to date.

As soon as someone can confirm that numpy builds are working for them, I think this bug can be closed.

@freakboy3742
Copy link
Member

A patch to enforce pip updates: #64

@rmartin16
Copy link
Member Author

I've resolved the "forge failure causes shell exit" problem

Curious...what was the resolution here?

As soon as someone can confirm that numpy builds are working for them, I think this bug can be closed.

Builds in my VM now.

@freakboy3742
Copy link
Member

Calling set +e at the end of the setup script. Since that script is sourced, the set -e call persists into the new environment; so any failure in a tool being invoked in the new environment causes an exit of the shell.

@freakboy3742
Copy link
Member

I'm in the process of adding numpy to the CI system in #65, as that would have revealed this bug; but I'm hitting a C++ issue on x86 builds. I'll use that PR to close this one once I've worked out what is going on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants