The point of the talk is that it is non-trivial to detect those dependencies.
It looks like most of the time was spent discussing Python. I suspect that is because it is possible to create software without an explicit build stage, so you would not receive warnings about a dependency until the code is called. If the software treats it as an optional dependency, you may not receive any warnings at all. This sort of situation is by no means unique to interpreted languages. You can write a program in C, then load a library at run time. (I've never tried this sort of thing, so I don't know how the compiler handles unknown identifiers/symbols.) Heck, even the Linux kernel is expected to run "hidden packages" (i.e. the kernel has no means of tracking the origin of software you ask it to run).
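As a rough illustration of the optional-dependency case, here is a minimal Python sketch. The package name `hypothetical_fast_json` and the helper `load_optional` are made up for the example; the point is only that a missing optional dependency produces no warning, and nothing in a manifest has to mention it.

```python
import importlib
import json


def load_optional(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Treated as optional: no warning and no error reach the user.
        return None


# "hypothetical_fast_json" is an invented accelerator package (assumption).
accel = load_optional("hypothetical_fast_json")


def dumps(obj):
    """Use the optional accelerator when present, else the standard library."""
    if accel is not None:
        return accel.dumps(obj)
    return json.dumps(obj)
```

Whether the accelerated path runs depends entirely on what happens to be installed on the machine, which is exactly what makes this kind of dependency hard to see from the outside.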
Yes, you can write software to detect when an inspected application loads external binaries. No, it is not trivial (especially if the software developer was trying to hide a dependency).
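One possible way to do that kind of detection on Linux, sketched in Python under the assumption that `/proc/<pid>/maps` is available and readable. The function names are my own. It only reports libraries that are mapped at the moment of reading, so something loaded later (or loaded and unloaded again) is missed, which is part of why this is not trivial.

```python
import os


def shared_objects_from_maps(maps_text):
    """Extract the distinct shared-object paths from /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no backing file
        path = fields[5].strip()
        name = os.path.basename(path)
        # Matches "libfoo.so" as well as versioned names like "libc.so.6".
        if name.endswith(".so") or ".so." in name:
            paths.add(path)
    return sorted(paths)


def loaded_shared_objects(pid="self"):
    """Read the current mapping table of a process (Linux only)."""
    with open(f"/proc/{pid}/maps") as handle:
        return shared_objects_from_maps(handle.read())
```

Calling `loaded_shared_objects()` before and after importing a package gives a crude diff of which native libraries that import pulled in.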
And just a quibble: even bootstrapping requires the use of a binary (unless you go to unbelievably extraordinary measures).
pjmlp 2 hours ago [-]
Yeah, and Gentoo exists.
Except mankind uses other platforms as well, and even having the source code available isn't enough if no one is looking into it for vulnerabilities.
woodruffw 2 hours ago [-]
Seth Larson gave a talk on this (also with a focus on Python) at PyCon US last year[1].
It's a non-trivial issue, in terms of balancing conflicting interests: Python (like most interpreted languages) has a story for integrating native libraries, but that story is not particularly user friendly (in terms of users, Python developers, etc. not having the domain expertise to debug failing native builds). So these ecosystems tend to develop bespoke mechanisms for stashing native binaries inside package distributions, turning a build reliability problem into an introspection problem.
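To make the "introspection problem" concrete, here is a minimal sketch that lists files in a package archive that look like compiled binaries. It assumes the distribution is a zip archive (as Python wheels are) and goes purely by file name; the suffix list and the function name `native_members` are my own choices, not an established tool's behaviour.

```python
import zipfile

# File name suffixes that usually indicate a compiled binary (assumption).
NATIVE_SUFFIXES = (".so", ".pyd", ".dylib", ".dll")


def native_members(archive):
    """List archive members that look like native binaries, by file name."""
    found = []
    with zipfile.ZipFile(archive) as bundle:
        for name in bundle.namelist():
            base = name.rsplit("/", 1)[-1].lower()
            # Also catch versioned shared objects such as "libfoo.so.1".
            if base.endswith(NATIVE_SUFFIXES) or ".so." in base:
                found.append(name)
    return sorted(found)
```

`archive` can be a path or a file-like object. A name-based scan like this says that a binary is present, not where it came from or which version it is, so it is only a first step.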
> In almost all ecosystems, it is difficult to keep track of binary dependencies. When you depend on a package’s source code, this is normally recorded in your manifest file — pyproject.toml, package.json and so on. However, when you depend on a package’s precompiled binaries, this information is usually not recorded anywhere. This means that the binary dependency relationship between your project and whatever you’re depending on is hidden — so we can say that you have a phantom binary dependency.
I know it comes up every time... but nix does kinda exist to solve this problem. At least in pure mode.
pjmlp 2 hours ago [-]
Now we just have to improve its ergonomics, while supporting all existing operating systems in production.
okanat 26 minutes ago [-]
I think the Conda ecosystem is the closest and has even better ergonomics than Nix. Especially with Pixi, it is a joy to use.
pjmlp 22 minutes ago [-]
If one is using Python.
All these suggestions always fall off, because they are special cases for given programming languages, or operating systems.
mplanchard 2 hours ago [-]
This is one of the reasons I like having a nix flake in all of my projects that defines a dev environment, and integration with direnv to activate it. The flake lockfile, combined with the language-specific lockfile, gives a mostly complete picture of everything needed to build/deploy/develop the package.
pabs3 7 hours ago [-]
Personally I like using Debian packages to keep track of source and binary dependencies.
https://bootstrappable.org/ https://lwn.net/Articles/983340/ https://github.com/fosslinux/live-bootstrap https://stagex.tools/
[1]: https://www.youtube.com/watch?v=x9K3xPmi_tg