Discussion:
[Import-SIG] Round 2 for "A ModuleSpec Type for the Import System"
Eric Snow
2013-08-09 22:58:09 UTC
Here's an updated version of the PEP for ModuleSpec which addresses the
feedback I've gotten. Thanks for the help. The big open question, to me,
is whether or not to have a separate reload() method. I'll be looking into
that when I get a chance. There's also the question of a path-based
subclass, but I'm currently not convinced it's worth it.

-eric

-----------------------------------

PEP: 4XX
Title: A ModuleSpec Type for the Import System
Version: $Revision$
Last-Modified: $Date$
Author: Eric Snow <ericsnowcurrently at gmail.com>
BDFL-Delegate: ???
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 8-Aug-2013
Python-Version: 3.4
Post-History: 8-Aug-2013
Resolution:


Abstract
========

This PEP proposes to add a new class to ``importlib.machinery`` called
``ModuleSpec``. It will contain all the import-related information
about a module without needing to load the module first. Finders will
now return a module's spec rather than a loader. The import system will
use the spec to load the module.


Motivation
==========

The import system has evolved over the lifetime of Python. In late 2002
PEP 302 introduced standardized import hooks via ``finders`` and
``loaders`` and ``sys.meta_path``. The ``importlib`` module, introduced
with Python 3.1, now exposes a pure Python implementation of the APIs
described by PEP 302, as well as of the full import system. It is now
much easier to understand and extend the import system. While a benefit
to the Python community, this greater accessibility also presents a
challenge.

As more developers come to understand and customize the import system,
any weaknesses in the finder and loader APIs will be more impactful. So
the sooner we can address any such weaknesses in the import system, the
better...and there are a couple we can take care of with this proposal.

Firstly, any time the import system needs to save information about a
module we end up with more attributes on module objects that are
generally only meaningful to the import system and occasionally to some
people. It would be nice to have a per-module namespace to put future
import-related information. Secondly, there's an API void between
finders and loaders that causes undue complexity when encountered.

Finders are currently responsible for providing the loader which the
import system will use to load the module. The loader is then
responsible for doing some checks, creating the module object, setting
import-related attributes, "installing" the module to ``sys.modules``,
and loading the module, along with some cleanup. This all takes place
during the import system's call to ``Loader.load_module()``. Loaders
also provide some APIs for accessing data associated with a module.

Loaders are not required to provide any of the functionality of
``load_module()`` through other methods. Thus, though the import-
related information about a module is likely available without loading
the module, it is not otherwise exposed.

Furthermore, the requirements associated with ``load_module()`` are
common to all loaders and mostly are implemented in exactly the same
way. This means every loader has to duplicate the same boilerplate
code. ``importlib.util`` provides some tools that help with this, but
it would be more helpful if the import system simply took charge of
these responsibilities. The trouble is that this would limit the degree
of customization that ``load_module()`` facilitates. This is a gap
between finders and loaders which this proposal aims to fill.
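To make the duplicated boilerplate concrete, here is a minimal sketch of a PEP 302 style ``load_module()``. ``StringLoader`` and its in-memory ``sources`` mapping are hypothetical, made up for illustration; only the ``exec()`` line is specific to this loader, while everything else is the kind of code each loader repeats.

```python
import sys
import types


class StringLoader:
    """Hypothetical PEP 302 loader serving module source from a dict."""

    def __init__(self, sources):
        self.sources = sources  # maps module name -> source text

    def load_module(self, fullname):
        # Boilerplate: reuse an existing module object (the reload case).
        module = sys.modules.get(fullname)
        is_new = module is None
        if is_new:
            module = types.ModuleType(fullname)
            # Boilerplate: install in sys.modules *before* executing.
            sys.modules[fullname] = module
        # Boilerplate: set import-related attributes (packages ignored here).
        module.__loader__ = self
        module.__package__ = fullname.rpartition('.')[0]
        try:
            # The only loader-specific step: execute the module's code.
            exec(self.sources[fullname], module.__dict__)
        except BaseException:
            # Boilerplate: a failed first load must not linger.
            if is_new:
                sys.modules.pop(fullname, None)
            raise
        # Boilerplate: the module may have replaced itself in sys.modules.
        return sys.modules[fullname]


loader = StringLoader({"demo_mod": "answer = 42"})
mod = loader.load_module("demo_mod")
print(mod.answer)  # 42
```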

Finally, when the import system calls a finder's ``find_module()``, the
finder makes use of a variety of information about the module that is
useful outside the context of the method. Currently the options are
limited for persisting that per-module information past the method call,
since it only returns the loader. Popular workarounds for this limitation
are to store the information in a module-to-info mapping somewhere on
the finder itself, or store it on the loader.

Unfortunately, loaders are not required to be module-specific. On top
of that, some of the useful information finders could provide is
common to all finders, so ideally the import system could take care of
that. This is the same gap as before between finders and loaders.

As an example of complexity attributable to this flaw, the
implementation of namespace packages in Python 3.3 (see PEP 420) added
``FileFinder.find_loader()`` because there was no good way for
``find_module()`` to provide the namespace path.

The answer to this gap is a ``ModuleSpec`` object that contains the
per-module information and takes care of the boilerplate functionality
of loading the module.

(The idea gained momentum during discussions related to another PEP.[1])


Specification
=============

The goal is to address the gap between finders and loaders while
changing as little of their semantics as possible. Though some
functionality and information are moved to the new ``ModuleSpec`` type,
their semantics should remain the same. However, for the sake of
clarity, those semantics will be explicitly identified.

A High-Level View
-----------------

...

ModuleSpec
----------

A new class which defines the import-related values to use when loading
the module. It closely corresponds to the import-related attributes of
module objects. ``ModuleSpec`` objects may also be used by finders and
loaders and other import-related APIs to hold extra import-related
state about the module. This greatly reduces the need to add any new
import-related attributes to module objects, and loader ``__init__``
methods won't need to accommodate such per-module state.

Creating a ModuleSpec:

``ModuleSpec(name, loader, *, origin=None, filename=None, cached=None,
path=None)``

The parameters have the same meaning as the attributes described below.
Passed values are assigned to the corresponding attributes as-is. The
attributes that are not parameters (``package`` and ``is_package``) are
read-only properties derived from those values. For calculated values,
use the ``from_loader()`` class method.
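A minimal sketch of that constructor, written as a plain Python class. The class name ``ModuleSpecSketch`` is hypothetical; the point is only that the keyword-only arguments are stored unchanged and nothing is derived from them:

```python
class ModuleSpecSketch:
    """Hypothetical stand-in for the proposed ModuleSpec constructor."""

    def __init__(self, name, loader, *, origin=None, filename=None,
                 cached=None, path=None):
        # Every value is stored as-is; no fallbacks are calculated.
        self.name = name
        self.loader = loader
        self.origin = origin
        self.filename = filename
        self.cached = cached
        self.path = path


spec = ModuleSpecSketch("spam", object(), filename="/src/spam.py")
print(spec.origin)  # None: origin is not copied from filename here
```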

ModuleSpec Attributes
---------------------

Each of the following names is an attribute on ``ModuleSpec`` objects.
A value of ``None`` indicates "not set". This contrasts with module
objects where the attribute simply doesn't exist.

While ``package`` and ``is_package`` are read-only properties, the
remaining attributes can be replaced after the module spec is created
and after import is complete. This allows for unusual cases where
modifying the spec is the best option. However, typical use should not
involve changing the state of a module's spec.

Most of the attributes correspond to the import-related attributes of
modules. Here is the mapping, followed by a description of the
attributes. The reverse of this mapping is used by
``init_module_attrs()``.

============= ===========
On ModuleSpec On Modules
============= ===========
name          __name__
loader        __loader__
package       __package__
is_package    (none)
origin        (none)
filename      __file__
cached        __cached__
path          __path__
============= ===========

``name``

The module's fully resolved and absolute name. It must be set.

``loader``

The loader to use during loading and for module data. These specific
functionalities do not change for loaders. Finders are still
responsible for creating the loader and this attribute is where it is
stored. The loader must be set.

``package``

The name of the module's parent. This is a dynamic attribute with a
value derived from ``name`` and ``is_package``. For packages it is the
value of ``name``. Otherwise it is equivalent to
``name.rpartition('.')[0]``. Consequently, a top-level module will have
the empty string for ``package``.


``is_package``

Whether or not the module is a package. This dynamic attribute is
``True`` if ``path`` is set (even if it is an empty list); otherwise it
is ``False``.
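Both dynamic attributes can be sketched as read-only properties. ``SpecWithDerivedAttrs`` is a hypothetical name, and only the attributes needed for the calculation are included:

```python
class SpecWithDerivedAttrs:
    """Hypothetical sketch of the read-only package/is_package properties."""

    def __init__(self, name, loader, *, path=None):
        self.name = name
        self.loader = loader
        self.path = path

    @property
    def is_package(self):
        # True whenever path is set, even to an empty list.
        return self.path is not None

    @property
    def package(self):
        # A package is its own parent; other modules drop the last part.
        if self.is_package:
            return self.name
        return self.name.rpartition('.')[0]


pkg = SpecWithDerivedAttrs("email.mime", object(), path=[])
mod = SpecWithDerivedAttrs("email.utils", object())
top = SpecWithDerivedAttrs("json", object())
print(pkg.package, mod.package, repr(top.package))  # email.mime email ''
```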

``origin``

A string for the location from which the module originates. If
``filename`` is set, ``origin`` should be set to the same value unless
some other value is more appropriate. ``origin`` is used in
``module_repr()`` if it does not match the value of ``filename``.

Using ``filename`` for this meaning would be inaccurate, since not all
modules have path-based locations. For instance, built-in modules do
not have ``__file__`` set. Yet it is useful to have a descriptive
string indicating that it originated from the interpreter as a built-in
module. So built-in modules will have ``origin`` set to ``"built-in"``.
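For comparison only: the ``ModuleSpec`` that later shipped in CPython's ``importlib`` differs from this draft in several details, but on recent CPython versions it shows the same built-in case:

```python
import importlib.util
import sys

# The built-in sys module has no __file__ attribute...
print(hasattr(sys, "__file__"))  # False
# ...yet its spec still records a descriptive origin.
spec = importlib.util.find_spec("sys")
print(spec.origin)  # built-in
```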

Path-based attributes:

If any of these is set, it indicates that the module is path-based. For
reference, a path entry is a string for a location where the import
system will look for modules, e.g. the path entries in ``sys.path`` or a
package's ``__path__``.

``filename``

Like ``origin``, but limited to a path-based location. If ``filename``
is set, ``origin`` should be set to the same string, unless ``origin`` is
explicitly set to something else. ``filename`` is not necessarily an
actual file name, but could be any location string based on a path
entry. Regarding the attribute name, while it is potentially
inaccurate, it is both consistent with the equivalent module attribute
and generally accurate.

.. XXX Would a different name be better? ``path_location``?

``cached``

The path-based location where the compiled code for a module should be
stored. If ``filename`` is set to a source file, this should be set to
the corresponding path that PEP 3147 specifies. The
``importlib.util.cache_from_source()`` function facilitates getting the
correct value.
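The standard library's ``importlib.util.cache_from_source()`` computes the PEP 3147 location. The interpreter tag in the result depends on the running Python, so the output shown is only an example:

```python
import importlib.util

# Map a source file to its PEP 3147 bytecode location. This raises
# NotImplementedError if the interpreter has no cache tag.
cached = importlib.util.cache_from_source("spam.py")
print(cached)  # e.g. __pycache__/spam.cpython-312.pyc
```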

``path``

The list of path entries in which to search for submodules if this
module is a package. Otherwise it is ``None``.

.. XXX add a path-based subclass?

ModuleSpec Methods
------------------

``from_loader(name, loader, *, is_package=None, origin=None, filename=None,
cached=None, path=None)``

.. XXX use a different name?

A factory classmethod that returns a new ``ModuleSpec`` derived from the
arguments. ``is_package`` is used inside the method to indicate that
the module is a package. If not explicitly passed in, it is set to
``True`` if ``path`` is passed in. It falls back to using the result of
the loader's ``is_package()``, if available. Finally it defaults to
False. The remaining parameters have the same meaning as the
corresponding ``ModuleSpec`` attributes.

In contrast to ``ModuleSpec.__init__()``, which takes the arguments
as-is, ``from_loader()`` calculates missing values from the ones passed
in, as much as possible. This replaces the behavior that is currently
provided by several ``importlib.util`` functions as well as the
optional ``init_module_attrs()`` method of loaders. Just to be clear,
here is a more detailed description of those calculations::

    If not passed in, ``filename`` is set to the result of calling the
    loader's ``get_filename()``, if available. Otherwise it stays
    unset (``None``).

    If not passed in, ``path`` is set to an empty list if
    ``is_package`` is true. Then the directory from ``filename`` is
    appended to it, if possible. If ``is_package`` is false, ``path``
    stays unset.

    If ``cached`` is not passed in and ``filename`` is passed in,
    ``cached`` is derived from it. For filenames with a source suffix,
    it is set to the result of calling
    ``importlib.util.cache_from_source()``. For bytecode suffixes (e.g.
    ``.pyc``), ``cached`` is set to the value of ``filename``. If
    ``filename`` is not passed in or ``cache_from_source()`` raises
    ``NotImplementedError``, ``cached`` stays unset.

    If not passed in, ``origin`` is set to ``filename``. Thus if
    ``filename`` is unset, ``origin`` stays unset.
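A minimal sketch of these calculations as a plain function, assuming a ``SimpleNamespace`` is an acceptable stand-in for the returned spec. ``from_loader_sketch()`` and ``FakeSourceLoader`` are hypothetical names; ``cache_from_source()``, ``SOURCE_SUFFIXES`` and ``BYTECODE_SUFFIXES`` are real ``importlib`` names. The text says ``cached`` is derived when ``filename`` is "passed in"; this sketch assumes it is also derived from a ``filename`` obtained via ``get_filename()``.

```python
import importlib.machinery
import importlib.util
import os
from types import SimpleNamespace


def from_loader_sketch(name, loader, *, is_package=None, origin=None,
                       filename=None, cached=None, path=None):
    """Hypothetical sketch of the from_loader() fallback calculations."""
    # is_package: explicit value, else implied by path, else the loader's
    # optional is_package(), else False.
    if is_package is None:
        if path is not None:
            is_package = True
        elif hasattr(loader, "is_package"):
            is_package = loader.is_package(name)
        else:
            is_package = False

    # filename: fall back to the loader's optional get_filename().
    if filename is None and hasattr(loader, "get_filename"):
        filename = loader.get_filename(name)

    # path: packages start with an empty list, plus filename's directory.
    if path is None and is_package:
        path = []
        if filename is not None:
            path.append(os.path.dirname(filename))

    # cached: derived from filename (assumption: however it was obtained).
    if cached is None and filename is not None:
        if filename.endswith(tuple(importlib.machinery.SOURCE_SUFFIXES)):
            try:
                cached = importlib.util.cache_from_source(filename)
            except NotImplementedError:
                pass  # no cache tag: cached stays unset
        elif filename.endswith(tuple(importlib.machinery.BYTECODE_SUFFIXES)):
            cached = filename

    # origin: defaults to filename (which may itself be None).
    if origin is None:
        origin = filename

    return SimpleNamespace(name=name, loader=loader, is_package=is_package,
                           origin=origin, filename=filename, cached=cached,
                           path=path)


class FakeSourceLoader:
    """Hypothetical loader exposing two optional PEP 302 methods."""

    def is_package(self, fullname):
        return True

    def get_filename(self, fullname):
        return "/src/spam/__init__.py"


spec = from_loader_sketch("spam", FakeSourceLoader())
print(spec.path, spec.origin)  # ['/src/spam'] /src/spam/__init__.py
```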

``module_repr()``

Returns a repr string for the module if ``origin`` is set and
``filename`` is not set. The string refers to the value of ``origin``.
Otherwise ``module_repr()`` returns None. This indicates to the module
type's ``__repr__()`` that it should fall back to the default repr.
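A minimal sketch of that rule as a free function over any object with ``name``, ``origin`` and ``filename`` attributes. The function name is hypothetical, and the repr format is assumed to follow the one the module type already uses for built-in modules:

```python
from types import SimpleNamespace


def module_repr_sketch(spec):
    """Hypothetical sketch of the proposed ModuleSpec.module_repr().

    A None result tells the module type to use its default repr.
    """
    if spec.origin is not None and spec.filename is None:
        return "<module {!r} ({})>".format(spec.name, spec.origin)
    return None


builtin_spec = SimpleNamespace(name="sys", origin="built-in", filename=None)
file_spec = SimpleNamespace(name="spam", origin="/src/spam.py",
                            filename="/src/spam.py")
print(module_repr_sketch(builtin_spec))  # <module 'sys' (built-in)>
print(module_repr_sketch(file_spec))     # None
```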

We could also have ``module_repr()`` produce the repr for the case where
``filename`` is set or where ``origin`` is not set, mirroring the repr
that the module type produces directly. However, the repr string is
derived from the import-related module attributes, which might be out of
sync with the spec.

.. XXX Is using the spec close enough? Probably not.

The implementation of the module type's ``__repr__()`` will change to
accommodate this PEP. However, the current functionality will remain to
handle the case where a module does not have a ``__spec__`` attribute.

``init_module_attrs(module)``

Sets the module's import-related attributes to the corresponding values
in the module spec. If a path-based attribute is not set on the spec,
it is not set on the module. For the rest, a ``None`` value on the spec
(aka "not set") means ``None`` will be set on the module. If any of the
attributes are already set on the module, the existing values are
replaced. The module's own ``__spec__`` is not consulted but does get
replaced with the spec on which ``init_module_attrs()`` was called.
The earlier mapping of ``ModuleSpec`` attributes to module attributes
indicates which attributes are involved on both sides.
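A minimal sketch of ``init_module_attrs()`` as a free function, driven by the mapping table above. The helper name is hypothetical, a ``SimpleNamespace`` stands in for a spec, and ``package`` is passed as a plain value rather than computed:

```python
import types
from types import SimpleNamespace

# Spec attribute -> module attribute, per the mapping table.
_ATTR_MAP = {
    "name": "__name__",
    "loader": "__loader__",
    "package": "__package__",
    "filename": "__file__",
    "cached": "__cached__",
    "path": "__path__",
}
_PATH_BASED = {"filename", "cached", "path"}


def init_module_attrs_sketch(spec, module):
    """Hypothetical sketch of ModuleSpec.init_module_attrs()."""
    for spec_attr, module_attr in _ATTR_MAP.items():
        value = getattr(spec, spec_attr)
        if value is None and spec_attr in _PATH_BASED:
            # Unset path-based attributes are left off the module.
            continue
        # Everything else is (re)assigned, even when the value is None.
        setattr(module, module_attr, value)
    # Any existing __spec__ is ignored and replaced.
    module.__spec__ = spec


spec = SimpleNamespace(name="sys_like", loader=None, package="",
                       filename=None, cached=None, path=None)
module = types.ModuleType("placeholder")
init_module_attrs_sketch(spec, module)
print(module.__name__, hasattr(module, "__file__"))  # sys_like False
```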

``load(module=None, *, is_reload=False)``

This method captures the current functionality of and requirements on
``Loader.load_module()`` without any semantic changes, except one.
Reloading a module when ``exec_module()`` is available actually uses
``module`` rather than ignoring it in favor of the one in
``sys.modules``, as ``Loader.load_module()`` does.

``module`` is only allowed when ``is_reload`` is true. This means that
``is_reload`` could be dropped as a parameter. However, doing so would
mean we could not use ``None`` to indicate that the module should be
pulled from ``sys.modules``. Furthermore, ``is_reload`` makes the
intent of the call clear.

There are two parts to what happens in ``load()``. First, the module is
prepared, loaded, updated appropriately, and left available for the
second part. This is described in more detail shortly.

Second, in the case of error during a normal load (not reload) the
module is removed from ``sys.modules``. If no error happened, the
module is pulled from ``sys.modules``. This is the module returned by
``load()``. Before it is returned, if it is a different object than the
one produced by the first part, attributes of the module from
``sys.modules`` are updated to reflect the spec.

Returning the module from ``sys.modules`` accommodates the ability of
the module to replace itself there while it is executing (during load).

As already noted, this is what already happens in the import system.
``load()`` is not meant to change any of this behavior.

Regarding the first part of ``load()``, the following describes what
happens. It depends on if ``is_reload`` is true and if the loader has
``exec_module()``.

For normal load with ``exec_module()`` available::

    A new module is created, ``init_module_attrs()`` is called to set
    its attributes, and it is set in sys.modules. At that point
    the loader's ``exec_module()`` is called, after which the module
    is ready for the second part of loading.

.. XXX What if the module already exists in sys.modules?

For normal load without ``exec_module()`` available::

    The loader's ``load_module()`` is called and the attributes of the
    module it returns are updated to match the spec.

For reload with ``exec_module()`` available::

    If ``module`` is ``None``, it is pulled from ``sys.modules``. If
    still ``None``, ImportError is raised. Otherwise ``exec_module()``
    is called, passing in the module-to-be-reloaded.

For reload without ``exec_module()`` available::

    The loader's ``load_module()`` is called and the attributes of the
    module it returns are updated to match the spec.

There is some boilerplate involved when ``exec_module()`` is available,
but only the boilerplate that the import system uses currently.

If ``loader`` is not set (``None``), ``load()`` raises a ValueError. If
``module`` is passed in but ``is_reload`` is false, a ValueError is also
raised to indicate that ``load()`` was called incorrectly. There may be
use cases for calling ``load()`` in that way, but they are outside the
scope of this PEP.
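The four cases and the second part of ``load()`` can be sketched together as follows. ``SpecSketch`` and ``ExecLoader`` are hypothetical, and ``init_module_attrs()`` is cut down to three attributes for brevity. Because it checks for ``exec_module()`` first, the sketch also prefers ``exec_module()`` when a loader defines both methods. When ``sys.modules`` has no entry after a reload of an explicitly passed module, the sketch falls back to that module; the text does not cover that case, so treat it as an assumption.

```python
import sys
import types


class SpecSketch:
    """Hypothetical minimal spec with a load() following the text above."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader

    def init_module_attrs(self, module):
        # Cut down to three attributes for brevity.
        module.__name__ = self.name
        module.__loader__ = self.loader
        module.__spec__ = self

    def load(self, module=None, *, is_reload=False):
        if self.loader is None:
            raise ValueError("spec has no loader")
        if module is not None and not is_reload:
            raise ValueError("module may only be passed when reloading")
        has_exec = hasattr(self.loader, "exec_module")

        # Part one: prepare, load, and update the module.
        try:
            if is_reload and has_exec:
                if module is None:
                    module = sys.modules.get(self.name)
                if module is None:
                    raise ImportError(f"{self.name!r} is not in sys.modules",
                                      name=self.name)
                self.loader.exec_module(module)
            elif has_exec:
                module = types.ModuleType(self.name)
                self.init_module_attrs(module)
                sys.modules[self.name] = module
                self.loader.exec_module(module)
            else:
                # Normal load and reload both fall back to load_module().
                module = self.loader.load_module(self.name)
                self.init_module_attrs(module)
        except BaseException:
            # Part two, error case: a failed normal load is removed.
            if not is_reload:
                sys.modules.pop(self.name, None)
            raise

        # Part two: return the sys.modules entry, since the module may have
        # replaced itself there (assumed fallback: the part-one module).
        final = sys.modules.get(self.name, module)
        if final is not module:
            self.init_module_attrs(final)
        return final


class ExecLoader:
    """Hypothetical loader that implements only exec_module()."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        exec(self.source, module.__dict__)


spec = SpecSketch("load_demo", ExecLoader("value = 'loaded'"))
mod = spec.load()
print(mod.value, sys.modules["load_demo"] is mod)  # loaded True
```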

.. XXX add reload(module=None) and drop load()'s parameters entirely?
.. XXX add more of importlib.reload()'s boilerplate to load()/reload()?

Backward Compatibility
----------------------

Since ``Finder.find_module()`` methods would now return a module spec
instead of a loader, specs must act like the loader that would have been
returned instead. This is relatively simple to solve since the loader
is available as an attribute of the spec. We will use ``__getattr__()``
to do it.

However, ``ModuleSpec.is_package`` (an attribute) conflicts with
``InspectLoader.is_package()`` (a method). Working around this requires
a more complicated solution but is not a large obstacle. Simply making
``ModuleSpec.is_package`` a method does not reflect that it is a relatively
static piece of data. ``module_repr()`` also conflicts with the same
method on loaders, but that workaround is not complicated since both are
methods.
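A minimal sketch of the ``__getattr__()`` proxy, and of why ``is_package`` is awkward. ``ProxyingSpec`` and ``LegacyLoader`` are hypothetical, and ``get_source()`` stands in for any loader method reached through the proxy. The sketch shows the conflict only; it does not attempt the workaround:

```python
class ProxyingSpec:
    """Hypothetical spec that forwards unknown attributes to its loader."""

    def __init__(self, name, loader, *, path=None):
        self.name = name
        self.loader = loader
        self.path = path

    @property
    def is_package(self):
        return self.path is not None

    def __getattr__(self, attr):
        # Only called when normal lookup fails, so spec attributes win.
        if attr == "loader":
            raise AttributeError(attr)  # avoid recursion before __init__
        return getattr(self.loader, attr)


class LegacyLoader:
    """Hypothetical PEP 302 loader with two optional methods."""

    def get_source(self, fullname):
        return "x = 1"

    def is_package(self, fullname):
        return False


spec = ProxyingSpec("spam", LegacyLoader())
print(spec.get_source("spam"))  # x = 1  (forwarded to the loader)
print(spec.is_package)          # False  (a bool, not the loader's method)
```

Code that still calls ``spec.is_package("spam")`` as if it were the loader would fail with a ``TypeError``, since the spec's boolean shadows the loader's method.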

Unfortunately, the ability to proxy does not extend to ``id()``
comparisons and ``isinstance()`` tests. In the case of the return value
of ``find_module()``, we accept that break in backward compatibility.
However, we will mitigate the problem with ``isinstance()`` somewhat by
registering ``ModuleSpec`` on the loaders in ``importlib.abc``.
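A minimal sketch of that mitigation using the standard ``abc`` virtual-subclass mechanism. ``RegisteredSpec`` is hypothetical, while ``importlib.abc.Loader.register()`` is the ordinary ``ABCMeta.register()`` method:

```python
import importlib.abc


class RegisteredSpec:
    """Hypothetical spec type; it does not inherit from any loader ABC."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader


# Virtual-subclass registration makes isinstance() checks pass...
importlib.abc.Loader.register(RegisteredSpec)
spec = RegisteredSpec("spam", loader=object())
print(isinstance(spec, importlib.abc.Loader))  # True
# ...but identity comparisons against the original loader still fail.
print(spec is spec.loader)  # False
```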

Subclassing
-----------

Subclasses of ModuleSpec are allowed, but should not be necessary.
Adding functionality to a custom finder or loader will likely be a
better fit and should be tried first. However, as long as a subclass
still fulfills the requirements of the import system, objects of that
type are completely fine as the return value of ``find_module()``.

Module Objects
--------------

Module objects will now have a ``__spec__`` attribute to which the
module's spec will be bound. None of the other import-related module
attributes will be changed or deprecated, though some of them could be;
any such deprecation can wait until Python 4.

``ModuleSpec`` objects will not be kept in sync with the corresponding
module object's import-related attributes. Though they may differ, in
practice they will typically be the same.

Finders
-------

Finders will now return ModuleSpec objects when ``find_module()`` is
called rather than loaders. For backward compatibility, ``ModuleSpec``
objects proxy the attributes of their ``loader`` attribute.

Adding another similar method to avoid backward-compatibility issues
is undesirable if avoidable. The import APIs have suffered enough,
especially considering ``PathEntryFinder.find_loader()`` was just
added in Python 3.3. The approach taken by this PEP should be
sufficient to address backward-compatibility issues for
``find_module()``.

The change to ``find_module()`` applies to both ``MetaPathFinder`` and
``PathEntryFinder``. ``PathEntryFinder.find_loader()`` will be
deprecated and, for backward compatibility, implicitly special-cased if
the method exists on a finder.

Finders are still responsible for creating the loader. That loader will
now be stored in the module spec returned by ``find_module()`` rather
than returned directly. As is currently the case without the PEP, if a
loader would be costly to create, that loader can be designed to defer
the cost until later.
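A minimal sketch of a meta path finder written against this proposal. ``DictFinder``, ``DictLoader`` and ``SpecSketch`` are hypothetical, and the finder is called directly rather than installed on ``sys.meta_path``:

```python
import types


class SpecSketch:
    """Hypothetical minimal spec that proxies to its loader."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader

    def __getattr__(self, attr):
        if attr == "loader":
            raise AttributeError(attr)
        return getattr(self.loader, attr)


class DictLoader:
    """Hypothetical loader executing source text held in memory."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        exec(self.source, module.__dict__)


class DictFinder:
    """Hypothetical finder returning a spec instead of a bare loader."""

    def __init__(self, sources):
        self.sources = sources  # maps module name -> source text

    def find_module(self, fullname, path=None):
        if fullname not in self.sources:
            return None
        # Creating the loader is still the finder's job...
        loader = DictLoader(self.sources[fullname])
        # ...but it now travels inside the returned spec.
        return SpecSketch(fullname, loader)


finder = DictFinder({"greeting": "text = 'hi'"})
spec = finder.find_module("greeting")
module = types.ModuleType(spec.name)
spec.exec_module(module)  # proxied through to DictLoader.exec_module()
print(module.text, finder.find_module("missing"))  # hi None
```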

Loaders
-------

Loaders will have a new method, ``exec_module(module)``. Its only job
is to "exec" the module and consequently populate the module's
namespace. It is not responsible for creating or preparing the module
object, nor for any cleanup afterward. It has no return value.
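A minimal sketch of a loader that implements only ``exec_module()``; ``SourceStringLoader`` is a hypothetical name. Creating the module object is left to the caller, here a bare ``types.ModuleType``:

```python
import types


class SourceStringLoader:
    """Hypothetical loader that implements only exec_module()."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        # Populate an already-prepared module's namespace. No module
        # creation, no sys.modules handling, and no return value.
        code = compile(self.source, "<" + module.__name__ + ">", "exec")
        exec(code, module.__dict__)


module = types.ModuleType("example")  # creation is not the loader's job
loader = SourceStringLoader("def double(n):\n    return n * 2\n")
result = loader.exec_module(module)
print(result, module.double(21))  # None 42
```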

The ``load_module()`` of loaders will still work and be an active part
of the loader API. It is still useful for cases where the default
module creation/preparation/cleanup is not appropriate for the loader.

For example, the C API for extension modules only supports the full
control of ``load_module()``. As such, ``ExtensionFileLoader`` will not
implement ``exec_module()``. In the future it may be appropriate to
produce a second C API that would support an ``exec_module()``
implementation for ``ExtensionFileLoader``. Such a change is outside
the scope of this PEP.

A loader must have at least one of ``exec_module()`` and
``load_module()`` defined. If both exist on the loader,
``ModuleSpec.load()`` uses ``exec_module()`` and ignores
``load_module()``.

PEP 420 introduced the optional ``module_repr()`` loader method to limit
the amount of special-casing in the module type's ``__repr__()``. Since
this method is part of ``ModuleSpec``, it will be deprecated on loaders.
However, if it exists on a loader it will be used exclusively.

The ``Loader.init_module_attrs()`` method, added prior to Python 3.4's
release, will be removed in favor of the same method on ``ModuleSpec``.

However, ``InspectLoader.is_package()`` will not be deprecated even
though the same information is found on ``ModuleSpec``. ``ModuleSpec``
can use it to populate its own ``is_package`` if that information is
not otherwise available. Still, it will be made optional.

The path-based loaders in ``importlib`` take arguments in their
``__init__()`` and have corresponding attributes. However, the need for
those values is eliminated. The only exception is
``FileLoader.get_filename()``, which uses ``self.path``. The signatures
for these loaders and the accompanying attributes will be deprecated.

In addition to executing a module during loading, loaders will still be
directly responsible for providing APIs concerning module-related data.

Other Changes
-------------

* The various finders and loaders provided by ``importlib`` will be
updated to comply with this proposal.

* The spec for the ``__main__`` module will reflect how the interpreter
was started. For instance, with ``-m`` the spec's name will be that of
the run module, while ``__main__.__name__`` will still be ``"__main__"``.

* We add ``importlib.find_module()`` to mirror
``importlib.find_loader()`` (which becomes deprecated).

* Deprecations in ``importlib.util``: ``set_package()``,
``set_loader()``, and ``module_for_loader()``. ``module_to_load()``
(introduced prior to Python 3.4's release) can be removed.

* ``importlib.reload()`` is changed to use ``ModuleSpec.load()``.

* ``ModuleSpec.load()`` and ``importlib.reload()`` will now make use of
the per-module import lock, whereas ``Loader.load_module()`` did not.

Reference Implementation
------------------------

A reference implementation is available at <TBD>.


References
==========

[1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html


Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
Nick Coghlan
2013-08-11 13:03:00 UTC
I think this is solid enough to be worth adding to the PEPs repo now.
Post by Eric Snow
Here's an updated version of the PEP for ModuleSpec which addresses the
feedback I've gotten. Thanks for the help. The big open question, to me,
is whether or not to have a separate reload() method. I'll be looking into
that when I get a chance. There's also the question of a path-based
subclass, but I'm currently not convinced it's worth it.
One piece of feedback from me (triggered by the C extension modules
discussion on python-dev): we should consider proposing a new "exec"
hook for C extension modules that could be defined instead of or in
addition to the existing PEP 3121 init hook.

Extension modules that don't rely on mutable static variables or the
PEP 3121 per-interpreter state APIs could just define the new exec
hook and get a new module instance every time they're imported. Those
that do have per-interpreter state would still get an opportunity to
run additional code after all the magic attributes have been set.

Also, to handle the extension module case, we may need to let loaders
define an optional "create_module" method that accepts the ModuleSpec
object as an argument. The extension module loader would implement
this as handling the PyInit_<modulename> call. (Setting the magic
attributes according to the spec would happen automatically after the
call, so each loader wouldn't need to implement that part)
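A rough sketch of how such a hook might fit, assuming the import system calls an optional ``create_module(spec)`` when present and otherwise creates a plain module. ``create_and_init()`` and ``ExtensionLikeLoader`` are hypothetical, and the latter merely stands in for the ``PyInit_<modulename>`` call:

```python
import types
from types import SimpleNamespace


def create_and_init(spec):
    """Hypothetical import-system helper for a create_module() hook."""
    if hasattr(spec.loader, "create_module"):
        # The loader controls creation (e.g. an extension module's init).
        module = spec.loader.create_module(spec)
    else:
        module = types.ModuleType(spec.name)
    # The import system, not the loader, then sets the magic attributes.
    module.__name__ = spec.name
    module.__loader__ = spec.loader
    module.__spec__ = spec
    return module


class ExtensionLikeLoader:
    """Hypothetical loader standing in for a PyInit_<modulename> call."""

    def create_module(self, spec):
        module = types.ModuleType("placeholder")
        module.created_by_hook = True
        return module


spec = SimpleNamespace(name="ext_demo", loader=ExtensionLikeLoader())
module = create_and_init(spec)
print(module.created_by_hook, module.__name__)  # True ext_demo
```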

(Note: once I get back to Australia around the 22nd, I should have
time to help out more directly with this)
Post by Eric Snow
-----------------------------------
Firstly, any time the import system needs to save information about a
module we end up with more attributes on module objects that are
generally only meaningful to the import system and occoasionally to some
Typo: occoasionally
Post by Eric Snow
people. It would be nice to have a per-module namespace to put future
import-related information. Secondly, there's an API void between
finders and loaders that causes undue complexity when encountered.
Finders are strictly responsible for providing the loader which the
"are currently responsible" (since the PEP is about changing the
responsibility of finders, this is a little unclear at present)
Post by Eric Snow
Specification
=============
The goal is to address the gap between finders and loaders while
changing as little of their semantics as possible. Though some
functionality and information is moved the new ``ModuleSpec`` type,
"moved to the new"
Post by Eric Snow
their semantics should remain the same. However, for the sake of
clarity, those semantics will be explicitly identified.
A High-Level View
-----------------
...
Not sure a high level view is needed, but you can fill this in if you want :)
Post by Eric Snow
ModuleSpec
----------
A new class which defines the import-related values to use when loading
the module. It closely corresponds to the import-related attributes of
module objects. ``ModuleSpec`` objects may also be used by finders and
loaders and other import-related APIs to hold extra import-related
state about the module. This greatly reduces the need to add any new
new import-related attributes to module objects, and loader ``__init__``
methods won't need to accommodate such per-module state.
To avoid conflicts as the spec attributes evolve in the future, would
it be worth having a "custom" field which is just an arbitrary object
reference used to pass info from the finder to the loader without
troubling the rest of the import system?
Post by Eric Snow
``ModuleSpec(name, loader, *, origin=None, filename=None, cached=None,
path=None)``
The parameters have the same meaning as the attributes described below.
However, not all ``ModuleSpec`` attributes are also parameters.
The
passed values are set as-is. For calculated values use the
``from_loader()`` method.
This paragraph isn't particularly clear. Perhaps:

"Passed in parameter values are assigned directly to the corresponding
attributes below. Other attributes not listed as parameters (such as
``package``) are read-only properties that are automatically derived
from these values.

The ``ModuleSpec.from_loader()`` class method allows a suitable
ModuleSpec instance to be easily created from a PEP 302 loader object."
Post by Eric Snow
ModuleSpec Attributes
---------------------
Each of the following names is an attribute on ``ModuleSpec`` objects.
A value of ``None`` indicates "not set". This contrasts with module
objects where the attribute simply doesn't exist.
While ``package`` and ``is_package`` are read-only properties, the
remaining attributes can be replaced after the module spec is created
and after import is complete. This allows for unusual cases where
modifying the spec is the best option. However, typical use should not
involve changing the state of a module's spec.
I'm with Brett that "is_package" should go, to be replaced by
"spec.path is not None" wherever it matters. is_package() would then
fall through to the PEP 302 loader API via __getattr__.
Post by Eric Snow
``package``
The name of the module's parent. This is a dynamic attribute with a
value derived from ``name`` and ``is_package``. For packages it is the
value of ``name``. Otherwise it is equivalent to
``name.rpartition('.')[0]``. Consequently, a top-level module will have
give the empty string for ``package``.
s/give//
Post by Eric Snow
``is_package``
Whether or not the module is a package. This dynamic attribute is True
if ``path`` is set (even if empty), else it is false.
As above (i.e. don't use it)
Post by Eric Snow
``origin``
A string for the location from which the module originates. If
``filename`` is set, ``origin`` should be set to the same value unless
some other value is more appropriate. ``origin`` is used in
``module_repr()`` if it does not match the value of ``filename``.
Using ``filename`` for this meaning would be inaccurate, since not all
modules have path-based locations. For instance, built-in modules do
not have ``__file__`` set. Yet it is useful to have a descriptive
string indicating that it originated from the interpreter as a built-in
module. So built-in modules will have ``origin`` set to ``"built-in"``.
How about we *just* have origin, with a separate "set_fileattr"
attribute to indicate "this is a discrete file, you should set
__file__"?

Also, we should explicitly note that we'll still set __file__ for zip
imports, due to backwards compatibility concerns, even though it
doesn't correspond to a valid filesystem path.

(Random thought: spec.origin + spec.cached + a cache directory setting
in zipimport would give a potentially clean way to do extension module
imports from zip archives)
Post by Eric Snow
``path``
The list of path entries in which to search for submodules if this
module is a package. Otherwise it is ``None``.
Path entries don't have to correspond to filesystem locations - they
just have to make sense to at least one path hook
(e.g. a DB URI would be a valid path entry).
Post by Eric Snow
.. XXX add a path-based subclass?
Nope :)
Post by Eric Snow
ModuleSpec Methods
------------------
``from_loader(name, loader, *, is_package=None, origin=None, filename=None,
cached=None, path=None)``
.. XXX use a different name?
I'd disallow customisation on this one - if people want to customise,
they should just query the PEP 302 APIs themselves and call the
ModuleSpec constructor directly. The use case for this one should be
to make it trivial to switch from "return loader" to "return
ModuleSpec.from_loader(loader)" in a find_module implementation.
Post by Eric Snow
In contrast to ``ModuleSpec.__init__()``, which takes the arguments
as-is, ``from_loader()`` calculates missing values from the ones passed
in, as much as possible. This replaces the behavior that is currently
provided the several ``importlib.util`` functions as well as the
optional ``init_module_attrs()`` method of loaders. Just to be clear,
If not passed in, ``filename`` is to the result of calling the
loader's ``get_filename()``, if available. Otherwise it stays
unset (``None``).
If not passed in, ``path`` is set to an empty list if
``is_package`` is true. Then the directory from ``filename`` is
appended to it, if possible. If ``is_package`` is false, ``path``
stays unset.
If ``cached`` is not passed in and ``filename`` is passed in,
``cached`` is derived from it. For filenames with a source suffix,
it set to the result of calling
``importlib.util.cache_from_source()``. For bytecode suffixes (e.g.
``.pyc``), ``cached`` is set to the value of ``filename``. If
``filename`` is not passed in or ``cache_from_source()`` raises
``NotImplementedError``, ``cached`` stays unset.
If not passed in, ``origin`` is set to ``filename``. Thus if
``filename`` is unset, ``origin`` stays unset.
Hmm, is there a reason this can't be the default constructor
behaviour? What's the value of *not* having the sensible fallbacks,
given they can always be overridden by passing in explicit values when
you want something different?

A separate "from_module(m)" constructor would probably make sense, though.
Post by Eric Snow
``module_repr()``
Returns a repr string for the module if ``origin`` is set and
``filename`` is not set. The string refers to the value of ``origin``.
Otherwise ``module_repr()`` returns None. This indicates to the module
type's ``__repr__()`` that it should fall back to the default repr.
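A minimal sketch of that rule follows, with `types.SimpleNamespace` standing in for a spec. The exact repr format is an assumption. It is modelled on the `<module 'sys' (built-in)>` repr that CPython already produces for built-in modules.

```python
from types import SimpleNamespace


def module_repr(spec):
    """Return a repr only for specs with an origin but no filename."""
    if spec.origin is not None and spec.filename is None:
        return "<module {!r} ({})>".format(spec.name, spec.origin)
    # None tells the module type's __repr__() to use its default repr.
    return None


builtin_spec = SimpleNamespace(name="sys", origin="built-in", filename=None)
file_spec = SimpleNamespace(name="json", origin="/lib/json/__init__.py",
                            filename="/lib/json/__init__.py")
```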
We could also have ``module_repr()`` produce the repr for the case where
``filename`` is set or where ``origin`` is not set, mirroring the repr
that the module type produces directly. However, the repr string is
derived from the import-related module attributes, which might be out of
sync with the spec.
.. XXX Is using the spec close enough? Probably not.
I think it makes sense to always return the expected repr based on the
spec attributes, but allow a custom origin to be passed in to handle
the case where the module __file__ attribute differs from
__spec__.origin (keeping in mind I think __spec__.filename should be
replaced with __spec__.set_fileattr)
Post by Eric Snow
The implementation of the module type's ``__repr__()`` will change to
accommodate this PEP. However, the current functionality will remain to
handle the case where a module does not have a ``__spec__`` attribute.
Experience tells us that the import system should ensure the __spec__
attribute always exists (even if it has to be filled in from the
module attributes after calling load_module)
Post by Eric Snow
``load(module=None, *, is_reload=False)``
Yep, definitely needs to be a separate method. "is_reload" would
almost always be set to a boolean, which means a separate API is
likely to be better.

However, I think the separate method should be "exec()" rather than
"reload()" and require that the module always be passed in.

We could also expose a "create" method that just creates and returns
the new module object, and replace importlib.util.module_to_load with
a context manager that accepted the module as a parameter. Say
"add_to_sys", which fails if the module is already present in
sys.modules.

load() would then look something like:

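    def load(self):
        m = self.create()
        # Proposed helper: fails if self.name is already in sys.modules.
        with importlib.util.add_to_sys(m):
            self.exec(m)
        return sys.modules[self.name]

We could also provide reload() if we wanted to:

    def reload(self):
        # Re-execute in the existing module object instead of creating one.
        self.exec(sys.modules[self.name])
        return sys.modules[self.name]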
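Below is a runnable toy version of the create/exec split. It assumes a hypothetical `SourceSpec` whose "loader" is just a source string. A module-level `add_to_sys()` stands in for the proposed `importlib.util.add_to_sys`, which is not an existing API. Removing the module from `sys.modules` on failure is an assumption, modelled on how the import system already cleans up after a failed import.

```python
import contextlib
import sys
import types


@contextlib.contextmanager
def add_to_sys(module):
    """Hypothetical helper: put module in sys.modules, undo it on failure."""
    name = module.__name__
    if name in sys.modules:
        raise ImportError("{!r} is already in sys.modules".format(name))
    sys.modules[name] = module
    try:
        yield module
    except BaseException:
        # A failed exec must not leave a half-initialised module behind.
        sys.modules.pop(name, None)
        raise


class SourceSpec:
    """Toy spec that executes an in-memory source string (an assumption)."""

    def __init__(self, name, source):
        self.name = name
        self.source = source

    def create(self):
        return types.ModuleType(self.name)

    def exec(self, module):
        code = compile(self.source, "<{}>".format(self.name), "exec")
        exec(code, module.__dict__)  # the builtin exec, not this method

    def load(self):
        m = self.create()
        with add_to_sys(m):
            self.exec(m)
        return sys.modules[self.name]

    def reload(self):
        self.exec(sys.modules[self.name])
        return sys.modules[self.name]


demo_spec = SourceSpec("demo_spec_mod", "x = 40 + 2")
demo_module = demo_spec.load()
```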
Post by Eric Snow
Subclassing
-----------
Subclasses of ModuleSpec are allowed, but should not be necessary.
Adding functionality to a custom finder or loader will likely be a
better fit and should be tried first. However, as long as a subclass
still fulfills the requirements of the import system, objects of that
type are completely fine as the return value of ``find_module()``.
We may need to do subclasses for the ABC registration backwards
compatibility hack.
Post by Eric Snow
Module Objects
--------------
Module objects will now have a ``__spec__`` attribute to which the
module's spec will be bound. None of the other import-related module
attributes will be changed or deprecated, though some of them could be;
any such deprecation can wait until Python 4.
``ModuleSpec`` objects will not be kept in sync with the corresponding
module object's import-related attributes. Though they may differ, in
practice they will typically be the same.
Worth mentioning that __main__.__spec__.name will give the real name
of modules executed with -m here, rather than delaying that until the
notes at the end.
Post by Eric Snow
Finders
-------
Finders will now return ModuleSpec objects when ``find_module()`` is
called rather than loaders. For backward compatibility, ``ModuleSpec``
objects proxy the attributes of their ``loader`` attribute.
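The proxying could work along these lines. This is only a sketch, using a hypothetical `ProxySpec` that forwards any attribute it lacks to its loader via `__getattr__`. Legacy callers that treat the result of `find_module()` as a loader would then keep working.

```python
class ProxySpec:
    """Hypothetical spec that forwards unknown attributes to its loader."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader

    def __getattr__(self, attr):
        # __getattr__ only runs when normal lookup fails, so real spec
        # attributes always win over the loader's.
        if attr == "loader":
            raise AttributeError(attr)  # avoid recursion before __init__ runs
        return getattr(self.loader, attr)


class LegacyLoader:
    """Hypothetical PEP 302-style loader."""

    def load_module(self, fullname):
        return "loaded " + fullname


spec = ProxySpec("spam", LegacyLoader())
# Old-style caller code: it believes find_module() handed it a loader.
result = spec.load_module(spec.name)
```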
Adding another similar method to avoid backward-compatibility issues
is undesirable if avoidable. The import APIs have suffered enough,
especially considering ``PathEntryFinder.find_loader()`` was just
added in Python 3.3. The approach taken by this PEP should be
sufficient to address backward-compatibility issues for
``find_module()``.
The change to ``find_module()`` applies to both ``MetaPathFinder`` and
``PathEntryFinder``. ``PathEntryFinder.find_loader()`` will be
deprecated and, for backward compatibility, implicitly special-cased if
the method exists on a finder.
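Here is a sketch of the special-casing, with hypothetical finders and `types.SimpleNamespace` standing in for `ModuleSpec`. Namespace portions from a legacy `find_loader()` are passed back as-is. How they would be combined into one spec is left open, as in this thread.

```python
from types import SimpleNamespace


def find_spec_compat(finder, name):
    """Return (spec, namespace_portions) from either kind of finder."""
    find_loader = getattr(finder, "find_loader", None)
    if find_loader is None:
        # New-style finder: find_module() already returns a spec or None.
        return finder.find_module(name), []
    # Legacy PathEntryFinder: wrap the loader it returns in a spec.
    loader, portions = find_loader(name)
    spec = None if loader is None else SimpleNamespace(name=name, loader=loader)
    return spec, portions


class NewFinder:
    def find_module(self, name):
        return SimpleNamespace(name=name, loader="new-loader")


class OldFinder:
    def find_loader(self, name):
        return "old-loader", []


class PortionFinder:
    def find_loader(self, name):
        return None, ["/site/nspkg"]


new_spec, _ = find_spec_compat(NewFinder(), "a")
old_spec, _ = find_spec_compat(OldFinder(), "b")
missing, portions = find_spec_compat(PortionFinder(), "nspkg")
```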
Actually, we don't currently have anything on ModuleSpec to indicate
"this is complete, stop scanning for more path fragments" or how we
will compose multiple module specs for the individual fragments into a
combined spec for the namespace package.
Post by Eric Snow
Finders are still responsible for creating the loader. That loader will
now be stored in the module spec returned by ``find_module()`` rather
than returned directly. As is currently the case without the PEP, if a
loader would be costly to create, that loader can be designed to defer
the cost until later.
Loaders
-------
Loaders will have a new method, ``exec_module(module)``. Its only job
is to "exec" the module and consequently populate the module's
namespace. It is not responsible for creating or preparing the module
object, nor for any cleanup afterward. It has no return value.
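Below is a minimal sketch of a loader that implements only `exec_module()`. `StringLoader` and its in-memory source table are hypothetical. The caller, standing in for the import system, creates the module, and the loader only fills in its namespace.

```python
import types


class StringLoader:
    """Hypothetical loader that executes source held in memory."""

    def __init__(self, sources):
        self.sources = sources  # maps module name -> source text

    def exec_module(self, module):
        # Only populate the namespace: creating the module, adding it to
        # sys.modules and cleaning up are left to the import system.
        source = self.sources[module.__name__]
        code = compile(source, "<{}>".format(module.__name__), "exec")
        exec(code, module.__dict__)


loader = StringLoader({"greeting": "message = 'hi'"})
module = types.ModuleType("greeting")
result = loader.exec_module(module)
```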
The ``load_module()`` of loaders will still work and be an active part
of the loader API. It is still useful for cases where the default
module creation/preparation/cleanup is not appropriate for the loader.
For example, the C API for extension modules only supports the full
control of ``load_module()``. As such, ``ExtensionFileLoader`` will not
implement ``exec_module()``. In the future it may be appropriate to
produce a second C API that would support an ``exec_module()``
implementation for ``ExtensionFileLoader``. Such a change is outside
the scope of this PEP.
As above, I think it may be worth tackling this. It shouldn't be *that*
hard given the higher level changes and will solve some hard problems
at the lower level.

Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
Eric Snow
2013-08-13 03:35:14 UTC
Post by Nick Coghlan
I think this is solid enough to be worth adding to the PEPs repo now.
Sounds good.
Post by Nick Coghlan
Post by Eric Snow
Here's an updated version of the PEP for ModuleSpec which addresses the
feedback I've gotten. Thanks for the help. The big open question, to me,
is whether or not to have a separate reload() method. I'll be looking into
that when I get a chance. There's also the question of a path-based
subclass, but I'm currently not convinced it's worth it.
One piece of feedback from me (triggered by the C extension modules
discussion on python-dev): we should consider proposing a new "exec"
hook for C extension modules that could be defined instead of or in
addition to the existing PEP 3121 init hook.
Sounds good. I expect you mean as a separate proposal...
Post by Nick Coghlan
Also, to handle the extension module case, we may need to let loaders
define an optional "create_module" method that accepts the ModuleSpec
object as an argument.
I'd considered that here, whether on the loader or on ModuleSpec. My plan
was to hold off on that to stay focused on the rest of the changes.
However, I'm open to adding this to the PEP.
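Here is a sketch of how an optional `create_module(spec)` hook might be consulted. The method name comes from the suggestion above. Falling back to a plain `types.ModuleType` when the hook is absent is an assumption, as are the hypothetical `CustomModule` and `CustomLoader` classes.

```python
import types


def create_module_for(spec):
    """Create the module object, deferring to the loader when it opts in."""
    create_module = getattr(spec.loader, "create_module", None)
    if create_module is None:
        # Default path: a plain module, to be filled in by exec_module().
        return types.ModuleType(spec.name)
    # e.g. an extension module loader that must build the object itself.
    return create_module(spec)


class CustomModule(types.ModuleType):
    """Hypothetical module subclass a loader might want to return."""


class CustomLoader:
    def create_module(self, spec):
        return CustomModule(spec.name)


plain = create_module_for(types.SimpleNamespace(name="plain", loader=object()))
custom = create_module_for(
    types.SimpleNamespace(name="custom", loader=CustomLoader()))
```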
Post by Nick Coghlan
Post by Eric Snow
A High-Level View
-----------------
...
Not sure a high level view is needed, but you can fill this in if you want :)
Forgot that was in there. :)
Post by Nick Coghlan
Post by Eric Snow
ModuleSpec
----------
A new class which defines the import-related values to use when loading
the module. It closely corresponds to the import-related attributes of
module objects. ``ModuleSpec`` objects may also be used by finders and
loaders and other import-related APIs to hold extra import-related
state about the module. This greatly reduces the need to add any
new import-related attributes to module objects, and loader ``__init__``
methods won't need to accommodate such per-module state.
To avoid conflicts as the spec attributes evolve in the future, would
it be worth having a "custom" field which is just an arbitrary object
reference used to pass info from the finder to the loader without
troubling the rest of the import system?
I see what you're saying, but am conflicted. For some reason providing a
sub-namespace for that doesn't seem quite right. However, the alternative
runs the risk of collisions later on. Maybe we could recommend the use of
a preceding "_" for custom attributes? I'll see if I can come up with
something.
Post by Nick Coghlan
Post by Eric Snow
The parameters have the same meaning as the attributes described below.
However, not all ``ModuleSpec`` attributes are also parameters. The
passed values are set as-is. For calculated values use the
``from_loader()`` method.
"Passed in parameter values are assigned directly to the corresponding
attributes below. Other attributes not listed as parameters (such as
``package``) are read-only properties that are automatically derived
from these values.
The ``ModuleSpec.from_loader()`` class method allows a suitable
ModuleSpec instance to be easily created from a PEP 302 loader object"
That's much better.
Post by Nick Coghlan
Post by Eric Snow
While ``package`` and ``is_package`` are read-only properties, the
remaining attributes can be replaced after the module spec is created
and after import is complete. This allows for unusual cases where
modifying the spec is the best option. However, typical use should not
involve changing the state of a module's spec.
I'm with Brett that "is_package" should go, to be replaced by
"spec.path is not None" wherever it matters. is_package() would then
fall through to the PEP 302 loader API via __getattr__.
I'm considering the recommendation, but I still feel like `is_package` as
an attribute is worth having. I see module.__spec__ as useful to more than
the import system and its hackers, and `is_package` as a value to the
broader audience that may not have learned about what __path__ means. It's
certainly not obvious that __path__ implies a package. Then again, a
person would have to be looking at __spec__ to see `is_package`, so maybe
it loses enough utility to not be worth keeping.
Post by Nick Coghlan
``origin``
Post by Eric Snow
A string for the location from which the module originates. If
``filename`` is set, ``origin`` should be set to the same value unless
some other value is more appropriate. ``origin`` is used in
``module_repr()`` if it does not match the value of ``filename``.
Using ``filename`` for this meaning would be inaccurate, since not all
modules have path-based locations. For instance, built-in modules do
not have ``__file__`` set. Yet it is useful to have a descriptive
string indicating that it originated from the interpreter as a built-in
module. So built-in modules will have ``origin`` set to ``"built-in"``.
How about we *just* have origin, with a separate "set_fileattr"
attribute to indicate "this is a discrete file, you should set
__file__"?
I like that. I'll see how it works. There doesn't seem to be any reason
why you would have two distinct strings for origin and filename. In fact,
that's kind of smelly.

However, I wonder if this is where a PathModuleSpec subclass would be
meaningful. Then no flag would be necessary.
Post by Nick Coghlan
Also, we should explicitly note that we'll still set __file__ for zip
imports, due to backwards compatibility concerns, even though it
doesn't correspond to a valid filesystem path.
Hmm. So deprecate the use of __file__ for anything but actual file names?
Interesting. I was planning on just leaving the current meaning of
"location relative to a path entry".
Post by Nick Coghlan
(Random thought: spec.origin + spec.cached + a cache directory setting
in zipimport would give a potentially clean way to do extension module
imports from zip archives)
That would be cool.
Post by Nick Coghlan
Post by Eric Snow
``path``
The list of path entries in which to search for submodules if this
module is a package. Otherwise it is ``None``.
Path entries don't have to correspond to filesystem locations - they
just have to make sense to at least one path hook
(e.g. a DB URI would be a valid path entry).
Right. I didn't mean to imply that they do.
Post by Nick Coghlan
Post by Eric Snow
.. XXX add a path-based subclass?
Nope :)
I keep vacillating on this.
Post by Nick Coghlan
Post by Eric Snow
ModuleSpec Methods
------------------
``from_loader(name, loader, *, is_package=None, origin=None, filename=None,
cached=None, path=None)``
.. XXX use a different name?
I'd disallow customisation on this one - if people want to customise,
they should just query the PEP 302 APIs themselves and call the
ModuleSpec constructor directly. The use case for this one should be
to make it trivial to switch from "return loader" to "return
ModuleSpec.from_loader(loader)" in a find_module implementation.
What do you mean by disallow customization? Make it "private"?
`from_loader()` is intended for exactly the use that you described.
Post by Nick Coghlan
Post by Eric Snow
In contrast to ``ModuleSpec.__init__()``, which takes the arguments
as-is, ``from_loader()`` calculates missing values from the ones passed
in, as much as possible. This replaces the behavior that is currently
provided by several ``importlib.util`` functions as well as the
optional ``init_module_attrs()`` method of loaders. To be specific:
If not passed in, ``filename`` is set to the result of calling the
loader's ``get_filename()``, if available. Otherwise it stays
unset (``None``).
If not passed in, ``path`` is set to an empty list if
``is_package`` is true. Then the directory from ``filename`` is
appended to it, if possible. If ``is_package`` is false, ``path``
stays unset.
If ``cached`` is not passed in and ``filename`` is passed in,
``cached`` is derived from it. For filenames with a source suffix,
it is set to the result of calling
``importlib.util.cache_from_source()``. For bytecode suffixes (e.g.
``.pyc``), ``cached`` is set to the value of ``filename``. If
``filename`` is not passed in or ``cache_from_source()`` raises
``NotImplementedError``, ``cached`` stays unset.
If not passed in, ``origin`` is set to ``filename``. Thus if
``filename`` is unset, ``origin`` stays unset.
Hmm, is there a reason this can't be the default constructor
behaviour? What's the value of *not* having the sensible fallbacks,
given they can always be overridden by passing in explicit values when
you want something different?
I'll think about this. There was some value in it before, but with changes
to other signatures, `from_loader()` is much less useful as a separate
factory method.
Post by Nick Coghlan
A separate "from_module(m)" constructor would probably make sense, though.
I have this for internal use in the implementation, but did not expose it
since all modules should already have a spec.
Post by Nick Coghlan
``module_repr()``
Post by Eric Snow
Returns a repr string for the module if ``origin`` is set and
``filename`` is not set. The string refers to the value of ``origin``.
Otherwise ``module_repr()`` returns None. This indicates to the module
type's ``__repr__()`` that it should fall back to the default repr.
We could also have ``module_repr()`` produce the repr for the case where
``filename`` is set or where ``origin`` is not set, mirroring the repr
that the module type produces directly. However, the repr string is
derived from the import-related module attributes, which might be out of
sync with the spec.
.. XXX Is using the spec close enough? Probably not.
I think it makes sense to always return the expected repr based on the
spec attributes, but allow a custom origin to be passed in to handle
the case where the module __file__ attribute differs from
__spec__.origin (keeping in mind I think __spec__.filename should be
replaced with __spec__.set_fileattr)
That's the approach that I took at first, but the module that is passed in
is not guaranteed to be a spec. Furthermore, having the spec take
precedence over the module's attrs for the repr seems like too big a
backward-compatibility risk.
Post by Nick Coghlan
Post by Eric Snow
The implementation of the module type's ``__repr__()`` will change to
accommodate this PEP. However, the current functionality will remain to
handle the case where a module does not have a ``__spec__`` attribute.
Experience tells us that the import system should ensure the __spec__
attribute always exists (even if it has to be filled in from the
module attributes after calling load_module)
That's a good point. The only possible problem is for someone that creates
their own module object and expects repr to work the same as it does
currently.
Post by Nick Coghlan
``load(module=None, *, is_reload=False)``
Yep, definitely needs to be a separate method. "is_reload" would
almost always be set to a boolean, which means a separate API is
likely to be better.
Agreed.
Post by Nick Coghlan
However, I think the separate method should be "exec()" rather than
"reload()" and require that the module always be passed in.
I'll see how that looks. It seems like a better fit than just plain
`reload()`.

Post by Nick Coghlan
We could also expose a "create" method that just creates and returns
the new module object, and replace importlib.util.module_to_load with
a context manager that accepted the module as a parameter. Say
"add_to_sys", which fails if the module is already present in
sys.modules.
One of the points of ModuleSpec is to remove the need for
`module_to_load()`. I'm not convinced of the utility of a create method
like you've described other than possibly as something internal to
ModuleSpec.
Post by Nick Coghlan
    def load(self):
        m = self.create()
        with importlib.util.add_to_sys(m):
            self.exec(m)
        return sys.modules[self.name]

    def reload(self):
        self.exec(sys.modules[self.name])
        return sys.modules[self.name]
Post by Eric Snow
Subclassing
-----------
Subclasses of ModuleSpec are allowed, but should not be necessary.
Adding functionality to a custom finder or loader will likely be a
better fit and should be tried first. However, as long as a subclass
still fulfills the requirements of the import system, objects of that
type are completely fine as the return value of ``find_module()``.
We may need to do subclasses for the ABC registration backwards
compatibility hack.
I was thinking of registering ModuleSpec in the setter of a `loader
Post by Nick Coghlan
Post by Eric Snow
Module Objects
--------------
Module objects will now have a ``__spec__`` attribute to which the
module's spec will be bound. None of the other import-related module
attributes will be changed or deprecated, though some of them could be;
any such deprecation can wait until Python 4.
``ModuleSpec`` objects will not be kept in sync with the corresponding
module object's import-related attributes. Though they may differ, in
practice they will typically be the same.
Worth mentioning that __main__.__spec__.name will give the real name
of modules executed with -m here, rather than delaying that until the
notes at the end.
Post by Eric Snow
Finders
-------
Finders will now return ModuleSpec objects when ``find_module()`` is
called rather than loaders. For backward compatibility, ``ModuleSpec``
objects proxy the attributes of their ``loader`` attribute.
Adding another similar method to avoid backward-compatibility issues
is undesirable if avoidable. The import APIs have suffered enough,
especially considering ``PathEntryFinder.find_loader()`` was just
added in Python 3.3. The approach taken by this PEP should be
sufficient to address backward-compatibility issues for
``find_module()``.
The change to ``find_module()`` applies to both ``MetaPathFinder`` and
``PathEntryFinder``. ``PathEntryFinder.find_loader()`` will be
deprecated and, for backward compatibility, implicitly special-cased if
the method exists on a finder.
Actually, we don't currently have anything on ModuleSpec to indicate
"this is complete, stop scanning for more path fragments" or how we
will compose multiple module specs for the individual fragments into a
combined spec for the namespace package.
Post by Eric Snow
Finders are still responsible for creating the loader. That loader will
now be stored in the module spec returned by ``find_module()`` rather
than returned directly. As is currently the case without the PEP, if a
loader would be costly to create, that loader can be designed to defer
the cost until later.
Loaders
-------
Loaders will have a new method, ``exec_module(module)``. Its only job
is to "exec" the module and consequently populate the module's
namespace. It is not responsible for creating or preparing the module
object, nor for any cleanup afterward. It has no return value.
The ``load_module()`` of loaders will still work and be an active part
of the loader API. It is still useful for cases where the default
module creation/preparation/cleanup is not appropriate for the loader.
For example, the C API for extension modules only supports the full
control of ``load_module()``. As such, ``ExtensionFileLoader`` will not
implement ``exec_module()``. In the future it may be appropriate to
produce a second C API that would support an ``exec_module()``
implementation for ``ExtensionFileLoader``. Such a change is outside
the scope of this PEP.
As above, I think it may be worth tackling this. It shouldn't be *that*
hard given the higher level changes and will solve some hard problems
at the lower level.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
Eric Snow
2013-08-13 03:47:27 UTC
Accidentally sent. :P

Continuing...
Post by Eric Snow
Post by Eric Snow
Subclassing
-----------
Post by Eric Snow
Subclasses of ModuleSpec are allowed, but should not be necessary.
Adding functionality to a custom finder or loader will likely be a
better fit and should be tried first. However, as long as a subclass
still fulfills the requirements of the import system, objects of that
type are completely fine as the return value of ``find_module()``.
We may need to do subclasses for the ABC registration backwards
compatibility hack.
I was thinking of registering ModuleSpec in the setter of a `loader`
property (as long as the loader's class has a `register()` method).
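A sketch of that idea follows, using a hypothetical `SpecWithRegistration` class. It checks for `abc.ABCMeta` instead of just a `register` attribute. That is a stricter test than the one described, and it leaves alone ordinary classes that happen to have an unrelated `register()` method.

```python
import abc


class SpecWithRegistration:
    """Hypothetical spec that registers itself with ABC-based loader classes."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader  # runs the property setter below

    @property
    def loader(self):
        return self._loader

    @loader.setter
    def loader(self, loader):
        self._loader = loader
        loader_class = type(loader)
        if isinstance(loader_class, abc.ABCMeta):
            # Makes isinstance(spec, loader_class) true, as it was for the
            # loader object that find_module() used to return.
            loader_class.register(type(self))


class LoaderABC(abc.ABC):
    @abc.abstractmethod
    def load_module(self, fullname):
        raise NotImplementedError


class MyLoader(LoaderABC):
    def load_module(self, fullname):
        return None


abc_spec = SpecWithRegistration("m", MyLoader())
plain_spec = SpecWithRegistration("n", object())
```

Registration applies to the spec class as a whole. Once any ABC-based loader has been seen, every instance of that spec class passes `isinstance()` checks against that loader class.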
Post by Eric Snow
Post by Eric Snow
Module Objects
--------------
Module objects will now have a ``__spec__`` attribute to which the
module's spec will be bound. None of the other import-related module
attributes will be changed or deprecated, though some of them could be;
any such deprecation can wait until Python 4.
``ModuleSpec`` objects will not be kept in sync with the corresponding
module object's import-related attributes. Though they may differ, in
practice they will typically be the same.
Worth mentioning that __main__.__spec__.name will give the real name
of modules executed with -m here, rather than delaying that until the
notes at the end.
Fair enough.
Post by Eric Snow
Post by Eric Snow
Finders
-------
Finders will now return ModuleSpec objects when ``find_module()`` is
called rather than loaders. For backward compatibility, ``ModuleSpec``
objects proxy the attributes of their ``loader`` attribute.
Adding another similar method to avoid backward-compatibility issues
is undesirable if avoidable. The import APIs have suffered enough,
especially considering ``PathEntryFinder.find_loader()`` was just
added in Python 3.3. The approach taken by this PEP should be
sufficient to address backward-compatibility issues for
``find_module()``.
The change to ``find_module()`` applies to both ``MetaPathFinder`` and
``PathEntryFinder``. ``PathEntryFinder.find_loader()`` will be
deprecated and, for backward compatibility, implicitly special-cased if
the method exists on a finder.
Actually, we don't currently have anything on ModuleSpec to indicate
"this is complete, stop scanning for more path fragments" or how we
will compose multiple module specs for the individual fragments into a
combined spec for the namespace package.
I was planning on just using the loader's type. If it's NamespaceLoader
then path is where we'll get the fragments. I was going to say it's
working in my implementation, but namespace packages are actually the one
part that still have some failing tests. :P
Post by Eric Snow
Post by Eric Snow
Finders are still responsible for creating the loader. That loader will
now be stored in the module spec returned by ``find_module()`` rather
than returned directly. As is currently the case without the PEP, if a
loader would be costly to create, that loader can be designed to defer
the cost until later.
Loaders
-------
Loaders will have a new method, ``exec_module(module)``. Its only job
is to "exec" the module and consequently populate the module's
namespace. It is not responsible for creating or preparing the module
object, nor for any cleanup afterward. It has no return value.
The ``load_module()`` of loaders will still work and be an active part
of the loader API. It is still useful for cases where the default
module creation/preparation/cleanup is not appropriate for the loader.
For example, the C API for extension modules only supports the full
control of ``load_module()``. As such, ``ExtensionFileLoader`` will not
implement ``exec_module()``. In the future it may be appropriate to
produce a second C API that would support an ``exec_module()``
implementation for ``ExtensionFileLoader``. Such a change is outside
the scope of this PEP.
As above, I think it may be worth tackling this. It shouldn't be *that*
hard given the higher level changes and will solve some hard problems
at the lower level.
For me that seems like a separate proposal. Certainly it's related, but in
some ways it would feel tacked on. On top of that, I'd have to dive into
the extension module API much more than I have and I'd rather get
ModuleSpec and .ref file wrapped up sooner. At the same time, I haven't
really done much API design in C so that would be interesting. In the end,
I'd like to keep the extension module API additions out of this PEP.

-eric
Nick Coghlan
2013-08-15 15:59:43 UTC
Post by Eric Snow
Post by Nick Coghlan
I think this is solid enough to be worth adding to the PEPs repo now.
Sounds good.
Post by Nick Coghlan
Post by Eric Snow
Here's an updated version of the PEP for ModuleSpec which addresses the
feedback I've gotten. Thanks for the help. The big open question, to me,
is whether or not to have a separate reload() method. I'll be looking into
that when I get a chance. There's also the question of a path-based
subclass, but I'm currently not convinced it's worth it.
One piece of feedback from me (triggered by the C extension modules
discussion on python-dev): we should consider proposing a new "exec"
hook for C extension modules that could be defined instead of or in
addition to the existing PEP 3121 init hook.
Sounds good. I expect you mean as a separate proposal...
I actually meant in this proposal, but strictly speaking, I just need
the "create" part of the API at this level to tie into my ideas for C
extensions :)

Now that this has a PEP number I can reference, I'll try to get
something more fleshed out posted later this week.
Post by Eric Snow
Post by Nick Coghlan
Post by Eric Snow
ModuleSpec
----------
A new class which defines the import-related values to use when loading
the module. It closely corresponds to the import-related attributes of
module objects. ``ModuleSpec`` objects may also be used by finders and
loaders and other import-related APIs to hold extra import-related
state about the module. This greatly reduces the need to add any
new import-related attributes to module objects, and loader ``__init__``
methods won't need to accommodate such per-module state.
To avoid conflicts as the spec attributes evolve in the future, would
it be worth having a "custom" field which is just an arbitrary object
reference used to pass info from the finder to the loader without
troubling the rest of the import system?
I see what you're saying, but am conflicted. For some reason providing a sub-namespace for that doesn't seem quite right. However, the alternative runs the risk of collisions later on. Maybe we could recommend the use of a preceding "_" for custom attributes? I'll see if I can come up with something.
It wouldn't be a custom namespace, just a single attribute to pass
data to the loader. It could be a dict, namespace, string, custom
object, anything. By default, it would be None.

For example, zipimporter could use it to pass the zip archive name to
the loader directly, rather than needing to derive it from origin or
create a custom loader for each find operation.
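Here is a sketch of that single opaque attribute, called `custom` here as in the thread. `StatefulSpec` and the in-memory `ArchiveLoader` are hypothetical, and a real zipimporter would read the archive rather than a dict. One shared loader instance serves every find, because the per-module data travels on the spec.

```python
import types


class StatefulSpec:
    """Hypothetical spec with one opaque slot for finder-to-loader data."""

    def __init__(self, name, loader, *, custom=None):
        self.name = name
        self.loader = loader
        self.custom = custom  # ignored by the rest of the import system


class ArchiveLoader:
    """Hypothetical zip-style loader shared across all find operations."""

    def __init__(self, archives):
        self.archives = archives  # archive name -> {module name: source}

    def exec_module(self, module):
        spec = module.__spec__
        # The finder recorded which archive holds the module in spec.custom.
        source = self.archives[spec.custom][spec.name]
        exec(source, module.__dict__)


loader = ArchiveLoader({"lib.zip": {"helpers": "value = 1"}})
spec = StatefulSpec("helpers", loader, custom="lib.zip")
module = types.ModuleType(spec.name)
module.__spec__ = spec
loader.exec_module(module)
```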
Post by Eric Snow
Post by Nick Coghlan
Post by Eric Snow
While ``package`` and ``is_package`` are read-only properties, the
remaining attributes can be replaced after the module spec is created
and after import is complete. This allows for unusual cases where
modifying the spec is the best option. However, typical use should not
involve changing the state of a module's spec.
I'm with Brett that "is_package" should go, to be replaced by
"spec.path is not None" wherever it matters. is_package() would then
fall through to the PEP 302 loader API via __getattr__.
I'm considering the recommendation, but I still feel like `is_package` as an attribute is worth having. I see module.__spec__ as useful to more than the import system and its hackers, and `is_package` as a value to the broader audience that may not have learned about what __path__ means. It's certainly not obvious that __path__ implies a package. Then again, a person would have to be looking at __spec__ to see `is_package`, so maybe it loses enough utility to not be worth keeping.
I think we need to emphasise the fact that a package is just a module
with a search path attribute *more* rather than less. Don't try to
hide it, shout it from the rooftops :)

Say, something like "spec.submodule_search_path is not None" :)
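The following sketch keeps `is_package` only as a read-only property computed from the search path, so the two can never disagree. `PathSpec` is hypothetical. It keeps the draft's `path` name instead of the `submodule_search_path` name floated above.

```python
class PathSpec:
    """Hypothetical spec where package-ness is derived, never stored."""

    def __init__(self, name, loader=None, *, path=None):
        self.name = name
        self.loader = loader
        self.path = path  # submodule search path; None for plain modules

    @property
    def is_package(self):
        # A package is just a module that has a submodule search path,
        # even if that path is currently empty.
        return self.path is not None


pkg = PathSpec("email", path=["/lib/email"])
ns = PathSpec("nspkg", path=[])
mod = PathSpec("json.decoder")
```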
Post by Eric Snow
Post by Nick Coghlan
How about we *just* have origin, with a separate "set_fileattr"
attribute to indicate "this is a discrete file, you should set
__file__"?
I like that. I'll see how it works. There doesn't seem to be any reason why you would have two distinct strings for origin and filename. In fact, that's kind of smelly.
However, I wonder if this is where a PathModuleSpec subclass would be meaningful. Then no flag would be necessary.
I realised we may not need a separate flag at all: how about we key
this off "hasattr(self.loader, 'get_data')"? And expose that as a
"is_location" read-only property? (I like PJE's suggestion of
"location" as a name for modules which may be used with a loader that
supports the get_data API)

(Tangent: at some point in the future, we could define an "open"
method on spec objects. This would do the path munging relative to
origin automatically, using the opener argument to the builtin open to
back it with BytesIO and the get_data API on the loader. If loaders
defined an "opener" method, then the spec could use that instead)
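Here is a sketch of both ideas together, with hypothetical `LocationSpec` and `DictDataLoader` classes. For brevity, `open()` wraps `get_data()` in `io.BytesIO` directly instead of routing through the `opener` argument to the builtin `open()`. It supports binary reads only.

```python
import io
import os.path


class LocationSpec:
    """Hypothetical spec exposing is_location and a simplified open()."""

    def __init__(self, name, loader, origin):
        self.name = name
        self.loader = loader
        self.origin = origin

    @property
    def is_location(self):
        # Keyed off the loader supporting the get_data() API.
        return hasattr(self.loader, "get_data")

    def open(self, relative):
        """Return a binary file-like object for a resource beside origin."""
        if not self.is_location:
            raise TypeError("{} has no location".format(self.name))
        target = os.path.join(os.path.dirname(self.origin), relative)
        return io.BytesIO(self.loader.get_data(target))


class DictDataLoader:
    """Hypothetical loader whose get_data() reads from an in-memory dict."""

    def __init__(self, files):
        self.files = files

    def get_data(self, path):
        return self.files[path]


data_path = os.path.join(os.path.dirname("/pkg/__init__.py"), "data.txt")
loader = DictDataLoader({data_path: b"hello"})
pkg_spec = LocationSpec("pkg", loader, "/pkg/__init__.py")
builtin_spec = LocationSpec("sys", object(), "built-in")
```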
Post by Eric Snow
Post by Nick Coghlan
Post by Eric Snow
ModuleSpec Methods
------------------
``from_loader(name, loader, *, is_package=None, origin=None, filename=None,
cached=None, path=None)``
.. XXX use a different name?
I'd disallow customisation on this one - if people want to customise,
they should just query the PEP 302 APIs themselves and call the
ModuleSpec constructor directly. The use case for this one should be
to make it trivial to switch from "return loader" to "return
ModuleSpec.from_loader(loader)" in a find_module implementation.
What do you mean by disallow customization? Make it "private"? `from_loader()` is intended for exactly the use that you described.
The keyword arguments. If from_loader stays, it shouldn't allow you to
override the values derived from the loader - if you want to do that,
just read the values you want to keep from the loader and pass them in
explicitly.
Post by Eric Snow
Post by Nick Coghlan
A separate "from_module(m)" constructor would probably make sense, though.
I have this for internal use in the implementation, but did not expose it since all modules should already have a spec.
It's more for the benefit of adapting existing loaders - since they
already have the code to initialise the module, we should make it easy
for them to just initialise a throwaway module and convert it to a
spec object, rather than having to completely rewrite their
initialisation code to be spec based.
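A rough sketch of the adapter Nick describes: an existing loader initialises a throwaway module the way it already does, and a hypothetical `from_module()` classmethod reads the spec values back off it. `SpecSketch`, `from_module()` and `legacy_init()` are illustrative assumptions, not the PEP's API.

```python
import types


class SpecSketch:
    """Hypothetical stand-in for the proposed ModuleSpec."""

    def __init__(self, name, loader, *, origin=None, cached=None, path=None):
        self.name = name
        self.loader = loader
        self.origin = origin
        self.cached = cached
        self.path = path

    @classmethod
    def from_module(cls, module):
        # Read back the import-related attributes a legacy loader already set.
        return cls(
            module.__name__,
            getattr(module, "__loader__", None),
            origin=getattr(module, "__file__", None),
            cached=getattr(module, "__cached__", None),
            path=getattr(module, "__path__", None),
        )


def legacy_init(module, loader):
    """Stands in for an existing loader's attribute-setting code."""
    module.__loader__ = loader
    module.__file__ = "/src/example.py"


throwaway = types.ModuleType("example")
legacy_init(throwaway, loader=object())
spec = SpecSketch.from_module(throwaway)
print(spec.name, spec.origin, spec.path)  # example /src/example.py None
```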
Post by Eric Snow
Post by Nick Coghlan
Post by Eric Snow
``module_repr()``
Returns a repr string for the module if ``origin`` is set and
``filename`` is not set. The string refers to the value of ``origin``.
Otherwise ``module_repr()`` returns None. This indicates to the module
type's ``__repr__()`` that it should fall back to the default repr.
We could also have ``module_repr()`` produce the repr for the case where
``filename`` is set or where ``origin`` is not set, mirroring the repr
that the module type produces directly. However, the repr string is
derived from the import-related module attributes, which might be out of
sync with the spec.
.. XXX Is using the spec close enough? Probably not.
I think it makes sense to always return the expected repr based on the
spec attributes, but allow a custom origin to be passed in to handle
the case where the module __file__ attribute differs from
__spec__.origin (keeping in mind I think __spec__.filename should be
replaced with __spec__.set_fileattr)
That's the approach that I took at first, but the module that is passed in is not guaranteed to be a spec. Furthermore, having the spec take precedence over the module's attrs for the repr seems like too big a backward-compatibility risk.
I don't understand your response. Simplifying the API a bit to allow a
module to be passed in directly, ModuleType.__repr__ would just call
it like this:

self.__spec__.module_repr(self)

All the logic would be in one place (ModuleSpec), but modules could
still override the original values with the actual settings in the
module namespace.
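A minimal sketch of the delegation Nick suggests, assuming a hypothetical `module_repr(module)` method on the spec and the `set_fileattr` flag proposed earlier in the thread. Values in the module namespace take precedence over the spec's; the repr formats simply mimic the module type's existing ones.

```python
import types


class SpecSketch:
    """Hypothetical stand-in; module_repr(module) is the API floated above."""

    def __init__(self, name, origin=None, set_fileattr=False):
        self.name = name
        self.origin = origin
        self.set_fileattr = set_fileattr

    def module_repr(self, module):
        # Values in the module namespace win over the spec's originals.
        name = getattr(module, "__name__", self.name)
        if self.set_fileattr:
            location = getattr(module, "__file__", self.origin)
            return "<module {!r} from {!r}>".format(name, location)
        if self.origin is not None:
            return "<module {!r} ({})>".format(name, self.origin)
        return "<module {!r}>".format(name)


mod = types.ModuleType("pkg.mod")
mod.__spec__ = SpecSketch("pkg.mod", origin="/old/mod.py", set_fileattr=True)
mod.__file__ = "/new/mod.py"  # Differs from __spec__.origin.
print(mod.__spec__.module_repr(mod))  # <module 'pkg.mod' from '/new/mod.py'>
```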
Post by Eric Snow
Post by Nick Coghlan
Post by Eric Snow
The implementation of the module type's ``__repr__()`` will change to
accommodate this PEP. However, the current functionality will remain to
handle the case where a module does not have a ``__spec__`` attribute.
Experience tells us that the import system should ensure the __spec__
attribute always exists (even if it has to be filled in from the
module attributes after calling load_module)
That's a good point. The only possible problem is for someone that creates their own module object and expects repr to work the same as it does currently.
Hmm, true - however, we can handle that by creating and throwing away
a dummy spec object rather than duplicating the logic.
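Sketching the fallback Nick mentions, under the assumption that the repr logic lives on the spec: a hand-made module without a `__spec__` gets a throwaway spec built from its legacy attributes. `repr_module()` and `SpecSketch` are hypothetical.

```python
import types


class SpecSketch:
    """Hypothetical stand-in for the proposed ModuleSpec."""

    def __init__(self, name, origin=None):
        self.name = name
        self.origin = origin

    def module_repr(self, module):
        if self.origin is not None:
            return "<module {!r} from {!r}>".format(module.__name__, self.origin)
        return "<module {!r}>".format(module.__name__)


def repr_module(module):
    """Hypothetical module repr: use __spec__, else a throwaway spec."""
    spec = getattr(module, "__spec__", None)
    if spec is None:
        # Build a dummy spec from the legacy attributes and discard it,
        # so all repr logic stays in one place.
        spec = SpecSketch(module.__name__, origin=getattr(module, "__file__", None))
    return spec.module_repr(module)


handmade = types.ModuleType("handmade")  # Created without going through import.
handmade.__file__ = "/tmp/handmade.py"
print(repr_module(handmade))  # <module 'handmade' from '/tmp/handmade.py'>
```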
Post by Eric Snow
Post by Nick Coghlan
We could also expose a "create" method that just creates and returns
the new module object, and replace importlib.util.module_to_load with
a context manager that accepted the module as a parameter. Say
"add_to_sys", which fails if the module is already present in
sys.modules.
One of the points of ModuleSpec is to remove the need for `module_to_load()`. I'm not convinced of the utility of a create method like you've described other than possibly as something internal to ModuleSpec.
Splitting create and exec should eventually let me delete a bunch of
code from runpy :)
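The create/exec split plus Nick's `add_to_sys` idea might look roughly like this. `create()`, `exec_source()` and `add_to_sys()` are hypothetical helpers. Removing the entry again when exec raises is an assumption that mirrors how failed imports are normally cleaned up.

```python
import contextlib
import sys
import types


@contextlib.contextmanager
def add_to_sys(module):
    """Hypothetical helper: register module in sys.modules while it executes.

    Fails if the name is already present; removes the entry again if
    executing the module raises.
    """
    name = module.__name__
    if name in sys.modules:
        raise ImportError("{!r} is already in sys.modules".format(name))
    sys.modules[name] = module
    try:
        yield module
    except BaseException:
        del sys.modules[name]
        raise


def create(name):
    """Hypothetical 'create' step: only makes the module object."""
    return types.ModuleType(name)


def exec_source(module, source):
    """Hypothetical 'exec' step: runs code in the module namespace."""
    exec(source, module.__dict__)


module = create("demo_create_exec")
with add_to_sys(module):
    exec_source(module, "answer = 6 * 7")
print(sys.modules["demo_create_exec"].answer)  # 42
```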

Cheers,
Nick.
Nick Coghlan
2013-08-24 11:50:24 UTC
Post by Nick Coghlan
It wouldn't be a custom namespace, just a single attribute to pass
data to the loader. It could be a dict, namespace, string, custom
object, anything. By default, it would be None.
For example, zipimporter could use it to pass the zip archive name to
the loader directly, rather than needing to derive it from origin or
create a custom loader for each find operation.
Having implemented the "exec" part of the C extension modifications
(see http://mail.python.org/pipermail/python-dev/2013-August/128101.html),
I'm more convinced than ever that ModuleSpec should have some kind of
a subnamespace for storing arbitrary loader specific details.
Providing such a storage location not only allows information to be
passed from the finder to the loader, but also from the create step to
the exec step in the loading process (the C extension loader would, of
necessity, find the execution entry point while determining how to
create the module. It makes sense to be able to store that somewhere
on the spec object, rather than having to go searching through the
exported symbols again in the execution step.
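A sketch of the create-to-exec hand-off Nick describes, assuming a `loading_info` attribute (Eric's working name) that is opaque to everything but the loader. The toy loader and its `init_<name>` lookup are hypothetical; a dict of callables stands in for a shared library's exported symbols.

```python
import types


class SpecSketch:
    """Hypothetical stand-in for the proposed ModuleSpec."""

    def __init__(self, name, loader, loading_info=None):
        self.name = name
        self.loader = loader
        # Opaque, loader-specific data; None by default.
        self.loading_info = loading_info


class ExtensionLikeLoader:
    """Toy loader; a dict of callables stands in for exported symbols."""

    def __init__(self, exports):
        self.exports = exports

    def create_module(self, spec):
        # Finding the entry point is needed to create the module anyway...
        entry = self.exports["init_" + spec.name]
        # ...so stash it on the spec instead of searching again in exec.
        spec.loading_info = {"entry_point": entry}
        return types.ModuleType(spec.name)

    def exec_module(self, module):
        module.__spec__.loading_info["entry_point"](module)


def init_fast(module):
    module.ready = True


loader = ExtensionLikeLoader({"init_fast": init_fast})
spec = SpecSketch("fast", loader)
mod = loader.create_module(spec)
mod.__spec__ = spec  # The import system would normally do this.
loader.exec_module(mod)
print(mod.ready)  # True
```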

Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
Eric Snow
2013-08-28 08:53:19 UTC
Post by Nick Coghlan
Post by Nick Coghlan
It wouldn't be a custom namespace, just a single attribute to pass
data to the loader. It could be a dict, namespace, string, custom
object, anything. By default, it would be None.
For example, zipimporter could use it to pass the zip archive name to
the loader directly, rather than needing to derive it from origin or
create a custom loader for each find operation.
Having implemented the "exec" part of the C extension modifications
(see http://mail.python.org/pipermail/python-dev/2013-August/128101.html),
I'm more convinced than ever that ModuleSpec should have some kind of
a subnamespace for storing arbitrary loader specific details.
Providing such a storage location not only allows information to be
passed from the finder to the loader, but also from the create step to
the exec step in the loading process (the C extension loader would, of
necessity, find the execution entry point while determining how to
create the module. It makes sense to be able to store that somewhere
on the spec object, rather than having to go searching through the
exported symbols again in the execution step.
Okay, I'm sold. For now I'm calling it "loading_info", but that name
sounds kind of lame.

FYI, I have an update of the PEP up. I've posted it to this list so it may
show up in a day or two. <wink>

-eric
Nick Coghlan
2013-08-28 09:26:43 UTC
Post by Eric Snow
Post by Nick Coghlan
Post by Nick Coghlan
It wouldn't be a custom namespace, just a single attribute to pass
data to the loader. It could be a dict, namespace, string, custom
object, anything. By default, it would be None.
For example, zipimporter could use it to pass the zip archive name to
the loader directly, rather than needing to derive it from origin or
create a custom loader for each find operation.
Having implemented the "exec" part of the C extension modifications
(see http://mail.python.org/pipermail/python-dev/2013-August/128101.html),
Post by Eric Snow
Post by Nick Coghlan
I'm more convinced than ever that ModuleSpec should have some kind of
a subnamespace for storing arbitrary loader specific details.
Providing such a storage location not only allows information to be
passed from the finder to the loader, but also from the create step to
the exec step in the loading process (the C extension loader would, of
necessity, find the execution entry point while determining how to
create the module. It makes sense to be able to store that somewhere
on the spec object, rather than having to go searching through the
exported symbols again in the execution step.
Okay, I'm sold. For now I'm calling it "loading_info", but that name
sounds kind of lame.

I realised that if we're going to allow mutating the spec in create, we're
going to have to promise not to reuse them across load calls. So loaders
can be shared, but specs can't.
Post by Eric Snow
FYI, I have an update of the PEP up. I've posted it to this list so it
may show up in a day or two. <wink>

Heh :)

Cheers,
Nick.
Post by Eric Snow
-eric
Eric V. Smith
2013-08-28 09:49:33 UTC
On 28 Aug 2013 18:53, "Eric Snow" <ericsnowcurrently at gmail.com
Post by Eric Snow
FYI, I have an update of the PEP up. I've posted it to this list so
it may show up in a day or two. <wink>
Heh :)
No matter how big I make the message limit, the PEP seems to exceed it!
I'll release it shortly.
--
Eric.
Eric Snow
2013-08-28 16:06:04 UTC
Post by Eric V. Smith
No matter how big I make the message limit, the PEP seems to exceed it!
I'll release it shortly.
Sorry for the trouble. I appreciate you running the list and dealing with
my verbose PEP. :)

-eric
Eric Snow
2013-08-28 14:43:16 UTC
Post by Nick Coghlan
I realised that if we're going to allow mutating the spec in create,
we're going to have to promise not to reuse them across load calls. So
loaders can be shared, but specs can't.

The latest version of the PEP already specifies that each module will have
its own copy, even if the spec is otherwise the same. Perhaps it should
also make clear that loading_info should not be shared between specs. It
wouldn't hurt to also say something about allowing only one call to load()
or something along those lines.

-eric
Eric Snow
2013-08-28 16:04:39 UTC
Post by Eric Snow
Post by Nick Coghlan
I realised that if we're going to allow mutating the spec in create,
we're going to have to promise not to reuse them across load calls. So
loaders can be shared, but specs can't.
The latest version of the PEP already specifies that each module will have
its own copy, even if the spec is otherwise the same. Perhaps it should
also make clear that loading_info should not be shared between specs. It
wouldn't hurt to also say something about allowing only one call to load()
or something along those lines.
I see three options:

1. We advise against calling ModuleSpec.create() and ModuleSpec.load() more
than once.
2. ModuleSpec's create() and load() programmatically disallow (or otherwise
handle) being called more than once.
3. Dictate that Loader.create_module() must handle the case where it is
called more than once. Fail? Return None? Return the same module as
before?

I'll advocate for 3 along with making sure ModuleSpec.create() correctly
handles the exceptional response of Loader.create_module(). However, the
PEP does not really specify what happens when create() and load() are
called multiple times. That needs to be added. I'm tempted to have load()
simply return whatever is in sys.modules and bypass loading if the module
is already loaded. And create() would simply return a new, prepared
module, with special handling for the Loader.create_module() exceptional
case.

Really, the sticky part is the (potential) call to Loader.create_module()
in ModuleSpec.create(). Otherwise it should not matter. ModuleSpec.exec()
should be able to be called as many times as desired, just like
Loader.load_module() (and Loader.exec_module()).
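Eric's tentative behaviour, where load() returns whatever is already in `sys.modules` while exec() stays safe to repeat, can be sketched like this. It is a hypothetical illustration, not settled PEP text; a source string stands in for a real loader.

```python
import builtins
import sys
import types


class SpecSketch:
    """Hypothetical stand-in showing the 'load() bypasses if loaded' idea."""

    def __init__(self, name, source):
        self.name = name
        self.source = source  # Toy replacement for a real loader.

    def create(self):
        # Always returns a new, prepared module.
        module = types.ModuleType(self.name)
        module.__spec__ = self
        return module

    def exec(self, module):
        # Safe to repeat, like Loader.load_module()/exec_module().
        builtins.exec(self.source, module.__dict__)

    def load(self):
        # Bypass loading when the module is already in sys.modules.
        if self.name in sys.modules:
            return sys.modules[self.name]
        module = self.create()
        sys.modules[self.name] = module
        try:
            self.exec(module)
        except BaseException:
            del sys.modules[self.name]
            raise
        return module


spec = SpecSketch("demo_load_once", "counter = globals().get('counter', 0) + 1")
first = spec.load()
second = spec.load()
print(first is second, first.counter)  # True 1
```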

-eric
Brett Cannon
2013-08-28 17:27:12 UTC
Post by Eric Snow
Post by Eric Snow
Post by Nick Coghlan
I realised that if we're going to allow mutating the spec in create,
we're going to have to promise not to reuse them across load calls. So
loaders can be shared, but specs can't.
The latest version of the PEP already specifies that each module will
have its own copy, even if the spec is otherwise the same. Perhaps it
should also make clear that loading_info should not be shared between
specs. It wouldn't hurt to also say something about allowing only one call
to load() or something along those lines.
1. We advise against calling ModuleSpec.create() and ModuleSpec.load()
more than once.
2. ModuleSpec's create() and load() programmatically disallow (or
otherwise handle) being called more than once.
3. Dictate that Loader.create_module() must handle the case where it is
called more than once. Fail? Return None? Return the same module as
before?
I'll advocate for 3 along with making sure ModuleSpec.create() correctly
handles the exceptional response of Loader.create_module(). However, the
PEP does not really specify what happens when create() and load() are
called multiple times. That needs to be added. I'm tempted to have load()
simply return whatever is in sys.modules and bypass loading if the module
is already loaded.
Isn't that the point of reload() sans the blind return? This is heading
down the road of trying to worry about stuff that will likely never happen
except by people trying to bypass the import system and thus are just
asking to get screwed up. We shouldn't bend over to block it (or support
it).

-Brett
Post by Eric Snow
And create() would simply return a new, prepared module, with special
handling for the Loader.create_module() exceptional case.
Really, the sticky part is the (potential) call to Loader.create_module()
in ModuleSpec.create(). Otherwise it should not matter. ModuleSpec.exec()
should be able to be called as many times as desired, just like
Loader.load_module() (and Loader.exec_module()).
-eric
Eric Snow
2013-08-28 19:40:57 UTC
On Wed, Aug 28, 2013 at 12:04 PM, Eric Snow <ericsnowcurrently at gmail.com>
On Wed, Aug 28, 2013 at 8:43 AM, Eric Snow <ericsnowcurrently at gmail.com>
Post by Eric Snow
Post by Nick Coghlan
I realised that if we're going to allow mutating the spec in create,
we're going to have to promise not to reuse them across load calls. So
loaders can be shared, but specs can't.
Post by Eric Snow
The latest version of the PEP already specifies that each module will
have its own copy, even if the spec is otherwise the same. Perhaps it
should also make clear that loading_info should not be shared between
specs. It wouldn't hurt to also say something about allowing only one call
to load() or something along those lines.
1. We advise against calling ModuleSpec.create() and ModuleSpec.load()
more than once.
2. ModuleSpec's create() and load() programmatically disallow (or
otherwise handle) being called more than once.
3. Dictate that Loader.create_module() must handle the case where it is
called more than once. Fail? Return None? Return the same module as
before?
I'll advocate for 3 along with making sure ModuleSpec.create() correctly
handles the exceptional response of Loader.create_module(). However, the
PEP does not really specify what happens when create() and load() are
called multiple times. That needs to be added. I'm tempted to have load()
simply return whatever is in sys.modules and bypass loading if the module
is already loaded.
Isn't that the point of reload() sans the blind return? This is heading
down the road of trying to worry about stuff that will likely never happen
except by people trying to bypass the import system and thus are just
asking to get screwed up. We shouldn't bend over to block it (or support
it).

I'm fine with that. My only concern is the case where people take
advantage of the spec methods to directly load/reload/etc. and it behaves
in an unexpected way.

-eric
Brett Cannon
2013-08-28 20:20:49 UTC
Post by Eric Snow
On Wed, Aug 28, 2013 at 12:04 PM, Eric Snow <ericsnowcurrently at gmail.com>
On Wed, Aug 28, 2013 at 8:43 AM, Eric Snow <ericsnowcurrently at gmail.com>
Post by Eric Snow
Post by Nick Coghlan
I realised that if we're going to allow mutating the spec in create,
we're going to have to promise not to reuse them across load calls. So
loaders can be shared, but specs can't.
Post by Eric Snow
The latest version of the PEP already specifies that each module will
have its own copy, even if the spec is otherwise the same. Perhaps it
should also make clear that loading_info should not be shared between
specs. It wouldn't hurt to also say something about allowing only one call
to load() or something along those lines.
1. We advise against calling ModuleSpec.create() and ModuleSpec.load()
more than once.
2. ModuleSpec's create() and load() programmatically disallow (or
otherwise handle) being called more than once.
3. Dictate that Loader.create_module() must handle the case where it is
called more than once. Fail? Return None? Return the same module as
before?
I'll advocate for 3 along with making sure ModuleSpec.create()
correctly handles the exceptional response of Loader.create_module().
However, the PEP does not really specify what happens when create() and
load() are called multiple times. That needs to be added. I'm tempted to
have load() simply return whatever is in sys.modules and bypass loading if
the module is already loaded.
Isn't that the point of reload() sans the blind return? This is heading
down the road of trying to worry about stuff that will likely never happen
except by people trying to bypass the import system and thus are just
asking to get screwed up. We shouldn't bend over to block it (or support
it).
I'm fine with that. My only concern is the case where people take
advantage of the spec methods to directly load/reload/etc. and it behaves
in an unexpected way.
They shouldn't do that. =) If it is that big of a worry then the methods
could shift to importlib.abc.Loader and be completely removed from
ModuleSpec to make it very obvious they should not be trifled with unless
you know what you are doing.
Brett Cannon
2013-08-28 17:25:15 UTC
Post by Eric Snow
Post by Nick Coghlan
I realised that if we're going to allow mutating the spec in create,
we're going to have to promise not to reuse them across load calls. So
loaders can be shared, but specs can't.
The latest version of the PEP already specifies that each module will have
its own copy, even if the spec is otherwise the same. Perhaps it should
also make clear that loading_info should not be shared between specs.
That's really none of our business. If loading_info is going to be up to
the finder to populate and the loader to consume as an opaque thing then we
should not dictate its usage, just say that only the corresponding loader
for the finder should use that object and that people should not expect its
interface to be stable.
Post by Eric Snow
It wouldn't hurt to also say something about allowing only one call to
load() or something along those lines.
Why? You can create objects constantly. You should say you expect people to
use reload() to reload things, but otherwise what if I truly want to reset
the module and start from scratch with a second call to load()?

-Brett
Post by Eric Snow
-eric
Eric Snow
2013-08-28 19:34:43 UTC
On Wed, Aug 28, 2013 at 10:43 AM, Eric Snow <ericsnowcurrently at gmail.com>
Post by Eric Snow
Post by Nick Coghlan
I realised that if we're going to allow mutating the spec in create,
we're going to have to promise not to reuse them across load calls. So
loaders can be shared, but specs can't.
Post by Eric Snow
The latest version of the PEP already specifies that each module will
have its own copy, even if the spec is otherwise the same. Perhaps it
should also make clear that loading_info should not be shared between specs.
That's really none of our business. If loading_info is going to be up to
the finder to populate and the loader to consume as an opaque thing then we
should not dictate its usage, just say that only the corresponding loader
for the finder should use that object and that people should not expect its
interface to be stable.

Fair enough.
Post by Eric Snow
It wouldn't hurt to also say something about allowing only one call to
load() or something along those lines.
Why? You can create objects constantly. You should say you expect people
to use reload() to reload things, but otherwise what if I truly want to
reset the module and start from scratch with a second call to load()?

That's fine. I'll just make sure to note what happens when the different
spec methods are called more than once. If a loader can't handle multiple
create_module() calls, I'd expect an ImportError.
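A hypothetical loader illustrating Eric's expectation that a loader unable to handle repeated `create_module()` calls raises `ImportError`. `OneShotLoader` is an illustrative name, and a `SimpleNamespace` stands in for the spec.

```python
import types


class OneShotLoader:
    """Hypothetical loader that can create its module only once."""

    def __init__(self):
        self._created = False

    def create_module(self, spec):
        if self._created:
            # Report the unsupported repeat call as an ImportError.
            raise ImportError("cannot create {!r} twice".format(spec.name),
                              name=spec.name)
        self._created = True
        return types.ModuleType(spec.name)


loader = OneShotLoader()
spec = types.SimpleNamespace(name="once")  # Stand-in for a ModuleSpec.
first = loader.create_module(spec)
try:
    loader.create_module(spec)
    error = None
except ImportError as exc:
    error = exc
print(type(error).__name__, error.name)  # ImportError once
```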

-eric
Brett Cannon
2013-08-11 20:08:26 UTC
Post by Eric Snow
Here's an updated version of the PEP for ModuleSpec which addresses the
feedback I've gotten. Thanks for the help. The big open question, to me,
is whether or not to have a separate reload() method. I'll be looking into
that when I get a chance. There's also the question of a path-based
subclass, but I'm currently not convinced it's worth it.
-eric
-----------------------------------
PEP: 4XX
Title: A ModuleSpec Type for the Import System
Version: $Revision$
Last-Modified: $Date$
Author: Eric Snow <ericsnowcurrently at gmail.com>
BDFL-Delegate: ???
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 8-Aug-2013
Python-Version: 3.4
Post-History: 8-Aug-2013
Abstract
========
This PEP proposes to add a new class to ``importlib.machinery`` called
``ModuleSpec``. It will contain all the import-related information
about a module without needing to load the module first. Finders will
now return a module's spec rather than a loader. The import system will
use the spec to load the module.
Motivation
==========
The import system has evolved over the lifetime of Python. In late 2002
PEP 302 introduced standardized import hooks via ``finders`` and
``loaders`` and ``sys.meta_path``. The ``importlib`` module, introduced
with Python 3.1, now exposes a pure Python implementation of the APIs
described by PEP 302, as well as of the full import system. It is now
much easier to understand and extend the import system. While a benefit
to the Python community, this greater accessibility also presents a
challenge.
As more developers come to understand and customize the import system,
any weaknesses in the finder and loader APIs will be more impactful. So
the sooner we can address any such weaknesses in the import system, the
better...and there are a couple we can take care of with this proposal.
Firstly, any time the import system needs to save information about a
module we end up with more attributes on module objects that are
generally only meaningful to the import system and occasionally to some
people. It would be nice to have a per-module namespace to put future
import-related information. Secondly, there's an API void between
finders and loaders that causes undue complexity when encountered.
Finders are strictly responsible for providing the loader which the
import system will use to load the module. The loader is then
responsible for doing some checks, creating the module object, setting
import-related attributes, "installing" the module to ``sys.modules``,
and loading the module, along with some cleanup. This all takes place
during the import system's call to ``Loader.load_module()``. Loaders
also provide some APIs for accessing data associated with a module.
Loaders are not required to provide any of the functionality of
``load_module()`` through other methods. Thus, though the import-
related information about a module is likely available without loading
the module, it is not otherwise exposed.
Furthermore, the requirements associated with ``load_module()`` are
common to all loaders and mostly are implemented in exactly the same
way. This means every loader has to duplicate the same boilerplate
code. ``importlib.util`` provides some tools that help with this, but
it would be more helpful if the import system simply took charge of
these responsibilities. The trouble is that this would limit the degree
of customization that ``load_module()`` facilitates. This is a gap
between finders and loaders which this proposal aims to fill.
Finally, when the import system calls a finder's ``find_module()``, the
finder makes use of a variety of information about the module that is
useful outside the context of the method. Currently the options are
limited for persisting that per-module information past the method call,
since it only returns the loader. Popular options for this limitation
are to store the information in a module-to-info mapping somewhere on
the finder itself, or store it on the loader.
Unfortunately, loaders are not required to be module-specific. On top
of that, some of the useful information finders could provide is
common to all finders, so ideally the import system could take care of
that. This is the same gap as before between finders and loaders.
As an example of complexity attributable to this flaw, the
implementation of namespace packages in Python 3.3 (see PEP 420) added
``FileFinder.find_loader()`` because there was no good way for
``find_module()`` to provide the namespace path.
The answer to this gap is a ``ModuleSpec`` object that contains the
per-module information and takes care of the boilerplate functionality
of loading the module.
(The idea gained momentum during discussions related to another PEP.[1])
Specification
=============
The goal is to address the gap between finders and loaders while
changing as little of their semantics as possible. Though some
functionality and information are moved to the new ``ModuleSpec`` type,
their semantics should remain the same. However, for the sake of
clarity, those semantics will be explicitly identified.
A High-Level View
-----------------
...
ModuleSpec
----------
A new class which defines the import-related values to use when loading
the module. It closely corresponds to the import-related attributes of
module objects. ``ModuleSpec`` objects may also be used by finders and
loaders and other import-related APIs to hold extra import-related
state about the module. This greatly reduces the need to add any new
import-related attributes to module objects, and loader ``__init__``
methods won't need to accommodate such per-module state.
``ModuleSpec(name, loader, *, origin=None, filename=None, cached=None,
path=None)``
The parameters have the same meaning as the attributes described below.
However, not all ``ModuleSpec`` attributes are also parameters. The
passed values are set as-is. For calculated values use the
``from_loader()`` method.
ModuleSpec Attributes
---------------------
Each of the following names is an attribute on ``ModuleSpec`` objects.
A value of ``None`` indicates "not set". This contrasts with module
objects where the attribute simply doesn't exist.
While ``package`` and ``is_package`` are read-only properties, the
remaining attributes can be replaced after the module spec is created
and after import is complete. This allows for unusual cases where
modifying the spec is the best option. However, typical use should not
involve changing the state of a module's spec.
Most of the attributes correspond to the import-related attributes of
modules. Here is the mapping, followed by a description of the
attributes. The reverse of this mapping is used by
``init_module_attrs()``.
============= ===========
On ModuleSpec On Modules
============= ===========
name __name__
loader __loader__
package __package__
is_package -
origin -
filename __file__
cached __cached__
path __path__
============= ===========
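Applying the reverse of this mapping, as ``init_module_attrs()`` is described to do, could look like the sketch below. The function body is an assumption: it skips spec values of ``None`` because ``None`` means "not set" on the spec, whereas on modules the attribute is simply absent.

```python
import types

# Spec attribute -> module attribute, per the mapping table.
# is_package and origin have no module counterpart.
SPEC_TO_MODULE = {
    "name": "__name__",
    "loader": "__loader__",
    "package": "__package__",
    "filename": "__file__",
    "cached": "__cached__",
    "path": "__path__",
}


def init_module_attrs(spec, module):
    """Hypothetical sketch: copy the set (non-None) spec values onto module."""
    for spec_attr, module_attr in SPEC_TO_MODULE.items():
        value = getattr(spec, spec_attr, None)
        if value is not None:  # None means "not set" on the spec.
            setattr(module, module_attr, value)
    return module


spec = types.SimpleNamespace(name="pkg.mod", loader=None, package="pkg",
                             filename="/src/pkg/mod.py", cached=None, path=None)
module = init_module_attrs(spec, types.ModuleType("placeholder"))
print(module.__name__, module.__package__, module.__file__)
# pkg.mod pkg /src/pkg/mod.py
print(hasattr(module, "__path__"))  # False
```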
``name``
The module's fully resolved and absolute name. It must be set.
``loader``
The loader to use during loading and for module data. These specific
functionalities do not change for loaders. Finders are still
responsible for creating the loader and this attribute is where it is
stored. The loader must be set.
``package``
The name of the module's parent. This is a dynamic attribute with a
value derived from ``name`` and ``is_package``. For packages it is the
value of ``name``. Otherwise it is equivalent to
``name.rpartition('.')[0]``. Consequently, a top-level module will have
the empty string for ``package``.
``is_package``
Whether or not the module is a package. This dynamic attribute is True
if ``path`` is set (even if empty), else it is false.
"is True if ``path`` is not None (e.g. the empty list is a "true" value),
else it is False".
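The two derived properties, using Brett's suggested wording (``path`` not ``None`` means package, even when empty), could be sketched as follows. ``SpecSketch`` is a hypothetical stand-in, not the PEP's class.

```python
class SpecSketch:
    """Hypothetical stand-in showing the two derived, read-only properties."""

    def __init__(self, name, path=None):
        self.name = name
        self.path = path

    @property
    def is_package(self):
        # True whenever path is not None; an empty list still counts.
        return self.path is not None

    @property
    def package(self):
        if self.is_package:
            return self.name
        # '' for top-level modules, since 'json'.rpartition('.')[0] == ''.
        return self.name.rpartition(".")[0]


top = SpecSketch("json")
sub = SpecSketch("json.decoder")
pkg = SpecSketch("json", path=[])  # Empty path still means "package".
print(repr(top.package), sub.package, pkg.is_package, pkg.package)
# '' json True json
```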
Post by Eric Snow
``origin``
A string for the location from which the module originates. If
``filename`` is set, ``origin`` should be set to the same value unless
some other value is more appropriate. ``origin`` is used in
``module_repr()`` if it does not match the value of ``filename``.
Using ``filename`` for this meaning would be inaccurate, since not all
modules have path-based locations. For instance, built-in modules do
not have ``__file__`` set. Yet it is useful to have a descriptive
string indicating that it originated from the interpreter as a built-in
module. So built-in modules will have ``origin`` set to ``"built-in"``.
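For comparison, CPython's ``importlib`` from Python 3.4 onward reports built-in modules this way. The snippet uses the real ``importlib.util.find_spec()`` and reflects CPython behaviour, which may differ on other implementations.

```python
import importlib.util
import sys

# In CPython, built-in modules carry origin "built-in" and no __file__.
spec = importlib.util.find_spec("sys")
print(spec.origin)               # built-in
print(hasattr(sys, "__file__"))  # False
```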
I still don't know what you would put there for a zipfile-based loader.
Would you still put __file__ or would you put the zipfile? I ask because I
would want a way for a zipfile finder to pass along to the loader where the
zipfile is located and then the internal location of the file. Otherwise
you need to pass in the zip path separately from the internal path to the
loader constructor instead of simply passing in a ModuleSpec (e.g. see
_split_path in http://bugs.python.org/file30660/zip_importlib.diff).
Post by Eric Snow
If any of these is set, it indicates that the module is path-based. For
reference, a path entry is a string for a location where the import
system will look for modules (e.g. the path entries in ``sys.path`` or a
package's ``__path__``).
``filename``
Like ``origin``, but limited to a path-based location. If ``filename``
is set, ``origin`` should be set to the same string, unless origin is
explicitly set to something else. ``filename`` is not necessarily an
actual file name, but could be any location string based on a path
entry. Regarding the attribute name, while it is potentially
inaccurate, it is both consistent with the equivalent module attribute
and generally accurate.
.. XXX Would a different name be better? ``path_location``?
``cached``
The path-based location where the compiled code for a module should be
stored. If ``filename`` is set to a source file, this should be set to
the corresponding path that PEP 3147 specifies. The
``importlib.util.cache_from_source()`` function facilitates getting the
correct value.
``path``
The list of path entries in which to search for submodules if this
module is a package. Otherwise it is ``None``.
.. XXX add a path-based subclass?
You mean like a namespace package's __path__ object? Or are you saying you
want ModuleSpec vs. PackageSpec?
Post by Eric Snow
ModuleSpec Methods
------------------
``from_loader(name, loader, *, is_package=None, origin=None,
filename=None, cached=None, path=None)``
.. XXX use a different name?
A factory classmethod that returns a new ``ModuleSpec`` derived from the
arguments. ``is_package`` is used inside the method to indicate that
the module is a package.
Why is this a parameter instead of being inferred from 'path' or
loader.is_package(), which you fall back on? What's the motivation?
Post by Eric Snow
If not explicitly passed in, it is set to
``True`` if ``path`` is passed in. It falls back to using the result of
the loader's ``is_package()``, if available. Finally it defaults to
False. The remaining parameters have the same meaning as the
corresponding ``ModuleSpec`` attributes.
In contrast to ``ModuleSpec.__init__()``, which takes the arguments
as-is, ``from_loader()`` calculates missing values from the ones passed
in, as much as possible. This replaces the behavior that is currently
provided the several ``importlib.util`` functions as well as the
"provided by several"
Post by Eric Snow
optional ``init_module_attrs()`` method of loaders. Just to be clear,
If not passed in, ``filename`` is set to the result of calling the
loader's ``get_filename()``, if available. Otherwise it stays
unset (``None``).
If not passed in, ``path`` is set to an empty list if
``is_package`` is true. Then the directory from ``filename`` is
appended to it, if possible. If ``is_package`` is false, ``path``
stays unset.
If ``cached`` is not passed in and ``filename`` is passed in,
``cached`` is derived from it. For filenames with a source suffix,
it is set to the result of calling
``importlib.util.cache_from_source()``. For bytecode suffixes (e.g.
``.pyc``), ``cached`` is set to the value of ``filename``. If
``filename`` is not passed in or ``cache_from_source()`` raises
``NotImplementedError``, ``cached`` stays unset.
If not passed in, ``origin`` is set to ``filename``. Thus if
``filename`` is unset, ``origin`` stays unset.
Why is this a static constructor instead of a method like infer_values() or
an infer_values keyword-only argument to the constructor to do this if
requested?
Post by Eric Snow
``module_repr()``
Returns a repr string for the module if ``origin`` is set and
``filename`` is not set. The string refers to the value of ``origin``.
Otherwise ``module_repr()`` returns None. This indicates to the module
type's ``__repr__()`` that it should fall back to the default repr.
This makes me think that origin is an odd name if all it affects is
module_repr().
Post by Eric Snow
We could also have ``module_repr()`` produce the repr for the case where
``filename`` is set or where ``origin`` is not set, mirroring the repr
that the module type produces directly. However, the repr string is
derived from the import-related module attributes, which might be out of
sync with the spec.
[SNIP]
Post by Eric Snow
.. XXX add reload(module=None) and drop load()'s parameters entirely?
If you are going to give the module argument semantics that make it only
good for reloading, then I say yes, make it a separate method.
Post by Eric Snow
.. XXX add more of importlib.reload()'s boilerplate to load()/reload()?
Backward Compatibility
----------------------
Since ``Finder.find_module()`` methods would now return a module spec
instead of a loader, specs must act like the loader that would have been
returned instead. This is relatively simple to solve since the loader
is available as an attribute of the spec. We will use ``__getattr__()``
to do it.
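A minimal sketch of the ``__getattr__()`` proxying, assuming nothing beyond plain attribute delegation (``_ProxyingSpec`` and ``_OldStyleLoader`` are hypothetical). It also shows where the ``is_package`` conflict comes from: ``__getattr__()`` only runs when normal lookup fails, so any attribute the spec defines itself hides the loader's method of the same name.

```python
class _ProxyingSpec:
    """Hypothetical spec that forwards unknown attributes to its loader."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader

    def __getattr__(self, attr):
        # Only called when normal lookup fails, so the spec's own attributes
        # (name, loader, and e.g. an is_package attribute) shadow the loader's.
        loader = self.__dict__.get("loader")
        if loader is None:
            # Avoid infinite recursion before __init__ has run (e.g. copying).
            raise AttributeError(attr)
        return getattr(loader, attr)


class _OldStyleLoader:
    def load_module(self, fullname):
        return "loaded " + fullname


spec = _ProxyingSpec("spam", _OldStyleLoader())
print(spec.load_module("spam"))
```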
However, ``ModuleSpec.is_package`` (an attribute) conflicts with
``InspectLoader.is_package()`` (a method). Working around this requires
a more complicated solution but is not a large obstacle. Simply making
``ModuleSpec.is_package`` a method does not reflect that it is a relatively
static piece of data.
Maybe, but depending on what your "more complicated solution" is, it might
be best to just give up the purity and go with the practicality.
Post by Eric Snow
``module_repr()`` also conflicts with the same
method on loaders, but that workaround is not complicated since both are
methods.
Unfortunately, the ability to proxy does not extend to ``id()``
comparisons and ``isinstance()`` tests. In the case of the return value
of ``find_module()``, we accept that break in backward compatibility.
However, we will mitigate the problem with ``isinstance()`` somewhat by
registering ``ModuleSpec`` on the loaders in ``importlib.abc``.
Actually, ModuleSpec doesn't even need to register; __instancecheck__ and
__subclasscheck__ can just be defined and delegate by calling
issubclass/isinstance on the loader as appropriate.
Post by Eric Snow
[SNIP]
Loaders
-------
Loaders will have a new method, ``exec_module(module)``. Its only job
is to "exec" the module and consequently populate the module's
namespace. It is not responsible for creating or preparing the module
object, nor for any cleanup afterward. It has no return value.
The ``load_module()`` of loaders will still work and be an active part
of the loader API. It is still useful for cases where the default
module creation/preparation/cleanup is not appropriate for the loader.
For example, the C API for extension modules only supports the full
control of ``load_module()``. As such, ``ExtensionFileLoader`` will not
implement ``exec_module()``. In the future it may be appropriate to
produce a second C API that would support an ``exec_module()``
implementation for ``ExtensionFileLoader``. Such a change is outside
the scope of this PEP.
A loader must have at least one of ``exec_module()`` and
``load_module()`` defined.
"A loader must define either ``exec_module()`` or ``load_module()``."

-Brett
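To illustrate the division of labor in the Loaders section, here is a sketch of a loader that implements only ``exec_module()``, plus a driver that does the module creation, ``sys.modules`` bookkeeping and cleanup on the loader's behalf and falls back to ``load_module()`` otherwise. Both ``_SourceStringLoader`` and ``_load`` are hypothetical; the real import machinery would do more (setting the remaining import-related attributes, locking, and so on):

```python
import sys
import types


class _SourceStringLoader:
    """Hypothetical loader that implements only exec_module()."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        # Only populate the namespace of the module we were handed: no module
        # creation, no sys.modules handling, no return value.
        exec(compile(self.source, module.__name__, "exec"), module.__dict__)


def _load(name, loader):
    """Hypothetical driver: prefer exec_module(), else fall back to load_module()."""
    if hasattr(loader, "exec_module"):
        # Creation, preparation and cleanup happen here, not in the loader.
        module = types.ModuleType(name)
        module.__loader__ = loader
        sys.modules[name] = module
        try:
            loader.exec_module(module)
        except BaseException:
            sys.modules.pop(name, None)
            raise
        return sys.modules[name]
    if hasattr(loader, "load_module"):
        # Full-control path, e.g. extension modules under the current C API.
        return loader.load_module(name)
    raise ImportError("{!r} defines neither exec_module() nor load_module()"
                      .format(loader), name=name)


demo = _load("_exec_module_demo", _SourceStringLoader("answer = 6 * 7"))
print(demo.answer)
```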

[SNIP]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/import-sig/attachments/20130811/c5c78c20/attachment-0001.html>
Eric Snow
2013-08-13 04:17:58 UTC
Post by Brett Cannon
Post by Eric Snow
``is_package``
Whether or not the module is a package. This dynamic attribute is True
if ``path`` is set (even if empty), else it is false.
"is True if ``path`` is not None (e.g. the empty list is a "true" value),
else it is False".
Thanks. That is clearer.
Post by Brett Cannon
Post by Eric Snow
``origin``
A string for the location from which the module originates. If
``filename`` is set, ``origin`` should be set to the same value unless
some other value is more appropriate. ``origin`` is used in
``module_repr()`` if it does not match the value of ``filename``.
Using ``filename`` for this meaning would be inaccurate, since not all
modules have path-based locations. For instance, built-in modules do
not have ``__file__`` set. Yet it is useful to have a descriptive
string indicating that it originated from the interpreter as a built-in
module. So built-in modules will have ``origin`` set to ``"built-in"``.
I still don't know what you would put there for a zipfile-based loader.
Would you still put __file__ or would you put the zipfile? I ask because I
would want a way for a zipfile finder to pass along to the loader where the
zipfile is located and then the internal location of the file. Otherwise
you need to pass in the zip path separately from the internal path to the
loader constructor instead of simply passing in a ModuleSpec (e.g. see
_split_path in http://bugs.python.org/file30660/zip_importlib.diff).
For me origin makes the most sense as the "string for the location from
which the module originates". I'd think it would be the same as gets put
into __file__ right now. However, you're right that there's more useful
info that could be stored on the spec. In this case I'd expect it to be
added as an extra attribute on the spec rather than as part of the normal
ModuleSpec attributes. However, as Nick pointed out, custom attributes
currently don't have a good strategy for avoiding collisions with future
normal ModuleSpec attributes.
Post by Brett Cannon
Post by Eric Snow
``path``
The list of path entries in which to search for submodules if this
module is a package. Otherwise it is ``None``.
.. XXX add a path-based subclass?
You mean like a namespace package's __path__ object? Or are you saying you
want ModuleSpec vs. PackageSpec?
More like ModuleSpec and PathModuleSpec. PathModuleSpec would have
filename, cached, and path (and associated handling), while ModuleSpec would
not. At the same time I like having a one-size-fits-all ModuleSpec if
possible, since it should probably pretty closely follow the
one-size-fits-all module type.
Post by Brett Cannon
Post by Eric Snow
ModuleSpec Methods
------------------
``from_loader(name, loader, *, is_package=None, origin=None,
filename=None, cached=None, path=None)``
.. XXX use a different name?
A factory classmethod that returns a new ``ModuleSpec`` derived from the
arguments. ``is_package`` is used inside the method to indicate that
the module is a package.
Why is this a parameter instead of being inferred from 'path' or
loader.is_package(), which you fall back on? What's the motivation?
In part it's intended to lower the barrier to entry for people learning
about the import system and getting their hands dirty. It's just more
obvious as an explicit parameter. Of course, it means there are two
parameters that basically accomplish the same thing, so perhaps it's not
worth it. Furthermore, `from_loader()` may go the way of the dodo since
the motivation for it has mostly gone away with other API changes.
Post by Brett Cannon
Just to be clear,
Post by Eric Snow
If not passed in, ``filename`` is set to the result of calling the
loader's ``get_filename()``, if available. Otherwise it stays
unset (``None``).
If not passed in, ``path`` is set to an empty list if
``is_package`` is true. Then the directory from ``filename`` is
appended to it, if possible. If ``is_package`` is false, ``path``
stays unset.
If ``cached`` is not passed in and ``filename`` is passed in,
``cached`` is derived from it. For filenames with a source suffix,
it is set to the result of calling
``importlib.util.cache_from_source()``. For bytecode suffixes (e.g.
``.pyc``), ``cached`` is set to the value of ``filename``. If
``filename`` is not passed in or ``cache_from_source()`` raises
``NotImplementedError``, ``cached`` stays unset.
If not passed in, ``origin`` is set to ``filename``. Thus if
``filename`` is unset, ``origin`` stays unset.
Why is this a static constructor instead of a method like infer_values()
or an infer_values keyword-only argument to the constructor to do this if
requested?
Good point. I was already planning on yanking `from_loader()`. That
kw-only argument would probably be a good fit. I'll try it out.
Post by Brett Cannon
Post by Eric Snow
``module_repr()``
Returns a repr string for the module if ``origin`` is set and
``filename`` is not set. The string refers to the value of ``origin``.
Otherwise ``module_repr()`` returns None. This indicates to the module
type's ``__repr__()`` that it should fall back to the default repr.
This makes me think that origin is an odd name if all it affects is
module_repr().
It's also informational, of course.
Post by Brett Cannon
Post by Eric Snow
We could also have ``module_repr()`` produce the repr for the case where
``filename`` is set or where ``origin`` is not set, mirroring the repr
that the module type produces directly. However, the repr string is
derived from the import-related module attributes, which might be out of
sync with the spec.
[SNIP]
Post by Eric Snow
.. XXX add reload(module=None) and drop load()'s parameters entirely?
If you are going to give the module argument semantics that make it
only good for reloading, then I say yes, make it a separate method.
Yeah, I think it's settled. I like Nick's suggestion of calling it
`exec()`.
Post by Brett Cannon
Post by Eric Snow
.. XXX add more of importlib.reload()'s boilerplate to load()/reload()?
Backward Compatibility
----------------------
Since ``Finder.find_module()`` methods would now return a module spec
instead of a loader, specs must act like the loader that would have been
returned instead. This is relatively simple to solve since the loader
is available as an attribute of the spec. We will use ``__getattr__()``
to do it.
However, ``ModuleSpec.is_package`` (an attribute) conflicts with
``InspectLoader.is_package()`` (a method). Working around this requires
a more complicated solution but is not a large obstacle. Simply making
``ModuleSpec.is_package`` a method does not reflect that it is a relatively
static piece of data.
Maybe, but depending on what your "more complicated solution" is, it might
be best to just give up the purity and go with the practicality.
It's not that complicated, but not exactly pretty:

class _TruthyFunction:
    """A callable that also carries a fixed truth value."""

    def __init__(self, func, is_true):
        self.func = func
        self._is_true = bool(is_true)

    def __repr__(self):
        return repr(self._is_true)

    def __bool__(self):
        return self._is_true

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

class ModuleSpec:
    ...
    @property
    def is_package(self):
        loader = self.loader
        is_package = False
        if self.path is not None:
            is_package = True
        elif hasattr(loader, 'is_package'):
            try:
                is_package = loader.is_package(self.name)
            except ImportError:
                pass
        # Since InspectLoader also has is_package(), we have to
        # accommodate the use of the return value as a function.
        def func(*args, **kwargs):
            # XXX Throw a DeprecationWarning here?
            return loader.is_package(*args, **kwargs)
        return _TruthyFunction(func, is_package)
Post by Brett Cannon
Post by Eric Snow
``module_repr()`` also conflicts with the same
method on loaders, but that workaround is not complicated since both are
methods.
Unfortunately, the ability to proxy does not extend to ``id()``
comparisons and ``isinstance()`` tests. In the case of the return value
of ``find_module()``, we accept that break in backward compatibility.
However, we will mitigate the problem with ``isinstance()`` somewhat by
registering ``ModuleSpec`` on the loaders in ``importlib.abc``.
Actually, ModuleSpec doesn't even need to register; __instancecheck__ and
__subclasscheck__ can just be defined and delegate by calling
issubclass/isinstance on the loader as appropriate.
Do you mean add custom versions of those methods to importlib.abc.Loader?
That should work as well as the register approach. It won't work for all
loaders but should be good enough. I was just planning on registering
ModuleSpec on the loader in the setter for a `loader` property on
ModuleSpec.

-eric
Brett Cannon
2013-08-13 13:21:42 UTC
[SNIP]
Post by Eric Snow
Post by Brett Cannon
Post by Eric Snow
``module_repr()`` also conflicts with the same
method on loaders, but that workaround is not complicated since both are
methods.
Unfortunately, the ability to proxy does not extend to ``id()``
comparisons and ``isinstance()`` tests. In the case of the return value
of ``find_module()``, we accept that break in backward compatibility.
However, we will mitigate the problem with ``isinstance()`` somewhat by
registering ``ModuleSpec`` on the loaders in ``importlib.abc``.
Actually, ModuleSpec doesn't even need to register; __instancecheck__ and
__subclasscheck__ can just be defined and delegate by calling
issubclass/isinstance on the loader as appropriate.
Do you mean add custom versions of those methods to importlib.abc.Loader?
Nope, I meant ModuleSpec because every time I have a reason to override
something it's on the object and not the class and so I forget the support
is the other way around. Argh.
Post by Eric Snow
That should work as well as the register approach. It won't work for all
loaders but should be good enough. I was just planning on registering
ModuleSpec on the loader in the setter for a `loader` property on
ModuleSpec.
But the registration is at the class level so how would that work?
Eric Snow
2013-08-13 23:47:53 UTC
Post by Brett Cannon
[SNIP]
Post by Eric Snow
Post by Brett Cannon
Post by Eric Snow
``module_repr()`` also conflicts with the same
method on loaders, but that workaround is not complicated since both are
methods.
Unfortunately, the ability to proxy does not extend to ``id()``
comparisons and ``isinstance()`` tests. In the case of the return value
of ``find_module()``, we accept that break in backward compatibility.
However, we will mitigate the problem with ``isinstance()`` somewhat by
registering ``ModuleSpec`` on the loaders in ``importlib.abc``.
Actually, ModuleSpec doesn't even need to register; __instancecheck__
and __subclasscheck__ can just be defined and delegate by calling
issubclass/isinstance on the loader as appropriate.
Do you mean add custom versions of those methods to importlib.abc.Loader?
Nope, I meant ModuleSpec because every time I have a reason to override
something it's on the object and not the class and so I forget the support
is the other way around. Argh.
Yeah, that would make things a lot easier.
Post by Brett Cannon
Post by Eric Snow
That should work as well as the register approach. It won't work for all
loaders but should be good enough. I was just planning on registering
ModuleSpec on the loader in the setter for a `loader` property on
ModuleSpec.
But the registration is at the class level so how would that work?
    @property
    def loader(self):
        return self._loader

    @loader.setter
    def loader(self, loader):
        try:
            register = loader.__class__.register
        except AttributeError:
            pass
        else:
            register(self.__class__)
        self._loader = loader

It's not pretty and it won't work on non-ABCs, but it's better than
nothing. The likelihood of someone doing an isinstance check on a loader
seems pretty low though. Of course, I'm planning on doing just that for
handling of namespace packages, but that's a little different.

-eric
Nick Coghlan
2013-08-14 01:16:07 UTC
Post by Eric Snow
On Tue, Aug 13, 2013 at 12:17 AM, Eric Snow <ericsnowcurrently at gmail.com>
[SNIP]
Post by Eric Snow
Post by Brett Cannon
Post by Eric Snow
``module_repr()`` also conflicts with the same
method on loaders, but that workaround is not complicated since both are
methods.
Unfortunately, the ability to proxy does not extend to ``id()``
comparisons and ``isinstance()`` tests. In the case of the return value
of ``find_module()``, we accept that break in backward compatibility.
However, we will mitigate the problem with ``isinstance()`` somewhat by
registering ``ModuleSpec`` on the loaders in ``importlib.abc``.
Actually, ModuleSpec doesn't even need to register; __instancecheck__
and __subclasscheck__ can just be defined and delegate by calling
issubclass/isinstance on the loader as appropriate.
Post by Eric Snow
Post by Eric Snow
Do you mean add custom versions of those methods to
importlib.abc.Loader?
Post by Eric Snow
Nope, I meant ModuleSpec because every time I have a reason to override
something it's on the object and not the class and so I forget the support
is the other way around. Argh.
Post by Eric Snow
Yeah, that would make things a lot easier.
Post by Eric Snow
That should work as well as the register approach. It won't work for
all loaders but should be good enough. I was just planning on registering
ModuleSpec on the loader in the setter for a `loader` property on
ModuleSpec.
Post by Eric Snow
But the registration is at the class level so how would that work?
    @property
    def loader(self):
        return self._loader

    @loader.setter
    def loader(self, loader):
        try:
            register = loader.__class__.register
        except AttributeError:
            pass
        else:
            register(self.__class__)
        self._loader = loader
It's not pretty and it won't work on non-ABCs, but it's better than
nothing. The likelihood of someone doing an isinstance check on a loader
seems pretty low though. Of course, I'm planning on doing just that for
handling of namespace packages, but that's a little different.

That ends up registering ModuleSpec as an example of every loader ABC, so
it doesn't work at all. Making the importlib ABC hooks ModuleSpec aware (so
they knew to check the loader, not the spec) would be pretty easy, though.
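Nick's objection can be reproduced with plain ``abc`` classes. In this sketch ``SourceLoaderABC``, ``ExtensionLoaderABC`` and ``_RegisteringSpec`` are hypothetical stand-ins (not the real ``importlib.abc`` classes), and the registration happens in ``__init__`` rather than a property setter for brevity. Because ``register()`` records the spec *class*, one spec with a source loader and one with an extension loader are enough to make every spec pass both checks:

```python
import abc


class SourceLoaderABC(abc.ABC):
    """Stand-in for one importlib.abc loader ABC."""


class ExtensionLoaderABC(abc.ABC):
    """Stand-in for another, unrelated loader ABC."""


class _Src(SourceLoaderABC):
    pass


class _Ext(ExtensionLoaderABC):
    pass


class _RegisteringSpec:
    """Hypothetical spec using the register-on-loader-assignment idea."""

    def __init__(self, loader):
        try:
            register = loader.__class__.register
        except AttributeError:
            pass
        else:
            # Registers the spec *class*, not this particular spec.
            register(self.__class__)
        self.loader = loader


src_spec = _RegisteringSpec(_Src())
ext_spec = _RegisteringSpec(_Ext())
# Every _RegisteringSpec now claims to be both kinds of loader.
print(isinstance(src_spec, ExtensionLoaderABC), isinstance(ext_spec, SourceLoaderABC))
```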
Post by Eric Snow
-eric
_______________________________________________
Import-SIG mailing list
Import-SIG at python.org
http://mail.python.org/mailman/listinfo/import-sig
Eric Snow
2013-08-14 03:18:35 UTC
Post by Eric Snow
Post by Eric Snow
    @property
    def loader(self):
        return self._loader

    @loader.setter
    def loader(self, loader):
        try:
            register = loader.__class__.register
        except AttributeError:
            pass
        else:
            register(self.__class__)
        self._loader = loader
It's not pretty and it won't work on non-ABCs, but it's better than
nothing. The likelihood of someone doing an isinstance check on a loader
seems pretty low though. Of course, I'm planning on doing just that for
handling of namespace packages, but that's a little different.
That ends up registering ModuleSpec as an example of every loader ABC, so
it doesn't work at all.
I guess it does amount to a cheap trick, allowing isinstance() checks to
pass but not necessarily providing the appropriate APIs.
Post by Eric Snow
Making the importlib ABC hooks ModuleSpec aware (so they knew to check the
loader, not the spec) would be pretty easy, though.
That's what I thought Brett was recommending earlier. I was going to
express hesitation at spreading backward-compatibility tendrils. However,
your recommendation is probably a good idea on its own. Several of the
collections ABCs do explicit API checks and they'd work well here too.
I'll add this to the PEP.

-eric
PJ Eby
2013-08-15 00:27:40 UTC
Post by Eric Snow
A High-Level View
-----------------
...
It would be really helpful if that high-level view were actually
included, as I'm having a lot of trouble wrapping my head around the
rest of the spec. For that matter, some introductory examples to
contrast "before" and "after" for something that this changes would be
really nice at about this point.
Post by Eric Snow
If any of these is set, it indicates that the module is path-based. For
reference, a path entry is a string for a location where the import
system will look for modules (e.g. the path entries in ``sys.path`` or a
package's ``__path__``).
What does "path-based" actually mean here? On the one hand, you're
saying that a path entry is on sys.path or a __path__, but then we're
using an attribute called "filename". Shouldn't it be called
path_entry or subpath, or location or something, if it's not required
to be a filename? The overlap between path = sys.path and path =
filesystem path is way too confusing here.
Post by Eric Snow
.. XXX Would a different name be better? ``path_location``?
Yeah, definitely something other than filename. ;-)

It might also help to explain that some modules can be loaded by
reference to a location, e.g. a filesystem path or a URL or something
of the sort -- having the location lets you load the module, but in
theory you could load that module under various names. In contrast,
non-located modules can't be loaded in this fashion: modules created
by a meta path loader (such as builtins), or modules dynamically
created in code. For these, the name is the only way to access them,
so they have an "origin" but not a "location".

Also, bear in mind that it's not just exotic locations like URLs that
aren't filenames. zipimport uses pseudo-filenames that pretend a
zipfile is a directory, by prepending the zipfile's filename to a path
that's within the zipfile. So, calling this "filename" is *really* a
bad idea; it's not always a filename for even stdlib importers, let
alone anything third-party!
Post by Eric Snow
``path``
The list of path entries in which to search for submodules if this
module is a package. Otherwise it is ``None``.
This should probably be called submodule_path or
submodule_search_locations or something, to avoid even *more*
overloading of the word "path". ;-)
Post by Eric Snow
.. XXX add a path-based subclass?
Why? What good would it do?
Post by Eric Snow
ModuleSpec Methods
------------------
``from_loader(name, loader, *, is_package=None, origin=None, filename=None,
cached=None, path=None)``
.. XXX use a different name?
Seems fine to me: it's consistent w/other stdlib factory method names.
Post by Eric Snow
If not passed in, ``path`` is set to an empty list if
``is_package`` is true. Then the directory from ``filename`` is
appended to it, if possible. If ``is_package`` is false, ``path``
stays unset.
How does this interact with namespace packages? Does it?
Post by Eric Snow
Sets the module's import-related attributes to the corresponding values
in the module spec. If a path-based attribute is not set on the spec,
Location-based? ;-)
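A sketch of copying spec values onto a module. ``_init_module_attrs`` is a hypothetical name, ``types.SimpleNamespace`` stands in for a spec, ``__package__`` follows the usual rule (a package is its own ``__package__``, anything else uses its parent's name), and skipping unset location-based attributes is an assumption because the quoted sentence is cut off:

```python
import types


def _init_module_attrs(spec, module):
    """Hypothetical sketch: copy spec values onto a module's import attributes."""
    module.__name__ = spec.name
    module.__loader__ = spec.loader
    # Usual rule: a package is its own __package__; otherwise use the parent.
    if spec.path is not None:
        module.__package__ = spec.name
    else:
        module.__package__ = spec.name.rpartition(".")[0]
    # Location-based attributes are skipped when unset on the spec.
    for spec_attr, module_attr in (("filename", "__file__"),
                                   ("cached", "__cached__"),
                                   ("path", "__path__")):
        value = getattr(spec, spec_attr, None)
        if value is not None:
            setattr(module, module_attr, value)
    return module


spec = types.SimpleNamespace(name="pkg.mod", loader=None,
                             filename="pkg/mod.py", cached=None, path=None)
mod = _init_module_attrs(spec, types.ModuleType("pkg.mod"))
print(mod.__package__, mod.__file__)
```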
Post by Eric Snow
``load(module=None, *, is_reload=False)``
This method captures the current functionality of and requirements on
``Loader.load_module()`` without any semantic changes, except one.
Reloading a module when ``exec_module()`` is available actually uses
``module`` rather than ignoring it in favor of the one in
``sys.modules``, as ``Loader.load_module()`` does.
Interesting -- this could possibly be leveraged to implement
multi-version imports.
Post by Eric Snow
``module`` is only allowed when ``is_reload`` is true.
...or not. ;)
Post by Eric Snow
This means that
``is_reload`` could be dropped as a parameter. However, doing so would
mean we could not use ``None`` to indicate that the module should be
pulled from ``sys.modules``.
Wait, what? That doesn't seem true to me: why not just use the module
or pull one according to whether it's None or not? What actual
difference does is_reload really make here?
Post by Eric Snow
Regarding the first part of ``load()``, the following describes what
happens.
I'm thinking maybe this should be parameterized to allow passing in a
'modules' dictionary other than sys.modules. This would make
multi-version imports or other "isolated environment" imports more
viable, and factor out another global element of the import system.
That way, if you implement an isolated module system, you don't have
to duplicate or subclass ModuleSpec to perform the same loading
functionality.
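A sketch of PJ's suggestion, assuming ``exec_module()``-style loaders: the registry becomes a parameter that defaults to ``sys.modules``. ``_load_into`` and ``_VersionLoader`` are hypothetical. Note that any imports performed *by* the executed module would still go through the global import system:

```python
import sys
import types


def _load_into(name, loader, modules=None):
    """Hypothetical load() variant that takes the module registry as a parameter."""
    if modules is None:
        modules = sys.modules  # Default keeps today's global behavior.
    if name in modules:
        return modules[name]
    module = types.ModuleType(name)
    modules[name] = module
    try:
        loader.exec_module(module)
    except BaseException:
        # Undo the registration in whichever registry was used.
        modules.pop(name, None)
        raise
    return modules[name]


class _VersionLoader:
    """Hypothetical exec_module() loader that just records a version string."""

    def __init__(self, version):
        self.version = version

    def exec_module(self, module):
        module.version = self.version


# Two isolated registries can each hold their own "_isolated_lib".
env_a, env_b = {}, {}
lib_a = _load_into("_isolated_lib", _VersionLoader("1.0"), modules=env_a)
lib_b = _load_into("_isolated_lib", _VersionLoader("2.0"), modules=env_b)
print(lib_a.version, lib_b.version)
```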
Post by Eric Snow
Unfortunately, the ability to proxy does not extend to ``id()``
comparisons and ``isinstance()`` tests.
Who does id() tests on loaders? isinstance() fudging, OTOH, is quite
doable. See the ProxyTypes library on PyPI for an example; it's
2.x-only but I believe somebody has done a proof-of-concept port (due
to some __special__ methods being different or missing in 3.x)
Post by Eric Snow
Finders
-------
Finders will now return ModuleSpec objects when ``find_module()`` is
called rather than loaders. For backward compatibility, ``ModuleSpec``
objects proxy the attributes of their ``loader`` attribute.
Has anybody looked at how this change affects pkgutil's (and
setuptools') generic function-based extensions to PEP 302? Currently,
you can register specific loader types with these guys, but that'll
likely break if importlib is going to start wrapping loaders without
those tools' knowledge.

May I suggest adding a new finder method, find_module_spec() instead?
Then, implement it for finders that don't support it by calling
find_module() and wrapping the loader with a ModuleSpec. This
approach would be less disruptive to code that already uses
find_module and inspects loader types to add extension protocols.
Post by Eric Snow
Adding another similar method to avoid backward-compatibility issues
is undesirable if avoidable. The import APIs have suffered enough,
especially considering ``PathEntryFinder.find_loader()`` was just
added in Python 3.3. The approach taken by this PEP should be
sufficient to address backward-compatibility issues for
``find_module()``.
I'm not sure I'm following here: are you saying that all PEP 302
finders implemented by anyone, anywhere, must be changed *in order to
work at all*, when this lands in a *minor version change*?
Post by Eric Snow
Other Changes
-------------
This section doesn't address impact on pkgutil, which makes
significant use of the PEP 302 API.
Eric Snow
2013-08-15 07:38:11 UTC
Post by PJ Eby
Post by Eric Snow
A High-Level View
-----------------
...
It would be really helpful if that high-level view were actually
included, as I'm having a lot of trouble wrapping my head around the
rest of the spec. For that matter, some introductory examples to
contrast "before" and "after" for something that this changes would be
really nice at about this point.
Sounds good. As to examples, do you mean how you would replace an
implementation of load_module() with one of exec_module()?
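A before/after sketch of that conversion, assuming a trivial loader that executes a source string (``_OldLoader`` and ``_NewLoader`` are hypothetical). The old ``load_module()`` carries the ``sys.modules`` reuse, registration and cleanup boilerplate that PEP 302 requires; the ``exec_module()`` version keeps only the part that is specific to the loader:

```python
import sys
import types


class _OldLoader:
    """Before: load_module() handles creation, registration and cleanup."""

    def __init__(self, source):
        self.source = source

    def load_module(self, fullname):
        # PEP 302: reuse an existing module (the reload case) if there is one.
        module = sys.modules.get(fullname)
        is_new = module is None
        if is_new:
            module = types.ModuleType(fullname)
            # PEP 302: register before executing, so recursive imports work.
            sys.modules[fullname] = module
        module.__loader__ = self
        try:
            exec(self.source, module.__dict__)
        except BaseException:
            # PEP 302: only remove a module that this call inserted.
            if is_new:
                sys.modules.pop(fullname, None)
            raise
        return module


class _NewLoader:
    """After: exec_module() only populates a module it is handed."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        exec(self.source, module.__dict__)


old = _OldLoader("x = 1").load_module("_before_demo")
new = types.ModuleType("_after_demo")
_NewLoader("x = 1").exec_module(new)
print(old.x, new.x)
```

With the second form, the import system would own the boilerplate that every ``load_module()`` currently repeats.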
Post by PJ Eby
Post by Eric Snow
If any of these is set, it indicates that the module is path-based. For
reference, a path entry is a string for a location where the import
system will look for modules (e.g. the path entries in ``sys.path`` or a
package's ``__path__``).
What does "path-based" actually mean here? On the one hand, you're
saying that a path entry is on sys.path or a __path__, but then we're
using an attribute called "filename". Shouldn't it be called
path_entry or subpath, or location or something, if it's not required
to be a filename? The overlap between path = sys.path and path =
filesystem path is way too confusing here.
This is a really good point. I'll clean it up. I've already changed
"path" to "path_entries" and dropped "filename" in favor of "set_fileattr".
Furthermore, "file location" is a good substitute for "path" when talking
about files.
Post by PJ Eby
Post by Eric Snow
.. XXX Would a different name be better? ``path_location``?
Yeah, definitely something other than filename. ;-)
It might also help to explain that some modules can be loaded by
reference to a location, e.g. a filesystem path or a URL or something
of the sort -- having the location lets you load the module, but in
theory you could load that module under various names. In contrast,
non-located modules can't be loaded in this fashion: modules created
by a meta path loader (such as builtins), or modules dynamically
created in code. For these, the name is the only way to access them,
so they have an "origin" but not a "location".
Right. That's the point of "origin". It will be up to the loader whether
or not to use "origin" to determine a location, if any.

Post by PJ Eby
Also, bear in mind that it's not just exotic locations like URLs that
aren't filenames. zipimport uses pseudo-filenames that pretend a
zipfile is a directory, by prepending the zipfile's filename to a path
that's within the zipfile. So, calling this "filename" is *really* a
bad idea; it's not always a filename for even stdlib importers, let
alone anything third-party!
Yeah, that has always bugged me about "__file__". The upcoming revision of
the PEP uses the combo of "origin" and "set_fileattr" (a bool) instead of
"filename".
Post by PJ Eby
Post by Eric Snow
``path``
The list of path entries in which to search for submodules if this
module is a package. Otherwise it is ``None``.
This should probably be called submodule_path or
submodule_search_locations or something, to avoid even *more*
overloading of the word "path". ;-)
I came to the same conclusion and was planning on using "path_entries".
However perhaps something even more explicit, like
"submodule_search_locations", would be better. :)
Post by PJ Eby
Post by Eric Snow
If not passed in, ``path`` is set to an empty list if
``is_package`` is true. Then the directory from ``filename`` is
appended to it, if possible. If ``is_package`` is false, ``path``
stays unset.
How does this interact with namespace packages? Does it?
Namespace packages won't use this method, so nothing will be populated
dynamically.
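The draft's defaulting rule for ``path`` can be sketched as a small function. ``default_submodule_path`` is a hypothetical helper, not part of the proposal; per the reply above, namespace packages would not go through it at all.

```python
import os
from typing import List, Optional


def default_submodule_path(is_package: bool, filename: Optional[str]) -> Optional[List[str]]:
    """Hypothetical helper mirroring the draft's rule for a package's ``path``."""
    if not is_package:
        return None               # non-packages leave ``path`` unset
    path: List[str] = []          # packages start with an empty list
    if filename:                  # "if possible": only when a file location is known
        path.append(os.path.dirname(filename))
    return path
```

For example, a package whose ``filename`` is ``/x/pkg/__init__.py`` would get ``["/x/pkg"]``, while a package with no file location would get an empty list.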
Post by PJ Eby
Post by Eric Snow
``load(module=None, *, is_reload=False)``
This method captures the current functionality of and requirements on
``Loader.load_module()`` without any semantic changes, except one.
Reloading a module when ``exec_module()`` is available actually uses
``module`` rather than ignoring it in favor of the one in
``sys.modules``, as ``Loader.load_module()`` does.
Interesting -- this could possibly be leveraged to implement
multi-version imports.
I'm planning on splitting reload() out from load() so those semantics would
go away. However, there may be room to still provide the same
functionality. What would be needed for multi-version imports? (Is that
question opening a can of worms? <wink>)
Post by PJ Eby
Post by Eric Snow
This means that
``is_reload`` could be dropped as a parameter. However, doing so would
mean we could not use ``None`` to indicate that the module should be
pulled from ``sys.modules``.
Wait, what? That doesn't seem true to me: why not just use the module
or pull one according to whether it's None or not? What actual
difference does is_reload really make here?
With a separate reload() this point is moot.
Post by PJ Eby
Post by Eric Snow
Regarding the first part of ``load()``, the following describes what
happens.
I'm thinking maybe this should be parameterized to allow passing in a
'modules' dictionary other than sys.modules. This would make
multi-version imports or other "isolated environment" imports more
viable, and factor out another global element of the import system.
That way, if you implement an isolated module system, you don't have
to duplicate or subclass ModuleSpec to perform the same loading
functionality.
Cool idea, but couldn't this wait? I could totally see this as part of PEP
406 (import engine).
Post by PJ Eby
Post by Eric Snow
Unfortunately, the ability to proxy does not extend to ``id()``
comparisons and ``isinstance()`` tests.
Who does id() tests on loaders?
Which is why I'm not going to worry about it too much. :)
Post by PJ Eby
isinstance() fudging, OTOH, is quite
doable. See the ProxyTypes library on PyPI for an example; it's
2.x-only, but I believe somebody has done a proof-of-concept port (due
to some __special__ methods being different or missing in 3.x).
The current plan is to simply implement __subclasshook__() on the various
importlib ABCs, and perhaps other loaders, to check for methods. Some of
the ABCs in collections.abc (like Iterator) do this.
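A rough sketch of the ``__subclasshook__()`` approach, modelled on how ``collections.abc.Iterator`` checks for methods. ``LoaderLike`` is a hypothetical ABC standing in for the importlib ABCs. With it, ``isinstance()`` succeeds for any object whose class defines or inherits ``load_module()``, without registration or inheritance from the ABC.

```python
import abc


class LoaderLike(abc.ABC):
    """Hypothetical ABC: any class providing load_module() counts as a loader."""

    @abc.abstractmethod
    def load_module(self, fullname):
        raise NotImplementedError

    @classmethod
    def __subclasshook__(cls, C):
        if cls is LoaderLike:
            # Walk the MRO, as collections.abc does, so inherited methods count.
            for base in C.__mro__:
                if "load_module" in vars(base):
                    # Setting the attribute to None explicitly opts out.
                    if vars(base)["load_module"] is None:
                        return NotImplemented
                    return True
        return NotImplemented


class DuckLoader:
    """Does not inherit from LoaderLike, but has the right method."""

    def load_module(self, fullname):
        return fullname


class NotALoader:
    pass
```

Here ``isinstance(DuckLoader(), LoaderLike)`` is true and ``isinstance(NotALoader(), LoaderLike)`` is false. Note that this only helps code that checks against the ABC; code that checks a concrete loader class is unaffected.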
Post by PJ Eby
Post by Eric Snow
Finders
-------
Finders will now return ModuleSpec objects rather than loaders when
``find_module()`` is called. For backward compatibility, ``ModuleSpec``
objects proxy the attributes of their ``loader`` attribute.
Has anybody looked at how this change affects pkgutil's (and
setuptools') generic function-based extensions to PEP 302? Currently,
you can register specific loader types with these guys, but that'll
likely break if importlib is going to start wrapping loaders without
those tools' knowledge.
Good point. I'll look into this.
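A minimal sketch of attribute proxying as the draft describes it, which also shows why the pkgutil concern is real. Attribute access is forwarded to the loader, but anything that dispatches on the object's *type* sees the wrapper class instead. ``functools.singledispatch`` is used here only as a stand-in for pkgutil-style generic functions; ``ProxyingSpec``, ``OldStyleLoader`` and ``describe`` are hypothetical.

```python
from functools import singledispatch


class ProxyingSpec:
    """Hypothetical spec that forwards unknown attribute lookups to its loader."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader

    def __getattr__(self, attr):
        # __getattr__ only runs when normal lookup fails, so the spec's own
        # attributes (name, loader) always win over the loader's.
        if attr == "loader":
            # Avoid infinite recursion if loader was never set (e.g. by copy).
            raise AttributeError(attr)
        return getattr(self.loader, attr)


class OldStyleLoader:
    def load_module(self, fullname):
        return "loaded " + fullname

    def is_package(self, fullname):
        return False


@singledispatch
def describe(importer):
    # Fallback for types nobody registered.
    return "unknown"


@describe.register(OldStyleLoader)
def _describe_old_style(importer):
    return "old-style loader"


spec = ProxyingSpec("spam", OldStyleLoader())
```

``spec.load_module("spam")`` is forwarded to the loader, but ``describe(spec)`` falls through to the generic case while ``describe(spec.loader)`` hits the registered implementation, which is the kind of breakage that type-registered extensions would see.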
Post by PJ Eby
May I suggest adding a new finder method, find_module_spec() instead?
Then, implement it for finders that don't support it by calling
find_module() and wrapping the loader with a ModuleSpec. This
approach would be less disruptive to code that already uses
find_module and inspects loader types to add extension protocols.
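The suggested fallback could look roughly like this. ``find_module_spec``, ``SpecWrapper``, ``LegacyFinder`` and ``NewFinder`` are hypothetical names, and the sketch assumes meta-path-style finders whose ``find_module(fullname, path)`` returns a loader or ``None``, as in PEP 302.

```python
class SpecWrapper:
    """Hypothetical minimal spec: just a name plus the wrapped loader."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader


def find_module_spec(finder, fullname, path=None):
    """Prefer a native find_module_spec(); otherwise wrap find_module()'s loader."""
    native = getattr(finder, "find_module_spec", None)
    if native is not None:
        return native(fullname, path)
    loader = finder.find_module(fullname, path)
    if loader is None:
        return None  # PEP 302: None means "not found"
    return SpecWrapper(fullname, loader)


class LegacyFinder:
    """A PEP 302 finder that is also its own loader, for illustration."""

    def find_module(self, fullname, path=None):
        return self if fullname == "spam" else None


class NewFinder:
    """A finder that already speaks the hypothetical new protocol."""

    def find_module_spec(self, fullname, path=None):
        return SpecWrapper(fullname, None)
```

The trade-off raised in the reply is visible here: existing ``find_module()`` callers keep receiving bare loaders, at the cost of a second method that finders may or may not implement.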
I consider this a last resort, i.e. only if we can't find a way to make
find_module() work for us in a simple enough way. I just cringe at the
idea of bolting on another backward-compatibility-induced method,
particularly when it's the OOTDI, the existing name is a better fit for
the new functionality than for the old, and re-purposing find_module() is
within reach.
Post by PJ Eby
Post by Eric Snow
Adding another similar method to avoid backward-compatibility issues
is undesirable if avoidable. The import APIs have suffered enough,
especially considering ``PathEntryFinder.find_loader()`` was just
added in Python 3.3. The approach taken by this PEP should be
sufficient to address backward-compatibility issues for
``find_module()``.
I'm not sure I'm following here: are you saying that all PEP 302
finders implemented by anyone, anywhere, must be changed *in order to
work at all*, when this lands in a *minor version change*?
Existing finders and loaders will continue working as-is. I've already got
this working in a rough implementation, so it's not that big a stretch.
Post by PJ Eby
Post by Eric Snow
Other Changes
-------------
This section doesn't address impact on pkgutil, which makes
significant use of the PEP 302 API.
I'll add that in. Thanks for bringing it up. My draft implementation is
passing all the pkgutil tests, but I wouldn't be surprised if I've missed
something here.

Anyway, thanks for the feedback. I'll post an update to the PEP in the
next day or two.

-eric
Nick Coghlan
2013-08-15 15:23:45 UTC
Post by Eric Snow
Post by PJ Eby
I'm thinking maybe this should be parameterized to allow passing in a
'modules' dictionary other than sys.modules. This would make
multi-version imports or other "isolated environment" imports more
viable, and factor out another global element of the import system.
That way, if you implement an isolated module system, you don't have
to duplicate or subclass ModuleSpec to perform the same loading
functionality.
Cool idea, but couldn't this wait? I could totally see this as part of PEP
406 (import engine).
One of the conclusions I came to from Greg's import engine work is
that the only practical way for us to get to isolated import
subsystems is either with a Decimal style thread local context based
solution, or with a split create/exec API where the loader doesn't do
any global state manipulation at all and instead operates in a
functional mode where it just returns values based on passed in
parameters (that way the import system at least has the chance to
override __import__ before running the module code). Anything else
looks like it will be too fragile (and the latter approach doesn't
necessarily work for C extensions that do imports).

This is part of why I'm keen on having this PEP expose "create" and
"exec" as separate operations on ModuleSpec, with "load" acting solely
as a convenience function for combining them with the appropriate
sys.modules manipulation.
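A minimal sketch of the split described here: the loader only creates and executes, and a separate ``load()`` helper owns the ``sys.modules`` bookkeeping. The ``modules`` parameter reflects PJ Eby's earlier suggestion of passing in a dictionary other than ``sys.modules``. ``SplitLoader`` and ``load`` are hypothetical; the method names merely echo the "create"/"exec" terminology in this thread.

```python
import sys
import types


class SplitLoader:
    """Hypothetical loader that never touches sys.modules itself."""

    def __init__(self, source):
        self.source = source

    def create_module(self, name):
        return types.ModuleType(name)

    def exec_module(self, module):
        exec(self.source, module.__dict__)


def load(name, loader, modules=None):
    """Convenience wrapper: create + exec, with the modules-dict bookkeeping here."""
    if modules is None:
        modules = sys.modules
    module = loader.create_module(name)
    modules[name] = module       # insert before exec so circular imports can see it
    try:
        loader.exec_module(module)
    except BaseException:
        modules.pop(name, None)  # don't leave a half-initialised module behind
        raise
    return modules[name]         # the module may have replaced itself during exec


isolated = {}
demo = load("demo", SplitLoader("x = 6 * 7"), modules=isolated)
# demo.x == 42, and "demo" is registered only in ``isolated``, not in sys.modules
```

Because the loader itself is purely functional, the same loader object can be reused against different module dictionaries without subclassing anything.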

Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
Eric Snow
2013-08-15 22:15:57 UTC
Post by Nick Coghlan
Post by Eric Snow
Post by PJ Eby
I'm thinking maybe this should be parameterized to allow passing in a
'modules' dictionary other than sys.modules. This would make
multi-version imports or other "isolated environment" imports more
viable, and factor out another global element of the import system.
That way, if you implement an isolated module system, you don't have
to duplicate or subclass ModuleSpec to perform the same loading
functionality.
Cool idea, but couldn't this wait? I could totally see this as part of
PEP 406 (import engine).
One of the conclusions I came to from Greg's import engine work is
that the only practical way for us to get to isolated import
subsystems is either with a Decimal style thread local context based
solution,
I was messing around with this a while back and the thread-local context
approach was pretty easy to do.
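The thread-local alternative can be sketched with ``threading.local`` and a context manager, loosely in the spirit of ``decimal.localcontext()``. ``isolated_modules`` and ``current_modules`` are hypothetical and are not the experiment mentioned above, which isn't shown in the thread. The sketch also only helps if loaders consult ``current_modules()`` instead of touching ``sys.modules`` directly.

```python
import contextlib
import sys
import threading

_state = threading.local()
_MISSING = object()


def current_modules():
    """Return this thread's active modules dict, defaulting to sys.modules."""
    return getattr(_state, "modules", sys.modules)


@contextlib.contextmanager
def isolated_modules(modules):
    """Make ``modules`` the current dict for this thread only, then restore."""
    previous = getattr(_state, "modules", _MISSING)
    _state.modules = modules
    try:
        yield modules
    finally:
        if previous is _MISSING:
            del _state.modules
        else:
            _state.modules = previous
```

Inside ``with isolated_modules(env):`` the current thread sees ``env``, while other threads still see ``sys.modules``; nested blocks restore the outer dictionary on exit.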
Post by Nick Coghlan
or with a split create/exec API where the loader doesn't do
any global state manipulation at all and instead operates in a
functional mode where it just returns values based on passed in
parameters (that way the import system at least has the chance to
override __import__ before running the module code). Anything else
looks like it will be too fragile (and the latter approach doesn't
necessarily work for C extensions that do imports).
This is part of why I'm keen on having this PEP expose "create" and
"exec" as separate operations on ModuleSpec, with "load" acting solely
as a convenience function for combining them with the appropriate
sys.modules manipulation.
Ah. That helps clarify things. I'll go stew on that a bit.

-eric