[Import-SIG] Round 2 for "A ModuleSpec Type for the Import System"

Discussion:

Eric Snow

2013-08-09 22:58:09 UTC

Here's an updated version of the PEP for ModuleSpec which addresses the
feedback I've gotten. Thanks for the help. The big open question, to me,
is whether or not to have a separate reload() method. I'll be looking into
that when I get a chance. There's also the question of a path-based
subclass, but I'm currently not convinced it's worth it.

-eric

-----------------------------------

PEP: 4XX
Title: A ModuleSpec Type for the Import System
Version: $Revision$
Last-Modified: $Date$
Author: Eric Snow <ericsnowcurrently at gmail.com>
BDFL-Delegate: ???
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 8-Aug-2013
Python-Version: 3.4
Post-History: 8-Aug-2013
Resolution:

Abstract
========

This PEP proposes to add a new class to ``importlib.machinery`` called
``ModuleSpec``. It will contain all the import-related information
about a module without needing to load the module first. Finders will
now return a module's spec rather than a loader. The import system will
use the spec to load the module.

Motivation
==========

The import system has evolved over the lifetime of Python. In late 2002
PEP 302 introduced standardized import hooks via ``finders`` and
``loaders`` and ``sys.meta_path``. The ``importlib`` module, introduced
with Python 3.1, now exposes a pure Python implementation of the APIs
described by PEP 302, as well as of the full import system. It is now
much easier to understand and extend the import system. While a benefit
to the Python community, this greater accessibility also presents a
challenge.

As more developers come to understand and customize the import system,
any weaknesses in the finder and loader APIs will be more impactful. So
the sooner we can address any such weaknesses in the import system, the
better...and there are a couple we can take care of with this proposal.

Firstly, any time the import system needs to save information about a
module we end up with more attributes on module objects that are
generally only meaningful to the import system and occasionally to some
people. It would be nice to have a per-module namespace to put future
import-related information. Secondly, there's an API void between
finders and loaders that causes undue complexity when encountered.

Finders are strictly responsible for providing the loader which the
import system will use to load the module. The loader is then
responsible for doing some checks, creating the module object, setting
import-related attributes, "installing" the module to ``sys.modules``,
and loading the module, along with some cleanup. This all takes place
during the import system's call to ``Loader.load_module()``. Loaders
also provide some APIs for accessing data associated with a module.

Loaders are not required to provide any of the functionality of
``load_module()`` through other methods. Thus, though the import-
related information about a module is likely available without loading
the module, it is not otherwise exposed.

Furthermore, the requirements associated with ``load_module()`` are
common to all loaders and mostly are implemented in exactly the same
way. This means every loader has to duplicate the same boilerplate
code. ``importlib.util`` provides some tools that help with this, but
it would be more helpful if the import system simply took charge of
these responsibilities. The trouble is that this would limit the degree
of customization that ``load_module()`` facilitates. This is a gap
between finders and loaders which this proposal aims to fill.

Finally, when the import system calls a finder's ``find_module()``, the
finder makes use of a variety of information about the module that is
useful outside the context of the method. Currently the options are
limited for persisting that per-module information past the method call,
since it only returns the loader. Common workarounds for this limitation
are to store the information in a module-to-info mapping somewhere on
the finder itself, or store it on the loader.

Unfortunately, loaders are not required to be module-specific. On top
of that, some of the useful information finders could provide is
common to all finders, so ideally the import system could take care of
that. This is the same gap as before between finders and loaders.

As an example of complexity attributable to this flaw, the
implementation of namespace packages in Python 3.3 (see PEP 420) added
``FileFinder.find_loader()`` because there was no good way for
``find_module()`` to provide the namespace path.

The answer to this gap is a ``ModuleSpec`` object that contains the
per-module information and takes care of the boilerplate functionality
of loading the module.

(The idea gained momentum during discussions related to another PEP.[1])

Specification
=============

The goal is to address the gap between finders and loaders while
changing as little of their semantics as possible. Though some
functionality and information is moved to the new ``ModuleSpec`` type,
their semantics should remain the same. However, for the sake of
clarity, those semantics will be explicitly identified.

A High-Level View
-----------------

...

ModuleSpec
----------

A new class which defines the import-related values to use when loading
the module. It closely corresponds to the import-related attributes of
module objects. ``ModuleSpec`` objects may also be used by finders and
loaders and other import-related APIs to hold extra import-related
state about the module. This greatly reduces the need to add any new
import-related attributes to module objects, and loader ``__init__``
methods won't need to accommodate such per-module state.

Creating a ModuleSpec:

``ModuleSpec(name, loader, *, origin=None, filename=None, cached=None,
path=None)``

The parameters have the same meaning as the attributes described below.
However, not all ``ModuleSpec`` attributes are also parameters. The
passed values are set as-is. For calculated values use the
``from_loader()`` method.

ModuleSpec Attributes
---------------------

Each of the following names is an attribute on ``ModuleSpec`` objects.
A value of ``None`` indicates "not set". This contrasts with module
objects where the attribute simply doesn't exist.

While ``package`` and ``is_package`` are read-only properties, the
remaining attributes can be replaced after the module spec is created
and after import is complete. This allows for unusual cases where
modifying the spec is the best option. However, typical use should not
involve changing the state of a module's spec.

Most of the attributes correspond to the import-related attributes of
modules. Here is the mapping, followed by a description of the
attributes. The reverse of this mapping is used by
``init_module_attrs()``.

============= ===========
On ModuleSpec On Modules
============= ===========
name          __name__
loader        __loader__
package       __package__
is_package    \-
origin        \-
filename      __file__
cached        __cached__
path          __path__
============= ===========

``name``

The module's fully resolved and absolute name. It must be set.

``loader``

The loader to use during loading and for module data. These specific
functionalities do not change for loaders. Finders are still
responsible for creating the loader and this attribute is where it is
stored. The loader must be set.

``package``

The name of the module's parent. This is a dynamic attribute with a
value derived from ``name`` and ``is_package``. For packages it is the
value of ``name``. Otherwise it is equivalent to
``name.rpartition('.')[0]``. Consequently, a top-level module will have
the empty string for ``package``.

``is_package``

Whether or not the module is a package. This dynamic attribute is True
if ``path`` is set (even if empty), else it is false.
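
As a rough illustration of how the two derived attributes relate, here
is a minimal sketch. ``SpecSketch`` is a hypothetical stand-in, not the
proposed class; it only models ``name``, ``path``, and the two read-only
properties described above.

```python
class SpecSketch:
    """Hypothetical stand-in for ModuleSpec; models only derived values."""

    def __init__(self, name, path=None):
        self.name = name
        self.path = path  # None means "not set"

    @property
    def is_package(self):
        # True whenever path is set, even if it is an empty list.
        return self.path is not None

    @property
    def package(self):
        # A package is its own __package__; otherwise drop the last part.
        if self.is_package:
            return self.name
        return self.name.rpartition('.')[0]
```

With this sketch a top-level module such as ``spam`` gives the empty
string for ``package``, while ``spam.eggs`` gives ``spam``.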

``origin``

A string for the location from which the module originates. If
``filename`` is set, ``origin`` should be set to the same value unless
some other value is more appropriate. ``origin`` is used in
``module_repr()`` if it does not match the value of ``filename``.

Using ``filename`` for this meaning would be inaccurate, since not all
modules have path-based locations. For instance, built-in modules do
not have ``__file__`` set. Yet it is useful to have a descriptive
string indicating that it originated from the interpreter as a built-in
module. So built-in modules will have ``origin`` set to ``"built-in"``.

Path-based attributes:

If any of these is set, it indicates that the module is path-based. For
reference, a path entry is a string for a location where the import
system will look for modules (e.g. the path entries in ``sys.path`` or a
package's ``__path__``).

``filename``

Like ``origin``, but limited to a path-based location. If ``filename``
is set, ``origin`` should be set to the same string, unless origin is
explicitly set to something else. ``filename`` is not necessarily an
actual file name, but could be any location string based on a path
entry. Regarding the attribute name, while it is potentially
inaccurate, it is both consistent with the equivalent module attribute
and generally accurate.

.. XXX Would a different name be better? ``path_location``?

``cached``

The path-based location where the compiled code for a module should be
stored. If ``filename`` is set to a source file, this should be set to
the corresponding path that PEP 3147 specifies. The
``importlib.util.cache_from_source()`` function facilitates getting the
correct value.

``path``

The list of path entries in which to search for submodules if this
module is a package. Otherwise it is ``None``.

.. XXX add a path-based subclass?

ModuleSpec Methods
------------------

``from_loader(name, loader, *, is_package=None, origin=None, filename=None,
cached=None, path=None)``

.. XXX use a different name?

A factory classmethod that returns a new ``ModuleSpec`` derived from the
arguments. ``is_package`` is used inside the method to indicate that
the module is a package. If not explicitly passed in, it is set to
``True`` if ``path`` is passed in. It falls back to using the result of
the loader's ``is_package()``, if available. Finally it defaults to
False. The remaining parameters have the same meaning as the
corresponding ``ModuleSpec`` attributes.

In contrast to ``ModuleSpec.__init__()``, which takes the arguments
as-is, ``from_loader()`` calculates missing values from the ones passed
in, as much as possible. This replaces the behavior that is currently
provided by several ``importlib.util`` functions as well as the
optional ``init_module_attrs()`` method of loaders. Just to be clear,
here is a more detailed description of those calculations::

    If not passed in, ``filename`` is set to the result of calling the
    loader's ``get_filename()``, if available. Otherwise it stays
    unset (``None``).

    If not passed in, ``path`` is set to an empty list if
    ``is_package`` is true. Then the directory from ``filename`` is
    appended to it, if possible. If ``is_package`` is false, ``path``
    stays unset.

    If ``cached`` is not passed in and ``filename`` is passed in,
    ``cached`` is derived from it. For filenames with a source suffix,
    it is set to the result of calling
    ``importlib.util.cache_from_source()``. For bytecode suffixes (e.g.
    ``.pyc``), ``cached`` is set to the value of ``filename``. If
    ``filename`` is not passed in or ``cache_from_source()`` raises
    ``NotImplementedError``, ``cached`` stays unset.

    If not passed in, ``origin`` is set to ``filename``. Thus if
    ``filename`` is unset, ``origin`` stays unset.
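
To make the fallbacks concrete, here is one possible reading of those
calculations as a plain function. This is only a sketch:
``derive_values()`` and the two suffix tuples are hypothetical names
used for illustration, the suffix lists are deliberately simplified, and
``importlib.util.cache_from_source()`` is assumed to be available (as it
is in Python 3.4 and later).

```python
import importlib.util
import os.path

# Assumption: simplified suffix lists, just for this sketch.
SOURCE_SUFFIXES = ('.py',)
BYTECODE_SUFFIXES = ('.pyc',)


def derive_values(name, loader, *, is_package=None, origin=None,
                  filename=None, cached=None, path=None):
    """Return a dict of spec values using the fallbacks described above."""
    if is_package is None:
        if path is not None:
            is_package = True
        elif hasattr(loader, 'is_package'):
            is_package = bool(loader.is_package(name))
        else:
            is_package = False

    filename_passed = filename is not None
    if filename is None and hasattr(loader, 'get_filename'):
        filename = loader.get_filename(name)

    if path is None and is_package:
        path = []
        if filename is not None:
            path.append(os.path.dirname(filename))

    # cached is only derived from a filename that was passed in explicitly.
    if cached is None and filename_passed:
        if filename.endswith(SOURCE_SUFFIXES):
            try:
                cached = importlib.util.cache_from_source(filename)
            except NotImplementedError:
                cached = None
        elif filename.endswith(BYTECODE_SUFFIXES):
            cached = filename

    if origin is None:
        origin = filename

    return dict(name=name, loader=loader, is_package=is_package,
                origin=origin, filename=filename, cached=cached, path=path)
```

Note that in this reading a ``filename`` obtained from the loader does
not trigger the ``cached`` calculation, since the text only mentions the
case where ``filename`` is passed in.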

``module_repr()``

Returns a repr string for the module if ``origin`` is set and
``filename`` is not set. The string refers to the value of ``origin``.
Otherwise ``module_repr()`` returns None. This indicates to the module
type's ``__repr__()`` that it should fall back to the default repr.
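
A small sketch of that rule follows. The function is written
free-standing for brevity, and the exact repr format is an assumption
modeled on the existing repr of built-in modules; the PEP text only says
the string refers to ``origin``.

```python
def module_repr(name, origin=None, filename=None):
    """Sketch: produce a repr only when origin is set and filename is not."""
    if origin is not None and filename is None:
        # Assumed format, mirroring e.g. <module 'sys' (built-in)>.
        return '<module {!r} ({})>'.format(name, origin)
    # None tells the module type to fall back to its default repr.
    return None
```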

We could also have ``module_repr()`` produce the repr for the case where
``filename`` is set or where ``origin`` is not set, mirroring the repr
that the module type produces directly. However, the repr string is
derived from the import-related module attributes, which might be out of
sync with the spec.

.. XXX Is using the spec close enough? Probably not.

The implementation of the module type's ``__repr__()`` will change to
accommodate this PEP. However, the current functionality will remain to
handle the case where a module does not have a ``__spec__`` attribute.

``init_module_attrs(module)``

Sets the module's import-related attributes to the corresponding values
in the module spec. If a path-based attribute is not set on the spec,
it is not set on the module. For the rest, a ``None`` value on the spec
(aka "not set") means ``None`` will be set on the module. If any of the
attributes are already set on the module, the existing values are
replaced. The module's own ``__spec__`` is not consulted but does get
replaced with the spec on which ``init_module_attrs()`` was called.
The earlier mapping of ``ModuleSpec`` attributes to module attributes
indicates which attributes are involved on both sides.
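
One way to picture this is the following sketch, written as a function
taking the spec explicitly. ``ATTR_MAP`` is a hypothetical helper that
restates the earlier table, and leaving an existing path-based module
attribute untouched when the spec value is unset is an assumption, since
the text does not say whether it would be removed.

```python
import types

# (spec attribute, module attribute, is path-based), per the mapping table.
ATTR_MAP = [
    ('name', '__name__', False),
    ('loader', '__loader__', False),
    ('package', '__package__', False),
    ('filename', '__file__', True),
    ('cached', '__cached__', True),
    ('path', '__path__', True),
]


def init_module_attrs(spec, module):
    """Copy import-related values from the spec onto the module."""
    for spec_attr, module_attr, path_based in ATTR_MAP:
        value = getattr(spec, spec_attr)
        if path_based and value is None:
            # Unset path-based attributes are not set on the module.
            continue
        setattr(module, module_attr, value)
    # The module's own __spec__ is replaced, never consulted.
    module.__spec__ = spec


# Usage with a stand-in spec object (not the proposed ModuleSpec class).
spec = types.SimpleNamespace(name='spam', loader=None, package='',
                             filename=None, cached=None, path=None)
module = types.ModuleType('placeholder')
init_module_attrs(spec, module)
```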

``load(module=None, *, is_reload=False)``

This method captures the current functionality of and requirements on
``Loader.load_module()`` without any semantic changes, except one.
Reloading a module when ``exec_module()`` is available actually uses
``module`` rather than ignoring it in favor of the one in
``sys.modules``, as ``Loader.load_module()`` does.

``module`` is only allowed when ``is_reload`` is true. This means that
``is_reload`` could be dropped as a parameter. However, doing so would
mean we could not use ``None`` to indicate that the module should be
pulled from ``sys.modules``. Furthermore, ``is_reload`` makes the
intent of the call clear.

There are two parts to what happens in ``load()``. First, the module is
prepared, loaded, updated appropriately, and left available for the
second part. This is described in more detail shortly.

Second, in the case of error during a normal load (not reload) the
module is removed from ``sys.modules``. If no error happened, the
module is pulled from ``sys.modules``. This is the module returned by
``load()``. Before it is returned, if it is a different object than the
one produced by the first part, attributes of the module from
``sys.modules`` are updated to reflect the spec.

Returning the module from ``sys.modules`` accommodates the ability of
the module to replace itself there while it is executing (during load).

As already noted, this is what already happens in the import system.
``load()`` is not meant to change any of this behavior.

Regarding the first part of ``load()``, the following describes what
happens. It depends on whether ``is_reload`` is true and whether the
loader has ``exec_module()``.

For normal load with ``exec_module()`` available::

    A new module is created, ``init_module_attrs()`` is called to set
    its attributes, and it is set on sys.modules. At that point
    the loader's ``exec_module()`` is called, after which the module
    is ready for the second part of loading.

.. XXX What if the module already exists in sys.modules?

For normal load without ``exec_module()`` available::

    The loader's ``load_module()`` is called and the attributes of the
    module it returns are updated to match the spec.

For reload with ``exec_module()`` available::

    If ``module`` is ``None``, it is pulled from ``sys.modules``. If
    still ``None``, ImportError is raised. Otherwise ``exec_module()``
    is called, passing in the module-to-be-reloaded.

For reload without ``exec_module()`` available::

    The loader's ``load_module()`` is called and the attributes of the
    module it returns are updated to match the spec.

There is some boilerplate involved when ``exec_module()`` is available,
but only the boilerplate that the import system uses currently.

If ``loader`` is not set (``None``), ``load()`` raises a ValueError. If
``module`` is passed in but ``is_reload`` is false, a ValueError is also
raised to indicate that ``load()`` was called incorrectly. There may be
use cases for calling ``load()`` in that way, but they are outside the
scope of this PEP.
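
Pulling the pieces of ``load()`` together, the flow might look roughly
like the following. This is a simplified sketch rather than the
reference implementation: it is a free function taking any object with
``name`` and ``loader`` attributes, it sets only ``__loader__`` and
``__spec__`` in place of a full ``init_module_attrs()`` call, and it
omits the per-module import lock.

```python
import sys
import types


def load(spec, module=None, *, is_reload=False):
    """Sketch of the load/reload flow described above."""
    if spec.loader is None:
        raise ValueError('loader is not set')
    if module is not None and not is_reload:
        raise ValueError('module may only be passed in when reloading')

    if hasattr(spec.loader, 'exec_module'):
        if is_reload:
            if module is None:
                module = sys.modules.get(spec.name)
            if module is None:
                raise ImportError('not in sys.modules: ' + spec.name)
            spec.loader.exec_module(module)
        else:
            module = types.ModuleType(spec.name)
            module.__loader__ = spec.loader  # stand-in for init_module_attrs()
            module.__spec__ = spec
            sys.modules[spec.name] = module
            try:
                spec.loader.exec_module(module)
            except BaseException:
                # A failed normal load must not leave the module behind.
                sys.modules.pop(spec.name, None)
                raise
    else:
        # Legacy path: the loader does everything, then attrs are updated.
        module = spec.loader.load_module(spec.name)
        module.__spec__ = spec

    # The module may have replaced itself in sys.modules while executing.
    return sys.modules.get(spec.name, module)
```

The final lookup in ``sys.modules`` is what accommodates a module
swapping itself out during execution.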

.. XXX add reload(module=None) and drop load()'s parameters entirely?
.. XXX add more of importlib.reload()'s boilerplate to load()/reload()?

Backward Compatibility
----------------------

Since ``Finder.find_module()`` methods would now return a module spec
instead of loader, specs must act like the loader that would have been
returned instead. This is relatively simple to solve since the loader
is available as an attribute of the spec. We will use ``__getattr__()``
to do it.
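
A minimal sketch of that proxying follows. ``ProxyingSpec`` and
``OldLoader`` are hypothetical names for illustration; the real class
would carry the other spec attributes as well.

```python
class ProxyingSpec:
    """Hypothetical spec that falls back to its loader's attributes."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, so spec attributes win.
        if attr == 'loader':
            # Avoid infinite recursion if 'loader' was never assigned.
            raise AttributeError(attr)
        return getattr(self.loader, attr)


class OldLoader:
    """Stand-in for a PEP 302 loader."""

    def load_module(self, name):
        return 'loaded ' + name


spec = ProxyingSpec('spam', OldLoader())
result = spec.load_module('spam')  # resolved on the loader via __getattr__
```

Code that still treats the return value of ``find_module()`` as a loader
keeps working for plain attribute access under this approach.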

However, ``ModuleSpec.is_package`` (an attribute) conflicts with
``InspectLoader.is_package()`` (a method). Working around this requires
a more complicated solution but is not a large obstacle. Simply making
``ModuleSpec.is_package`` a method does not reflect that it is a relatively
static piece of data. ``module_repr()`` also conflicts with the same
method on loaders, but that workaround is not complicated since both are
methods.

Unfortunately, the ability to proxy does not extend to ``id()``
comparisons and ``isinstance()`` tests. In the case of the return value
of ``find_module()``, we accept that break in backward compatibility.
However, we will mitigate the problem with ``isinstance()`` somewhat by
registering ``ModuleSpec`` on the loaders in ``importlib.abc``.

Subclassing
-----------

Subclasses of ModuleSpec are allowed, but should not be necessary.
Adding functionality to a custom finder or loader will likely be a
better fit and should be tried first. However, as long as a subclass
still fulfills the requirements of the import system, objects of that
type are completely fine as the return value of ``find_module()``.

Module Objects
--------------

Module objects will now have a ``__spec__`` attribute to which the
module's spec will be bound. None of the other import-related module
attributes will be changed or deprecated, though some of them could be;
any such deprecation can wait until Python 4.

``ModuleSpec`` objects will not be kept in sync with the corresponding
module object's import-related attributes. Though they may differ, in
practice they will typically be the same.

Finders
-------

Finders will now return ModuleSpec objects when ``find_module()`` is
called rather than loaders. For backward compatibility, ``ModuleSpec``
objects proxy the attributes of their ``loader`` attribute.

Adding another similar method to avoid backward-compatibility issues
is undesirable if avoidable. The import APIs have suffered enough,
especially considering ``PathEntryFinder.find_loader()`` was just
added in Python 3.3. The approach taken by this PEP should be
sufficient to address backward-compatibility issues for
``find_module()``.

The change to ``find_module()`` applies to both ``MetaPathFinder`` and
``PathEntryFinder``. ``PathEntryFinder.find_loader()`` will be
deprecated and, for backward compatibility, implicitly special-cased if
the method exists on a finder.

Finders are still responsible for creating the loader. That loader will
now be stored in the module spec returned by ``find_module()`` rather
than returned directly. As is currently the case without the PEP, if a
loader would be costly to create, that loader can be designed to defer
the cost until later.

Loaders
-------

Loaders will have a new method, ``exec_module(module)``. Its only job
is to "exec" the module and consequently populate the module's
namespace. It is not responsible for creating or preparing the module
object, nor for any cleanup afterward. It has no return value.
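
For illustration, a loader that follows this contract could be as small
as the sketch below. ``StringLoader`` is a hypothetical loader that
executes source held in memory; it does nothing beyond populating the
namespace of the module it is given.

```python
import types


class StringLoader:
    """Hypothetical loader that executes in-memory source."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        # No module creation, sys.modules handling, or cleanup here;
        # the caller (e.g. ModuleSpec.load()) owns all of that.
        code = compile(self.source, '<string>', 'exec')
        exec(code, module.__dict__)


module = types.ModuleType('demo')
returned = StringLoader('x = 1 + 1').exec_module(module)
```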

The ``load_module()`` of loaders will still work and be an active part
of the loader API. It is still useful for cases where the default
module creation/preparation/cleanup is not appropriate for the loader.

For example, the C API for extension modules only supports the full
control of ``load_module()``. As such, ``ExtensionFileLoader`` will not
implement ``exec_module()``. In the future it may be appropriate to
produce a second C API that would support an ``exec_module()``
implementation for ``ExtensionFileLoader``. Such a change is outside
the scope of this PEP.

A loader must have at least one of ``exec_module()`` and
``load_module()`` defined. If both exist on the loader,
``ModuleSpec.load()`` uses ``exec_module()`` and ignores
``load_module()``.

PEP 420 introduced the optional ``module_repr()`` loader method to limit
the amount of special-casing in the module type's ``__repr__()``. Since
this method is part of ``ModuleSpec``, it will be deprecated on loaders.
However, if it exists on a loader it will be used exclusively.

The ``Loader.init_module_attrs()`` method, added prior to Python 3.4's
release, will be removed in favor of the same method on ``ModuleSpec``.

However, ``InspectLoader.is_package()`` will not be deprecated even
though the same information is found on ``ModuleSpec``. ``ModuleSpec``
can use it to populate its own ``is_package`` if that information is
not otherwise available. Still, it will be made optional.

The path-based loaders in ``importlib`` take arguments in their
``__init__()`` and have corresponding attributes. However, the need for
those values is eliminated. The only exception is
``FileLoader.get_filename()``, which uses ``self.path``. The signatures
for these loaders and the accompanying attributes will be deprecated.

In addition to executing a module during loading, loaders will still be
directly responsible for providing APIs concerning module-related data.

Other Changes
-------------

* The various finders and loaders provided by ``importlib`` will be
updated to comply with this proposal.

* The spec for the ``__main__`` module will reflect how the interpreter
was started. For instance, with ``-m`` the spec's name will be that of
the run module, while ``__main__.__name__`` will still be "__main__".

* We add ``importlib.find_module()`` to mirror
``importlib.find_loader()`` (which becomes deprecated).

* Deprecations in ``importlib.util``: ``set_package()``,
``set_loader()``, and ``module_for_loader()``. ``module_to_load()``
(introduced prior to Python 3.4's release) can be removed.

* ``importlib.reload()`` is changed to use ``ModuleSpec.load()``.

* ``ModuleSpec.load()`` and ``importlib.reload()`` will now make use of
the per-module import lock, whereas ``Loader.load_module()`` did not.

Reference Implementation
------------------------

A reference implementation is available at <TBD>.

References
==========

[1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html

Copyright
=========

This document has been placed in the public domain.

..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

Nick Coghlan

2013-08-11 13:03:00 UTC


I think this is solid enough to be worth adding to the PEPs repo now.

One piece of feedback from me (triggered by the C extension modules
discussion on python-dev): we should consider proposing a new "exec"
hook for C extension modules that could be defined instead of or in
addition to the existing PEP 3121 init hook.

Extension modules that don't rely on mutable static variables or the
PEP 3121 per-interpreter state APIs could just define the new exec
hook and get a new module instance every time they're imported. Those
that do have per-interpreter state would still get an opportunity to
run additional code after all the magic attributes have been set.

Also, to handle the extension module case, we may need to let loaders
define an optional "create_module" method that accepts the ModuleSpec
object as an argument. The extension module loader would implement
this as handling the PyInit_<modulename> call. (Setting the magic
attributes according to the spec would happen automatically after the
call, so each loader wouldn't need to implement that part)

(Note: once I get back to Australia around the 22nd, I should have
time to help out more directly with this)

Post by Eric Snow
-----------------------------------
Firstly, any time the import system needs to save information about a
module we end up with more attributes on module objects that are
generally only meaningful to the import system and occoasionally to some

Typo: occoasionally

Post by Eric Snow
people. It would be nice to have a per-module namespace to put future
import-related information. Secondly, there's an API void between
finders and loaders that causes undue complexity when encountered.
Finders are strictly responsible for providing the loader which the

"are currently responsible" (since the PEP is about changing the
responsibility of finders, this is a little unclear at present)

Post by Eric Snow
Specification
=============
The goal is to address the gap between finders and loaders while
changing as little of their semantics as possible. Though some
functionality and information is moved the new ``ModuleSpec`` type,

"moved to the new"

Post by Eric Snow
their semantics should remain the same. However, for the sake of
clarity, those semantics will be explicitly identified.
A High-Level View
-----------------
...

Not sure a high level view is needed, but you can fill this in if you want :)

Post by Eric Snow
ModuleSpec
----------
A new class which defines the import-related values to use when loading
the module. It closely corresponds to the import-related attributes of
module objects. ``ModuleSpec`` objects may also be used by finders and
loaders and other import-related APIs to hold extra import-related
state about the module. This greatly reduces the need to add any new
new import-related attributes to module objects, and loader ``__init__``
methods won't need to accommodate such per-module state.

To avoid conflicts as the spec attributes evolve in the future, would
it be worth having a "custom" field which is just an arbitrary object
reference used to pass info from the finder to the loader without
troubling the rest of the import system?

Post by Eric Snow
``ModuleSpec(name, loader, *, origin=None, filename=None, cached=None,
path=None)``
The parameters have the same meaning as the attributes described below.
However, not all ``ModuleSpec`` attributes are also parameters. The
passed values are set as-is. For calculated values use the
``from_loader()`` method.

This paragraph isn't particularly clear. Perhaps:

"Passed in parameter values are assigned directly to the corresponding
attributes below. Other attributes not listed as parameters (such as
``package``) are read-only properties that are automatically derived
from these values.

The ``ModuleSpec.from_loader()`` class method allows a suitable
ModuleSpec instance to be easily created from a PEP 302 loader object"

Post by Eric Snow
ModuleSpec Attributes
---------------------
Each of the following names is an attribute on ``ModuleSpec`` objects.
A value of ``None`` indicates "not set". This contrasts with module
objects where the attribute simply doesn't exist.
While ``package`` and ``is_package`` are read-only properties, the
remaining attributes can be replaced after the module spec is created
and after import is complete. This allows for unusual cases where
modifying the spec is the best option. However, typical use should not
involve changing the state of a module's spec.

I'm with Brett that "is_package" should go, to be replaced by
"spec.path is not None" wherever it matters. is_package() would then
fall through to the PEP 302 loader API via __getattr__.

Post by Eric Snow
``package``
The name of the module's parent. This is a dynamic attribute with a
value derived from ``name`` and ``is_package``. For packages it is the
value of ``name``. Otherwise it is equivalent to
``name.rpartition('.')[0]``. Consequently, a top-level module will have
give the empty string for ``package``.

s/give//

Post by Eric Snow
``is_package``
Whether or not the module is a package. This dynamic attribute is True
if ``path`` is set (even if empty), else it is false.

As above (i.e. don't use it)

Post by Eric Snow
``origin``
A string for the location from which the module originates. If
``filename`` is set, ``origin`` should be set to the same value unless
some other value is more appropriate. ``origin`` is used in
``module_repr()`` if it does not match the value of ``filename``.
Using ``filename`` for this meaning would be inaccurate, since not all
modules have path-based locations. For instance, built-in modules do
not have ``__file__`` set. Yet it is useful to have a descriptive
string indicating that it originated from the interpreter as a built-in
module. So built-in modules will have ``origin`` set to ``"built-in"``.

How about we *just* have origin, with a separate "set_fileattr"
attribute to indicate "this is a discrete file, you should set
__file__"?

Also, we should explicitly note that we'll still set __file__ for zip
imports, due to backwards compatibility concerns, even though it
doesn't correspond to a valid filesystem path.

(Random thought: spec.origin + spec.cached + a cache directory setting
in zipimport would give a potentially clean way to do extension module
imports from zip archives)

Post by Eric Snow
``path``
The list of path entries in which to search for submodules if this
module is a package. Otherwise it is ``None``.

Path entries don't have to correspond to filesystem locations - they
just have to make sense to at least one path hook
(e.g. a DB URI would be a valid path entry).

Post by Eric Snow
.. XXX add a path-based subclass?

Nope :)

Post by Eric Snow
ModuleSpec Methods
------------------
``from_loader(name, loader, *, is_package=None, origin=None, filename=None,
cached=None, path=None)``
.. XXX use a different name?

I'd disallow customisation on this one - if people want to customise,
they should just query the PEP 302 APIs themselves and call the
ModuleSpec constructor directly. The use case for this one should be
to make it trivial to switch from "return loader" to "return
ModuleSpec.from_loader(loader)" in a find_module implementation.

Post by Eric Snow
In contrast to ``ModuleSpec.__init__()``, which takes the arguments
as-is, ``from_loader()`` calculates missing values from the ones passed
in, as much as possible. This replaces the behavior that is currently
provided the several ``importlib.util`` functions as well as the
optional ``init_module_attrs()`` method of loaders. Just to be clear,
If not passed in, ``filename`` is to the result of calling the
loader's ``get_filename()``, if available. Otherwise it stays
unset (``None``).
If not passed in, ``path`` is set to an empty list if
``is_package`` is true. Then the directory from ``filename`` is
appended to it, if possible. If ``is_package`` is false, ``path``
stays unset.
If ``cached`` is not passed in and ``filename`` is passed in,
``cached`` is derived from it. For filenames with a source suffix,
it set to the result of calling
``importlib.util.cache_from_source()``. For bytecode suffixes (e.g.
``.pyc``), ``cached`` is set to the value of ``filename``. If
``filename`` is not passed in or ``cache_from_source()`` raises
``NotImplementedError``, ``cached`` stays unset.
If not passed in, ``origin`` is set to ``filename``. Thus if
``filename`` is unset, ``origin`` stays unset.

Hmm, is there a reason this can't be the default constructor
behaviour? What's the value of *not* having the sensible fallbacks,
given they can always be overridden by passing in explicit values when
you want something different?

A separate "from_module(m)" constructor would probably make sense, though.

Post by Eric Snow
``module_repr()``
Returns a repr string for the module if ``origin`` is set and
``filename`` is not set. The string refers to the value of ``origin``.
Otherwise ``module_repr()`` returns None. This indicates to the module
type's ``__repr__()`` that it should fall back to the default repr.
We could also have ``module_repr()`` produce the repr for the case where
``filename`` is set or where ``origin`` is not set, mirroring the repr
that the module type produces directly. However, the repr string is
derived from the import-related module attributes, which might be out of
sync with the spec.
.. XXX Is using the spec close enough? Probably not.
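
A minimal sketch of the rule quoted above, assuming a spec object with
``name``, ``origin`` and ``filename`` attributes. The exact repr format is an
assumption modelled on the existing built-in module repr, and
``SimpleNamespace`` objects stand in for real specs.

```python
from types import SimpleNamespace


def module_repr(spec):
    """Sketch of the draft rule: only origin-without-filename gets a repr."""
    if spec.origin is not None and spec.filename is None:
        # Assumed format, modelled on the current built-in module repr.
        return "<module {!r} ({})>".format(spec.name, spec.origin)
    # None tells the module type's __repr__() to use its default repr.
    return None


builtin_spec = SimpleNamespace(name="sys", origin="built-in", filename=None)
file_spec = SimpleNamespace(name="spam", origin="/tmp/spam.py",
                            filename="/tmp/spam.py")
print(module_repr(builtin_spec))  # <module 'sys' (built-in)>
print(module_repr(file_spec))     # None
```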

I think it makes sense to always return the expected repr based on the
spec attributes, but allow a custom origin to be passed in to handle
the case where the module __file__ attribute differs from
__spec__.origin (keeping in mind I think __spec__.filename should be
replaced with __spec__.set_fileattr)

Post by Eric Snow
The implementation of the module type's ``__repr__()`` will change to
accommodate this PEP. However, the current functionality will remain to
handle the case where a module does not have a ``__spec__`` attribute.

Experience tells us that the import system should ensure the __spec__
attribute always exists (even if it has to be filled in from the
module attributes after calling load_module)

Post by Eric Snow
``load(module=None, *, is_reload=False)``

Yep, definitely needs to be a separate method. "is_reload" would
almost always be set to a boolean, which means a separate API is
likely to be better.

However, I think the separate method should be "exec()" rather than
"reload()" and require that the module always be passed in.

We could also expose a "create" method that just creates and returns
the new module object, and replace importlib.util.module_to_load with
a context manager that accepted the module as a parameter. Say
"add_to_sys", which fails if the module is already present in
sys.modules.

load() would then look something like:

    def load(self):
        m = self.create()
        with importlib.util.add_to_sys(m):
            self.exec(m)
        return sys.modules[self.name]

We could also provide reload() if we wanted to:

    def reload(self):
        self.exec(sys.modules[self.name])
        return sys.modules[self.name]
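
The "add_to_sys" context manager mentioned above does not exist in importlib;
a rough sketch of how it might behave follows. The choice of ``ImportError``
for the "already present" case and the cleanup on failure are assumptions,
not something settled in this thread.

```python
import sys
import types
from contextlib import contextmanager


@contextmanager
def add_to_sys(module):
    """Hypothetical helper: register a module while it is being executed."""
    name = module.__name__
    if name in sys.modules:
        # Assumption: an existing entry is reported as ImportError.
        raise ImportError("module {!r} already in sys.modules".format(name))
    sys.modules[name] = module
    try:
        yield module
    except BaseException:
        # Assumption: a failed exec should not leave a half-built module.
        sys.modules.pop(name, None)
        raise


# Usage: create a module, then execute code in it while it is registered.
demo = types.ModuleType("add_to_sys_demo")
with add_to_sys(demo):
    exec("x = 1", demo.__dict__)
```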

Post by Eric Snow
Subclassing
-----------
Subclasses of ModuleSpec are allowed, but should not be necessary.
Adding functionality to a custom finder or loader will likely be a
better fit and should be tried first. However, as long as a subclass
still fulfills the requirements of the import system, objects of that
type are completely fine as the return value of ``find_module()``.

We may need to do subclasses for the ABC registration backwards
compatibility hack.

Post by Eric Snow
Module Objects
--------------
Module objects will now have a ``__spec__`` attribute to which the
module's spec will be bound. None of the other import-related module
attributes will be changed or deprecated, though some of them could be;
any such deprecation can wait until Python 4.
``ModuleSpec`` objects will not be kept in sync with the corresponding
module object's import-related attributes. Though they may differ, in
practice they will typically be the same.

Worth mentioning that __main__.__spec__.name will give the real name
of modules executed with -m here rather than delaying that until the
notes at the end.

Post by Eric Snow
Finders
-------
Finders will now return ModuleSpec objects when ``find_module()`` is
called rather than loaders. For backward compatibility, ``ModuleSpec``
objects proxy the attributes of their ``loader`` attribute.
Adding another similar method to avoid backward-compatibility issues
is undesirable if avoidable. The import APIs have suffered enough,
especially considering ``PathEntryFinder.find_loader()`` was just
added in Python 3.3. The approach taken by this PEP should be
sufficient to address backward-compatibility issues for
``find_module()``.
The change to ``find_module()`` applies to both ``MetaPathFinder`` and
``PathEntryFinder``. ``PathEntryFinder.find_loader()`` will be
deprecated and, for backward compatibility, implicitly special-cased if
the method exists on a finder.
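
The attribute proxying described above could look roughly like this.
``ProxyingSpec`` and ``LegacyLoader`` are hypothetical stand-ins used only for
illustration; the draft does not spell out the mechanism beyond saying that
spec objects proxy their loader's attributes.

```python
class ProxyingSpec:
    """Hypothetical stand-in for the draft ModuleSpec proxy behaviour."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader

    def __getattr__(self, attr):
        # Only called when normal lookup fails, so spec attributes win.
        if attr == "loader":
            # Guard against recursion if 'loader' itself is not set yet.
            raise AttributeError(attr)
        return getattr(self.loader, attr)


class LegacyLoader:
    """Hypothetical PEP 302 style loader."""

    def load_module(self, fullname):
        return "loaded " + fullname


# Old callers that expect find_module() to return a loader still work.
spec = ProxyingSpec("spam", LegacyLoader())
print(spec.load_module("spam"))  # loaded spam
```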

Actually, we don't currently have anything on ModuleSpec to indicate
"this is complete, stop scanning for more path fragments" or how we
will compose multiple module specs for the individual fragments into a
combined spec for the namespace package.

Post by Eric Snow
Finders are still responsible for creating the loader. That loader will
now be stored in the module spec returned by ``find_module()`` rather
than returned directly. As is currently the case without the PEP, if a
loader would be costly to create, that loader can be designed to defer
the cost until later.
Loaders
-------
Loaders will have a new method, ``exec_module(module)``. Its only job
is to "exec" the module and consequently populate the module's
namespace. It is not responsible for creating or preparing the module
object, nor for any cleanup afterward. It has no return value.
The ``load_module()`` of loaders will still work and be an active part
of the loader API. It is still useful for cases where the default
module creation/preparation/cleanup is not appropriate for the loader.
For example, the C API for extension modules only supports the full
control of ``load_module()``. As such, ``ExtensionFileLoader`` will not
implement ``exec_module()``. In the future it may be appropriate to
produce a second C API that would support an ``exec_module()``
implementation for ``ExtensionFileLoader``. Such a change is outside
the scope of this PEP.
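
For illustration, here is a loader whose only job is to populate the module's
namespace. The sketch uses ``importlib.util.spec_from_loader()`` and
``importlib.util.module_from_spec()`` as found in later Python releases, which
may differ from the API in this draft; ``StringLoader`` is a hypothetical name.

```python
import importlib.abc
import importlib.util


class StringLoader(importlib.abc.Loader):
    """Hypothetical loader that executes source held in memory."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        # Only populate the namespace; creating the module and any
        # sys.modules bookkeeping are left to the caller.
        exec(self.source, module.__dict__)


loader = StringLoader("answer = 6 * 7")
spec = importlib.util.spec_from_loader("exec_module_demo", loader)
module = importlib.util.module_from_spec(spec)  # creates and prepares module
spec.loader.exec_module(module)                 # returns nothing
print(module.answer)  # 42
```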

As above, I think it may be worth tackling this. It shouldn't be *that*
hard given the higher level changes and will solve some hard problems
at the lower level.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

Eric Snow

2013-08-13 03:35:14 UTC

Post by Nick Coghlan
I think this is solid enough to be worth adding to the PEPs repo now.

Sounds good.

Post by Nick Coghlan

Post by Eric Snow
Here's an updated version of the PEP for ModuleSpec which addresses the
feedback I've gotten. Thanks for the help. The big open question, to me,
is whether or not to have a separate reload() method. I'll be looking into
that when I get a chance. There's also the question of a path-based
subclass, but I'm currently not convinced it's worth it.

Sounds good. I expect you mean as a separate proposal...

Post by Nick Coghlan
Also, to handle the extension module case, we may need to let loaders
define an optional "create_module" method that accepts the ModuleSpec
object as an argument.

I'd considered that here, whether on the loader or on ModuleSpec. My plan
was to hold off on that to stay focused on the rest of the changes.
However, I'm open to adding this to the PEP.

Post by Nick Coghlan

Post by Eric Snow
A High-Level View
-----------------
...

Not sure a high level view is needed, but you can fill this in if you want :)

Forgot that was in there. :)

Post by Nick Coghlan

I see what you're saying, but am conflicted. For some reason providing a
sub-namespace for that doesn't seem quite right. However, the alternative
runs the risk of collisions later on. Maybe we could recommend the use of
a preceding "_" for custom attributes? I'll see if I can come up with
something.

Post by Nick Coghlan

Post by Eric Snow
The parameters have the same meaning as the attributes described below.
However, not all ``ModuleSpec`` attributes are also parameters. The
passed values are set as-is. For calculated values use the
``from_loader()`` method.

"Passed in parameter values are assigned directly to the corresponding
attributes below. Other attributes not listed as parameters (such as
``package``) are read-only properties that are automatically derived
from these values.
The ``ModuleSpec.from_loader()`` class method allows a suitable
ModuleSpec instance to be easily created from a PEP 302 loader object"

That's much better.

Post by Nick Coghlan

Post by Eric Snow
While ``package`` and ``is_package`` are read-only properties, the
remaining attributes can be replaced after the module spec is created
and after import is complete. This allows for unusual cases where
modifying the spec is the best option. However, typical use should not
involve changing the state of a module's spec.

I'm with Brett that "is_package" should go, to be replaced by
"spec.path is not None" wherever it matters. is_package() would then
fall through to the PEP 302 loader API via __getattr__.
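
The "spec.path is not None" test can be tried against specs in later Python
releases, where the draft's ``path`` attribute is exposed as
``submodule_search_locations``. ``spec_is_package`` is a hypothetical helper
name used only for this sketch.

```python
import importlib.util


def spec_is_package(spec):
    # A spec describes a package exactly when it carries search locations.
    return spec.submodule_search_locations is not None


print(spec_is_package(importlib.util.find_spec("json")))  # a package
print(spec_is_package(importlib.util.find_spec("os")))    # a plain module
```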

I'm considering the recommendation, but I still feel like `is_package` as
an attribute is worth having. I see module.__spec__ as useful to more than
the import system and its hackers, and `is_package` as a value to the
broader audience that may not have learned about what __path__ means. It's
certainly not obvious that __path__ implies a package. Then again, a
person would have to be looking at __spec__ to see `is_package`, so maybe
it loses enough utility that it is not worth keeping.

Post by Nick Coghlan

Post by Eric Snow
``origin``
A string for the location from which the module originates. If
``filename`` is set, ``origin`` should be set to the same value unless
some other value is more appropriate. ``origin`` is used in
``module_repr()`` if it does not match the value of ``filename``.
Using ``filename`` for this meaning would be inaccurate, since not all
modules have path-based locations. For instance, built-in modules do
not have ``__file__`` set. Yet it is useful to have a descriptive
string indicating that it originated from the interpreter as a built-in
module. So built-in modules will have ``origin`` set to ``"built-in"``.

How about we *just* have origin, with a separate "set_fileattr"
attribute to indicate "this is a discrete file, you should set
__file__"?

I like that. I'll see how it works. There doesn't seem to be any reason
why you would have two distinct strings for origin and filename. In fact,
that's kind of smelly.

However, I wonder if this is where a PathModuleSpec subclass would be
meaningful. Then no flag would be necessary.

Post by Nick Coghlan
Also, we should explicitly note that we'll still set __file__ for zip
imports, due to backwards compatibility concerns, even though it
doesn't correspond to a valid filesystem path.

Hmm. So deprecate the use of __file__ for anything but actual file names?
Interesting. I was planning on just leaving the current meaning of
"location relative to a path entry".

Post by Nick Coghlan
(Random thought: spec.origin + spec.cached + a cache directory setting
in zipimport would give a potentially clean way to do extension module
imports from zip archives)

That would be cool.

Post by Nick Coghlan

Post by Eric Snow
``path``
The list of path entries in which to search for submodules if this
module is a package. Otherwise it is ``None``.

Path entries don't have to correspond to filesystem locations - they
just have to make sense to at least one path hook
(e.g. a DB URI would be a valid path entry).

Right. I didn't mean to imply that they do.

Post by Nick Coghlan

Post by Eric Snow
.. XXX add a path-based subclass?

Nope :)

I keep vacillating on this.

Post by Nick Coghlan

Post by Eric Snow
ModuleSpec Methods
------------------
``from_loader(name, loader, *, is_package=None, origin=None, filename=None,
cached=None, path=None)``
.. XXX use a different name?

What do you mean by disallow customization? Make it "private"?
`from_loader()` is intended for exactly the use that you described.

Post by Nick Coghlan

I'll think about this. There was some value in it before, but with changes
to other signatures, `from_loader()` is much less useful as a separate
factory method.

Post by Nick Coghlan
A separate "from_module(m)" constructor would probably make sense, though.

I have this for internal use in the implementation, but did not expose it
since all modules should already have a spec.

Post by Nick Coghlan

Post by Eric Snow
``module_repr()``
Returns a repr string for the module if ``origin`` is set and
``filename`` is not set. The string refers to the value of ``origin``.
Otherwise ``module_repr()`` returns None. This indicates to the module
type's ``__repr__()`` that it should fall back to the default repr.
We could also have ``module_repr()`` produce the repr for the case where
``filename`` is set or where ``origin`` is not set, mirroring the repr
that the module type produces directly. However, the repr string is
derived from the import-related module attributes, which might be out of
sync with the spec.
.. XXX Is using the spec close enough? Probably not.

That's the approach that I took at first, but the module that is passed in
is not guaranteed to be a spec. Furthermore, having the spec take
precedence over the module's attrs for the repr seems like too big a
backward-compatibility risk.

Post by Nick Coghlan

Experience tells us that the import system should ensure the __spec__
attribute always exists (even if it has to be filled in from the
module attributes after calling load_module)

That's a good point. The only possible problem is for someone that creates
their own module object and expects repr to work the same as it does
currently.

Post by Nick Coghlan
``load(module=None, *, is_reload=False)``
Yep, definitely needs to be a separate method. "is_reload" would
almost always be set to a boolean, which means a separate API is
likely to be better.

Agreed.

Post by Nick Coghlan
However, I think the separate method should be "exec()" rather than
"reload()" and require that the module always be passed in.

I'll see how that looks. It seems like a better fit than just plain
`reload()`.

Post by Nick Coghlan
We could also expose a "create" method that just creates and returns
the new module object, and replace importlib.util.module_to_load with
a context manager that accepted the module as a parameter. Say
"add_to_sys", which fails if the module is already present in
sys.modules.

One of the points of ModuleSpec is to remove the need for
`module_to_load()`. I'm not convinced of the utility of a create method
like you've described other than possibly as something internal to
ModuleSpec.

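Post by Nick Coghlan
load() would then look something like:
    def load(self):
        m = self.create()
        with importlib.util.add_to_sys(m):
            self.exec(m)
        return sys.modules[self.name]
We could also provide reload() if we wanted to:
    def reload(self):
        self.exec(sys.modules[self.name])
        return sys.modules[self.name]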

We may need to do subclasses for the ABC registration backwards
compatibility hack.

I was thinking of registering ModuleSpec in the setter of a `loader

Post by Nick Coghlan

Worth mentioning that __main__.__spec__.name will give the real name
of modules executed with -m here rather than delaying that until the
notes at the end.

As above, I think it may be worth tackling this. It shouldn't be *that*
hard given the higher level changes and will solve some hard problems
at the lower level.


Eric Snow

2013-08-13 03:47:27 UTC

Accidentally sent. :P

Continuing...