Eric Snow
2013-09-18 09:51:22 UTC
Hi all,
I finally got some time to update the PEP. I've simplified a few things,
most notably by making the 4 ModuleSpec methods (create, exec, load,
reload) "private".
Also notable is that the new loader method is still create_module() and
there is still no flag for is_reload on either of the loader methods. I'm
still not clear on what the flag buys us and on why anything we'd do in a
prepare_module() we couldn't do in exec_module(). I'm trying to keep this
simple. :)
Anyway, I still need to take some time to clean up the PEP formatting and
run a spell checker. I probably also missed some artifact of an older
version of the API. Otherwise I think it's in a good spot. Comments
welcome.
-eric
p.s. I also plan on getting the implementation up one of these days. :P
===============================================================
PEP: 451
Title: A ModuleSpec Type for the Import System
Version: $Revision$
Last-Modified: $Date$
Author: Eric Snow <ericsnowcurrently at gmail.com>
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 8-Aug-2013
Python-Version: 3.4
Post-History: 8-Aug-2013, 28-Aug-2013, 18-Sep-2013
Resolution:
Abstract
========
This PEP proposes to add a new class to ``importlib.machinery`` called
``ModuleSpec``. It will be authoritative for all the import-related
information about a module, and will be available without needing to
load the module first. Finders will directly provide a module's spec
instead of a loader (which they will continue to provide indirectly).
The import machinery will be adjusted to take advantage of module specs,
including using them to load modules.
Motivation
==========
The import system has evolved over the lifetime of Python. In late 2002
PEP 302 introduced standardized import hooks via ``finders`` and
``loaders`` and ``sys.meta_path``. The ``importlib`` module, introduced
with Python 3.1, now exposes a pure Python implementation of the APIs
described by PEP 302, as well as of the full import system. It is now
much easier to understand and extend the import system. While a benefit
to the Python community, this greater accessibility also presents a
challenge.
As more developers come to understand and customize the import system,
any weaknesses in the finder and loader APIs will be more impactful. So
the sooner we can address any such weaknesses in the import system, the
better...and there are a couple we can take care of with this proposal.
Firstly, any time the import system needs to save information about a
module we end up with more attributes on module objects that are
generally only meaningful to the import system. It would be nice to
have a per-module namespace in which to put future import-related
information and to pass around within the import system. Secondly,
there's an API void between finders and loaders that causes undue
complexity when encountered.
Currently finders are strictly responsible for providing the loader,
through their find_module() method, which the import system will use to
load the module. The loader is then responsible for doing some checks,
creating the module object, setting import-related attributes,
"installing" the module to ``sys.modules``, and loading the module,
along with some cleanup. This all takes place during the import
system's call to ``Loader.load_module()``. Loaders also provide some
APIs for accessing data associated with a module.
Loaders are not required to provide any of the functionality of
``load_module()`` through other methods. Thus, though the import-
related information about a module is likely available without loading
the module, it is not otherwise exposed.
Furthermore, the requirements associated with ``load_module()`` are
common to all loaders and mostly are implemented in exactly the same
way. This means every loader has to duplicate the same boilerplate
code. ``importlib.util`` provides some tools that help with this, but
it would be more helpful if the import system simply took charge of
these responsibilities. The trouble is that this would limit the degree
of customization that ``load_module()`` facilitates. This is a gap
between finders and loaders which this proposal aims to fill.
Finally, when the import system calls a finder's ``find_module()``, the
finder makes use of a variety of information about the module that is
useful outside the context of the method. Currently the options are
limited for persisting that per-module information past the method call,
since it only returns the loader. Popular workarounds for this limitation
are to store the information in a module-to-info mapping somewhere on
the finder itself, or store it on the loader.
Unfortunately, loaders are not required to be module-specific. On top
of that, some of the useful information finders could provide is
common to all finders, so ideally the import system could take care of
those details. This is the same gap as before between finders and
loaders.
As an example of complexity attributable to this flaw, the
implementation of namespace packages in Python 3.3 (see PEP 420) added
``FileFinder.find_loader()`` because there was no good way for
``find_module()`` to provide the namespace search locations.
The answer to this gap is a ``ModuleSpec`` object that contains the
per-module information and takes care of the boilerplate functionality
involved with loading the module.
(The idea gained momentum during discussions related to another PEP.[1])
Specification
=============
The goal is to address the gap between finders and loaders while
changing as little of their semantics as possible. Though some
functionality and information is moved to the new ``ModuleSpec`` type,
their behavior should remain the same. However, for the sake of clarity
the finder and loader semantics will be explicitly identified.
This is a high-level summary of the changes described by this PEP. More
detail is available in later sections.
importlib.machinery.ModuleSpec (new)
------------------------------------
A specification for a module's import-system-related state.
* ModuleSpec(name, loader, \*, origin=None, loading_info=None,
is_package=None)
Attributes:
* name - a string for the name of the module.
* loader - the loader to use for loading and for module data.
* origin - a string for the location from which the module is loaded,
e.g. "built-in" for built-in modules and the filename for modules
loaded from source.
* submodule_search_locations - strings for where to find submodules,
if a package.
* loading_info - a container of extra data for use during loading.
* cached (property) - a string for where the compiled module will be
stored (see PEP 3147).
* package (RO-property) - the name of the module's parent (or None).
* has_location (RO-property) - the module's origin refers to a location.
Instance Methods:
* module_repr() - provide a repr string for the spec'ed module.
* init_module_attrs(module) - set any of a module's import-related
attributes that aren't already set.
importlib.util Additions
------------------------
* spec_from_file_location(name, location, \*, loader=None,
submodule_search_locations=None)
- factory for file-based module specs.
* spec_from_loader(name, loader, \*, origin=None, is_package=None) -
factory based on information provided by loaders.
* spec_from_module(module, loader=None) - factory based on existing
import-related module attributes. This function is expected to be
used only in some backward-compatibility situations.
Other API Additions
-------------------
* importlib.abc.Loader.exec_module(module) will execute a module in its
own namespace. It replaces ``importlib.abc.Loader.load_module()``.
* importlib.abc.Loader.create_module(spec) (optional) will return a new
module to use for loading.
* Module objects will have a new attribute: ``__spec__``.
* importlib.find_spec(name, path=None) will return the spec for a
module.
exec_module() and create_module() should not set any import-related
module attributes. The fact that load_module() does is a design flaw
that this proposal aims to correct.
API Changes
-----------
* ``InspectLoader.is_package()`` will become optional.
Deprecations
------------
* importlib.abc.MetaPathFinder.find_module()
* importlib.abc.PathEntryFinder.find_module()
* importlib.abc.PathEntryFinder.find_loader()
* importlib.abc.Loader.load_module()
* importlib.abc.Loader.module_repr()
* The parameters and attributes of the various loaders in
importlib.machinery
* importlib.util.set_package()
* importlib.util.set_loader()
* importlib.find_loader()
Removals
--------
These were introduced prior to Python 3.4's release.
* importlib.abc.Loader.init_module_attrs()
* importlib.util.module_to_load()
Other Changes
-------------
* The import system implementation in importlib will be changed to make
use of ModuleSpec.
* Import-related module attributes (other than ``__spec__``) will no
longer be used directly by the import system.
* Import-related attributes should no longer be added to modules
directly.
* The module type's ``__repr__()`` will be a thin wrapper around a pure
Python implementation which will leverage ModuleSpec.
* The spec for the ``__main__`` module will reflect the appropriate
name and origin.
Backward-Compatibility
----------------------
* If a finder does not define find_spec(), a spec is derived from
the loader returned by find_module().
* PathEntryFinder.find_loader() still takes priority over
find_module().
* Loader.load_module() is used if exec_module() is not defined.
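The first fallback above can be pictured roughly as follows. This is only a
sketch under stated assumptions: ``find_spec_compat`` and ``LegacyFinder``
are hypothetical names, and a ``SimpleNamespace`` stands in for the proposed
``ModuleSpec`` type.

```python
from types import SimpleNamespace


def find_spec_compat(finder, name, path=None):
    """Return a spec-like object, preferring find_spec() when it exists."""
    if hasattr(finder, "find_spec"):
        return finder.find_spec(name, path)
    # Legacy finder: derive the spec from the loader it returns.
    loader = finder.find_module(name, path)
    if loader is None:
        return None
    return SimpleNamespace(name=name, loader=loader, origin=None)


class LegacyFinder:
    """Hypothetical pre-PEP finder that only implements find_module()."""

    def __init__(self, loader):
        self.loader = loader

    def find_module(self, name, path=None):
        return self.loader if name == "legacy_mod" else None


legacy_loader = object()
found = find_spec_compat(LegacyFinder(legacy_loader), "legacy_mod")
missing = find_spec_compat(LegacyFinder(legacy_loader), "other_mod")
```

Here ``found`` carries the legacy loader, while ``missing`` is ``None``
because the finder did not recognize the name.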
What Will not Change?
---------------------
* The syntax and semantics of the import statement.
* Existing finders and loaders will continue to work normally.
* The import-related module attributes will still be initialized with
the same information.
* Finders will still create loaders (now storing them in specs).
* Loader.load_module(), if a loader defines it, will have all the
same requirements and may still be called directly.
* Loaders will still be responsible for module data APIs.
* importlib.reload() will still overwrite the import-related attributes.
What Will Existing Finders and Loaders Have to Do Differently?
==============================================================
Immediately? Nothing. The status quo will be deprecated, but will
continue working. However, here are the things that the authors of
finders and loaders should change relative to this PEP:
* Implement ``find_spec()`` on finders.
* Implement ``exec_module()`` on loaders, if possible.
The ModuleSpec factory functions in importlib.util are intended to be
helpful for converting existing finders. ``spec_from_loader()`` and
``spec_from_file_location()`` are both straightforward utilities in this
regard. In the case where loaders already expose methods for creating
and preparing modules, ``spec_from_module()`` may be useful to
the corresponding finder.
For existing loaders, exec_module() should be a relatively direct
conversion from the non-boilerplate portion of load_module(). In some
uncommon cases the loader should also implement create_module().
ModuleSpec Users
================
``ModuleSpec`` objects have three distinct target audiences: Python itself,
import hooks, and normal Python users.
Python will use specs in the import machinery, in interpreter startup,
and in various standard library modules. Some modules are
import-oriented, like pkgutil, and others are not, like pickle and
pydoc. In all cases, the full ``ModuleSpec`` API will get used.
Import hooks (finders and loaders) will make use of the spec in specific
ways. First of all, finders may use the spec factory functions in
importlib.util to create spec objects. They may also directly adjust
the spec attributes after the spec is created. Secondly, the finder may
bind additional information to the spec (in loading_info) for the
loader to consume during module creation/execution. Finally, loaders
will make use of the attributes on a spec when creating and/or executing
a module.
Python users will be able to inspect a module's ``__spec__`` to get
import-related information about the object. Generally, Python
applications and interactive users will not be using the ``ModuleSpec``
factory functions nor any of the instance methods.
How Loading Will Work
=====================
This is an outline of what happens in ModuleSpec's loading
functionality::

    import sys
    from types import ModuleType

    def load(spec):
        # Legacy loaders: fall back to load_module().
        if not hasattr(spec.loader, 'exec_module'):
            module = spec.loader.load_module(spec.name)
            spec.init_module_attrs(module)
            return sys.modules[spec.name]

        module = None
        if hasattr(spec.loader, 'create_module'):
            module = spec.loader.create_module(spec)
        if module is None:
            module = ModuleType(spec.name)
        spec.init_module_attrs(module)

        spec._initializing = True
        sys.modules[spec.name] = module
        try:
            spec.loader.exec_module(module)
        except Exception:
            # Remove the failed module, then propagate the error.
            del sys.modules[spec.name]
            raise
        finally:
            spec._initializing = False
        return sys.modules[spec.name]

These steps are exactly what ``Loader.load_module()`` is already
expected to do. Loaders will thus be simplified since they will only
need to implement exec_module().
Note that we must return the module from sys.modules. During loading
the module may have replaced itself in sys.modules. Since we don't have
a post-import hook API to accommodate the use case, we have to deal with
it. However, in the replacement case we do not worry about setting the
import-related module attributes on the object. The module writer is on
their own if they are doing this.
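To illustrate the replacement case, here is a small sketch. The names
``SelfReplacingLoader`` and ``load_by_name`` are hypothetical, and the
function is a trimmed-down stand-in for the loading steps, not the proposed
implementation.

```python
import sys
from types import ModuleType, SimpleNamespace


class SelfReplacingLoader:
    """Hypothetical loader whose module swaps itself out of sys.modules."""

    def exec_module(self, module):
        replacement = SimpleNamespace(replaced=True)
        sys.modules[module.__name__] = replacement


def load_by_name(name, loader):
    """Trimmed-down stand-in for the loading steps (no spec handling)."""
    module = ModuleType(name)
    sys.modules[name] = module
    try:
        loader.exec_module(module)
    except BaseException:
        sys.modules.pop(name, None)
        raise
    # Look the module up again: exec_module() may have replaced it.
    return sys.modules[name]


result = load_by_name("pep451_demo_replaced", SelfReplacingLoader())
del sys.modules["pep451_demo_replaced"]  # leave sys.modules clean
```

Returning the local ``module`` variable instead would hand back the original
object and silently ignore the replacement.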
ModuleSpec
==========
Attributes
----------
Each of the following names is an attribute on ModuleSpec objects. A
value of ``None`` indicates "not set". This contrasts with module
objects where the attribute simply doesn't exist. Most of the
attributes correspond to the import-related attributes of modules. Here
is the mapping. The reverse of this mapping is used by
ModuleSpec.init_module_attrs().

========================== ==============
On ModuleSpec              On Modules
========================== ==============
name                       __name__
loader                     __loader__
package                    __package__
origin                     __file__*
cached                     __cached__*,**
submodule_search_locations __path__**
loading_info               \-
has_location               \-
========================== ==============

\* Set only if has_location is true.
\*\* Set only if the spec attribute is not None.
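The mapping and its two footnotes could be applied along these lines. This
is a sketch, not the proposed implementation: the free function
``init_module_attrs`` and the ``SimpleNamespace`` spec are stand-ins, and
treating a ``None`` attribute as "not set" is an assumption.

```python
from types import ModuleType, SimpleNamespace


def init_module_attrs(spec, module):
    """Copy spec values onto the module without overwriting existing ones."""

    def set_if_missing(attr, value):
        # Assumption: a missing attribute or a None value means "not set".
        if getattr(module, attr, None) is None:
            setattr(module, attr, value)

    set_if_missing("__name__", spec.name)
    set_if_missing("__loader__", spec.loader)
    set_if_missing("__package__", spec.package)
    if spec.has_location:
        # Footnote *: only located modules get __file__ and __cached__.
        set_if_missing("__file__", spec.origin)
        if spec.cached is not None:
            set_if_missing("__cached__", spec.cached)
    if spec.submodule_search_locations is not None:
        # Footnote **: only set when the spec attribute is not None.
        set_if_missing("__path__", spec.submodule_search_locations)


spec = SimpleNamespace(
    name="pkg.mod", loader=object(), package="pkg",
    origin="/tmp/pkg/mod.py", cached=None,
    submodule_search_locations=None, has_location=True,
)
module = ModuleType("pkg.mod")
init_module_attrs(spec, module)
```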
While package and has_location are read-only properties, the remaining
attributes can be replaced after the module spec is created and even
after import is complete. This allows for unusual cases where directly
modifying the spec is the best option. However, typical use should not
involve changing the state of a module's spec.
**origin**
origin is a string for the place from which the module originates.
Aside from the informational value, it is also used in module_repr().
The module attribute ``__file__`` has a similar but more restricted
meaning. Not all modules have it set (e.g. built-in modules). However,
``origin`` is applicable to all modules. For built-in modules it would
be set to "built-in".
**has_location**
Some modules can be loaded by reference to a location, e.g. a filesystem
path or a URL or something of the sort. Having the location lets you
load the module, but in theory you could load that module under various
names.
In contrast, non-located modules can't be loaded in this fashion, e.g.
builtin modules and modules dynamically created in code. For these, the
name is the only way to access them, so they have an "origin" but not a
"location".
This attribute reflects whether or not the module is locatable. If it
is, origin must be set to the module's location and ``__file__`` will be
set on the module. Not all locatable modules will be cachable, but most
will.
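The distinction can be observed on the ``ModuleSpec`` that later shipped
with CPython 3.4 and newer, whose ``origin`` and ``has_location`` attributes
match the ones described here. The snippet assumes CPython, where ``sys`` is
built in and ``colorsys`` is loaded from a source file.

```python
import colorsys
import sys

builtin_spec = sys.__spec__        # non-located: has an origin only
located_spec = colorsys.__spec__   # located: origin is a real location

builtin_has_location = builtin_spec.has_location
located_has_location = located_spec.has_location
# For a located module, origin is what ends up in __file__.
origin_matches_file = located_spec.origin == colorsys.__file__
```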
The corresponding module attribute name, ``__file__``, is somewhat
inaccurate and potentially confusing, so we will use a more explicit
combination of origin and has_location to represent the same
information. Having a separate filename is unnecessary since we have
origin.
**submodule_search_locations**
The list of location strings, typically directory paths, in which to
search for submodules. If the module is a package this will be set to
a list (even an empty one). Otherwise it is ``None``.
The corresponding module attribute's name, ``__path__``, is relatively
ambiguous. Instead of mirroring it, we use a more explicit name that
makes the purpose clear.
**loading_info**
A finder may set loading_info to any value to provide additional
data for the loader to use during loading. A value of None is the
default and indicates that there is no additional data. Otherwise it
can be set to any object, such as a dict, list, or
types.SimpleNamespace, containing the relevant extra information.
For example, zipimporter could use it to pass the zip archive name
to the loader directly, rather than needing to derive it from origin
or create a custom loader for each find operation.
loading_info is meant for use by the finder and corresponding loader.
It is not guaranteed to be a stable resource for any other use.
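A rough sketch of the zip example: one shared loader serves every module and
reads the archive name from ``loading_info``. ``ArchiveLoader`` and the
paths are hypothetical, and a ``SimpleNamespace`` stands in for the proposed
``ModuleSpec``.

```python
from types import ModuleType, SimpleNamespace


class ArchiveLoader:
    """Hypothetical loader shared by all modules of all archives."""

    def exec_module(self, module):
        # Per-module data comes from the spec, not from loader state.
        info = module.__spec__.loading_info
        module.archive = info["archive"]


shared_loader = ArchiveLoader()
spec = SimpleNamespace(
    name="demo",
    loader=shared_loader,
    origin="/tmp/bundle.zip/demo.py",
    loading_info={"archive": "/tmp/bundle.zip"},
)

module = ModuleType(spec.name)
module.__spec__ = spec
spec.loader.exec_module(module)
```

Because the loader keeps no per-module state, the finder does not need to
create a new loader for each find operation.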
Omitted Attributes and Methods
------------------------------
The following ModuleSpec methods are not part of the public API since
it is easy to use them incorrectly and only the import system really
needs them (i.e. they would be an attractive nuisance).
* create() - provide a new module to use for loading.
* exec(module) - execute the spec into a module namespace.
* load() - prepare a module and execute it in a protected way.
* reload(module) - re-execute a module in a protected way.
Here are other omissions:
There is no PathModuleSpec subclass of ModuleSpec that separates out
has_location, cached, and submodule_search_locations. While that might
make the separation cleaner, module objects don't have that distinction.
ModuleSpec will support both cases equally well.
While is_package would be a simple additional attribute (aliasing
``self.submodule_search_locations is not None``), it perpetuates the
artificial (and mostly erroneous) distinction between modules and
packages.
Conceivably, a ModuleSpec.load() method could optionally take a list of
modules with which to interact instead of sys.modules. That
capability is left out of this PEP, but may be pursued separately at
some other time, including relative to PEP 406 (import engine).
Likewise load() could be leveraged to implement multi-version
imports. While interesting, doing so is outside the scope of this
proposal.
Others:
* Add ModuleSpec.submodules (RO-property) - returns possible submodules
relative to the spec.
* Add ModuleSpec.loaded (RO-property) - the module in sys.modules, if
any.
* Add ModuleSpec.data - a descriptor that wraps the data API of the
spec's loader.
* Also see [3].
Backward Compatibility
----------------------
ModuleSpec itself has no backward-compatibility concerns. This would be
a different story if
Finder.find_module() were to return a module spec instead of loader.
In that case, specs would have to act like the loader that would have
been returned instead. Doing so would be relatively simple, but is an
unnecessary complication. It was part of earlier versions of this PEP.
Subclassing
-----------
Subclasses of ModuleSpec are allowed, but should not be necessary.
Simply setting loading_info or adding functionality to a custom
finder or loader will likely be a better fit and should be tried first.
However, as long as a subclass still fulfills the requirements of the
import system, objects of that type are completely fine as the return
value of Finder.find_spec().
Existing Types
==============
Module Objects
--------------
Other than adding ``__spec__``, none of the import-related module
attributes will be changed or deprecated, though some of them could be;
any such deprecation can wait until Python 4.
A module's spec will not be kept in sync with the corresponding import-
related attributes. Though they may differ, in practice they will
typically be the same.
One notable exception is the case where a module is run as a script by
using the ``-m`` flag. In that case ``module.__spec__.name`` will
reflect the actual module name while ``module.__name__`` will be
``__main__``.
Notably, the spec for each module instance will be unique to that
instance even if the information is identical to that of another spec.
This won't happen in general.
Finders
-------
Finders are still responsible for creating the loader. That loader will
now be stored in the module spec returned by ``find_spec()`` rather
than returned directly. As is currently the case without the PEP, if a
loader would be costly to create, that loader can be designed to defer
the cost until later.
**MetaPathFinder.find_spec(name, path=None)**
**PathEntryFinder.find_spec(name)**
Finders will return ModuleSpec objects when ``find_spec()`` is
called. This new method replaces ``find_module()`` and
``find_loader()`` (in the ``PathEntryFinder`` case). If a finder does
not have ``find_spec()``, ``find_module()`` and ``find_loader()`` are
used instead, for backward-compatibility.
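As an illustration, here is a small meta path finder written against the
import system as it later shipped in CPython (3.6 or newer assumed).
``DictFinder`` and ``DictLoader`` are hypothetical names. The ``target``
parameter is not part of this draft; it is accepted only because current
CPython passes it to ``find_spec()``.

```python
import importlib
import sys
from importlib.machinery import ModuleSpec


class DictLoader:
    """Hypothetical loader that executes source text held in a dict."""

    def __init__(self, sources):
        self.sources = sources

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        exec(self.sources[module.__name__], module.__dict__)


class DictFinder:
    """Hypothetical finder returning a spec instead of a bare loader."""

    def __init__(self, sources):
        self.sources = sources
        self.loader = DictLoader(sources)

    def find_spec(self, name, path=None, target=None):
        if name not in self.sources:
            return None
        return ModuleSpec(name, self.loader, origin="in-memory")


finder = DictFinder({"pep451_demo_mem": "greeting = 'hello'"})
sys.meta_path.insert(0, finder)
try:
    module = importlib.import_module("pep451_demo_mem")
finally:
    sys.meta_path.remove(finder)
    sys.modules.pop("pep451_demo_mem", None)
```

The finder still creates the loader; it simply hands it over inside the spec.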
Adding yet another similar method to finders is a case of practicality.
``find_module()`` could be changed to return specs instead of loaders.
This is tempting because the import APIs have suffered enough,
especially considering ``PathEntryFinder.find_loader()`` was just
added in Python 3.3. However, the extra complexity and a less-than-
explicit method name aren't worth it.
Loaders
-------
**Loader.exec_module(module)**
Loaders will have a new method, exec_module(). Its only job
is to "exec" the module and consequently populate the module's
namespace. It is not responsible for creating or preparing the module
object, nor for any cleanup afterward. It has no return value.
exec_module() should properly handle the case where it is called more
than once. For some kinds of modules this may mean raising ImportError
every time after the first time the method is called. This is
particularly relevant for reloading, where some kinds of modules do not
support in-place reloading.
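A loader for modules that cannot be re-executed in place might look roughly
like this. ``OneShotLoader`` is a hypothetical name, and tracking executed
modules by name is just one possible approach.

```python
from types import ModuleType


class OneShotLoader:
    """Hypothetical loader for modules that do not support reloading."""

    def __init__(self, source):
        self.source = source
        self._executed = set()

    def exec_module(self, module):
        name = module.__name__
        if name in self._executed:
            # Refuse every call after the first one.
            raise ImportError(f"{name} does not support reloading", name=name)
        # Only populate the namespace; no module creation, no cleanup.
        exec(compile(self.source, "<one-shot>", "exec"), module.__dict__)
        self._executed.add(name)


module = ModuleType("one_shot_demo")
loader = OneShotLoader("answer = 6 * 7")
result = loader.exec_module(module)  # no return value

try:
    loader.exec_module(module)
except ImportError:
    second_call_refused = True
else:
    second_call_refused = False
```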
**Loader.create_module(spec)**
Loaders may also implement create_module() that will return a
new module to exec. It may return None to indicate that the default
module creation code should be used. One use case for create_module()
is to provide a module that is a subclass of the builtin module type.
Most loaders will not need to implement create_module().
create_module() should properly handle the case where it is called more
than once for the same spec/module. This may include returning None or
raising ImportError.
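The module-subclass use case can be sketched as follows. ``VerboseModule``,
``VerboseLoader``, ``DefaultLoader`` and ``new_module`` are hypothetical
names, and a ``SimpleNamespace`` stands in for the proposed ``ModuleSpec``.

```python
from types import ModuleType, SimpleNamespace


class VerboseModule(ModuleType):
    """Hypothetical subclass of the builtin module type."""

    def __repr__(self):
        return f"<verbose module {self.__name__!r}>"


class VerboseLoader:
    def create_module(self, spec):
        return VerboseModule(spec.name)

    def exec_module(self, module):
        module.loaded = True


class DefaultLoader:
    def create_module(self, spec):
        return None  # ask for the default module creation

    def exec_module(self, module):
        module.loaded = True


def new_module(spec):
    """Mirror the creation step: fall back to ModuleType on None."""
    module = spec.loader.create_module(spec)
    if module is None:
        module = ModuleType(spec.name)
    return module


custom = new_module(SimpleNamespace(name="custom", loader=VerboseLoader()))
default = new_module(SimpleNamespace(name="plain", loader=DefaultLoader()))
```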
Other changes:
PEP 420 introduced the optional ``module_repr()`` loader method to limit
the amount of special-casing in the module type's ``__repr__()``. Since
this method is part of ``ModuleSpec``, it will be deprecated on loaders.
However, if it exists on a loader it will be used exclusively.
The ``Loader.init_module_attrs()`` method, added prior to Python 3.4's
release, will be removed in favor of the same method on ``ModuleSpec``.
However, ``InspectLoader.is_package()`` will not be deprecated even
though the same information is found on ``ModuleSpec``. ``ModuleSpec``
can use it to populate its own ``is_package`` if that information is
not otherwise available. Still, it will be made optional.
One consequence of ModuleSpec is that loader ``__init__`` methods will
no longer need to accommodate per-module state. The path-based loaders
in ``importlib`` take arguments in their ``__init__()`` and have
corresponding attributes. However, the need for those values is
eliminated by module specs.
In addition to executing a module during loading, loaders will still be
directly responsible for providing APIs concerning module-related data.
Other Changes
=============
* The various finders and loaders provided by importlib will be
updated to comply with this proposal.
* The spec for the ``__main__`` module will reflect how the interpreter
was started. For instance, with ``-m`` the spec's name will be that
of the run module, while ``__main__.__name__`` will still be
"__main__".
* We add ``importlib.find_spec()`` to mirror
``importlib.find_loader()`` (which becomes deprecated).
* ``importlib.reload()`` is changed to use ``ModuleSpec.load()``.
* ``importlib.reload()`` will now make use of the per-module import
lock.
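For a feel of the lookup function, the snippet below uses
``importlib.util.find_spec()``, which is where the function ended up in
CPython 3.4 and newer (this draft places it at ``importlib.find_spec()``).
The module names are arbitrary examples.

```python
import importlib.util
import sys

loaded_before = "wave" in sys.modules
spec = importlib.util.find_spec("wave")
loaded_after = "wave" in sys.modules

# A top-level name that no finder recognizes yields None.
missing = importlib.util.find_spec("pep451_no_such_module")
```

The spec is available without the module being loaded as a side effect.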
Reference Implementation
========================
A reference implementation will be available at
http://bugs.python.org/issue18864.
Open Issues
==============
\* The impact of this change on pkgutil (and setuptools) needs looking
into. It has some generic function-based extensions to PEP 302. These
may break if importlib starts wrapping loaders without the tools'
knowledge.
\* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc,
inspect.
For instance, pickle should be updated in the __main__ case to look at
``module.__spec__.name``.
\* Impact on some kinds of lazy loading modules. See [3].
\* Find a better name than loading_info? Perhaps loading_data,
loader_state, or loader_info.
\* Change loader.create_module() to prepare_module()?
\* Add more explicit reloading support to exec_module() (and
prepare_module())?
References
==========
[1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html
[2] https://mail.python.org/pipermail/import-sig/2013-September/000735.html
[3] https://mail.python.org/pipermail/python-dev/2013-August/128129.html
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/import-sig/attachments/20130918/0945f92c/attachment-0001.html>
I finally got some time to update the PEP. I've simplified a few things,
most notably by making the 4 ModuleSpec methods (create, exec, load,
reload) "private".
Also notable is that the new loader method is still create_module() and
there is still no flag for is_reload on either of the loader methods. I'm
still not clear on what the flag buys us and on why anything we'd do in a
prepare_module() we couldn't do in exec_module(). I'm trying to keep this
simple. :)
Anyway, I still need to take some time to clean up the PEP formatting and
run a spell checker. I probably also missed some artifact of an older
version of the API. Otherwise I think it's in a good spot. Comments
welcome.
-eric
p.s. I also plan on getting the implementation up one of these days. :P
===============================================================
PEP: 451
Title: A ModuleSpec Type for the Import System
Version: $Revision$
Last-Modified: $Date$
Author: Eric Snow <ericsnowcurrently at gmail.com>
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 8-Aug-2013
Python-Version: 3.4
Post-History: 8-Aug-2013, 28-Aug-2013, 18-Sep-2013
Resolution:
Abstract
========
This PEP proposes to add a new class to ``importlib.machinery`` called
``ModuleSpec``. It will be authoritative for all the import-related
information about a module, and will be available without needing to
load the module first. Finders will directly provide a module's spec
instead of a loader (which they will continue to provide indirectly).
The import machinery will be adjusted to take advantage of module specs,
including using them to load modules.
Motivation
==========
The import system has evolved over the lifetime of Python. In late 2002
PEP 302 introduced standardized import hooks via ``finders`` and
``loaders`` and ``sys.meta_path``. The ``importlib`` module, introduced
with Python 3.1, now exposes a pure Python implementation of the APIs
described by PEP 302, as well as of the full import system. It is now
much easier to understand and extend the import system. While a benefit
to the Python community, this greater accessibilty also presents a
challenge.
As more developers come to understand and customize the import system,
any weaknesses in the finder and loader APIs will be more impactful. So
the sooner we can address any such weaknesses the import system, the
better...and there are a couple we can take care of with this proposal.
Firstly, any time the import system needs to save information about a
module we end up with more attributes on module objects that are
generally only meaningful to the import system. It would be nice to
have a per-module namespace in which to put future import-related
information and to pass around within the import system. Secondly,
there's an API void between finders and loaders that causes undue
complexity when encountered.
Currently finders are strictly responsible for providing the loader,
through their find_module() method, which the import system will use to
load the module. The loader is then responsible for doing some checks,
creating the module object, setting import-related attributes,
"installing" the module to ``sys.modules``, and loading the module,
along with some cleanup. This all takes place during the import
system's call to ``Loader.load_module()``. Loaders also provide some
APIs for accessing data associated with a module.
Loaders are not required to provide any of the functionality of
``load_module()`` through other methods. Thus, though the import-
related information about a module is likely available without loading
the module, it is not otherwise exposed.
Furthermore, the requirements assocated with ``load_module()`` are
common to all loaders and mostly are implemented in exactly the same
way. This means every loader has to duplicate the same boilerplate
code. ``importlib.util`` provides some tools that help with this, but
it would be more helpful if the import system simply took charge of
these responsibilities. The trouble is that this would limit the degree
of customization that ``load_module()`` facilitates. This is a gap
between finders and loaders which this proposal aims to fill.
Finally, when the import system calls a finder's ``find_module()``, the
finder makes use of a variety of information about the module that is
useful outside the context of the method. Currently the options are
limited for persisting that per-module information past the method call,
since it only returns the loader. Popular options for this limitation
are to store the information in a module-to-info mapping somewhere on
the finder itself, or store it on the loader.
Unfortunately, loaders are not required to be module-specific. On top
of that, some of the useful information finders could provide is
common to all finders, so ideally the import system could take care of
those details. This is the same gap as before between finders and
loaders.
As an example of complexity attributable to this flaw, the
implementation of namespace packages in Python 3.3 (see PEP 420) added
``FileFinder.find_loader()`` because there was no good way for
``find_module()`` to provide the namespace search locations.
The answer to this gap is a ``ModuleSpec`` object that contains the
per-module information and takes care of the boilerplate functionality
involved with loading the module.
(The idea gained momentum during discussions related to another PEP.[1])
Specification
=============
The goal is to address the gap between finders and loaders while
changing as little of their semantics as possible. Though some
functionality and information is moved to the new ``ModuleSpec`` type,
their behavior should remain the same. However, for the sake of clarity
the finder and loader semantics will be explicitly identified.
This is a high-level summary of the changes described by this PEP. More
detail is available in later sections.
importlib.machinery.ModuleSpec (new)
------------------------------------
A specification for a module's import-system-related state.
* ModuleSpec(name, loader, \*, origin=None, loading_info=None,
is_package=None)
Attributes:
* name - a string for the name of the module.
* loader - the loader to use for loading and for module data.
* origin - a string for the location from which the module is loaded,
e.g. "builtin" for built-in modules and the filename for modules
loaded from source.
* submodule_search_locations - strings for where to find submodules,
if a package.
* loading_info - a container of extra data for use during loading.
* cached (property) - a string for where the compiled module will be
stored (see PEP 3147).
* package (RO-property) - the name of the module's parent (or None).
* has_location (RO-property) - the module's origin refers to a location.
Instance Methods:
* module_repr() - provide a repr string for the spec'ed module.
* init_module_attrs(module) - set any of a module's import-related
attributes that aren't already set.
importlib.util Additions
------------------------
* spec_from_file_location(name, location, \*, loader=None,
submodule_search_locations=None)
- factory for file-based module specs.
* from_loader(name, loader, \*, origin=None, is_package=None) - factory
based on information provided by loaders.
* spec_from_module(module, loader=None) - factory based on existing
import-related module attributes. This function is expected to be
used only in some backward-compatibility situations.
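As an illustration of the file-based factory, here is a minimal sketch using ``importlib.util.spec_from_file_location()`` as it exists in released Python versions (3.4 and later), which may differ in detail from this draft. The module name ``demo_mod`` and the temporary file are made up for the example.

```python
import importlib.util
import os
import tempfile

# Create a throwaway source file so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
location = os.path.join(tmp_dir, "demo_mod.py")
with open(location, "w") as f:
    f.write("ANSWER = 42\n")

# Build a spec straight from the file location; nothing is imported yet.
spec = importlib.util.spec_from_file_location("demo_mod", location)

print(spec.name)                        # demo_mod
print(spec.origin == location)          # True
print(spec.has_location)                # True
print(spec.submodule_search_locations)  # None, since this is not a package
```

The spec carries the per-module information on its own, without the module having been loaded.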
Other API Additions
-------------------
* importlib.abc.Loader.exec_module(module) will execute a module in its
own namespace. It replaces ``importlib.abc.Loader.load_module()``.
* importlib.abc.Loader.create_module(spec) (optional) will return a new
module to use for loading.
* Module objects will have a new attribute: ``__spec__``.
* importlib.find_spec(name, path=None) will return the spec for a
module.
exec_module() and create_module() should not set any import-related
module attributes. The fact that load_module() does is a design flaw
that this proposal aims to correct.
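To give a feel for the ``find_spec()`` helper listed above, here is a small sketch. Note the assumption: in released Python versions the function lives at ``importlib.util.find_spec()`` rather than ``importlib.find_spec()`` as written in this draft. The missing module name is made up.

```python
import importlib.util

# Look up the spec for a standard library module by name.
spec = importlib.util.find_spec("json")
print(spec.name)    # json
print(spec.origin)  # path to json/__init__.py (platform dependent)

# A module that cannot be found yields None instead of an exception.
missing = importlib.util.find_spec("pep451_no_such_module")
print(missing)      # None
```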
API Changes
-----------
* ``InspectLoader.is_package()`` will become optional.
Deprecations
------------
* importlib.abc.MetaPathFinder.find_module()
* importlib.abc.PathEntryFinder.find_module()
* importlib.abc.PathEntryFinder.find_loader()
* importlib.abc.Loader.load_module()
* importlib.abc.Loader.module_repr()
* The parameters and attributes of the various loaders in
importlib.machinery
* importlib.util.set_package()
* importlib.util.set_loader()
* importlib.find_loader()
Removals
--------
These were introduced prior to Python 3.4's release.
* importlib.abc.Loader.init_module_attrs()
* importlib.util.module_to_load()
Other Changes
-------------
* The import system implementation in importlib will be changed to make
use of ModuleSpec.
* Import-related module attributes (other than ``__spec__``) will no
longer be used directly by the import system.
* Import-related attributes should no longer be added to modules
directly.
* The module type's ``__repr__()`` will be a thin wrapper around a pure
Python implementation which will leverage ModuleSpec.
* The spec for the ``__main__`` module will reflect the appropriate
name and origin.
Backward-Compatibility
----------------------
* If a finder does not define find_spec(), a spec is derived from
the loader returned by find_module().
* PathEntryFinder.find_loader() still takes priority over
find_module().
* Loader.load_module() is used if exec_module() is not defined.
What Will not Change?
---------------------
* The syntax and semantics of the import statement.
* Existing finders and loaders will continue to work normally.
* The import-related module attributes will still be initialized with
the same information.
* Finders will still create loaders (now storing them in specs).
* Loader.load_module(), if a loader defines it, will have all the
same requirements and may still be called directly.
* Loaders will still be responsible for module data APIs.
* importlib.reload() will still overwrite the import-related attributes.
What Will Existing Finders and Loaders Have to Do Differently?
==============================================================
Immediately? Nothing. The status quo will be deprecated, but will
continue working. However, here are the things that the authors of
finders and loaders should change relative to this PEP:
* Implement ``find_spec()`` on finders.
* Implement ``exec_module()`` on loaders, if possible.
The ModuleSpec factory functions in importlib.util are intended to be
helpful for converting existing finders. ``from_loader()`` and
``spec_from_file_location()`` are both straightforward utilities in this
regard. In the case where loaders already expose methods for creating
and preparing modules, ``spec_from_module()`` may be useful to
the corresponding finder.
For existing loaders, exec_module() should be a relatively direct
conversion from the non-boilerplate portion of load_module(). In some
uncommon cases the loader should also implement create_module().
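The two conversion steps listed above can be sketched as follows. This is an illustration written against released Python versions, not the reference implementation. ``SOURCES``, ``DictLoader``, ``DictFinder`` and the module name ``pep451_hello`` are hypothetical, and the ``target`` parameter of ``find_spec()`` comes from released Python rather than from this draft.

```python
import importlib.abc
import importlib.machinery
import sys

# Hypothetical in-memory "storage" for module source, keyed by module name.
SOURCES = {"pep451_hello": "GREETING = 'hello from a spec'\n"}


class DictLoader(importlib.abc.Loader):
    """Loader that only implements the non-boilerplate part of loading."""

    def create_module(self, spec):
        # None asks the import system to create a default module object.
        return None

    def exec_module(self, module):
        # Populate the namespace. Creating the module, setting its
        # import-related attributes, and updating sys.modules are all
        # handled by the import machinery.
        source = SOURCES[module.__name__]
        code = compile(source, "<" + module.__name__ + ">", "exec")
        exec(code, module.__dict__)


class DictFinder(importlib.abc.MetaPathFinder):
    """Finder that returns a spec rather than a bare loader."""

    def find_spec(self, name, path=None, target=None):
        # "target" is accepted by released Python versions; the draft
        # signature is find_spec(name, path=None).
        if name not in SOURCES:
            return None
        return importlib.machinery.ModuleSpec(
            name, DictLoader(), origin="<dict>")


sys.meta_path.append(DictFinder())

import pep451_hello

print(pep451_hello.GREETING)         # hello from a spec
print(pep451_hello.__spec__.origin)  # <dict>
```

Compared with a ``load_module()`` implementation, the loader no longer touches ``sys.modules`` or sets ``__loader__`` and friends itself.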
ModuleSpec Users
================
``ModuleSpec`` objects have 3 distinct target audiences: Python itself,
import hooks, and normal Python users.
Python will use specs in the import machinery, in interpreter startup,
and in various standard library modules. Some modules are
import-oriented, like pkgutil, and others are not, like pickle and
pydoc. In all cases, the full ``ModuleSpec`` API will get used.
Import hooks (finders and loaders) will make use of the spec in specific
ways. First of all, finders may use the spec factory functions in
importlib.util to create spec objects. They may also directly adjust
the spec attributes after the spec is created. Secondly, the finder may
bind additional information to the spec (in loading_info) for the
loader to consume during module creation/execution. Finally, loaders
will make use of the attributes on a spec when creating and/or executing
a module.
Python users will be able to inspect a module's ``__spec__`` to get
import-related information about the object. Generally, Python
applications and interactive users will not be using the ``ModuleSpec``
factory functions nor any of the instance methods.
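For instance, inspecting ``__spec__`` might look like the sketch below. It runs against released Python versions, where the attributes used here carry the same names as in this draft.

```python
import json
import sys

# A package loaded from a file: it has a location and search paths.
spec = json.__spec__
print(spec.name)                                    # json
print(spec.has_location)                            # True
print(spec.submodule_search_locations is not None)  # True

# A built-in module: it has an origin, but no location.
print(sys.__spec__.name)          # sys
print(sys.__spec__.has_location)  # False
```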
How Loading Will Work
=====================
This is an outline of what happens in ModuleSpec's loading
functionality::

    import sys
    from types import ModuleType

    def load(spec):
        if not hasattr(spec.loader, 'exec_module'):
            # Legacy loader: load_module() does all of the work itself.
            module = spec.loader.load_module(spec.name)
            spec.init_module_attrs(module)
            return sys.modules[spec.name]

        module = None
        if hasattr(spec.loader, 'create_module'):
            module = spec.loader.create_module(spec)
        if module is None:
            module = ModuleType(spec.name)
        spec.init_module_attrs(module)

        spec._initializing = True
        sys.modules[spec.name] = module
        try:
            spec.loader.exec_module(module)
        except Exception:
            # Do not leave a partially initialized module behind.
            del sys.modules[spec.name]
            raise
        finally:
            spec._initializing = False
        return sys.modules[spec.name]

These steps are exactly what ``Loader.load_module()`` is already
expected to do. Loaders will thus be simplified since they will only
need to implement exec_module().
Note that we must return the module from sys.modules. During loading
the module may have replaced itself in sys.modules. Since we don't have
a post-import hook API to accommodate the use case, we have to deal with
it. However, in the replacement case we do not worry about setting the
import-related module attributes on the object. The module writer is on
their own if they are doing this.
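The replacement case can be observed with the existing import system. The following sketch writes a throwaway module that swaps itself out of ``sys.modules`` while executing; the module name and the ``Replacement`` class are made up for the illustration.

```python
import importlib
import os
import sys
import tempfile

MODULE_NAME = "pep451_selfreplace_demo"  # hypothetical name for this demo

# Source for a module that swaps itself out of sys.modules while it runs.
SOURCE = '''
import sys

class Replacement:
    marker = "replacement object"

sys.modules[__name__] = Replacement()
'''

tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, MODULE_NAME + ".py"), "w") as f:
    f.write(SOURCE)

sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

loaded = importlib.import_module(MODULE_NAME)

# The import returns whatever ended up in sys.modules, not the module
# object that was originally created and executed.
print(type(loaded).__name__)               # Replacement
print(loaded is sys.modules[MODULE_NAME])  # True
```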
ModuleSpec
==========
Attributes
----------
Each of the following names is an attribute on ModuleSpec objects. A
value of ``None`` indicates "not set". This contrasts with module
objects where the attribute simply doesn't exist. Most of the
attributes correspond to the import-related attributes of modules. Here
is the mapping. The reverse of this mapping is used by
ModuleSpec.init_module_attrs().
========================== ==============
On ModuleSpec              On Modules
========================== ==============
name                       __name__
loader                     __loader__
package                    __package__
origin                     __file__*
cached                     __cached__*,**
submodule_search_locations __path__**
loading_info               \-
has_location               \-
========================== ==============
\* Set only if has_location is true.
\*\* Set only if the spec attribute is not None.
While package and has_location are read-only properties, the remaining
attributes can be replaced after the module spec is created and even
after import is complete. This allows for unusual cases where directly
modifying the spec is the best option. However, typical use should not
involve changing the state of a module's spec.
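The mapping and its two footnotes can be sketched as a plain function. This is only an illustration of the rules in the table, not the proposed implementation of ``ModuleSpec.init_module_attrs()``. It assumes that an attribute which is missing or ``None`` counts as "not set", and it uses ``types.SimpleNamespace`` as a stand-in for a spec with the draft attribute names.

```python
from types import ModuleType, SimpleNamespace


def init_module_attrs(spec, module):
    """Copy spec values onto a module, skipping attributes already set.

    Assumption: an attribute that is missing or None counts as "not set".
    """
    def set_if_unset(attr, value):
        if getattr(module, attr, None) is None:
            setattr(module, attr, value)

    set_if_unset("__name__", spec.name)
    set_if_unset("__loader__", spec.loader)
    set_if_unset("__package__", spec.package)
    if spec.has_location:                            # footnote *
        set_if_unset("__file__", spec.origin)
        if spec.cached is not None:                  # footnotes * and **
            set_if_unset("__cached__", spec.cached)
    if spec.submodule_search_locations is not None:  # footnote **
        set_if_unset("__path__", spec.submodule_search_locations)


# SimpleNamespace stands in for a spec using the draft attribute names.
builtin_spec = SimpleNamespace(
    name="demo_builtin", loader=object(), package=None, origin="built-in",
    cached=None, submodule_search_locations=None, has_location=False)
module = ModuleType("demo_builtin")
init_module_attrs(builtin_spec, module)

print(hasattr(module, "__file__"))  # False
print(hasattr(module, "__path__"))  # False
```

A spec without a location leaves ``__file__`` unset, which matches how built-in modules behave today.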
**origin**
origin is a string for the place from which the module originates.
Aside from the informational value, it is also used in module_repr().
The module attribute ``__file__`` has a similar but more restricted
meaning. Not all modules have it set (e.g. built-in modules). However,
``origin`` is applicable to all modules. For built-in modules it would
be set to "built-in".
**has_location**
Some modules can be loaded by reference to a location, e.g. a filesystem
path or a URL or something of the sort. Having the location lets you
load the module, but in theory you could load that module under various
names.
In contrast, non-located modules can't be loaded in this fashion, e.g.
builtin modules and modules dynamically created in code. For these, the
name is the only way to access them, so they have an "origin" but not a
"location".
This attribute reflects whether or not the module is locatable. If it
is, origin must be set to the module's location and ``__file__`` will be
set on the module. Not all locatable modules will be cachable, but most
will.
The corresponding module attribute name, ``__file__``, is somewhat
inaccurate and potentially confusing, so we will use a more explicit
combination of origin and has_location to represent the same
information. Having a separate filename is unnecessary since we have
origin.
**submodule_search_locations**
The list of location strings, typically directory paths, in which to
search for submodules. If the module is a package this will be set to
a list (even an empty one). Otherwise it is ``None``.
The corresponding module attribute's name, ``__path__``, is relatively
ambiguous. Instead of mirroring it, we use a more explicit name that
makes the purpose clear.
**loading_info**
A finder may set loading_info to any value to provide additional
data for the loader to use during loading. A value of None is the
default and indicates that there is no additional data. Otherwise it
can be set to any object, such as a dict, list, or
types.SimpleNamespace, containing the relevant extra information.
For example, zipimporter could use it to pass the zip archive name
to the loader directly, rather than needing to derive it from origin
or create a custom loader for each find operation.
loading_info is meant for use by the finder and corresponding loader.
It is not guaranteed to be a stable resource for any other use.
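A sketch of the idea follows. It is written against released Python versions, where this attribute ended up being called ``loader_state`` rather than ``loading_info``, and it uses ``importlib.util.module_from_spec()``, which was added later (Python 3.5). ``SharedLoader``, ``make_spec()`` and the module name are hypothetical.

```python
import importlib.abc
import importlib.machinery
import importlib.util


class SharedLoader(importlib.abc.Loader):
    """One loader instance serving many modules, with no per-module state."""

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        # The per-module data travels on the spec, not on the loader.
        info = module.__spec__.loader_state
        exec(info["source"], module.__dict__)


shared_loader = SharedLoader()


def make_spec(name, source):
    # Stand-in for what a finder's find_spec() might do.
    return importlib.machinery.ModuleSpec(
        name, shared_loader, origin="<memory>",
        loader_state={"source": source})


spec = make_spec("pep451_state_demo", "VALUE = 1 + 1\n")
module = importlib.util.module_from_spec(spec)  # added in Python 3.5
spec.loader.exec_module(module)

print(module.VALUE)  # 2
```

Because the extra data rides on the spec, the same loader object can serve any number of modules.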
Omitted Attributes and Methods
------------------------------
The following ModuleSpec methods are not part of the public API since
it is easy to use them incorrectly and only the import system really
needs them (i.e. they would be an attractive nuisance).
* create() - provide a new module to use for loading.
* exec(module) - execute the spec into a module namespace.
* load() - prepare a module and execute it in a protected way.
* reload(module) - re-execute a module in a protected way.
Here are other omissions:
There is no PathModuleSpec subclass of ModuleSpec that separates out
has_location, cached, and submodule_search_locations. While that might
make the separation cleaner, module objects don't have that distinction.
ModuleSpec will support both cases equally well.
While is_package would be a simple additional attribute (aliasing
``self.submodule_search_locations is not None``), it perpetuates the
artificial (and mostly erroneous) distinction between modules and
packages.
Conceivably, a ModuleSpec.load() method could optionally take a list of
modules with which to interact instead of sys.modules. That
capability is left out of this PEP, but may be pursued separately at
some other time, including relative to PEP 406 (import engine).
Likewise load() could be leveraged to implement multi-version
imports. While interesting, doing so is outside the scope of this
proposal.
Others:
* Add ModuleSpec.submodules (RO-property) - returns possible submodules
relative to the spec.
* Add ModuleSpec.loaded (RO-property) - the module in sys.modules, if
any.
* Add ModuleSpec.data - a descriptor that wraps the data API of the
spec's loader.
* Also see [3].
Backward Compatibility
----------------------
ModuleSpec doesn't have any. This would be a different story if
Finder.find_module() were to return a module spec instead of a loader.
In that case, specs would have to act like the loader that would have
been returned instead. Doing so would be relatively simple, but is an
unnecessary complication. It was part of earlier versions of this PEP.
Subclassing
-----------
Subclasses of ModuleSpec are allowed, but should not be necessary.
Simply setting loading_info or adding functionality to a custom
finder or loader will likely be a better fit and should be tried first.
However, as long as a subclass still fulfills the requirements of the
import system, objects of that type are completely fine as the return
value of Finder.find_spec().
Existing Types
==============
Module Objects
--------------
Other than adding ``__spec__``, none of the import-related module
attributes will be changed or deprecated, though some of them could be;
any such deprecation can wait until Python 4.
A module's spec will not be kept in sync with the corresponding import-
related attributes. Though they may differ, in practice they will
typically be the same.
One notable exception is the case where a module is run as a script by
using the ``-m`` flag. In that case ``module.__spec__.name`` will
reflect the actual module name while ``module.__name__`` will be
``__main__``.
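The difference can be seen with ``runpy``, which approximates what the ``-m`` flag does. This sketch relies on the behavior of released Python versions; the module name is made up.

```python
import importlib
import os
import runpy
import sys
import tempfile

MODULE_NAME = "pep451_main_demo"  # hypothetical module name for this demo

tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, MODULE_NAME + ".py"), "w") as f:
    f.write("RAN_AS = __name__\n")

sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

# Roughly what "python -m pep451_main_demo" does: run the module's code
# with __name__ set to "__main__".
namespace = runpy.run_module(MODULE_NAME, run_name="__main__")

print(namespace["__name__"])       # __main__
print(namespace["__spec__"].name)  # pep451_main_demo
```

Code that needs the real module name, such as pickle in the ``__main__`` case, can therefore consult the spec instead of ``__name__``.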
Notably, the spec for each module instance will be unique to that
instance even if the information is identical to that of another spec.
This won't happen in general.
Finders
-------
Finders are still responsible for creating the loader. That loader will
now be stored in the module spec returned by ``find_spec()`` rather
than returned directly. As is currently the case without the PEP, if a
loader would be costly to create, that loader can be designed to defer
the cost until later.
**MetaPathFinder.find_spec(name, path=None)**
**PathEntryFinder.find_spec(name)**
Finders will return ModuleSpec objects when ``find_spec()`` is
called. This new method replaces ``find_module()`` and
``find_loader()`` (in the ``PathEntryFinder`` case). If a finder does
not have ``find_spec()``, ``find_module()`` and ``find_loader()`` are
used instead, for backward-compatibility.
Adding yet another similar method to finders is a case of practicality.
``find_module()`` could be changed to return specs instead of loaders.
This is tempting because the import APIs have suffered enough,
especially considering ``PathEntryFinder.find_loader()`` was just
added in Python 3.3. However, the extra complexity and a less-than-
explicit method name aren't worth it.
Loaders
-------
**Loader.exec_module(module)**
Loaders will have a new method, exec_module(). Its only job
is to "exec" the module and consequently populate the module's
namespace. It is not responsible for creating or preparing the module
object, nor for any cleanup afterward. It has no return value.
exec_module() should properly handle the case where it is called more
than once. For some kinds of modules this may mean raising ImportError
every time after the first time the method is called. This is
particularly relevant for reloading, where some kinds of modules do not
support in-place reloading.
**Loader.create_module(spec)**
Loaders may also implement create_module() that will return a
new module to exec. It may return None to indicate that the default
module creation code should be used. One use case for create_module()
is to provide a module that is a subclass of the builtin module type.
Most loaders will not need to implement create_module().
create_module() should properly handle the case where it is called more
than once for the same spec/module. This may include returning None or
raising ImportError.
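The module-subclass use case might look like the following sketch. It uses ``importlib.util.spec_from_loader()`` and ``importlib.util.module_from_spec()`` from released Python versions (the draft calls the former ``from_loader()``). ``PublicNamesModule``, ``CustomTypeLoader`` and the module name are hypothetical.

```python
import importlib.abc
import importlib.util
import types


class PublicNamesModule(types.ModuleType):
    """Hypothetical module subclass with one extra method."""

    def public_names(self):
        return sorted(n for n in vars(self) if not n.startswith("_"))


class CustomTypeLoader(importlib.abc.Loader):
    def create_module(self, spec):
        # Provide the module object instead of relying on the default.
        return PublicNamesModule(spec.name)

    def exec_module(self, module):
        # Only populates the namespace; it returns nothing.
        exec("A = 1\nB = 2\n_hidden = 3\n", module.__dict__)


spec = importlib.util.spec_from_loader(
    "pep451_custom_type_demo", CustomTypeLoader())
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

print(type(module).__name__)  # PublicNamesModule
print(module.public_names())  # ['A', 'B']
```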
Other changes:
PEP 420 introduced the optional ``module_repr()`` loader method to limit
the amount of special-casing in the module type's ``__repr__()``. Since
this method is part of ``ModuleSpec``, it will be deprecated on loaders.
However, if it exists on a loader it will be used exclusively.
The ``Loader.init_module_attrs()`` method, added prior to Python 3.4's
release, will be removed in favor of the same method on ``ModuleSpec``.
However, ``InspectLoader.is_package()`` will not be deprecated even
though the same information is found on ``ModuleSpec``. ``ModuleSpec``
can use it to populate its own ``is_package`` if that information is
not otherwise available. Still, it will be made optional.
One consequence of ModuleSpec is that loader ``__init__`` methods will
no longer need to accommodate per-module state. The path-based loaders
in ``importlib`` take arguments in their ``__init__()`` and have
corresponding attributes. However, the need for those values is
eliminated by module specs.
In addition to executing a module during loading, loaders will still be
directly responsible for providing APIs concerning module-related data.
Other Changes
=============
* The various finders and loaders provided by importlib will be
updated to comply with this proposal.
* The spec for the ``__main__`` module will reflect how the interpreter
was started. For instance, with ``-m`` the spec's name will be that
of the run module, while ``__main__.__name__`` will still be
"__main__".
* We add ``importlib.find_spec()`` to mirror
``importlib.find_loader()`` (which becomes deprecated).
* ``importlib.reload()`` is changed to use ``ModuleSpec.load()``.
* ``importlib.reload()`` will now make use of the per-module import
lock.
Reference Implementation
========================
A reference implementation will be available at
http://bugs.python.org/issue18864.
Open Issues
==============
\* The impact of this change on pkgutil (and setuptools) needs looking
into. It has some generic function-based extensions to PEP 302. These
may break if importlib starts wrapping loaders without the tools'
knowledge.
\* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc,
inspect.
For instance, pickle should be updated in the __main__ case to look at
``module.__spec__.name``.
\* Impact on some kinds of lazy loading modules. See [3].
\* Find a better name than loading_info? Perhaps loading_data,
loader_state, or loader_info.
\* Change loader.create_module() to prepare_module()?
\* Add more explicit reloading support to exec_module() (and
prepare_module())?
References
==========
[1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html
[2] https://mail.python.org/pipermail/import-sig/2013-September/000735.html
[3] https://mail.python.org/pipermail/python-dev/2013-August/128129.html
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: