Discussion:
[Import-SIG] PEP 451: Big update.
Eric Snow
2013-09-18 09:51:22 UTC
Hi all,

I finally got some time to update the PEP. I've simplified a few things,
most notably by making the 4 ModuleSpec methods (create, exec, load,
reload) "private".

Also notable is that the new loader method is still create_module() and
there is still no flag for is_reload on either of the loader methods. I'm
still not clear on what the flag buys us and on why anything we'd do in a
prepare_module() we couldn't do in exec_module(). I'm trying to keep this
simple. :)

Anyway, I still need to take some time to clean up the PEP formatting and
run a spell checker. I probably also missed some artifact of an older
version of the API. Otherwise I think it's in a good spot. Comments
welcome.

-eric

p.s. I also plan on getting the implementation up one of these days. :P

===============================================================

PEP: 451
Title: A ModuleSpec Type for the Import System
Version: $Revision$
Last-Modified: $Date$
Author: Eric Snow <ericsnowcurrently at gmail.com>
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 8-Aug-2013
Python-Version: 3.4
Post-History: 8-Aug-2013, 28-Aug-2013, 18-Sep-2013
Resolution:


Abstract
========

This PEP proposes to add a new class to ``importlib.machinery`` called
``ModuleSpec``. It will be authoritative for all the import-related
information about a module, and will be available without needing to
load the module first. Finders will directly provide a module's spec
instead of a loader (which they will continue to provide indirectly).
The import machinery will be adjusted to take advantage of module specs,
including using them to load modules.


Motivation
==========

The import system has evolved over the lifetime of Python. In late 2002
PEP 302 introduced standardized import hooks via ``finders`` and
``loaders`` and ``sys.meta_path``. The ``importlib`` module, introduced
with Python 3.1, now exposes a pure Python implementation of the APIs
described by PEP 302, as well as of the full import system. It is now
much easier to understand and extend the import system. While a benefit
to the Python community, this greater accessibility also presents a
challenge.

As more developers come to understand and customize the import system,
any weaknesses in the finder and loader APIs will be more impactful. So
the sooner we can address any such weaknesses in the import system, the
better...and there are a couple we can take care of with this proposal.

Firstly, any time the import system needs to save information about a
module we end up with more attributes on module objects that are
generally only meaningful to the import system. It would be nice to
have a per-module namespace in which to put future import-related
information and to pass around within the import system. Secondly,
there's an API void between finders and loaders that causes undue
complexity when encountered.

Currently finders are strictly responsible for providing the loader,
through their find_module() method, which the import system will use to
load the module. The loader is then responsible for doing some checks,
creating the module object, setting import-related attributes,
"installing" the module to ``sys.modules``, and loading the module,
along with some cleanup. This all takes place during the import
system's call to ``Loader.load_module()``. Loaders also provide some
APIs for accessing data associated with a module.

Loaders are not required to provide any of the functionality of
``load_module()`` through other methods. Thus, though the import-
related information about a module is likely available without loading
the module, it is not otherwise exposed.

Furthermore, the requirements associated with ``load_module()`` are
common to all loaders and mostly are implemented in exactly the same
way. This means every loader has to duplicate the same boilerplate
code. ``importlib.util`` provides some tools that help with this, but
it would be more helpful if the import system simply took charge of
these responsibilities. The trouble is that this would limit the degree
of customization that ``load_module()`` facilitates. This is a gap
between finders and loaders which this proposal aims to fill.

Finally, when the import system calls a finder's ``find_module()``, the
finder makes use of a variety of information about the module that is
useful outside the context of the method. Currently the options are
limited for persisting that per-module information past the method call,
since it only returns the loader. Popular workarounds for this limitation
are to store the information in a module-to-info mapping somewhere on
the finder itself, or store it on the loader.

Unfortunately, loaders are not required to be module-specific. On top
of that, some of the useful information finders could provide is
common to all finders, so ideally the import system could take care of
those details. This is the same gap as before between finders and
loaders.

As an example of complexity attributable to this flaw, the
implementation of namespace packages in Python 3.3 (see PEP 420) added
``FileFinder.find_loader()`` because there was no good way for
``find_module()`` to provide the namespace search locations.

The answer to this gap is a ``ModuleSpec`` object that contains the
per-module information and takes care of the boilerplate functionality
involved with loading the module.

(The idea gained momentum during discussions related to another PEP.[1])


Specification
=============

The goal is to address the gap between finders and loaders while
changing as little of their semantics as possible. Though some
functionality and information is moved to the new ``ModuleSpec`` type,
their behavior should remain the same. However, for the sake of clarity
the finder and loader semantics will be explicitly identified.

This is a high-level summary of the changes described by this PEP. More
detail is available in later sections.

importlib.machinery.ModuleSpec (new)
------------------------------------

A specification for a module's import-system-related state.

* ModuleSpec(name, loader, \*, origin=None, loading_info=None,
is_package=None)

Attributes:

* name - a string for the name of the module.
* loader - the loader to use for loading and for module data.
* origin - a string for the location from which the module is loaded,
e.g. "built-in" for built-in modules and the filename for modules
loaded from source.
* submodule_search_locations - strings for where to find submodules,
if a package.
* loading_info - a container of extra data for use during loading.
* cached (property) - a string for where the compiled module will be
stored (see PEP 3147).
* package (RO-property) - the name of the module's parent (or None).
* has_location (RO-property) - the module's origin refers to a location.

Instance Methods:

* module_repr() - provide a repr string for the spec'ed module.
* init_module_attrs(module) - set any of a module's import-related
attributes that aren't already set.

importlib.util Additions
------------------------

* spec_from_file_location(name, location, \*, loader=None,
submodule_search_locations=None)
- factory for file-based module specs.
* from_loader(name, loader, \*, origin=None, is_package=None) - factory
based on information provided by loaders.
* spec_from_module(module, loader=None) - factory based on existing
import-related module attributes. This function is expected to be
used only in some backward-compatibility situations.
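
A minimal sketch of the file-based factory in use, assuming the functions
as they shipped in ``importlib.util``. ``module_from_spec()`` is not part
of this draft; it was added in Python 3.5 and is used here only to avoid
touching ``sys.modules``. The file name ``demo_mod.py`` is hypothetical.

```python
import importlib.util
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_mod.py")
    with open(path, "w") as f:
        f.write("ANSWER = 42\n")

    # Build a spec straight from a file location; a suitable loader is
    # chosen from the file suffix.
    spec = importlib.util.spec_from_file_location("demo_mod", path)

    # Create the module from the spec, then let the loader execute it.
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    answer = module.ANSWER
    has_location = spec.has_location

print(answer)        # 42
print(has_location)  # True
```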

Other API Additions
-------------------

* importlib.abc.Loader.exec_module(module) will execute a module in its
own namespace. It replaces ``importlib.abc.Loader.load_module()``.
* importlib.abc.Loader.create_module(spec) (optional) will return a new
module to use for loading.
* Module objects will have a new attribute: ``__spec__``.
* importlib.find_spec(name, path=None) will return the spec for a
module.

exec_module() and create_module() should not set any import-related
module attributes. The fact that load_module() does is a design flaw
that this proposal aims to correct.
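
For the ``find_spec()`` entry above, a short sketch of looking up a spec
without importing the module. It assumes the function as it eventually
shipped, which lives in ``importlib.util`` and takes ``(name,
package=None)`` rather than the ``importlib.find_spec(name, path=None)``
signature proposed in this draft. The missing module name is made up.

```python
import importlib.util

# Look up the spec for a standard library package.
json_spec = importlib.util.find_spec("json")

# A top-level name that no finder knows about yields None.
missing_spec = importlib.util.find_spec("no_such_module_for_this_demo")

print(json_spec.name)    # json
print(json_spec.origin)  # path to json/__init__.py on this system
print(missing_spec)      # None
```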

API Changes
-----------

* ``InspectLoader.is_package()`` will become optional.

Deprecations
------------

* importlib.abc.MetaPathFinder.find_module()
* importlib.abc.PathEntryFinder.find_module()
* importlib.abc.PathEntryFinder.find_loader()
* importlib.abc.Loader.load_module()
* importlib.abc.Loader.module_repr()
* The parameters and attributes of the various loaders in
importlib.machinery
* importlib.util.set_package()
* importlib.util.set_loader()
* importlib.find_loader()

Removals
--------

These were introduced prior to Python 3.4's release.

* importlib.abc.Loader.init_module_attrs()
* importlib.util.module_to_load()

Other Changes
-------------

* The import system implementation in importlib will be changed to make
use of ModuleSpec.
* Import-related module attributes (other than ``__spec__``) will no
longer be used directly by the import system.
* Import-related attributes should no longer be added to modules
directly.
* The module type's ``__repr__()`` will be a thin wrapper around a pure
Python implementation which will leverage ModuleSpec.
* The spec for the ``__main__`` module will reflect the appropriate
name and origin.

Backward-Compatibility
----------------------

* If a finder does not define find_spec(), a spec is derived from
the loader returned by find_module().
* PathEntryFinder.find_loader() still takes priority over
find_module().
* Loader.load_module() is used if exec_module() is not defined.
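
The first fallback rule can be sketched roughly as follows. This is not
the import system's actual code; ``find_spec_compat()``, ``LegacyFinder``,
``ModernFinder`` and ``LegacyLoader`` are hypothetical names, and
``spec_from_loader()`` is the shipped spelling of the ``from_loader()``
factory listed earlier.

```python
import importlib.util


class LegacyLoader:
    """Hypothetical PEP 302 style loader with only load_module()."""

    def load_module(self, name):
        raise NotImplementedError("not needed for this sketch")


class LegacyFinder:
    """Hypothetical finder that predates find_spec()."""

    def find_module(self, name, path=None):
        return LegacyLoader() if name == "legacy_demo" else None


class ModernFinder:
    """Hypothetical finder that already returns specs."""

    def find_spec(self, name, path=None):
        if name != "modern_demo":
            return None
        return importlib.util.spec_from_loader(name, LegacyLoader())


def find_spec_compat(finder, name, path=None):
    # Prefer the new API when the finder provides it.
    if hasattr(finder, "find_spec"):
        return finder.find_spec(name, path)
    # Otherwise derive a spec from the loader that find_module() returns.
    loader = finder.find_module(name, path)
    if loader is None:
        return None
    return importlib.util.spec_from_loader(name, loader)


legacy_spec = find_spec_compat(LegacyFinder(), "legacy_demo")
modern_spec = find_spec_compat(ModernFinder(), "modern_demo")
print(legacy_spec.name, type(legacy_spec.loader).__name__)
print(modern_spec.name)
```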

What Will not Change?
---------------------

* The syntax and semantics of the import statement.
* Existing finders and loaders will continue to work normally.
* The import-related module attributes will still be initialized with
the same information.
* Finders will still create loaders (now storing them in specs).
* Loader.load_module(), if a loader defines it, will have all the
same requirements and may still be called directly.
* Loaders will still be responsible for module data APIs.
* importlib.reload() will still overwrite the import-related attributes.


What Will Existing Finders and Loaders Have to Do Differently?
==============================================================

Immediately? Nothing. The status quo will be deprecated, but will
continue working. However, here are the things that the authors of
finders and loaders should change relative to this PEP:

* Implement ``find_spec()`` on finders.
* Implement ``exec_module()`` on loaders, if possible.

The ModuleSpec factory functions in importlib.util are intended to be
helpful for converting existing finders. ``from_loader()`` and
``spec_from_file_location()`` are both straightforward utilities in this
regard. In the case where loaders already expose methods for creating
and preparing modules, ``spec_from_module()`` may be useful to
the corresponding finder.

For existing loaders, exec_module() should be a relatively direct
conversion from the non-boilerplate portion of load_module(). In some
uncommon cases the loader should also implement create_module().
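
To make the two recommended changes concrete, here is a small sketch of a
meta path finder and loader pair written against the new methods. It
assumes the API as it shipped in Python 3.4+, where ``find_spec()`` also
receives a ``target`` argument that this draft does not mention. The
classes, the ``SOURCES`` table and the module name ``hello_inmem`` are
hypothetical.

```python
import importlib
import importlib.abc
import importlib.util
import sys

# Hypothetical in-memory "storage" for module source code.
SOURCES = {"hello_inmem": "GREETING = 'hi'\n"}


class InMemoryLoader(importlib.abc.Loader):
    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        # Only populate the namespace; no sys.modules handling and no
        # import-related attributes are set here.
        spec = module.__spec__
        code = compile(SOURCES[spec.name], spec.origin, "exec")
        exec(code, module.__dict__)


class InMemoryFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, name, path=None, target=None):
        if name not in SOURCES:
            return None
        return importlib.util.spec_from_loader(
            name, InMemoryLoader(), origin="<in-memory>")


finder = InMemoryFinder()
sys.meta_path.insert(0, finder)
try:
    module = importlib.import_module("hello_inmem")
finally:
    sys.meta_path.remove(finder)

print(module.GREETING)         # hi
print(module.__spec__.origin)  # <in-memory>
```

The loader contains none of the usual ``load_module()`` boilerplate; the
import system takes care of it based on the spec.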


ModuleSpec Users
================

``ModuleSpec`` objects have 3 distinct target audiences: Python itself,
import hooks, and normal Python users.

Python will use specs in the import machinery, in interpreter startup,
and in various standard library modules. Some modules are
import-oriented, like pkgutil, and others are not, like pickle and
pydoc. In all cases, the full ``ModuleSpec`` API will get used.

Import hooks (finders and loaders) will make use of the spec in specific
ways. First of all, finders may use the spec factory functions in
importlib.util to create spec objects. They may also directly adjust
the spec attributes after the spec is created. Secondly, the finder may
bind additional information to the spec (in loading_info) for the
loader to consume during module creation/execution. Finally, loaders
will make use of the attributes on a spec when creating and/or executing
a module.

Python users will be able to inspect a module's ``__spec__`` to get
import-related information about the object. Generally, Python
applications and interactive users will not be using the ``ModuleSpec``
factory functions nor any of the instance methods.


How Loading Will Work
=====================

This is an outline of what happens in ModuleSpec's loading
functionality::

    import sys
    from types import ModuleType

    def load(spec):
        # Legacy loaders without exec_module() still do all of the work
        # themselves in load_module().
        if not hasattr(spec.loader, 'exec_module'):
            module = spec.loader.load_module(spec.name)
            spec.init_module_attrs(module)
            return sys.modules[spec.name]

        module = None
        if hasattr(spec.loader, 'create_module'):
            module = spec.loader.create_module(spec)
        if module is None:
            module = ModuleType(spec.name)
        spec.init_module_attrs(module)

        spec._initializing = True
        sys.modules[spec.name] = module
        try:
            spec.loader.exec_module(module)
        except Exception:
            # Do not leave a partially initialized module behind.
            del sys.modules[spec.name]
            raise
        finally:
            spec._initializing = False
        return sys.modules[spec.name]

These steps are exactly what ``Loader.load_module()`` is already
expected to do. Loaders will thus be simplified since they will only
need to implement exec_module().

Note that we must return the module from sys.modules. During loading
the module may have replaced itself in sys.modules. Since we don't have
a post-import hook API to accommodate the use case, we have to deal with
it. However, in the replacement case we do not worry about setting the
import-related module attributes on the object. The module writer is on
their own if they are doing this.
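
The replacement case can be observed with today's import system, which
also returns whatever is in ``sys.modules`` after execution. The sketch
below writes a throwaway module that swaps itself for an instance of a
class; the module name ``self_replacing_demo`` and the ``Replacement``
class are made up.

```python
import importlib
import os
import sys
import tempfile

# Source of a module that replaces itself in sys.modules while executing.
SOURCE = (
    "import sys\n"
    "class Replacement:\n"
    "    value = 'replaced'\n"
    "sys.modules[__name__] = Replacement()\n"
)

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "self_replacing_demo.py"), "w") as f:
        f.write(SOURCE)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        result = importlib.import_module("self_replacing_demo")
    finally:
        sys.path.remove(tmp)

# The import returns the replacement object, not the original module.
print(type(result).__name__)  # Replacement
print(result.value)           # replaced
```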


ModuleSpec
==========

Attributes
----------

Each of the following names is an attribute on ModuleSpec objects. A
value of ``None`` indicates "not set". This contrasts with module
objects where the attribute simply doesn't exist. Most of the
attributes correspond to the import-related attributes of modules. Here
is the mapping. The reverse of this mapping is used by
ModuleSpec.init_module_attrs().

========================== ==============
On ModuleSpec On Modules
========================== ==============
name __name__
loader __loader__
package __package__
origin __file__*
cached __cached__*,**
submodule_search_locations __path__**
loading_info \-
has_location \-
========================== ==============

\* Set only if has_location is true.
\*\* Set only if the spec attribute is not None.

While package and has_location are read-only properties, the remaining
attributes can be replaced after the module spec is created and even
after import is complete. This allows for unusual cases where directly
modifying the spec is the best option. However, typical use should not
involve changing the state of a module's spec.
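
As a quick check of part of the mapping, the following sketch compares a
few spec attributes with the matching module attributes on the standard
library's ``json`` package, assuming Python 3.4 or later. Only ``name``,
``origin`` and ``submodule_search_locations`` are compared, since those
spellings are unchanged in the shipped API.

```python
import json

spec = json.__spec__

name_matches = spec.name == json.__name__
# json is loaded from a file, so origin doubles as __file__.
origin_matches = spec.origin == json.__file__
# json is a package, so the search locations mirror __path__.
path_matches = list(spec.submodule_search_locations) == list(json.__path__)

print(name_matches, origin_matches, path_matches)
```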

**origin**

origin is a string for the place from which the module originates.
Aside from the informational value, it is also used in module_repr().

The module attribute ``__file__`` has a similar but more restricted
meaning. Not all modules have it set (e.g. built-in modules). However,
``origin`` is applicable to all modules. For built-in modules it would
be set to "built-in".

**has_location**

Some modules can be loaded by reference to a location, e.g. a filesystem
path or a URL or something of the sort. Having the location lets you
load the module, but in theory you could load that module under various
names.

In contrast, non-located modules can't be loaded in this fashion, e.g.
builtin modules and modules dynamically created in code. For these, the
name is the only way to access them, so they have an "origin" but not a
"location".

This attribute reflects whether or not the module is locatable. If it
is, origin must be set to the module's location and ``__file__`` will be
set on the module. Not all locatable modules will be cachable, but most
will.

The corresponding module attribute name, ``__file__``, is somewhat
inaccurate and potentially confusing, so we will use a more explicit
combination of origin and has_location to represent the same
information. Having a separate filename is unnecessary since we have
origin.
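
The difference between an origin and a location shows up when comparing a
built-in module with a file-based one. This sketch assumes the shipped
``importlib.util.find_spec()`` and a standard CPython build, where ``sys``
is built in and ``json`` is loaded from a file.

```python
import importlib.util

builtin_spec = importlib.util.find_spec("sys")
located_spec = importlib.util.find_spec("json")

# A built-in module has an origin but no location.
print(builtin_spec.origin, builtin_spec.has_location)

# A file-based module has a location, which is also its origin.
print(located_spec.has_location)
```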

**submodule_search_locations**

The list of location strings, typically directory paths, in which to
search for submodules. If the module is a package this will be set to
a list (even an empty one). Otherwise it is ``None``.

The corresponding module attribute's name, ``__path__``, is relatively
ambiguous. Instead of mirroring it, we use a more explicit name that
makes the purpose clear.

**loading_info**

A finder may set loading_info to any value to provide additional
data for the loader to use during loading. A value of None is the
default and indicates that there is no additional data. Otherwise it
can be set to any object, such as a dict, list, or
types.SimpleNamespace, containing the relevant extra information.

For example, zipimporter could use it to pass the zip archive name
to the loader directly, rather than needing to derive it from origin
or create a custom loader for each find operation.

loading_info is meant for use by the finder and corresponding loader.
It is not guaranteed to be a stable resource for any other use.
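
A sketch of a finder handing data to a shared loader through the spec. It
assumes the shipped ``ModuleSpec``, where this attribute ended up being
named ``loader_state`` rather than ``loading_info``. ``StateFinder``,
``StateLoader`` and the module name ``state_demo`` are hypothetical, and
``module_from_spec()`` (Python 3.5+) is used only to keep the example away
from ``sys.modules``.

```python
import importlib.abc
import importlib.util
from importlib.machinery import ModuleSpec


class StateLoader(importlib.abc.Loader):
    """One loader instance serves every module; it keeps no per-module state."""

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        # Everything module-specific arrives through the spec.
        state = module.__spec__.loader_state
        exec(state["source"], module.__dict__)


class StateFinder:
    def __init__(self, sources):
        self._sources = sources
        self._loader = StateLoader()

    def find_spec(self, name, path=None, target=None):
        if name not in self._sources:
            return None
        return ModuleSpec(
            name, self._loader, origin="<state-demo>",
            loader_state={"source": self._sources[name]})


finder = StateFinder({"state_demo": "VALUE = 1 + 1\n"})
spec = finder.find_spec("state_demo")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
print(module.VALUE)  # 2
```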

Omitted Attributes and Methods
------------------------------

The following ModuleSpec methods are not part of the public API since
it is easy to use them incorrectly and only the import system really
needs them (i.e. they would be an attractive nuisance).

* create() - provide a new module to use for loading.
* exec(module) - execute the spec into a module namespace.
* load() - prepare a module and execute it in a protected way.
* reload(module) - re-execute a module in a protected way.

Here are other omissions:

There is no PathModuleSpec subclass of ModuleSpec that separates out
has_location, cached, and submodule_search_locations. While that might
make the separation cleaner, module objects don't have that distinction.
ModuleSpec will support both cases equally well.

While is_package would be a simple additional attribute (aliasing
``self.submodule_search_locations is not None``), it perpetuates the
artificial (and mostly erroneous) distinction between modules and
packages.

Conceivably, a ModuleSpec.load() method could optionally take a list of
modules with which to interact instead of sys.modules. That
capability is left out of this PEP, but may be pursued separately at
some other time, including relative to PEP 406 (import engine).

Likewise load() could be leveraged to implement multi-version
imports. While interesting, doing so is outside the scope of this
proposal.

Others:

* Add ModuleSpec.submodules (RO-property) - returns possible submodules
relative to the spec.
* Add ModuleSpec.loaded (RO-property) - the module in sys.modules, if
any.
* Add ModuleSpec.data - a descriptor that wraps the data API of the
spec's loader.
* Also see [3].


Backward Compatibility
----------------------

ModuleSpec doesn't have any. This would be a different story if
Finder.find_module() were to return a module spec instead of loader.
In that case, specs would have to act like the loader that would have
been returned instead. Doing so would be relatively simple, but is an
unnecessary complication. It was part of earlier versions of this PEP.

Subclassing
-----------

Subclasses of ModuleSpec are allowed, but should not be necessary.
Simply setting loading_info or adding functionality to a custom
finder or loader will likely be a better fit and should be tried first.
However, as long as a subclass still fulfills the requirements of the
import system, objects of that type are completely fine as the return
value of Finder.find_spec().


Existing Types
==============

Module Objects
--------------

Other than adding ``__spec__``, none of the import-related module
attributes will be changed or deprecated, though some of them could be;
any such deprecation can wait until Python 4.

A module's spec will not be kept in sync with the corresponding import-
related attributes. Though they may differ, in practice they will
typically be the same.

One notable exception is the case where a module is run as a script by
using the ``-m`` flag. In that case ``module.__spec__.name`` will
reflect the actual module name while ``module.__name__`` will be
``__main__``.
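
The ``-m`` case can be observed from a child interpreter. This is a
sketch assuming Python 3.4 or later; the module name ``main_spec_demo``
is hypothetical, and ``PYTHONPATH`` is set explicitly so the child can
find the temporary module.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "main_spec_demo.py"), "w") as f:
        f.write("print(__name__, __spec__.name)\n")

    # Run the module as a script with -m in a fresh interpreter.
    completed = subprocess.run(
        [sys.executable, "-m", "main_spec_demo"],
        cwd=tmp,
        env=dict(os.environ, PYTHONPATH=tmp),
        capture_output=True,
        text=True,
        check=True,
    )
    output = completed.stdout.strip()

print(output)  # __main__ main_spec_demo
```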

Notably, the spec for each module instance will be unique to that
instance even if the information is identical to that of another spec.
This won't happen in general.

Finders
-------

Finders are still responsible for creating the loader. That loader will
now be stored in the module spec returned by ``find_spec()`` rather
than returned directly. As is currently the case without the PEP, if a
loader would be costly to create, that loader can be designed to defer
the cost until later.

**MetaPathFinder.find_spec(name, path=None)**

**PathEntryFinder.find_spec(name)**

Finders will return ModuleSpec objects when ``find_spec()`` is
called. This new method replaces ``find_module()`` and
``find_loader()`` (in the ``PathEntryFinder`` case). If a finder does
not have ``find_spec()``, ``find_module()`` and ``find_loader()`` are
used instead, for backward-compatibility.

Adding yet another similar method to finders is a case of practicality.
``find_module()`` could be changed to return specs instead of loaders.
This is tempting because the import APIs have suffered enough,
especially considering ``PathEntryFinder.find_loader()`` was just
added in Python 3.3. However, the extra complexity and a less-than-
explicit method name aren't worth it.

Loaders
-------

**Loader.exec_module(module)**

Loaders will have a new method, exec_module(). Its only job
is to "exec" the module and consequently populate the module's
namespace. It is not responsible for creating or preparing the module
object, nor for any cleanup afterward. It has no return value.

exec_module() should properly handle the case where it is called more
than once. For some kinds of modules this may mean raising ImportError
every time after the first time the method is called. This is
particularly relevant for reloading, where some kinds of modules do not
support in-place reloading.

**Loader.create_module(spec)**

Loaders may also implement create_module() that will return a
new module to exec. It may return None to indicate that the default
module creation code should be used. One use case for create_module()
is to provide a module that is a subclass of the builtin module type.
Most loaders will not need to implement create_module().

create_module() should properly handle the case where it is called more
than once for the same spec/module. This may include returning None or
raising ImportError.
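
The module-subclass use case might look like the following sketch. It
assumes the shipped ``importlib.util`` helpers (``spec_from_loader()``,
and ``module_from_spec()`` from Python 3.5+, which calls
``create_module()`` and then sets the import-related attributes).
``TracingModule``, ``SubclassLoader`` and the name ``subclass_demo`` are
hypothetical.

```python
import importlib.abc
import importlib.util
import types


class TracingModule(types.ModuleType):
    """Hypothetical subclass of the built-in module type."""


class SubclassLoader(importlib.abc.Loader):
    def create_module(self, spec):
        # Provide the object that will be executed; import-related
        # attributes are left for the import system to set.
        return TracingModule(spec.name)

    def exec_module(self, module):
        # Populate the namespace only.
        module.initialized = True


spec = importlib.util.spec_from_loader("subclass_demo", SubclassLoader())
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

print(type(module).__name__)  # TracingModule
print(module.initialized)     # True
```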

Other changes:

PEP 420 introduced the optional ``module_repr()`` loader method to limit
the amount of special-casing in the module type's ``__repr__()``. Since
this method is part of ``ModuleSpec``, it will be deprecated on loaders.
However, if it exists on a loader it will be used exclusively.

The ``Loader.init_module_attrs()`` method, added prior to Python 3.4's
release, will be removed in favor of the same method on ``ModuleSpec``.

However, ``InspectLoader.is_package()`` will not be deprecated even
though the same information is found on ``ModuleSpec``. ``ModuleSpec``
can use it to populate its own ``is_package`` if that information is
not otherwise available. Still, it will be made optional.

One consequence of ModuleSpec is that loader ``__init__`` methods will
no longer need to accommodate per-module state. The path-based loaders
in ``importlib`` take arguments in their ``__init__()`` and have
corresponding attributes. However, the need for those values is
eliminated by module specs.

In addition to executing a module during loading, loaders will still be
directly responsible for providing APIs concerning module-related data.


Other Changes
=============

* The various finders and loaders provided by importlib will be
updated to comply with this proposal.
* The spec for the ``__main__`` module will reflect how the interpreter
was started. For instance, with ``-m`` the spec's name will be that
of the run module, while ``__main__.__name__`` will still be
"__main__".
* We add ``importlib.find_spec()`` to mirror
``importlib.find_loader()`` (which becomes deprecated).
* ``importlib.reload()`` is changed to use ``ModuleSpec.load()``.
* ``importlib.reload()`` will now make use of the per-module import
lock.


Reference Implementation
========================

A reference implementation will be available at
http://bugs.python.org/issue18864.


Open Issues
==============

\* The impact of this change on pkgutil (and setuptools) needs looking
into. It has some generic function-based extensions to PEP 302. These
may break if importlib starts wrapping loaders without the tools'
knowledge.

\* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc,
inspect.

For instance, pickle should be updated in the __main__ case to look at
``module.__spec__.name``.

\* Impact on some kinds of lazy loading modules. See [3].

\* Find a better name than loading_info? Perhaps loading_data,
loader_state, or loader_info.

\* Change loader.create_module() to prepare_module()?

\* Add more explicit reloading support to exec_module() (and
prepare_module())?


References
==========

[1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html

[2] https://mail.python.org/pipermail/import-sig/2013-September/000735.html

[3] https://mail.python.org/pipermail/python-dev/2013-August/128129.html


Copyright
=========

This document has been placed in the public domain.

..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
Brett Cannon
2013-09-18 14:57:44 UTC
Looking good! Comments inline.
[SNIP]
Post by Eric Snow
Specification
=============
The goal is to address the gap between finders and loaders while
changing as little of their semantics as possible. Though some
functionality and information is moved to the new ``ModuleSpec`` type,
their behavior should remain the same. However, for the sake of clarity
the finder and loader semantics will be explicitly identified.
This is a high-level summary of the changes described by this PEP. More
detail is available in later sections.
importlib.machinery.ModuleSpec (new)
------------------------------------
A specification for a module's import-system-related state.
* ModuleSpec(name, loader, \*, origin=None, loading_info=None,
is_package=None)
* name - a string for the name of the module.
* loader - the loader to use for loading and for module data.
Just drop the "and for module data"; the sentence is awkward with it and it
is a marginal use-case.
Post by Eric Snow
* origin - a string for the location from which the module is loaded,
e.g. "builtin" for built-in modules and the filename for modules
loaded from source.
* submodule_search_locations - strings for where to find submodules,
if a package.
Very subtle hint that it's a sequence of strings; might want to make it
more explicit that it's a list.
Post by Eric Snow
* loading_info - a container of extra data for use during loading.
* cached (property) - a string for where the compiled module will be
stored (see PEP 3147).
* package (RO-property) - the name of the module's parent (or None).
* has_location (RO-property) - the module's origin refers to a location.
* module_repr() - provide a repr string for the spec'ed module.
* init_module_attrs(module) - set any of a module's import-related
attributes that aren't already set.
importlib.util Additions
------------------------
* spec_from_file_location(name, location, \*, loader=None,
submodule_search_locations=None)
- factory for file-based module specs.
* from_loader(name, loader, \*, origin=None, is_package=None) - factory
based on information provided by loaders.
* spec_from_module(module, loader=None) - factory based on existing
import-related module attributes. This function is expected to be
used only in some backward-compatibility situations.
Other API Additions
-------------------
* importlib.abc.Loader.exec_module(module) will execute a module in its
own namespace. It replaces ``importlib.abc.Loader.load_module()``.
* importlib.abc.Loader.create_module(spec) (optional) will return a new
module to use for loading.
* Module objects will have a new attribute: ``__spec__``.
* importlib.find_spec(name, path=None) will return the spec for a
module.
exec_module() and create_module() should not set any import-related
module attributes. The fact that load_module() does is a design flaw
that this proposal aims to correct.
This is a rather jarring place to make this statement since you're just
outlining API additions, not design decisions.
Post by Eric Snow
API Changes
-----------
* ``InspectLoader.is_package()`` will become optional.
Deprecations
------------
* importlib.abc.MetaPathFinder.find_module()
* importlib.abc.PathEntryFinder.find_module()
* importlib.abc.PathEntryFinder.find_loader()
* importlib.abc.Loader.load_module()
* importlib.abc.Loader.module_repr()
* The parameters and attributes of the various loaders in
importlib.machinery
* importlib.util.set_package()
* importlib.util.set_loader()
* importlib.find_loader()
Yay to all of this! =)
Post by Eric Snow
Removals
--------
These were introduced prior to Python 3.4's release.
* importlib.abc.Loader.init_module_attrs()
* importlib.util.module_to_load()
Other Changes
-------------
* The import system implementation in importlib will be changed to make
use of ModuleSpec.
* Import-related module attributes (other than ``__spec__``) will no
longer be used directly by the import system.
* Import-related attributes should no longer be added to modules
directly.
* The module type's ``__repr__()`` will be thin wrapper around a pure
Python implementation which will leverage ModuleSpec.
"be a thin"
Post by Eric Snow
* The spec for the ``__main__`` module will reflect the appropriate
name and origin.
Backward-Compatibility
----------------------
* If a finder does not define find_spec(), a spec is derived from
the loader returned by find_module().
* PathEntryFinder.find_loader() still takes priority over
find_module().
* Loader.load_module() is used if exec_module() is not defined.
What Will not Change?
---------------------
* The syntax and semantics of the import statement.
* Existing finders and loaders will continue to work normally.
* The import-related module attributes will still be initialized with
the same information.
* Finders will still create loaders (now storing them in specs).
* Loader.load_module(), if a module defines it, will have all the
same requirements and may still be called directly.
* Loaders will still be responsible for module data APIs.
* importlib.reload() will still overwrite the import-related attributes.
What Will Existing Finders and Loaders Have to Do Differently?
==============================================================
Immediately? Nothing. The status quo will be deprecated, but will
continue working. However, here are the things that the authors of
finders and loaders should change relative to this PEP:
* Implement ``find_spec()`` on finders.
* Implement ``exec_module()`` on loaders, if possible.
The ModuleSpec factory functions in importlib.util are intended to be
helpful for converting existing finders. ``from_loader()`` and
``from_file_location()`` are both straightforward utilities in this
regard. In the case where loaders already expose methods for creating
and preparing modules, ``ModuleSpec.from_module()`` may be useful to
the corresponding finder.
For existing loaders, exec_module() should be a relatively direct
conversion from the non-boilerplate portion of load_module(). In some
uncommon cases the loader should also implement create_module().
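As a rough illustration of the conversion, here is a minimal finder/loader pair written against the API as it later shipped in Python 3.4+. Everything named ``Dict*``, the ``SOURCES`` table and the ``demo_settings`` module are made up for the example; the only real APIs assumed are ``importlib.abc.Loader``, ``importlib.abc.MetaPathFinder`` and ``importlib.util.spec_from_loader()``.

```python
import importlib
import importlib.abc
import importlib.util
import sys

# Hypothetical in-memory "module store" used only for this sketch.
SOURCES = {"demo_settings": "ANSWER = 42\n"}


class DictLoader(importlib.abc.Loader):
    # No load_module(): the sys.modules bookkeeping and attribute setup
    # that used to live there is now done by the import machinery.
    def exec_module(self, module):
        spec = module.__spec__
        code = compile(SOURCES[spec.name], spec.origin, "exec")
        exec(code, module.__dict__)


class DictFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, name, path=None, target=None):
        if name not in SOURCES:
            return None
        return importlib.util.spec_from_loader(
            name, DictLoader(), origin="<dict>")


sys.meta_path.insert(0, DictFinder())
demo = importlib.import_module("demo_settings")
```

The loader inherits the default ``create_module()`` from the ABC, which returns ``None`` and so leaves module creation to the import system; that is the common case the text describes.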
ModuleSpec Users
================
``ModuleSpec`` objects has 3 distinct target audiences: Python itself,
import hooks, and normal Python users.
"has" -> "have"
Post by Eric Snow
Python will use specs in the import machinery, in interpreter startup,
and in various standard library modules. Some modules are
import-oriented, like pkgutil, and others are not, like pickle and
pydoc. In all cases, the full ``ModuleSpec`` API will get used.
Import hooks (finders and loaders) will make use of the spec in specific
ways. First of all, finders may use the spec factory functions in
importlib.util to create spec objects. They may also directly adjust
the spec attributes after the spec is created. Secondly, the finder may
bind additional information to the spec (in finder_extras) for the
loader to consume during module creation/execution. Finally, loaders
will make use of the attributes on a spec when creating and/or executing
a module.
Python users will be able to inspect a module's ``__spec__`` to get
import-related information about the object. Generally, Python
applications and interactive users will not be using the ``ModuleSpec``
factory functions nor any of the instance methods.
How Loading Will Work
=====================
This is an outline of what happens in ModuleSpec's loading
functionality::

    def load(spec):
        if not hasattr(spec.loader, 'exec_module'):
            module = spec.loader.load_module(spec.name)
            spec.init_module_attrs(module)
            return sys.modules[spec.name]

        module = None
        if hasattr(spec.loader, 'create_module'):
            module = spec.loader.create_module(spec)
        if module is None:
            module = ModuleType(spec.name)
        spec.init_module_attrs(module)

        spec._initializing = True
        sys.modules[spec.name] = module
        try:
            spec.loader.exec_module(module)
        except Exception:
            del sys.modules[spec.name]
        finally:
            spec._initializing = False
        return sys.modules[spec.name]

These steps are exactly what ``Loader.load_module()`` is already
expected to do. Loaders will thus be simplified since they will only
need to implement exec_module().
Two things. One, it's not exactly what loaders do as that _initializing is
done by import itself. Any specific reason you added it here?

Two, you forgot to re-raise the exception in the except clause.
Post by Eric Snow
Note that we must return the module from sys.modules. During loading
the module may have replaced itself in sys.modules. Since we don't have
a post-import hook API to accommodate the use case, we have to deal with
it. However, in the replacement case we do not worry about setting the
import-related module attributes on the object. The module writer is on
their own if they are doing this.
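A small sketch of the replacement case, walking through the same steps by hand with the APIs that later shipped (``importlib.util.spec_from_loader()`` and ``importlib.util.module_from_spec()``). ``SelfReplacingLoader`` and the module name are hypothetical, and a ``SimpleNamespace`` stands in for whatever object a real module might install in its place.

```python
import importlib.abc
import importlib.util
import sys
import types


class SelfReplacingLoader(importlib.abc.Loader):
    """Hypothetical loader whose module swaps itself out while executing."""

    def exec_module(self, module):
        replacement = types.SimpleNamespace(kind="replacement")
        sys.modules[module.__name__] = replacement


name = "self_replacing_demo"
spec = importlib.util.spec_from_loader(name, SelfReplacingLoader())
module = importlib.util.module_from_spec(spec)  # attributes set here

sys.modules[name] = module
spec.loader.exec_module(module)
result = sys.modules[name]  # what the import system hands back
```

``result`` is the replacement object rather than ``module``, and nothing has set ``__spec__`` or the other import-related attributes on it, which is the "on their own" caveat above.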
ModuleSpec
==========
Attributes
----------
Each of the following names is an attribute on ModuleSpec objects. A
value of ``None`` indicates "not set". This contrasts with module
objects where the attribute simply doesn't exist. Most of the
attributes correspond to the import-related attributes of modules. Here
is the mapping. The reverse of this mapping is used by
ModuleSpec.init_module_attrs().
========================== ==============
On ModuleSpec              On Modules
========================== ==============
name                       __name__
loader                     __loader__
package                    __package__
origin                     __file__*
cached                     __cached__*,**
submodule_search_locations __path__**
loading_info               \-
has_location               \-
========================== ==============
\* Set only if has_location is true.
\*\* Set only if the spec attribute is not None.
"Set on the module if the spec"
Post by Eric Snow
While package and has_location are read-only properties, the remaining
attributes can be replaced after the module spec is created and even
after import is complete. This allows for unusual cases where directly
modifying the spec is the best option. However, typical use should not
involve changing the state of a module's spec.
**origin**
origin is a string for the place from which the module originates.
Aside from the informational value, it is also used in module_repr().
The module attribute ``__file__`` has a similar but more restricted
meaning. Not all modules have it set (e.g. built-in modules). However,
``origin`` is applicable to all modules. For built-in modules it would
be set to "built-in".
**has_location**
Some modules can be loaded by reference to a location, e.g. a filesystem
path or a URL or something of the sort. Having the location lets you
load the module, but in theory you could load that module under various
names.
In contrast, non-located modules can't be loaded in this fashion, e.g.
builtin modules and modules dynamically created in code. For these, the
name is the only way to access them, so they have an "origin" but not a
"location".
This attribute reflects whether or not the module is locatable. If it
is, origin must be set to the module's location and ``__file__`` will be
set on the module. Not all locatable modules will be cachable, but most
will.
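To make the origin/has_location distinction concrete, the following sketch compares a built-in module with a source module, using the attributes as they ended up in Python 3.4+. The ``describe`` helper is hypothetical, and the specific values are what CPython reports; other implementations may differ.

```python
import json
import sys


def describe(module):
    """Collect the spec fields discussed above for one module."""
    spec = module.__spec__
    return {
        "name": spec.name,
        "origin": spec.origin,
        "has_location": spec.has_location,
        "has_file": hasattr(module, "__file__"),
    }


builtin_info = describe(sys)   # not locatable: origin only
located_info = describe(json)  # locatable: origin doubles as __file__
```

For ``sys`` the origin is informational only, so no ``__file__`` is set; for ``json`` the origin is the location and is mirrored into ``__file__``.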
The corresponding module attribute name, ``__file__``, is somewhat
inaccurate and potentially confusion,
"confusion" -> "confusing"
Post by Eric Snow
so we will use a more explicit
combination of origin and has_location to represent the same
information. Having a separate filename is unnecessary since we have
origin.
Quote 'origin' so you don't read it like it should have been written "we
have an origin".
Post by Eric Snow
**submodule_search_locations**
The list of location strings, typically directory paths, in which to
search for submodules. If the module is a package this will be set to
a list (even an empty one). Otherwise it is ``None``.
The corresponding module attribute's name, ``__path__``, is relatively
ambiguous. Instead of mirroring it, we use a more explicit name that
makes the purpose clear.
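For illustration, the package test implied here is a one-liner. The ``is_package`` helper below is hypothetical (the PEP deliberately does not add such an attribute); the example uses the standard library's ``json`` package and its ``json.decoder`` submodule.

```python
import json
import json.decoder


def is_package(spec):
    """A spec describes a package iff it has submodule search locations."""
    return spec.submodule_search_locations is not None


package_spec = json.__spec__
module_spec = json.decoder.__spec__
```

On a package the list mirrors ``__path__``; on a plain module the attribute is ``None`` and no ``__path__`` is set.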
**loading_info**
A finder may set loading_info to any value to provide additional
data for the loader to use during loading. A value of None is the
default and indicates that there is no additional data. Otherwise it
can be set to any object, such as a dict, list, or
types.SimpleNamespace, containing the relevant extra information.
For example, zipimporter could use it to pass the zip archive name
to the loader directly, rather than needing to derive it from origin
or create a custom loader for each find operation.
loading_info is meant for use by the finder and corresponding loader.
It is not guaranteed to be a stable resource for any other use.
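A minimal sketch of a finder handing extra data to its loader. Note the assumptions: the attribute is spelled ``loader_state`` here because that is the name the thread settles on further down and the one ``importlib.machinery.ModuleSpec`` shipped with; ``ArchiveLoader``, the module name and the archive name are invented for the example.

```python
import importlib.abc
import importlib.util
from importlib.machinery import ModuleSpec


class ArchiveLoader(importlib.abc.Loader):
    """Hypothetical loader that needs to know which archive to read."""

    def exec_module(self, module):
        state = module.__spec__.loader_state
        # A real loader would open state["archive"] and run the code found
        # there; the sketch only records what the finder passed along.
        module.ARCHIVE = state["archive"]


# What a finder might build: one shared loader class, per-module state.
spec = ModuleSpec(
    "archived_demo",
    ArchiveLoader(),
    origin="demo.zip/archived_demo.py",
    loader_state={"archive": "demo.zip"},
)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
```

Keeping the per-module data on the spec means the loader itself can stay stateless, which is the zipimporter scenario described above.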
Omitted Attributes and Methods
------------------------------
The following ModuleSpec methods are not part of the public API since
it is easy to use them incorrectly and only the import system really
needs them (i.e. they would be an attractive nuisance).
* create() - provide a new module to use for loading.
* exec(module) - execute the spec into a module namespace.
* load() - prepare a module and execute it in a protected way.
* reload(module) - re-execute a module in a protected way.
If they are not part of the public API they should have a leading
underscore.
Post by Eric Snow
There is no PathModuleSpec subclass of ModuleSpec that separates out
has_location, cached, and submodule_search_locations. While that might
make the separation cleaner, module objects don't have that distinction.
ModuleSpec will support both cases equally well.
While is_package would be a simple additional attribute (aliasing
``self.submodule_search_locations is not None``), it perpetuates the
artificial (and mostly erroneous) distinction between modules and
packages.
Conceivably, a ModuleSpec.load() method could optionally take a list of
modules with which to interact instead of sys.modules. That
capability is left out of this PEP, but may be pursued separately at
some other time, including relative to PEP 406 (import engine).
Likewise load() could be leveraged to implement multi-version
imports. While interesting, doing so is outside the scope of this
proposal.
* Add ModuleSpec.submodules (RO-property) - returns possible submodules
relative to the spec.
* Add ModuleSpec.loaded (RO-property) - the module in sys.modules, if
any.
* Add ModuleSpec.data - a descriptor that wraps the data API of the
spec's loader.
* Also see [3].
Backward Compatibility
----------------------
ModuleSpec doesn't have any. This would be a different story if
Finder.find_module() were to return a module spec instead of loader.
In that case, specs would have to act like the loader that would have
been returned instead. Doing so would be relatively simple, but is an
unnecessary complication. It was part of earlier versions of this PEP.
Subclassing
-----------
Subclasses of ModuleSpec are allowed, but should not be necessary.
Simply setting loading_info or adding functionality to a custom
finder or loader will likely be a better fit and should be tried first.
However, as long as a subclass still fulfills the requirements of the
import system, objects of that type are completely fine as the return
value of Finder.find_spec().
[SNIP]
Post by Eric Snow
Open Issues
==============
\* The impact of this change on pkgutil (and setuptools) needs looking
into. It has some generic function-based extensions to PEP 302. These
may break if importlib starts wrapping loaders without the tools'
knowledge.
\* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc,
inspect.
For instance, pickle should be updated in the __main__ case to look at
``module.__spec__.name``.
\* Impact on some kinds of lazy loading modules. See [3].
\* Find a better name than loading_info? Perhaps loading_data,
loader_state, or loader_info.
loader_state or loader_data get my vote.
Post by Eric Snow
\* Change loader.create_module() to prepare_module()?
-0 from me.

-Brett
Eric Snow
2013-09-19 05:06:27 UTC
Post by Brett Cannon
Looking good! Comments inline.
Thanks for the feedback, Brett. I fixed everything you pointed out. Also,
I'm going with loader_state. :)

-eric
Nick Coghlan
2013-09-18 16:08:57 UTC
Post by Eric Snow
Hi all,
I finally got some time to update the PEP. I've simplified a few things,
most notably by making the 4 ModuleSpec methods (create, exec, load, reload)
"private".
Also notable is that the new loader method is still create_module() and
there is still no flag for is_reload on either of the loader methods. I'm
still not clear on what the flag buys us and on why anything we'd do in a
prepare_module() we couldn't do in exec_module(). I'm trying to keep this
simple. :)
The point is to give the invoker of the loader a chance to muck about
with the module state before actually executing the module. For
example, runpy and the updated extension loader API could use this to
support execution of compiled Cython modules with -m.

Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
Eric Snow
2013-09-18 22:14:13 UTC
Post by Eric Snow
Post by Eric Snow
Hi all,
I finally got some time to update the PEP. I've simplified a few things,
most notably by making the 4 ModuleSpec methods (create, exec, load,
reload) "private".
Also notable is that the new loader method is still create_module() and
there is still no flag for is_reload on either of the loader methods. I'm
still not clear on what the flag buys us and on why anything we'd do in a
prepare_module() we couldn't do in exec_module(). I'm trying to keep this
simple. :)
The point is to give the invoker of the loader a chance to muck about
with the module state before actually executing the module. For
example, runpy and the updated extension loader API could use this to
support execution of compiled Cython modules with -m.
That makes sense. A loader.create_module() method (not called during
reload) gives you that. I'm all for that. I'm just not clear on why it
needs to be more than that.

My understanding of the proposed prepare_module() is it would always be
called right before exec_module(), whether it be load or reload (there
would be no create_module()). Then in that case, can't loaders just roll
their prepare_module() implementation into the beginning of exec_module()
(even call spec.init_module_attrs() directly)? What's the advantage to
splitting that out in the Loader API? I know I'm missing something here.
(Maybe I shouldn't try to work on the PEP so late at night!)

...after further consideration...

I expect it's so that during reload the loader can indicate "don't reload
in-place, load into this module instead!" So the module passed in to
exec_module() would end up being different from the existing module in
sys.modules. However, can't exec_module() simply exec into the module that
it would have returned from prepare_module() and then directly stick it
into sys.modules?

...after further consideration...

Okay, maybe I'm seeing it. Would it be something like the following?

#-- start prepare_module() example --

class ModuleSpec:
    ...
    def _load(self):
        # This is basically the same as the PEP currently defines it.
        # (I prefer create_module for this.)
        module = self.loader.prepare_module(self)
        if module is None:
            module = ModuleType(self.name)
        self.init_module_attrs(module)
        # skipping some boilerplate
        sys.modules[self.name] = module
        self.loader.exec_module(module)
        return sys.modules[self.name]

    def _reload(self, module):
        # This is where it gets different.
        prepared = self.loader.prepare_module(self, module)
        if prepared is not None:
            self.init_module_attrs(prepared)
            module = prepared
        sys.modules[self.name] = module
        self.loader.exec_module(module)
        return sys.modules[self.name]

class SomeLoader:

    def prepare_module(self, spec, module=None):
        if self.never_ever_been_loaded_before_not_even_in_subinterpreters(
                spec.name):
            self.initialize_stuff(spec)
        return MyCustomModule(spec.name)

    def exec_module(self, module):
        # Do exec stuff here.
        ...

#-- end prepare_module() example --

(Note that _load() and _reload() could share more code than they do, but
regardless...)

Contrast that with what the PEP specifies currently.

#-- start current PEP example --

class ModuleSpec:
    ...
    def _create(self):
        module = self.loader.create_module(self)
        if module is None:
            module = ModuleType(self.name)
        self.init_module_attrs(module)
        return module

    def _load(self):
        module = self._create()
        # skipping boilerplate
        self.loader.exec_module(module)
        return sys.modules[self.name]

    def _reload(self, module):
        self.loader.exec_module(module)
        return sys.modules[self.name]

class SomeLoader:

    def create_module(self, spec):
        if self.never_ever_been_loaded_before_not_even_in_subinterpreters(
                spec.name):
            self.initialize_stuff(spec)
        return MyCustomModule(spec.name)

    def exec_module(self, module):
        spec = module.__spec__
        if not self.never_ever_been_loaded_before_not_even_in_subinterpreters(
                spec.name):
            module = spec._create()
            # or module = self.create_module(spec);
            #    spec.init_module_attrs(module)
            sys.modules[module.__name__] = module
        # Do exec stuff here.
        ...

#-- end current PEP example --

The way I see it, in the latter example the ModuleSpec is easier to follow,
without making exec_module() that much more complicated.

Regardless, at this point I'm seeing prepare_module() as a formal API for
"use *this* module instead of what you would use by default." While
create_module() provides that for the loading case, prepare_module() also
provides it explicitly for the reloading case. Consequently, in the reload
case prepare_module() does eliminate the boilerplate that exec_module()
otherwise must accommodate. That's probably the biggest reason to go there.

I wonder if we could instead wrap that bit in a ModuleSpec helper method
that loaders can call in exec_module():

    def _new_module_for_reload(self):
        module = self._create()
        sys.modules[self.name] = module

FWIW, I think create_module() is still an appropriate (and better) name
regardless of where it's used.

At this point I still would rather stick with what the PEP currently
specifies, but I'm going to ruminate on the reload case, e.g. re-read your
message about reload strategies as well as your response to my message
about module lifecycles. I think I have more context to fit them into
the big picture here.

Not to leave anything out, is there any reason we shouldn't punt right now
on the whole reload mechanics issue and bundle it with the PEP on improving
extension modules? I'd like to wrap up ModuleSpec and see about the .ref
PEP that started all this. Plus I think this PEP is hitting the limit of a
mentally bite-size proposal. I've been lamentably busy of late so I'm
worried about expanding the PEP. However, I'm open to more discussion on
supporting other reload strategies, particularly if you think this PEP
should not move forward without having settled the issue.

BTW, thanks for diving into the extension module questions (you and
Stefan). Those discussions have helped improve this PEP. :)

-eric
Nick Coghlan
2013-09-19 01:01:19 UTC
Yeah, I preferred the "prepare_module" name when I thought the extension
loader returned the cached module object directly. It doesn't, it returns a
copy, so "create_module" is fine.

Also agreed on deferring reload behavioural improvements to a separate PEP.
As noted in my other email, I think an advisory "this isn't going to work"
API is a better idea now, since even pure Python modules don't always
support reloading.

And +1 to "loader_state" as the helper attribute name.

Cheers,
Nick.
Eric Snow
2013-09-19 05:13:02 UTC
Post by Nick Coghlan
Yeah, I preferred the "prepare_module" name when I thought the extension
loader returned the cached module object directly. It doesn't, it returns a
copy, so "create_module" is fine.
Cool.
Post by Nick Coghlan
Also agreed on deferring reload behavioural improvements to a separate PEP.
Sounds good.
Post by Nick Coghlan
As noted in my other email, I think an advisory "this isn't going to work"
API is a better idea now, since even pure Python modules don't always
support reloading.
What do you mean by "advisory" API?
Post by Nick Coghlan
And +1 to "loader_state" as the helper attribute name.
That's settled then! Thanks for the feedback.

-eric
Eric Snow
2013-09-19 05:38:23 UTC
I'm thinking that it may be useful to have ModuleSpec inherit from str and
set it to the module name. Then the spec could be passed directly to those
loader APIs that take the module name. Thoughts?

-eric
Nick Coghlan
2013-09-19 08:17:27 UTC
Post by Eric Snow
I'm thinking that it may be useful to have ModuleSpec inherit from str and
set it to the module name. Then the spec could be passed directly to those
loader APIs that take the module name. Thoughts?
I think I'd need to see the code you think it would simplify before
saying yes (since my default answer is "No, inheriting from str is an
unnecessary hack").

Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
Eric Snow
2013-09-19 16:42:01 UTC
Post by Nick Coghlan
Post by Eric Snow
I'm thinking that it may be useful to have ModuleSpec inherit from str and
set it to the module name. Then the spec could be passed directly to those
loader APIs that take the module name. Thoughts?
I think I'd need to see the code you think it would simplify before
saying yes (since my default answer is "No, inheriting from str is an
unnecessary hack").
I would generally be -1 on some hacks.
Especially, str subclasses can leak to unsuspected places and create
weird issues (I remember an issue with BeautifulSoup, IIRC, which
returned str subclasses which kept whole HTML trees alive: by passing
those str objects around you would create yourself a huge memory leak).
Agreed. I've done it in other projects for backward-compatibility reasons,
but that doesn't really apply here. That's interesting about memory leaks.
I would not have expected that.

-eric
Antoine Pitrou
2013-09-19 08:21:22 UTC
Le Wed, 18 Sep 2013 23:38:23 -0600,
Post by Eric Snow
I'm thinking that it may be useful to have ModuleSpec inherit from
str and set it to the module name. Then the spec could be passed
directly to those loader APIs that take the module name. Thoughts?
I would generally be -1 on some hacks.
Especially, str subclasses can leak to unsuspected places and create
weird issues (I remember an issue with BeautifulSoup, IIRC, which
returned str subclasses which kept whole HTML trees alive: by passing
those str objects around you would create yourself a huge memory leak).

Regards

Antoine.
Antoine Pitrou
2013-09-19 10:22:09 UTC
Hi,
Post by Eric Snow
origin - a string for the location from which the module is loaded,
e.g. "builtin" for built-in modules and the filename for modules
loaded from source.
Filename or filepath? What if the module is stored in e.g. a ZIP file?
Post by Eric Snow
submodule_search_locations - list of strings for where to find
submodules, if a package (None otherwise).
Why isn't is_package exposed as an attribute too?
Post by Eric Snow
cached (property) - a string for where the compiled module will be
stored
"where" is a filesystem location?
(absolute? relative to the origin?)
Post by Eric Snow
has_location (RO-property) - the module's origin refers to a location.
filesystem location? What about ZIP files?
Post by Eric Snow
spec_from_file_location(name, location, *, loader=None,
submodule_search_locations=None) - factory for file-based module specs
What does it mean? Is it able to make "intelligent" decisions depending
on e.g. whether the module is an extension module or a pure Python
module?
Post by Eric Snow
from_loader(name, loader, *, origin=None, is_package=None) - factory
based on information provided by loaders.
That description is rather unhelpful.
Post by Eric Snow
importlib.find_spec(name, path=None) will return the spec for a module.
Is the module supposed to be already loaded or not? How is the spec
"found"?

Regards

Antoine.
Paul Moore
2013-09-19 11:28:24 UTC
Post by Antoine Pitrou
Post by Eric Snow
origin - a string for the location from which the module is loaded,
e.g. "builtin" for built-in modules and the filename for modules
loaded from source.
Filename or filepath? What if the module is stored in e.g. a ZIP file?
I haven't been following this thread closely, but this is a good
point. There is a general issue that for modules loaded off sys.path,
the module "location" needs to be somehow jammed into a string form
(the absolute path for files, zip/file/path.zip/location/in/zipfile
for zipfiles, but potentially anything at all for custom loaders) and
for things loaded off sys.meta_path there's no need for any concept of
path at all (that's how builtins, frozen modules et al work).

It's worth being clear on both how this origin should be constructed
in the general case (for the guidance of people implementing
non-standard importers) and what users of the data can assume when
using the data (can they split the value on os.sep or '/', for
example, or is it in effect an opaque token).

Some of the blame for all this being vague at the moment is down to me
- when we were writing PEP 302, I wasn't brave enough to claim that
path entries could be opaque token values, but I didn't want to insist
that all importers had to follow a specific structure. So I ignored
the issue and we just ended up with normal paths, and zipfiles which
treat the zipfile as a pseudo-directory. And no examples of corner
cases to keep people honest. My apologies for that...

Paul
Eric Snow
2013-09-19 19:30:18 UTC
Post by Paul Moore
Post by Antoine Pitrou
Post by Eric Snow
origin - a string for the location from which the module is loaded,
e.g. "builtin" for built-in modules and the filename for modules
loaded from source.
Filename or filepath? What if the module is stored in e.g. a ZIP file?
I haven't been following this thread closely, but this is a good
point. There is a general issue that for modules loaded off sys.path,
the module "location" needs to be somehow jammed into a string form
(the absolute path for files, zip/file/path.zip/location/in/zipfile
for zipfiles, but potentially anything at all for custom loaders) and
for things loaded off sys.meta_path there's no need for any concept of
path at all (that's how builtins, frozen modules et al work).
It's worth being clear on both how this origin should be constructed
in the general case (for the guidance of people implementing
non-standard importers) and what users of the data can assume when
using the data (can they split the value on os.sep or '/', for
example, or is it in effect an opaque token).
Actually, "origin" is meant to be a pretty unconstrained string. It only has
2 explicit purposes: use in spec.module_repr() and as the value of __file__
when spec.has_location is true. The loader may use "origin" however it
likes. Presumably the finder would populate origin in whatever format the
loader needs (if the loader even needs "origin"), but that's between the
finder and loader. If the loader needs even more info, the finder can just
stick it into the spec's loader_state attribute.
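A minimal sketch of that split between "origin" and "loader_state", assuming the ``ModuleSpec`` constructor as it later shipped in ``importlib.machinery`` (Python 3.4+; ``module_from_spec`` needs 3.5+), which may differ from the draft under discussion. ``ArchiveLoader``, ``make_spec``, the ``archive://`` origin format, and the ``offset`` payload are all hypothetical names made up for illustration.

```python
from importlib.machinery import ModuleSpec
from importlib.util import module_from_spec


class ArchiveLoader:
    """Hypothetical loader; exists only so the spec has a loader."""

    def create_module(self, spec):
        return None  # ask the import machinery for a default module object

    def exec_module(self, module):
        # The loader reads whatever the finder stashed in loader_state.
        state = module.__spec__.loader_state
        module.payload_offset = state["offset"]


def make_spec(name):
    # origin: free-form, human-readable string chosen by the finder.
    # loader_state: extra data that only the loader needs to understand.
    return ModuleSpec(
        name,
        ArchiveLoader(),
        origin="archive://bundle/" + name,  # hypothetical format
        loader_state={"offset": 128},       # hypothetical payload
    )


spec = make_spec("demo")
module = module_from_spec(spec)
spec.loader.exec_module(module)
```

Because the spec is built directly and nothing marks it as locatable, ``has_location`` stays false and no ``__file__`` is set on the module; "origin" is then only informational.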
Post by Paul Moore
Some of the blame for all this being vague at the moment is down to me
- when we were writing PEP 302, I wasn't brave enough to claim that
path entries could be opaque token values, but I didn't want to insist
that all importers had to follow a specific structure. So I ignored
the issue and we just ended up with normal paths, and zipfiles which
treat the zipfile as a pseudo-directory. And no examples of corner
cases to keep people honest. My apologies for that...
As Nick pointed out, the "loader_state" attribute of ModuleSpec objects is
meant to be the container for any extra data the loader needs.

-eric
Brett Cannon
2013-09-19 14:11:52 UTC
Permalink
Post by Antoine Pitrou
Hi,
Post by Eric Snow
origin - a string for the location from which the module is loaded,
e.g. "builtin" for built-in modules and the filename for modules
loaded from source.
Filename or filepath? What if the module is stored in e.g. a ZIP file?
I think this would be what __file__ would be set to for zipfiles, so for
zip files it would be e.g. /some/file.zip/path/to/module.py
Post by Antoine Pitrou
Post by Eric Snow
submodule_search_locations - list of strings for where to find
submodules, if a package (None otherwise).
Why isn't is_package exposed as an attribute too?
It's redundant. The test for whether something is a package is literally
``submodule_search_locations is not None``. It just isn't complicated
enough to warrant another attribute. Plus being a package isn't as
important a concept per se as having a search path.
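As a rough illustration of that test, assuming the ``find_spec`` function as it later shipped in ``importlib.util`` (Python 3.4+); the helper name ``is_package`` is made up for this sketch:

```python
import importlib.util


def is_package(spec):
    # A spec describes a package exactly when it carries a submodule
    # search path, even an empty one.
    return spec.submodule_search_locations is not None


json_spec = importlib.util.find_spec("json")  # stdlib package
sys_spec = importlib.util.find_spec("sys")    # built-in module

print(is_package(json_spec), is_package(sys_spec))
```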
Post by Antoine Pitrou
Post by Eric Snow
cached (property) - a string for where the compiled module will be
stored
"where" is a filesystem location?
(absolute? relative to the origin?)
It's what http://docs.python.org/3/library/imp.html#imp.cache_from_source would
return.
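For reference, a small sketch of that call. It uses ``importlib.util.cache_from_source``, the Python 3.4+ equivalent of the ``imp`` function linked above (``imp`` itself was later removed). The source path is a hypothetical example, and the exact result depends on the interpreter's cache tag.

```python
import importlib.util

source_path = "/some/dir/module.py"  # hypothetical source location

# Maps a source path to the path of its byte-compiled file. The file
# does not need to exist; this is pure path computation.
cached_path = importlib.util.cache_from_source(source_path)
print(cached_path)
```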
Post by Antoine Pitrou
Post by Eric Snow
has_location (RO-property) - the module's origin refers to a location.
filesystem location? What about ZIP files?
It's a flag to basically say that origin contains what __file__ should be.

-Brett
Post by Antoine Pitrou
Post by Eric Snow
spec_from_file_location(name, location, *, loader=None,
submodule_search_locations=None) - factory for file-based module specs
What does it mean? Is it able to make "intelligent" decisions depending
on e.g. whether the module is an extension module or a pure Python
module?
Post by Eric Snow
from_loader(name, loader, *, origin=None, is_package=None) - factory
based on information provided by loaders.
That description is rather unhelpful.
Post by Eric Snow
importlib.find_spec(name, path=None) will return the spec for a module.
Is the module supposed to be already loaded or not? How is the spec
"found"?
Regards
Antoine.
_______________________________________________
Import-SIG mailing list
Import-SIG at python.org
https://mail.python.org/mailman/listinfo/import-sig
Nick Coghlan
2013-09-19 14:30:29 UTC
Permalink
Post by Brett Cannon
Post by Antoine Pitrou
Hi,
Post by Eric Snow
origin - a string for the location from which the module is loaded,
e.g. "builtin" for built-in modules and the filename for modules
loaded from source.
Filename or filepath? What if the module is stored in e.g. a ZIP file?
I think this would be what __file__ would be set to for zipfiles, so for
zip files it would be e.g. /some/file.zip/path/to/module.py
Post by Brett Cannon
Post by Antoine Pitrou
Post by Eric Snow
submodule_search_locations - list of strings for where to find
submodules, if a package (None otherwise).
Why isn't is_package exposed as an attribute too?
It's redundant. The test for whether something is a package is literally
``submodule_search_locations is not None``. It just isn't complicated
enough to warrant another attribute. Plus being a package isn't as
important a concept per se as having a search path.
Post by Brett Cannon
Post by Antoine Pitrou
Post by Eric Snow
cached (property) - a string for where the compiled module will be
stored
"where" is a filesystem location?
(absolute? relative to the origin?)
It's what http://docs.python.org/3/library/imp.html#imp.cache_from_source would
return.
Post by Brett Cannon
Post by Antoine Pitrou
Post by Eric Snow
has_location (RO-property) - the module's origin refers to a location.
filesystem location? What about ZIP files?
It's a flag to basically say that origin contains what __file__ should be.
Thus indicating that get_data() on the loader can be used sensibly. Perhaps
we could just make setting __file__ conditional on the loader defining
get_data, rather than having it be a spec attribute?

I also suggest that we adopt the convention of using angle brackets in
non-location origins. So names like "<builtin>" and "<frozen>".

To respond to something Paul said, our completely opaque token is
"loader_state", origin is still intended to be a human readable string.

Cheers,
Nick.
Antoine Pitrou
2013-09-19 14:48:54 UTC
Permalink
On Fri, 20 Sep 2013 00:30:29 +1000,
Post by Nick Coghlan
I also suggest that we adopt the convention of using angle brackets in
non-location origins. So names like "<builtin>" and "<frozen>".
+1. They stand out much better.

Regards

Antoine.
Eric Snow
2013-09-19 19:42:57 UTC
Permalink
Post by Nick Coghlan
On Thu, Sep 19, 2013 at 6:22 AM, Antoine Pitrou <solipsis at pitrou.net>
Post by Antoine Pitrou
Post by Eric Snow
has_location (RO-property) - the module's origin refers to a location.
filesystem location? What about ZIP files?
It's a flag to basically say that origin contains what __file__ should
be.
Thus indicating that get_data() on the loader can be used sensibly.
Perhaps we could just make setting __file__ conditional on the loader
defining get_data, rather than having it be a spec attribute?
I'd still like to keep an explicit "has_location" as a clear, informational
declaration. How about we always set it to True if loader.get_data exists?
I think you proposed this before and it got lost in the shuffle.
Post by Nick Coghlan
I also suggest that we adopt the convention of using angle brackets in
non-location origins. So names like "<builtin>" and "<frozen>".
Well, I'm already having module_repr() do that. I've thought of this
before, but decided it was better to have the separate "has_location"
attribute. Then there is no ambiguity between the origin of a
non-locatable module and a locatable one that happens to have bookend angle
brackets. I will make sure the spec is explicit about the angle brackets
in module_repr().

-eric
Eric Snow
2013-09-19 19:52:05 UTC
Permalink
Post by Eric Snow
Post by Nick Coghlan
I also suggest that we adopt the convention of using angle brackets in
non-location origins. So names like "<builtin>" and "<frozen>".
Well, I'm already having module_repr() do that.
Actually no I wasn't. The current repr for the sys module is "<module
'sys' (built-in)>". Adding the angle brackets would change that. It's not
a big deal to me either way. I actually kind of like the idea of using
angle brackets (by convention) on a non-locatable origin. It just changes
existing reprs and can be ambiguous in the (unlikely) situation I
described. I'm leaning toward not doing the angle brackets, but I can be
swayed. :)
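For comparison, this is how the existing repr and the spec attributes look for ``sys`` on a CPython release that includes ``importlib.util.find_spec`` (3.4+). It is a sketch of released behaviour, not of the draft API, and the output comment applies to CPython only.

```python
import importlib.util
import sys

spec = importlib.util.find_spec("sys")

print(repr(sys))          # on CPython: <module 'sys' (built-in)>
print(spec.origin)        # non-location origin, no angle brackets
print(spec.has_location)  # False, so origin is not copied to __file__
```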

-eric
Eric Snow
2013-09-19 19:12:06 UTC
Permalink
Hi Antoine,

Thanks for the feedback. Comments inline.
Post by Antoine Pitrou
Post by Eric Snow
origin - a string for the location from which the module is loaded,
e.g. "builtin" for built-in modules and the filename for modules
loaded from source.
Filename or filepath? What if the module is stored in e.g. a ZIP file?
As Brett mentioned, it would be whatever is currently bound to __file__.
Keep in mind that the two things I listed are just examples of the sorts
of things that would go into "origin". The point of "origin" is actually
explained in more detail further on in the PEP.
Post by Antoine Pitrou
Post by Eric Snow
submodule_search_locations - list of strings for where to find
submodules, if a package (None otherwise).
Why isn't is_package exposed as an attribute too?
We had some discussion on this on a previous revision of the PEP.
Initially I had is_package as a property of ModuleSpec. However, we came
to the agreement that whether or not the spec represents a package is not
very important once you have the spec. This contrasts with the is_package
parameter to ModuleSpec which is useful since it represents a set of things
that should be effected on the new spec object. Ultimately Nick put it
best when he said that we need to de-emphasize the superficial
package/module distinction, not enshrine it as an attribute. The PEP
actually addresses the question of is_package in the "Omitted Attributes
and Methods" section.
Post by Antoine Pitrou
Post by Eric Snow
cached (property) - a string for where the compiled module will be
stored
"where" is a filesystem location?
(absolute? relative to the origin?)
As Brett noted (and the module attribute table further on indicates), this
is the same as the __cached__ attribute of modules.
Post by Antoine Pitrou
Post by Eric Snow
has_location (RO-property) - the module's origin refers to a location.
filesystem location? What about ZIP files?
Also as Brett indicated, this is a flag that indicates that "origin" should
be copied into __file__ on corresponding module objects. However, the
summary is pretty unclear. I'll fix that.
Post by Antoine Pitrou
Post by Eric Snow
spec_from_file_location(name, location, *, loader=None,
submodule_search_locations=None) - factory for file-based module specs
What does it mean? Is it able to make "intelligent" decisions depending
on e.g. whether the module is an extension module or a pure Python
module?
It does make some intelligent decisions. Otherwise a finder would just
call ModuleSpec directly. (All three factory functions are there for the
convenience of finders.) I'll add some explanation on what those decisions
entail and also clarify the summary.
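A sketch of the kind of decisions involved, using ``importlib.util.spec_from_file_location`` as it later shipped in Python 3.4+. The released behaviour shown here is an assumption about what the draft intends, and the ``demo_root`` paths are hypothetical and need not exist on disk.

```python
import os
from importlib.machinery import SourceFileLoader
from importlib.util import spec_from_file_location

# Hypothetical locations; nothing is read from disk here.
mod_path = os.path.abspath(os.path.join("demo_root", "mod.py"))
pkg_init = os.path.abspath(os.path.join("demo_root", "pkg", "__init__.py"))

# No loader is passed, so the factory picks one from the file suffix.
mod_spec = spec_from_file_location("mod", mod_path)

# For an __init__.py the chosen loader reports a package, so the factory
# also fills in submodule_search_locations with the containing directory.
pkg_spec = spec_from_file_location("pkg", pkg_init)

print(type(mod_spec.loader).__name__, mod_spec.has_location)
print(pkg_spec.submodule_search_locations)
```

Calling ``ModuleSpec`` directly would leave the loader choice, ``has_location``, and the search path for the finder to work out on its own.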
Post by Antoine Pitrou
Post by Eric Snow
from_loader(name, loader, *, origin=None, is_package=None) - factory
based on information provided by loaders.
That description is rather unhelpful.
Likewise I'll add more explanation for this as well as improve the summary.
Post by Antoine Pitrou
Post by Eric Snow
importlib.find_spec(name, path=None) will return the spec for a module.
Is the module supposed to be already loaded or not? How is the spec
"found"?
This function is the replacement for importlib.find_loader(). Instead of
returning a loader it returns a spec. Otherwise it's the same. I'll make
the summary more clear.
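A short usage sketch, with the caveat that in released Python (3.4+) the function ended up in ``importlib.util`` rather than at the top level of ``importlib``, and its second parameter is ``package``, not ``path``. The missing module name is made up.

```python
import importlib.util

# Finding a spec does not execute the module's code.
spec = importlib.util.find_spec("json")
print(spec.name, spec.loader is not None)

# A top-level module that cannot be found yields None, not an exception.
missing = importlib.util.find_spec("no_such_module_for_demo")
print(missing)
```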

-eric