[Import-SIG] PEP 420: Implicit Namespace Packages

Post by Eric V. Smith
This reflects (I hope!) the discussions at PyCon. My plan is to produce
an implementation based on the importlib code, and then flesh out pieces
of the PEP.
In particular, I want to make sure the PEP addresses the various
objections that were raised, especially by Nick.

Obviously thanks for writing this up, Eric! I have the following comments
(some of which I would fix myself but I lack hg repo access ATM) ...

In Terminology, can you put the terms when you define them in quotes, e.g. 'The
term "distribution" refers to ...'?

"setuptools provides a similar function pkg_resources.declare_namespace"
should either have a "named" added in there or a comma.

"As vendors might chose(sic) to".

You should mention that this will do away with the ImportWarning of
discovering a directory lacking an __init__.py file.

As for the effects on path hooks, there are none. =) It's actually the
finders that they return which need to change. Either finders need to be
updated to return something other than None to signal they have a directory
which works for the name (maybe the string for what should go into
__path__?) or another method on finders which is called if
finder.find_module() returns None (like finder.find_namespace() which
returns the directory name or None). Then you need to update
importlib._bootstrap.PathFinder to handle one of the two approaches to
create the module and set it with some __loader__ (which really doesn't
need to do much more than construct a module with the proper attributes
since there is nothing to execute) like
importlib.machinery.NamespaceLoader(name, *paths). Using a specific class
in import already has precedent thanks to NullImporter.

If you want performance then you go with the returning of a string by
finder.find_module() since the finder can keep track of finding a directory
w/o an __init__.py when it tries looking for a module. Import can do a
hasattr check on non-None return values to decide if it got back a loader
or a path for a namespace. If you don't like the meaning of the return
value depending on whether it is None or has a specific attribute, then you
would want the new method, at the (potential) cost of another stat call. Or
maybe someone can think of some other approach.
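
A minimal sketch of the first option (a finder returning a string for a possible namespace portion) might look like the following. The names `find_in_path` and `StubFinder` are hypothetical, and `NamespaceLoader` here only follows the `NamespaceLoader(name, *paths)` shape suggested above; none of this is the actual importlib code.

```python
import types


class NamespaceLoader:
    """Hypothetical loader: builds an empty module whose __path__ lists the portions."""

    def __init__(self, name, *paths):
        self.name = name
        self.paths = list(paths)

    def load_module(self, fullname):
        # Nothing to execute: just construct a module with the proper attributes.
        module = types.ModuleType(fullname)
        module.__path__ = list(self.paths)
        module.__loader__ = self
        return module


def find_in_path(name, finders):
    """Ask each finder in turn; return a loader, or None if nothing matched."""
    portions = []
    for finder in finders:
        result = finder.find_module(name)
        if result is None:
            continue
        if hasattr(result, "load_module"):
            # A real loader wins immediately over any recorded portions.
            return result
        # Otherwise assume the result is a string naming a namespace portion.
        portions.append(result)
    if portions:
        return NamespaceLoader(name, *portions)
    return None


class StubFinder:
    """Illustrative finder that returns a canned value from find_module()."""

    def __init__(self, result):
        self.result = result

    def find_module(self, name):
        return self.result


finders = [StubFinder("/a/foo"), StubFinder(None), StubFinder("/b/foo")]
module = find_in_path("foo", finders).load_module("foo")
print(module.__path__)
```

The `hasattr` check is what keeps unmodified finders working: they keep returning a loader or None and simply never contribute a portion.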

Eric V. Smith

2012-04-19 22:10:40 UTC

Post by Brett Cannon
In Terminology, can you put the terms when you define them in quotes,
e.g. 'The term "distribution" refers to ...'?
"setuptools provides a similar function pkg_resources.declare_namespace"
should either have a "named" added in there or a comma.
"As vendors might chose(sic) to".

I've made these grammar changes. I'll update based on the rest of your
comments tomorrow.

Thanks!

Eric.

Eric V. Smith

2012-04-19 22:59:56 UTC

Post by Brett Cannon
You should mention that this will do away with the ImportWarning of
discovering a directory lacking an __init__.py file.

Done.

Post by Brett Cannon
As for the effects on path hooks, there are none. =) It's actually the
finders that they return which need to change. Either finders need to be
updated to return something other than None to signal they have a
directory which works for the name (maybe the string for what should go
into __path__?) or another method on finders which is called if
finder.find_module() returns None (like finder.find_namespace() which
returns the directory name or None). Then you need to update
importlib._bootstrap.PathFinder to handle one of the two approaches to
create the module and set it with some __loader__ (which really doesn't
need to do much more than construct a module with the proper attributes
since there is nothing to execute) like
importlib.machinery.NamespaceLoader(name, *paths). Using a specific
class in import already has precedent thanks to NullImporter.
If you want performance then you go with the returning of a string by
finder.find_module() since the finder can keep track of finding a
directory w/o an __init__.py when it tries looking for a module. Import
can do a hasattr check on non-None return values to decide if it got
back a loader or a path for a namespace. If you don't like the meaning
of the return value depending on whether it is None or has a specific
attribute, then you would want the new method, at the (potential) cost of
another stat call. Or maybe someone can think of some other approach.

Changing finder.find_module() to return a string seems the best thing to do.

Barry and I (and hopefully Jason Coombs) are going to try and get
together and sprint on this in the near future. I might wait to update
the PEP on the effect on finders until we're done.

Thanks again.

Eric.

Eric Snow

2012-04-19 21:21:32 UTC

Nice work, Eric. PEP 420 is quite clear. I appreciate that not many
words are spent on contrasting it with PEP 402. I agree that the PEP
needs to be clear on Nick's concerns, one way or the other (especially
as they relate to PEP 395). I don't recall any satisfactory
resolution on that. Looking forward to hearing more on this.

-eric

p.s. how often do the PEPs get rebuilt? I saw the PEP as it came
across the commits list, but it's not showing up on the site.

Nick Coghlan

2012-04-20 03:56:50 UTC

Post by Eric V. Smith
This reflects (I hope!) the discussions at PyCon. My plan is to produce
an implementation based on the importlib code, and then flesh out pieces
of the PEP.

This paragraph in the "Rationale" section is confusing:

"Namespace packages need to be installed in one of two ways: either
all portions of a namespace will be combined into a single directory
(and therefore a single entry in sys.path), or each portion will be
installed in its own directory (and each portion will have a distinct
sys.path entry)."

I would combine this with the following paragraph to make a single
cohesive explanation of the problem that needs to be solved:

"Namespace packages are designed to support being split across
multiple directories (and hence found via multiple sys.path entries).
In this configuration, it doesn't matter if multiple portions all
provide an __init__.py file, so long as each portion correctly
initialises the namespace package. However, Linux distribution vendors
(amongst others) prefer to combine the separate portions and install
them all into the *same* filesystem directory. This creates a
potential for conflict, as the portions are now attempting to provide
the *same* file on the target system - something that is not allowed
by many package managers. Allowing implicit namespace packages means
that the requirement to provide an __init__.py file can be dropped
completely, and affected portions can be installed into a common
directory or split across multiple directories as distributions see
fit."
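
The split-directory configuration described above can be sketched as follows, assuming a Python version that implements implicit namespace packages (3.3 or later). The package name `nsdemo_split` and the directory names are made up for illustration.

```python
import importlib
import os
import sys
import tempfile

# Two separate sys.path entries, each holding one portion of "nsdemo_split".
root = tempfile.mkdtemp()
for entry, module_name in (("dist_a", "bar"), ("dist_b", "baz")):
    portion = os.path.join(root, entry, "nsdemo_split")
    os.makedirs(portion)  # note: no __init__.py is written
    with open(os.path.join(portion, module_name + ".py"), "w") as f:
        f.write("NAME = %r\n" % module_name)
    sys.path.append(os.path.join(root, entry))

importlib.invalidate_caches()

import nsdemo_split.bar
import nsdemo_split.baz

# Both portions are reachable through the one namespace package.
print(nsdemo_split.bar.NAME, nsdemo_split.baz.NAME)
print(len(list(nsdemo_split.__path__)))
```

Because neither portion ships an `__init__.py`, the same two portions could equally be copied into one shared directory without any file conflict.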

Post by Eric V. Smith
In particular, I want to make sure the PEP addresses the various
objections that were raised, especially by Nick.

Yep. I'm happy with the conclusions we reached in the previous
discussion, but PEP 420 does need to describe them. Here's the gist of
it for the four points listed:

- for the first point, "practicality beats purity" pretty much carries
the day as far as the Zen goes

- for the second point, the minor backwards compatibility risks are
acknowledged and accepted. My initial objection was based on a
misunderstanding of the consensus proposal. Once it was clarified that
the only "incompatibility" is that an import may now succeed where it
previously would have failed, I was no longer concerned. In contrast
to PEP 402, PEP 420 deliberately chooses to preserve consistent
behaviour of "import foo; import foo.bar" and "import foo.bar; import
foo", seeing that as being more important than preventing the
successful import of an empty (or otherwise non-package) subdirectory
of a sys.path location. This does mean some try/except import blocks
may need to be updated to check the imported module or package for an
expected attribute or subpackage rather than just checking that the
import works, but has the major advantage of making the revised import
model much cleaner and easier to understand.

- the final two points will be addressed by having PEP 395 propose the
production of better *error messages* rather than introducing any
additional magic to the initialisation of sys.path[0] (see
http://mail.python.org/pipermail/import-sig/2012-March/000442.html).
The "are we in a package subdirectory?" heuristic mentioned in that
message will be based on this suggestion from Eric Snow:
http://mail.python.org/pipermail/import-sig/2012-March/000438.html
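
The try/except adjustment mentioned in the second point could look roughly like this. It is only a sketch: `import_optional` and the directory name `fastjson_demo` are hypothetical, and it assumes a Python version with implicit namespace packages.

```python
import importlib
import os
import sys
import tempfile


def import_optional(name, required_attr):
    """Return the module only if it imports and has the expected attribute."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # A stray directory may now import as an empty namespace package,
    # so a successful import alone no longer proves the module is usable.
    if not hasattr(module, required_attr):
        return None
    return module


# Hypothetical stray directory that merely shares the wanted module's name.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "fastjson_demo"))
sys.path.append(root)
importlib.invalidate_caches()

print(import_optional("fastjson_demo", "loads"))
print(import_optional("json", "loads") is not None)
```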

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

Eric V. Smith

2012-04-20 10:21:31 UTC

Post by Eric V. Smith
This reflects (I hope!) the discussions at PyCon. My plan is to produce
an implementation based on the importlib code, and then flesh out pieces
of the PEP.

"Namespace packages need to be installed in one of two ways: either
all portions of a namespace will be combined into a single directory
(and therefore a single entry in sys.path), or each portion will be
installed in its own directory (and each portion will have a distinct
sys.path entry)."
I would combine this with the following paragraph to make a single
cohesive explanation of the problem that needs to be solved:
"Namespace packages are designed to support being split across
multiple directories (and hence found via multiple sys.path entries).
In this configuration, it doesn't matter if multiple portions all
provide an __init__.py file, so long as each portion correctly
initialises the namespace package. However, Linux distribution vendors
(amongst others) prefer to combine the separate portions and install
them all into the *same* filesystem directory. This creates a
potential for conflict, as the portions are now attempting to provide
the *same* file on the target system - something that is not allowed
by many package managers. Allowing implicit namespace packages means
that the requirement to provide an __init__.py file can be dropped
completely, and affected portions can be installed into a common
directory or split across multiple directories as distributions see
fit."

That does read much better. Thanks.

Post by Eric V. Smith
In particular, I want to make sure the PEP addresses the various
objections that were raised, especially by Nick.

Yep. I'm happy with the conclusions we reached in the previous
discussion, but PEP 420 does need to describe them. Here's the gist of

<discussion deleted>

I'll add these after I go back and re-read the original thread.

Eric.

Nick Coghlan

2012-04-20 04:04:59 UTC

One other thing I noticed: "There is no mechanism to recompute the
__path__ once a namespace package has been created."

This isn't really true - pkgutil.extend_path() can still be used to
update a namespace package path. Perhaps change it to:

"There is no mechanism to automatically recompute the __path__ if
sys.path is altered after a namespace package has already been
created. However, existing namespace utilities (like
pkgutil.extend_path()) can be used to update them explicitly if
desired."
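
As an illustration of the explicit update, the sketch below recomputes a plain-list package path with `pkgutil.extend_path()` after `sys.path` has changed. The package name `nsdemo_extend` and the directory layout are assumptions, and the behaviour of released Python versions may differ from the draft wording quoted here.

```python
import importlib
import os
import pkgutil
import sys
import tempfile

root = tempfile.mkdtemp()
first = os.path.join(root, "first")
second = os.path.join(root, "second")
for entry in (first, second):
    os.makedirs(os.path.join(entry, "nsdemo_extend"))  # portions without __init__.py

sys.path.append(first)
importlib.invalidate_caches()

# Snapshot of the package path while only the first entry is on sys.path.
path = [os.path.join(first, "nsdemo_extend")]

# sys.path is altered afterwards ...
sys.path.append(second)
importlib.invalidate_caches()

# ... and the plain-list path is recomputed explicitly.
path = pkgutil.extend_path(path, "nsdemo_extend")
print(path)
```

Note that `extend_path()` only extends real lists, which is why the path is held as a plain list here.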

Also, as a general matter of readability, adding double backticks
around attributes, functions and filenames to get them displayed in
monospace can be quite helpful.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

Eric V. Smith

2012-04-20 10:15:30 UTC

One other thing I noticed: "There is no mechanism to recompute the
__path__ once a namespace package has been created."
This isn't really true - pkgutil.extend_path() can still be used to
update a namespace package path. Perhaps change it to:
"There is no mechanism to automatically recompute the __path__ if
sys.path is altered after a namespace package has already been
created. However, existing namespace utilities (like
pkgutil.extend_path()) can be used to update them explicitly if
desired."

Done. Thanks!

Post by Nick Coghlan
Also, as a general matter of readability, adding double backticks
around attributes, functions and filenames to get them displayed in
monospace can be quite helpful.

Agreed. That's a work in progress.

Eric.

PJ Eby

2012-04-21 17:06:23 UTC

Post by Nick Coghlan
"There is no mechanism to automatically recompute the __path__ if
sys.path is altered after a namespace package has already been
created. However, existing namespace utilities (like
pkgutil.extend_path()) can be used to update them explicitly if
desired."

Btw, was there ever an explicit rejection of the "namespace package
__path__ is an auto-updating iterable instead of a list" approach, or did
it even come up in the consensus discussion?

Eric Snow

2012-04-21 18:49:16 UTC

Btw, was there ever an explicit rejection of the "namespace package __path__
is an auto-updating iterable instead of a list" approach, or did it even
come up in the consensus discussion?

Pretty sure it didn't come up, but it sounds like Eric Smith has
considered it. PEP 420 currently has this to say:

"There is no mechanism to automatically recompute the __path__ if
sys.path is altered after a namespace package has already been
created. However, existing namespace utilities (like
pkgutil.extend_path) can be used to update them explicitly if
desired." [1]

-eric

[1] http://www.python.org/dev/peps/pep-0420/#id9

"Martin v. Löwis"

2012-04-21 20:50:23 UTC

Post by Eric Snow

Btw, was there ever an explicit rejection of the "namespace package __path__
is an auto-updating iterable instead of a list" approach, or did it even
come up in the consensus discussion?

Pretty sure it didn't come up, but it sounds like Eric Smith has
considered it.

There was a sort of bulk-rejection of "fancy features", IIRC. It wasn't
clear to us which of the many additional features of PEP 402 was really
important to you, so the consensus was to start with the minimum, and
extend as actual use cases become apparent.

For some of the PEP 402 features, we identified "concurrent versions"
as the use case (i.e. pkg_resources.require). The consensus was that
this use case can be ignored.

Eric is right that the specific question of a dynamic __path__ was not
discussed.

Regards,
Martin

Eric V. Smith

2012-04-22 01:06:45 UTC

Post by Eric Snow

Btw, was there ever an explicit rejection of the "namespace package __path__
is an auto-updating iterable instead of a list" approach, or did it even
come up in the consensus discussion?

What's the use case for this?

Post by Eric Snow
Pretty sure it didn't come up, but it sounds like Eric Smith has
considered it.

I don't recall this issue specifically, but I agree with Martin that
we're trying to start with a minimal feature set.

Post by "Martin v. Löwis"
Eric is right that the specific question of a dynamic __path__ was not
discussed.

Furthermore, given how __path__ is built, by one-at-a-time remembering
the path entries that have a foo directory but no foo/__init__.py, I'm
not sure how you'd turn that into some auto-updating iterable.

Eric.

Nick Coghlan

2012-04-22 05:26:06 UTC

Post by Eric V. Smith
Furthermore, given how __path__ is built, by one-at-a-time remembering
the path entries that have a foo directory but no foo/__init__.py, I'm
not sure how you'd turn that into some auto-updating iterable.

You just have to remember all your namespace packages somewhere and
then use a list subclass that triggers a rescan whenever the contents
change.

Personally, I'm happier with the basic behaviour being that
dynamically updating sys.path while the program is running can be a
bit hit-or-miss in terms of what recognises the change.

Longer term, rather than introducing magical side effects for sys.path
manipulation, I think the better solution is to expose a more
object-oriented API for manipulating the import system state that
takes care of maintaining the state invariants, invalidating caches
when appropriate and triggering updates to package __path__ entries.
Hence, PEP 406 (currently deferred) and its import engine API.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

PJ Eby

2012-04-22 21:10:27 UTC

You just have to remember all your namespace packages somewhere and
then use a list subclass that triggers a rescan whenever the contents
change.

Not necessary if you set __path__ to an iterable that caches a tuple of its
parent package __path__ (or sys.path), and compares that against the
current value before iterating. If it's changed, you walk the parent and
rescan, otherwise iterate over your cached value. I posted a sketch to
Python-Dev the first time 402 discussion happened there.

The consequences of making namespace package __path__ iterable are less
problematic, I believe, than changing the type of sys.path: almost no code
manipulates __path__ as anything but an iterable, and code that does is
broken for namespace packages anyway, because accessing specific offsets
won't give you what you think you're looking for. So you get noisy
breakage instead of quiet breakage in such cases (as would happen with
using lists for __path__).

If for some reason you want to explicitly change a namespace package's
__path__, you could just reset __path__ to list(__path__), and proceed from
there -- which is the recommended idiom for using extend_path, anyway.

Post by Nick Coghlan
Personally, I'm happier with the basic behaviour being that
dynamically updating sys.path while the program is running can be a
bit hit-or-miss in terms of what recognises the change.

pkg_resources supports dynamic updating today, so the idea here was to make
it possible to do away with that. (It only supports updating if it's the
one doing the sys.path manipulation, however.)

I think there should be *some* blessed API(s) to force the updating,
though, even if it's not automatic or dynamic. extend_path() really isn't
the right tool for the job.

The main argument in favor of automatic updating is that it more closely
matches naive expectations of users coming from other languages. (Although
to be honest I'm not 100% certain that those other languages actually do
change their lookups that dynamically.)

Anyway, the sketch (using PEP 402's importer protocol; not updated for 420)
was something like:

from importlib._bootstrap import _ImportLockContext  # private importlib helper
from pkgutil import get_importer


class VirtualPath:
    __slots__ = ('__name__', '_parent', '_last_seen', '_path')

    def __init__(self, name, parent_path):
        self.__name__ = name
        self._parent = parent_path
        self._path = self._last_seen = ()

    def _fail(self, *args, **kw):
        raise TypeError(self.__name__+" is a virtual package")

    # Reject indexing and in-place mutation: the path is computed, not stored.
    __getitem__ = __setitem__ = __delitem__ = append = extend = insert = _fail

    def _calculate(self):
        with _ImportLockContext():
            parent = tuple(self._parent)
            if parent != self._last_seen:
                # The parent path changed since the last scan: rebuild the cache.
                items = []
                name = self.__name__
                for entry in parent:
                    importer = get_importer(entry)
                    # get_subpath() is the PEP 402 importer protocol method.
                    if hasattr(importer, 'get_subpath'):
                        item = importer.get_subpath(name)
                        if item is not None:
                            items.append(item)
                self._last_seen = parent
                self._path = tuple(items)
            return self._path

    def __iter__(self):
        return iter(self._calculate())

    def __len__(self):
        return len(self._calculate())

    def __repr__(self):
        return "VirtualPath" + repr((self.__name__, self._parent))

    def __contains__(self, item):
        return item in self._calculate()

Using these objects in place of lists for __path__ objects would then do
the trick.

(And of course, you'd want to change "Virtual" to "Namespace" throughout, I
suppose. ;-) )
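
For readers who want to try the caching idea without the PEP 402 importer protocol, here is a simplified stand-in. It is an assumption-laden sketch: `DirectoryNamespacePath` is a made-up name, it uses a plain `os.path.isdir()` check in place of `get_subpath()`, and it omits the import lock.

```python
import os
import tempfile


class DirectoryNamespacePath:
    """Read-only, iterable package path that rescans when its parent path changes."""

    def __init__(self, name, parent_path):
        self._name = name
        self._parent = parent_path  # e.g. sys.path or a parent package's __path__
        self._last_seen = None
        self._path = ()

    def _calculate(self):
        parent = tuple(self._parent)
        if parent != self._last_seen:
            # The parent path changed since the last look: rescan every entry.
            self._path = tuple(
                os.path.join(entry, self._name)
                for entry in parent
                if os.path.isdir(os.path.join(entry, self._name))
            )
            self._last_seen = parent
        return self._path

    def __iter__(self):
        return iter(self._calculate())

    def __len__(self):
        return len(self._calculate())

    def __contains__(self, item):
        return item in self._calculate()


root = tempfile.mkdtemp()
entries = []
for label in ("a", "b"):
    entry = os.path.join(root, label)
    os.makedirs(os.path.join(entry, "foo"))
    entries.append(entry)

search_path = [entries[0]]  # stand-in for sys.path
foo_path = DirectoryNamespacePath("foo", search_path)
print(len(foo_path))
search_path.append(entries[1])  # later mutation of the parent path
print(len(foo_path))
```

The rescan is triggered lazily on the next iteration, so nothing has to watch the parent list for changes.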

Michael Foord

2012-04-22 23:51:59 UTC

So a namespace package is a directory (tree) on sys.path. For a standard
Python install how will these be installed?

If you need to install "foo.bar" and "foo.baz" will distutils and packaging
do the right thing? (And what specifically is the right thing for Python's
own package management tools - merging the namespace packages or keeping
them separate somehow?)

setuptools creates a new directory for each installed package and adds this
directory to sys.path using pth files. It's a bit of a hack, but it allows
namespace packages to co-exist.

Michael

Post by Eric V. Smith
Eric.
_______________________________________________
Import-SIG mailing list
Import-SIG at python.org
http://mail.python.org/mailman/listinfo/import-sig

--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html

PJ Eby

2012-04-23 00:29:58 UTC

Post by Michael Foord

I don't know about 3.x distutils or packaging specifically, but I do know
that 2.x distutils will install packages compatibly with this approach if
you list the child packages but NOT the namespace package in your setup.py.
So if one distribution lists 'foo.bar' and the other lists 'foo.baz', but
*neither* lists 'foo', then the subpackages will be installed without a
foo/__init__.py, and that will make it work.

If packaging and 3.x distutils inherit this behavior from the 2.x
distutils, then that would be the simplest way to do it. (And if you
install to different directories, the parts will get merged.)

Nick Coghlan

2012-04-23 01:08:55 UTC

Post by Michael Foord

<lib_dir>/site-packages/foo/bar
<lib_dir>/site-packages/foo/baz

The whole point of dropping the __init__.py file requirement is that
merging the namespace portions becomes trivial, so you don't need to
worry about sys.path hackery in the normal case - you can just install
them into a common directory (adding it on install if it doesn't exist
yet, removing it on uninstall if the only remaining contents are the
__pycache__ subdirectory).

However, for zipfile distribution, or running from a source checkout,
you could instead provide them as <app_dir>/foo/bar and
<app_dir>/foo/baz and they would still be accessible as "foo.bar" and
"foo.baz". Basically, PEP 420 should mean that managing subpackages
and submodules becomes a *lot* more like managing top level packages
and modules.

Agreed the packaging implications should be specified clearly in the
PEP, though (especially the install/uninstall behaviour when namespace
portions get merged into a single directory).
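
The uninstall rule for a merged directory could be sketched like this. The helper name `remove_namespace_dir_if_unused` is hypothetical and the temporary directory merely stands in for `<lib_dir>/site-packages`; a real installer would need more care (permissions, concurrent installs, and so on).

```python
import os
import shutil
import tempfile


def remove_namespace_dir_if_unused(namespace_dir):
    """Delete a shared namespace directory once only __pycache__ is left in it."""
    remaining = set(os.listdir(namespace_dir)) - {"__pycache__"}
    if remaining:
        return False  # another portion still lives here
    shutil.rmtree(namespace_dir)
    return True


site_packages = tempfile.mkdtemp()  # stand-in for <lib_dir>/site-packages
foo = os.path.join(site_packages, "foo")
os.makedirs(os.path.join(foo, "bar"))
os.makedirs(os.path.join(foo, "baz"))
os.makedirs(os.path.join(foo, "__pycache__"))

shutil.rmtree(os.path.join(foo, "bar"))  # uninstall the "foo.bar" portion
kept = not remove_namespace_dir_if_unused(foo)
shutil.rmtree(os.path.join(foo, "baz"))  # uninstall the "foo.baz" portion
removed = remove_namespace_dir_if_unused(foo)
print(kept, removed, os.path.exists(foo))
```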

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

Carl Meyer

2012-04-28 00:52:53 UTC

One clarity issue in the PEP:

"If the scan along the parent path completes without finding a module or
package, then a namespace package is created."

This seems incomplete, and should say something like:

"If the scan along the parent path completes without finding a module or
package, *but at least one directory was recorded,* then a namespace
package is created."

The current wording seems to imply that any failed import would always
cause the creation of a namespace package with an empty __path__, which
I presume is not the intent.
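
To make the corrected rule concrete, here is a filesystem-only sketch of the scan. `scan_parent_path` is a hypothetical helper, it checks only for `.py` files where the real machinery considers other module suffixes too, and it is not the importlib implementation.

```python
import os
import tempfile


def scan_parent_path(name, parent_path):
    """Classify `name` as 'package', 'module', 'namespace', or None (not found)."""
    recorded = []
    for entry in parent_path:
        candidate = os.path.join(entry, name)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            return "package", [candidate]  # regular package ends the scan
        if os.path.isfile(candidate + ".py"):
            return "module", [candidate + ".py"]  # plain module ends the scan
        if os.path.isdir(candidate):
            recorded.append(candidate)  # possible portion: keep scanning
    if recorded:
        return "namespace", recorded
    return None  # nothing recorded: the import fails


root = tempfile.mkdtemp()
entries = [os.path.join(root, label) for label in ("a", "b", "c")]
os.makedirs(os.path.join(entries[0], "foo"))
os.makedirs(os.path.join(entries[1], "foo"))
os.makedirs(entries[2])

print(scan_parent_path("foo", entries)[0])
print(scan_parent_path("missing", entries))
```

The final `if recorded:` test is exactly the "at least one directory was recorded" condition; without it every failed import would produce a namespace package with an empty `__path__`.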

Carl

Eric V. Smith

2012-04-28 10:27:19 UTC

Post by Carl Meyer

"If the scan along the parent path completes without finding a module or
package, then a namespace package is created."
"If the scan along the parent path completes without finding a module or
package, *but at least one directory was recorded,* then a namespace
package is created."
The current wording seems to imply that any failed import would always
cause the creation of a namespace package with an empty __path__, which
I presume is not the intent.

Completely agree. I changed "but" to "and", but otherwise used it
as-is. It's checked in.

Thanks!

Eric.

Eric V. Smith

2012-05-01 22:00:28 UTC

I'm working on finishing up the PEP 420 work. I think the PEP itself is
complete. If you have any comments, please send them to me or this list.

The implementation at features/pep-420 has been merged with the recent
importlib changes to the 3.3 branch. I've implemented support in the
import machinery itself, as well as modified the filesystem finder
(FileFinder) and the zipimport finder.

About the only question I have is: Is everyone okay with the changes to
the finders, described in the PEP? Basically they now return a string in
addition to a loader or None. If they return a string, then the string
represents the path of a possible namespace package portion. The change
is backward compatible: unmodified finders will just be unable to
participate in a namespace package.

Barry Warsaw, Jason Coombs, and I are sprinting this Thursday. We'll
focus on adding tests, and maybe documentation if we have time. If
anyone has any concerns I'd like to hear them before then so that we can
work on addressing them.

The changes themselves are very small. I think the diff is a total of
maybe 40 lines of code. Yury Selivanov had mentioned backporting to 3.2
(which I assume would be an unsupported-by-python-dev effort). I
actually don't think it would be all that complicated.

Eric.

Brett Cannon

2012-05-02 02:22:03 UTC

Post by Eric V. Smith
I'm working on finishing up the PEP 420 work. I think the PEP itself is
complete. If you have any comments, please send them to me or this list.
The implementation at features/pep-420 has been merged with the recent
importlib changes to the 3.3 branch. I've implemented support in the
import machinery itself, as well as modified the filesystem finder
(FileFinder) and the zipimport finder.
About the only question I have is: Is everyone okay with the changes to
the finders, described in the PEP? Basically they now return a string in
addition to a loader or None. If they return a string, then the string
represents the path of a possible namespace package portion. The change
is backward compatible: unmodified finders will just be unable to
participate in a namespace package.

I'm obviously okay with the change. =) So this email is just a +1 in support
of this work and a thanks for coding it up and seeing this through!

-Brett

Post by Eric V. Smith
Barry Warsaw, Jason Coombs, and I are sprinting this Thursday. We'll
focus on adding tests, and maybe documentation if we have time. If
anyone has any concerns I'd like to hear them before then so that we can
work on addressing them.
The changes themselves are very small. I think the diff is a total of
maybe 40 lines of code. Yury Selivanov had mentioned backporting to 3.2
(which I assume would be an unsupported-by-python-dev effort). I
actually don't think it would be all that complicated.

Ignoring that the classes he would need to access are technically private,
backporting should be no more than a subclass and an extra stat call by
FileFinder if None is returned.

-Brett


"Martin v. Löwis"

2012-05-02 07:17:00 UTC

Post by Eric V. Smith
About the only question I have is: Is everyone okay with the changes to
the finders, described in the PEP?

It looks good to me. It's a somewhat surprising change, but I can see no
flaw in it.

Regards,
Martin

Eric V. Smith

2012-05-02 10:23:17 UTC

Post by Eric V. Smith
About the only question I have is: Is everyone okay with the changes to
the finders, described in the PEP?

It looks good to me. It's a somewhat surprising change, but I can see no
flaw in it.

Surprising in that any change to find_module is needed, or surprising
that it now returns one of {None, loader, str}?

If it's the latter: yeah, it's a little strange. But find_module knows
something that the caller needs to be told. It seemed easiest to add
another possible return type. Any other suggestions?

Eric.

PJ Eby

2012-05-02 17:06:27 UTC

Post by Eric V. Smith
If it's the latter: yeah, it's a little strange. But find_module knows
something that the caller needs to be told. It seemed easiest to add
another possible return type. Any other suggestions?

It seems quite elegant to me.

I do see one point of concern with the spec, though. At one point it says
that finders must return a path without a trailing separator, but at
another it says the package __file__ will contain a separator.

This strikes me as inconsistent, and also incompatible with
non-filesystem-based finder implementations. The import machinery *must
not* assume that import path strings are filenames, so it is wrong for the
import machinery to add a path separator that the finder did not include.

IOW, I don't think the spec can assume or guarantee anything about the
strings returned by finders: it MUST treat them as opaque strings. If this
means that there can't be any meaningful __file__ for a namespace package,
I think we will have to live with that.

The only alternative I see is to delegate the string manipulation back to
the finders, or to change the return value from a string to a (file, path)
tuple, wherein 'file' is the value to be used as __file__, and 'path' is
the value to be used in __path__.

Eric V. Smith

2012-05-02 17:24:21 UTC

Post by PJ Eby
I do see one point of concern with the spec, though. At one point it
says that finders must return a path without a trailing separator, but
at another it says the package __file__ will contain a separator.
This strikes me as inconsistent, and also incompatible with
non-filesystem-based finder implementations. The import machinery *must
not* assume that import path strings are filenames, so it is wrong for
the import machinery to add a path separator that the finder did not
include.
IOW, I don't think the spec can assume or guarantee anything about the
strings returned by finders: it MUST treat them as opaque strings. If
this means that there can't be any meaningful __file__ for a namespace
package, I think we will have to live with that.

I've come to the same conclusion myself. I actually had a draft of the
PEP that removed the word "directory", at which point it becomes obvious
that you're adding a path separator to something that might not be a
path name.

Post by PJ Eby
The only alternative I see is to delegate the string manipulation back
to the finders, or to change the return value from a string to a (file,
path) tuple, wherein 'file' is the value to be used as __file__, and
'path' is the value to be used in __path__.

I don't see the value of __file__ at all in the case of namespace
packages. If it's just a hint that it's a namespace package, I think it
would be better to set __file__ to None. That would noisily break some
code that isn't likely to work anyway.

Eric.

Brett Cannon

2012-05-02 17:53:44 UTC

Problem is that None for __file__ would be a unique use here. Frozen
modules, for instance, typically say "<frozen>" for __file__. Now part of
the reason (I suspect) this is done is that this was the only way to tell
how the module was created, but with __loader__ now on all modules this is
redundant. So perhaps this fake value for __file__ is just outdated and not
worth perpetuating?

I vote for using __file__ as None as suggested and having people infer how
the module was created from __loader__.

PJ Eby

2012-05-02 21:05:51 UTC

Either None or a missing attribute is fine with me. (One advantage to the
missing attribute is that it fails at the exact point where the inspecting
code needs fixing, whereas the None will get passed on to some other code
before the error manifests itself.)
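To make the difference concrete, here is a minimal sketch (not taken from the PEP or from importlib) of where each choice fails. The helper name locate_data and the module names are made up for illustration; the modules are built by hand with types.ModuleType rather than imported.

```python
import os.path
import types


def locate_data(module, relative_ref):
    """Typical code that assumes every module has a usable __file__."""
    base = module.__file__                    # fails here if the attribute is missing
    return os.path.join(base, relative_ref)   # fails here if the attribute is None


# Hypothetical namespace package with no __file__ attribute at all.
missing = types.ModuleType("ns_missing")

# Hypothetical namespace package with __file__ explicitly set to None.
as_none = types.ModuleType("ns_none")
as_none.__file__ = None

# Record which exception type each variant produces.
outcomes = {}
for mod in (missing, as_none):
    try:
        locate_data(mod, "data.txt")
    except (AttributeError, TypeError) as exc:
        outcomes[mod.__name__] = type(exc).__name__
```

The missing attribute fails on the attribute access itself, while None survives until os.path.join() receives it, which is one call further away from the code that made the assumption.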

By the way, I finished reading the rest of the PEP, and with regard to
auto-updating paths, I want to mention that it wasn't me who originally
brought up issues about auto-update, it was someone on Python-Dev, and the
use cases were discussed there. Also, I would challenge the argument about
it being a major block to implementation, since the implementation is
straightforward (and TONS simpler than setuptools' approach to the problem).

More to the point, though, supporting auto-updates *later* is not really an
option, since we'd be changing the rules on people, and invalidating
whatever workarounds people come up with for manually updating the path.
If namespace package __path__ objects start out as some other type than
lists, then there's no change to trip anyone up later.

I guess my point is that if we're not going to do auto-updates from the
start, it's kind of going to rule it out in the long term as well, so if
that's the intention it should be explicitly addressed. I don't want to
see it just get ruled out by default due to not being done now, and then
not being able to be done later.

That's why my earlier question was about whether it had been discussed or
not -- there was previous discussion on it in the 402 context, and it was
left as an open issue pending BDFL comment on the basic idea of 402. Since
then, the basic idea of treating init-less directories as namespace
packages has been blessed, so now it's time to get the auto-updates
yea-or-nay question ruled on as well.

The implementation is pretty trivial; see PEP 402 version of it here:

http://mail.python.org/pipermail/import-sig/2012-April/000473.html

...and the PEP 420 version is even simpler, since instead of looking for a
'get_subpath()' method on the finders, it should just call find_module()
and check for a string return.
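As a rough illustration of the "call find_module() and check for a string return" idea, the sketch below scans a list of finders and collects every string result as one portion of the namespace path. ToyFinder, ToyLoader and scan are hypothetical names invented for this example; this is not the importlib code, and it assumes the draft protocol discussed in this thread (loader, string, or None).

```python
class ToyLoader:
    """Stand-in for a real loader; only its identity matters here."""


class ToyFinder:
    """Hypothetical finder for one path entry.

    `modules` are names it can load; `portions` maps a name to the opaque
    string it would contribute to a namespace package's __path__.
    """

    def __init__(self, modules=(), portions=None):
        self._modules = set(modules)
        self._portions = dict(portions or {})
        self.loader = ToyLoader()

    def find_module(self, fullname):
        if fullname in self._modules:
            return self.loader                 # regular module or package
        return self._portions.get(fullname)    # a string, or None


def scan(fullname, finders):
    """Return (loader, portions) for `fullname`.

    A real loader wins immediately; otherwise every string returned by a
    finder is collected, untouched, as one portion of the namespace path.
    """
    portions = []
    for finder in finders:
        result = finder.find_module(fullname)
        if result is None:
            continue
        if isinstance(result, str):
            portions.append(result)
        else:
            return result, []
    return None, portions
```

Under these assumptions, recomputing a namespace package's path after the parent path changes amounts to calling scan() again with the finders for the new parent path. The strings are never modified, so a non-filesystem finder can return whatever it likes.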

Eric V. Smith

2012-05-03 00:58:27 UTC

Post by Eric V. Smith
I don't see the value of __file__ at all in the case of namespace
packages. If it's just a hint that it's a namespace package, I think it
would be better to set __file__ to None. That would noisily break some
code that isn't likely to work anyway.
Either None or a missing attribute is fine with me. (One advantage to
the missing attribute is that it fails at the exact point where the
inspecting code needs fixing, whereas the None will get passed on to
some other code before the error manifests itself.)

I can go either way on this, but would lean toward __file__ not being
set. Brett: what's your opinion?

Post by Eric V. Smith
By the way, I finished reading the rest of the PEP, and with regard to
auto-updating paths, I want to mention that it wasn't me who originally
brought up issues about auto-update, it was someone on Python-Dev, and
the use cases were discussed there. Also, I would challenge the
argument about it being a major block to implementation, since the
implementation is straightforward (and TONS simpler than setuptools'
approach to the problem).
I guess my point is that if we're not going to do auto-updates from the
start, it's kind of going to rule it out in the long term as well, so if
that's the intention it should be explicitly addressed. I don't want to
see it just get ruled out by default due to not being done now, and then
not being able to be done later.

Okay. I'll take a look at it tomorrow to see what's involved and if
we're backing ourselves into a corner or not.

Thanks.

Eric.

Barry Warsaw

2012-05-03 01:23:55 UTC

I can go either way on this, but would lean toward __file__ not being
set. Brett: what's your opinion?

I rather like __file__ not existing, although I haven't really thought about
the practical effects. PJE makes a good argument though.

-Barry

PJ Eby

2012-05-03 04:37:25 UTC

Post by Eric V. Smith
I don't see the value of __file__ at all in the case of namespace
packages. If it's just a hint that it's a namespace package, I think it
would be better to set __file__ to None. That would noisily break some
code that isn't likely to work anyway.
Either None or a missing attribute is fine with me. (One advantage to
the missing attribute is that it fails at the exact point where the
inspecting code needs fixing, whereas the None will get passed on to
some other code before the error manifests itself.)

I can go either way on this, but would lean toward __file__ not being
set. Brett: what's your opinion?

I rather like __file__ not existing, although I haven't really thought about
the practical effects. PJE makes a good argument though.

There's a counterargument that I realized later: PEP 302 currently requires
that __file__ be set, AND that it be a string. "The privilege of not
having a __file__ attribute at all is reserved for built-in modules."

(Of course, that argues equally against __file__ being None, so I'm not
sure it helps any to point that out!)

Still, code that expects to do something with a package's __file__ is
*going* to break somehow with a namespace package, so it's probably better
for it to break sooner rather than later.

Nick Coghlan

2012-05-03 06:23:34 UTC

Post by PJ Eby
Still, code that expects to do something with a package's __file__ is
*going* to break somehow with a namespace package, so it's probably better
for it to break sooner rather than later.

My own preference is for markers like "<frozen>", "<namespace>" and "<builtin>".

They're significantly nicer to deal with when dumping module state for
diagnostic purposes. If I get a KeyError on __file__, or an
AttributeError on NoneType when all I'm trying to do is display data,
it's annoying.

Standardising on a pattern also opens up the possibility of doing
something meaningful with it in get_data() later. One of the
guarantees of PEP 302 is that you should be able to do this:

data_ref = os.path.join(__file__, relative_ref)
data = __loader__.get_data(data_ref)

That should really only blow up in get_data(), *not* on the
os.path.join step. Ideally, you should also be able to do this:

data_ref = os.path.join(mod.__file__, relative_ref)
data = mod.__loader__.get_data(data_ref)

I see it as being similar to the mandatory file attribute on code
objects - placeholders like "<stdin>" and "<string>" are a lot more
informative when errors occur than just using None, even though
neither of them is a valid filesystem path.
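A small sketch of the behaviour being asked for, assuming a "<namespace>" marker and a made-up ToyNamespaceLoader whose get_data() always fails (neither is an existing importlib API):

```python
import os.path


class ToyNamespaceLoader:
    """Hypothetical loader with no data files behind it."""

    def get_data(self, path):
        # PEP 302 asks get_data() to raise IOError when the data is missing;
        # IOError is an alias of OSError on Python 3.3 and later.
        raise OSError("no data available for {!r}".format(path))


marker = "<namespace>"

# The join step is a plain string operation and succeeds on the marker.
data_ref = os.path.join(marker, "defaults.cfg")

# The failure only surfaces once the loader is asked for the data.
try:
    ToyNamespaceLoader().get_data(data_ref)
    failed_in_get_data = False
except OSError:
    failed_in_get_data = True
```

With None or a missing attribute in place of the marker, the same two lines would already fail before get_data() is reached.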

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

Brett Cannon

2012-05-03 14:48:43 UTC

Post by PJ Eby
Still, code that expects to do something with a package's __file__ is
*going* to break somehow with a namespace package, so it's probably better
for it to break sooner rather than later.

I'm going to roll my replies all into this email to keep things simple.

So, to the people not wanting to set __file__, that (probably) won't fly
because it has been documented for years that built-in modules are the only
things that don't define __file__. Or we at least need to explain to people
how to tell the difference in a backwards-compatible fashion (e.g.
``module.__name__ in sys.builtin_module_names``).
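For reference, a minimal sketch of that check; is_builtin is just an illustrative helper name, and the hand-made module stands in for anything that lacks __file__ without being built in:

```python
import sys
import types


def is_builtin(module):
    """True if the module is compiled into the interpreter."""
    return module.__name__ in sys.builtin_module_names


# A module created by hand has no __file__, yet it is not a built-in.
handmade = types.ModuleType("foo")
```

The check does not depend on __file__ at all, so it keeps working whichever way the __file__ question is settled.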

Post by Nick Coghlan
My own preference is for markers like "<frozen>", "<namespace>" and "<builtin>".

So I would have said that had experience with the stdlib not big me on
this. In my situation, the trace module was checking file, and if __file__
didn't contain "<frozen>" or "<doctest" it would try to read it as a path,
and then error out if it couldn't open the file. Now I updated it to
startswith('<') and endswith('>'), but I wonder how many people made a
similar whitelist approach. And while setting __file__ to None or leaving
it unset will take about the same amount of time to fix, either is less
prone to silly whitelisting like what the trace module had.
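The two styles of check can be sketched as follows. Both functions are paraphrases of the behaviour described above, not the actual trace module source, and their names are invented for the example.

```python
def is_whitelisted(filename):
    """Brittle: only recognises the placeholders known at the time."""
    return "<frozen>" in filename or "<doctest" in filename


def is_placeholder(filename):
    """General: treats any "<...>" value as a non-filesystem marker."""
    return filename.startswith("<") and filename.endswith(">")
```

A new marker such as "<namespace>" passes is_placeholder() but not is_whitelisted(), so code written in the whitelist style would go on to treat it as a path and fail when opening it.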

Post by Nick Coghlan
They're significantly nicer to deal with when dumping module state for
diagnostic purposes. If I get a KeyError on __file__, or an
AttributeError on NoneType when all I'm trying to do is display data,
it's annoying.
Standardising on a pattern also opens up the possibility of doing
something meaningful with it in get_data() later. One of the
guarantees of PEP 302 is that you should be able to do this:
data_ref = os.path.join(__file__, relative_ref)
data = __loader__.get_data(data_ref)
That should really only blow up in get_data(), *not* on the
os.path.join step. Ideally, you should also be able to do this:
data_ref = os.path.join(mod.__file__, relative_ref)
data = mod.__loader__.get_data(data_ref)
I see it as being similar to the mandatory file attribute on code
objects - placeholders like "<stdin>" and "<string>" are a lot more
informative when errors occur than just using None, even though
neither of them is a valid filesystem path.

But that's because there are no other introspection options to tell where
the module originated, unlike modules which have __loader__.

Post by Nick Coghlan
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
_______________________________________________
Import-SIG mailing list
Import-SIG at python.org
http://mail.python.org/mailman/listinfo/import-sig


Brett Cannon

2012-05-03 15:09:10 UTC

Post by PJ Eby
Still, code that expects to do something with a package's __file__ is
*going* to break somehow with a namespace package, so it's probably better
for it to break sooner rather than later.

I'm going to roll my replies all into this email to keep things simple.
So, to the people not wanting to set __file__, that (probably) won't fly
because it has been documented for years that built-in modules are the only
things that don't define __file__. Or we at least need to explain to people
how to tell the difference in a backwards-compatible fashion (e.g.
``module.__name__ in sys.builtin_module_names``).

Post by Nick Coghlan
My own preference is for markers like "<frozen>", "<namespace>" and "<builtin>".

So I would have said that had experience with the stdlib not big me on
this.

That should say "So I would have agreed with that had my experience with
the stdlib in bootstrapping importlib not caused me to disagree."

Don't try to multi-task at work while in the middle of writing an email is
the lesson there. =)

-Brett

Post by Brett Cannon
In my situation, the trace module was checking file, and if __file__ didn't
contain "<frozen>" or "<doctest" it would try to read it as a path, and
then error out if it couldn't open the file. Now I updated it to
startswith('<') and endswith('>'), but I wonder how many people made a
similar whitelist approach. And while having __file__ to None or
non-existent will take about the same amount of time to fix, it is less
prone to silly whitelisting like what the trace module had.

But that's because there are no other introspection options to tell where
the module originated, unlike modules which have __loader__.


Barry Warsaw

2012-05-03 16:15:41 UTC

Post by Brett Cannon
So, to the people not wanting to set __file__, that (probably) won't fly
because it has been documented for years that built-in modules are the only
things that don't define __file__.

Okay, but *why* is this the rule, other than that PEP 302 says it? IOW, PEP
302 doesn't give much of a rationale for the rule, and I suspect it just
reflected the reality back in 2002.

Post by Brett Cannon
Or we at least need to explain to people how to tell the difference in a
backwards-compatible fashion.

Definitely, and I think that would be fine to include in PEP 420.

Post by Brett Cannon
So I would have said that had experience with the stdlib not big me on
this. In my situation, the trace module was checking file, and if __file__
didn't contain "<frozen>" or "<doctest" it would try to read it as a path,
and then error out if it couldn't open the file. Now I updated it to
startswith('<') and endswith('>'), but I wonder how many people made a
similar whitelist approach. And while having __file__ to None or
non-existent will take about the same amount of time to fix, it is less
prone to silly whitelisting like what the trace module had.

See what I mean about arbitrary and underdocumented? :)

Cheers,
-Barry

Brett Cannon

2012-05-03 16:49:23 UTC

Post by Brett Cannon
So, to the people not wanting to set __file__, that (probably) won't fly
because it has been documented for years that built-in modules are the only
things that don't define __file__.

Okay, but *why* is this the rule, other than that PEP 302 says it? IOW, PEP
302 doesn't give much of a rationale for the rule, and I suspect it just
reflected the reality back in 2002.

Exactly. I am willing to bet that historically it's just because that was
the only way you could tell what was or was not a built-in module.

Post by Brett Cannon
Or we at least need to explain to people how to tell the difference in a
backwards-compatible fashion.

Definitely, and I think that would be fine to include in PEP 420.

See what I mean about arbitrary and underdocumented? :)

Don't remind me about "arbitrary and underdocumented" when it comes to
the import system. =P

-Brett

Post by Barry Warsaw
Cheers,
-Barry


martin

2012-05-04 00:11:02 UTC

Okay, but *why* is this the rule, other than that PEP 302 says it?

I think it predates PEP 302 by a decade or so. You might also ask why
the keyword is "def", and not "define" (other than that the Grammar says
so). It's a natural thing, also: If the module comes from the file system,
it has an __file__ attribute, else it's built-in.

Regards,
Martin

Barry Warsaw

2012-05-04 14:51:49 UTC

Post by martin
I think it predates PEP 302 by a decade or so. You might also ask why
the keyword is "def", and not "define" (other than that the Grammar says
so). It's a natural thing, also: If the module comes from the file system,
it has an __file__ attribute, else it's built-in.

Sure, that makes sense in a 2002 world where we didn't have importlib and all
the modernization of the import system. Today, it's not only antiquated, it's
also not necessarily true. We're already significantly overhauling the import
machinery, so I think it's entirely reasonable to relax this constraint.

See my previous post for a proposal.

-Barry

Paul Moore

2012-05-04 15:16:14 UTC

Sure, that makes sense in a 2002 world where we didn't have importlib and all
the modernization of the import system. Today, it's not only antiquated, it's
also not necessarily true. We're already significantly overhauling the import
machinery, so I think it's entirely reasonable to relax this constraint.

When we wrote PEP 302, so much code assumed that modules lived in the
filesystem that we had very little room for manoeuvre. One of the
goals of PEP 302 (in my mind, at least) was to disrupt the mindset
that assumed this. Now, Brett's implementation of importlib has made
that a reality - code that assumes modules live in a filesystem should
have a really good justification for doing so (and document the
limitation, ideally). I suspect you'll still break a reasonable amount
of code like this, but that's probably OK, as it's less of a breakage,
and more of a case of the existing code not anticipating cases that
never existed before.

Post by Barry Warsaw
See my previous post for a proposal.

+1 and I'd also explicitly allow for loaders to assign other "private"
metadata as well as __file__, if only to avoid the spectre of __file__
being a base64-encoded pickled object :-)

I wonder whether treating repr specially is the best way, though -
maybe have a loader method "code_location" which is defined as being a
human-readable, but otherwise unspecified string. The key use case is
for repr, but it might be useful elsewhere (IDE tooltips or some such
usage spring to mind).

Paul.

Barry Warsaw

2012-05-04 19:52:58 UTC

Post by Paul Moore
+1 and I'd also explicitly allow for loaders to assign other "private"
metadata as well as __file__, if only to avoid the spectre of __file__
being a base64-encoded pickled object :-)

That's in PEP 420 now too.

Post by Paul Moore
I wonder whether treating repr specially is the best way, though -
maybe have a loader method "code_location" which is defined as being a
human-readable, but otherwise unspecified string. The key use case is
for repr, but it might be useful elsewhere (IDE tooltips or some such
usage spring to mind).

Maybe, but I think this is the simplest thing possible, which solves an
existing use case. :)

-Barry


Nick Coghlan

2012-05-03 22:20:16 UTC

I'd still prefer to just officially bless the existing "<whatever>"
convention for non-filesystem imports over encouraging type checks on
__loader__ or defining a new introspection interface for loaders.

If we say "this is the stdlib convention" people are going to start using
the same check as is now used in traceback.py

The precedent is there with code objects, and I think it's a good example
to follow.

Cheers,
Nick.

--
Sent from my phone, thus the relative brevity :)

Guido van Rossum

2012-05-03 22:43:40 UTC

Post by Nick Coghlan
I'd still prefer to just officially bless the existing "<whatever>"
convention for non-filesystem imports over encouraging type checks on
__loader__ or defining a new introspection interface for loaders.
If we say "this is the stdlib convention" people are going to start using
the same check as is now used in traceback.py
The precedent is there with code objects, and I think it's a good example to
follow.
Cheers,
Nick.
--
Sent from my phone, thus the relative brevity :)

--
--Guido van Rossum (python.org/~guido)

PJ Eby

2012-05-04 00:05:15 UTC

Note that this messes with the idea of using the first directory as
filename -- anybody who joins with os.path.dirname(__file__) is going to
get a mess (on regular filesystem paths), which is (I'm guessing) why the
trailing separator idea was proposed in the first place.

Which kind of brings us full circle on that point. I suppose we could just
say screw it, anybody implementing VFS importers had darn well better
understand os.path.join and friends, since PEP 302 requires it for get_data
anyway.

Still seems like a wart, but oh well. OTOH, maybe it's better for people
munging __file__ to get a weird error all the time with namespace packages,
instead of something that works some of the time, and fails later?

Nick Coghlan

2012-05-04 01:05:16 UTC

Yep. It also means VFS importers are officially free to put all the
metadata they want inside the angle brackets, secure in the knowledge
that everyone else should be treating it as an opaque blob. It then
becomes a way for them to pass necessary info to get_data() *without*
having to create distinct loader instances for every module.

Arguably, we should also be adding the angle brackets in zipimporter
(since those aren't real filesystem paths).

Still seems like a wart, but oh well. OTOH, maybe it's better for people
munging __file__ to get a weird error all the time with namespace packages,
instead of something that works some of the time, and fails later?

Right. Otherwise we'd get layout dependent behaviour where dubious
cross-portion references worked if all portions were installed to the
same path segment, but then failed if they were split across multiple
segments.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

Eric V. Smith

2012-05-04 01:21:44 UTC

Post by PJ Eby
Still seems like a wart, but oh well. OTOH, maybe it's better for people
munging __file__ to get a weird error all the time with namespace packages,
instead of something that works some of the time, and fails later?

Under no circumstances should anyone be looking at __file__ for a
namespace package in order to find a related file. We should do
something that causes this to always break.

Eric.

Barry Warsaw

2012-05-04 14:56:56 UTC

Post by Nick Coghlan
Yep. It also means VFS importers are officially free to put all the
metadata they want inside the angle brackets, secure in the knowledge
that everyone else should be treating it as an opaque blob. It then
becomes a way for them to pass necessary info to get_data() *without*
having to create distinct loader instances for every module.

Ooh! I can't wait for the __file__ set to a pickle to steganographically
communicate secret messages to get_data(). :)

-Barry

Barry Warsaw

2012-05-04 14:34:50 UTC

The thing is, that convention is at best meaningless and at worst misleading.
I also don't think it gives you all the diagnosis support you really want.

The PEP 302 rule (reservation of no __file__ only for built-ins) is a
historical relic for which no good rationale exists. Forgetting that for a
moment, it simply makes no sense for a module that wasn't loaded from a file
system path to have an __file__ attribute.

It's also not true even today. At our PEP 420 sprint we noticed importlib

type(sys)('foo')

That module isn't a built-in and doesn't have an __file__. It also
doesn't have an __loader__, but oh well.

(BTW, Brett, that's pretty clever. :)

It seemed to us that the only reasonable semantics for such modules is that
__file__ is None or __file__ is missing. Not setting __file__ is better
though because you get appropriate exceptions at the place where you make the
initial mistake (i.e. assuming every module has an __file__). If you set
__file__ to None, you may instead get cryptic messages in os.path.join() for
example.

So, what about the "diagnostics" use case? Certainly a very important use
case is the repr of module objects. In the case of modules loaded from the
file system, I definitely want to know where the file lives, and the repr is a
great way to see that. For other modules, you do want to know something about
how that module was created, and having a repr that gives a good indication of
that is very useful. But you can easily do that without a contrived __file__
(more on that below).

What about other introspection use cases? Relying on __file__
programmatically might be a convenient shorthand, but knowing the loader (via
__loader__ if available) is more helpful, because that tells you more about
how that module actually came into existence.

The value of __file__ is really under the purview of the loader anyway.
Consider a hypothetical database loader (or even many different third party
database loaders). Of what use is an __file__ that says '<database>'? That
way leads to uncertainty, and namespace collisions, for example if both a
SQLite loader and a PostgreSQL loader wanted to use the '<database>' value.
In either case, maybe you'd prefer to know what the database url is, or maybe
the query that produced the module, or some combination thereof.
Overloading all that into a contrived __file__ seems wrong.

I would prefer if the requirement were relaxed, and we simply allowed the
loaders to set __file__ to whatever they think is appropriate, which would
include allowing them to not set __file__ at all.

It's actually easy to give modules a reasonable repr even without __file__. I
have a branch in the PEP 420 feature repo which implements the following rules
for module object reprs:

* Use mod.__file__ if it exists
* Otherwise, get the module's __loader__
* If the module has no loader, then just return the module's name. E.g.

    >>> type(sys)('foo')
    <module 'foo'>
* Define a new optional method on loaders, called module_repr() that
takes the module as an argument. Use whatever this returns as the
module's repr.
* As a last fallback, just use the repr of the loader as part of the module's
repr.
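The rules above can be sketched in plain Python as follows. This is an illustration only, not the code in the feature branch: sketch_module_repr and the two toy loader classes are invented names, the exact repr strings are assumptions, and a __file__ of None is treated the same as a missing one.

```python
import types


def sketch_module_repr(module):
    """Apply the listed rules in the order given."""
    name = module.__name__
    filename = getattr(module, "__file__", None)
    if filename is not None:
        # Rule 1: use __file__ when the module has one.
        return "<module {!r} from {!r}>".format(name, filename)
    loader = getattr(module, "__loader__", None)
    if loader is None:
        # No loader: just the module's name.
        return "<module {!r}>".format(name)
    hook = getattr(loader, "module_repr", None)
    if hook is not None:
        # Optional loader method decides the repr.
        return hook(module)
    # Last fallback: include the loader's own repr.
    return "<module {!r} ({!r})>".format(name, loader)


class HookedLoader:
    """Toy loader that implements the optional module_repr()."""

    def module_repr(self, module):
        return "<module {!r} (toy)>".format(module.__name__)


class PlainLoader:
    """Toy loader without module_repr()."""

    def __repr__(self):
        return "PlainLoader()"
```

For example, a module built with types.ModuleType("foo") and no loader comes out as <module 'foo'>, matching the example in the list.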

I'm not particularly married to this implementation, but it seems reasonably
backward compatible, and flexible enough to support useful alternatives. For
example, the BuiltinImporter could define its module_repr() like so:

    @classmethod
    def module_repr(cls, module):
        return '<module {} (built-in)>'.format(module.__name__)

Specifically, my proposed elaboration on PEP 420 is this:

* Explicitly leave the assignment of __file__ to the loader.
* Allow loaders to not set __file__
* Add an optional API to loaders, module_repr() as defined above.

Cheers,
-Barry

PJ Eby

2012-05-04 14:56:56 UTC

Post by Barry Warsaw
* Explicitly leave the assignment of __file__ to the loader.
* Allow loaders to not set __file__
* Add an optional API to loaders, module_repr() as defined above.

+1 on all the above, plus getting rid of __file__ for namespace packages.
Seems like an elegant solution to the problems involved, and allows DB or
other importers to make their own attributes like __dsn__ or __url__, but
still have a decent repr.
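A sketch of what such a database importer might look like under the proposal. ToyDatabaseLoader, its DSN string and the repr format are all hypothetical; only the idea of a private __dsn__ attribute plus module_repr() comes from the discussion above.

```python
import types


class ToyDatabaseLoader:
    """Hypothetical loader that pulls modules from a database."""

    def __init__(self, dsn):
        self.dsn = dsn

    def load_module(self, fullname):
        # A real loader would fetch and execute source here; this sketch
        # only shows which attributes end up on the module.
        module = types.ModuleType(fullname)
        module.__loader__ = self
        module.__dsn__ = self.dsn      # loader-private metadata, no __file__
        return module

    def module_repr(self, module):
        return "<module {!r} (database {})>".format(
            module.__name__, module.__dsn__)
```

The module carries the information that matters for this loader in its own attribute, and nothing has to be squeezed into a contrived __file__ value.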

Barry Warsaw

2012-05-04 19:11:05 UTC

Post by PJ Eby

Post by Barry Warsaw
* Explicitly leave the assignment of __file__ to the loader.
* Allow loaders to not set __file__
* Add an optional API to loaders, module_repr() as defined above.

Yes, exactly.

It seems like there's general consensus about the basic proposal; I'll update
the PEP so Guido has specific language to pronounce on.

I want to make one change to what I posted. If m.__loader__.module_repr()
exists, I want to give it a first crack at producing the repr. This means
that __file__ is used as a fallback, not as the first step.

Cheers,
-Barry

Nick Coghlan

2012-05-04 15:14:13 UTC

* Explicitly leave the assignment of __file__ to the loader.
* Allow loaders to not set __file__
* Add an optional API to loaders, module_repr() as defined above.

I can accept that approach on one condition: the PEP 420
implementation comes with the long-overdue migration of the definition
of the import system semantics into the language reference.

The main sticking point preventing that in the past has been that
nobody wanted to document all the caveats and special cases needed to
accurately describe CPython's behaviour. For 3.3+, no such caveats are
necessary, since Brett's importlib efforts mean that even the default
import system follows the rules.

The proposed update will require changes to the description of the
import semantics, anyway, so rather than making those changes directly
in PEP 302, it would be better to document them in the language
reference and update PEP 302 with a note to say that, for 3.3+, it is
no longer the authoritative source.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

Eric V. Smith

2012-05-04 15:17:13 UTC

Post by Barry Warsaw
* Explicitly leave the assignment of __file__ to the loader.
* Allow loaders to not set __file__
* Add an optional API to loaders, module_repr() as defined above.

I can accept that approach on one condition: the PEP 420
implementation comes with the long-overdue migration of the definition
of the import system semantics into the language reference.
The main sticking point preventing that in the past has been that
nobody wanted to document all the caveats and special cases needed to
accurately describe CPython's behaviour. For 3.3+, no such caveats are
necessary, since Brett's importlib efforts mean that even the default
import system follows the rules.
The proposed update will require changes to the description of the
import semantics, anyway, so rather than making those changes directly
in PEP 302, it would be better to document them in the language
reference and update PEP 302 with a note to say that, for 3.3+, it is
no longer the authoritative source.

We did discuss this yesterday at the sprint. I'm all for it, and I think
the others were, too.

I'm not keen on tying all of this to PEP 420 acceptance or rejection,
but it's not the end of the world.

Eric.

Barry Warsaw

2012-05-04 19:56:51 UTC

Post by Eric V. Smith
I'm not keen on tying all of this to PEP 420 acceptance or rejection,
but it's not the end of the world.

I think the PEP should be pronounced on before the documentation is written.
If Guido wants to make changes to the spec, it's better not to waste effort.

Are there any more open issues? Are we ready to ask Guido to pronounce?

I think the feature branch is in pretty good shape, but we can delay merging
it to the main trunk (assuming the PEP gets accepted) until we have more tests
and a first draft of the import semantics documentation. I don't mind working
in the feature branch for a little while longer.

Cheers,
-Barry

PJ Eby

2012-05-04 21:02:16 UTC

Post by Barry Warsaw
Are there any more open issues?

Maybe not on this particular subproposal, but IIUC, Eric was still looking
at the feasibility of doing auto-updates when parent paths change.

(Unless I'm mistaken, my sketch for PEP 402 should only need a bit of
hacking to allow setting the initial calculated path, so that there's not
an extra scan when a namespace package is initialized, and a change to make
it use find_module() instead of PEP 402's get_subpath(). Well, that, and
renaming "virtual packages" back to "namespace packages" in the error
messages and such.)

Eric V. Smith

2012-05-04 21:13:47 UTC

On Fri, May 4, 2012 at 3:56 PM, Barry Warsaw <barry at python.org
Are there any more open issues?
Maybe not on this particular subproposal, but IIUC, Eric was still
looking at the feasibility of doing auto-updates when parent paths change.
(Unless I'm mistaken, my sketch for PEP 402 should only need a bit of
hacking to allow setting the initial calculated path, so that there's
not an extra scan when a namespace package is initialized, and a change
to make it use find_module() instead of PEP 402's get_subpath(). Well,
that, and renaming "virtual packages" back to "namespace packages" in
the error messages and such.)

I'm looking at it and have it mostly implemented for PEP 420. I still
need to refactor out some code so I can re-use the path-building code
that's currently in PathFinder.find_module. It looks simple enough.

Eric.

Paul Moore

2012-05-04 15:23:37 UTC

* Explicitly leave the assignment of __file__ to the loader.
* Allow loaders to not set __file__
* Add an optional API to loaders, module_repr() as defined above.

I can accept that approach on one condition: the PEP 420
implementation comes with the long-overdue migration of the definition
of the import system semantics into the language reference.

That would be a *very* good idea. Whether PEP 420 should be held
hostage to this, I don't know, but I think it should be targeted as a
key item for 3.3. Just having a reference to what the language
actually guarantees would be immensely useful. I did actually try to
do this once, but my head exploded :-) (I'd be willing to help out
with it, but I don't know where it would fit in the docs - could
anyone suggest a basic location and structure, and I could try to
write some words to go into it?)

On a somewhat related note, does anyone know how well oddities like
jython's ability to import Java classes (and IronPython for .Net
classes) fit any such rules?

Paul.

Barry Warsaw

2012-05-04 19:07:37 UTC

* Explicitly leave the assignment of __file__ to the loader.
* Allow loaders to not set __file__
* Add an optional API to loaders, module_repr() as defined above.

I can accept that approach on one condition: the PEP 420
implementation comes with the long-overdue migration of the definition
of the import system semantics into the language reference.

I think you were listening in our sprint Nick! :)

One of the downsides of the PEP process is that sometimes the PEP will end up
being the definitive documentation for a new feature. This sucks for many
reasons, including that PEPs don't live in the source tree and they end up
getting pretty out-of-date as time goes by.

PEP 302 suffers quite a bit from historical rot, but also from lots of
superfluous text that doesn't make it easy to understand exactly what is going
on.

At our sprint, we all agreed that it would be much better for there to be
documentation about the import system's semantics in the language reference
guide. I think "Import System" is important enough to warrant a top-level
chapter, probably either before or after "Execution Model". Section 6.11
describes the import statement, but I'd probably refactor large bits of that
into the "Import System" chapter, and leave §6.11 to describe the import
statement specifically.

I mentioned at the sprint that I'd be willing to work on such a document.
It's likely more than a one-person-operation, but I'd be happy to take a crack
at a first draft once PEP 420 gets accepted.

Cheers,
-Barry

fwierzbicki

2012-05-04 16:00:52 UTC

It's also not true even today. At our PEP 420 sprint we noticed importlib
    >>> type(sys)('foo')
That module isn't a built-in and doesn't have an __file__. It also
doesn't have an __loader__, but oh well.
(BTW, Brett, that's pretty clever. :)

Too clever for Jython at the moment :) -- which leads me to ask:
Should I consider this a feature of the sys module? It doesn't look
too hard to do, and I really want importlib to work when Jython starts
on Jython3 (I'm hoping to seriously start that this summer - Jython
2.7 is progressing well).

-Frank

Brett Cannon

2012-05-04 16:21:36 UTC

On Fri, May 4, 2012 at 12:00 PM, fwierzbicki at gmail.com <

Post by Barry Warsaw
It's also not true even today. At our PEP 420 sprint we noticed importlib
    >>> type(sys)('foo')

That module isn't a built-in and doesn't have an __file__. It also
doesn't have an __loader__, but oh well.
(BTW, Brett, that's pretty clever. :)

Should I consider this a feature of the sys module?

No, this is an ability of types.ModuleType (which I don't have access to in
importlib, so I just inlined the call). This works for any module in
CPython.
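
A minimal sketch of the trick described above, assuming a current CPython 3. It only illustrates that type(sys) is types.ModuleType and that a module built this way has no __file__; the name 'foo' comes from the thread, and none of this is importlib's actual code.

```python
import sys
import types

# type(sys) is the module type itself, so it can be called
# without importing the `types` module first.
assert type(sys) is types.ModuleType

foo = type(sys)('foo')

# The new module carries a name, but no loader ever ran,
# so nothing has set __file__ on it.
print(foo.__name__)
print(hasattr(foo, '__file__'))
```

The same call works on any module object, for example type(os)('foo'), because every module shares the one type.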

Post by Eric V. Smith
It doesn't look
too hard to do, and I really want importlib to work when Jython starts
on Jython3 (I'm hoping to seriously start that this summer - Jython
2.7 is progressing well).

I've actually been meaning to email the various VMs to have them look over
importlib to see if there are any sticking points that are obvious so we
can fix them now instead of waiting until a point release when the first VM
other than CPython tries to use importlib.

fwierzbicki

2012-05-04 16:28:55 UTC

Post by fwierzbicki
Should I consider this a feature of the sys module?

No, this is an ability of types.ModuleType (which I don't have access to in
importlib, so I just inlined the call). This works for any module in
CPython.

Ah of course, and our ModuleType works just fine for this. The Jython
sys module is fake sadly. Perhaps 3.x will be the time to finally make
it a real module... it's been a fake module with a comment at the top
to make it a real module for longer than I've been involved.

    >>> type(os)('foo')
    <module 'foo' (built-in)>

-Frank

Brett Cannon

2012-05-04 17:32:53 UTC

On Fri, May 4, 2012 at 12:28 PM, fwierzbicki at gmail.com <

Post by fwierzbicki
Should I consider this a feature of the sys module?

Post by Brett Cannon
No, this is an ability of types.ModuleType (which I don't have access to in
importlib, so I just inlined the call). This works for any module in
CPython.

Post by fwierzbicki

type(os)('foo')

OK, so of the CPython built-in modules that importlib uses (sys, _imp,
_warnings, _io, marshal, builtins, posix/nt), which are an actual module in
Jython?

fwierzbicki

2012-05-04 19:44:29 UTC

Sorry for the dup Brett - I still mess up on the new gmail interface
sometimes :(

Post by Brett Cannon
OK, so of the CPython built-in modules that importlib uses (sys, _imp,
_warnings, _io, marshal, builtins, posix/nt), which are an actual module in
Jython?

I'll start with the bad:

builtins would be hard to turn into a module - however __builtin__ is
a module and works well.
nt is not likely to get implemented, we pretend nt is a posix with missing bits.

The ok:
posix is not currently a true module, but can probably be turned
into one without too much trouble -- I will need to investigate.
_imp is not exposed as a module, but I think this will be a necessary
and acceptable step to integrate with importlib (and I don't think it
should be too hard given the benefits).

The good:
marshal and _io are already true modules.
_warnings will be when I get around to implementing it - probably next
week :) -- if I run out of time it may end up just being the same as
the python version (but that will still make it a true module).

-Frank

Eric V. Smith

2012-05-03 15:00:26 UTC

My own preference is for markers like "<frozen>", "<namespace>" and "<builtin>".

It looks like "<frozen>" is indeed used, but built in modules do not set
__file__. So I don't really see that as a precedent for setting it to
something, but I do agree with most of your points below.

While I embrace the pattern, I don't see how it could ever work for a
namespace package. The defining quality is that the namespace package
itself doesn't contain any files. And NamespaceLoader doesn't define
get_data for this reason.

Post by Nick Coghlan
I see it as being similar to the mandatory file attribute on code
objects - placeholders like "<stdin>" and "<string>" are a lot more
informative when errors occur than just using None, even though
neither of them is a valid filesystem path.

So the 4 options on the table are:
1. Add a (possibly meaningless) trailing slash character.
2. Use None.
3. Do not set it.
4. Set it to "<namespace>".

We'll discuss it today at our sprint.

PJ Eby

2012-05-03 16:11:00 UTC

Post by Nick Coghlan
Standardising on a pattern also opens up the possibility of doing
something meaningful with it in get_data() later. One of the
data_ref = os.path.join(__file__, relative_ref)
data = __loader__.get_data(data_ref)

Um, namespace package modules shouldn't have a __loader__ either, should
they?

Brett Cannon

2012-05-03 16:47:39 UTC

Post by PJ Eby

Um, namespace package modules shouldn't have a __loader__ either, should
they?

No, they should (and PEP 302 now requires that). Namespace modules are
loaded by a loader, and thus should have it defined. It's all the other
optional interfaces that they don't need to have (e.g. NamespaceLoader
should have importlib.abc.Loader and probably none of the other ABCs).


martin

2012-05-03 08:37:02 UTC

Post by Eric V. Smith
I can go either way on this, but would lean toward __file__ not being
set. Brett: what's your opinion?

I'd like to recall that we were explicitly discussing this question at
PyCon, and (IIRC) I proposed that it be None, and Guido pronounced that
it shall be the path to the first portion. So if you now want to change it,
you should check with him again.

Regards,
Martin

Eric V. Smith

2012-05-03 12:28:03 UTC

Post by martin

Post by Eric V. Smith
I can go either way on this, but would lean toward __file__ not being
set. Brett: what's your opinion?

I recall that, and I also recall advocating None.

I see the process as:

- come to a consensus here
- update the PEP, documenting this discussion
- update the implementation
- get Guido to rule on the PEP

Eric.

"Martin v. Löwis"

2012-05-02 18:32:09 UTC

Post by Eric V. Smith
About the only question I have is: Is everyone okay with the changes to
the finders, described in the PEP?

It looks good to me. It's a somewhat surprising change, but I can see no
flaw in it.

Surprising in that any change to find_module is needed, or surprising
that it now returns one of {None, loader, str}?

Both, actually. I had expected that new API (i.e. a new method of some
kind) would be necessary, so it has elegance that this is not required.
OTOH, explicit type checking is despised in the OO world, and varying
result types are disliked by Guido van Rossum (not sure whether this
reservation applies to this case as well, or only to cases where the
return type depends on the parameter types).

Regards,
Martin

Barry Warsaw

2012-05-02 18:50:05 UTC

Both, actually. I had expected that new API (i.e. a new method of some kind)
would be necessary, so it has elegance that this is not required. OTOH,
explicit type checking is despised in the OO world, and varying result types
are disliked by Guido van Rossum (not sure whether this reservation applies
to this case as well, or only to cases where the return type depends on the
parameter types).

My understanding (and I'm sure Guido will correct me if I'm wrong) is that
it's the latter: return type should not depend on function argument values.

-Barry

Brett Cannon

2012-05-02 19:40:41 UTC

Post by "Martin v. Löwis"
Both, actually. I had expected that new API (i.e. a new method of some kind)
would be necessary, so it has elegance that this is not required. OTOH,
explicit type checking is despised in the OO world, and varying result types
are disliked by Guido van Rossum (not sure whether this reservation applies
to this case as well, or only to cases where the return type depends on the
parameter types).

My understanding (and I'm sure Guido will correct me if I'm wrong) is that
it's the latter: return type should not depend on function argument values.

This is how I interpreted Guido's preference (e.g. return bytes or str
based on whether an argument(s) is bytes or str).

Brett Cannon

2012-05-02 18:53:17 UTC

Post by Eric V. Smith
About the only question I have is: Is everyone okay with the changes to

Post by Eric V. Smith
the finders, described in the PEP?

It looks good to me. It's a somewhat surprising change, but I can see no
flaw in it.

Surprising in that any change to find_module is needed, or surprising
that it now returns one of {None, loader, str}?

You actually don't need to explicitly type-check and instead can rely on
duck typing::

    if loader is None: continue
    elif hasattr(loader, 'load_module'): return loader
    else:
        namespace.append(loader)
        continue

Eric V. Smith

2012-05-02 19:28:42 UTC

Post by Brett Cannon
You actually don't need to explicitly type-check and instead can rely on
duck typing::

    if loader is None: continue
    elif hasattr(loader, 'load_module'): return loader
    else:
        namespace.append(loader)
        continue

While I agree that this accomplishes the job, I don't think it's any
more readable than the existing code:

    if isinstance(loader, str):
        namespace.append(loader)
    elif loader:
        return loader

(with the case of None causing the code to loop)

But I'm open to changing it.

As to the three return types: Given that find_module() has all of the
information, I don't think it makes sense to add another method. And for
backward compatibility, we need to keep the {None, loader} return types.
If you agree that adding another method is wasteful (it will have to do
most of the same work as find_module(), or cache its result), then I
think adding a str return type makes the most sense.

I can't foresee this ever causing an actual problem. No one is going to
subclass a loader from str (famous last words, I know!).
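
A minimal sketch of the scan being discussed, assuming the draft behaviour in this thread where find_module() returns one of {None, loader, str}. FakeLoader, FakeFinder and scan() are hypothetical names invented for illustration, not importlib APIs, and the branch for real loaders uses an explicit comparison against None.

```python
class FakeLoader:
    """Hypothetical loader: anything with a load_module() method."""
    def __init__(self, origin):
        self.origin = origin

    def load_module(self, fullname):
        raise NotImplementedError("illustration only")


class FakeFinder:
    """Hypothetical finder whose find_module() returns a canned result."""
    def __init__(self, result):
        self.result = result

    def find_module(self, fullname):
        return self.result


def scan(finders, fullname):
    """Return a loader as soon as one is found; otherwise return the
    list of directory strings collected as namespace portions."""
    namespace = []
    for finder in finders:
        found = finder.find_module(fullname)
        if isinstance(found, str):
            namespace.append(found)   # directory without an __init__.py
        elif found is not None:
            return found              # a real module or package wins at once
        # None: this finder has nothing for the name, keep scanning
    return namespace


only_dirs = [FakeFinder('/a/pkg'), FakeFinder(None), FakeFinder('/b/pkg')]
print(scan(only_dirs, 'pkg'))         # ['/a/pkg', '/b/pkg']

loader = FakeLoader('/c/pkg/__init__.py')
mixed = [FakeFinder('/a/pkg'), FakeFinder(loader)]
print(scan(mixed, 'pkg') is loader)   # True
```

A single find_module() call per finder is enough here, which is the argument made above against adding a second method.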

Eric.

Brett Cannon

2012-05-02 19:39:47 UTC

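Post by Eric V. Smith
While I agree that this accomplishes the job, I don't think it's any
more readable than the existing code:

    if isinstance(loader, str):
        namespace.append(loader)
    elif loader:
        return loader

(with the case of None causing the code to loop)
But I'm open to changing it.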

I honestly don't care. I just wanted to point out to Martin that if he
wanted more of an interface check than a type check, it's totally doable.

Post by Eric V. Smith
As to the three return types: Given that find_module() has all of the
information, I don't think it makes sense to add another method. And for
backward compatibility, we need to keep the {None, loader} return types.
If you agree that adding another method is wasteful (it will have to do
most of the same work as find_module(), or cache its result), then I
think adding a str return type makes the most sense.
I can't foresee this ever causing an actual problem. No one is going to
subclass a loader from str (famous last words, I know!).

Just as I know PJE is going to point out that your loader test won't work
if a loader happens to be false and thus you should do an explicit ``is not
None`` check.

Eric V. Smith

2012-05-02 19:47:37 UTC

Post by Eric V. Smith
I can't foresee this ever causing an actual problem. No one is going to
subclass a loader from str (famous last words, I know!).
Just as I know PJE is going to point out that your loader test won't
work if a loader happens to be false and thus you should do an explicit
``is not None`` check.

Good one! I'll make that change.
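
A small sketch of the pitfall being fixed, assuming a loader object that happens to evaluate as false. EmptyContainerLoader and the two helper functions are hypothetical, written only to contrast the truth test with the explicit comparison against None.

```python
class EmptyContainerLoader:
    """Hypothetical loader that also behaves like an empty container."""
    def __len__(self):
        return 0                      # makes bool(instance) False

    def load_module(self, fullname):
        raise NotImplementedError("illustration only")


def accepted_by_truth_test(found):
    return bool(found)                # the `elif loader:` form


def accepted_by_none_test(found):
    return found is not None          # the explicit form


loader = EmptyContainerLoader()
print(accepted_by_truth_test(loader))   # False: the loader would be skipped
print(accepted_by_none_test(loader))    # True
```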

Antoine Pitrou

2012-05-04 22:47:11 UTC

Hello,

On Thu, 19 Apr 2012 16:18:21 -0400

Post by Eric V. Smith
This reflects (I hope!) the discussions at PyCon. My plan is to produce
an implementation based on the importlib code, and then flush out pieces
of the PEP.

I don't understand why PEP 382 was rejected. There doesn't seem to be
any obvious argument against it. The mechanism is simple, explicit and
unambiguous. As PEP 382 points out:

"At the discussion at PyCon DE 2011, people remarked that having an
explicit declaration of a directory as contributing to a package is a
desirable property, rather than an obstacle. In particular, Jython
developers noticed that Jython could easily mistake a directory that is
a Java package as being a Python package, if there is no need to
declare Python packages."

The "directory.pyp" scheme is highly unlikely to conflict with
unrelated uses of a ".pyp" directory extension. It's also easy to use,
and avoids oddities in the lookup algorithm such as "if the scan
completes without returning a module or package, and at least one
directory was recorded, then a namespace package is created".

On the other hand, PEP 420 provides potential for confusion (for
example, if the standard "test" package is not installed, trying
to import it could end up importing some other arbitrary "test"
directory on the path as a namespace package), without seeming to have
any obvious advantage over PEP 382.
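
The lookup rule quoted above can be observed on a Python 3.3+ interpreter, where implicit namespace packages exist. The sketch below is only an illustration under that assumption; the names demo420_regular and demo420_bare are hypothetical and chosen to avoid clashing with real packages.

```python
import importlib
import os
import sys
import tempfile

first = tempfile.mkdtemp()    # placed earlier on sys.path
second = tempfile.mkdtemp()   # placed later on sys.path

# `first` holds only bare directories; `second` holds a regular package.
os.makedirs(os.path.join(first, 'demo420_regular'))
os.makedirs(os.path.join(first, 'demo420_bare'))
os.makedirs(os.path.join(second, 'demo420_regular'))
with open(os.path.join(second, 'demo420_regular', '__init__.py'), 'w') as f:
    f.write('KIND = "regular"\n')

sys.path[:0] = [first, second]
importlib.invalidate_caches()

regular = importlib.import_module('demo420_regular')
bare = importlib.import_module('demo420_bare')

# The bare directory in `first` is only recorded; the scan continues
# and the regular package found in `second` is the one imported.
print(regular.KIND)

# With no regular package anywhere on the path, the bare directory
# becomes a namespace package, which has no __init__.py behind it.
print(getattr(bare, '__file__', None))
```

The second import is the case raised in the message above: any directory with the right name is picked up once no regular package or module shadows it.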

Unless there are clear advantages over PEP 382, I'm -1 on this PEP, and
would like to see PEP 382 revived.

Regards

Antoine.

Nick Coghlan

2012-05-05 06:27:26 UTC

Post by Antoine Pitrou
Unless there are clear advantages over PEP 382, I'm -1 on this PEP, and
would like to see PEP 382 revived.

I raised this question as well, and the PEP as written doesn't do a
great job of summarising the thread that addressed it.

There were two counterpoints raised that I found compelling:

A. Guido simply doesn't like directory extensions. I have to agree
with him that using them to handle packaging would be a weird and
unusual approach, and, well, he *does* get to play the BDFL card in
cases like this.

B. Current version control systems are still pretty abysmal when it
comes to coping with directory renames, and we want to avoid
unnecessary stumbling blocks on the migration path from the current
pkgutil.extend_path() based namespace packages to the new native
system.

With PEP 382, the migration path is:
1. delete all __init__.py files from namespace package portions
2. rename the directories for all namespace package portions to append
the ".pyp" extension

With PEP 420, the migration path is:
1. delete all __init__.py files from namespace package portions
2. there is no step 2
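
As a rough illustration of where the PEP 420 migration ends up, the sketch below builds two portions of one namespace package with no __init__.py in either and imports from both. It assumes Python 3.3 or later; demo420_ns and the module names are hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Two separately installed portions of one namespace package.
# Neither demo420_ns directory contains an __init__.py.
roots = [tempfile.mkdtemp(), tempfile.mkdtemp()]
for root, module in zip(roots, ('alpha', 'beta')):
    os.makedirs(os.path.join(root, 'demo420_ns'))
    with open(os.path.join(root, 'demo420_ns', module + '.py'), 'w') as f:
        f.write('NAME = %r\n' % module)

sys.path[:0] = roots
importlib.invalidate_caches()

alpha = importlib.import_module('demo420_ns.alpha')
beta = importlib.import_module('demo420_ns.beta')
ns = sys.modules['demo420_ns']

print(alpha.NAME, beta.NAME)
print(len(list(ns.__path__)))   # one __path__ entry per portion
```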

The extra step required by the PEP 382 approach is exactly the kind of
pointless revision history noise that PEP 414's reintroduction of
explicit Unicode literals is designed to eliminate from Python 2 to
Python 3 migrations.

Between "Guido doesn't like directory suffixes" and "version control
systems are still fairly bad at handling directory renames", I changed
my own opinion on PEP 420 from -1 to +0. If we'd been starting from a
clean slate with no language history or migration of existing projects
to account for, then my opinion would be different, but given where we
are today, I find the pragmatic argument in favour of simply losing
the explicit markers compelling.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

Antoine Pitrou

2012-05-05 10:33:03 UTC

On Sat, 5 May 2012 16:27:26 +1000

Post by Antoine Pitrou
Unless there are clear advantages over PEP 382, I'm -1 on this PEP, and
would like to see PEP 382 revived.

I raised this question as well, and the PEP as written doesn't do a
great job of summarising the thread that addressed it.
A. Guido simply doesn't like directory extensions. I have to agree
with him that using them to handle packaging would be a weird and
unusual approach, and, well, he *does* get to play the BDFL card in
cases like this.

Well, I agree that "foo.pyp" isn't very pretty, but that's a pretty
minor argument. At least it's explicit.
(of course, another marker could have been chosen: for example
an empty "foo/__namespace__.py", or whatever else floats our
boat of aesthetics)

Post by Nick Coghlan
B. Current version control systems are still pretty abysmal when it
comes to coping with directory renames, and we want to avoid
unnecessary stumbling blocks on the migration path from the current
pkgutil.extend_path() based namespace packages to the new native
system.

Isn't that baseless? AFAIU all modern DVCS should cope correctly with
a directory rename. Even SVN may be ok.

If anything, I'd like to see data points about these "current version
control systems" being "pretty abysmal [!] when it comes to coping
with directory renames".

(preferably something else than a 2007 rant by Mark Shuttleworth in
order to justify bzr's existence :-))

Post by Nick Coghlan
The extra step required by the PEP 382 approach is exactly the kind of
pointless revision history noise that PEP 414's reintroduction of
explicit Unicode literals is designed to eliminate from Python 2 to
Python 3 migrations.

Except that no one *has* to migrate to namespace packages. These are
fairly rare and only useful for a couple of big projects.
(I've only heard about Zope using them; Twisted AFAICT doesn't)

Even then, renaming a directory is hardly comparable to the hurdle of
migrating unicode literals from Python 2 to Python 3. The analogy
sounds melodramatic.

Post by Nick Coghlan
Between "Guido doesn't like directory suffixes" and "version control
systems are still fairly bad at handling directory renames", I changed
my own opinion on PEP 420 from -1 to +0.

This doesn't address PEP 420's issues, which will still come to bite us
in 10 years: the potential for confusion, the weirdness of the lookup
algorithm.

Post by Nick Coghlan
If we'd been starting from a
clean slate with no language history or migration of existing projects
to account for, then my opinion would be different, but given where we
are today, I find the pragmatic argument in favour of simply losing
the explicit markers compelling.

The real pragmatic argument would be to avoid creating maintenance and
support issues for the future, IMO.

Regards

Antoine.

Nick Coghlan

2012-05-05 12:12:51 UTC

Post by Antoine Pitrou
If anything, I'd like to see data points about these "current version
control systems" being "pretty abysmal [!] when it comes to coping
with directory renames".

It's really irrelevant. The real deciding factor is that Guido didn't
like the scheme proposed in PEP 382, so he rejected it.

However, my personal experience with both git and hg is that renaming
files still generates an awful lot of diff noise - none of them have
formal rename, they still fake it with "remove and add" the same way
subversion does. "abysmal" is really too strong a word (they're much
better than CVS), but it's still a far cry from formal rename
tracking.

Since we *want* people to eventually drop their custom namespace
package systems in favour of the standard one, it makes sense to make
the migration path as smooth as possible. Requiring people to do a
mass rename of files makes it unnecessarily difficult for them to make
that transition.

Post by Antoine Pitrou
This doesn't address PEP 420's issues, which will still come to bite us
in 10 years: the potential for confusion, the weirdness of the lookup
algorithm.

Believe me, I sympathise - PEP 420 getting accepted is going to mean I
have to make some fairly major changes to PEP 395 before I can propose
it for 3.4. However, the proposed mechanism in PEP 420 basically just
brings Python's import system into line with the way that C, Java,
Perl, etc all already work, so I predict the "maintenance and support
issues for the future" as a result of this change aren't going to be
severe (particularly once I revise PEP 395 to be primarily a proposal
for better error reporting in various error cases relating to
__main__). I've also started Tools/scripts/import_diagnostics.py -
initially just to help me while trying to eliminate the
_frozen_importlib vs importlib._bootstrap duplication, but longer term
I hope to see some more sophisticated commands get added so that
people can easily get better info if their imports start doing strange
things.

After the last discussion, I now believe that accepting *either* PEP
382 or 420 will lead to an acceptable long term outcome. While my own
preferences still favour the explicit approach in PEP 382, I can also
acknowledge that PEP 420 has its own attractive features, most notably
that it:
- is more consistent with the module systems of other languages
- has a greater chance of completely displacing existing namespace
package mechanisms in the long term
- is significantly more intuitive than PEP 382, since almost nothing
else uses directory extensions, so any scheme relying on them is going
to feel awkward and unintuitive to beginners and veterans alike (and
we can't use a shared marker file, since getting rid of __init__.py is
the entire point of these PEPs, and using a *set* of marker files with
a common extension clutters the filesystem and means we have to do
pattern matching on directory listings during import instead of being
able to use simple stat calls and exact string matches).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

Antoine Pitrou

2012-05-05 12:32:26 UTC

On Sat, 5 May 2012 22:12:51 +1000

Post by Nick Coghlan
It's really irrelevant. The real deciding factor is that Guido didn't
like the scheme proposed in PEP 382, so he rejected it.
However, my personal experience with both git and hg is that renaming
files still generates an awful lot of diff noise - none of them have
formal rename, they still fake it with "remove and add" the same way
subversion does.

That doesn't seem to make any difference in practice:

$ cat a
a\nb\n
$ hg mv a b
$ hg di
diff --git a/a b/b
rename from a
rename to b

(there's no "awful lot of diff noise" above)

Post by Nick Coghlan
"abysmal" is really too strong a word (they're much
better than CVS), but it's still a far cry from formal rename
tracking.

Well, you should come up with well-defined situations where this is
a problem, or you are making a purity argument.

(I'm still baffled that FUD about VCS capabilities has a weight in the
discussion of a Python PEP; yes, they're much better than CVS :-))

Post by Nick Coghlan
Requiring people to do a
mass rename of files makes it unnecessarily difficult for them to make
that transition.

Renaming a directory should not be "unnecessarily difficult" by any
stretch of the word, especially for a developer of something as large
as a project requiring namespace packages.

Any x.y -> x.y+1 transition is harder than renaming a directory for any
such large Python project.

Post by Nick Coghlan
However, the proposed mechanism in PEP 420 basically just
brings Python's import system into line with the way that C, Java,
Perl, etc all already work, so I predict the "maintenance and support
issues for the future" as a result of this change aren't going to be
severe

Python's import system is different from these languages', so the
implications are not the same either. The very fact that PEP 420 has to
propose a deferred detection of namespace packages compared to other
kinds of importable objects (modules, packages) proves it.

Post by Nick Coghlan
- is significantly more intuitive than PEP 382, since almost nothing
else uses directory extensions, so any scheme relying on them is going
to feel awkward and unintuitive to beginners and veterans alike (and
we can't use a shared marker file, since getting rid of __init__.py is
the entire point of these PEPs, and using a *set* of marker files with
a common extension clutters the filesystem and means we have to do
pattern matching on directory listings during import instead of being
able to use simple stat calls and exact string matches).

"clutters the filesystem"? We're talking about a little-used feature
here.

As for "simple stat calls" instead of "directory listings", I suggest
you take a look at current importlib, because it uses directory
listings in order to avoid stat calls :-)

Regards

Antoine.

Nick Coghlan

2012-05-05 13:18:20 UTC

Post by Antoine Pitrou
On Sat, 5 May 2012 22:12:51 +1000

$ cat a
a\nb\n
$ hg mv a b
$ hg di
diff --git a/a b/b
rename from a
rename to b
(there's no "awful lot of diff noise" above)

Now rename zope/ to zope.pyp/ in a full Zope checkout and see how much
noise you get.

Besides, I have yet to have any VCS (git and hg included) get a rename
right. My bad experiences with renames are one element that has helped
me to come to terms with the fact that PEP 382 is dead and PEP 420 is
going to replace it.

If your experiences differ, then fine, that's not going to help you
accept the decision. But it doesn't matter *how* you come to terms
with it, only that you do. That's really the only option here: Guido
has flat out rejected PEP 382 because he doesn't like the idea of
directory extensions. It's not coming back.

However, the PEP 420 authors should probably take note that the two
most interested people that weren't in the room at the language summit
still don't find that the PEP text explains the situation all that
well :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

Antoine Pitrou

2012-05-05 13:37:12 UTC

On Sat, 5 May 2012 23:18:20 +1000

Post by Nick Coghlan
If your experiences differ, then fine, that's not going to help you
accept the decision. But it doesn't matter *how* you come to terms
with it, only that you do. That's really the only option here: Guido
has flat out rejected PEP 382 because he doesn't like the idea of
directory extensions. It's not coming back.

Then perhaps PEP 420 should be rejected too, because of the
complication it introduces.

I've done a Google code search and there doesn't seem to be much more
than a dozen projects using namespace packages (Zope, pygraph, a couple
of others):

http://code.google.com/codesearch#search&q=lang:python+declare_namespace
http://code.google.com/codesearch#search&q=lang:python+extend_path

The current idiom is not extremely pretty but it works, and it doesn't
seem to cause much trouble. The fact that setuptools proposes a
different idiom from pkgutil's is not due to the idiom itself, but
probably historical reasons: both idioms require a single import and a
single function call, so they are similarly expressive.
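For readers who have not seen the two idioms, the sketch below writes the pkgutil boilerplate into two separately installed portions and imports from both. The package name `nsdemo_legacy`, the directory names and the helper functions are hypothetical. The setuptools line is included only as a string for comparison, since running it requires setuptools to be installed.

```python
import importlib
import os
import sys
import tempfile

# The pkgutil idiom: one import and one function call in every __init__.py.
PKGUTIL_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)
# The setuptools idiom, for comparison only (not executed here).
SETUPTOOLS_INIT = "__import__('pkg_resources').declare_namespace(__name__)\n"

PACKAGE = "nsdemo_legacy"  # hypothetical name, chosen to avoid real packages


def make_portion(root, module):
    """Create root/PACKAGE/ with the boilerplate __init__.py and one module."""
    pkg_dir = os.path.join(root, PACKAGE)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(PKGUTIL_INIT)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write("NAME = %r\n" % module)


def demo_pkgutil_idiom():
    """Import modules from two portions that each ship the same __init__.py."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = [os.path.join(tmp, "dist_a"), os.path.join(tmp, "dist_b")]
        make_portion(roots[0], "alpha")
        make_portion(roots[1], "beta")
        sys.path[:0] = roots
        try:
            importlib.invalidate_caches()
            alpha = importlib.import_module(PACKAGE + ".alpha")
            beta = importlib.import_module(PACKAGE + ".beta")
            return alpha.NAME, beta.NAME
        finally:
            # Undo the sys.path and sys.modules changes made for the demo.
            for root in roots:
                sys.path.remove(root)
            for name in [n for n in sys.modules if n.split(".")[0] == PACKAGE]:
                del sys.modules[name]
```

Note that every portion has to ship an identical `__init__.py`, which is the coordination cost raised elsewhere in this thread.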

The lack-of-prettiness argument is quite underwhelming when there are
so few projects using namespace packages; and this is not something you
see when you only *use* the package, rather than develop it.

Regards

Antoine.

Eric V. Smith

2012-05-05 13:45:50 UTC

Post by Antoine Pitrou
On Sat, 5 May 2012 23:18:20 +1000
I've done a Google code search and there doesn't seem to be much more
than a dozen projects using namespace packages (Zope, pygraph, a couple

From my experience, they're used extensively inside companies. Three
unrelated companies I've worked at use "company_name" as their top-level
namespace package.

Eric.

Nick Coghlan

2012-05-05 13:55:23 UTC

Post by Antoine Pitrou
The lack-of-prettiness argument is quite underwhelming when there are
so few projects using namespace packages; and this is not something you
see when you only *use* the package, rather than develop it.

No, it's a chicken-and-egg problem. Yes, namespace packages *are*
possible now, but they're a PITA to coordinate (everybody has to play
by the rules and put the right magic incantation in their __init__.py
files). So, people avoid them because they're a pain, not because
they're necessarily a bad idea (when used appropriately).

However, the problem isn't with the concept of namespace packages,
it's with the current awkward *implementation*.

Both PEP 382 and 420 fix the ugliness problem and bring namespace
packages up to a standard where I'd be happy seeing them used in the
standard library (MvL has proposed that "encodings" would be a good
candidate for that, and I'm inclined to agree).

A clean collaborative namespace system also helps with the evolution
of informal taxonomies on PyPI. You can see this on CPAN, where file
related modules are all in File::, email related ones are in Email::,
etc. At the moment, pretty much everything ends up being a top-level
module on PyPI, *because* namespace packages are so awkward and
unintuitive.

If those of us that do stdlib backports like contextlib2, unittest2
and distutils2 could just as easily publish backports.contextlib,
backports.unittest and backports.packaging, that would make it *much*
clearer to the world what is going on. None of us are willing to do
that at the moment, because we'd have to coordinate the installation
of backports.__init__, instead of being able to just include an
additional directory in our path names.
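As an illustration of what "just include an additional directory" would look like, here is a minimal sketch that assumes an interpreter implementing PEP 420 (Python 3.3 or later). The package name `nsdemo_backports` and the directory names are hypothetical stand-ins for the backports example. No portion ships an `__init__.py`.

```python
import importlib
import os
import sys
import tempfile

PACKAGE = "nsdemo_backports"  # hypothetical name, chosen to avoid real packages


def make_portion(root, module):
    """Create root/PACKAGE/module.py with no __init__.py anywhere."""
    pkg_dir = os.path.join(root, PACKAGE)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write("NAME = %r\n" % module)


def demo_implicit_namespace():
    """Import from two independently installed portions of one namespace."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = [os.path.join(tmp, "dist_contextlib"),
                 os.path.join(tmp, "dist_unittest")]
        make_portion(roots[0], "contextlib")
        make_portion(roots[1], "unittest")
        sys.path[:0] = roots  # the parents of the portions go on sys.path
        try:
            importlib.invalidate_caches()
            first = importlib.import_module(PACKAGE + ".contextlib")
            second = importlib.import_module(PACKAGE + ".unittest")
            # The namespace package's __path__ has one entry per portion.
            portions = len(list(sys.modules[PACKAGE].__path__))
            return first.NAME, second.NAME, portions
        finally:
            # Undo the sys.path and sys.modules changes made for the demo.
            for root in roots:
                sys.path.remove(root)
            for name in [n for n in sys.modules if n.split(".")[0] == PACKAGE]:
                del sys.modules[name]
```

Neither distribution needs to know about the other, and there is no shared file to coordinate.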

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

Antoine Pitrou

2012-05-05 14:23:11 UTC

On Sat, 5 May 2012 23:55:23 +1000

Post by Nick Coghlan
A clean collaborative namespace system also helps with the evolution
of informal taxonomies on PyPI. You can see this on CPAN, where file
related modules are all in File::, email related ones are in Email::,
etc. At the moment, pretty much everything ends up being a top-level
module on PyPI, *because* namespace packages are so awkward and
unintuitive.

"Flat is better than nested" would indicate this is a virtue.

The stdlib's experiments with nested namespaces (e.g. urllib.request)
have turned out quite impractical and clumsy IMHO.

Also, namespace packages have an authority problem: what happens if two
namespace packages both define e.g. "foo/bar.py"? It works when you
have a central body, such as Zope or Eric's companies, but otherwise?
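The mechanical answer can be sketched as follows, assuming an interpreter that implements PEP 420 (Python 3.3 or later): the portion whose parent directory comes first on sys.path wins, and the other copy is silently shadowed. The package name `nsdemo_foo` and the directory names are hypothetical. The sketch only shows what the import system does; it says nothing about who should own the name.

```python
import importlib
import os
import sys
import tempfile

PACKAGE = "nsdemo_foo"  # hypothetical stand-in for "foo"


def demo_first_portion_wins():
    """Two portions both provide PACKAGE/bar.py; report which one is imported."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = [os.path.join(tmp, "first"), os.path.join(tmp, "second")]
        for root in roots:
            pkg_dir = os.path.join(root, PACKAGE)
            os.makedirs(pkg_dir)
            with open(os.path.join(pkg_dir, "bar.py"), "w") as f:
                # Record which portion this copy of bar.py came from.
                f.write("ORIGIN = %r\n" % os.path.basename(root))
        sys.path[:0] = roots
        try:
            importlib.invalidate_caches()
            bar = importlib.import_module(PACKAGE + ".bar")
            return bar.ORIGIN
        finally:
            # Undo the sys.path and sys.modules changes made for the demo.
            for root in roots:
                sys.path.remove(root)
            for name in [n for n in sys.modules if n.split(".")[0] == PACKAGE]:
                del sys.modules[name]
```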

Post by Nick Coghlan
If those of us that do stdlib backports like contextlib2, unittest2
and distutils2 could just as easily publish backports.contextlib,
backports.unittest and backports.packaging, that would make it *much*
clearer to the world what is going on.

Would it? Why would it go into the "backports" package? Why favour this
category over another (e.g. "testing.unittest")?
You're soon gonna re-discover the limitations of hierarchical
classification :-)

Post by Nick Coghlan
None of us are willing to do
that at the moment, because we'd have to coordinate the installation
of backports.__init__, instead of being able to just include an
additional directory in our path names.

Would you? You could just settle on the standard pkgutil boilerplate in
__init__.py.

Regards

Antoine.

Eric V. Smith

2012-05-05 14:37:39 UTC

Post by Antoine Pitrou
On Sat, 5 May 2012 23:55:23 +1000

Would you? You could just settle on the standard pkgutil boilerplate in
__init__.py.

The typical problem here is for system packagers (RPM, DEB, ...). The
shared __init__.py has to be removed from each individual package and
placed in a standalone package that all of the other packages have to
depend on. That's a lot of hassle, and one more roadblock to using
namespace packages. setuptools is some help here, but many people object
to using it.

All of the namespace PEPs address this problem by having no file that's
shared among all of the portions (to use the 382 and 420 term).

I think there's wide agreement that the import machinery should
understand namespace packages. You (Antoine) seem to be arguing against
it, but it's pretty well settled, if the PyCon discussions are
representative (which they may not be).

Eric.

Yury Selivanov

2012-05-05 16:06:48 UTC

Post by Nick Coghlan
Now rename zope/ to zope.pyp/ in a full Zope checkout and see how much
noise you get.

Why can't we modify whatever PEP to simply mark namespace package
with '__init__.pyp' or some other special file? Why rename directories,
introduce ugly suffixes, deal with all the weirdness of importing
just plain directories and guessing that they are namespace packages,
ignoring content in __init__.py etc, instead of plain simple file
marker?

In terms of steps (as Nick illustrated):

With PEP 382, the migration path is:
1. delete all __init__.py files from namespace package portions
2. rename the directories for all namespace package portions to append
the ".pyp" extension

With PEP 420, the migration path is:
1. delete all __init__.py files from namespace package portions
2. there is no step 2

With a marker:
1. $ mv __init__.py __init__.pyp
2. there is no step 2

The first step can be even replaced with
'$ rm __init__.py && touch __init__.pyp', as current __init__.py files
of namespace packages contain only '__path__ = extend_path(__path__ ...)'
crap.

-
Yury

Eric V. Smith

2012-05-05 16:20:24 UTC

Post by Yury Selivanov

Post by Nick Coghlan
Now rename zope/ to zope.pyp/ in a full Zope checkout and see how much
noise you get.

Because it doesn't solve the problem of wanting to distribute namespace
packages in pieces, using platform package managers, and installing them
all into the same directory. If you do this, your __init__.pyp would
need to be shipped with each portion's .rpm or .deb file. Platform
package managers don't typically like a single file being included with
multiple packages. You can factor it out into yet another package, but
then you need to have every namespace package portion depend on it.
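The conflict can be sketched with a toy model of package manifests. The package names and file lists below are hypothetical, and real package managers perform this check in their own way; the point is only that a shared marker file appears in more than one manifest while a markerless layout has no overlap.

```python
from collections import defaultdict


def conflicting_files(manifests):
    """Return {path: [packages]} for every path owned by more than one package."""
    owners = defaultdict(list)
    for package, files in sorted(manifests.items()):
        for path in files:
            owners[path].append(package)
    return {path: pkgs for path, pkgs in owners.items() if len(pkgs) > 1}


# Hypothetical manifests: every portion ships the same marker file.
MARKER_LAYOUT = {
    "python-zope-interface": {"zope/__init__.pyp", "zope/interface/__init__.py"},
    "python-zope-event": {"zope/__init__.pyp", "zope/event/__init__.py"},
}

# Hypothetical manifests for the markerless layout: nothing is shared.
PEP420_LAYOUT = {
    "python-zope-interface": {"zope/interface/__init__.py"},
    "python-zope-event": {"zope/event/__init__.py"},
}
```

Calling `conflicting_files(MARKER_LAYOUT)` reports the marker file as owned by both packages, while `conflicting_files(PEP420_LAYOUT)` returns an empty dictionary.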

This is described in PEP 420, and I think also 382.

Eric.

PJ Eby

2012-05-05 16:31:09 UTC

I just want to chime in at this point that PEP 402 actually provides
rationale to answer a lot of the questions that are coming up in this
thread, and which are still valid in a PEP 420 world. While some folks
have complained about PEP 402's length, they're mostly people who were
already present for all those discussions and hashing out of rationales.
;-) (I actually wrote 402 with the intent of answering as many as possible
of these objections in advance, hence the length.)

(On a more serious note, it might help to crib some bits of 402's rationale
arguments into 420, so that we don't have to keep answering already-dead
proposals that keep coming up, like, "why can't you just add a special file
named xyz to fix this".)

Post by Yury Selivanov

Post by Nick Coghlan
Now rename zope/ to zope.pyp/ in a full Zope checkout and see how much
noise you get.

Why can't we modify whatever PEP to simply mark namespace package
with '__init__.pyp' or some other special file? Why rename directories,
introduce ugly suffixes, deal with all the weirdness of importing
just plain directories and guessing that they are namespace packages,
ignoring content in __init__.py etc, instead of plain simple file
marker?

In terms of steps (as Nick illustrated):

With PEP 382, the migration path is:
1. delete all __init__.py files from namespace package portions
2. rename the directories for all namespace package portions to append
the ".pyp" extension

With PEP 420, the migration path is:
1. delete all __init__.py files from namespace package portions
2. there is no step 2

With a marker:
1. $ mv __init__.py __init__.pyp
2. there is no step 2

The first step can be even replaced with
'$ rm __init__.py && touch __init__.pyp', as current __init__.py files
of namespace packages contain only '__path__ = extend_path(__path__ ...)'
crap.
-
Yury

martin

2012-05-05 18:57:42 UTC

Post by Yury Selivanov
Why can't we modify whatever PEP to simply mark namespace package
with '__init__.pyp' or some other special file?

That file name would not work, as then portions of the namespace would
all install the same file, which causes conflicts in platform packaging
tools (if the portions get installed into the same sys.path entry).

Post by Yury Selivanov
Why rename directories,
introduce ugly suffixes, deal with all the weirdness of importing
just plain directories and guessing that they are namespace packages,
ignoring content in __init__.py etc, instead of plain simple file
marker?

Hence the current PEP doesn't propose to rename directories, and
does not introduce ugly suffixes. As for the weirdness of importing
just plain directories: yes, it does that.

Post by Yury Selivanov
1. delete all __init__.py files from namespace package portions
2. rename the directories for all namespace package portions to append
the ".pyp" extension

Please understand that an earlier version of the PEP did indeed
propose to use marker files instead of directories. You are, of
course, free to reiterate four years of discussion in a single
week, but please do familiarize yourself with the matter first.

After that, you likely have to write a PEP if you want your
idea to be seriously considered.

Regards,
Martin

Barry Warsaw

2012-05-05 19:32:25 UTC

Post by martin
Hence the current PEP doesn't propose to rename directories, and
does not introduce ugly suffixes. As for the weirdness of importing
just plain directories: yes, it does that.

Of course, the parents of directories have to be on sys.path, so it's not
*that* weird. ;)

-Barry

Eric V. Smith

2012-05-05 12:38:13 UTC

Post by Antoine Pitrou
If anything, I'd like to see data points about these "current version
control systems" being "pretty abysmal [!] when it comes to coping
with directory renames".

It's really irrelevant. The real deciding factor is that Guido didn't
like the scheme proposed in PEP 382, so he rejected it.

Right. I think arguing about VCS capabilities is pointless. You'll need
to convince Guido, instead.

Eric.

"Martin v. Löwis"

2012-05-07 08:01:47 UTC

Post by Antoine Pitrou
Unless there are clear advantages over PEP 382, I'm -1 on this PEP, and
would like to see PEP 382 revived.

When I started this project four years ago, I didn't know how involved
it would get. At first, there was little interest in it, but the more
details were discussed, the more opinions appeared. It eventually led
to PEPs, counter-PEPs, superseded PEPs. I had PEP czars signed up who
then resigned in the face of having to make a difficult decision.

At this point, I'm happy to have the PEP process. Guido will pronounce
on a PEP, and will (as usual) take both community feedback and his own
intuition into account. So there will be a decision, and then the
community will have to accept it (or else fork Python :-)

So while it is fine that people vote in favor or against individual
PEPs or selected features, they also need to realize that this may
not affect the outcome. Even writing yet another PEP likely will not
affect the outcome. While I'm honored with the support, I personally
have accepted that Guido has made up his mind on this specific detail.

Looking back, I also highly appreciate PJE's pioneering of all this
in setuptools (despite still disagreeing on many other aspects of
setuptools).

Regards,
Martin

PJ Eby

2012-05-07 13:42:05 UTC