Skip to content

Improved support for navigating trees:#903

Merged
jdavid merged 21 commits into
libgit2:masterfrom
koordinates:better-tree-nav
Jun 26, 2019
Merged

Improved support for navigating trees:#903
jdavid merged 21 commits into
libgit2:masterfrom
koordinates:better-tree-nav

Conversation

@rcoup

@rcoup rcoup commented Apr 26, 2019

Copy link
Copy Markdown
Contributor

Navigating tree objects is pretty crude via Pygit2 at the moment, requiring a lot of boilerplate lookups.

This change incorporates the same / operator approach as Pathlib and GitPython:

  • my_tree / 'path' resolves to a TreeEntry (same as my_tree['path'] currently)
  • If a TreeEntry is of type tree, then:
    • my_tree_entry / 'path' resolves to a child TreeEntry
    • my_tree_entry['path'] and my_tree_entry[0] similarly
    • 'path' in my_tree_entry returns a boolean.
  • this enables my_tree / 'path' / 'path' navigation

Ideally I'd suggest that maybe Tree should probably be a subclass of TreeEntry so it kinda magically works how the user expects, but that would require quite a few changes.

@jdavid

jdavid commented Apr 27, 2019

Copy link
Copy Markdown
Member

Looks good! Could you fix the tests in Travis? Don't worry about AppVeyor, they're broken for a different reason.
And update the documentation please. Thanks!

@rcoup

rcoup commented Apr 30, 2019

Copy link
Copy Markdown
Contributor Author

Been using this the last couple of days... it's much better to navigate the tree, but still a bit awkward to actually get to blobs.

Before:

meta_tree = repo[commit.tree['meta'].id]
features_tree = repo[meta_tree['features'].id]
info_blob = repo[features_tree['info.json'].id]
info = json.loads(info_blob.data)

Now:

feature_info = commit.tree / 'meta' / 'features' / 'info.json'
info = json.loads(repo[feature_info.id].data)

With latest commit incorporating .blob and .tree properties.

feature_info = commit.tree / 'meta' / 'features' / 'info.json'
info = json.loads(feature_info.blob.data)

Thoughts?

Submodules end up with a TreeEntry.type=commit, but I can't find a way to lookup the submodule from the oid, only from the absolute path (which we don't have in the TreeEntry, only the relative name) — so I've left that alone.

@rcoup

rcoup commented Apr 30, 2019

Copy link
Copy Markdown
Contributor Author

@jdavid can you give the ref-counting/free'ing a review?

  1. I'm Incrementing the repo refcount when it's associated with a TreeEntry
  2. and clearing it when the TreeEntry is dealloc'd.
  3. AFAICT using the various wrap_ methods means the libgit2-alloc'd items will get cleaned up
  4. I'm freeing the temporary git_tree subtrees I use in getitem/truediv

@rcoup rcoup force-pushed the better-tree-nav branch from 9fe3896 to 0613677 Compare April 30, 2019 13:55
Comment thread src/tree.c Outdated
Comment thread src/tree.c Outdated
@jdavid

jdavid commented May 4, 2019

Copy link
Copy Markdown
Member

Good work! The ref-counting/free'ing looks correct. Just made a couple of tiny comments.

About the user interface:

  • Would it be simpler for the user to have treeentry.obj instead of treentry.blob and treeentry.tree?
  • Why not to support treeentry / 0? That would make _getitem and _truediv to behave the same.

I think these changes would be nice for the user, and they would simplify a bit the code as well.

@dsully

dsully commented May 7, 2019

Copy link
Copy Markdown

These changes look fantastic!

@rcoup

rcoup commented May 8, 2019

Copy link
Copy Markdown
Contributor Author

I considered both of these, I'll add my thoughts below. Happy to be challenged.

Would it be simpler for the user to have treeentry.obj instead of treentry.blob and treeentry.tree?

I like the elegance, but I can't think of an actual use-case for it? Seems like either I'd expect a blob or another tree for any particular case? "Explicit is better than implicit" and all.

Commits are another type possibility wrt submodules, but I've left that for someone else when they need it.

Why not to support treeentry / 0? That would make _getitem and _truediv to behave the same.

If I'm iterating through everything in a directory then I'd always use an iterator or for/getitem. I know it makes it simpler for the implementation, but can't see how it helps the user?

@jdavid

jdavid commented May 10, 2019

Copy link
Copy Markdown
Member

The fact that these two changes make the code simpler is a big advantage. It's difficult to stress it enough, 150+ developers have contributed changes to pygit2, and today I maintain those changes/code (and when I leave someone else will do).

On the other side, the fact that we cannot imagine a use case, doesn't mean it doesn't exist, and more importantly it doesn't hurt, it's a feature superset anyway.

@rcoup

rcoup commented Jun 6, 2019

Copy link
Copy Markdown
Contributor Author

@jdavid sorry, been distracted with other work. I've made both the changes you suggested. 👍

rcoup added 8 commits June 10, 2019 11:14
- `my_tree / 'path'` resolves to a TreeEntry (same as `my_tree['path']`)
- If `my_tree_entry` is of type `tree`: `my_tree_entry / 'path'`, `my_tree_entry['path']`, and `my_tree_entry[0]` all resolve to another TreeEntry (same as `repo[my_tree_entry.id]['path']`)

This enables `my_tree / 'path' / 'path'` navigation ala Pathlib & GitPython.
Saves doing another `repo[tree_entry.id]` lookup

Tree navigation: `blob = (tree / 'path' / 'file').blob`
- drop .blob & .tree in favour of .obj
- use getitem implementation for truediv: allows `T / 'a' / 0` to correspond to `T['a'][0]`
@rcoup rcoup force-pushed the better-tree-nav branch from b7859ce to dc3d86e Compare June 10, 2019 10:15
@rcoup

rcoup commented Jun 10, 2019

Copy link
Copy Markdown
Contributor Author

Rebased on latest master

@rcoup

rcoup commented Jun 10, 2019

Copy link
Copy Markdown
Contributor Author

Don't worry about AppVeyor, they're broken for a different reason.

hmm, not so sure about that?

...
C:\Users\appveyor\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -IC:\projects\pygit2\venv\include -IC:\Python27\include -IC:\Python27\PC /Tcsrc\tree.c /Fobuild\temp.win32-2.7\Release\src\tree.obj
tree.c
src\tree.c(199) : error C2143: syntax error : missing ';' before 'type'
src\tree.c(200) : error C2065: 'err' : undeclared identifier
src\tree.c(201) : error C2065: 'err' : undeclared identifier
src\tree.c(208) : error C2143: syntax error : missing '{' before '*'
src\tree.c(220) : error C2143: syntax error : missing ';' before 'type'
src\tree.c(221) : error C2065: 'err' : undeclared identifier
src\tree.c(222) : error C2065: 'err' : undeclared identifier
src\tree.c(226) : warning C4133: 'return' : incompatible types - from 'PyObject *' to 'int *'
src\tree.c(240) : warning C4133: 'return' : incompatible types - from 'int *' to 'PyObject *'
src\tree.c(385) : warning C4028: formal parameter 1 different from declaration
src\tree.c(385) : warning C4133: 'initializing' : incompatible types - from 'TreeEntry *(__cdecl *)(TreeEntry *,PyObject *)' to 'binaryfunc'
src\tree.c(539) : warning C4090: 'function' : different 'const' qualifiers
src\tree.c(836) : warning C4028: formal parameter 1 different from declaration
src\tree.c(836) : warning C4133: 'initializing' : incompatible types - from 'TreeEntry *(__cdecl *)(Tree *,PyObject *)' to 'binaryfunc'
...

@rcoup

rcoup commented Jun 10, 2019

Copy link
Copy Markdown
Contributor Author

@jdavid at least one of the Windows build configs passed now 👍 Current appveyor failure looks intermittent, can you trigger a rebuild?

Config.__init__() does not always set self._config, which can cause
trouble during __del__(). This avoids seemingly spurious and not
really helpful exceptions.
Fixes libgit2#916
@jdavid

jdavid commented Jun 13, 2019

Copy link
Copy Markdown
Member

Thanks!

Yes I fixed AppVeyor in master. I've re-run the incomplete jobs several times, but there're still failures.

@rcoup

rcoup commented Jun 13, 2019

Copy link
Copy Markdown
Contributor Author

@jdavid aha, there's an actual compile error there now. Will figure it out somehow :)

rcoup added 4 commits June 14, 2019 15:56
- `my_tree / 'path'` resolves to a TreeEntry (same as `my_tree['path']`)
- If `my_tree_entry` is of type `tree`: `my_tree_entry / 'path'`, `my_tree_entry['path']`, and `my_tree_entry[0]` all resolve to another TreeEntry (same as `repo[my_tree_entry.id]['path']`)

This enables `my_tree / 'path' / 'path'` navigation ala Pathlib & GitPython.
Saves doing another `repo[tree_entry.id]` lookup

Tree navigation: `blob = (tree / 'path' / 'file').blob`
rcoup and others added 6 commits June 14, 2019 15:56
- drop .blob & .tree in favour of .obj
- use getitem implementation for truediv: allows `T / 'a' / 0` to correspond to `T['a'][0]`
@jdavid

jdavid commented Jun 16, 2019

Copy link
Copy Markdown
Member

fixed in commit f0f8cc2

@rcoup

rcoup commented Jun 25, 2019

Copy link
Copy Markdown
Contributor Author

@jdavid pulled your changes into here. ✅

@jdavid jdavid merged commit 0bbd717 into libgit2:master Jun 26, 2019
@jdavid

jdavid commented Jun 26, 2019

Copy link
Copy Markdown
Member

Merged thanks!

I still want to give a 2nd look. Specifically this adds a reference to the repository from TreeEntry. I wonder whether it would be possible to implement the same feature without that reference, and in a way that is as friendly to the user. But have not yet stopped to think about it, I'll open an issue if I've some idea.

@rcoup

rcoup commented Jun 26, 2019

Copy link
Copy Markdown
Contributor Author

Thanks ❤️

this adds a reference to the repository from TreeEntry. I wonder whether it would be possible to implement the same feature without that reference

Yeah, needs the repo reference to be able to look up the associated objects, but is a small wart implementation-wise.

One alternative is to keep a reference to the parent/root tree, store one repository reference there, and access everything that way — but then TreeEntry's would need a parent reference instead: I decided the outcome wasn't much different, though it's arguably more elegant (I guess that would allow support for mytreeentry / ".." / "foo" too).

Right at the start I mentioned that I think ideally Tree should be a subclass of TreeEntry so no matter where you start everything is the same, maybe that's something to change as well. Though right now at least they work effectively the same.

@rcoup rcoup deleted the better-tree-nav branch June 26, 2019 08:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants