The clean task is unreasonably slow because it does a lot of redundant
io. In this commit, I update clean to be implemented using globs. This
allows us to (optionally) route io through the file system cache. There
is a significant performance improvement to this change. Currently,
running clean on the sbt project takes O(6 seconds) on my machine. After
this change, it takes O(1 second).
To implement this, I added a new setting cleanKeepGlobs to replace
cleanKeepFiles. I don't think that cleanKeepFiles returning Seq[File] is
a big deal for performance because, by default, it just contains the
history file so there isn't much benefit to accessing a single file
through the cache. The reason I added the setting was more for
consistency and to help push people towards globs in their own task
implementations.
Part of the performance improvement comes from inverting the problem.
Before we would walk the file system tree from the base and recursively
delete leafs and nodes in a depth first traversal. Now we collect all of
the files that we are interested in deleting in advance. We then sort
the results lexically by path name and then perform the deletions in
that order. Because children will always comes first in this scheme,
this will generally allow us to delete a directory.
There is an edge case that if files are created in a subdirectory after
we've created the list to delete, but before the subdirectory is
deleted, then that subdirectory will not be deleted. In general, this
will tend to impact target/streams because writes occur to
target/streams during traversal. I don't think this really matters for
most users. If the target directory is being concurrently modified with
clean, then the user is doing something wrong.
To ensure legacy compatibility, I re-implement cleanKeepFiles to return
no files. Any plugin that was appending files to the cleanKeepFiles task
with `+=` or `++=` will continue working as before because I explicitly
add those files to the list to delete. I updated the actions/clean-keep
scripted test to use both cleanKeepFiles and cleanKeepGlobs to ensure
both tasks are correctly used.
Bonus: add debug logging of all deleted files
Right now, the sbt.internal.io.Source is something of a second class
citizen within sbt. Since sbt 0.13, there have been extension classes
defined that can convert a file to a PathFinder but no analog has been
introduced for sbt.internal.io.Source.
Given that sbt.internal.io.Source was not really intended to be part of
the public api (just look at its package), I think it makes sense to
just replace it with Glob. In this commit, I add extension
methods to Glob and Seq[Glob] that make it possible to easily
retrieve all of the files for a particular Glob within a task. The
upshot is that where previously, we'd have had to write something like:
watchSources += Source(baseDirectory.value / "src" / "main" / "proto", "*.proto", NothingFilter)
now we can write
watchGlobs += baseDirectory.value / "src" / "main" / "proto" * "*.proto"
Moreover, within a task, we can now do something like:
foo := {
val allWatchGlobs: Seq[File] = watchGlobs.value.all
println(allWatchSources.mkString("all watch source files:\n", "\n", ""))
}
Before we would have had to manually retrieve the files.
The implementation of the dsl uses the new GlobExtractor class which
proxies file look ups through a FileTree.Repository. This makes it so
that, by default, all file i/o using Sources will use the default
FileTree.Repository. The default is a macro that returns
`sbt.Keys.fileTreeRepository.value: @sbtUnchecked`. By doing it this
way, the default repository can only be used within a task definition
(since it delegates to `fileTreeRepository.value`). It does not,
however, prevent the user from explicitly providing a
FileTree.Repository instance which the user is free to instantiate
however they wish.
Bonus: optimize imports in Def.scala and Defaults.scala
The FileTreeViewConfig abstraction that I added was somewhat unwieldy
and confusing. The original intention was to provide users with a lot of
flexibility in configuring the global file tree repository used by sbt.
I don't think that flexibility is necessary and it was both conceptually
complicated and made the implementation complex. In this commit, I add a
new boolean flag enableGlobalCachingFileTreeRepository that toggles
which kind of FileTreeRepository to use globally.
There are actually three kinds of repositories that could be returned:
1) FileTreeRepository.default -- this caches the entire file system
tree it hooks into the cache's event callbacks to create a file event
monitor. It will be used if enableGlobalCachingFileTreeRepository is
true and Global / pollingGlobs := Nil
2) FileTreeRepository.hybrid -- similar to FileTreeRepository.default
except that it will not cache any files that are included in
Global / pollingGlobs. It will be used if
enableGlobalCachingFileTreeRepository is true and
Global / pollingGlobs is non empty
3) FileTreeRepository.legacy -- does not cache any of the file system
tree, but does maintain a persistent file monitoring process that is
implemented with a WatchServiceBackedObservable. Because it doesn't
poll, in general, it's ok to leave the monitoring on in the
background. One reason to use this is that if there are any issues
with the cache being unable to accurately mirror the underlying file
system tree, this repository will always poll the file system
whenever sbt requests the entries for a given glob. Moreover, the
file system tree implementation is very similar to the implementation
that was used in 1.2.x so this gives users a way to almost fully opt
back in to the old behavior.
I'd like to remove '---' and 'pair' in sbt 2 so I'm inlining the logic where
I find it. The '---' method is trivially implemented with a filter on
the sequence of files and filtering the output will not require io,
unlike '---'. For pair, I get confused every time I see it in the code
and it is rarely saving more than a line. While I understand that it may
have been convenient when the code using pair was originally written, I
don't think it is worth the maintenance cost. My specific issue is that
to me pair means tuple2, full stop. The definition of pair is:
def pair[T](mapper: File => Option[T], errorIfNone: Boolean = true): Seq[(File, T)]
First of all, it's not at all obvious when seen inline in the code that
it has the side effect of evaluating PathFinder.get. Moreover, it
doesn't return a general pair, it's a very specific pair with a File in
the first position. I just don't see how using pair improves upon, say:
val func: File => Option[(File, String)] = ???
globs.all.flatMap(func)
or
val func: File => Option[(File, String)] = ???
globs.all.map(f => func(f) match {
case Some(r) => r
case None => throw new IllegalStateException("Couldn't evaluate func for $f")
}) // or just define `func = File => (File, String)` and throw on an error
This new version of io breaks source and binary compatibility everywhere
that uses the register(path: Path, depth: Int) method that is defined on
a few interfaces because I changed the signature to register(glob:
Glob). I had to convert to using a glob everywhere that register was
called.
I also noticed a number of places where we were calling .asFile on a
file. This is redundant because asFile is an extension method on File
that just returns the underlying file.
Finally, I share the IOSyntax trait from io in AllSyntax. There was more
or less a TODO suggesting this change. The one hairy part is the
existence of the Alternative class. This class has unfortunately somehow
made it into the sbt package object. While I doubt many plugins are
using this, it doesn't seem worth breaking binary compatibility to get
rid of it. The issue is that while Alternative is defined private[sbt],
the alternative method in IOSyntax is public, so I can't get rid of
Alternative without breaking binary compatibility.
I'm not deprecating Alternative for now because the sbtProj still has
xfatal warnings on. I think in many, if not most, cases, the Alternative
class makes the code more confusing as is often the case with custom
operators. The confusion is mitigated if the abstraction is used only in
the file in which it's defined.
The io library now delegates to swoval for io which generally handles
relative sources correctly (by converting them to absolute paths
internally). This test started failing after the io bump because of
this. It seemed to me that relative sources not working was not a
feature so I just removed the test.
Fixes#4461
This opens up ExecuteProgress API that's been around under private[sbt].
Since the state passing mechanism hasn't been used, I got rid of it.
The build user can configure the build using two keys Boolean `taskProgress` and `State => Seq[TaskProgress]` `progressReports`. `useSuperShell` is lightweight key on/off switch for the super shell that can be used as follows:
```scala
Global / SettingKey[Boolean]("useSuperShell") := false
```
Although this is technically in the internal package, it is exposed to
users when they write a custom input task. I do not think that we should
prevent users/plugin authors from writing their own parser
implementations if there is a different library they prefer. By my
count, there are 21 implementations of this interface in sbt, so it's
unlikely that there is much benefit from a pattern matching perspective.
Fixes#256
JAVA_OPTS and SBT_OPTS are now read into an array first.
If `-mem` is passed, it will evict all memory related options,
and use the calculated memory options instead.
I often find that when I run a command it takes a long time to start up
because sbt triggers a full gc. To improve the ux, I update the command
exchange to run full gc only once while it's waiting for a command to
run and only after the user has been idle for at least one minute.
Bonus: optimize imports