I realized that Stamped.File was a bad interface that was really just an
implementation detail of external hooks. I updated the
GlobLister.{ all, unique } methods to return Seq[(Path, FileAttributes)]
rather than Stamped.File which is a much more natural api and one I
could see surviving the switch to nio based apis planned for
1.4.0/2.0.0. I also added a simple scripted test for glob listing. The
GlobLister.all method is implicitly tested all over the place since the
compile task uses it, but it's good to have an explicit test.
The caching repository does not work universally so set the default to
always poll. This is still faster than in sbt 1.2.x because of
performance improvements that I added for listing directories.
I decided that FileCacheEntry was a bad name because the methods did not
necessarily have anything to do with caching. Moreover, because it is
exposed in a public interface, it shouldn't be in the internal package.
Rather than exposing the FileEventMonitor.Event types, which are under
active development in the io repo, I am adding a new event trait to
FileCacheEntry. This trait doesn't expose any internal implementation
details.
Without this, the sbt io version is used by the compiler which means
that apis added in later versions of io are not available. I don't
understand why the transitive dependency on io is not used, but this
fixes the issue.
The equals method didn't work exactly the way I thought. By delegating
to the equivStamp object in sbt we can be more confident that it is
actually comparing the stamp values and not object references or
some other equals implementation.
Windows io really doesn't handle concurrent readers and writers all that
well. Using the LegacyFileTreeRepository was problematic in windows
scripted tests because even though the repository implementation did not
use the cache in its list methods, it did persistently monitor the
directories that were registered. The monitor has to do a lot of
io on a background thread to maintain the cache. This caused io
contention that would cause IO.createDirectory to fail with an obscure
AccessDeniedException. The way to avoid this is to prevent the
background io from occurring at all.
I don't necessarily think this will impact most users running sbt
interactively with a cache, but it did cause scripted tests to fail. For
that reason I made the default in non-interactive/shell use cases on
windows to be a PollingFileRepository which never monitors the file
system except when we are in a watch. The LegacyFileTreeRepository works
fine on mac and linux which have a more forgiving file system.
To make this work, I had to add FileManagement.toMonitoringRepository.
There are now two kinds of repositories that cannot monitor on their
own: HybridPollingFileTreeRepository and PollingFileRepository. The
FileManagement.toMonitoringRepository makes a new repository that turns
on monitoring for those two repository types and disables the close
method on all other repositories so that closing the FileEventMonitor
does not actually close the global file repository.
I ran into a couple of issues with the clean implementation. I changed
the logging to print to stdout instead of streams if enabled. I also
added a helper, Clean.deleteContents that recursively deletes all of the
contents of a directory except for those that match the exclude filter
parameter.
Using a normal logger was a bad idea because we are actually deleting
the target/streams directory when running clean.
The previous implementation worked by getting the full list of files to
delete, reverse sorting it and then deleting every element in the list.
While this can work well it many circumstances, if the directory is
still being written to during the recursive deletion, then we could miss
files that were added after we fetched all of the files. The new version
lazily lists the subdirectories as needed.
The Defaults.scala file has a lot going on. I am trying to generally
follow the pattern of implementing the default task implementation in a
different file and just adding the appropriate declarations in
Defaults.scala.
This rewroks the cleanTask so that it only removes a subset of the
files in the target directory. To do this, I add a new task, outputs,
that returns the glob representation of the possible output files for
the task. It must be a task because some outputs will depend on streams.
For each project, the default outputs are all of the files in
baseDirectory / target.
Long term, we could enhance the clean task to be automatically generated
in any scope (as an input task). We could then add the option for the
task scoped clean to delete all of the transitive outputs of the class.
That is beyond the scope of this commit, however.
I copied the scripted tests from #3678 and added an additional test to
make sure that the manage source directory was explicitly cleaned.
The clean task is unreasonably slow because it does a lot of redundant
io. In this commit, I update clean to be implemented using globs. This
allows us to (optionally) route io through the file system cache. There
is a significant performance improvement to this change. Currently,
running clean on the sbt project takes O(6 seconds) on my machine. After
this change, it takes O(1 second).
To implement this, I added a new setting cleanKeepGlobs to replace
cleanKeepFiles. I don't think that cleanKeepFiles returning Seq[File] is
a big deal for performance because, by default, it just contains the
history file so there isn't much benefit to accessing a single file
through the cache. The reason I added the setting was more for
consistency and to help push people towards globs in their own task
implementations.
Part of the performance improvement comes from inverting the problem.
Before we would walk the file system tree from the base and recursively
delete leafs and nodes in a depth first traversal. Now we collect all of
the files that we are interested in deleting in advance. We then sort
the results lexically by path name and then perform the deletions in
that order. Because children will always comes first in this scheme,
this will generally allow us to delete a directory.
There is an edge case that if files are created in a subdirectory after
we've created the list to delete, but before the subdirectory is
deleted, then that subdirectory will not be deleted. In general, this
will tend to impact target/streams because writes occur to
target/streams during traversal. I don't think this really matters for
most users. If the target directory is being concurrently modified with
clean, then the user is doing something wrong.
To ensure legacy compatibility, I re-implement cleanKeepFiles to return
no files. Any plugin that was appending files to the cleanKeepFiles task
with `+=` or `++=` will continue working as before because I explicitly
add those files to the list to delete. I updated the actions/clean-keep
scripted test to use both cleanKeepFiles and cleanKeepGlobs to ensure
both tasks are correctly used.
Bonus: add debug logging of all deleted files
Right now, the sbt.internal.io.Source is something of a second class
citizen within sbt. Since sbt 0.13, there have been extension classes
defined that can convert a file to a PathFinder but no analog has been
introduced for sbt.internal.io.Source.
Given that sbt.internal.io.Source was not really intended to be part of
the public api (just look at its package), I think it makes sense to
just replace it with Glob. In this commit, I add extension
methods to Glob and Seq[Glob] that make it possible to easily
retrieve all of the files for a particular Glob within a task. The
upshot is that where previously, we'd have had to write something like:
watchSources += Source(baseDirectory.value / "src" / "main" / "proto", "*.proto", NothingFilter)
now we can write
watchGlobs += baseDirectory.value / "src" / "main" / "proto" * "*.proto"
Moreover, within a task, we can now do something like:
foo := {
val allWatchGlobs: Seq[File] = watchGlobs.value.all
println(allWatchSources.mkString("all watch source files:\n", "\n", ""))
}
Before we would have had to manually retrieve the files.
The implementation of the dsl uses the new GlobExtractor class which
proxies file look ups through a FileTree.Repository. This makes it so
that, by default, all file i/o using Sources will use the default
FileTree.Repository. The default is a macro that returns
`sbt.Keys.fileTreeRepository.value: @sbtUnchecked`. By doing it this
way, the default repository can only be used within a task definition
(since it delegates to `fileTreeRepository.value`). It does not,
however, prevent the user from explicitly providing a
FileTree.Repository instance which the user is free to instantiate
however they wish.
Bonus: optimize imports in Def.scala and Defaults.scala
The FileTreeViewConfig abstraction that I added was somewhat unwieldy
and confusing. The original intention was to provide users with a lot of
flexibility in configuring the global file tree repository used by sbt.
I don't think that flexibility is necessary and it was both conceptually
complicated and made the implementation complex. In this commit, I add a
new boolean flag enableGlobalCachingFileTreeRepository that toggles
which kind of FileTreeRepository to use globally.
There are actually three kinds of repositories that could be returned:
1) FileTreeRepository.default -- this caches the entire file system
tree it hooks into the cache's event callbacks to create a file event
monitor. It will be used if enableGlobalCachingFileTreeRepository is
true and Global / pollingGlobs := Nil
2) FileTreeRepository.hybrid -- similar to FileTreeRepository.default
except that it will not cache any files that are included in
Global / pollingGlobs. It will be used if
enableGlobalCachingFileTreeRepository is true and
Global / pollingGlobs is non empty
3) FileTreeRepository.legacy -- does not cache any of the file system
tree, but does maintain a persistent file monitoring process that is
implemented with a WatchServiceBackedObservable. Because it doesn't
poll, in general, it's ok to leave the monitoring on in the
background. One reason to use this is that if there are any issues
with the cache being unable to accurately mirror the underlying file
system tree, this repository will always poll the file system
whenever sbt requests the entries for a given glob. Moreover, the
file system tree implementation is very similar to the implementation
that was used in 1.2.x so this gives users a way to almost fully opt
back in to the old behavior.
I'd like to remove '---' and 'pair' in sbt 2 so I'm inlining the logic where
I find it. The '---' method is trivially implemented with a filter on
the sequence of files and filtering the output will not require io,
unlike '---'. For pair, I get confused every time I see it in the code
and it is rarely saving more than a line. While I understand that it may
have been convenient when the code using pair was originally written, I
don't think it is worth the maintenance cost. My specific issue is that
to me pair means tuple2, full stop. The definition of pair is:
def pair[T](mapper: File => Option[T], errorIfNone: Boolean = true): Seq[(File, T)]
First of all, it's not at all obvious when seen inline in the code that
it has the side effect of evaluating PathFinder.get. Moreover, it
doesn't return a general pair, it's a very specific pair with a File in
the first position. I just don't see how using pair improves upon, say:
val func: File => Option[(File, String)] = ???
globs.all.flatMap(func)
or
val func: File => Option[(File, String)] = ???
globs.all.map(f => func(f) match {
case Some(r) => r
case None => throw new IllegalStateException("Couldn't evaluate func for $f")
}) // or just define `func = File => (File, String)` and throw on an error
This new version of io breaks source and binary compatibility everywhere
that uses the register(path: Path, depth: Int) method that is defined on
a few interfaces because I changed the signature to register(glob:
Glob). I had to convert to using a glob everywhere that register was
called.
I also noticed a number of places where we were calling .asFile on a
file. This is redundant because asFile is an extension method on File
that just returns the underlying file.
Finally, I share the IOSyntax trait from io in AllSyntax. There was more
or less a TODO suggesting this change. The one hairy part is the
existence of the Alternative class. This class has unfortunately somehow
made it into the sbt package object. While I doubt many plugins are
using this, it doesn't seem worth breaking binary compatibility to get
rid of it. The issue is that while Alternative is defined private[sbt],
the alternative method in IOSyntax is public, so I can't get rid of
Alternative without breaking binary compatibility.
I'm not deprecating Alternative for now because the sbtProj still has
xfatal warnings on. I think in many, if not most, cases, the Alternative
class makes the code more confusing as is often the case with custom
operators. The confusion is mitigated if the abstraction is used only in
the file in which it's defined.
The io library now delegates to swoval for io which generally handles
relative sources correctly (by converting them to absolute paths
internally). This test started failing after the io bump because of
this. It seemed to me that relative sources not working was not a
feature so I just removed the test.
Fixes#4461
This opens up ExecuteProgress API that's been around under private[sbt].
Since the state passing mechanism hasn't been used, I got rid of it.
The build user can configure the build using two keys Boolean `taskProgress` and `State => Seq[TaskProgress]` `progressReports`. `useSuperShell` is lightweight key on/off switch for the super shell that can be used as follows:
```scala
Global / SettingKey[Boolean]("useSuperShell") := false
```
Although this is technically in the internal package, it is exposed to
users when they write a custom input task. I do not think that we should
prevent users/plugin authors from writing their own parser
implementations if there is a different library they prefer. By my
count, there are 21 implementations of this interface in sbt, so it's
unlikely that there is much benefit from a pattern matching perspective.