The new io verion removes the PathFinder <-> Glob implicit translations.
It also has a number of small bug fixes related to directory listing via
FileTreeView.
I noticed that when I ran scripted locally (with no arguments) it was
hard to tell if they succeeded or not because the last test it ran was
metadata-only-resolver which had no valid test files and dumped a bunch
of lines to stderr. I noticed there were a number of tests that had
files named disabled but no other tests with a test named disable so I
renamed metadata-only-resolver to disabled..
I decided that there were too many settings related to the file
management that did similar things and had similar names but did
slightly different things. To improve this, I introduce the ChangedFiles
class to sbt.nio.file and switch to having just two task for file input
and output retrieval: all(Input|Output)Files and
changed(Input|Output)Files. If, for example, changedInputFiles returns
None that means that either the task has not yet been run or there were
no changes. If there have been any changes, then it will return
Some(changes) and the user can extract the relevant changes that they
are interested in.
The code may be slightly more verbose in a few places, but I think it's
worth it for the conceptual clarity.
This commit unifies my previous work for automatically watching the
input files for a task with support for automatically tracking and
cleaning up the output files of a task. The big idea is that users may
want to define tasks that depend on the file outputs of other tasks and
we may not want to run the dependent tasks if the output files of the
parent tasks are unmodified.
For example, suppose we wanted to make a plugin for managing typescript
files. There may be, say, two tasks with the following inputs and
outputs:
compileTypescript = taskKey[Unit]("shells out to compile typescript files")
fileInputs -- sourceDirectory / ** / "*.ts"
fileOutputs -- target / "generated-js" / ** / "*.js"
minifyGeneratedJS = taskKey[Path]("minifies the js files generated by compileTypescript to a single combined js file.")
dependsOn: compileTypeScript / fileOutputs
Given a clean build, the following should happen
> minifyGeneratedJS
// compileTypescript is run
// minifyGeneratedJS is run
> minifyGeneratedJS
// no op because nothing changed
> minifyGeneratedJS / clean
// removes the file returned by minifyGeneratedJS.previous
> minifyGeneratedJS
// re-runs minifyGeneratedJS with the previously compiled js artifacts
> compileTypescript / clean
// removes the generated js files
> minifyGeneratedJS
// compileTypescript is run because the previous clean removed the generated js files
// minifyGeneratedJS runs because the artifacts have changed
> clean
// removes the generated js files and the minified js file
> minifyGeneratedJS
// compileTypescript is run because the generated js files were
// minifyGeneratedJS is run both because it was removed and
Moreover, if compileTypescript fails, we want minifyGeneratedJS to fail
as well.
This commit makes this all possible. It adds a number of tasks to
sbt.nio.Keys that deal with the output files. When injecting settings, I
now identify all tasks that return Seq[File], File, Seq[Path] and Path
and create a hidden special task: dynamicFileOutputs: TaskKey[Seq[Path]]
This special task runs the underlying task and converts the result to
Seq[Path]. From there, we can have the tasks like changedOutputPaths
delegate to dynamicFileOutputs which, by proxy, runs the underlying
task. If any task in the input / output chain fails, the entire sequence
fails.
Unlike the fileInputs, we do not register the dynamicFileOutputs or
fileOutputs with continuous watch service so these paths will not
trigger a continuous build if they are modified. Only explicit unmanaged
input sources should should do that.
As part of this, I also added automatic generation of a custom clean task for
any task that returns Seq[File], File, Seq[Path] or Path. I also added
aggregation so that clean can be defined in a configuration or project
and it will automatically run clean for all of the tasks that have a
custom clean implementation in that task or project. The automatic clean
task will only delete files that are in the task target directory to
avoid accidentally deleting unmanaged files.
This commit refactors things so that the nio apis are located primarily
in the nio package. Because the nio keys are a first class sbt feature,
I had to add import sbt.nio._ and sbt.nio.Keys._ to the autoimports in
BuildUtil.scala
In my recent changes to watch, I have been moving towards a world in
which sbt manages the file inputs and outputs at the task level. The
main idea is that we want to enable a user to specify the inputs and
outputs of a task and have sbt able to track those inputs across
multiple task evaluations. Sbt should be able to automatically trigger a
build when the inputs change and it also should be able to avoid task
evaluation if non of the inputs have changed.
The former case of having sbt automatically watch the file inputs of a
task has been present since watch was refactored. In this commit, I
make it possible for the user to retrieve the lists of new, modified and
deleted files. The user can then avoid task evaluation if none of the
inputs have changed.
To implement this, I inject a number of new settings during project
load if the fileInputs setting is defined for a task. The injected
settings are:
allPathsAndAttributes -- this retrieves all of the paths described by
the fileInputs for the task along with their attributes
fileStamps -- this retrieves all of the file stamps for the files
returned by allPathsAndAttributes
Using these two injected tasks, I also inject a number of derived tasks,
such as allFiles, which returns all of the regular files returned by
allPathsAndAttributes and changedFiles, which returns all of the regular
files that have been modified since the last run.
Using these injected settings, the user is able to write tasks that
avoid evaluation if the inputs haven't changed.
foo / fileInputs += baseDirectory.value.toGlob / ** / "*.scala"
foo := {
foo.previous match {
case Some(p) if (foo / changedFiles).value.isEmpty => p
case _ => fooImpl((foo / allFiles).value
}
}
To make this whole mechanism work, I add a private task key:
val fileAttributeMap = taskKey[java.util.HashMap[Path, Stamp]]("...")
This keeps track of the stamps for all of the files that are managed by
sbt. The fileStamps task will first look for the stamp in the attribute
map and, only if it is not present, it will update the cache. This
allows us to ensure that a given file will only be stamped once per task
evaluation run no matter how the file inputs are specified. Moreover, in
a continuous build, I'm able to reuse the attribute map which can
significantly reduce latency because the default file stamping
implementation used by zinc is fairly expensive (it can take anywhere
between 300-1500ms to stamp 5000 8kb source files on my mac).
I also renamed some of the watch related keys to be a bit more clear.
The newest version of io repackages a number of classes into the
sbt.nio.* packages. It also changes some of the semantics of glob
related apis. This commit updates all of the usages of the updated apis
within sbt but should have no functional difference.
This adds dependency to LM implemented using Coursier.
I had to copy paste a bunch of code from sbt-coursier-shared to break the dependency to sbt.
`Global / useCoursier := false` or `-Dsbt.coursier=false` be used to opt-out of using Coursier for the dependency resolution.
I discovered that it wasn't possible to call .previous in an input task.
While I understand why you can't call .previous on an InputKey, I think
it makes sense to allow calling .previous on a TaskKey within an input
task.
Fixes#4438
This slims down update's UpdateReport by removing evicted modules
caller information. The larger the graph, the effect would be more
pronounced. For example, I saw a graph reduce from 5.9MB to 1.1MB in JSON file.
It was reported in https://github.com/sbt/sbt/issues/4608 that there was
a regression that tests run against scala 2.11 would fail. This was
because the interface loader incorrectly contained the scala library. To
fix this, I needed to find the xsbt.boot.BootFilteredLoader in the
classloading hierarchy and put the sbt testing interface library in
between that loader and the scala library loader.
Prior to this commit, it was difficult to prevent the sbt metabuild
classpath from leaking into the runtime and test classpaths. The biggest
issue is that the test-inferface jar was located in the metabuild
classpath. We tried to prevent leakage using the DualClassLoader, but
this was an ugly solution that did not seem to work reliably. The fix is
to modify the actual sbt metabuild classloader provided by the sbt
launcher.
To do this, I add a new classloader SbtMetaClassLoader that isolates the
test-interface jar from the rest of the classpath. I modify xMain to
create a new AppConfiguration that uses this new classloader and
use reflection to invoke the sbt main method using the new classloader.
Not only do I think that this is a much saner solution than DualLoaders,
I accidentally fixed#4575 with this change.
It is possible with the new layering strategies that tests may fail if a
java package private class is accessed across classloader layers. This
will result in an IllegalAccessError that is hard to debug. With this
commit, I add an error message that will be displayed if run throws an
IllegalAccessError that suggests that the user try the
ScalaInstance layering strategy or the flat layering strategy.
I wrote this check in a rush and realized that it didn't quite match the
correct glob semantics. The depth parameter is effectively the index of
the array of sorted child directories of the base. That index is
computed with getNameCount - 1, not getNameCount. It is also inclusive,
not exclusive, hence the switch from `<` to `<=`.
This change was motivated by my reviewing the initial change in the
context of the fix to https://github.com/sbt/sbt/issues/4591.
This commit reworks the watch start message so that instead of printing
something like:
[info] [watch] 1. Waiting for source changes... (press 'r' to re-run the command, 'x' to exit sbt or 'enter' to return to the shell)
it instead prints something like:
[info] 1. Monitoring source files for updates...
[info] Project: filesJVM
[info] Command: compile
[info] Options:
[info] <enter>: return to the shell
[info] 'r': repeat the current command
[info] 'x': exit sbt
It will also print which path triggered the build.
Prior to this commit, it was necessary to add breadcrumbs for every
input that is used within a dynamic task. In this commit, I rework the
watch setup so that we can track the dynamic inputs that are used. To
simplify the discussion, I'm going to ignore aggregation and
multi-commands, but they are both supported. To implement this change, I
update the GlobLister.all method to take a second implicit argument:
DynamicInputs. This is effectively a mutable Set of Globs that is
updated every time a task looks up files from a glob. The repository.get
method should already register the glob with the repository. The set of
globs are necessary because the repository may not do any file filtering
so the file event monitor needs to check the input globs to ensure that
the file event is for a file that actually requested by a task during
evaluation.
* Long term, I plan to add support for lifting tasks into a dynamic task
in a way that records _all_ of the possible dependencies for the task
through each of the dynamic code paths. We should revisit this change to
determine if its still necessary after that change.
This commit cleans up the approach for transforming the sbt state upon
completion of a task returning State. I add a new approach where a task
can return an instance of StateTransform, which is just a wrapper around
State. I then update EvaluateTask to apply this stateTransform rather
than the (optional) state transformation that may be stored in the Task
info parameter. By requiring that the user return StateTransform rather
than State directly, we ensure that no existing tasks that depend on the
state transformation function embedded in the Task info break. In sbt 2,
I could see the possibility of making this automatic (and probably
removing the state transformation function via attribute).
The problem with using the transformState attribute key is that it is
applied non-deterministically. This means that if you decorate a task
returning State, then the state transformation may or may not be
correctly applied.
I tracked this non-determinism down to the stateTransform
method in EvaluateTask. It iterates through the task result map and
chains all of the defined transformState attribute values. Because the
result is a map, this order is not specified. This chaining is arguably
a bad design because State => State does not imply commutivity. Indeed,
the problem here was that my state transformation functions were
constant functions, which are obviously non-commutative. I believe that
this logic likely written under the assumption that there would be no
more than one of these tranformations in a given result map.
I decided that it makes sense to move all of the new watch code out of
the Watched companion object since the Watched trait itself is now
deprecated. I don't really like having the new code in Watched.scala
mixed with the legacy code, so I pulled it all out and moved it into the
Watch object. Since we have to put all of the logic for the Continuous
object in main in order to access the sbt.Keys object, it makes sense to
move the logic out of main-command and into command so that most of the
watch related logic is in the same subproject.
This is a huge refactor of Watched. I produced this through multiple
rewrite iterations and it was too difficult to separate all of the
changes into small individual commits so I, unfortunately, had to make a
massive commit. In general, I have tried to document the source code
extensively both to facilitate reading this commit and to help with
future maintenance.
These changes are quite complicated because they provided a built-in
like api to a feature that is implemented like a plugin. In particular,
we have to manually do a lot of parsing as well as roll our own
task/setting evaluation because we cannot infer the watch settings at
project build time because we do not know a priori what commands the
user may watch in a given session. The dynamic setting and task
evaluation is mostly confined to the WatchSettings class in Continuous.
It feels dirty to do all of this extraction by hand, but it does seem to
work correctly with scopes.
At a high level this commit does four things:
1) migrate the watch implementation to using the InputGraph to collect
the globs that it needs to monitor during the watch
2) simplify WatchConfig to make it easier for plugin authors to write
their own custom watch implementations
3) allow configuration of the watch settings based on the task(s) that
is/are being run
4) adds an InputTask implemenation of watch.
Point #1 is mostly handled by Point #3 since I had to overhaul how _all_
of the watch settings are generated. InputGraph already handles both
transitive inputs and triggers as well as legacy watchSources so not
much additional logic is needed beyond passing the correct scoped keys
into InputGraph.
Point #3 require some structural changes. The watch settings cannot in
general be defined statically because we don't know a priori what tasks
the user will try and watch. To address this, I added code that will
extract the task keys for all of the commands that we are running. I
then manually extract the relevant settings for each command. Finally, I
aggregate those settings into a single WatchConfig that can be used to
actually implement the watch. The aggregation is generally
straightforward: we run all of the callbacks for each task and choose
the next watch state based on the highest priority Action that is
returned by any of the callbacks.
Because I needed Extracted to pull out the necessary settings, I was
forced to move a lot of logic out of Watched and into a new singleton,
Continuous, that exists in the main project (Watched is in the command
project). The public footprint of Continuous is tiny. Even though I want
to make the watch feature flexible for plugin authors, the
implementation and api remain a moving target so I do not want to be
limited by future binary compatibility requirements. Anyone who wants to
live dangerously can access the private[sbt] apis via reflection or by
adding custom code to the sbt package in their plugin (a technique I've
used in CloseWatch).
Point #2 is addressed by removing the count and lastStatus from the
WatchConfig callbacks. While these parameters can be useful, they are
not necessary to implement the semantics of a watch. Moreover, a status
boolean isn't really that useful and the sbt task engine makes it very
difficult to actually extract the previous result of the tasks that were
run. After this refactor, WatchConfig has a simpler api. There are fewer
callbacks to implement and the signatures are simpler. To preserve the
_functionality_ of making the count accessible to the user specifiable
callbacks, I still provided settings like watchOnInputEvent that accept
a count parameter, but the count is actually tracked externally to
Watched.watch and incremented every time the task is run.
Moreover, there are a few parameters of the watch: the logger and
transitive globs, that cannot be provided via settings. I provide
callback settings like watchOnStart that mirror the WatchConfig
callbacks except that they return a function from Continuous.Arguments
to the needed callback. The Continuous.aggregate function will check if
the watchOnStart setting is set and if it is, will pass in the needed
arguments. Otherwise it will use the default watchOnStart implementation
which simulates the existing behavior by tracking the iteration count in
an AtomicInteger and passing the current count into the user provided
callback. In this way, we are able to provide a number of apis to the
watch process while preserving the default behavior.
To implement #4, I had to change the label of the `watch` attribute key
from "watch" to "watched". This allows `watch compile` to work at the
sbt command line even thought it maps to the watchTasks key. The actual
implementation is almost trivial. The difference between an
InputTask[Unit] and a command is very small. The tricky part is that the
actual implementation requires applying mapTask to a delegate task that
overrides the Task's info.postTransform value (which is used to
transform the state after task evaluation). The actual postTransform
function can be shared by the continuous task and continuous command.
There is just a slightly different mechanism for getting to the state
transformation function.
This commit adds functionality to traverse the settings graph to find
all of the Inputs settings values for the transitive dependencies of the
task. We can use this to build up the list of globs that we must watch
when we are in a continuous build. Because the Inputs key is a setting,
it is actually quite fast to fetch all the values once the compiled map
is generated (O(2ms) in the scripted tests, though I did find that it
took O(20ms) to generate the compiled map).
One complicating factor is that dynamic tasks do not track any of
their dynamic dependencies. To work around this, I added the
transitiveDependencies key. If one does something like:
foo := {
val _ = bar / transitiveDependencies
val _ = baz / transitiveDependencies
if (System.getProperty("some.prop", "false") == "true") Def.task(bar.value)
else Def.task(baz.value)
}
then (foo / transitiveDependencies).value will return all of the inputs
and triggers for bar and baz as well as for foo.
To implement transitiveDependencies, I did something fairly similar to
streams where if the setting is referenced, I add a default
implementation. If the default implementation is not present, I fall
back on trying to extract the key from the commandLine. This allows the
user to run `show bar / transitiveDependencies` from the command line
even if `bar / transitiveDependencies` is not defined in the project.
It might be possible to coax transitiveDependencies into a setting, but
then it would have to be eagerly evaluated at project definition time
which might increase start up time too much. Alternatively, we could
just define this task for every task in the build, but I'm not sure how
expensive that would be. At any rate, it should be straightforward to
make that change without breaking binary compatibility if need be. This
is something to possibly explore before the 1.3 release if there is any
spare time (unlikely).
In order to walk the full dependency graph of a task, we need to know
the internal class path dependency configurations. Suppose that we have
projects a and b where b depends on *->compile in a. If we want to find
all of the inputs for b, then if we find that there is a dependency on
b/Compile/internalDependencyClasspath, then we must add a / Compile /
internalDependencyClasspath to the list of dependencies for the task.
I copied the setup of one of the other scripted tests that was
introduced to test the track-internal-dependencies feature to write a
basic scripted test for this new key and implementation.
This adds two new tasks: fileInputs and watchTriggers, that will be used by sbt
both to fetch files within a task as well as to create watch sources for
continuous builds. In a subsequent commit, I will add a task for a
command that will traverse the task dependency graph to find all of the
input task dependency scopes. The idea is to make it possible to easily
and accurately specify the watch sources for a task. For example, we'd
be able to write something like:
val foo = taskKey[Unit]("print text file contents")
foo / fileInputs := baseDirectory.value ** "*.txt"
foo := {
(foo / fileInputs).value.all.foreach(f =>
println(s"$f:\n${new String(java.nio.Files.readAllBytes(f.toPath))}"))
}
If the user then runs `~foo`, then the task should trigger if the user
modifies any file with the "txt" extension in the project directory.
Today, the user would have to do something like:
val fooInputs = settingKey[Seq[Source]]("the input files for foo")
fooInputs := baseDirectory.value ** "*.txt"
val foo = taskKey[Unit]("print text file contents")
foo := {
fooInputs.value.all.foreach(f =>
println(s"$f:\n${new String(java.nio.Files.readAllBytes(f.toPath))}"))
}
watchSources ++= fooInputs.value
or even worse:
val foo = taskKey[Unit]("print text file contents")
foo := {
(baseDirectory.value ** "*.txt").all.foreach(f =>
println(s"$f:\n${new String(java.nio.Files.readAllBytes(f.toPath))}"))
}
watchSources ++= baseDirectory.value ** "*.txt"
which makes it possible for the watchSources and the task sources to get
out of sync.
For consistency, I also renamed the `outputs` key `fileOutputs`.
I realized that Stamped.File was a bad interface that was really just an
implementation detail of external hooks. I updated the
GlobLister.{ all, unique } methods to return Seq[(Path, FileAttributes)]
rather than Stamped.File which is a much more natural api and one I
could see surviving the switch to nio based apis planned for
1.4.0/2.0.0. I also added a simple scripted test for glob listing. The
GlobLister.all method is implicitly tested all over the place since the
compile task uses it, but it's good to have an explicit test.
This rewroks the cleanTask so that it only removes a subset of the
files in the target directory. To do this, I add a new task, outputs,
that returns the glob representation of the possible output files for
the task. It must be a task because some outputs will depend on streams.
For each project, the default outputs are all of the files in
baseDirectory / target.
Long term, we could enhance the clean task to be automatically generated
in any scope (as an input task). We could then add the option for the
task scoped clean to delete all of the transitive outputs of the class.
That is beyond the scope of this commit, however.
I copied the scripted tests from #3678 and added an additional test to
make sure that the manage source directory was explicitly cleaned.
The clean task is unreasonably slow because it does a lot of redundant
io. In this commit, I update clean to be implemented using globs. This
allows us to (optionally) route io through the file system cache. There
is a significant performance improvement to this change. Currently,
running clean on the sbt project takes O(6 seconds) on my machine. After
this change, it takes O(1 second).
To implement this, I added a new setting cleanKeepGlobs to replace
cleanKeepFiles. I don't think that cleanKeepFiles returning Seq[File] is
a big deal for performance because, by default, it just contains the
history file so there isn't much benefit to accessing a single file
through the cache. The reason I added the setting was more for
consistency and to help push people towards globs in their own task
implementations.
Part of the performance improvement comes from inverting the problem.
Before we would walk the file system tree from the base and recursively
delete leafs and nodes in a depth first traversal. Now we collect all of
the files that we are interested in deleting in advance. We then sort
the results lexically by path name and then perform the deletions in
that order. Because children will always comes first in this scheme,
this will generally allow us to delete a directory.
There is an edge case that if files are created in a subdirectory after
we've created the list to delete, but before the subdirectory is
deleted, then that subdirectory will not be deleted. In general, this
will tend to impact target/streams because writes occur to
target/streams during traversal. I don't think this really matters for
most users. If the target directory is being concurrently modified with
clean, then the user is doing something wrong.
To ensure legacy compatibility, I re-implement cleanKeepFiles to return
no files. Any plugin that was appending files to the cleanKeepFiles task
with `+=` or `++=` will continue working as before because I explicitly
add those files to the list to delete. I updated the actions/clean-keep
scripted test to use both cleanKeepFiles and cleanKeepGlobs to ensure
both tasks are correctly used.
Bonus: add debug logging of all deleted files
Right now, the sbt.internal.io.Source is something of a second class
citizen within sbt. Since sbt 0.13, there have been extension classes
defined that can convert a file to a PathFinder but no analog has been
introduced for sbt.internal.io.Source.
Given that sbt.internal.io.Source was not really intended to be part of
the public api (just look at its package), I think it makes sense to
just replace it with Glob. In this commit, I add extension
methods to Glob and Seq[Glob] that make it possible to easily
retrieve all of the files for a particular Glob within a task. The
upshot is that where previously, we'd have had to write something like:
watchSources += Source(baseDirectory.value / "src" / "main" / "proto", "*.proto", NothingFilter)
now we can write
watchGlobs += baseDirectory.value / "src" / "main" / "proto" * "*.proto"
Moreover, within a task, we can now do something like:
foo := {
val allWatchGlobs: Seq[File] = watchGlobs.value.all
println(allWatchSources.mkString("all watch source files:\n", "\n", ""))
}
Before we would have had to manually retrieve the files.
The implementation of the dsl uses the new GlobExtractor class which
proxies file look ups through a FileTree.Repository. This makes it so
that, by default, all file i/o using Sources will use the default
FileTree.Repository. The default is a macro that returns
`sbt.Keys.fileTreeRepository.value: @sbtUnchecked`. By doing it this
way, the default repository can only be used within a task definition
(since it delegates to `fileTreeRepository.value`). It does not,
however, prevent the user from explicitly providing a
FileTree.Repository instance which the user is free to instantiate
however they wish.
Bonus: optimize imports in Def.scala and Defaults.scala
This new version of io breaks source and binary compatibility everywhere
that uses the register(path: Path, depth: Int) method that is defined on
a few interfaces because I changed the signature to register(glob:
Glob). I had to convert to using a glob everywhere that register was
called.
I also noticed a number of places where we were calling .asFile on a
file. This is redundant because asFile is an extension method on File
that just returns the underlying file.
Finally, I share the IOSyntax trait from io in AllSyntax. There was more
or less a TODO suggesting this change. The one hairy part is the
existence of the Alternative class. This class has unfortunately somehow
made it into the sbt package object. While I doubt many plugins are
using this, it doesn't seem worth breaking binary compatibility to get
rid of it. The issue is that while Alternative is defined private[sbt],
the alternative method in IOSyntax is public, so I can't get rid of
Alternative without breaking binary compatibility.
I'm not deprecating Alternative for now because the sbtProj still has
xfatal warnings on. I think in many, if not most, cases, the Alternative
class makes the code more confusing as is often the case with custom
operators. The confusion is mitigated if the abstraction is used only in
the file in which it's defined.
The io library now delegates to swoval for io which generally handles
relative sources correctly (by converting them to absolute paths
internally). This test started failing after the io bump because of
this. It seemed to me that relative sources not working was not a
feature so I just removed the test.
If the managedSources task writes into an unmanaged source directory,
that would cause an infinite loop. I don't think it's worth doing out of
band task execution to try and prevent this.