Windows I/O really doesn't handle concurrent readers and writers all that
well. Using the LegacyFileTreeRepository was problematic in Windows
scripted tests because even though the repository implementation did not
use the cache in its list methods, it did persistently monitor the
directories that were registered. The monitor has to do a lot of
I/O on a background thread to maintain the cache. The resulting I/O
contention could cause IO.createDirectory to fail with an obscure
AccessDeniedException. The way to avoid this is to prevent the
background I/O from occurring at all.
I don't necessarily think this will impact most users running sbt
interactively with a cache, but it did cause scripted tests to fail. For
that reason, I made PollingFileRepository the default on Windows in
non-interactive/shell use cases; it never monitors the file
system except when we are in a watch. The LegacyFileTreeRepository works
fine on macOS and Linux, which have more forgiving file systems.
To make this work, I had to add FileManagement.toMonitoringRepository.
There are now two kinds of repositories that cannot monitor on their
own: HybridPollingFileTreeRepository and PollingFileRepository. The
FileManagement.toMonitoringRepository makes a new repository that turns
on monitoring for those two repository types and disables the close
method on all other repositories so that closing the FileEventMonitor
does not actually close the global file repository.
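A hypothetical sketch of that wrapping logic; Repo, PollingRepo, and withMonitoring here are assumed stand-ins for sbt's actual interfaces, and the list/register methods are elided:

```scala
trait Repo extends AutoCloseable
trait PollingRepo extends Repo { def withMonitoring(): Repo }

def toMonitoringRepository(repo: Repo): Repo = repo match {
  case polling: PollingRepo => polling.withMonitoring() // turn monitoring on
  case _ =>
    // forward everything except close, which becomes a no-op so that
    // closing the FileEventMonitor leaves the global repository open
    new Repo { override def close(): Unit = () }
}
```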
I ran into a couple of issues with the clean implementation. I changed
the logging to print to stdout instead of streams when enabled; using a
normal logger was a bad idea because we are actually deleting the
target/streams directory when running clean. I also added a helper,
Clean.deleteContents, that recursively deletes all of the contents of a
directory except for those that match the exclude filter parameter.
The previous implementation worked by getting the full list of files to
delete, reverse sorting it, and then deleting every element in the list.
While this works well in many circumstances, if the directory is
still being written to during the recursive deletion, then we could miss
files that were added after we fetched the list. The new version
lazily lists the subdirectories as needed.
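As a rough illustration of the lazy approach, here is a minimal sketch of a deleteContents-style helper using plain java.nio (the real Clean.deleteContents goes through sbt's file tree machinery and IO utilities):

```scala
import java.nio.file.{ DirectoryNotEmptyException, Files, Path }
import scala.collection.JavaConverters._

// Each directory listing happens only once we reach that directory, so
// files created mid-traversal are still observed and deleted.
def deleteContents(dir: Path, exclude: Path => Boolean): Unit = {
  val children = Files.list(dir)
  try children.iterator.asScala.foreach { child =>
    if (!exclude(child)) {
      if (Files.isDirectory(child)) {
        deleteContents(child, exclude)
        // succeeds only if nothing inside was excluded or newly created
        try Files.deleteIfExists(child)
        catch { case _: DirectoryNotEmptyException => () }
      } else Files.deleteIfExists(child)
    }
  } finally children.close()
}
```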
The Defaults.scala file has a lot going on. I am generally trying to
follow the pattern of implementing each default task in a separate file
and just adding the appropriate declarations in Defaults.scala.
This reworks the cleanTask so that it only removes a subset of the
files in the target directory. To do this, I add a new task, outputs,
that returns the glob representation of the possible output files for
the task. It must be a task because some outputs will depend on streams.
For each project, the default outputs are all of the files in
baseDirectory / target.
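For illustration, a task-scoped override might look something like the following build.sbt fragment; the `outputs` key is the new task described above, and the `dir * filter` glob syntax follows the extension methods that appear later in this series, so treat this as a sketch rather than the actual default:

```scala
// Hypothetical: narrow the compile task's outputs to its class directory
// instead of the project-wide default of everything under target.
Compile / compile / outputs := Seq(
  (Compile / classDirectory).value * AllPassFilter
)
```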
Long term, we could enhance the clean task to be automatically generated
in any scope (as an input task). We could then add the option for the
task-scoped clean to delete all of the transitive outputs of the task.
That is beyond the scope of this commit, however.
I copied the scripted tests from #3678 and added an additional test to
make sure that the managed source directory was explicitly cleaned.
The clean task is unreasonably slow because it does a lot of redundant
I/O. In this commit, I update clean to be implemented using globs. This
allows us to (optionally) route I/O through the file system cache. The
change yields a significant performance improvement: running clean on
the sbt project currently takes on the order of 6 seconds on my machine,
and after this change, it takes on the order of 1 second.
To implement this, I added a new setting cleanKeepGlobs to replace
cleanKeepFiles. I don't think that cleanKeepFiles returning Seq[File] is
a big deal for performance because, by default, it just contains the
history file so there isn't much benefit to accessing a single file
through the cache. The reason I added the setting was more for
consistency and to help push people towards globs in their own task
implementations.
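For example, a build could keep marker files under target across cleans with something like the following (syntax as in the watchGlobs example later in this document; the file pattern is made up):

```scala
// Hypothetical: preserve any *.keep markers directly under target.
cleanKeepGlobs += target.value * "*.keep"
```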
Part of the performance improvement comes from inverting the problem.
Before, we would walk the file system tree from the base and recursively
delete leaves and nodes in a depth-first traversal. Now we collect all of
the files that we are interested in deleting in advance. We then sort
the results by path name so that children always come before their
parents and perform the deletions in that order. Since each directory's
contents are deleted before the directory itself, this generally allows
us to delete the directory.
There is an edge case: if files are created in a subdirectory after
we've built the list to delete, but before the subdirectory is
deleted, then that subdirectory will not be deleted. In general, this
will tend to impact target/streams because writes occur to
target/streams during traversal. I don't think this really matters for
most users; if the target directory is being modified concurrently with
clean, then the user is doing something wrong.
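A minimal sketch of the delete ordering using plain java.nio (the real implementation goes through sbt's IO and glob machinery):

```scala
import java.nio.file.{ Files, Path }

// Reverse-lexicographic order visits children before their parents, so
// each directory is empty by the time we reach it, barring concurrent writes.
def deleteAll(paths: Seq[Path]): Unit =
  paths.sortBy(_.toString)(Ordering[String].reverse).foreach { p =>
    try Files.deleteIfExists(p)
    catch { case _: java.io.IOException => () } // e.g. the edge case above
  }
```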
To ensure legacy compatibility, I re-implement cleanKeepFiles to return
no files. Any plugin that was appending files to the cleanKeepFiles task
with `+=` or `++=` will continue working as before because I explicitly
exclude those files from deletion. I updated the actions/clean-keep
scripted test to use both cleanKeepFiles and cleanKeepGlobs to ensure
both keys are correctly handled.
Bonus: add debug logging of all deleted files
Right now, the sbt.internal.io.Source is something of a second-class
citizen within sbt. Since sbt 0.13, there have been extension classes
defined that can convert a file to a PathFinder, but no analog has been
introduced for sbt.internal.io.Source.
Given that sbt.internal.io.Source was not really intended to be part of
the public api (just look at its package), I think it makes sense to
just replace it with Glob. In this commit, I add extension
methods to Glob and Seq[Glob] that make it possible to easily
retrieve all of the files for a particular Glob within a task. The
upshot is that where previously, we'd have had to write something like:
watchSources += Source(baseDirectory.value / "src" / "main" / "proto", "*.proto", NothingFilter)
now we can write
watchGlobs += baseDirectory.value / "src" / "main" / "proto" * "*.proto"
Moreover, within a task, we can now do something like:
foo := {
  val allWatchSources: Seq[File] = watchGlobs.value.all
  println(allWatchSources.mkString("all watch source files:\n", "\n", ""))
}
Before we would have had to manually retrieve the files.
The implementation of the DSL uses the new GlobExtractor class, which
proxies file lookups through a FileTree.Repository. This makes it so
that, by default, all file I/O using Sources will use the default
FileTree.Repository. The default is a macro that returns
`sbt.Keys.fileTreeRepository.value: @sbtUnchecked`. By doing it this
way, the default repository can only be used within a task definition
(since it delegates to `fileTreeRepository.value`). It does not,
however, prevent the user from explicitly providing a
FileTree.Repository instance which the user is free to instantiate
however they wish.
Bonus: optimize imports in Def.scala and Defaults.scala
The FileTreeViewConfig abstraction that I added was somewhat unwieldy
and confusing. The original intention was to provide users with a lot of
flexibility in configuring the global file tree repository used by sbt.
I don't think that flexibility is necessary, and it was both conceptually
complicated and made the implementation complex. In this commit, I add a
new boolean flag, enableGlobalCachingFileTreeRepository, that toggles
which kind of FileTreeRepository is used globally (a configuration
sketch follows the list below).
There are actually three kinds of repositories that could be returned:
1) FileTreeRepository.default -- this caches the entire file system
tree and hooks into the cache's event callbacks to create a file event
monitor. It will be used if enableGlobalCachingFileTreeRepository is
true and Global / pollingGlobs is empty
2) FileTreeRepository.hybrid -- similar to FileTreeRepository.default
except that it will not cache any files that are included in
Global / pollingGlobs. It will be used if
enableGlobalCachingFileTreeRepository is true and
Global / pollingGlobs is non-empty
3) FileTreeRepository.legacy -- does not cache any of the file system
tree, but does maintain a persistent file monitoring process that is
implemented with a WatchServiceBackedObservable. Because it doesn't
poll, in general, it's ok to leave the monitoring on in the
background. One reason to use this is that if there are any issues
with the cache being unable to accurately mirror the underlying file
system tree, this repository will always poll the file system
whenever sbt requests the entries for a given glob. Moreover, the
file system tree implementation is very similar to the implementation
that was used in 1.2.x so this gives users a way to almost fully opt
back in to the old behavior.
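As a configuration sketch, assuming the keys named above and the glob syntax used elsewhere in this series (the polled directory name is made up):

```scala
// Case 2 from the list above: cache the tree globally, but poll globs on
// file systems where native watches are unreliable.
Global / enableGlobalCachingFileTreeRepository := true
Global / pollingGlobs += baseDirectory.value / "nfs-mounted" * AllPassFilter
```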
This new version of io breaks source and binary compatibility everywhere
that uses the register(path: Path, depth: Int) method that is defined on
a few interfaces because I changed the signature to register(glob:
Glob). I had to convert to using a glob everywhere that register was
called.
I also noticed a number of places where we were calling .asFile on a
file. This is redundant because asFile is an extension method on File
that just returns the underlying file.
Finally, I share the IOSyntax trait from io in AllSyntax. There was more
or less a TODO suggesting this change. The one hairy part is the
existence of the Alternative class. This class has unfortunately somehow
made it into the sbt package object. While I doubt many plugins are
using this, it doesn't seem worth breaking binary compatibility to get
rid of it. The issue is that while Alternative is defined private[sbt],
the alternative method in IOSyntax is public, so I can't get rid of
Alternative without breaking binary compatibility.
I'm not deprecating Alternative for now because the sbtProj still has
-Xfatal-warnings on. I think in many, if not most, cases, the Alternative
class makes the code more confusing as is often the case with custom
operators. The confusion is mitigated if the abstraction is used only in
the file in which it's defined.
Fixes #4461
This opens up the ExecuteProgress API, which has been around as private[sbt].
Since the state passing mechanism hasn't been used, I got rid of it.
The build user can configure the build using two keys: the Boolean `taskProgress` and the `State => Seq[TaskProgress]` `progressReports`. `useSuperShell` is a lightweight on/off key for the super shell that can be used as follows:
```scala
Global / SettingKey[Boolean]("useSuperShell") := false
```
I often find that when I run a command it takes a long time to start up
because sbt triggers a full GC. To improve the UX, I update the command
exchange to run a full GC only once while it's waiting for a command to
run, and only after the user has been idle for at least one minute.
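A sketch of the intended behavior; the names and the explicit threshold here are illustrative, not sbt's actual code:

```scala
object IdleGc {
  // Run at most one full GC per idle period, and only after a minute of
  // inactivity, so a collection never delays a command the user just typed.
  private[this] var gcRanThisIdlePeriod = false

  def onUserInput(): Unit = synchronized { gcRanThisIdlePeriod = false }

  def maybeGc(idleMillis: Long): Unit = synchronized {
    if (!gcRanThisIdlePeriod && idleMillis > 60 * 1000L) {
      System.gc()
      gcRanThisIdlePeriod = true
    }
  }
}
```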
Bonus: optimize imports
Previously, we were leaking the internal details of incremental
compilation to users by defining FileTree(DataView|Repository)[Stamp].
To avoid this, I introduce the new class FileCacheEntry that is quite
similar to Stamp except defined using Scala Options rather than Java
Optionals. The implementation class just delegates to an actual Stamp
and I provided a private[sbt] ops class that adds a
method `stamp` to FileCacheEntry. This will usually just extract the
stamp from the implementation class. This allows us to use
FileCacheEntry almost interchangeably with Stamp while still avoiding
exposing users to Stamp.
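A minimal sketch of the delegation, assuming Zinc's xsbti.compile.analysis.Stamp interface (the real FileCacheEntry may differ in detail):

```scala
import java.util.Optional
import xsbti.compile.analysis.Stamp

trait FileCacheEntry {
  def hash: Option[String]
  def lastModified: Option[Long]
}

private[sbt] final class FileCacheEntryImpl(val stamp: Stamp) extends FileCacheEntry {
  private def toOption[A](o: Optional[A]): Option[A] =
    if (o.isPresent) Some(o.get) else None
  def hash: Option[String] = toOption(stamp.getHash)
  def lastModified: Option[Long] = toOption(stamp.getLastModified).map(_.longValue)
}
```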
In the FileTreeDataView use case, we were previously working with
FileTreeDataView[Stamped], which actually contained a lot of redundant
information because FileTreeDataView.Entry[_] has a toTypedPath method
that could be used to read the path related fields in Stamped. Instead,
we can just return the Stamp itself in FileTreeDataView.list* methods
and convert to Stamped.File where needed (i.e. in ExternalHooks).
Also move BasicKeys.globalFileTreeView to Keys since it isn't actually
used in the main-command project.
Resident compilation actually works pretty well most of the time*,
but if there ever is an issue with the cached compilation, we should be
able to easily clear the cache.
* I've only had issues when package objects are involved
If the managedSources task writes into an unmanaged source directory,
that can cause an infinite loop. I don't think it's worth doing
out-of-band task execution to try to prevent this.
The community build was broken for some projects because I broke builds
that relied on the unscoped definition of `runner`. To preserve legacy
behavior, I restore the old unscoped behavior and append the new scoped
runners that used the layered classloaders. This makes more sense
because the layered classloaders were specifically designed for the
Runtime and Test configurations and may not make sense in other
contexts.
I noticed that sometimes when running scripted tests I'd run out of
metaspace. I believe that this may be due to the caffeine cache leaking
classloaders. Regardless, it's a good idea to clear the cache whenever
we shutdown the command exchange or reload the state.
Previously, the ClassLoaderLayeringStrategy was set globally. This
didn't really make sense because the Runtime and Test configs had
different strategies available (Test being a superset of Runtime).
Instead, we now set the layering strategy in the Runtime and Test
configurations directly. In doing this, we can eliminate the Default
ClassLoaderLayeringStrategy. Previously this had existed so that we
could set the layering strategy globally and have it do the right thing
in both test and runtime.
To implement this, I factored out the logic for generating the layered
classloader in the test task and shared it with the runtime task. I did
this because I realized that Test / run is a thing. Previously I had
been operating under the assumption that the runner would never include
the test dependencies. Once I realized this, it made sense to combine
the logic in both tasks.
As a bonus, I only allow the layering strategies that explicitly make
sense to be set in each configuration. If the user sets an invalid
strategy, an error will be thrown that specifies the valid strategies
for the task.
I also added ScalaInstance as an option for the runtime layer. It was an
oversight that this was left out.
In code review, @eed3si9n suggested that I switch to a more verbose and
descriptive naming scheme. In addition to trying to make layers more
descriptive, I also made the various layer case objects extend the
relevant layers so it's more clear what the layer should look like.
I was seeing spurious Travis failures and I finally tracked them down to
the fact that in some cases the project metabuild would use a
caching file tree repository instead of a polling repository. This
caused problems because the caching repository can take a few milliseconds
to detect changes in a directory. Because scripted copies the project
sources to the temporary test directory, it was possible for the
metabuild compilation to be initiated before the cache was aware of all
of the files.
The reason this happened was that scripted would create a state where
the remaining commands looked like:
List(sbtPopOnFailure, resumeFromFailure, notifyUsersAboutShell, iflast shell, ~compile, < 41684)
The ~compile command was causing the continuous flag to get set to true,
which caused the default file tree repository task to return the caching
version. The continuous flag exists so that when sbt is
started in a non-interactive mode where the command is to be repeated,
we use the caching file tree repository. To support this use case,
we just need to check that the last command begins with `~`, not that
_any_ command begins with `~`.
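The fix amounts to something like the following, sketched against a plain list of command strings (the real state holds richer values than String):

```scala
// Only a watch command (`~...`) in final position should select the
// caching repository; a `~compile` queued mid-list should not.
def repeatsLastCommand(remainingCommands: List[String]): Boolean =
  remainingCommands.lastOption.exists(_.startsWith("~"))
```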
We want the user to be able to invalidate the classloader cache in the
event that it somehow gets in a bad state. The cache is, however,
defined in multiple configurations, so there are in fact many
ClassLoaderCache instances that are managed by sbt. To make this sane, I
add a global cache that is keyed by a TaskKey[_] and can return
arbitrary data. Invalidating all of the ClassLoaderCache instances
is then as straightforward as replacing the TaskRepository
instance.
I also went ahead and unified the management of the global file tree
repository. Instead of having to specifically clear the file tree
repository or the classloader cache, the user can now invalidate both
with the new clearCaches command.
Using the data structures that I added in the previous commits, it is
now possible to rework the run and test tasks to use (configurable)
layered class loaders. The layering strategy is globally set to
LayeringStrategy.Default. The default strategy leads to what is
effectively a three-layered ClassLoader for both the test and run
tasks. The first layer contains the scala instance (and test framework
loader in the test task). The second layer contains all of the
dependencies for the configuration while the third layer contains the
project artifacts.
The layering strategy is easily changed at either the Global or
Configuration level, e.g. adding
Test / layeringStrategy := LayeringStrategy.Flat
to the project build.sbt will make the test task not even use the scala
instance and instead create a single layer containing the full
classpath of the test task.
I also tried to ensure that all of the ClassLoaders have good toString
overrides so that it's easy to see how the ClassLoader is constructed,
e.g. with `show testLoader` in the sbt console.
In this commit, the ClassLoaderCache instances are settings. In the next
commit, I make them tasks so that we can easily clear out the caches
with a command.
This introduces a new trait LayeringStrategy that is used to configure
how sbt constructs the ClassLoaders used by the run and test tasks. In
addition to defining the various options, I try to give a good high
level overview of the problem that the LayeringStrategy is intended to
address in its scaladoc.
In order to speed up the start up time of the test and run tasks, I'm
introducing a ClassLoaderCache that can be used to avoid reloading the
classes in the project dependencies (which includes the scala library).
I made the api as minimal as possible so that we can iterate on the
implementation without breaking binary compatibility. This feature will
be gated on a feature flag, so I'm not concerned with the cached class
loaders being usable in every user configuration. Over time, I hope
that the CachedClassLoaders will be a drop-in replacement for the
existing one-off class loaders*.
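The shape of the API is roughly the following; the exact signatures here are assumptions for illustration:

```scala
import java.io.File

// Return a cached loader for the classpath/parent pair, building one with
// mkLoader on a miss; clear drops and closes everything in the cache.
trait ClassLoaderCache {
  def apply(classpath: Seq[File], parent: ClassLoader, mkLoader: () => ClassLoader): ClassLoader
  def clear(): Unit
}
```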
The LayeredClassLoader was adapted from the NativeCopyLoader. The main
difference is that the NativeCopyLoader extracts named shared libraries
into the task temp directory to ensure that the ephemeral libraries are
deleted after each task run. This is a problem if we are caching the
ClassLoader, so for LayeredClassLoader I track the native libraries that
are extracted by the loader and delete them either when the loader is
explicitly closed or in a shutdown hook.
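A sketch of that cleanup idea; the tracker class and its names are illustrative, not sbt's actual code:

```scala
import java.nio.file.{ Files, Path }
import java.util.concurrent.ConcurrentHashMap

// Track every native library the loader extracts; delete them when the
// loader is closed, or at JVM exit if it never is.
final class ExtractedLibraryTracker {
  private[this] val extracted = ConcurrentHashMap.newKeySet[Path]()
  Runtime.getRuntime.addShutdownHook(new Thread(() => deleteAll()))
  def add(path: Path): Unit = { extracted.add(path); () }
  def deleteAll(): Unit = {
    extracted.forEach { p => Files.deleteIfExists(p); () }
    extracted.clear()
  }
}
```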
* This of course means both that we must layer the class loaders
appropriately, so that the project code is in a layer above the cached
loaders, and that we must correctly invalidate the cache when the
project or its dependencies are updated.
I am going to be introducing multiple caches throughout sbt and I am
going to build these features out using this simple Repository
interface. The idea is that we access data by some key through the
Repository. This allows us to use the strategy pattern to easily switch
the runtime implementation of how to get the data.
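The interface is as simple as it sounds; roughly:

```scala
// Look up a value by key; the strategy for producing it (a cache, polling,
// recomputation) is hidden behind the implementation.
trait Repository[K, V] extends AutoCloseable {
  def get(key: K): V
}
```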
I am going to add a classloader cache to improve the startup performance
of the run and test tasks. To prevent the classloader cache from having
unbounded size, I'm adding a simple LRUCache implementation to sbt. An
important characteristic of the implementation of the cache is that when
entries are evicted, we run a callback to cleanup the entry. This allows
us to automatically cleanup any resources created by the entry.
This is a pretty naive implementation that uses an array of entries that
it manipulates as elements are removed or accessed. In general, I expect
these caches to be quite small (<= 4 elements), so the storage overhead
and performance of the simple implementation should be quite good. If
performance ever becomes an issue, we can specialize LRUCache.apply to
use a different implementation for caches with large limits.
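A minimal sketch of such a cache; it uses a LinkedHashMap for recency ordering rather than the array described above, but the eviction callback is the important part:

```scala
import scala.collection.mutable

final class LRUCache[K, V](limit: Int, onEvict: (K, V) => Unit) {
  private[this] val entries = mutable.LinkedHashMap.empty[K, V]

  def put(key: K, value: V): Unit = synchronized {
    entries.remove(key).foreach(onEvict(key, _))
    entries += key -> value
    while (entries.size > limit) {
      val (k, v) = entries.head // oldest entry is the least recently used
      entries -= k
      onEvict(k, v) // let the entry clean up its resources
    }
  }

  def get(key: K): Option[V] = synchronized {
    // re-insert on access so the entry moves to the most-recent position
    entries.remove(key).map { v => entries += key -> v; v }
  }
}
```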
The `sbt-server` was prepending a new problem instead of appending.
The result was a `textDocument/publishDiagnostics` notification
containing an inverted list of problems compared to what was shown in
the sbt console.