This check doesn't actually make sense anymore with the new
ClassLoaderCache. In the old ClassLoaderCache, there were separate
layers for the snapshots and regular jars. The test was verifying that
only the snapshot layer was invalidated but now there is just one layer.
The dotty sbt-bridge module assumes that it's going to get a
URLClassLoader from which it can extract all of the classpath urls. That
doesn't work with the old wrapped classloader because its classpath was
empty. As a nasty workaround, I override the getURLs method, which is
where it gets the URLs from. After this change the
compiler-project/dotty-compiler-plugin test passes.
This commit adds a new ClassLoaderCache that builds on the
ClassLoaderCache that is present in zinc (and can be used to build an
instance of the zinc ClassLoaderCache to preserve compatibility). It
differs from the zinc classloader cache that it does not use direct
SoftReferences to classloaders. Instead, we create a wrapper loader
that can't load any classes and just delegates to its parent. This
allows us to add a thread that reaps the soft reference to the wrapper
loader. Crucially, we add a custom SoftReference class that has a strong
reference to the underlying classloader. This allows us to call close on
the strong reference.
The one issue with this approach is that we can't
rescue the jvm from crashing with an OOM: metaspace because the jvm
doesn't give us a chance to close and dereference the underlying
classloaders before it crashes. It WILL collect classloaders under
normal memory pressure, just not metaspace pressure. To fix this, I
check if the MaxMetaspaceSize is set via an MxBean and, if it is, we
fill the cache with regular soft references. We are going to change the
bash script to not set -XX:MaxMetaspaceSize by default so most builds
should probably end up correctly closing the classloaders after this
change. But we should break existing builds that set MaxMetaspaceSize
but don't crash.
As part of this commit, I audited all of the places where we were
instantiating ClassLoaderCache instances and instead pass in the
state's ClassLoaderCache instance. This reduces the total number of
classloaders created.
This commit finally fixes#241 by adding support for sbt to either
print a warning or automatically reload the project if the metabuild
sources have changed. To facilitate this, I introduce a new key,
metaBuildSourceOption which has three options:
1) IgnoreSourceChanges
2) WarnOnSourceChanges
3) ReloadOnSourceChanges
When the former is set, sbt will not check if the meta build sources
have changed. Otherwise, sbt will use the buildStructure / fileInputs to
get the ChangedFiles for the metabuild. If there are any changes, it
will either warn or reload the build depending on the value of
metaBuildSourceOption.
The mechanism for diffing the files is that I add a step to EvaluateTask
where, if the project has been loaded and
metaBuildSourceOption != IgnoreSourceChanges, we evaluate the needReload
task. If we need a reload, we return an error that indicates that a
Reload is necessary. When that error is detected, the MainLoop will
prepend "reload" to the pending commands for the state. Otherwise we
just print a warning and continue.
I benchmarked the overhead of this and it wasn't too bad. I generally
saw it taking 5-20ms to perform the check. Since this is only done once
per task evaluation run, I don't think it's a big deal. When
IgnoreSourceChanges is set, there is O(10us) overhead. If performance
does become a problem, we could add a global watch service and skip the
needReload evaluation if no files have been modified.
I removed the watchTrackMetaBuild key and made it so that the continuous
builds only track the meta build when
metaBuildSourceOption == ReloadOnSourceChanges
The newest version of io repackages a number of classes into the
sbt.nio.* packages. It also changes some of the semantics of glob
related apis. This commit updates all of the usages of the updated apis
within sbt but should have no functional difference.
Since the new watch implementation has yet to be widely deployed, we
should hold off on deprecating the old keys. They could still be
deprecated in a patch release or in 1.4.0.
This commit reworks the watch start message so that instead of printing
something like:
[info] [watch] 1. Waiting for source changes... (press 'r' to re-run the command, 'x' to exit sbt or 'enter' to return to the shell)
it instead prints something like:
[info] 1. Monitoring source files for updates...
[info] Project: filesJVM
[info] Command: compile
[info] Options:
[info] <enter>: return to the shell
[info] 'r': repeat the current command
[info] 'x': exit sbt
It will also print which path triggered the build.
I decided that it makes sense to move all of the new watch code out of
the Watched companion object since the Watched trait itself is now
deprecated. I don't really like having the new code in Watched.scala
mixed with the legacy code, so I pulled it all out and moved it into the
Watch object. Since we have to put all of the logic for the Continuous
object in main in order to access the sbt.Keys object, it makes sense to
move the logic out of main-command and into command so that most of the
watch related logic is in the same subproject.
This is a huge refactor of Watched. I produced this through multiple
rewrite iterations and it was too difficult to separate all of the
changes into small individual commits so I, unfortunately, had to make a
massive commit. In general, I have tried to document the source code
extensively both to facilitate reading this commit and to help with
future maintenance.
These changes are quite complicated because they provided a built-in
like api to a feature that is implemented like a plugin. In particular,
we have to manually do a lot of parsing as well as roll our own
task/setting evaluation because we cannot infer the watch settings at
project build time because we do not know a priori what commands the
user may watch in a given session. The dynamic setting and task
evaluation is mostly confined to the WatchSettings class in Continuous.
It feels dirty to do all of this extraction by hand, but it does seem to
work correctly with scopes.
At a high level this commit does four things:
1) migrate the watch implementation to using the InputGraph to collect
the globs that it needs to monitor during the watch
2) simplify WatchConfig to make it easier for plugin authors to write
their own custom watch implementations
3) allow configuration of the watch settings based on the task(s) that
is/are being run
4) adds an InputTask implemenation of watch.
Point #1 is mostly handled by Point #3 since I had to overhaul how _all_
of the watch settings are generated. InputGraph already handles both
transitive inputs and triggers as well as legacy watchSources so not
much additional logic is needed beyond passing the correct scoped keys
into InputGraph.
Point #3 require some structural changes. The watch settings cannot in
general be defined statically because we don't know a priori what tasks
the user will try and watch. To address this, I added code that will
extract the task keys for all of the commands that we are running. I
then manually extract the relevant settings for each command. Finally, I
aggregate those settings into a single WatchConfig that can be used to
actually implement the watch. The aggregation is generally
straightforward: we run all of the callbacks for each task and choose
the next watch state based on the highest priority Action that is
returned by any of the callbacks.
Because I needed Extracted to pull out the necessary settings, I was
forced to move a lot of logic out of Watched and into a new singleton,
Continuous, that exists in the main project (Watched is in the command
project). The public footprint of Continuous is tiny. Even though I want
to make the watch feature flexible for plugin authors, the
implementation and api remain a moving target so I do not want to be
limited by future binary compatibility requirements. Anyone who wants to
live dangerously can access the private[sbt] apis via reflection or by
adding custom code to the sbt package in their plugin (a technique I've
used in CloseWatch).
Point #2 is addressed by removing the count and lastStatus from the
WatchConfig callbacks. While these parameters can be useful, they are
not necessary to implement the semantics of a watch. Moreover, a status
boolean isn't really that useful and the sbt task engine makes it very
difficult to actually extract the previous result of the tasks that were
run. After this refactor, WatchConfig has a simpler api. There are fewer
callbacks to implement and the signatures are simpler. To preserve the
_functionality_ of making the count accessible to the user specifiable
callbacks, I still provided settings like watchOnInputEvent that accept
a count parameter, but the count is actually tracked externally to
Watched.watch and incremented every time the task is run.
Moreover, there are a few parameters of the watch: the logger and
transitive globs, that cannot be provided via settings. I provide
callback settings like watchOnStart that mirror the WatchConfig
callbacks except that they return a function from Continuous.Arguments
to the needed callback. The Continuous.aggregate function will check if
the watchOnStart setting is set and if it is, will pass in the needed
arguments. Otherwise it will use the default watchOnStart implementation
which simulates the existing behavior by tracking the iteration count in
an AtomicInteger and passing the current count into the user provided
callback. In this way, we are able to provide a number of apis to the
watch process while preserving the default behavior.
To implement #4, I had to change the label of the `watch` attribute key
from "watch" to "watched". This allows `watch compile` to work at the
sbt command line even thought it maps to the watchTasks key. The actual
implementation is almost trivial. The difference between an
InputTask[Unit] and a command is very small. The tricky part is that the
actual implementation requires applying mapTask to a delegate task that
overrides the Task's info.postTransform value (which is used to
transform the state after task evaluation). The actual postTransform
function can be shared by the continuous task and continuous command.
There is just a slightly different mechanism for getting to the state
transformation function.
I realized that Stamped.File was a bad interface that was really just an
implementation detail of external hooks. I updated the
GlobLister.{ all, unique } methods to return Seq[(Path, FileAttributes)]
rather than Stamped.File which is a much more natural api and one I
could see surviving the switch to nio based apis planned for
1.4.0/2.0.0. I also added a simple scripted test for glob listing. The
GlobLister.all method is implicitly tested all over the place since the
compile task uses it, but it's good to have an explicit test.
I decided that FileCacheEntry was a bad name because the methods did not
necessarily have anything to do with caching. Moreover, because it is
exposed in a public interface, it shouldn't be in the internal package.
Rather than exposing the FileEventMonitor.Event types, which are under
active development in the io repo, I am adding a new event trait to
FileCacheEntry. This trait doesn't expose any internal implementation
details.
The equals method didn't work exactly the way I thought. By delegating
to the equivStamp object in sbt we can be more confident that it is
actually comparing the stamp values and not object references or
some other equals implementation.
Right now, the sbt.internal.io.Source is something of a second class
citizen within sbt. Since sbt 0.13, there have been extension classes
defined that can convert a file to a PathFinder but no analog has been
introduced for sbt.internal.io.Source.
Given that sbt.internal.io.Source was not really intended to be part of
the public api (just look at its package), I think it makes sense to
just replace it with Glob. In this commit, I add extension
methods to Glob and Seq[Glob] that make it possible to easily
retrieve all of the files for a particular Glob within a task. The
upshot is that where previously, we'd have had to write something like:
watchSources += Source(baseDirectory.value / "src" / "main" / "proto", "*.proto", NothingFilter)
now we can write
watchGlobs += baseDirectory.value / "src" / "main" / "proto" * "*.proto"
Moreover, within a task, we can now do something like:
foo := {
val allWatchGlobs: Seq[File] = watchGlobs.value.all
println(allWatchSources.mkString("all watch source files:\n", "\n", ""))
}
Before we would have had to manually retrieve the files.
The implementation of the dsl uses the new GlobExtractor class which
proxies file look ups through a FileTree.Repository. This makes it so
that, by default, all file i/o using Sources will use the default
FileTree.Repository. The default is a macro that returns
`sbt.Keys.fileTreeRepository.value: @sbtUnchecked`. By doing it this
way, the default repository can only be used within a task definition
(since it delegates to `fileTreeRepository.value`). It does not,
however, prevent the user from explicitly providing a
FileTree.Repository instance which the user is free to instantiate
however they wish.
Bonus: optimize imports in Def.scala and Defaults.scala
The FileTreeViewConfig abstraction that I added was somewhat unwieldy
and confusing. The original intention was to provide users with a lot of
flexibility in configuring the global file tree repository used by sbt.
I don't think that flexibility is necessary and it was both conceptually
complicated and made the implementation complex. In this commit, I add a
new boolean flag enableGlobalCachingFileTreeRepository that toggles
which kind of FileTreeRepository to use globally.
There are actually three kinds of repositories that could be returned:
1) FileTreeRepository.default -- this caches the entire file system
tree it hooks into the cache's event callbacks to create a file event
monitor. It will be used if enableGlobalCachingFileTreeRepository is
true and Global / pollingGlobs := Nil
2) FileTreeRepository.hybrid -- similar to FileTreeRepository.default
except that it will not cache any files that are included in
Global / pollingGlobs. It will be used if
enableGlobalCachingFileTreeRepository is true and
Global / pollingGlobs is non empty
3) FileTreeRepository.legacy -- does not cache any of the file system
tree, but does maintain a persistent file monitoring process that is
implemented with a WatchServiceBackedObservable. Because it doesn't
poll, in general, it's ok to leave the monitoring on in the
background. One reason to use this is that if there are any issues
with the cache being unable to accurately mirror the underlying file
system tree, this repository will always poll the file system
whenever sbt requests the entries for a given glob. Moreover, the
file system tree implementation is very similar to the implementation
that was used in 1.2.x so this gives users a way to almost fully opt
back in to the old behavior.
This new version of io breaks source and binary compatibility everywhere
that uses the register(path: Path, depth: Int) method that is defined on
a few interfaces because I changed the signature to register(glob:
Glob). I had to convert to using a glob everywhere that register was
called.
I also noticed a number of places where we were calling .asFile on a
file. This is redundant because asFile is an extension method on File
that just returns the underlying file.
Finally, I share the IOSyntax trait from io in AllSyntax. There was more
or less a TODO suggesting this change. The one hairy part is the
existence of the Alternative class. This class has unfortunately somehow
made it into the sbt package object. While I doubt many plugins are
using this, it doesn't seem worth breaking binary compatibility to get
rid of it. The issue is that while Alternative is defined private[sbt],
the alternative method in IOSyntax is public, so I can't get rid of
Alternative without breaking binary compatibility.
I'm not deprecating Alternative for now because the sbtProj still has
xfatal warnings on. I think in many, if not most, cases, the Alternative
class makes the code more confusing as is often the case with custom
operators. The confusion is mitigated if the abstraction is used only in
the file in which it's defined.
Previously, we were leaking the internal details of incremental
compilation to users by defining FileTree(DataView|Repository)[Stamp].
To avoid this, I introduce the new class FileCacheEntry that is quite
similar to Stamp except defined using scala Options rather than java
Optionals. The implementation class just delegates to an actual Stamp
and I provided a private[sbt] ops class that adds a
method `stamp` to FileCacheEntry. This will usually just extract the
stamp from the implementation class. This allows us to use
FileCacheEntry almost interchangeably with Stamp while still avoiding
exposing users to Stamp.
In the FileTreeDataView use case, we were previously working with
FileTreeDataView[Stamped], which actually contained a lot of redundant
information because FileTreeDataView.Entry[_] has a toTypedPath method
that could be used to read the path related fields in Stamped. Instead,
we can just return the Stamp itself in FileTreeDataView.list* methods
and convert to Stamped.File where needed (i.e. in ExternalHooks).
Also move BasicKeys.globalFileTreeView to Keys since it isn't actually
used in the main-command project.
We want the user to be able to invalidate the classloader cache in the
event that it somehow gets in a bad state. The cache is, however,
defined in multiple configurations, so there are in fact many
ClassLoaderCache instances that are managed by sbt. To make this sane, I
add a global cache that is keyed by a TaskKey[_] and can return
arbitrary data back. Invalidating all of the ClassLoaderCache instances
is then as straightforward as just replacing the TaskRepository
instance.
I also went ahead and unified the management of the global file tree
repository. Instead of having to specifically clear the file tree
repository or the classloader cache, the user can now invalidate both
with the new clearCaches command.
I noticed that debugging settings that return functions is annoying
because often the setting is initialized as an anonymous function with a
useless toString method. To improve the debugging for users, I'm adding
a number of wrapper classes for functions that override the default
toString with a provided label.
I then used these functions to label all of the anonymous functions in
Watched.scala.
It was a mistake to disallow trailing semicolons for multi commands.
Firstly this was a mistake because previous versions of sbt supported a
trailing semi colon. It was also inconsistent with how commands work in
a regular shell (e.g. bash or zsh).
This file was littered with intellij warnings due to public members and
fields not having their types annotated. Although in this case it didn't
really matter, it is good practice to always annotate public methods and
fields so that they can evolve in a binary compatible way.
It has long been a frustration of mine that it is necessary to prepend
multiple commands with a ';'. In this commit, I relax that restriction.
I had to reorder the command definitions so that multi comes before act.
This was because if the multi command did not have a leading semicolon,
then it would be handled by the action parser before the multi command
parser had a shot at it. Sadness ensued.
Presently the multi command parser doesn't work correctly if one of the
commands includes a string literal. For example, suppose that there is
an input task defined name "bash" that shells out and runs the input.
Then the following does not work with the current multi command parser:
; bash "rm target/classes/Foo.class; touch src/main/scala/Foo.scala"; comple
Note that this is a real use case that has caused me issues in the past.
The problem is that the semicolon inside of the quote gets interpreted
as a command separator token. To fix this, I rework the parser so that
it consumes string literals and doesn't modify them. By using
StringEscapable, I allow the string to contain quotation marks itself.
I couldn't write a scripted test for this because in a command like
`; foo "bar"; baz`, the quotes around bar seem to get stripped. This
could be fixed by adding an alternative to StringEscapable that matches
an escaped string, but that is more work than I'm willing to do right
now.
I discovered that when I ran multi-commands with '~' that if there was a
space between the ';' and the command, then the parsing of the command
would fail and the watch would abort. To fix this, I refactor
Watched.watch to use the multi command parser and, if that parser fails,
we fallback on a single command.
Prior to this commit, there was no unit testing of the parser for
multiple commands. I wanted to make some improvements to the parser, so
I reworked the implementation to be testable. This change also allows
the multiParserImpl method to be shared with Watched.watch, which I will
also update in a subsequent commit.
There also were no explicit scripted tests for multiple commands, so I
added one that I will augment in later commits.
On windows* it was possible to get into a loop where the build would
continually restart because for some reason the build.sbt file would get
touched during test (I did not see this behavior on osx). Thankfully,
the repository keeps track of the file hash and when we detect that the
build file has been updated, we check the file hash to see if it
actually changed.
Note that had this bug shipped, it would have been fixable by overriding
the watchOnEvent task in user builds.
The loop would occur if I ran ~filesJVM/test in
https://github.com/swoval/swoval. It would not occur if I ran
test:compile, so the fact that the build file is being touched seems
to be related to the test run itself.
For whatever reason, I couldn't get jline to work on windows, so I'm
disabling the re-run with 'r' feature. This can almost surely be fixed,
but the way I was invoking jline was blocking the continuous build from
exiting when the user pressed enter.
Sometimes a user may want to rerun their task even if the source files
haven't changed. Presently this is a little annoying because you have to
hit enter to stop the build and then up arrow or <ctrl+r> plus enter to
rebuild. It's more convenient to just be able to press the 'r' key to
re-run the task.
To implement this, I had to make the watch task set up a jline terminal
so that System.in would be character buffered instead of line buffered.
Furthermore, I took advantage of the NonBlockingInputStream
implementation provided by jline to wrap System.in. This was necessary
because even with the jline terminal, System.in.available doesn't return
> 0 until a newline character is entered. Instead, the
NonBlockingInputStream does provide a peek api with a timeout that will
return the next unread key off of System.in if there is one available.
This can be use to proxy available in the WrappedNonBlockingInputStream.
To ensure maximum user flexibility, I also update the watchHandleInput Key to
take an InputStream and return an Action. This setting will now receive
the wrapped System.in, which will allow the user to create their own
keybindings for watch actions without needing to use jline themselves.
Future work might make it more straightforward to go back to a line
buffered input if that is what the user desires.
It is not always possible to monitor a directory using OS file system
events. For example, inotify does not work with nfs. To work around
this, I add support for a hybrid FileTreeViewConfig that caches a
portion of the file system and monitors it with os file system
notification, but that polls a subset of the directories. When we query
the view using list or listEntries, we will actually query the file
system for the polling directories while we will read from the cache for
the remainder. When we are not in a continuous build (~ *), there is no
polling of the pollingDirectories but the cache will continue to update
the regular directories in the background. When we are in a continuous
build, we use a PollingWatchService to poll the pollingDirectories and
continue to use the regular repository callbacks for the other
directories.
I suspect that #4179 may be resolved by adding the directories for which
monitoring is not working to the pollingDirectories task.
Now that we have the fileTreeView task, we can generalized the process
of collecting files from the view (which may or may not actually cache
the underlying file tree). I moved the implementation of collectFiles
and addBaseSources into the new FileManagement object because Defaults
is already too large of a file. When we query the view, we also need to
register the directory we're listing because if the underlying view is a
cache, we must call register before any entries will be available.
Because FileTreeDataView doesn't have a register method, I implement
registration with a simple implicit class that pattern matches on the
underlying type and only calls register if it is actually a
FileRepository.
A side effect of this change is that the underlying files returned by
collectFiles and appendBaseSources are StampedFile instances. This is so
that in a subsequent commit, I can add a Zinc external hook that will
read these stamps from the files in the source input array rather than
compute the stamp on the fly. This leads to a substantial reduction in
Zinc startup time for projects with many source files. The file filters
also may be applied more quickly because the isDirectory property (which
we check for all source files) is read from a cached value rather than
requiring a stat.
I had to update a few of the scripted tests to use the `1.2.0`
FileTreeViewConfig because those tests would copy a file and then
immediately re-compile. The latency of cache invalidation is O(1-10ms),
but not instantaneous so it's necessary to either use a non-caching
FileTreeView or add a sleep between updates and compilation. I chose the
former.
Every time that the compile task is run, there are potentially a large
number of iops that must occur in order for sbt to generate the source
file list as well as for zinc to check which files have changed since
the last build. This can lead to a noticeable delay between when a build
is started (either manually or by triggered execution) and when
compilation actually begins. To reduce this latency, I am adding a
global view of the file system that will be stored in
BasicKeys.globalFileTreeView.
To make this work, I introduce the StampedFile trait, which augments the
java.io.File class with a stamp method that returns the zinc stamp for
the file. For source files, this will be a hash of the file, while for
binaries, it is just the last modified time. In order to gain access to
the sbt.internal.inc.Stamper class, I had to append addSbtZinc to the
commandProj configurations.
This view may or may not use an in-memory cache of the file system tree
to return the results. Because there is always the risk of the cache
getting out of sync with the actual file system, I both make it optional
to use a cache and provide a mechanism for flushing the cache. Moreover,
the in-memory cache implementation in sbt.io, which is backed by a
swoval FileTreeRepository, has the property that touching a monitored
directory invalidates the entire directory within the cache, so the
flush command isn't even strictly needed in general.
Because caching is optional, the global is of a FileTreeDataView, which
doesn't specify a caching strategy. Subsequent commits will make use of
this to potentially speed up incremental compilation by caching the
Stamps of the source files so that zinc does not need to compute the
hashes itself and will allow for continuous builds to use the cache to
monitor events instead of creating a new, standalone FileEventMonitor.
It may be useful for users to be able to return their own custom
Action types in the Config callbacks. For a contrived example, a user
could add a jar file in the .ivy2 directory to the watch sources and
trigger a reboot full when that jar changes.