In code review, @eed3si9n suggested that I switch to a more verbose and
descriptive naming scheme. In addition to trying to make layers more
descriptive, I also made the various layer case objects extend the
relevant layers so it's more clear what the layer should look like.
I was seeing spurious travis failures and I finally tracked it down to
the fact that in some cases, the project metabuild would sometimes use a
caching file tree repository instead of a polling repository. This
caused problems because the caching repository can take a few milliseconds
to detect changes in a directory. Because scripted copies the project
sources to the temporary test directory, it was possible for the project
meta build compilation to be initiated before the cache was aware of all
of the files.
The reason this happened was because scripted would create a state where
the remaining commands looked like:
List(sbtPopOnFailure, resumeFromFailure, notifyUsersAboutShell, iflast shell, ~compile, < 41684)
The ~compile command was causing the continuous flag to get set to true
which caused the default file tree repository task to return the caching
version. The reason for the continuous flag was so that when sbt is
started in a non-interactive mode where the command is to be repeated,
then we use the caching file tree repository. To support this use case,
we just need to check that the last command begins with `~`, not that
_any_ command begins with `~`.
We want the user to be able to invalidate the classloader cache in the
event that it somehow gets in a bad state. The cache is, however,
defined in multiple configurations, so there are in fact many
ClassLoaderCache instances that are managed by sbt. To make this sane, I
add a global cache that is keyed by a TaskKey[_] and can return
arbitrary data back. Invalidating all of the ClassLoaderCache instances
is then as straightforward as just replacing the TaskRepository
instance.
I also went ahead and unified the management of the global file tree
repository. Instead of having to specifically clear the file tree
repository or the classloader cache, the user can now invalidate both
with the new clearCaches command.
Using the data structures that I added in the previous commits, it is
now possible to rework the run and test task to use (configurable)
layered class loaders. The layering strategy is globally set to
LayeringStrategy.Default. The default strategy leads to what is
effectively a three layered ClassLoader for the both the test and run
tasks. The first layer contains the scala instance (and test framework
loader in the test task). The second layer contains all of the
dependencies for the configuration while the third layer contains the
project artifacts.
The layering strategy is very easily changed both at the Global or
Configuration level, e.g. adding
Test / layeringStrategy := LayeringStrategy.Flat
to the project build.sbt will make the test task not even use the scala
instance and instead a create a single layer containing the full
classpath of the test task.
I also tried to ensure that all of the ClassLoaders have good toString
overrides so that it's easy to see how the ClassLoader is constructed
with, e.g. `show testLoader`, in the sbt console.
In this commit, the ClassLoaderCache instances are settings. In the next
commit, I make them tasks so that we can easily clear out the caches
with a command.
This introduces a new trait LayeringStrategy that is used to configure
how sbt constructs the ClassLoaders used by the run and test tasks. In
addition to defining the various options, I try to give a good high
level overview of the problem that the LayeringStrategy is intended to
address in its scaladoc.
In order to speed up the start up time of the test and run tasks, I'm
introducing a ClassLoaderCache that can be used to avoid reloading the
classes in the project dependencies (which includes the scala library).
I made the api as minimal as possible so that we can iterate on the
implementation without breaking binary compatibility. This feature will
be gated on a feature flag, so I'm not concerned with the cache class
loaders being useable in every user configuration. Over time, I hope
that the CachedClassLoaders will be a drop in replacement for the
existing one-off class loaders*.
The LayeredClassLoader was adapted from the NativeCopyLoader. The main
difference is that the NativeCopyLoader extracts named shared libraries
into the task temp directory to ensure that the ephemeral libraries are
deleted after each task run. This is a problem if we are caching the
ClassLoader so for LayeredClassLoader I track the native libraries that
are extracted by the loader and I delete them either when the loader is
explicitly closed or in a shutdown hook.
* This of course means that we both must layer the class loaders
appropriately so that the project code is in a layer above the cached
loaders and we must correctly invalidate the cache when the project, or
its dependencies are updated.
I am going to be introducing multiple caches throughout sbt and I am
going to build these features out using this simple Repository
interface. The idea is that we access data by some key through the
Repository. This allows us to use the strategy pattern to easily switch
the runtime implementation of how to get the data.
I am going to add a classloader cache to improve the startup performance
of the run and test tasks. To prevent the classloader cache from having
unbounded size, I'm adding a simple LRUCache implementation to sbt. An
important characteristic of the implementation of the cache is that when
entries are evicted, we run a callback to cleanup the entry. This allows
us to automatically cleanup any resources created by the entry.
This is a pretty naive implementation that uses an array of entries that
it manipulates as elements are removed/accessed. In general, I expect
these caches to be quite small <= 4 elements, so the storage overhead /
performance of the simple implementation should be quite good. If
performance ever becomes an issue, we can specialzed LRUCache.apply to
use a different implementation for caches with large limits.
The `sbt-server` was prepending a new probem and not appending.
The result was a `textDocument/publishDiagnostics` notification
containing a inverted list of problems compare to what was show in the
sbt console.
It drives me crazy that in intellij when I do the go to class task that
TestBuild.Keys comes up before Keys. Given how central Keys is to sbt,
it doesn't seem like a good idea to alias that particular class name.
It was becoming a pain to work on these files in intellij because the
auto-import feature would implicitly optimize all of the imports in
these files, leading to a large diff. I'd then have to go and manually
add the import that I care about. This change does add some wildcard
imports, which I don't always love, but these files are so unwieldy
already that I think it's worth it to have the imports follow the format
preferred by intellij.
It has long been a frustration of mine that it is necessary to prepend
multiple commands with a ';'. In this commit, I relax that restriction.
I had to reorder the command definitions so that multi comes before act.
This was because if the multi command did not have a leading semicolon,
then it would be handled by the action parser before the multi command
parser had a shot at it. Sadness ensued.
In #4446, @japgolly reported that in some projects, if a parent project
was broken, then '~' would immediately exit upon startup. I tracked it
down to this managed sources filter. The idea of this filter is to avoid
getting stuck in a build loop if managedSources writes into an unmanaged
source directory. If the (managedSources in ThisScope).value line
failed, however, it would cause the watchSources, and by delegation,
watchTransitiveSources task to fail. The fix is to only create this
filter if the managedSources task succeeds.
I'm not 100% sure if we shouldn't just get rid of this filter entirely
and just document that '~' will probably loop if a build writes the
result of managedSources into an unmanaged source directory.
Fixes#4437
Until now, sbt was resolved twice once by the launcher, and the second time by the metabuild.
This excludes sbt from the metabuild graph, and instead uses app classpath from the launcher.
Fixes#3436
This implements isMetaBuild setting that is explicitly for meta build only,
unlike sbtPlugin setting which can be used for both meta build and plugin development purpose.
I noticed that when using the latest nightly, triggered execution would
fail to work if I switched projects with, e.g. ++2.10.7. This was
because the background thread that filled the file cache was incorrectly shutdown.
To fix this, we just need to close whatever view is cached in the
globalFileTreeView attribute in the exit hook rather than the view
created by the method.
After making this change and publishing a local SNAPSHOT build, I was
able to switch projects with ++ and have triggeredExecution continue to
work.
On windows* it was possible to get into a loop where the build would
continually restart because for some reason the build.sbt file would get
touched during test (I did not see this behavior on osx). Thankfully,
the repository keeps track of the file hash and when we detect that the
build file has been updated, we check the file hash to see if it
actually changed.
Note that had this bug shipped, it would have been fixable by overriding
the watchOnEvent task in user builds.
The loop would occur if I ran ~filesJVM/test in
https://github.com/swoval/swoval. It would not occur if I ran
test:compile, so the fact that the build file is being touched seems
to be related to the test run itself.
It was possible that on startup, when this function was first invoked,
that the default boot commands are present. This was a problem because
the global file repository is instantiated using the value of this task.
When we start a continuous build, this task gets run again to evaluate
again.
When sbt is started without an implicit task list, then the task is
implicitly shell as indicated by the command "iflast shell". We can use
this to determine whether or not to use the global file system cache or
not.
Ideally we use the FileTreeRepository for interactive sessions by
default. A continuous build is effectively interactive, so I'd like that
case to also use the file tree repository. To avoid breaking scripted
tests, many of which implicitly expect file tree changes to be
instantaneously available, we set interactive to true only if we are not
in a scripted run, which can be verified by checking that the commands
contains "setUpScripted".
Sometimes a user may want to rerun their task even if the source files
haven't changed. Presently this is a little annoying because you have to
hit enter to stop the build and then up arrow or <ctrl+r> plus enter to
rebuild. It's more convenient to just be able to press the 'r' key to
re-run the task.
To implement this, I had to make the watch task set up a jline terminal
so that System.in would be character buffered instead of line buffered.
Furthermore, I took advantage of the NonBlockingInputStream
implementation provided by jline to wrap System.in. This was necessary
because even with the jline terminal, System.in.available doesn't return
> 0 until a newline character is entered. Instead, the
NonBlockingInputStream does provide a peek api with a timeout that will
return the next unread key off of System.in if there is one available.
This can be use to proxy available in the WrappedNonBlockingInputStream.
To ensure maximum user flexibility, I also update the watchHandleInput Key to
take an InputStream and return an Action. This setting will now receive
the wrapped System.in, which will allow the user to create their own
keybindings for watch actions without needing to use jline themselves.
Future work might make it more straightforward to go back to a line
buffered input if that is what the user desires.
For projects with a large number of files, zinc has to do a lot of work
to determine which source files and binaries have changes since the last
build. In a very simple project with 5000 source files, it takes roughly
750ms to do a no-op compile using the default incremental compiler
options. After this change, it takes about 200ms. Of those 200ms, 50ms
are due to the update task, which does a partial project resolution*.
The implementation is straightforward since zinc already provides an api
for overriding the built in change detection strategy. In a previous
commit, I updated the sources task to return StampedFile rather than
regular java.io.File instances. To compute all of the source file
stamps, we simply list the sources and if the source is in fact an
instance of StampedFile, we don't need to compute it, otherwise we
generate a StampedFile on the fly. After building a map of stamped files
for both the sources files and all of the binary dependencies, we simply
diff these maps with the previous results in the changedSources,
changedBinaries and removedProducts methods.
The new ExternalHooks are easily disabled by setting
`externalHooks := _ => None`
in the project build.
In the future, I could see moving ExternalHooks into the zinc project so
that other tools like bloop or mill could use them.
* I think this delay could be eliminated by caching the UpdateResult so
long as the project doesn't depend on any snapshot libraries. For a
project with a single source, the no-op compile takes O(50ms) so caching
the project resolution would make compilation start nearly
instantaneous.
I realized that using the cache has the potential to cause issues for
batch processing in CI if some tasks assume that a file created by one
task will immediately be visible in the other. With the cache, there is
typically on O(10ms) latency between a file being created and appearing
in the cache (at least on OSX). When manually running commands, that
latency doesn't matter.
It is not always possible to monitor a directory using OS file system
events. For example, inotify does not work with nfs. To work around
this, I add support for a hybrid FileTreeViewConfig that caches a
portion of the file system and monitors it with os file system
notification, but that polls a subset of the directories. When we query
the view using list or listEntries, we will actually query the file
system for the polling directories while we will read from the cache for
the remainder. When we are not in a continuous build (~ *), there is no
polling of the pollingDirectories but the cache will continue to update
the regular directories in the background. When we are in a continuous
build, we use a PollingWatchService to poll the pollingDirectories and
continue to use the regular repository callbacks for the other
directories.
I suspect that #4179 may be resolved by adding the directories for which
monitoring is not working to the pollingDirectories task.
Now that we have the fileTreeView task, we can generalized the process
of collecting files from the view (which may or may not actually cache
the underlying file tree). I moved the implementation of collectFiles
and addBaseSources into the new FileManagement object because Defaults
is already too large of a file. When we query the view, we also need to
register the directory we're listing because if the underlying view is a
cache, we must call register before any entries will be available.
Because FileTreeDataView doesn't have a register method, I implement
registration with a simple implicit class that pattern matches on the
underlying type and only calls register if it is actually a
FileRepository.
A side effect of this change is that the underlying files returned by
collectFiles and appendBaseSources are StampedFile instances. This is so
that in a subsequent commit, I can add a Zinc external hook that will
read these stamps from the files in the source input array rather than
compute the stamp on the fly. This leads to a substantial reduction in
Zinc startup time for projects with many source files. The file filters
also may be applied more quickly because the isDirectory property (which
we check for all source files) is read from a cached value rather than
requiring a stat.
I had to update a few of the scripted tests to use the `1.2.0`
FileTreeViewConfig because those tests would copy a file and then
immediately re-compile. The latency of cache invalidation is O(1-10ms),
but not instantaneous so it's necessary to either use a non-caching
FileTreeView or add a sleep between updates and compilation. I chose the
former.
Every time that the compile task is run, there are potentially a large
number of iops that must occur in order for sbt to generate the source
file list as well as for zinc to check which files have changed since
the last build. This can lead to a noticeable delay between when a build
is started (either manually or by triggered execution) and when
compilation actually begins. To reduce this latency, I am adding a
global view of the file system that will be stored in
BasicKeys.globalFileTreeView.
To make this work, I introduce the StampedFile trait, which augments the
java.io.File class with a stamp method that returns the zinc stamp for
the file. For source files, this will be a hash of the file, while for
binaries, it is just the last modified time. In order to gain access to
the sbt.internal.inc.Stamper class, I had to append addSbtZinc to the
commandProj configurations.
This view may or may not use an in-memory cache of the file system tree
to return the results. Because there is always the risk of the cache
getting out of sync with the actual file system, I both make it optional
to use a cache and provide a mechanism for flushing the cache. Moreover,
the in-memory cache implementation in sbt.io, which is backed by a
swoval FileTreeRepository, has the property that touching a monitored
directory invalidates the entire directory within the cache, so the
flush command isn't even strictly needed in general.
Because caching is optional, the global is of a FileTreeDataView, which
doesn't specify a caching strategy. Subsequent commits will make use of
this to potentially speed up incremental compilation by caching the
Stamps of the source files so that zinc does not need to compute the
hashes itself and will allow for continuous builds to use the cache to
monitor events instead of creating a new, standalone FileEventMonitor.