Changed resources were causing the dependency layer to be invalidated on
resource changes in turbo mode because the resource layer was in between
the scala library layer. This commit reworks the layers for the
AllDependencyJars strategy so that the top layer is able to load _all_
of the resources during a test run.
The resource layer was added to address the problem that dependencies
may need to be able to load resources from the project classpath but
wouldn't be able to do so if the dependencies were in a separate layer
from the rest of the classpath. The resource layer was a classloader
that could load any resource on the full classpath but couldn't load any
classes. When I added the resource layer, I was thinking that when
resources changed, the resource class loader needed to be invalidated.
Resources, however, are different from classes in that the same
ClassLoader can find the same resources in a different place because
getResource and getResourceAsStream just return locations but do not
actually do any loading.
Taking advantage of this, I add a proxy classloader for finding
resource locations to ReverseLookupClassLoader. We can reset the
classpath of the resource loader in
ReverseLookupClassLoaderHolder.checkout. This allows us to see the new
versions of the resources without invalidating the dependency layer.
The use of '$' in the path names for streams is a pain because, since
'$' is a special character in the shell, it makes it impossible to
directly copy and paste the paths. If we make this change, some builds
will be left with vestigial directories with $global and $build in them
until they run clean. It also would break any scripts that manually
delete these paths. That doesn't seem like a common use case, but it's
worth mentioning.
The use of the persistent file stamp cache between watch runs didn't
seem to cause any issues, but there was some chance for inconsistency
between the file stamp cache and the file system so it makes sense to
put it behind the turbo flag.
After changing the default, the watch/on-change scripted test started
failing. It turns out that the reason is that the file stamp cache
managed by the watch process was not pre-filled by task evaluation. For
this reason, the first time a source file was modified, it was treated
as a creation regardless of whether or not it actually was.
To fix this, I add logic to pre-fill the watch file stamp cache if we
are _not_ persisting the file stamps between runs.
I ran a before and after with the scala build performance benchmark tool
and setting the watchPersistFileStamps key to true reduced the median
run time by about 200ms in the non-turbo case.
There was a bug where sometimes a source file change would not trigger a
new build if the change occurred during a build. Based on the logs, it
seemed to be because a number of redundant events were generated for the
same path and they triggered the anti-entropy constraint of the file
event monitor.
To fix this, I consolidated a number of observers of the global file
tree repository into a single observer. This way, I am able to ensure
that only one event is generated per file event.
I also reworked the onEvent callback to only stamp the file once. It was
previously stamping the modified source file for all of the aggregated
tasks. In the sbt project running `~compile` meant that we were stamping
a source file 22 times whenever the file changed.
This actually uncovered a zinc issue though as well. Zinc computes and
writes the hash of the source file to the analysis file after
compilation has completed. If a source file is modified during
compilation, then the new hash is written to the analysis file even when
the compilation may have been made against the previous version of the
file. Zinc will then refuse to re-compile that file until another change
is made.
I manually verified that in the sbt project if I ran `~compile` before
this change and modified a file during compilation, then no event was
triggered (there was a log message about the event being dropped due to
the anti-entropy constraint though). After this change, if I changed a
file during compilation, it seemed to always trigger, but due to the
zinc bug, it didn't always re-compile.
This was causing an abstract type pattern T is unchecked since it is
eliminated by erasure. It was unneeded because store.get[T] return
Option[(T, Long)]. I'm surprised that the compiler complained about
this.
@olegych reported that sbt would silently swallow the 'compile' command
in the multi-command, 'run;compile;reload'. I tracked this down to the
build source check. When the build has
Global / onChangedBuildSource := ReloadOnSourceChanges, the check build
sources command return a new state with "reload" prefixed. To actually
perform the reload, I returned this modified state with the prefixed
reload command.
There were two problems with this:
1) In the auto-reload case, the current command was not run after the
reload
2) If the multi-command contained reload, the auto-reload check would
have a false positive which triggered the bug in (1)
To fix this, I clear out the remaining commands before I run the check
command. That way, we know that if the remaining commands has a reload,
then it is an auto-reload. We then prefix the state with both the reload
and the current command.
I updated the scripted test for auto-reload to handle multi commands
containing reload.
In this commit, I both restore some sbt 1.2.8 behavior and enhance the
api for setting keyboard shortcuts in watch. I change the default start
message to just show the watch count, the tasks that are being monitored
and, on a new line, the instructions to terminate the watch or show more
options.
Here's what it looks like:
[info] 1. Monitoring source files for spark/compile...
[info] Press <enter> to interrupt or '?' for more options.
?
[info] Options:
[info] <enter> : interrupt (exits sbt in batch mode)
[info] <ctrl-d> : interrupt (exits sbt in batch mode)
[info] 'r' : re-run the command
[info] 's' : return to shell
[info] 'q' : quit sbt
[info] '?' : print options
I also made it so that the new options can be added (and old options
removed) with the watchInputOptions key. For example, to add an option
to reload the build with the key 'l', you could add
ThisBuild / watchInputOptions += Watch.InputOption('l', "reload", Watch.Reload)
to your global build.sbt.
After adding that to my global ~/sbt/1.0/global.sbt file, the output of
'?' became:
[info] Options:
[info] <ctrl-d> : interrupt (exits sbt in batch mode)
[info] <enter> : interrupt (exits sbt in batch mode)
[info] '?' : print options
[info] 'l' : reload
[info] 'q' : quit sbt
[info] 'r' : re-run the command
[info] 's' : return to shell
At the suggestion of @eed3si9n, instead of specifying the file cache
size in bytes, we now specify it in a formatted string. For example,
instead of specifying 128 megabytes in bytes (134217728), we can specify
it with the string "128M".
It can be quite slow to read and parse a large json file. Often, we are
reading and writing the same file over and over even though it isn't
actually changing. This is particularly noticeable with the
UpdateReport*. To speed this up, I introduce a global cache that can be
used to read values from a CacheStore. When using the cache, I've seen
the time for the update task drop from about 200ms to about 1ms. This
ends up being a 400ms time savings for test because update is called for
both Compile / compile and Test / compile.
The way that this works is that I add a new abstraction
CacheStoreFactoryFactory, which is the most enterprise java thing I've
ever written. We store a CacheStoreFactoryFactory in the sbt State.
When we make Streams for the task, we make the Stream's
cacheStoreFactory field using the CacheStoreFactoryFactory. The
generated CacheStoreFactory may or may not refer to a global cache.
The CacheStoreFactoryFactory may produce CacheStoreFactory instances
that delegate to a Caffeine cache with a max size parameter that is
specified in bytes by the fileCacheSize setting (which can also be set
with -Dsbt.file.cache.size). The size of the cache entry is estimated by
the size of the contents on disk. Since we are generally storing things
in the cache that are serialized as json, I figure that this should be a
reasonable estimate. I set the default max cache size to 128MB, which is
plenty of space for the previous cache entries for most projects. If the
size is set to 0, the CacheStoreFactoryFactory generates a regular
DirectoryStoreFactory.
To ensure that the cache accurately reflects the disk state of the
previous cache (or other cache's using a CacheStore), the Caffeine cache
stores the last modified time of the file whose contents it should
represent. If there is a discrepancy in the last modified times (which
would happen if, say, clean has been run), then the value is read from
disk even if the value hasn't changed.
* With the following build.sbt file, it takes roughly 200ms to read and
parse the update report on my compute:
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.3"
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.1"
This is because spark-sql has an enormous number of dependencies and the
update report ends up being 3MB.
Fixes#4802
For Ivy integration sbt uses credential task in a peculiar way. 9fa25de84c/main/src/main/scala/sbt/Defaults.scala (L2271-L2275)
This lets the build user put `credential` task in various places, like metabuild or root project, but they would all act as if they were scoped globally. This PR adds `allCredentials` task to emulate the behavior to pass credentials into lm-coursier.
This Resolver had the same name as the typesafe ivy resolver specified
in the launcher boot.properties. It was creating a number of verbose
warnings about having multiple resolvers with the same name. I noticed
that the ivy pattern is slightly different for the boot resolver with
this name. It didn't seem to be causing any problems to have both
resolvers.
Fixes#4839
I noticed that for a simple spark project that evaluating the test task
was faster than running run when both tasks evaluated the same code
block. I tracked this down to the BackgroundJobService.copyClasspath
method. This method was hashing the jar contents of all of the files in
the build. On my computer, this took 600ms (for context, the total run
time of the `run` task was around 1.2 seconds, which included about
150ms of scala compiling and 350ms of time in the main method). If
instead we use the last modified time it drops down to 5-10ms. As
predicted, the total runtime of `run` in this project dropped down to
600ms which was on par with `test`.
I am not sure why a hash was used rather than last modified in the first place,
so I reworked things in such a way that, by default, sbt will use a hash
but if turbo mode is on, it will use the last modified instead. We can
revisit the default later.
The multi parser had very poor performance if there were many commands.
Evaluating the expansion of something like "compile;" * 30 could cause
sbt to hang indefinitely. I believe this was due to excessive
backtracking due to the optional `(parser <~ semi.?).?` part of the
parser in the non-leading semicolon case.
I also reworked the implementation so that the multi command now has a
name. This allows us to partition the commands into multi and non-multi
commands more easily in State while still having multi in the command
list. With this change, builds and plugins can exclude the multi parser
if they wish.
Using the partitioned parsers, I removed the high/priority low priority
distinction. Instead, I made it so that the multi command will actually
check if the first command is a named command, like '~'. If it is, it
will pass the raw command argument with the named command stripped out
into the parser for the named command. If that is parseable, then we
directly apply the effect. Otherwise we prefix each multi command to the
state.
The `session save` command has the side effect of modifying a "*.sbt"
file so we don't want to warn about changes or automatically reload when
we return to the shell. Setting the hasCheckedMetaBuild attribute key to
false is sufficient to prevent this.
Ref: https://github.com/sbt/sbt/issues/4813
It didn't really make sense for Continuous to use the other command
parser and then reparse the results. I was able to slightly simplify
things by using the multi parser directly.
It was reported in https://github.com/sbt/sbt/issues/4808 that compared
to 1.2.8, sbt 1.3.0-RC2 will truncate the command args of an input task
that contains semicolons. This is actually intentional, but not
completely robust. For sbt >= 1.3.0, we are making ';' syntactically
meaningful. This means that it always represents a command separator
_unless_ it is inside of a quoted string. To enforce this, the multi parser
will effectively split the input on ';', it will then validate that each
command that it extracted is valid. If not, it throws an exception. If
the input is not a multi command, then parsing fails with a normal
failure.
I removed the multi command from the state's defined commands and reworked
State.combinedParser to explicitly first try multi parsing and fall back
to the regular combined parser if it is a regular command. If the multi
parser throws an uncaught exception, parsing fails even if the regular
parser could have successfully parsed the command. The reason is so that
we do not ever allow the user to evaluate, say 'run a;b'. Otherwise the
behavior would be inconsitent when the user runs 'compile; run a;b'
It's very expensive to compute the hash code of a deeply nested
Incomplete. To prevent a loop, we only want to check for object equality
which we can do with IdentityHashMap
It didn't make sense to aggregate the watch start command if it was
defined in multiple sources so we previously just fell back to the
default message if multiple commands were being run. This, however,
meant that if you ran, say, ~compile in an aggregate project, it wasn't
possible to customize the start message. There was a message in the
sbt gitter channel where someone found the new message too verbose and
wanted to print something shorter and I realized that this was an
unfortunate restriction. Instead of giving up, we can just use the
project's watchStartMessage as a default. If the watchStartMessage
setting is unset for some reason, we can fall back to the default.
I validated this change manually in the swoval project, which has an
aggregate root project, by running
set ThisBuild / watchStartMessage := { (_, _, _) => None }
and indeed nothing was printed after each task evaluation in '~compile'.
We discovered that turbo mode did not work in the sbt settings project.
I tracked this down to the dependency classloader bundle not being
thread safe.
I realized that some builds may crash if we automatically close the
classloaders. While I do think that is a good thing in general that we
are closing the loaders by default, we shuold have an option for
supressing this behavior.
I made all of the custom classloaders that we define for test and run
check this property before calling the super.close method.
We want to notify users about the new features available in sbt 1.3.0 to
increase visibility. Turbo mode especially can benefit many builds, but
we have opted to leave it off by default for now.
The banner will be displayed the first time sbt enters the shell command
on each sbt run. The banner can be disabled globally with the sbt.banner
system property. It can be displayed on a per sbt version by running the
skipWelcomeBanner command. That command touches a file in the ~/.sbt/1.0
directory to make it persistent across all projects.
I decided creations/deletions/updates were a bit too technical rather
than descriptive. It also wasn't really correct to say Meta build
sources because the meta build is the build for the build. Instead, I
dropped Meta from the sentence. I also made the instructions when
changed sources are detected more active. I left them capitalized since
they are instructions rather than warnings.
Apply these changes by running `reload`.
Automatically reload the build when source changes are detected by setting `Global / onChangedBuildSource := ReloadOnSourceChanges`.
Disable this warning by setting `Global / onChangedBuildSource := IgnoreSourceChanges`.
Also indentation was wrong for the printed files when multiple files had
changed because the mkString middle argument was " \n" rather than "\n ".
This creates a performance mode that enables experimental or advanced features that might require some debugging by the build user when it doesn't work.
Initially we are putting the layered ClassLoader (`ClassLoaderLayeringStrategy.AllLibraryJars`) behind this flag.
My first attempt, cc8c66c66d, at making
java reflection work with a layered classloader with a dependency jar
layer was a failure. It would generally work ok the first time the user
ran test, but was likely to fail on a second run.
There were a number of problems with the strategy:
1) It relied on the thread's context class loader to determine where to
attempt the reverse lookup.
2) It is not possible to ever reload classes loaded by a classloader.
Consider the classloading hierarchy a <- b, where
the arrow implies that a is the parent of b. I incorrectly thought
that a's loadClass method would be called every time a class loaded
by a made a call to Class.forName(name: String). This turns out to
not be the case. As a result, the second time the dependency layer
was used, where now the hierarchy is a <- c, that same Class.forName
call could return a Class from b which causes a nasty crash.
It isn't possible to work around the limitation in 2 so the only option
that allows both caching and java reflection across layers to work is to
cache the dependency layer, but invalidate when cross layer reflection
occurs. This turns out to be straightforward to implement. The
performance looks very similar to the ScalaLibrary strategy when java
reflection is used, which makes sense because the scala library and
scala reflect layers are still reused when the dependency layer is
invalidated.
I also stopped passing around the resource map to all of the layers.
Resource loading is hierarchical and the resource layer comes above the
dependency layer in the stack so there was no need for the bottom layers
to also be RawResource loaders.
While testing some classloader changes, I realized that we didn't always
close the test classloaders because they didn't necessarily extend
URLClassLoader, but instead implemented AutoCloseable.
Bonus: don't set the context classloader. It turns out that the test
framework does that anyway inside of trl.run so it was pointless to do
in Defaults.scala.