In 5eab9df0df, I updated the
outputFileStamps task to compute all of the stamps for a directory
recursively if an output file is a directory. Prior to that, it had only
computed the stamp for the directory itself. This caused a significant
performance regression in creating the test classloader because it was
computing the last modified time for all of the classfiles in the class path.
The test for 5000 source files in
https://github.com/eatkins/scala-build-watch-performance was running roughly
400ms slower due to this regression.
Ref https://github.com/sbt/sbt/issues/4905
This is a companion PR to https://github.com/sbt/librarymanagement/pull/318.
This will print the following warnings:
```
sbt:hello> compile
[warn] insecure HTTP request is deprecated 'Artifact(jsoup, jar, jar, None, Vector(), Some(http://jsoup.org/packages/jsoup-1.9.1.jar), Map(), None, false)'; switch to HTTPS or opt-in using from(url(...), allowInsecureProtocol = true) on ModuleID or .withAllowInsecureProtocol(true) on Artifact
[warn] insecure HTTP request is deprecated 'http://repo.typesafe.com/typesafe/releases/'; switch to HTTPS or opt-in as ("Typesafe Releases" at "http://repo.typesafe.com/typesafe/releases/").withAllowInsecureProtocol(true)
[warn] insecure HTTP request is deprecated 'http://repo.typesafe.com/typesafe/releases/'; switch to HTTPS or opt-in as ("Typesafe Releases" at "http://repo.typesafe.com/typesafe/releases/").withAllowInsecureProtocol(true)
[warn] insecure HTTP request is deprecated 'http://repo.typesafe.com/typesafe/releases/'; switch to HTTPS or opt-in as ("Typesafe Releases" at "http://repo.typesafe.com/typesafe/releases/").withAllowInsecureProtocol(true)
[warn] insecure HTTP request is deprecated 'Patterns(ivyPatterns=Vector(), artifactPatterns=Vector(http://repo.typesafe.com/typesafe/releases/[organisation]/[module](_[scalaVersion])(_[sbtVersion])/[revision]/[artifact]-[revision](-[classifier]).[ext]), isMavenCompatible=true, descriptorOptional=false, skipConsistencyCheck=false)'; switch to HTTPS or opt-in as Resolver.url("Typesafe Ivy Releases", url(...)).withAllowInsecureProtocol(true)
[warn] insecure HTTP request is deprecated 'Patterns(ivyPatterns=Vector(), artifactPatterns=Vector(http://repo.typesafe.com/typesafe/releases/[organisation]/[module](_[scalaVersion])(_[sbtVersion])/[revision]/[artifact]-[revision](-[classifier]).[ext]), isMavenCompatible=true, descriptorOptional=false, skipConsistencyCheck=false)'; switch to HTTPS or opt-in as Resolver.url("Typesafe Ivy Releases", url(...)).withAllowInsecureProtocol(true)
[warn] insecure HTTP request is deprecated 'Patterns(ivyPatterns=Vector(), artifactPatterns=Vector(http://repo.typesafe.com/typesafe/releases/[organisation]/[module](_[scalaVersion])(_[sbtVersion])/[revision]/[artifact]-[revision](-[classifier]).[ext]), isMavenCompatible=true, descriptorOptional=false, skipConsistencyCheck=false)'; switch to HTTPS or opt-in as Resolver.url("Typesafe Ivy Releases", url(...)).withAllowInsecureProtocol(true)
```
I inadvertently changed the semantics of clean so that cleanFiles would
only delete the file if it was a regular file. In older versions of sbt,
if a file in cleanFiles was a directory, it would be recursively
deleted.
During refactoring of Continuous, I inadvertently changed the semantics
of `~` so that all multi commands were run regardless of whether or not
an earlier command had failed. I fixed the issue and added a regression
test.
It was reported in https://github.com/sbt/sbt/issues/4973 that the
scalaVersion setting was not being correctly set in a script running
with ScriptMain using 1.3.0-RC4. Using git bisect, I found that the
issue was introduced in
73cfd7c8bd.
That commit manipulates the classloaders passed in by the launcher, but
only for the xMain entry point. I found that the script ran correctly if
I updated the classloader for ScriptedMain as well.
After these changes, the example script in #4973 correctly prints 2.13.0
for the scala version with a locally published sbt.
Bonus: rename xMainImpl object xMain. It was private[sbt] anyway.
https://github.com/sbt/sbt/issues/4986 reported that +compile would
always recompile everything in the project even when the sources hadn't
changed. This was because the dependency classpath was changing between
calls to compile, which caused the external hooks cache introduced in
32a6d0d5d7 to invalidate the scala
library. To fix this, I cache the file stamps on a per scala version
basis. I added a scripted test that checks that there is no
recompilation in two consecutive calls to `+compile` in a multi scala
version build. It failed prior to these changes.
I looked for serial bottlenecks in sbt project loading and discovered
that KeyIndex.aggregate was relatively easily parallelizable. Before
this time, it took about 1 second to run KeyIndex.aggregate in the akka
project on my computer. After this change, it took 250ms. Given that I
have 4 logical cores, the speedup is roughly linear.
It turns out that injecting the keys necessary for incremental tasks
causes a significant startup penalty for many larger projects. For
example, akka starts up about 3 seconds faster if do not inject these
settings for the tasks returning `File` or `Seq[File]`. Given that all
of these apis use java.nio.file anyway, it makes sense to not backport
them to older tasks.
The clean task got a lot slower in 1.3.0
(https://github.com/sbt/sbt/issues/4972). The reason for this was that
sbt 1.3.0 generated many custom clean tasks for any tasks that returned
`File` or `Seq[File]`. Each of these tasks was tagged with
Tags.Clean which meant that only one of them could run at a time. As a
result, it took a long time to evaluate all of the custom tasks, even if
they were no-ops. In the akka project, a no-op clean was taking 35
seconds which is simply unacceptable. After this change, a no-op clean
takes less than a second in akka (a full clean only takes about 6
seconds after running test:compile)
To fix this, I stopped aggregating the clean task across configs and
projects. Because I removed the aggregation, I needed to manually
implement clean in the `Compile` and `Test` configurations to make
`Compile / clean` and `Test / compile` clean work correctly.
Fixes#4964
Together with https://github.com/sbt/util/pull/211, this brings back stack trace supression for custom tasks by default.
Debug levels logs are available in `last`, and this prints a message informing the user of the fact. BLUE on dark background is difficult to read, so I am chaning the color hilight to MAGENTA.
After adding the automatic lookup to external hooks for missing binary
jars, the scripted test dependency-management/invalidate-internal
started failing. This was because the previous analysis contained a jar
dependency that still existed on disk but was no longer a part of the
dependency classpath. Fundamentally the problem is that the zinc
compile analysis is not tightly coupled with the sbt build state.
To fix this, we can cache the dependency classpath file stamps in the
same way that we cache the input file stamps in external hooks and
manually diff them at the sbt level. We then force updates regardless of
the difference between the zinc state and the sbt state.
It was reported that in community builds, sometimes there was
spurious over-compilation due to invalidation of the scala library jar
(https://github.com/sbt/sbt/issues/4948). The reason for this was that
the external hooks prefills the managed cache with all of the time
stamps for the project dependencies but was not looking up any jars that
weren't in the cache. I suspect I did this because I didn't realize that
zinc also includes its own classpath in the binaries which is not
a part of the dependencyClasspath. The fix is to just add the jar to the
cache if it doesn't already exist by switching to getOrElseUpdate from
get.
I followed the steps in #4948 and published a version of sbt locally
with this change and the spurious re-builds stopped.
In the code formatting use case, the formatting task may modify the
source files in place. If the formatting task uses the nio
inputFileStamps, then it would fill the in-memory cache of source paths
to file stamps. This would cause compile to see the pre-formatted
stamps. To fix this, we can invalidate the file cache entries for the
outputs of a task. This will cause the side-effect of some extra io
because the hashes may be computed three times: once for the format
inputs, once for the format outputs and once for the compile inputs. I
think most users would understand that adding auto-formatting would
potentially slowdown compilation.
To really prove this out, I implemented a poor man's scalafmt plugin in
a scripted test. It is fully incremental. Even in the case when some
files cannot be formatted it will update all of the files that can be
formatted and not re-format them until they change.
It makes sense to add a scope for the `fileTreeView` key where it is
available. At the moment, there is only one `fileTreeView`
implementation but, if that changes down the road, these tasks will
automatically inherit the correct view.
It makes sense for the new glob/nio based apis that we provide first
class support for filtering the results. Because it isn't possible to
scope a task within a task within a task, i.e.
`compile / fileInputs / includePathFilter`, I had to add four new
filter settings of type `PathFilter`:
fileInputIncludeFilter :== AllPassFilter.toNio,
fileInputExcludeFilter :== DirectoryFilter.toNio || HiddenFileFilter,
fileOutputIncludeFilter :== AllPassFilter.toNio,
fileOutputExcludeFilter :== NothingFilter.toNio,
Before I was effectively hard-coding the filter: RegularFileFilter &&
!HiddenFileFilter in the inputFileStamps and allInputFiles tasks. These
remain the defaults, as seen in the fileInputExcludeFilter definition
above, but can be overridden by the user.
It makes sense to exclude directories and hidden files for the input
files, but it doesn't necessarily make sense to apply any output filters
by default. For symmetry, it makes sense to have them, but they are
unlikely to be used often.
Apart from adding and defining the default values for these keys, the
only other changes I had to make was to remove the hard-coded filters
from the allInputFiles and inputFileStamps tasks and also add the
filtering to the allOutputFiles task. Because we don't automatically
calculate the FileAttributes for the output files, I added logic for
bypassing the path filter application if the PathFilter is effectively
AllPass, which is the case for the default values because:
AllPassFilter.toNio == AllPass
NothingFilter.toNio == NoPass
AllPass && !NoPass == AllPass && AllPass == AllPass
Prior to this commit, change tracking in sbt 1.3.0 was done via the
changed(Input|Output)Files tasks which were tasks returning
Option[ChangedFiles]. The ChangedFiles case class was defined in io as
case class ChangedFiles(created: Seq[Path], deleted: Seq[Path], updated: Seq[Path])
When no changes were found, or if there were no previous stamps, the
changed(Input|Output)Files tasks returned None. This made it impossible
to tell whether nothing had changed or if it was the first time.
Moreover, the api was awkward as it required pattern matching or folding
the result into a default value.
To address these limitations, I introduce the FileChanges class. It can
be generated regardless of whether or not previous file stamps were
available. The changes contains all of the created, deleted, modified
and unmodified files so that the user can directly call these methods
without having to pattern match.
It is tedious to write (foo / allInputFiles).value so I added simple
extension method macros that expand `foo.inputFiles` to
(foo / allInputFiles).value and `foo.outputFiles` to
`(foo / allOutputFiles).value`.
While writing documentation for the new file management/incremental
task evaluation features, I realized that incremental task evaluation
did not have the correct semantics. The problem was that calls to
`.previous` are not scoped within the current task. By this, I mean that
say there are tasks foo and bar and that the defintion of bar looks like
bar := {
val current = foo.value
foo.previous match {
case Some(v) if v == current => // value hasn't changed
case _ => process(current)
}
}
The problem is that foo.previous is stored in
effectively (foo / streams).value.cacheDirectory / "previous". This
means that it is completely decoupled from foo. Now, suppose that the
user runs something like:
> set foo := 1
> bar // processes the value 1
> set foo := 2
> foo
> bar // does not process the new value 2 because foo was called, which updates the previous value
This is not an unrealistic scenario and is, in fact, common if the
incremental task evaluation is changed across multiple processing steps.
For example, in the make-clone scripted test, the linkLib task processes
the outputs of the compileLib task. If compileLib is invoked separately
from linkLib, then when we next call linkLib, it might not do anything
even if there was recompilation of objects because the objects hadn't
changed since the last time we called compileLib.
To fix this, I generalizedthe previous cache so that it can be keyed on
two tasks, one is the task whose value is being stored (foo in the
example above) and the other is the task in which the stored task value
is retrieved (bar in the example above). When the two tasks are the
same, the behavior is the same as before.
Currently the previous value for foo might be stored somewhere like:
base_directory/target/streams/_global/_global/foo/previous
Now, if foo is stored with respect to bar, it might be stored in
base_directory/target/streams/_global/_global/bar/previous-dependencies/_global/_gloal/foo/previous
By storing the files this way, it is easy to remove all of the previous
values for the dependencies of a task.
In addition to changing how the files are stored on disk, we have to store
the references in memory differently. A given task can now have multiple
previous references (if, say, two tasks bar and baz both depend on the
previous value). When we complete the results, we first have to collect
all of the successful tasks. Then for each successful task, we find all
of its references. For each of the references, we only complete the
value if the scope in which the task value is used is successful.
In the actual implemenation in Previous.scala, there are a number places
where we have to cast to ScopedKey[Task[Any]]. This is due to
limitations of ScopedKey and Task being type invariant. These casts are
all safe because we never try to get the value of anything, we only use
the portion of the apis of these types that are independent of the value
type. Structural typing where ScopedKey[Task[_]] gets inferred to
ScopedKey[Task[x]] forSome x is a big part of why we have problems with
type inference.
Using List instead of vector makes the code a bit more readable. We
don't need indexed access into the data structure so its unlikely that
Vector was providing any performance benefit.
As part of a documentation push, I noticed that these were undocumented
and that there were some public apis in FileStamp that I intended to be
private[sbt].
I realized it was probably not ideal to have these implicit JsonFormats
defined directly in the FileStamp object because they might
inadvertently be brought into scope with a wildcard import.
I noticed that sometimes if I changed a build source and then ran reload
in the shell, I'd still see a warning about build sources having
changed. We can eliminate this behavior by resetting the
hasCheckedMetaBuild state attribute to false and skipping the
checkBuildSources step if the current command is 'reload'. We also now
skip checking the build source step if the command is exit or reboot.
Zinc records all of the compile source file hashes when compilation
completes. This is problematic because its possible that a source file
was changed during compilation. From the user perspective, this may mean
that their source change will not be recompiled even if a build is
triggered by the change.
To overcome this, I add logic in the sbt provided external hooks to
override the zinc analysis stamps. This is done by writing the source
file stamps to the previous cache after compilation completes. This
allows us to see the source differences from sbt's perspective, rather
than zinc's perspective. We then merge the combined differences in the
actual implementation of ExternalHooks. In some cases this may result in
over-compilation but generally over-compilation is preferred to under
compilation. Most of the time, the results should be the same.
The scripted test that I added modifies a file during compilation by
invoking a macro. It then effectively asserts that the file is
recompiled during the next test run by validating the compilation result
in the test. The test fails on the latest develop hash.
Changed resources were causing the dependency layer to be invalidated on
resource changes in turbo mode because the resource layer was in between
the scala library layer. This commit reworks the layers for the
AllDependencyJars strategy so that the top layer is able to load _all_
of the resources during a test run.
The resource layer was added to address the problem that dependencies
may need to be able to load resources from the project classpath but
wouldn't be able to do so if the dependencies were in a separate layer
from the rest of the classpath. The resource layer was a classloader
that could load any resource on the full classpath but couldn't load any
classes. When I added the resource layer, I was thinking that when
resources changed, the resource class loader needed to be invalidated.
Resources, however, are different from classes in that the same
ClassLoader can find the same resources in a different place because
getResource and getResourceAsStream just return locations but do not
actually do any loading.
Taking advantage of this, I add a proxy classloader for finding
resource locations to ReverseLookupClassLoader. We can reset the
classpath of the resource loader in
ReverseLookupClassLoaderHolder.checkout. This allows us to see the new
versions of the resources without invalidating the dependency layer.
The use of '$' in the path names for streams is a pain because, since
'$' is a special character in the shell, it makes it impossible to
directly copy and paste the paths. If we make this change, some builds
will be left with vestigial directories with $global and $build in them
until they run clean. It also would break any scripts that manually
delete these paths. That doesn't seem like a common use case, but it's
worth mentioning.
The use of the persistent file stamp cache between watch runs didn't
seem to cause any issues, but there was some chance for inconsistency
between the file stamp cache and the file system so it makes sense to
put it behind the turbo flag.
After changing the default, the watch/on-change scripted test started
failing. It turns out that the reason is that the file stamp cache
managed by the watch process was not pre-filled by task evaluation. For
this reason, the first time a source file was modified, it was treated
as a creation regardless of whether or not it actually was.
To fix this, I add logic to pre-fill the watch file stamp cache if we
are _not_ persisting the file stamps between runs.
I ran a before and after with the scala build performance benchmark tool
and setting the watchPersistFileStamps key to true reduced the median
run time by about 200ms in the non-turbo case.