When forking, a test could be run as many times as its `TestDef` matched
`Fingerprint`s. When forking is disabled, sbt correctly runs the tests
only once.
Prior to these changes, sbt was leaking large amounts of memory via
log4j appenders. sbt has an unusual use case for log4j because it
creates many ephemeral loggers while also having a global logger that is
supposed to work for the duration of the sbt session. There is a lot of
shared global state in log4j and properly cleaning up the ephemeral task
appenders would break global logging. This commit fixes the behavior by
introducing an alternate logging implementation. Users can still use the
old log4j logging implementation but it will be off by default. The
internal implementation is very simple: it just blocks the current
thread and writes to all of the appenders. Nevertheless, I found the
performance to be roughly identical to that of log4j in my sample
project. As an experiment, I did the appending on a thread pool and got
a significant performance improvement but I'll defer that to a later PR
since parallel io is harder to reason about.
Background: I was testing sbt performance in
https://github.com/jtjeferreira/sbt-multi-module-sample and noticed that
performance rapidly degraded after I ran compile a few times. I took a
heap dump and it became obvious that sbt was leaking console appenders.
Further investigation revealed that all of the leaking appenders in the
project were coming from task streams. This made me think that the fix
would be to track what loggers were created during task evaluation and
clear them out when task evaluation completed. That almost worked except
that log4j has an internal append only data structure containing logger
names. Since we create unique logger names for each run, that internal
data structure grew without bound. It looked like this could be worked
around by creating a new log4j Configuration (where that data structure
was stored) but while creating new configurations with each task runs
did fix the leak, it also broke global logging, which was using a
different configuration. At this point, I decided to write an alternate
implementation of the appender api where I could be sure that the
appenders were cleaned up without breaking global logging.
Implementation: I made ConsoleAppender a trait and made it no longer
extends log4j AbstractAppender. To do this, I had to remove the one
log4j specific method, append(LogEvent). ConsoleAppender now has a
method toLog4J that, in most cases, will return a log4j Appender that is
almost identical to the Appenders that we previously used. To manage
the loggers created during task evaluation, I introduce a new class,
LoggerContext. The LoggerContext determines which logging backend to use
and keeps track of what appenders and loggers have been created. We can
create a fresh LoggerContext before each task evaluation and clear it
out, cleaning up all of its resources after task evaluation concludes.
In order to make this work, there were many places where we need to
either pass in a LoggerContext or create a new one. The main magic is
happening in the `next(State)` method in Main. This is where we create a
new LoggerContext prior to command evaluation and clean it up after the
evaluation completes.
Users can toggle log4j using the new useLog4J key. They also can set the
system property, sbt.log.uselog4j. The global logger will use the sbt
internal implementation unless the system property is set.
There are a fairly significant number of mima issues since I changed the
type of ConsoleAppender. All of the mima changes were in the
sbt.internal package so I think this should be ok.
Effects: the memory leaks are gone. I successfully ran 5000 no-op
compiles in the sbt-multi-module-sample above with no degradation of
performace. There was a noticeable degradation after 30 no-op compiles
before.
During the refactor, I had to work on TestLogger and in doing so I also
fixed https://github.com/sbt/sbt/issues/4480.
This also should fix https://github.com/sbt/sbt/issues/4773
either create test reports with legacy file names (legacyreport=true) or with standard file names (legacyreport=false or omitted) but not both as suggested in #4451
Previously, it was not possible to use `sbt.ForkMain` as a library since
* it called `System.exit`, shutting down the library client JVM process.
* it assumed that it was always running in the same classloader as where
the test suites it was running causing "class not found: my.test.Suite"
if the `ForkMain` class lives in a classloader above the test suite
classes.
This commit adds a new `main(Array[String],ClassLoader])` entrypoint
that allows clients to call `ForkMain` as a library.
To demonstrate [-Yno-lub](http://eed3si9n.com/stricter-scala-with-ynolub), this shows the code changes that removes lubing (Not all subprojects are done).
After I made the changes, I switched the Scala back to normal 2.12.10.
If a test threw a NoClassDefFoundError, it was uncaught and the user got
an obscure error. After this change, they will be warned that the
classloader layering strategy may be at fault and will get instructions
on how to fix it.
It is possible with the new layering strategies that tests may fail if a
java package private class is accessed across classloader layers. This
will result in an IllegalAccessError that is hard to debug. With this
commit, I add an error message that will be displayed if run throws an
IllegalAccessError that suggests that the user try the
ScalaInstance layering strategy or the flat layering strategy.
In code review, @eed3si9n suggested that I switch to a more verbose and
descriptive naming scheme. In addition to trying to make layers more
descriptive, I also made the various layer case objects extend the
relevant layers so it's more clear what the layer should look like.
Using the data structures that I added in the previous commits, it is
now possible to rework the run and test task to use (configurable)
layered class loaders. The layering strategy is globally set to
LayeringStrategy.Default. The default strategy leads to what is
effectively a three layered ClassLoader for the both the test and run
tasks. The first layer contains the scala instance (and test framework
loader in the test task). The second layer contains all of the
dependencies for the configuration while the third layer contains the
project artifacts.
The layering strategy is very easily changed both at the Global or
Configuration level, e.g. adding
Test / layeringStrategy := LayeringStrategy.Flat
to the project build.sbt will make the test task not even use the scala
instance and instead a create a single layer containing the full
classpath of the test task.
I also tried to ensure that all of the ClassLoaders have good toString
overrides so that it's easy to see how the ClassLoader is constructed
with, e.g. `show testLoader`, in the sbt console.
In this commit, the ClassLoaderCache instances are settings. In the next
commit, I make them tasks so that we can easily clear out the caches
with a command.
It is possible for the test task to fail because of issues with the
layered ClassLoaders. When this occurs and is detectable, I try to
provide a useful error message that will help the user fix the issue.
I had to turn off -Xfatal-warnings in commandProj because after updating
io, commandProj depends on the deprecated EventMonitor class. In #4335,
I stop using EventMonitor, but deprecate the Watched class which is both
defined and used (as an unused attribute key) in commandProj. I think we
can probably get rid of Watched in 1.4.x and certainly in a hypothetical
2.x, so hopefully we can restore -Xfatal-warnings sooner than later.
I also had to replace uses of IO.classLocationFile with
IO.classLocationPath to avoid compilation failures due to
-Xfatal-warnings.
I spent a lot of time debugging why it took 5 seconds to run tests each
time. It turns out that if the hostname is not set explicitly on os x,
then getaddrinfo takes 5 seconds to try (and fail) to resolve the dns
entry for the localhostname. This is easily fixed by setting the
hostname, but it is not at all easy to figure out that a slow hostname
lookup is the reason why tests are slow to start.
I don't know if this is a common issue on other platforms, so only issue
the warning on OS X.
ArrayList::add is not thread safe. I ran into cases where async tests
using utests would fail even when all of the individual tests passed.
This was because multiple threads called back into the handle method of
the handler instance variable, which just delegated to eventList::add.
When this happened, one of the events would get added to the list as a
null reference, which would manifest as an NPE upstream on the master
process. After this change, my tests stopped failing.