mirror of https://github.com/sbt/sbt.git
Prior to these changes, sbt was leaking large amounts of memory via log4j appenders. sbt has an unusual use case for log4j because it creates many ephemeral loggers while also having a global logger that is supposed to work for the duration of the sbt session. There is a lot of shared global state in log4j and properly cleaning up the ephemeral task appenders would break global logging. This commit fixes the behavior by introducing an alternate logging implementation. Users can still use the old log4j logging implementation but it will be off by default. The internal implementation is very simple: it just blocks the current thread and writes to all of the appenders. Nevertheless, I found the performance to be roughly identical to that of log4j in my sample project. As an experiment, I did the appending on a thread pool and got a significant performance improvement but I'll defer that to a later PR since parallel io is harder to reason about. Background: I was testing sbt performance in https://github.com/jtjeferreira/sbt-multi-module-sample and noticed that performance rapidly degraded after I ran compile a few times. I took a heap dump and it became obvious that sbt was leaking console appenders. Further investigation revealed that all of the leaking appenders in the project were coming from task streams. This made me think that the fix would be to track what loggers were created during task evaluation and clear them out when task evaluation completed. That almost worked except that log4j has an internal append only data structure containing logger names. Since we create unique logger names for each run, that internal data structure grew without bound. It looked like this could be worked around by creating a new log4j Configuration (where that data structure was stored) but while creating new configurations with each task runs did fix the leak, it also broke global logging, which was using a different configuration. At this point, I decided to write an alternate implementation of the appender api where I could be sure that the appenders were cleaned up without breaking global logging. Implementation: I made ConsoleAppender a trait and made it no longer extends log4j AbstractAppender. To do this, I had to remove the one log4j specific method, append(LogEvent). ConsoleAppender now has a method toLog4J that, in most cases, will return a log4j Appender that is almost identical to the Appenders that we previously used. To manage the loggers created during task evaluation, I introduce a new class, LoggerContext. The LoggerContext determines which logging backend to use and keeps track of what appenders and loggers have been created. We can create a fresh LoggerContext before each task evaluation and clear it out, cleaning up all of its resources after task evaluation concludes. In order to make this work, there were many places where we need to either pass in a LoggerContext or create a new one. The main magic is happening in the `next(State)` method in Main. This is where we create a new LoggerContext prior to command evaluation and clean it up after the evaluation completes. Users can toggle log4j using the new useLog4J key. They also can set the system property, sbt.log.uselog4j. The global logger will use the sbt internal implementation unless the system property is set. There are a fairly significant number of mima issues since I changed the type of ConsoleAppender. All of the mima changes were in the sbt.internal package so I think this should be ok. Effects: the memory leaks are gone. I successfully ran 5000 no-op compiles in the sbt-multi-module-sample above with no degradation of performace. There was a noticeable degradation after 30 no-op compiles before. During the refactor, I had to work on TestLogger and in doing so I also fixed https://github.com/sbt/sbt/issues/4480. This also should fix https://github.com/sbt/sbt/issues/4773 |
||
|---|---|---|
| .. | ||
| agent/src/main/java/sbt | ||
| src/main | ||
| NOTICE | ||