mirror of https://github.com/sbt/sbt.git
It can be quite slow to read and parse a large json file. Often we are reading and writing the same file over and over even though it isn't actually changing. This is particularly noticeable with the `UpdateReport`\*. To speed this up, I introduce a global cache that can be used to read values from a `CacheStore`. When using the cache, I've seen the time for the `update` task drop from about 200ms to about 1ms. This ends up being a 400ms time savings for `test` because `update` is called for both `Compile / compile` and `Test / compile`.

The way that this works is that I add a new abstraction, `CacheStoreFactoryFactory`, which is the most enterprise java thing I've ever written. We store a `CacheStoreFactoryFactory` in the sbt `State`. When we make the `Streams` for a task, we fill in the `Streams`' `cacheStoreFactory` field using the `CacheStoreFactoryFactory`. The generated `CacheStoreFactory` may or may not refer to a global cache.

The `CacheStoreFactoryFactory` may produce `CacheStoreFactory` instances that delegate to a Caffeine cache whose maximum size is specified in bytes by the `fileCacheSize` setting (which can also be set with `-Dsbt.file.cache.size`). The size of a cache entry is estimated by the size of its contents on disk. Since we are generally storing things in the cache that are serialized as json, I figure that this should be a reasonable estimate. I set the default max cache size to 128MB, which is plenty of space for the previous cache entries of most projects. If the size is set to 0, the `CacheStoreFactoryFactory` generates a regular `DirectoryStoreFactory`.

To ensure that the cache accurately reflects the on-disk state of the previous cache (or of other caches using a `CacheStore`), the Caffeine cache stores the last modified time of the file whose contents it represents. If there is a discrepancy in the last modified times (which would happen if, say, `clean` has been run), then the value is read from disk even if it hasn't changed.

\* With the following `build.sbt` file, it takes roughly 200ms to read and parse the update report on my computer:

```scala
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.3"
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.1"
```

This is because spark-sql has an enormous number of dependencies and the update report ends up being about 3MB.
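To make the shape of the new abstraction concrete, here is a minimal sketch of the factory-of-factories layering. The trait names follow the description above, but the method signatures are simplified assumptions for illustration, not sbt's actual internal API:

```scala
import java.io.File

// Simplified, hypothetical signatures for illustration only.
trait CacheStore { def file: File }
trait CacheStoreFactory { def make(identifier: String): CacheStore }

// The extra layer stored in State: given the base directory of a task's
// streams, it decides whether the resulting CacheStoreFactory goes through
// the in-memory file cache or reads straight from disk.
trait CacheStoreFactoryFactory {
  def apply(base: File): CacheStoreFactory
}
```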
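As a rough illustration of the size-bounded cache described above, here is a sketch of a Caffeine cache whose maximum weight is interpreted in bytes by charging each entry the size of its serialized contents. The `Entry` class, the `File` key type, and the 128MB constant are assumptions made for the example, not the PR's actual types:

```scala
import java.io.File
import com.github.benmanes.caffeine.cache.{ Cache, Caffeine, Weigher }

// Hypothetical entry type: the cached bytes plus the last-modified time of
// the file they were read from.
final case class Entry(lastModified: Long, bytes: Array[Byte])

// Default maximum size described above: 128MB, expressed in bytes.
val maxCacheSizeBytes: Long = 128L * 1024 * 1024

// The weigher turns maximumWeight into a byte budget: each entry weighs as
// much as its on-disk (serialized) contents.
val fileCache: Cache[File, Entry] =
  Caffeine
    .newBuilder()
    .maximumWeight(maxCacheSizeBytes)
    .weigher[File, Entry](new Weigher[File, Entry] {
      def weigh(key: File, value: Entry): Int = value.bytes.length
    })
    .build[File, Entry]()
```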
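And a sketch of the staleness check: a cached value is only trusted when its recorded last-modified time matches the file on disk, otherwise the file is re-read (which is what happens after, say, `clean`). This reuses the `Entry` type from the sketch above; `readThroughCache` is an illustrative helper, not the actual implementation:

```scala
import java.io.File
import java.nio.file.Files
import com.github.benmanes.caffeine.cache.Cache

// Read path for a cached file: use the in-memory bytes only if the file's
// last-modified time still matches what we recorded when caching it.
def readThroughCache(file: File, cache: Cache[File, Entry]): Array[Byte] = {
  val onDisk = file.lastModified()
  Option(cache.getIfPresent(file)) match {
    case Some(entry) if entry.lastModified == onDisk =>
      entry.bytes // cache hit: the file has not changed since it was cached
    case _ =>
      // miss or stale entry: re-read from disk and replace the cache entry
      val bytes = Files.readAllBytes(file.toPath)
      cache.put(file, Entry(onDisk, bytes))
      bytes
  }
}
```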