sbt/tasks-standard at f5c8b8aad50bab5edab114ff92303dcee223bf10 - sbt

src	Add support for in memory cache store	2019-07-11 17:45:16 -07:00
NOTICE	Add, configure & enforce file headers	2017-10-05 09:03:40 +01:00

Ethan Atkins cad89d17a9 Add support for in memory cache store

It can be quite slow to read and parse a large JSON file. Often, we are
reading and writing the same file over and over even though it isn't
actually changing. This is particularly noticeable with the
UpdateReport*. To speed this up, I introduce a global in-memory cache
that can be used to read values from a CacheStore. With the cache
enabled, I've seen the time for the update task drop from about 200ms to
about 1ms. This saves roughly 400ms for test because update runs twice,
once for Compile / compile and once for Test / compile.

The way that this works is that I add a new abstraction,
CacheStoreFactoryFactory, which is the most enterprise Java thing I've
ever written. We store a CacheStoreFactoryFactory in the sbt State.
When we make the Streams for a task, we create its cacheStoreFactory
field using the CacheStoreFactoryFactory. The generated
CacheStoreFactory may or may not refer to a global cache.
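
As a rough illustration of the factory-of-factories idea (not the actual
sbt code), the sketch below uses hypothetical, simplified types: Store,
StoreFactory, StoreFactoryFactory and Disk are all invented for this
example, and a mutable map stands in for the file system. It only shows
how the choice of factory factory decides whether the stores handed to a
task share a global in-memory cache.

```scala
import scala.collection.mutable

object FactorySketch {
  // Hypothetical, simplified interfaces; the real sbt types differ.
  trait Store {
    def read(): Option[String]
    def write(value: String): Unit
  }
  trait StoreFactory { def make(id: String): Store }
  trait StoreFactoryFactory { def apply(dir: String): StoreFactory }

  // Stand-in for the file system so the sketch performs no real I/O.
  final class Disk {
    val files = mutable.Map.empty[String, String]
    var reads = 0
  }

  // Every read goes to "disk".
  final class DirectFactoryFactory(disk: Disk) extends StoreFactoryFactory {
    def apply(dir: String): StoreFactory = new StoreFactory {
      def make(id: String): Store = new Store {
        private val key = s"$dir/$id"
        def read(): Option[String] = { disk.reads += 1; disk.files.get(key) }
        def write(value: String): Unit = disk.files.update(key, value)
      }
    }
  }

  // All stores share one in-memory map owned by the factory factory.
  final class CachingFactoryFactory(disk: Disk) extends StoreFactoryFactory {
    private val global = mutable.Map.empty[String, String]
    private val direct = new DirectFactoryFactory(disk)
    def apply(dir: String): StoreFactory = new StoreFactory {
      def make(id: String): Store = new Store {
        private val key = s"$dir/$id"
        private val onDisk = direct(dir).make(id)
        // Serve from memory when possible, otherwise read and remember.
        def read(): Option[String] = global.get(key).orElse {
          val value = onDisk.read()
          value.foreach(v => global.update(key, v))
          value
        }
        // Write through to "disk" and keep the in-memory copy current.
        def write(value: String): Unit = {
          onDisk.write(value)
          global.update(key, value)
        }
      }
    }
  }
}
```

Code that receives a StoreFactory does not need to know which variant it
was given, which is why the choice can be made once, in one place.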

The CacheStoreFactoryFactory may produce CacheStoreFactory instances
that delegate to a Caffeine cache with a max size parameter that is
specified in bytes by the fileCacheSize setting (which can also be set
with -Dsbt.file.cache.size). The size of the cache entry is estimated by
the size of the contents on disk. Since we are generally storing things
in the cache that are serialized as JSON, I figure that this should be a
reasonable estimate. I set the default max cache size to 128MB, which is
plenty of space for the previous cache entries for most projects. If the
size is set to 0, the CacheStoreFactoryFactory generates a regular
DirectoryStoreFactory.
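
To make the size accounting concrete, here is a minimal sketch of a
cache bounded by the total weight of its entries rather than by their
count. It deliberately uses a plain java.util.LinkedHashMap in access
order instead of Caffeine, so its eviction policy is simple LRU and not
Caffeine's; WeightedLru and its weigh parameter are hypothetical names
for this example only.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}

// Hypothetical sketch, not sbt's implementation: an LRU cache whose limit
// is the summed weight (e.g. bytes on disk) of the values it holds.
final class WeightedLru[K, V](maxBytes: Long, weigh: V => Long) {
  // accessOrder = true keeps iteration order least recently used first.
  private val entries = new JLinkedHashMap[K, V](16, 0.75f, true)
  private var totalBytes = 0L

  def get(key: K): Option[V] = Option(entries.get(key))

  def put(key: K, value: V): Unit = {
    remove(key)
    val weight = weigh(value)
    // A value heavier than the whole budget is never cached.
    if (weight <= maxBytes) {
      entries.put(key, value)
      totalBytes += weight
      // Evict least recently used entries until the budget is respected.
      val it = entries.entrySet().iterator()
      while (totalBytes > maxBytes && it.hasNext) {
        val eldest = it.next()
        totalBytes -= weigh(eldest.getValue)
        it.remove()
      }
    }
  }

  def remove(key: K): Unit =
    Option(entries.remove(key)).foreach(v => totalBytes -= weigh(v))

  def size: Int = entries.size
}
```

With maxBytes set to 0 this sketch caches nothing that has a positive
weight, which mirrors the idea that a size of 0 turns the in-memory
cache off.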

To ensure that the cache accurately reflects the disk state of the
previous cache (or other caches using a CacheStore), the Caffeine cache
stores the last modified time of the file whose contents it should
represent. If there is a discrepancy in the last modified times (which
would happen if, say, clean has been run), then the value is read from
disk even if the value hasn't changed.
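
The validation step can be sketched as follows, assuming a simple map
keyed by path. StampedFileCache and its diskReads counter are
hypothetical and exist only to make the behaviour observable; the point
is that a cached value is trusted only while the file still reports the
last modified time that was recorded when the value was read.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}
import java.nio.file.attribute.FileTime
import scala.collection.mutable

// Hypothetical sketch, not sbt's implementation.
final class StampedFileCache {
  // Each entry pairs the cached contents with the time stamp seen on read.
  private val entries = mutable.Map.empty[Path, (FileTime, String)]
  var diskReads = 0 // counts reads that actually touched the file

  def read(path: Path): String = {
    val onDisk = Files.getLastModifiedTime(path)
    entries.get(path) match {
      // Same time stamp as when we cached it: serve from memory.
      case Some((stamp, contents)) if stamp == onDisk => contents
      // Missing or stale entry: re-read the file and record the new stamp.
      case _ =>
        diskReads += 1
        val contents = new String(Files.readAllBytes(path), UTF_8)
        entries.update(path, (onDisk, contents))
        contents
    }
  }
}
```

Because only the time stamp is compared, a file that is rewritten with
identical contents still triggers a fresh read, as described above.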

* With the following build.sbt file, it takes roughly 200ms to read and
parse the update report on my computer:

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.3"
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.1"

This is because spark-sql has an enormous number of dependencies and the
update report ends up being 3MB.
