Store API – Processing in memory
DITA-OT originally assumed resources would be available on disk and available from file paths. Recent versions added URI input, so HTTPS resources could be used, but temporary and output resources were still file-based. DITA-OT 3.6 introduces a new Store API that can process temporary resources in memory instead of writing them to disk.
The Store API (org.dita.dost.store.Store
) is a Java abstraction over temporary file operations.
So for example instead of reading resources directly with FileInputStream
, the Store API provides
operations for this. This abstraction allows implementations of the Store API to choose how they handle resources,
enables optimizations or non–file-based storage. Since DITA-OT processes a lot of XML data, the Store API offers
operations for XML processing directly. For example, a read method to directly get a DOM
Document
, instead of opening a file stream manually, parsing it with an XML parser, and getting
the Document
instance from the parser.
The Store API is extendable using Java’s
Resource Loader with the org.dita.dost.store.StoreBuilder
service. This is a builder
interface to get named Store
instances (“a Store”).
Stream Store for file-based processing
This Store could also be a File Store, since it uses disk and local files for temporary resources. This is the
traditional DITA-OT implementation, where temporary XML files are stored under the
dita.temp.dir
path.
The Stream Store is activated by setting the store-type parameter to file.
Cache Store for in-memory processing
This Store is an in-memory Store, that keeps all temporary resources in memory. The name comes from the feature
of the Store, that it caches the parsed XML after reading. That is, instead of storing XML as a byte array, it
keeps it as a DOM Document
or S9api XdmNode
. When the same
resource is re-read later, it doesn't have to parse it again, only return the parsed document. Resources that
are not available in the temporary directory are handled with the Stream Store.
While the Store doesn't write anything to the temporary directory, it will still use URIs where the resources
are under the temporary directory. The URIs are simply used for addressing, similarly to URNs. Future releases
of DITA-OT may use some other method of addressing, such as a tmp
URI scheme.
Benefits
The initial implementation of the Cache Store is provided in DITA-OT 3.6 as a preview to allow integration partners to test this new feature.
In-memory processing provides performance advantages in I/O bound environments such as cloud computing platforms, where processing time depends primarily on how long it takes to read and write temporary files.
The Store API also makes the Saxon S9api easier to use. It offers an XML document model that is in most cases easier to work with than DOM. The abstraction Store makes it easier to work with XML, so various modules don’t need to repeat the same type of XML processing code.
Caveats
Not all custom plug-ins will work with the Cache Store, because they may assume files are used and expect direct file access for resource operations.
<pipeline>
elements instead of Ant’s built-in <xslt>
tasks as
recommended in Plug-in coding conventions.