public class TreeBuilder extends Object
A mechanism for optimizing the process of loading large sets of records with
non-sequential keys. This class speeds up the process of inserting records
into a set of Persistit Tree
s by sorting them before inserting
them. The sort process uses multiple "sort trees" in multiple temporary
Volume
s to hold copies of the data. These are then merged into
the final "destination trees." Each sort tree is constrained to be small
enough to fit in the BufferPool
.
In general, Persistit can store records very quickly, even when the keys of those records arrive in random order, as long as all the pages of the destination tree or trees are resident in the buffer pool. However, the situation changes dramatically as soon as the the destination tree or trees exceed the size of the buffer pool. Once that happens, insert performance degrades because the ratio of records inserted per disk I/O operation performed decreases. In a worst-case scenario, inserting each new key may require two or more disk I/O operations. These may occur because Persistit performs the following steps:
TreeBuilder
mitigates that degradation by sorting the keys
before inserting them into their final destination trees. To do so it builds
a collection of bounded-size sort trees in temporary volumes. Then it
performs a merge sort from those trees into the final destination tree or
trees. This mechanism eliminates the problem that every key insertion
requires two (or more) random disk I/O operations. However, it is still the
case that every sort tree page must be written and read once, and every
destination tree page must be written at least once. Therefore the I/O
associated with TreeBuilder is reduced but not eliminated.
TreeBuilder is effective if and only if (a) the keys arrive in random order, and (b) the data is significantly larger than available memory in the buffer pool. In general it is faster to insert the keys directly into the destination trees unless both of these conditions are true.
The following example demonstrates the fundamental operation of
TreeBuilder
:
Note that a TreeBuilder can pre-sort data for multiple
destination trees. For example, it is possible to load and merge records for
a table and its corresponding indexes in one pass using TreeBuilder. During
the merge operation the final destination
Exchange exchange = db.getExchange("myVolume", "myTree", true);
TreeBuilder tb = new TreeBuilder(db);
//
// Insert the data into sort trees
//
while (source has more data) {
exchange.to(next key).getValue().put(next value);
tb.store(exchange);
}
//
// Merge the data into myTree
//
tb.merge();
Tree
are built in
sequence. By default that sequence is by alphabetical order of tree name, but
it is possible to customize TreeBuilder to change that order.
Loading a large data set may take a long time under the best of circumstances. Therefore this class is designed to be extended by applications to support progress reporting, to control disk space allocation, to handle attempts to insert conflicting records with duplicate keys, etc. See the following methods which may be overridden to provide custom behavior:
reportSorted(long)
- report completion of N records inserts into
sort treesreportMerged(long)
- report completion of N records mergedduplicateKeyDetected(Tree, Key, Value, Value)
- handle detection
of records inserted with duplicate keysbeforeMergeKey(Exchange)
- allowing filtering or custom handling
per record while mergingafterMergeKey(Exchange)
- customizable behavior after merging
one recordbeforeSortVolumeClosed(Volume, File)
- customizable behavior
before closing a sort volume when fullafterSortVolumeClose(Volume, File)
- customizable behavior after
closing a sort volume when fullgetTreeComparator()
- return a custom Comparator to determine
sequence in which trees are populated within the merge()
method
Constructor and Description |
---|
TreeBuilder(Persistit persistit) |
TreeBuilder(Persistit persistit,
String name,
int pageSize,
float bufferPoolFraction) |
Modifier and Type | Method and Description |
---|---|
protected void |
afterMergeKey(Exchange exchange)
This method may be extended to provide custom behavior after merging one
record.
|
protected void |
afterSortVolumeClose(Volume volume,
File file)
This method may be extended to provide application-specific reporting
functionality after a sort volume has been filled to capacity and has
been evicted.
|
protected boolean |
beforeMergeKey(Exchange exchange)
This method may be extended to provide alternative functionality.
|
protected void |
beforeSortVolumeClosed(Volume volume,
File file)
This method may be extended to provide application-specific behavior when
a sort volume has been filled to capacity.
|
void |
clear() |
protected boolean |
duplicateKeyDetected(Tree tree,
Key key,
Value v1,
Value v2)
This method may be extended to provide application-specific behavior when
an attempt is made to merge records with duplicate keys.
|
long |
getMergedKeyCount() |
String |
getName() |
long |
getReportKeyCountMultiple() |
long |
getSortedKeyCount() |
int |
getSortFileCount() |
List<File> |
getSortFileDirectories() |
protected Comparator<Tree> |
getTreeComparator()
This method may be extended to provide an application-specific ordering
on
Tree s. |
List<Tree> |
getTrees() |
void |
merge()
Merge the record previously stored in sort volumes into their destination
Tree s. |
protected void |
reportMerged(long count)
This method may be extended to provide application-specific progress
reports.
|
protected void |
reportSorted(long count)
This method may be extended to provide application-specific progress
reports.
|
void |
setReportKeyCountMultiple(long multiple)
Set the count of keys inserted or merged per call to
reportSorted(long) or reportMerged(long) . |
void |
setSortTreeDirectories(List<File> directories)
Define a list of directories in which sort volumes will be created.
|
void |
store(Exchange exchange)
Store a key-value pair into a sort tree.
|
void |
store(Tree tree,
Key key,
Value value)
Store a key-value pair for a specified
Tree into a sort
tree. |
public TreeBuilder(Persistit persistit)
public final String getName()
public final void setReportKeyCountMultiple(long multiple)
reportSorted(long)
or reportMerged(long)
.multiple
- public final long getReportKeyCountMultiple()
reportSorted(long)
or reportMerged(long)
public final int getSortFileCount()
public long getSortedKeyCount()
public long getMergedKeyCount()
public final List<Tree> getTrees()
Tree instances. This list is built as keys are stored.
public final void setSortTreeDirectories(List<File> directories) throws IOException
Define a list of directories in which sort volumes will be created. This
method can be used to override the default value provided by
Configuration.getTmpVolDir()
to control more closely where sort
trees will be stored. If the list is empty then the directory defined by
the Configuration
will be used. If multiple directories are
declared then volumes will be allocated to them in round-robin fashion.
This technique can distribute large load sets over multiple volumes and
can allow for interleaved disk reads during the merge process.
If a File
supplied to this method does not exist, an attempt
is made to create it as a directory. This method also attempts to create
and delete a file in each supplied directory to ensure that if there is a
file permission or other problem, it is detected immediately, rather than
much later during the sort process.
directories
- List of File
instances, each of which must be a
directoryIllegalArgumentException
- if a supplied file exists and is not a directory or cannot be
created as a new directoryIOException
- if an attempt to create a file in one of the supplied
directories failspublic final List<File> getSortFileDirectories()
setSortTreeDirectories(List)
method.public final void store(Exchange exchange) throws Exception
Tree
, Key
and Value
are specified by the supplied Exchange
.exchange
- The ExchangeException
public final void store(Tree tree, Key key, Value value) throws Exception
Tree
into a sort
tree.tree
- the Treekey
- the Keyvalue
- the ValueException
public void merge() throws Exception
Tree
s.Exception
protected void beforeSortVolumeClosed(Volume volume, File file) throws Exception
BufferPool
are invalidated to allow their immediate reuse.volume
- The temporary Volume
that has been filledfile
- the file to which the sorted key-value pairs will be writtenException
protected void afterSortVolumeClose(Volume volume, File file) throws Exception
setSortTreeDirectories(List)
within this method if necessary
to adjust disk space utilization, for example. The default behavior of
this method is to do nothing.volume
- The temporary Volume
that has been filledfile
- the file to which the sorted key-value pairs have been writtenException
protected boolean duplicateKeyDetected(Tree tree, Key key, Value v1, Value v2) throws Exception
This method may be extended to provide application-specific behavior when
an attempt is made to merge records with duplicate keys. The two
Value
s v1 and v2 are provided in the order they were
inserted into the TreeBuilder
. behavior is to write a
warning to the log and retain the first value..
tree
- the Tree
to which a key is being mergedkey
- the Key
v1
- the Value
previously insertedv2
- the conflicting Value
true
to replace the value previously stored,
false
to leave the value first inserted and ignore
the new value.DuplicateKeyException
- if a key being inserted or merged matches a key already
existsException
protected boolean beforeMergeKey(Exchange exchange) throws Exception
true
which signifies
that the key-value pair represented in the Exchange
should
be merged into the destination Tree
. A custom implementation
could be used to filter out unwanted records or to emit records to a
different destination.exchange
- represents the key-value pair proposed for mergingtrue
to allow the record to be mergedException
protected void afterMergeKey(Exchange exchange) throws Exception
beforeMergeKey(Exchange)
returned true
.exchange
- represents the key-value pair that was merged.Exception
protected void reportSorted(long count)
setReportKeyCountMultiple(long)
determines the frequency at
which this method is called.count
- The total number of recirds that has been merged so far.protected void reportMerged(long count)
setReportKeyCountMultiple(long)
determines the frequency at
which this method is called.count
- The total number of recirds that has been merged so far.protected Comparator<Tree> getTreeComparator()
Tree
s. This ordering determines the sequence in which
destination trees are built from the sort data. By default trees are
build in alphabetical order by volume and tree name. However, an
application may choose a different order to ensure invariants for
concurrent use.java.util.Comparator
on Tree
Copyright © 2025 Open Identity Platform Community. All rights reserved.