public class BatchArchiver extends Object

BatchArchiver is in charge of writing the incoming data to one or several destinations. It must be associated with an AvroParquetArchiveBuffer.

This class holds the current batch of tuples. It provides a method to marshal them all into a buffer once the file bolt receives the batch completion signal. It also provides a bloom filter.
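A minimal lifecycle sketch, assuming the batch metadata, format settings, archive buffer supplier and metric context are handed over by the enclosing file bolt (their construction is not shown here and is an assumption):

```java
// Sketch only: all inputs below are assumed to be provided by the file bolt.
BatchArchiver archiver = new BatchArchiver(batchMetadata, archiveFormatSettings)
        .prepare(archiveBufferSupplier, metricContext);

// For every incoming tuple of the current batch:
archiver.archiveTuple(tuple, System.currentTimeMillis(), bloomFields);

// Once the batch completion signal is received, marshal everything and flush:
String metadataDocument = archiver.flush();
```
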
Constructor and Description |
---|
BatchArchiver(org.thales.punch.libraries.objectstorage.indexing.BatchMetadata metaData, org.thales.punch.libraries.objectstorage.tuples.ArchiveFormatSettings archiveFormatSettings) <br> Ctor. |

Modifier and Type | Method and Description |
---|---|
void | archiveTuple(org.apache.storm.tuple.Tuple tuple, long epoch, Iterable<String> bloomFields) <br> Marshal this tuple into the archive(s). |
String | flush() <br> Flush your archive. |
org.apache.storm.tuple.Tuple | getAnchoredTuple() <br> Getter for the anchored tuple. |
org.thales.punch.libraries.objectstorage.indexing.BatchMetadata | getBatchMetadata() <br> A BatchArchiver handles a single batch. |
String | getExpirationCause() <br> Getter for the expiration cause. |
boolean | hasExpired(long expirationTimeout, long lifeTimeout) <br> Check if this batch has expired, given the expiration timeouts. |
BatchArchiver | prepare(Supplier<IArchiveBuffer> archiveBufferSupplier, org.thales.punch.libraries.metrics.api.IMetricContext metricContext) <br> Prepare this instance. |
BatchArchiver | setAnchoredTuple(org.apache.storm.tuple.Tuple tuple) <br> In the exactly-once mode, the first tuple of each batch is kept unacknowledged until the batch is flushed. |
BatchArchiver | setBloomParameters(int expectedInsertions, double fpp) <br> Use this to make your BatchArchiver instance compute bloom filters. |
long | size() |
void | updateBloom(org.apache.storm.tuple.ITuple tuple, Iterable<String> bloomFilterFields) <br> Update the bloom filter with the new field values. |

public BatchArchiver(org.thales.punch.libraries.objectstorage.indexing.BatchMetadata metaData, org.thales.punch.libraries.objectstorage.tuples.ArchiveFormatSettings archiveFormatSettings)
The BatchMetadata must only be filled with the minimum required fields: a batch id, a partition id, a tenant and a channel name. The ArchiveFormatSettings describes how to encode the data (csv, fields, etc.). The destinations tell the archiver where to store the generated archives.

Parameters:
metaData - the metadata
archiveFormatSettings - the format encoding

public BatchArchiver prepare(Supplier<IArchiveBuffer> archiveBufferSupplier, org.thales.punch.libraries.metrics.api.IMetricContext metricContext)
Prepare this instance.

Parameters:
archiveBufferSupplier - the supplier, so that one archive buffer per destination can be created
metricContext - the metric context from the bolt

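For illustration, a sketch of the preparation step; createBufferForNextDestination() below is a hypothetical factory standing in for whatever builds your IArchiveBuffer instances (for example AvroParquetArchiveBuffer), since their construction is not documented here:

```java
// Sketch only: the supplier is invoked so that one archive buffer can be
// created per destination, so it should return a fresh IArchiveBuffer on
// each call. createBufferForNextDestination() is a hypothetical factory.
Supplier<IArchiveBuffer> archiveBufferSupplier = () -> createBufferForNextDestination();

BatchArchiver archiver = new BatchArchiver(metaData, archiveFormatSettings)
        .prepare(archiveBufferSupplier, metricContext);
```
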
public void archiveTuple(org.apache.storm.tuple.Tuple tuple, long epoch, Iterable<String> bloomFields)
Marshal this tuple into the archive(s). The epoch parameter lets you provide a timestamp that will be considered as the generation time of your tuple.

Parameters:
tuple - an incoming tuple
epoch - an associated timestamp
bloomFields - the fields used for the bloom filter

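A hedged example of a single archiveTuple call; the timestamp field and bloom field names used below are hypothetical and not part of this API:

```java
// Sketch only: take the event time from a hypothetical tuple field when it
// is present, otherwise fall back to the processing time.
long epoch = tuple.contains("timestamp")
        ? tuple.getLongByField("timestamp")
        : System.currentTimeMillis();

archiver.archiveTuple(tuple, epoch, Arrays.asList("src_ip", "dst_ip"));
```
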
public String flush()
Flush your archive. The flush is a required operation in all cases: it triggers the generation of the metadata document that summarizes the archiving operation and returns what you need, the statistics, etc.

Some archivers write the same archive to several underlying destinations. In that case a single metadata document is returned. It is guaranteed to be identical for all destination archives, i.e. the same batch file has been written to all the configured destinations.

public BatchArchiver setBloomParameters(int expectedInsertions, double fpp)
Use this to make your BatchArchiver instance compute bloom filters. If you call it with a zero expectedInsertions, this has no effect.

Parameters:
expectedInsertions - the expected number of different values
fpp - the false positive target ratio

public void updateBloom(org.apache.storm.tuple.ITuple tuple, Iterable<String> bloomFilterFields)
Update the bloom filter with the new field values.

Parameters:
tuple - the tuple whose field values are added
bloomFilterFields - the names of the fields to add to the bloom filter

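A hedged sketch of the bloom filter calls; the insertion count, false positive ratio and field names below are illustrative values only:

```java
// Sketch only: enable bloom filter computation for this archiver.
// With expectedInsertions == 0 this call has no effect.
archiver.setBloomParameters(1_000_000, 0.01);

// Feed the filter, per tuple, with the field values of interest.
// The field names here are purely illustrative.
archiver.updateBloom(tuple, Arrays.asList("src_ip", "user_id"));
```
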
public boolean hasExpired(long expirationTimeout, long lifeTimeout)
Check if this batch has expired, given an inactivity timeout and a life timeout.

Parameters:
expirationTimeout - the timeout before inactivity expiration
lifeTimeout - the timeout before life expiration

public String getExpirationCause()
Getter for the expiration cause.

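A sketch of a periodic expiration check, as a file bolt might run it on a timer tick; the timeout values and the millisecond unit are assumptions:

```java
// Sketch only: illustrative timeouts, assumed to be in milliseconds.
long expirationTimeout = 30_000L;   // inactivity timeout
long lifeTimeout = 300_000L;        // maximum batch lifetime

if (archiver.hasExpired(expirationTimeout, lifeTimeout)) {
    System.out.println("flushing batch, cause: " + archiver.getExpirationCause());
    String metadataDocument = archiver.flush();
}
```
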
public org.apache.storm.tuple.Tuple getAnchoredTuple()
Getter for the anchored tuple.

public BatchArchiver setAnchoredTuple(org.apache.storm.tuple.Tuple tuple)
In the exactly-once mode, the first tuple of each batch is kept unacknowledged until the batch is flushed.

Parameters:
tuple - the first tuple of each batch

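A hedged sketch of the exactly-once anchoring pattern, assuming a Storm bolt with an OutputCollector and assuming getAnchoredTuple() returns null before any tuple has been anchored; the acknowledgment point shown is an assumption, not a prescription:

```java
// Sketch only: anchor the first tuple of the batch and keep it unacked.
// Assumption: getAnchoredTuple() returns null until a tuple is anchored.
if (archiver.getAnchoredTuple() == null) {
    archiver.setAnchoredTuple(tuple);
}

// Later, once the batch has been flushed successfully:
String metadataDocument = archiver.flush();
collector.ack(archiver.getAnchoredTuple());
```
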
public org.thales.punch.libraries.objectstorage.indexing.BatchMetadata getBatchMetadata()
A BatchArchiver handles a single batch. It is associated with a BatchMetadata that stores the batch identifier and many other properties.

public long size()