public abstract class FetchOperator<T> extends Object
Base class of the FetchCsvOperator, FetchJsonOperator and CidrRangeOperator operators, and more generally of all operators that support the loading of remote CSV or JSON resource files.
Using these, you can fetch remote or local CSV or JSON files and use them as resource tuples, as required by the operators.
As a result, you do not need to restart your processing pipelines whenever you update your resource files: they are automatically reloaded without service interruption.
Whatever the operator, the dynamic reloading of remote resources is controlled in the same way. Here is a simple example using the fetchCsv punch operator.
Tuple resource = fetchCsv("https://your.resource.server/resource.csv")
// make sure the resource is synchronously loaded at startup
.required()
// define the reload period
.refresh("0/60 0/1 * 1/1 * ? *")
// this part is specific to the fetchCsv operator
.asTuple();
Here is a similar example with the cidrRange operator:
if (
cidrRange("https://your.resource.server/resource.csv")
.required()
.refresh("0/60 0/1 * 1/1 * ? *")
// this part is specific to the cidrRange operator
.cidrField("range")
.on("192.168.1.3")
.into([match])
) {
...
}
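The reload behaviour described above (a required first load, then periodic refreshes that do not interrupt processing) can be pictured as an atomic swap of the current resource copy. The following is only a sketch under that assumption, not the punch implementation: ReloadableResource, its loader parameter and ReloadDemo are hypothetical names introduced for illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical holder (not part of the punch API): keeps the last successfully loaded copy. */
final class ReloadableResource<R> {
    private final Supplier<R> loader; // assumed to fetch a fresh copy (HTTP, file, S3...)
    private final AtomicReference<R> current = new AtomicReference<>();

    ReloadableResource(Supplier<R> loader, boolean required) {
        this.loader = loader;
        if (required) {
            // Synchronous first load: a failure propagates to the caller at startup.
            current.set(loader.get());
        } else {
            // Best effort: start even if the first load fails.
            reload();
        }
    }

    /** Meant to be called on every refresh tick; returns true if a new copy was installed. */
    boolean reload() {
        try {
            current.set(loader.get()); // readers see either the old or the new copy
            return true;
        } catch (RuntimeException e) {
            return false; // keep serving the previous copy
        }
    }

    /** The copy currently in use, or null if nothing has been loaded yet. */
    R get() {
        return current.get();
    }
}

final class ReloadDemo {
    public static void main(String[] args) {
        int[] version = {0};
        ReloadableResource<List<String>> resource =
            new ReloadableResource<>(() -> List.of("v" + (++version[0])), true);
        System.out.println(resource.get());
        resource.reload();
        System.out.println(resource.get());
    }
}
```

In this sketch a failed refresh leaves the previous copy in place, which is one plausible way to avoid interrupting processing; how the real operators react to a failed refresh depends on the required(), silent() and bestEffort() settings.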
You can fetch remote resources from HTTP, HTTPS, local files or S3. Here is an example using S3:
Tuple resource = fetchCsv()
.s3Endpoint("https://play.min.io")
.s3AccessKey("Q3AM3UQ867SPQQA43P2F")
.s3Secret("zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG")
.s3Bucket("myresources")
.s3Object("myresources.csv.gz")
.required()
.refresh("0/60 0/1 * 1/1 * ? *")
.asTuple();
Strategy | Description |
---|---|
compact | Stores only Strings, not nested Tuples. Conversion to tuples is lazy and performed only in case of a match. |
compactDirect | Internally leverages a direct memory ByteBuffer. This is the most memory-efficient strategy. |
compactInPlace | A compact strategy, except that a single resource Tuple is kept and refreshed in memory. This avoids requiring additional memory just for (periodically) loading the new resource tuple. |
Metric | Description |
---|---|
punchlang.fetch.rtt | The round-trip time, in milliseconds, to load the resource. |
punchlang.fetch.length | A sliding window histogram. The length of the loaded resource in bytes, computed over the last 10 resources loaded. |
punchlang.fetch.success | Meter. The number of successful loads. |
punchlet.fetch.failure | Meter. The number of failed loads. |
Cron expression | Meaning |
---|---|
* * * ? * * | Every second |
0 * * ? * * | Every minute |
0 */2 * ? * * | Every even minute |
0 1/2 * ? * * | Every uneven minute |
0 */2 * ? * * | Every 2 minutes |
0 */3 * ? * * | Every 3 minutes |
0 */4 * ? * * | Every 4 minutes |
0 */5 * ? * * | Every 5 minutes |
0 */10 * ? * * | Every 10 minutes |
0 */15 * ? * * | Every 15 minutes |
0 */30 * ? * * | Every 30 minutes |
0 15,30,45 * ? * * | Every hour at minutes 15, 30 and 45 |
0 0 * ? * * | Every hour |
0 0 */2 ? * * | Every two hours |
0 0 0/2 ? * * | Every even hour |
0 0 1/2 ? * * | Every uneven hour |
0 0 */3 ? * * | Every three hours |
0 0 */4 ? * * | Every four hours |
0 0 */6 ? * * | Every six hours |
0 0 */8 ? * * | Every eight hours |
0 0 */12 ? * * | Every twelve hours |
0 0 0 * * ? | Every day at midnight - 12am |
0 0 1 * * ? | Every day at 1am |
0 0 6 * * ? | Every day at 6am |
0 0 12 * * ? | Every day at noon - 12pm |
0 0 12 ? * SUN | Every Sunday at noon |
0 0 12 ? * MON | Every Monday at noon |
0 0 12 ? * TUE | Every Tuesday at noon |
0 0 12 ? * WED | Every Wednesday at noon |
0 0 12 ? * THU | Every Thursday at noon |
0 0 12 ? * FRI | Every Friday at noon |
0 0 12 ? * SAT | Every Saturday at noon |
0 0 12 ? * MON-FRI | Every weekday at noon |
0 0 12 ? * SUN,SAT | Every Saturday and Sunday at noon |
0 0 12 */7 * ? | Every 7 days at noon |
0 0 12 1 * ? | Every month on the 1st, at noon |
0 0 12 2 * ? | Every month on the 2nd, at noon |
0 0 12 15 * ? | Every month on the 15th, at noon |
0 0 12 1/2 * ? | Every 2 days starting on the 1st of the month, at noon |
0 0 12 1/4 * ? | Every 4 days starting on the 1st of the month, at noon |
0 0 12 L * ? | Every month on the last day of the month, at noon |
0 0 12 L-2 * ? | Every month on the second to last day of the month, at noon |
0 0 12 LW * ? | Every month on the last weekday, at noon |
0 0 12 ? * 1L | Every month on the last Sunday, at noon |
0 0 12 ? * 2L | Every month on the last Monday, at noon |
0 0 12 ? * 6L | Every month on the last Friday, at noon |
0 0 12 1W * ? | Every month on the nearest Weekday to the 1st of the month, at noon |
0 0 12 15W * ? | Every month on the nearest Weekday to the 15th of the month, at noon |
0 0 12 ? * 2#1 | Every month on the first Monday of the Month, at noon |
0 0 12 ? * 6#1 | Every month on the first Friday of the Month, at noon |
0 0 12 ? * 2#2 | Every month on the second Monday of the Month, at noon |
0 0 12 ? * 5#3 | Every month on the third Thursday of the Month, at noon - 12pm |
0 0 12 ? JAN * | Every day at noon in January only |
0 0 12 ? JUN * | Every day at noon in June only |
0 0 12 ? JAN,JUN * | Every day at noon in January and June |
0 0 12 ? DEC * | Every day at noon in December only |
0 0 12 ? JAN,FEB,MAR,APR * | Every day at noon in January, February, March and April |
0 0 12 ? 9-12 * | Every day at noon between September and December |
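To make the notation in the table above more concrete, here is a minimal sketch of how a single numeric field (seconds, minutes, hours or month number) expands into the values it matches. It only covers the `*`, `a`, `a,b`, `a-b` and `/step` forms; it is not the Quartz parser, and CronFieldSketch and expand are hypothetical names introduced for illustration.

```java
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper: expands one numeric cron field into the values it matches. */
final class CronFieldSketch {

    /** min and max bound the field, for example 0 and 59 for minutes. */
    static SortedSet<Integer> expand(String field, int min, int max) {
        SortedSet<Integer> values = new TreeSet<>();
        for (String part : field.split(",")) {
            String[] rangeAndStep = part.split("/");
            String range = rangeAndStep[0];
            boolean hasStep = rangeAndStep.length > 1;
            int step = hasStep ? Integer.parseInt(rangeAndStep[1]) : 1;
            if (step <= 0) {
                throw new IllegalArgumentException("step must be positive: " + part);
            }
            int start;
            int end;
            if (range.equals("*")) {
                start = min;
                end = max;
            } else if (range.contains("-")) {
                String[] bounds = range.split("-");
                start = Integer.parseInt(bounds[0]);
                end = Integer.parseInt(bounds[1]);
            } else {
                start = Integer.parseInt(range);
                // "a/b" runs from a to the end of the field; a plain "a" is a single value.
                end = hasStep ? max : start;
            }
            for (int v = start; v <= end; v += step) {
                values.add(v);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(expand("*/15", 0, 59));     // minutes field of "0 */15 * ? * *"
        System.out.println(expand("1/2", 0, 23));      // hours field of "0 0 1/2 ? * *"
        System.out.println(expand("15,30,45", 0, 59)); // minutes field of "0 15,30,45 * ? * *"
    }
}
```

Special characters such as `?`, `L`, `W` and `#` and the month and day names are deliberately left out of this sketch.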
Modifier and Type | Field and Description |
---|---|
protected IResourceBuilder.CompactionType | compactionType: The compaction strategy, used to limit memory usage. |
protected String | hashKey: An optional hash key to be used to generate a hash table instead of an array. |
protected static org.apache.logging.log4j.Logger | logger |
protected boolean | lowerCaseKeys: True to convert search keys to lower case. |
protected boolean | requiredResource: Set to true to prevent any processing in case of unavailable resource (at start time or afterwards). |
protected RuntimeContext | runtimeContext: The caller punchlet runtime context. |
protected String | s3Bucket: The S3 bucket. |
protected String | s3endpoint: The S3 endpoint, i.e. the HTTP address of the target S3 server. |
protected Path | s3KeyPath: The path to the S3 access key. |
protected String | s3Object: The S3 resource object. |
protected Path | s3SecretPath: The path to the S3 secret key. |
protected boolean | silent: Set to true to silently ignore failures to load the resource. |
protected String | url: The remote or local resource URL. |
protected String | uuid: A unique identifier for each resource. |
Modifier | Constructor and Description |
---|---|
protected | FetchOperator(RuntimeContext r): Constructor. |
protected | FetchOperator(RuntimeContext r, String url): Constructor. |
Modifier and Type | Method and Description |
---|---|
T | bestEffort(): Indicate that the resource does not have to be loaded before traffic is processed. |
T | compact(): For very big resource files (30 MB, 100 MB or more), this option represents the resource tuple in a much more compact way. |
T | compactDirect(): Similar to compact(), but leverages ByteBuffers with direct memory allocation. |
T | compactInPlace(): Make resource tuples refreshed in place. |
Tuple | getResource(): This subl |
abstract IResourceBuilder | getResourceBuilder(): Implemented by subclasses to return the adequate resource builder. |
T | hashKey(String hashKey): Use this method to produce a dictionary instead of an array. |
T | loadAtStartup(): Deprecated. Use required() instead. |
T | lowerCaseKeys(): Convert the lookup key values to lower case. |
T | refresh(String cron): Make the operator periodically fetch the remote resource. |
T | required(): Indicate that the resource is absolutely required for the processing. |
T | s3AccessKey(String key): This method is unsafe, use it only for tests or ad-hoc investigations. |
T | s3AccessKeyPath(String path): This method is unsafe, use it only for tests or ad-hoc investigations. |
T | s3Bucket(String bucket): The bucket from which to load your resource file. |
T | s3Endpoint(String endPoint): Use this parameter to load from an S3-compatible store. |
T | s3Object(String name): The S3 object to load as your resource file. |
T | s3Secret(String secret): This method is unsafe, use it only for tests or ad-hoc investigations. |
T | silent(): Indicate that failures to load the resource must be silently ignored. |
T | url(String url) |
protected static final org.apache.logging.log4j.Logger logger
protected String uuid
protected String url
protected String s3endpoint
protected Path s3KeyPath
protected Path s3SecretPath
protected String s3Bucket
protected String s3Object
protected boolean requiredResource
protected boolean silent
protected IResourceBuilder.CompactionType compactionType
protected boolean lowerCaseKeys
protected final RuntimeContext runtimeContext
protected String hashKey
protected FetchOperator(RuntimeContext r)
r - the punchlet runtime context.
protected FetchOperator(RuntimeContext r, String url)
r - the punchlet runtime context.
url - an http or file url.
public T required()
public T silent()
public T bestEffort()
@Deprecated public T loadAtStartup()
Deprecated. Use required() instead.
public T s3Endpoint(String endPoint)
endPoint - the S3 endpoint.
public T s3AccessKey(String key)
Providing an access key is not required for anonymous access.
key - the S3 access key.
public T s3AccessKeyPath(String path)
Providing an access key is not required for anonymous access.
path - the path to the S3 access key.
public T s3Secret(String secret)
Providing a secret is not required for anonymous access.
secret - the S3 secret.
public T s3Bucket(String bucket)
bucket - the S3 bucket.
public T s3Object(String name)
name - the S3 object name.
public T hashKey(String hashKey)
hashKey - the key of one of the fields, to be used as hash key.
public T lowerCaseKeys()
For example, if you have the following resource:
first user,dimi,54
Second User,ced,43
Third user,damien,43
You will be able to query your tuple using "second user" and "third user".
public T compact()
On the plus side, this drastically reduces the number of Java objects, hence memory usage and garbage collection costs. On the downside, every time you have a hit, a punch tuple is constructed from the source String.
In other words, the compact strategy reduces RAM usage (possibly by a factor of two) but increases CPU usage. Whether it pays off depends on how many hits you have. Only benchmarks can tell you if this option is worthwhile for you.
Note that with this option only the Tuple.get(String) method is supported. Compact tuples are read-only.
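The trade-off described above can be illustrated with a small sketch: each row is kept as one raw String, and a read-only row view is only built when a lookup hits. This is an illustration under stated assumptions, not the punch implementation: CompactTable and lookup are hypothetical names, a plain Map stands in for the punch Tuple, and the CSV parsing is a naive comma split.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical compact table: one raw String per row, parsed lazily on a hit. */
final class CompactTable {
    private final String[] columns;
    private final Map<String, String> rawRows = new HashMap<>();

    CompactTable(String header, List<String> lines, String keyColumn) {
        this.columns = header.split(",");
        int keyIndex = Arrays.asList(columns).indexOf(keyColumn);
        if (keyIndex < 0) {
            throw new IllegalArgumentException("unknown key column: " + keyColumn);
        }
        for (String line : lines) {
            String[] cells = line.split(",", -1);
            if (keyIndex < cells.length) {
                // Only the key and the raw line are retained, not the parsed cells.
                rawRows.put(cells[keyIndex], line);
            }
        }
    }

    /** Builds a read-only row view on a hit, returns null on a miss. */
    Map<String, String> lookup(String key) {
        String raw = rawRows.get(key);
        if (raw == null) {
            return null; // a miss costs no parsing and no allocation
        }
        String[] cells = raw.split(",", -1);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.length && i < cells.length; i++) {
            row.put(columns[i], cells[i]);
        }
        return Collections.unmodifiableMap(row);
    }

    public static void main(String[] args) {
        CompactTable table = new CompactTable(
            "name,login,age",
            List.of("first user,dimi,54", "Second User,ced,43", "Third user,damien,43"),
            "login");
        System.out.println(table.lookup("ced"));
        System.out.println(table.lookup("nobody"));
    }
}
```

The memory saving comes from holding one String per row instead of one object per cell; the CPU cost is the split performed on each hit, which is why the benefit depends on the hit rate.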
public T compactInPlace()
This strategy leverages the compact() resource encoding. The only difference is that there is a single copy, updated in place.
public T compactDirect()
public T refresh(String cron)
cron - a Quartz cron expression.
public Tuple getResource()
public abstract IResourceBuilder getResourceBuilder()
The FetchCsvOperator returns a builder to construct a tuple from a CSV document, while the FetchJsonOperator returns builders to deal with JSON files.
This is called only once at init time.
Copyright © 2023. All rights reserved.