AzureBlobStorageSpout (Punch Storm Spouts and Bolts 6.4.5 API)

java.lang.Object
- org.apache.storm.topology.base.BaseComponent
  - org.apache.storm.topology.base.BaseRichSpout
    - org.thales.punch.libraries.storm.api.BaseInputNode
      - org.thales.punch.libraries.storm.spout.AzureBlobStorageSpout

All Implemented Interfaces: Serializable, org.apache.storm.spout.ISpout, org.apache.storm.topology.IComponent, org.apache.storm.topology.IRichSpout

public class AzureBlobStorageSpout
extends org.thales.punch.libraries.storm.api.BaseInputNode

The AzureBlobStorageSpout enables you to pull data from a given container located in an Azure Blob Storage.

For this spout to work, you need an Azure account with administrator access to the blob storage from which you want to fetch data.

How does it work?

Given a blob extension (json, csv, *, etc.) for a container, this spout pulls, at a configured time interval, every blob that was inserted or modified during the last XX seconds/minutes/hours.

A last_committed blob is created in the root path of your container. This blob contains the date of the last successfully acked tuple.

The created blob is given a unique name of the form: Platform ID_Platform tenant_Platform Channel_Your topology name.
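The naming pattern above can be illustrated with a minimal Java sketch. The class name, method name and sample identifiers are hypothetical and only show how the four parts could be joined with underscores; the spout's actual implementation is not documented here and may differ.

```java
// Hypothetical helper, not part of the Punch API.
final class LastCommittedBlobName {

    // Joins the four identifiers with underscores, following the documented
    // "Platform ID_Platform tenant_Platform Channel_Your topology name" pattern.
    static String of(String platformId, String tenant, String channel, String topologyName) {
        return String.join("_", platformId, tenant, channel, topologyName);
    }

    public static void main(String[] args) {
        // Sample values are placeholders chosen for illustration only.
        System.out.println(of("myplatform", "mytenant", "mychannel", "mytopology"));
    }
}
```

Because the name combines platform, tenant, channel and topology, two topologies reading the same container would each keep their own last_committed blob.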

Parameter	Description	Value Type	Values	Required
blobstorage_name	The name of the blob storage.	String	"your_blob_storage_name"	true
blobstorage_key	The key of your blob storage.	String	"your_blob_storage_key"	true
container_name	The name of a container found in your blob storage.	String	"a_container_name"	true
virtual_directory_blob_name_extension_regex	Extension of the blobs that should be pulled.	String	"json"	true
pull_interval	Uses the Elasticsearch time convention, e.g. 10s for 10 seconds, 10m for 10 minutes and 10h for 10 hours. Defaults to 15 seconds if not specified.	String	"Xs" or "Xm" or "Xh", where X is an integer.	false
read_blob_since_last	Uses the Elasticsearch time convention, e.g. 10s for 10 seconds, 10m for 10 minutes and 10h for 10 hours. Defaults to 15 + 2 seconds if not specified. Blobs whose last-modified date falls within this period before the current time are pulled. Upload time lag should be taken into account: adding 2 seconds guarantees that all blobs will be pulled.	String	"Xs" or "Xm" or "Xh", where X is an integer.	false
chunk_size	Size of each read operation in bytes. The buffer should be large enough if you plan to pull large blobs. Defaults to 1048576 if not specified.	Integer	e.g. 256000	false
codec	The codec to be used by the buffer.	String	"json_array"	true
read_strategy	Defines how this spout scans an Azure Blob Storage container. last_committed: resume pulling from the last_committed blob. earliest: pull all blobs that exist in the container, then exit the topology when pulling is over. latest: pull only the latest blobs; the last_committed blob is ignored. This is the default read strategy if none is specified.	String	"last_committed" or "earliest" or "latest"	false
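The time convention used by pull_interval and read_blob_since_last can be sketched as follows. This is a minimal illustration under stated assumptions, not the spout's actual parsing code: the class and method names are hypothetical, and the fallback values simply mirror the defaults listed in the table (15 seconds, and 15 + 2 seconds).

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Punch API.
final class PullSettings {

    // Accepts "Xs", "Xm" or "Xh", where X is an integer.
    private static final Pattern TIME = Pattern.compile("(\\d+)([smh])");

    static final Duration DEFAULT_PULL_INTERVAL = Duration.ofSeconds(15);
    static final Duration UPLOAD_LAG_MARGIN = Duration.ofSeconds(2);

    static Duration parse(String value) {
        Matcher m = TIME.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Expected Xs, Xm or Xh but got: " + value);
        }
        long amount = Long.parseLong(m.group(1));
        switch (m.group(2)) {
            case "s": return Duration.ofSeconds(amount);
            case "m": return Duration.ofMinutes(amount);
            default:  return Duration.ofHours(amount);
        }
    }

    // A null argument stands for "setting not specified".
    static Duration pullInterval(String configured) {
        return configured == null ? DEFAULT_PULL_INTERVAL : parse(configured);
    }

    // Documented default: 15 + 2 seconds, the margin covering upload lag.
    static Duration readBlobSinceLast(String configured) {
        return configured == null
                ? DEFAULT_PULL_INTERVAL.plus(UPLOAD_LAG_MARGIN)
                : parse(configured);
    }

    public static void main(String[] args) {
        System.out.println(parse("10m"));
        System.out.println(readBlobSinceLast(null));
    }
}
```

When both settings are given explicitly, keeping read_blob_since_last slightly larger than pull_interval follows the same reasoning as the default margin: a blob uploaded just before a pull is still inside the window on the next one.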

Below is a working example:


 {
 "spouts": [
   {
     "type": "azureblobstorage_spout",
     "spout_settings": {
       "blobstorage_name": "name-of-your-blob-storage",
       "blobstorage_key": "your-blob-storage-key",
       "container_name": "name-of-a-container-in-your-blob",
       "virtual_directory_blob_name_extension_regex": "*",
       "pull_interval": "10s",
       "read_blob_since_last": "19s",
       "chunk_size": 1048576,
       "codec": "json_array",
       "read_strategy": "last_committed",
       "blob_name_prefix": "a-pA-P",
       "virtual_directory_prefix_list": [""]
     },
     "storm_settings": {
       "component": "azureblob",
       "publish": [
         {
           "stream": "input",
           "fields": [
             "value"
           ]
         }
       ]
     }
   }
 ],
 "bolts": [
   {
     "type": "punch_bolt",
     "bolt_settings": {
       "punchlet_code": "{print(root);}",
       "decoding_strategy" : "smart"
     },
     "storm_settings": {
       "component": "punch_bolt",
       "subscribe": [
         {
           "component": "azureblob",
           "stream": "input"
         }
       ]
     }
   }
 ]
}
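The three read_strategy values can be thought of as choosing the point in time from which blobs are scanned. The Java sketch below illustrates that idea only; the enum, the method and the fallback used when no last_committed date exists yet are assumptions made for this example, not the documented behavior of the spout.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

// Hypothetical illustration, not part of the Punch API.
final class ReadStrategySketch {

    enum ReadStrategy { LAST_COMMITTED, EARLIEST, LATEST }

    // Returns the instant after which a blob's last-modified date makes it eligible.
    static Instant scanFrom(ReadStrategy strategy,
                            Instant now,
                            Duration readBlobSinceLast,
                            Optional<Instant> lastCommitted) {
        switch (strategy) {
            case EARLIEST:
                // Every blob in the container is eligible.
                return Instant.EPOCH;
            case LAST_COMMITTED:
                // Resume from the stored date. Assumption: fall back to the
                // lookback window when no last_committed date is available.
                return lastCommitted.orElse(now.minus(readBlobSinceLast));
            default:
                // LATEST: only the lookback window counts, last_committed is ignored.
                return now.minus(readBlobSinceLast);
        }
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println(scanFrom(ReadStrategy.LATEST, now, Duration.ofSeconds(19), Optional.empty()));
    }
}
```

With the settings of the example topology (read_strategy set to last_committed and read_blob_since_last set to 19s), this sketch would resume from the stored date when one is present.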

Author: Jonathan YUE CHUN
See Also: Serialized Form

Field Summary
- Fields inherited from class org.thales.punch.libraries.storm.api.BaseInputNode
  collector, exitCondition, latencyRecordSender, loadController, metricContext, myself, nodeSettings

Constructor Summary

Constructors
Constructor and Description

AzureBlobStorageSpout(org.thales.punch.libraries.storm.api.NodeSettings config, boolean enableNsg)
Constructor.

Method Summary

Modifier and Type	Method and Description
`void`	`ack(Object msgId)`
`void`	`fail(Object msgId)`
`void`	`lastCommittedWriter()` Used to write last_committed on success.
`void`	`nextTuple()`
`void`	`open(Map conf, org.apache.storm.task.TopologyContext context, org.apache.storm.spout.SpoutOutputCollector collector)`

Methods inherited from class org.thales.punch.libraries.storm.api.BaseInputNode
close, deactivate, declareOutputFields, getPublishedStreams, regulate, sendLatencyRecord

Methods inherited from class org.apache.storm.topology.base.BaseRichSpout
activate

Methods inherited from class org.apache.storm.topology.base.BaseComponent
getComponentConfiguration

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.thales.punch.libraries.storm.api.ISpout
registerNextTupleCallback

Methods inherited from interface org.apache.storm.spout.ISpout
activate

Methods inherited from interface org.apache.storm.topology.IComponent
getComponentConfiguration

- Constructor Detail
  - AzureBlobStorageSpout
```
public AzureBlobStorageSpout(org.thales.punch.libraries.storm.api.NodeSettings config,
                             boolean enableNsg)
```
    Constructor.
    
    Parameters:
    
    config - the spout config
    
    enableNsg - true to make the spout act as a NSG spout
- Method Detail
  - open
```
public void open(Map conf,
                 org.apache.storm.task.TopologyContext context,
                 org.apache.storm.spout.SpoutOutputCollector collector)
```
    Specified by:
    
    open in interface org.apache.storm.spout.ISpout
    
    Overrides:
    
    open in class org.thales.punch.libraries.storm.api.BaseInputNode
  - nextTuple
```
public void nextTuple()
```
  - lastCommittedWriter
```
public void lastCommittedWriter()
```
    Used to write last_committed on success.
  - ack
```
public void ack(Object msgId)
```
    Specified by:
    
    ack in interface org.apache.storm.spout.ISpout
    
    Overrides:
    
    ack in class org.thales.punch.libraries.storm.api.BaseInputNode
  - fail
```
public void fail(Object msgId)
```
    Specified by:
    
    fail in interface org.apache.storm.spout.ISpout
    
    Overrides:
    
    fail in class org.thales.punch.libraries.storm.api.BaseInputNode
