Feature Generator
The feature_generator node makes it easy to generate numerical Datasets for testing MLlib stages. These datasets are typically composed of numerical values and dense vectors.
The following example shows how it works:

    [
      {
        description:
          '''
          You simply write your data inline; it is converted to a Dataset<Row>
          '''
        type: feature_generator
        component: input
        settings: {
          row: [
            {
              column: id
              type: integer
            }
            {
              column: features
              type: vector
            }
            {
              column: clicked
              type: double
            }
          ]
          values: [
            [
              7, [0.0, 0.0, 18.0, 1.0], 1.0
              8, [0.0, 1.0, 12.0, 0.0], 0.0
              9, [1.0, 0.0, 15.0, 0.1], 0.0
            ]
          ]
        }
        publish: [
          {
            stream: data
          }
        ]
      }
    ]

Executing this PML shows the following result:

    punchplatform-analytics.sh --job pml.json

    +---+-----------------+-------+
    |id |features         |clicked|
    +---+-----------------+-------+
    |7  |[0.0,0.0,0.0,0.0]|1.0    |
    |8  |[1.0,1.0,1.0,1.0]|0.0    |
    |9  |[0.0,0.0,0.0,0.0]|0.0    |
    +---+-----------------+-------+

    root
     |-- id: integer (nullable = false)
     |-- features: vector (nullable = false)
     |-- clicked: double (nullable = false)
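
To make the relationship between the row schema and the inline values more concrete, here is a minimal Python sketch of one plausible reading of the configuration: the inner values list is treated as a flat sequence that is cut into rows, one value per declared column. This is an assumption about how the node interprets its settings, not a description of its actual implementation. The names CONVERTERS and build_rows are hypothetical, and plain Python types stand in for the Spark types the real node produces.

```python
from typing import Any, Callable, Dict, List

# Hypothetical mapping from the type names used in the row setting to
# Python converters. The real node produces Spark types instead.
CONVERTERS: Dict[str, Callable[[Any], Any]] = {
    "integer": int,
    "double": float,
    "vector": lambda xs: [float(x) for x in xs],
}


def build_rows(schema: List[Dict[str, str]], flat_values: List[Any]) -> List[Dict[str, Any]]:
    """Group a flat list of values into rows, one value per declared column."""
    width = len(schema)
    if width == 0 or len(flat_values) % width != 0:
        raise ValueError("values do not fill a whole number of rows")
    rows = []
    for start in range(0, len(flat_values), width):
        chunk = flat_values[start:start + width]
        rows.append({
            col["column"]: CONVERTERS[col["type"]](value)
            for col, value in zip(schema, chunk)
        })
    return rows


# Same schema and values as the configuration above (inner list only).
schema = [
    {"column": "id", "type": "integer"},
    {"column": "features", "type": "vector"},
    {"column": "clicked", "type": "double"},
]
flat_values = [
    7, [0.0, 0.0, 18.0, 1.0], 1.0,
    8, [0.0, 1.0, 12.0, 0.0], 0.0,
    9, [1.0, 0.0, 15.0, 0.1], 0.0,
]

rows = build_rows(schema, flat_values)
for row in rows:
    print(row)
```

Under this reading, a values list whose length is not a multiple of the number of declared columns is rejected early, which is a convenient check to run on test data before submitting a job.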