Feature Generator

The feature_generator node makes it easy to generate numerical Datasets for testing mllib stages. These datasets are typically composed of numerical values and dense vectors. An example makes this clearer:

[
    {
        description:
        '''
        You simply write your data inline; it is converted into a Dataset<Row>.
        '''
        type: feature_generator
        component: input
        settings: {
                row: [
                    { 
                        column: id
                        type: integer
                    }
                    { 
                        column: features
                        type: vector
                    }
                    { 
                        column: clicked
                        type: double
                    }
                ]
                values: [
                    [ 7, [ 0.0, 0.0, 18.0, 1.0 ], 1.0 ]
                    [ 8, [ 0.0, 1.0, 12.0, 0.0 ], 0.0 ]
                    [ 9, [ 1.0, 0.0, 15.0, 0.1 ], 0.0 ]
                ]
        }
        publish: [
            {
                stream: data
            }
        ]
    }
]
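To make the relationship between `row` and `values` concrete, here is a minimal Python sketch of the kind of validation such a node could perform before building its dataset. It assumes each entry of `values` is one row whose elements follow the column order declared in `row`. The `build_rows` function and the coercion table are hypothetical illustrations, not the actual feature_generator implementation, which produces a Spark `Dataset<Row>` rather than Python tuples.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def _to_integer(value: Any) -> int:
    # Reject floats such as 7.5 (and booleans) instead of silently truncating.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected an integer, got {value!r}")
    return value


def _to_double(value: Any) -> float:
    # Accept ints or floats and normalise them to float.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"expected a number, got {value!r}")
    return float(value)


def _to_vector(value: Any) -> List[float]:
    # A dense vector is written inline as a list of numbers.
    if not isinstance(value, (list, tuple)):
        raise TypeError(f"expected a list for a vector column, got {value!r}")
    return [_to_double(x) for x in value]


# Hypothetical mapping from the column "type" names used in the PML above.
# An unknown type name raises KeyError.
COERCERS: Dict[str, Callable[[Any], Any]] = {
    "integer": _to_integer,
    "double": _to_double,
    "vector": _to_vector,
}


def build_rows(columns: Sequence[dict],
               values: Sequence[Sequence[Any]]) -> List[Tuple[Any, ...]]:
    """Check every inline row against the declared columns and coerce its values."""
    rows = []
    for index, raw in enumerate(values):
        if len(raw) != len(columns):
            raise ValueError(
                f"row {index} has {len(raw)} values, expected {len(columns)}")
        rows.append(tuple(COERCERS[col["type"]](v) for col, v in zip(columns, raw)))
    return rows


if __name__ == "__main__":
    columns = [
        {"column": "id", "type": "integer"},
        {"column": "features", "type": "vector"},
        {"column": "clicked", "type": "double"},
    ]
    values = [
        [7, [0.0, 0.0, 18.0, 1.0], 1.0],
        [8, [0.0, 1.0, 12.0, 0.0], 0.0],
        [9, [1.0, 0.0, 15.0, 0.1], 0.0],
    ]
    for row in build_rows(columns, values):
        print(row)
```

Failing fast on a length or type mismatch keeps a typo in the inline data from surfacing later as a confusing error inside an mllib stage.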

Executing this PML shows the following result:

punchplatform-analytics.sh --job pml.json

+---+-----------------+-------+
|id |features         |clicked|
+---+-----------------+-------+
|7  |[0.0,0.0,0.0,0.0]|1.0    |
|8  |[1.0,1.0,1.0,1.0]|0.0    |
|9  |[0.0,0.0,0.0,0.0]|0.0    |
+---+-----------------+-------+

root
 |-- id: integer (nullable = false)
 |-- features: vector (nullable = false)
 |-- clicked: double (nullable = false)
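The table and the schema tree above appear to be Spark's console output for `show` (with truncation disabled, hence the left-aligned cells) and `printSchema`. To help read that output, the Python sketch below reproduces the same layout from plain rows and the `row` declarations: each column is padded to its widest entry (at least 3 characters), vectors print with no spaces after the commas, and every column is reported as non-nullable. The `show_table` and `schema_tree` helpers are hypothetical, and the hard-coded `nullable = false` simply mirrors the output above.

```python
from typing import Any, Sequence


def _cell(value: Any) -> str:
    # Vectors are rendered as "[0.0,1.0]": comma-separated, no spaces.
    if isinstance(value, (list, tuple)):
        return "[" + ",".join(str(float(x)) for x in value) + "]"
    return str(value)


def show_table(header: Sequence[str], rows: Sequence[Sequence[Any]]) -> str:
    """Render rows as a left-aligned ASCII table like the output above."""
    cells = [[_cell(v) for v in row] for row in rows]
    # Each column is as wide as its longest entry, and at least 3 characters.
    widths = [
        max([3, len(name)] + [len(row[i]) for row in cells])
        for i, name in enumerate(header)
    ]
    border = "+" + "+".join("-" * w for w in widths) + "+"

    def line(values: Sequence[str]) -> str:
        return "|" + "|".join(v.ljust(w) for v, w in zip(values, widths)) + "|"

    return "\n".join([border, line(header), border]
                     + [line(row) for row in cells] + [border])


def schema_tree(columns: Sequence[dict]) -> str:
    """Render the declared columns as a schema tree; nullable is hard-coded."""
    lines = ["root"]
    for col in columns:
        lines.append(f" |-- {col['column']}: {col['type']} (nullable = false)")
    return "\n".join(lines)


if __name__ == "__main__":
    columns = [
        {"column": "id", "type": "integer"},
        {"column": "features", "type": "vector"},
        {"column": "clicked", "type": "double"},
    ]
    # The rows exactly as printed in the output above.
    rows = [
        (7, [0.0, 0.0, 0.0, 0.0], 1.0),
        (8, [1.0, 1.0, 1.0, 1.0], 0.0),
        (9, [0.0, 0.0, 0.0, 0.0], 0.0),
    ]
    print(show_table([c["column"] for c in columns], rows))
    print()
    print(schema_tree(columns))
```

Run as a script, this prints the same table and schema tree as the analytics job output above.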