Skip to content

Sql Statement

The sql_statement node lets you execute sql expressions on a dataset. It is very useful to select, rename or cast some columns of your dataset rows.

Here is a simple example to perform a top 5 on a field that can contains different names...

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
{
  job: [
        {
            type: elastic_batch_input
            component: input
            settings: {
                index: metricbeat-*
                cluster_name: es_search
                nodes: [
                    localhost
                ]
                output_columns: [
                    {
                        type: string
                        field: metricset.name
                    }
                ]
                query: {
                    size: 0
                    query: {
                        bool: {
                            must: [
                                {
                                    exists: {
                                        field: metricset.name
                                    }
                                }
                                {
                                    range: {
                                        @timestamp : {
                                            gte : now-10m
                                            lt :  now
                                        }
                                    }
                                }
                            ]
                        }
                    }
                }
            }
            publish: [
                {
                    stream: data
                }
            ]
        }
        {
            type: sql_statement
            component: sql_statement
            settings: {
                statement: SELECT COUNT(`metricset.name`) AS TOP_5_mname, `metricset.name` AS NAME FROM input_data GROUP BY `metricset.name` ORDER BY TOP_5_mname DESC LIMIT 5
            }
            subscribe: [
                {
                    component: input
                    stream: data
                }
            ]
            publish: [
                { 
                    stream: data
                }
            ]
        }
        {
            type: show
            component: show
            settings: {
                truncate: false
                num_rows: 10
            }
            subscribe: [
                {
                component: sql_statement
                stream: data
                }
            ]
        }
    ]
}

Parameters

1
2
3
4
-   statement: String

        Valid Spark Sql query string
        Table name are set as follow: component_stream where component is the name of the suscribed component alongside it's stream name.