Geospatial
Machine learning algorithms often need a geographic overlay for their processing. You may need to compute distances or areas, or check whether a geo shape intersects others, so as to enrich or tag your data accordingly.
How do you do that? The punch language exposes standard geo operators that make this easy, as explained below.
This has important benefits. First, you do not need to deploy additional applications to (re)process your data: do it directly in your ingestion pipelines and spare your applications expensive additional jobs (extra processes, extra I/O).
Second, the punch language is also available in (Spark) PML, so you can use the same geo capabilities in your machine learning feature transformations.
Last, the punch geo shape capabilities, combined with Elasticsearch and Kibana, give you a state-of-the-art geospatial platform.
Geospatial Data
The GeoJSON format lets you work with points, line strings and polygons. The examples below show, in order, a point, a line string, a triangle, and a triangle with a hole:
{ "type": "Point", "coordinates": [ 40, 5 ] }
{ "type": "LineString", "coordinates": [ [ 40, 5 ], [ 41, 6 ] ] }
{
"type": "Polygon",
"coordinates": [ [ [ 0 , 0 ] , [ 3 , 6 ] , [ 6 , 1 ] , [ 0 , 0 ] ] ]
}
{
"type" : "Polygon",
"coordinates" : [
[ [ 0 , 0 ] , [ 3 , 6 ] , [ 6 , 1 ] , [ 0 , 0 ] ],
[ [ 2 , 2 ] , [ 3 , 3 ] , [ 4 , 2 ] , [ 2 , 2 ] ]
]
}
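The last example is a polygon with a hole. In GeoJSON, the first ring of a polygon is its outer boundary and any further ring is a hole: here the triangle [2,2], [3,3], [4,2] is cut out of the triangle [0,0], [3,6], [6,1]. Each ring is closed by repeating its first position at the end.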
Note
The characters in the value of the type field can be in uppercase or lowercase.
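To show how these documents are structured, here is a minimal Python sketch that uses only the standard json module to parse the examples above, accept the type field in any letter case (as the note says), and check that every polygon ring is closed. The helper names (normalize_type, rings_are_closed) are hypothetical; this is an illustration, not how punch itself parses GeoJSON.

```python
import json

# The GeoJSON examples from this section, as raw JSON strings.
EXAMPLES = [
    '{ "type": "Point", "coordinates": [ 40, 5 ] }',
    '{ "type": "LineString", "coordinates": [ [ 40, 5 ], [ 41, 6 ] ] }',
    '{ "type": "Polygon",'
    ' "coordinates": [ [ [ 0, 0 ], [ 3, 6 ], [ 6, 1 ], [ 0, 0 ] ] ] }',
    '{ "type": "Polygon", "coordinates": ['
    ' [ [ 0, 0 ], [ 3, 6 ], [ 6, 1 ], [ 0, 0 ] ],'
    ' [ [ 2, 2 ], [ 3, 3 ], [ 4, 2 ], [ 2, 2 ] ] ] }',
]

# Canonical type names keyed by their lowercase spelling, so that the
# "type" value is accepted in any letter case.
CANONICAL_TYPES = {name.lower(): name for name in ("Point", "LineString", "Polygon")}


def normalize_type(shape):
    """Return the canonical type name of a parsed GeoJSON object."""
    return CANONICAL_TYPES[shape["type"].lower()]


def rings_are_closed(polygon):
    """True if every ring of a polygon ends on its first position."""
    return all(ring[0] == ring[-1] for ring in polygon["coordinates"])


for raw in EXAMPLES:
    shape = json.loads(raw)
    kind = normalize_type(shape)
    if kind == "Polygon":
        print(kind, "with", len(shape["coordinates"]), "ring(s), closed:",
              rings_are_closed(shape))
    else:
        print(kind)

print(normalize_type({"type": "POLYGON"}))  # Polygon
```

A lookup table keyed by the lowercase spelling keeps the case-insensitive comparison in one place and always hands back a single canonical name.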
Punch Language Geo Capabilities
This section assumes you are familiar with the punch language. If not, make sure you read the punch language chapter first.
Using the geo operator
Using the geo operator is simple. Here are a few examples:
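// distance between the shapes held in [left] and [right]
[distance] = geo().distance([left], [right]);
// true if the two shapes intersect
if (geo().intersect([shape1],[shape2])) {
// do something ...
}
// area of the shape held in [shape]
[area] = geo().getArea([shape]);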
Here, the geo() punch tuple provides a ready-to-use geo operator with the most common methods: distance, intersect, contains and getArea.
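To make these operations concrete, here is a minimal Python sketch of what distance, contains and getArea compute, applied to the GeoJSON examples from the previous section. It is an illustration under stated assumptions, not the punch implementation: it treats coordinates as flat (x, y) values, whereas GeoJSON positions are [longitude, latitude] and the punch operator may use a geodesic metric; the helper names are hypothetical. intersect is left out because a general shape-intersection test is considerably more involved.

```python
import math

# Hypothetical planar helpers, not the punch GeoOperator: they assume flat
# (x, y) coordinates and polygons given as a list of closed rings.


def distance(p, q):
    """Euclidean distance between two points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def ring_area(ring):
    """Unsigned area of a closed ring, using the shoelace formula."""
    twice = sum(x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice) / 2


def polygon_area(rings):
    """Area of the outer ring minus the area of every hole."""
    outer, *holes = rings
    return ring_area(outer) - sum(ring_area(h) for h in holes)


def ring_contains(ring, pt):
    """Ray-casting point-in-ring test (points exactly on an edge are not handled)."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_contains(rings, pt):
    """Inside the outer ring and outside every hole."""
    outer, *holes = rings
    return ring_contains(outer, pt) and not any(ring_contains(h, pt) for h in holes)


# The triangle with a hole from the GeoJSON examples.
holed = [
    [[0, 0], [3, 6], [6, 1], [0, 0]],
    [[2, 2], [3, 3], [4, 2], [2, 2]],
]

print(distance([40, 5], [41, 6]))         # sqrt(2), about 1.414
print(polygon_area(holed))                # 16.5 - 1.0 = 15.5
print(polygon_contains(holed, [1, 1]))    # True: inside the outer ring
print(polygon_contains(holed, [3, 2.5]))  # False: inside the hole
```

Subtracting the hole areas from the outer area, and excluding points that fall in a hole, is what makes the second ring of the last GeoJSON example behave as a hole rather than as a second shape.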
These methods rely on the punch assignment behavior, which makes safe code very compact. Consider the first example:
[distance] = geo().distance([left], [right]);
The [left] and [right] tuples are expected to contain geo shapes. What if they contain "hello world !" instead?
The statement then has no effect at all: [distance] will not exist.
If you want to check that explicitly, you can write:
[distance] = geo().distance([left], [right]);
if (![distance]) {
// oops my data is not what I expect
}
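The same control flow can be mimicked in Python to show why no explicit error handling is needed. This is only an analogy: the document is modeled as a dict, only GeoJSON points are handled, distances are planar, and the helper names (as_point, assign_distance) are hypothetical. Checking that the "distance" key is absent plays the role of if (![distance]).

```python
import math

# A Python analogy of the punch assignment behavior: the result is stored
# only when the right-hand side can be computed. This is not punch code.


def as_point(value):
    """Return (x, y) if value looks like a GeoJSON point, else None."""
    if not isinstance(value, dict):
        return None
    if str(value.get("type", "")).lower() != "point":
        return None
    coords = value.get("coordinates")
    if not isinstance(coords, list) or len(coords) < 2:
        return None
    return coords[0], coords[1]


def assign_distance(doc):
    """Set doc["distance"] only if doc["left"] and doc["right"] are both points."""
    left = as_point(doc.get("left"))
    right = as_point(doc.get("right"))
    if left is None or right is None:
        return  # no effect at all: "distance" stays absent
    doc["distance"] = math.hypot(right[0] - left[0], right[1] - left[1])


good = {"left": {"type": "Point", "coordinates": [40, 5]},
        "right": {"type": "point", "coordinates": [41, 6]}}
bad = {"left": "hello world !",
       "right": {"type": "Point", "coordinates": [41, 6]}}

for doc in (good, bad):
    assign_distance(doc)
    if "distance" not in doc:  # counterpart of: if (![distance])
        print("oops, my data is not what I expect")
    else:
        print(doc["distance"])
```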
For the complete list of methods, refer to the GeoOperator Javadoc.
Using the geo Tuple
You can also explicitly retrieve a so-called GeoShape tuple:
[shape] = [my_input].asGeoShape();
Once you have your tuple, you can access its geometry and, from there, its many methods:
- getBoundary
- getEnvelope
- getInteriorPoint
- etc.
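As a rough illustration of two of these notions, here is a Python sketch computing the envelope (the axis-aligned bounding box) and the boundary (the set of rings) of the polygon with a hole from the GeoJSON examples. It does not use the GeoShape API: the function names are hypothetical and the geometry is a plain list of rings. getInteriorPoint is omitted, since finding a point guaranteed to lie inside a polygon is more involved.

```python
# Hypothetical helpers, not the punch GeoShape API: they show what an
# envelope and a boundary are for a polygon given as a list of closed rings.


def envelope(rings):
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y) of all rings."""
    xs = [x for ring in rings for x, _ in ring]
    ys = [y for ring in rings for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def boundary(rings):
    """The boundary of a polygon: all of its rings, as a GeoJSON MultiLineString."""
    return {"type": "MultiLineString", "coordinates": rings}


# The triangle with a hole from the GeoJSON examples.
holed = [
    [[0, 0], [3, 6], [6, 1], [0, 0]],
    [[2, 2], [3, 3], [4, 2], [2, 2]],
]

print(envelope(holed))          # (0, 0, 6, 6)
print(boundary(holed)["type"])  # MultiLineString
```

Note that the hole does not change the envelope, which depends on the outer ring only, but it does add a second line to the boundary.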