
Geospatial

Machine learning algorithms often need a geographic overlay for their processing. You will often need to compute distances or areas, or check whether some geo shapes intersect others, in order to enrich or tag your data accordingly.

How do you do that? The punch language exposes standard geo operators that make it easy, as explained below.

This has important benefits. First, you do not need to deploy additional applications to (re)process your data. Do it in your ingestion pipelines in the first place, and spare your applications expensive additional jobs (extra processes, extra I/O).

Second, the punch language is also available in (Spark) PML, so you can use the same geo capabilities in your machine learning feature transformations.

Last, the punch geo shape capabilities, combined with Elasticsearch and Kibana, provide you with a state-of-the-art geospatial platform.

Geospatial Data

Here are a few examples. The GeoJSON format lets you work with points (positions are written as [longitude, latitude]):

    { "type": "Point", "coordinates": [ 40, 5 ] }

lines:

    { "type": "LineString", "coordinates": [ [ 40, 5 ], [ 41, 6 ] ] }

polygons, whose rings are closed (the first and last positions are identical):

    {
      "type": "Polygon",
      "coordinates": [ [ [ 0 , 0 ] , [ 3 , 6 ] , [ 6 , 1 ] , [ 0 , 0 ] ] ]
    }

or polygons that contain inner shapes (holes), given as additional rings after the outer ring:

    {
      "type" : "Polygon",
      "coordinates" : [
         [ [ 0 , 0 ] , [ 3 , 6 ] , [ 6 , 1 ] , [ 0 , 0 ] ],
         [ [ 2 , 2 ] , [ 3 , 3 ] , [ 4 , 2 ] , [ 2 , 2 ] ]
      ]
    }

The last example is illustrated below:

[Figure: polygon with an inner shape]

Note

The value of the type field is not case-sensitive: for example, "Polygon", "polygon" and "POLYGON" are all accepted.
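To make the note concrete, here is a minimal Python sketch of a reader that accepts the type value in any letter case. It is not the punch implementation: the function name normalize_geojson_type and the table of canonical names are hypothetical, and only the three geometry types shown above are covered.

```python
import json

# Hypothetical helper table: canonical spellings of the types shown above, keyed by lowercase name.
_CANONICAL_TYPES = {name.lower(): name for name in ("Point", "LineString", "Polygon")}


def normalize_geojson_type(text: str) -> dict:
    """Parse a GeoJSON geometry and rewrite its "type" in canonical case.

    Matching ignores case, so "polygon", "POLYGON" and "Polygon" are all
    accepted. Anything else raises ValueError.
    """
    geometry = json.loads(text)
    if not isinstance(geometry, dict):
        raise ValueError("a geometry must be a JSON object")
    raw_type = geometry.get("type")
    if not isinstance(raw_type, str) or raw_type.lower() not in _CANONICAL_TYPES:
        raise ValueError(f"unsupported geometry type: {raw_type!r}")
    geometry["type"] = _CANONICAL_TYPES[raw_type.lower()]
    return geometry


shape = normalize_geojson_type('{ "type": "POINT", "coordinates": [ 40, 5 ] }')
print(shape["type"])  # Point
```

Normalizing to one canonical spelling early means downstream code only has to compare against a single form.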

Punch language Geo Capabilities

This section assumes you are familiar with the punch language. If not, make sure you read this chapter first.

Using the geo operator

Using the geo operators is simple; here are a few examples:

    [distance] = geo().distance([left], [right]);
    if (geo().intersect([shape1], [shape2])) {
        // do something ...
    }
    [area] = geo().getArea([shape]);

Here, geo() provides a ready-to-use geo operator that exposes the most common methods: distance, intersect, contains and getArea.
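The following Python sketch illustrates what three of these methods compute for simple inputs, using the polygon with an inner shape from the examples above. It assumes planar (Cartesian) geometry in coordinate units, which is an assumption about punch rather than documented behavior; longitude/latitude data may call for geodesic distances instead. All helper names are hypothetical, and intersect is left out because a general intersection test is considerably more involved.

```python
import math
from typing import List, Sequence

Position = Sequence[float]  # [x, y], as in GeoJSON
Ring = List[Position]       # closed ring: first position == last position


def distance(p: Position, q: Position) -> float:
    """Hypothetical helper: planar Euclidean distance between two points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def ring_area(ring: Ring) -> float:
    """Unsigned area of a closed ring (shoelace formula, planar assumption)."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_area(rings: List[Ring]) -> float:
    """Area of a GeoJSON polygon: outer ring minus its inner rings (holes)."""
    outer, *holes = rings
    return ring_area(outer) - sum(ring_area(h) for h in holes)


def ring_contains(ring: Ring, point: Position) -> bool:
    """Ray casting: count edge crossings to the right of the point.

    Points lying exactly on the boundary are not handled consistently.
    """
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # this edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_contains(rings: List[Ring], point: Position) -> bool:
    """Inside the outer ring and outside every hole."""
    outer, *holes = rings
    return ring_contains(outer, point) and not any(ring_contains(h, point) for h in holes)


polygon = [[[0, 0], [3, 6], [6, 1], [0, 0]],
           [[2, 2], [3, 3], [4, 2], [2, 2]]]
print(polygon_area(polygon))               # 15.5 (16.5 outer minus 1.0 hole)
print(polygon_contains(polygon, [4, 3]))   # True
print(polygon_contains(polygon, [3, 2.5])) # False: the point lies in the hole
print(round(distance([40, 5], [41, 6]), 4))  # 1.4142
```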

These methods rely on the punch assignment behavior, which lets you write compact yet safe code. Consider the first example:

    [distance] = geo().distance([left], [right]);

This only works if the [left] and [right] tuples contain geo shapes. What if they contain "hello world !" instead? Then the statement has no effect at all, i.e. [distance] will not exist.

If you want to check that explicitly, you can write:

    [distance] = geo().distance([left], [right]);
    if (![distance]) {
        // oops my data is not what I expect
    }

Refer to the GeoOperator Javadoc for details.
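The assignment behavior described above can be illustrated with a Python analogy. This is not punch code: the tuple is modeled as a plain dict, and the hypothetical helper assign_distance only creates the "distance" field when both inputs are GeoJSON points. Restricting the inputs to points and using a planar distance are simplifications made for this sketch.

```python
import math
from typing import Any, Optional, Tuple


def as_point(value: Any) -> Optional[Tuple[float, float]]:
    """Hypothetical helper: return (x, y) if value looks like a GeoJSON Point, else None."""
    if (
        isinstance(value, dict)
        and str(value.get("type", "")).lower() == "point"
        and isinstance(value.get("coordinates"), list)
        and len(value["coordinates"]) == 2
        and all(isinstance(c, (int, float)) for c in value["coordinates"])
    ):
        x, y = value["coordinates"]
        return float(x), float(y)
    return None


def assign_distance(doc: dict) -> None:
    """Mimic  [distance] = geo().distance([left], [right]);

    The field is written only when both inputs are valid shapes;
    otherwise the statement silently has no effect.
    """
    left, right = as_point(doc.get("left")), as_point(doc.get("right"))
    if left is not None and right is not None:
        doc["distance"] = math.hypot(right[0] - left[0], right[1] - left[1])


doc = {"left": {"type": "Point", "coordinates": [40, 5]}, "right": "hello world !"}
assign_distance(doc)
if "distance" not in doc:  # the equivalent of  if (![distance]) { ... }
    print("oops my data is not what I expect")
```

Skipping the assignment instead of raising keeps a pipeline running on unexpected input, at the cost of having to test for the missing field when it matters.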

Using the geo Tuple

You can also explicitly retrieve a so-called GeoShape tuple:

    [shape] = [my_input].asGeoShape();

This operation throws an exception if your input data contains something that cannot be converted into a valid GeoJSON shape.
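By contrast with the silent geo() methods, this conversion fails loudly. The Python sketch below mirrors that contract with a hypothetical as_geo_shape function that raises ValueError when the input is not a valid GeoJSON Point, LineString or Polygon. The rules it checks (numeric positions, at least two positions for a line, closed rings of at least four positions) come from the GeoJSON specification; the exact checks and exception type used by punch may differ.

```python
import json
from typing import Any


def _check_position(pos: Any) -> None:
    # A position is a list of at least two numbers (x, y, optional altitude).
    if not (isinstance(pos, list) and len(pos) >= 2
            and all(isinstance(c, (int, float)) for c in pos)):
        raise ValueError(f"invalid position: {pos!r}")


def _check_ring(ring: Any) -> None:
    if not (isinstance(ring, list) and len(ring) >= 4):
        raise ValueError("a linear ring needs at least four positions")
    for pos in ring:
        _check_position(pos)
    if ring[0] != ring[-1]:
        raise ValueError("a linear ring must be closed (first == last position)")


def as_geo_shape(text: str) -> dict:
    """Hypothetical strict conversion: return the geometry or raise ValueError."""
    try:
        geometry = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not JSON: {text!r}") from exc
    if not isinstance(geometry, dict):
        raise ValueError("a geometry must be a JSON object")
    kind = str(geometry.get("type", "")).lower()  # type is case-insensitive, as noted above
    coords = geometry.get("coordinates")
    if kind == "point":
        _check_position(coords)
    elif kind == "linestring":
        if not (isinstance(coords, list) and len(coords) >= 2):
            raise ValueError("a LineString needs at least two positions")
        for pos in coords:
            _check_position(pos)
    elif kind == "polygon":
        if not (isinstance(coords, list) and coords):
            raise ValueError("a Polygon needs at least one ring")
        for ring in coords:
            _check_ring(ring)
    else:
        raise ValueError(f"unsupported geometry type: {geometry.get('type')!r}")
    return geometry


for text in ('{ "type": "Point", "coordinates": [ 40, 5 ] }', "hello world !"):
    try:
        print(as_geo_shape(text)["type"])
    except ValueError as err:
        print("conversion failed:", err)
```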

Once you have your tuple, you can access its geometry and, from there, the many available methods:

  • getBoundary
  • getEnvelope
  • getInteriorPoint
  • etc.
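To give an idea of what such methods return, here is a Python sketch of two of them for a GeoJSON polygon, assuming they follow the usual definitions of geometry libraries such as JTS: the envelope is the axis-aligned bounding box (returned here as a closed rectangular Polygon), and the boundary of a polygon is its set of rings (always returned here as a MultiLineString, a simplification). The function names are hypothetical, and getInteriorPoint is omitted because it requires a more involved algorithm.

```python
from typing import List, Sequence

Position = Sequence[float]


def envelope(rings: List[List[Position]]) -> dict:
    """Hypothetical helper: bounding box of a polygon as a closed GeoJSON Polygon."""
    # In a valid polygon the holes lie inside the outer ring, so the outer ring suffices.
    xs = [p[0] for p in rings[0]]
    ys = [p[1] for p in rings[0]]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    return {
        "type": "Polygon",
        "coordinates": [[[min_x, min_y], [max_x, min_y], [max_x, max_y],
                         [min_x, max_y], [min_x, min_y]]],
    }


def boundary(rings: List[List[Position]]) -> dict:
    """Hypothetical helper: every ring (outer and inner) of the polygon as a line."""
    return {"type": "MultiLineString", "coordinates": [list(r) for r in rings]}


polygon = [[[0, 0], [3, 6], [6, 1], [0, 0]],
           [[2, 2], [3, 3], [4, 2], [2, 2]]]
print(envelope(polygon)["coordinates"])  # [[[0, 0], [6, 0], [6, 6], [0, 6], [0, 0]]]
print(len(boundary(polygon)["coordinates"]))  # 2
```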