Troubleshooting Structured Streaming
This page covers a few problems you might encounter while using Spark Structured Streaming in the punch.
Input and Output Nodes¶
First, a Structured Streaming pipeline must always start with a streaming input node and end with a streaming output node.
If your input node is not a streaming node, you'll get this exception:
17:41:34 [ERROR][SparkMain] message="Analytics SparkMain error"
org.apache.spark.sql.AnalysisException: 'writeStream' can be called only on streaming Dataset/DataFrame
If your output node is not a streaming node, you'll get this exception:
17:55:08 [ERROR][SparkMain] message="Analytics SparkMain error"
org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();
Make sure both your input and output nodes are streaming nodes.
Mode¶
Spark Structured Streaming supports three output modes: APPEND, COMPLETE, and UPDATE. See the Structured Streaming documentation for more information on each mode.
If you're not doing any aggregation, you cannot use the COMPLETE mode:
18:02:07 [ERROR][SparkMain] message="Analytics SparkMain error"
org.apache.spark.sql.AnalysisException: Complete output mode not supported when there are no streaming aggregations on streaming DataFrames/Datasets;
If you're doing an aggregation without declaring a watermark, you cannot use the APPEND mode:
18:00:19 [ERROR][SparkMain] message="Analytics SparkMain error"
org.apache.spark.sql.AnalysisException: Append output mode not supported when there are streaming aggregations on streaming DataFrames/DataSets without watermark;
Choose your mode using the "mode" setting in your output node.