Troubleshooting Structured Streaming
This page covers a few problems you might encounter while using Spark Structured Streaming in the punch.
Input and Output Nodes¶
First, a Structured Streaming pipeline must always start with a streaming input node and end with a streaming output node.
If your input node is not a streaming node, you'll get this exception:
17:41:34 [ERROR][SparkMain] message="Analytics SparkMain error"
org.apache.spark.sql.AnalysisException: 'writeStream' can be called only on streaming Dataset/DataFrame
If your output node is not a streaming node, you'll get this exception:
17:55:08 [ERROR][SparkMain] message="Analytics SparkMain error"
org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();
Make sure both your input and output nodes are streaming nodes.
Mode¶
Spark Structured Streaming supports three output modes: APPEND, COMPLETE, and UPDATE. See the Structured Streaming documentation for more information on each mode.
If you're not doing any aggregation, you cannot use the COMPLETE mode:
18:02:07 [ERROR][SparkMain] message="Analytics SparkMain error"
org.apache.spark.sql.AnalysisException: Complete output mode not supported when there are no streaming aggregations on streaming DataFrames/Datasets;
If you're doing an aggregation without declaring a watermark, you cannot use the APPEND mode:
18:00:19 [ERROR][SparkMain] message="Analytics SparkMain error"
org.apache.spark.sql.AnalysisException: Append output mode not supported when there are streaming aggregations on streaming DataFrames/DataSets without watermark;
Choose your mode using the "mode" setting in your output node.