A Spark job writing to Hudi failed, but the exception was not propagated to the caller; the driver log only showed:
ERROR HoodieSparkSqlWriter$: INSERT_OVERWRITE_TABLE failed with errors
Lowering the driver log level to TRACE reveals the underlying error:
23/08/24 17:31:37 WARN HoodieSparkSqlWriter$: Error for key: HoodieKey { recordKey=707810002387078 partitionPath=13}
java.lang.ArrayIndexOutOfBoundsException
    at reflectasm.java.lang.ArrayIndexOutOfBoundsExceptionConstructorAccess.newInstance(Unknown Source)
    at com.twitter.chill.Instantiators$.$anonfun$reflectAsm(KryoBase.scala:151)
    at com.twitter.chill.Instantiators$anon.newInstance(KryoBase.scala:137)
    at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1139)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:562)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:538)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:161)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
    at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
    at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:306)
    at org.apache.spark.serializer.DeserializationStream$anon.getNext(Serializer.scala:168)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$anon.hasNext(Iterator.scala:513)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at org.apache.spark.rdd.RDD.$anonfun$foreach(RDD.scala:1012)
    at org.apache.spark.rdd.RDD.$anonfun$foreach$adapted(RDD.scala:1012)
    at org.apache.spark.SparkContext.$anonfun$runJob(SparkContext.scala:2254)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run(Executor.scala:506)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
23/08/24 17:31:37 WARN HoodieSparkSqlWriter$: Error for key: HoodieKey { recordKey=708543334900218 partitionPath=13}
java.lang.ArrayIndexOutOfBoundsException
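For reference, the TRACE level mentioned above can be set in the driver's log4j configuration. A minimal sketch, assuming Spark's default log4j 1.x properties file (Spark 3.2 and earlier); the Hudi-specific logger name is an assumption, scope it as needed:

```properties
# conf/log4j.properties on the driver (log4j 1.x syntax)
log4j.rootCategory=TRACE, console

# Alternatively, keep the root level and lower only the Hudi loggers:
# log4j.logger.org.apache.hudi=TRACE
```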
Cause: after investigation, the root cause was that array-typed values in the DataFrame contained null elements, e.g. [null, 无]. During serialization in the Hudi write path, these null elements triggered the ArrayIndexOutOfBoundsException.
Fix: add logic to the DataFrame pipeline that filters null elements out of array values before writing to Hudi.
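The cleaning rule itself is simple: drop null elements from every array value before the write. A minimal pure-Python sketch of that rule (the column name "tags" and the row layout are illustrative, not from the original job; the record keys are taken from the log above):

```python
def drop_null_array_elements(row: dict, array_cols: list) -> dict:
    """Return a copy of the row with None elements removed from array columns."""
    cleaned = dict(row)
    for col in array_cols:
        value = cleaned.get(col)
        if isinstance(value, list):
            # Keep only non-null elements, e.g. [None, "无"] -> ["无"]
            cleaned[col] = [x for x in value if x is not None]
    return cleaned

rows = [
    {"record_key": "707810002387078", "partition": "13", "tags": [None, "无"]},
    {"record_key": "708543334900218", "partition": "13", "tags": ["a", None, "b"]},
]
cleaned = [drop_null_array_elements(r, ["tags"]) for r in rows]
print(cleaned[0]["tags"])  # ['无']
```

In Spark itself the same rule can be expressed with the SQL higher-order function `filter` (available since Spark 2.4), e.g. `df.withColumn("tags", expr("filter(tags, x -> x is not null)"))`; again, the column name here is an assumption.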