
Hudi: INSERT_OVERWRITE_TABLE failed with errors

A Spark job writing to Hudi failed, but the underlying exception was never surfaced. The only message in the log was:

ERROR HoodieSparkSqlWriter$: INSERT_OVERWRITE_TABLE failed with errors

Lowering the driver log level to TRACE reveals the following error:
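One way to lower the level (a sketch assuming a Spark build that still uses the log4j 1.x `conf/log4j.properties` convention; Spark 3.3+ ships log4j2 and uses `log4j2.properties` with a different syntax):

```properties
# conf/log4j.properties — raise verbosity so swallowed per-record errors surface
log4j.rootCategory=TRACE, console

# Or, more targeted, raise only the Hudi writer's logger:
log4j.logger.org.apache.hudi=TRACE
```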

23/08/24 17:31:37 WARN HoodieSparkSqlWriter$: Error for key: HoodieKey { recordKey=707810002387078 partitionPath=13}
java.lang.ArrayIndexOutOfBoundsException
        at reflectasm.java.lang.ArrayIndexOutOfBoundsExceptionConstructorAccess.newInstance(Unknown Source)
        at com.twitter.chill.Instantiators$.$anonfun$reflectAsm(KryoBase.scala:151)
        at com.twitter.chill.Instantiators$anon.newInstance(KryoBase.scala:137)
        at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1139)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:562)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:538)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
        at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:161)
        at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
        at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
        at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
        at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:306)
        at org.apache.spark.serializer.DeserializationStream$anon.getNext(Serializer.scala:168)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$anon.hasNext(Iterator.scala:513)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
        at org.apache.spark.rdd.RDD.$anonfun$foreach(RDD.scala:1012)
        at org.apache.spark.rdd.RDD.$anonfun$foreach$adapted(RDD.scala:1012)
        at org.apache.spark.SparkContext.$anonfun$runJob(SparkContext.scala:2254)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run(Executor.scala:506)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
23/08/24 17:31:37 WARN HoodieSparkSqlWriter$: Error for key: HoodieKey { recordKey=708543334900218 partitionPath=13}
java.lang.ArrayIndexOutOfBoundsException

Root cause: investigation showed that the DataFrame contained array-type columns whose values included null elements, e.g. [null, 无]. During Kryo serialization while writing to Hudi, these null elements triggered the ArrayIndexOutOfBoundsException above.

Fix: add logic to the DataFrame to filter out the null elements inside array columns before writing.
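A minimal sketch of that filtering logic in plain Python (the column name `tags` and the function name are hypothetical). In PySpark, the same effect can be achieved with the built-in higher-order `filter` function available since Spark 2.4, e.g. `df.withColumn("tags", F.expr("filter(tags, x -> x IS NOT NULL)"))`:

```python
def drop_null_array_elements(row: dict) -> dict:
    """Return a copy of `row` with None elements removed from list-valued fields."""
    return {
        key: [e for e in value if e is not None] if isinstance(value, list) else value
        for key, value in row.items()
    }

# Example shaped like the failing record: an array column containing a null.
row = {"recordKey": "707810002387078", "tags": [None, "无"]}
print(drop_null_array_elements(row))  # the null element is dropped, "无" is kept
```

Note the fix removes null *elements* inside arrays; rows whose array column is entirely null can be left as-is, since only the embedded nulls break Kryo serialization here.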


