Caused by: org.apache.hudi.exception.HoodieCommitException: Failed to archive commits
at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:323)
at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:128)
at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:430)
at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:186)
at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:121)
at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:482)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:224)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145)
... 15 more
Caused by: java.io.IOException: Not an Avro data file
at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:175)
at org.apache.hudi.client.utils.MetadataConversionUtils.createMetaWrapper(MetadataConversionUtils.java:97)
at org.apache.hudi.table.HoodieTimelineArchiveLog.convertToAvroRecord(HoodieTimelineArchiveLog.java:369)
at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:310)
... 48 more
This problem occurs with Hudi 0.8 under a specific set of conditions.

How it happens:
1. The HDFS disk fills up while a Spark job is still writing to the Hudi table. When the write fails, Hudi tries to record a rollback file under .hoodie, but since HDFS is out of space that write also fails, leaving behind only an empty rollback entry.
2. After disk space is added and the job is rerun, Hudi scans the rollback file created above, finds no data in it, and throws the error shown in the stack trace ("Not an Avro data file").
Temporary workaround: delete the empty rollback files written while the disk was full, along with their corresponding rollback.inflight files, then rerun the write to the Hudi table.
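A minimal sketch of that cleanup, assuming a locally mounted copy of the table's .hoodie directory; on HDFS the same idea applies with `hdfs dfs -ls` and `hdfs dfs -rm`. The table path and instant time below are hypothetical placeholders, not values from the report.

```shell
# Hypothetical table path; replace with your Hudi table's base path.
HOODIE_DIR=./my_table/.hoodie

# Simulate the damage for illustration: a zero-byte rollback file
# plus its inflight marker, as left behind when HDFS ran out of space.
mkdir -p "$HOODIE_DIR"
touch "$HOODIE_DIR/20220101000000.rollback" \
      "$HOODIE_DIR/20220101000000.rollback.inflight"

# Remove every zero-byte rollback file together with its .inflight twin.
find "$HOODIE_DIR" -maxdepth 1 -name '*.rollback' -size 0 | while read -r f; do
  rm -f "$f" "$f.inflight"
done
```

Only zero-byte `.rollback` files are targeted, so valid rollback metadata written before or after the outage is left untouched; verify the listed instants before deleting on a production table.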
Permanent fix: this bug is already tracked upstream against Hudi 0.8 with an issue and a PR:
https://github.com/apache/hudi/issues/4466
https://github.com/apache/hudi/pull/4016