
Hudi 0.8: java.io.IOException: Not an Avro data file

Caused by: org.apache.hudi.exception.HoodieCommitException: Failed to archive commits
        at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:323)
        at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:128)
        at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:430)
        at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:186)
        at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:121)
        at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:482)
        at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:224)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145)
        ... 15 more
    Caused by: java.io.IOException: Not an Avro data file
        at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
        at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:175)
        at org.apache.hudi.client.utils.MetadataConversionUtils.createMetaWrapper(MetadataConversionUtils.java:97)
        at org.apache.hudi.table.HoodieTimelineArchiveLog.convertToAvroRecord(HoodieTimelineArchiveLog.java:369)
        at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:310)
        ... 48 more

This problem shows up in Hudi 0.8 under the specific conditions described below.

Conditions under which it occurs
1. HDFS ran out of disk space while a Spark job was still writing to the Hudi table. When the write failed, Hudi tried to write a rollback metadata file under .hoodie; since HDFS had no free space, that write also failed and only an empty (zero-byte) rollback file was left behind.
2. After disk space was added and the job was run again, Hudi scanned that rollback file during archiving, read no data from it, and threw the error above (a minimal reproduction sketch follows this list).
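For illustration, here is a minimal, hypothetical Java sketch (the file and class names are made up for the example) that reproduces the same IOException: opening a zero-byte file with Avro's DataFileReader fails its size/magic check, which is what TimelineMetadataUtils.deserializeAvroMetadata hits when archiving reads the empty rollback file.

import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

// Hypothetical reproduction: a zero-byte file stands in for the empty
// rollback metadata file left under .hoodie when HDFS ran out of space.
public class EmptyAvroFileRepro {
  public static void main(String[] args) throws IOException {
    File empty = File.createTempFile("20220101000000", ".rollback"); // 0 bytes
    try {
      DataFileReader.openReader(empty, new GenericDatumReader<GenericRecord>());
    } catch (IOException e) {
      System.out.println(e.getMessage()); // prints "Not an Avro data file"
    } finally {
      empty.delete();
    }
  }
}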

Temporary workaround: delete the empty rollback files that were written while the disk was full, together with the corresponding rollback.inflight files, then re-run the Hudi write. A cleanup sketch is shown below.
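The following is a minimal sketch of that cleanup using the Hadoop FileSystem API, not part of the original post: the class name and base-path argument are placeholders, and it assumes the broken rollback files are exactly 0 bytes, so review the matched files before deleting anything.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical cleanup sketch: remove zero-length *.rollback files under
// .hoodie along with their matching *.rollback.inflight files, then re-run
// the Hudi write job.
public class CleanEmptyRollbacks {
  public static void main(String[] args) throws IOException {
    String basePath = args[0]; // e.g. hdfs://nn:8020/warehouse/my_hudi_table (assumed)
    FileSystem fs = FileSystem.get(new Configuration());
    Path metaDir = new Path(basePath, ".hoodie");

    for (FileStatus st : fs.listStatus(metaDir)) {
      String name = st.getPath().getName();
      // Only touch completed rollback files that are empty (0 bytes)
      if (name.endsWith(".rollback") && st.getLen() == 0) {
        String instant = name.substring(0, name.length() - ".rollback".length());
        fs.delete(st.getPath(), false);
        // Remove the corresponding inflight marker as well, if present
        Path inflight = new Path(metaDir, instant + ".rollback.inflight");
        if (fs.exists(inflight)) {
          fs.delete(inflight, false);
        }
        System.out.println("Removed empty rollback instant: " + instant);
      }
    }
  }
}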

Permanent fix: this bug is already tracked by an issue and a PR in the upstream Hudi project:
https://github.com/apache/hudi/issues/4466
https://github.com/apache/hudi/pull/4016

