Fixing Kafka: org.apache.kafka.common.errors.TimeoutException
Author: 小明同学YYDS
Problems encountered while using Kafka:
- 1. Caused by: java.nio.channels.UnresolvedAddressException: null
- 2. org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for t2-0: 30042 ms has passed since batch creation plus linger time
Both errors turn out to have the same root cause.
Reproducing the problem
A Java producer sending messages to the Kafka cluster fails with the following log:
2018-06-03 00:10:02.071 INFO 80 --- [nio-8080-exec-1] o.a.kafka.common.utils.AppInfoParser : Kafka version : 0.10.2.0
2018-06-03 00:10:02.071 INFO 80 --- [nio-8080-exec-1] o.a.kafka.common.utils.AppInfoParser : Kafka commitId : 576d93a8dc0cf421
2018-06-03 00:10:32.253 ERROR 80 --- [ad | producer-1] o.s.k.support.LoggingProducerListener : Exception thrown when sending a message with key='test1' and payload='hello122' to topic t2:org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for t2-0: 30042 ms has passed since batch creation plus linger time
The connection to the Kafka cluster is clearly timing out, but the log alone does not say why.
Troubleshooting
First, confirm that the Kafka cluster is actually up, including both ZooKeeper and the Kafka brokers.
This can be checked with:
ps -ef | grep java
In my case, ZooKeeper and the Kafka brokers had started successfully.
Next, check whether the producer-side Kafka configuration is wrong.
The Spring Boot configuration is:
spring.kafka.bootstrap-servers=39.108.61.252:9092
spring.kafka.consumer.group-id=springboot-group1
spring.kafka.consumer.auto-offset-reset=earliest
This is also fine.
Next, confirm that the firewall (or the cloud security group) on the machine hosting the Kafka cluster has the relevant port open.
On Windows, you can enable the telnet client to check whether the port is open and listening.
From a cmd prompt, run:
telnet 39.108.61.252 9092
The connection succeeds, so the port is open and the problem must be on the producer side.
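If telnet is not available on the client machine, the same reachability check can be done from plain Java. A minimal sketch (the host and port are the ones from this article; `isPortOpen` is a hypothetical helper name):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    public static boolean isPortOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers connection refused, unreachable host, and timeout alike.
            return false;
        }
    }

    public static void main(String[] args) {
        // Equivalent of: telnet 39.108.61.252 9092
        System.out.println(isPortOpen("39.108.61.252", 9092, 1000));
    }
}
```

Note this only proves TCP reachability, not that the Kafka protocol handshake will succeed.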
Raise the log level to DEBUG to see the detailed error.
In Spring Boot this is done with:
logging.level.root=debug
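Setting the root logger to DEBUG is very noisy. If you only need the Kafka client's view, Spring Boot's standard `logging.level.<package>` properties also let you scope the level to a package, for example:

```properties
# Keep the rest of the application at INFO,
# but show Kafka client internals in detail
logging.level.root=info
logging.level.org.apache.kafka=debug
```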
Then restart the application.
The log is now flooded with errors such as the following:
2018-06-03 00:22:37.703 DEBUG 5972 --- [ t1-0-C-1] org.apache.kafka.clients.NetworkClient : Initialize connection to node 0 for sending metadata request
2018-06-03 00:22:37.703 DEBUG 5972 --- [ t1-0-C-1] org.apache.kafka.clients.NetworkClient : Initiating connection to node 0 at izwz9c79fdwp9sb65vpyk3z:9092.
2018-06-03 00:22:37.703 DEBUG 5972 --- [ t1-0-C-1] org.apache.kafka.clients.NetworkClient : Error connecting to node 0 at izwz9c79fdwp9sb65vpyk3z:9092:java.io.IOException: Can't resolve address: izwz9c79fdwp9sb65vpyk3z:9092
at org.apache.kafka.common.network.Selector.connect(Selector.java:182) ~[kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:629) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.NetworkClient.access$600(NetworkClient.java:57) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:768) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:684) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:347) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:226) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:203) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:138) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:216) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:193) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:275) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995) [kafka-clients-0.10.2.0.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:558) [spring-kafka-1.2.2.RELEASE.jar:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_144]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_144]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_144]
Caused by: java.nio.channels.UnresolvedAddressException: null
at sun.nio.ch.Net.checkAddress(Net.java:101) ~[na:1.8.0_144]
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622) ~[na:1.8.0_144]
at org.apache.kafka.common.network.Selector.connect(Selector.java:179) ~[kafka-clients-0.10.2.0.jar:na]
... 17 common frames omitted
So the socket cannot be created because the Kafka server's address cannot be resolved.
The log shows that the client is trying to resolve izwz9c79fdwp9sb65vpyk3z,
which is the instance name of the remote server (an Alibaba Cloud ECS instance).
Even though the producer is configured with the broker's IP, the client internally resolves the hostname the broker advertises back in its metadata. If the machine running the producer has no mapping from that hostname to the IP, resolution fails and the connection errors out.
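The failure can be reproduced outside Kafka with a plain JDK lookup. A small sketch (`resolves` is a hypothetical helper; `nonexistent.invalid` stands in for any unresolvable name, since the `.invalid` TLD never resolves):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveCheck {
    // Returns true if the JVM can resolve the hostname
    // (via the hosts file or DNS), false otherwise.
    public static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Before the hosts-file fix, the broker's advertised
        // name fails exactly like this lookup does:
        System.out.println(resolves("izwz9c79fdwp9sb65vpyk3z"));
        System.out.println(resolves("localhost"));
    }
}
```

This is the same resolution path that fails inside `sun.nio.ch.Net.checkAddress` in the stack trace above.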
Solution
On Windows, add a host mapping to
C:\Windows\System32\drivers\etc\hosts
39.108.61.252 izwz9c79fdwp9sb65vpyk3z
127.0.0.1 localhost
On Linux, edit /etc/hosts (for example with vi /etc/hosts) and add the same mapping.
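Editing the hosts file works around the symptom, but has to be repeated on every client machine. Assuming you control the broker, a more general fix is to make it advertise its public IP instead of its hostname, via `advertised.listeners` in the broker's `config/server.properties`:

```properties
# Broker advertises its public IP in metadata responses,
# so clients no longer need a hosts-file entry
advertised.listeners=PLAINTEXT://39.108.61.252:9092
```

After changing this, restart the broker; clients will then receive the IP in metadata and connect to it directly.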
Now restart the producer application.
The log excerpt below shows a successful start. The client keeps heartbeating and committing offsets, so the DEBUG log scrolls continuously; at this point the level can be set back to INFO.
2018-06-03 00:29:46.543 DEBUG 12772 --- [ t2-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : Group springboot-group1 committed offset 10 for partition t2-0
2018-06-03 00:29:46.543 DEBUG 12772 --- [ t2-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : Completed auto-commit of offsets {t2-0=OffsetAndMetadata{offset=10, metadata=''}} for group springboot-group1
2018-06-03 00:29:46.563 DEBUG 12772 --- [ t1-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : Group springboot-group1 committed offset 0 for partition t1-0
2018-06-03 00:29:46.563 DEBUG 12772 --- [ t1-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : Completed auto-commit of offsets {t1-0=OffsetAndMetadata{offset=0, metadata=''}} for group springboot-group1
2018-06-03 00:29:46.672 DEBUG 12772 --- [ t2-0-C-1] o.a.k.c.consumer.internals.Fetcher : Sending fetch for partitions [t2-0] to broker izwz9c79fdwp9sb65vpyk3z:9092 (id: 0 rack: null)
2018-06-03 00:29:46.867 DEBUG 12772 --- [ t1-0-C-1] essageListenerContainer$ListenerConsumer : Received: 0 records
2018-06-03 00:29:46.872 DEBUG 12772 --- [ t2-0-C-1] essageListenerContainer$ListenerConsumer : Received: 0 records
Verification
Producing data now succeeds!
Summary
The above is my personal experience; I hope it serves as a useful reference.