Some Problems Encountered When Loading Data with Druid 0.12.3 on CDH 5.14.0
I recently ran into a few problems while loading data from HDFS, so I am writing them down here.
Version conflicts
Error 1: xercesImpl version conflict
2018-11-05 13:55:34,673 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.LinkageError: loader constraint violation: when resolving overridden method "org.apache.xerces.jaxp.DocumentBuilderImpl.newDocument()Lorg/w3c/dom/Document;" the class loader (instance of org/apache/hadoop/util/ApplicationClassLoader) of the current class, org/apache/xerces/jaxp/DocumentBuilderImpl, and its superclass loader (instance of <bootloader>), have different Class objects for the type org/w3c/dom/Document used in the signature
    at org.apache.xerces.jaxp.DocumentBuilderFactoryImpl.newDocumentBuilder(Unknown Source)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2621)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2583)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2489)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:1270)
    at org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider.getRecordFactory(RecordFactoryProvider.java:49)
    at org.apache.hadoop.mapreduce.TypeConverter.<clinit>(TypeConverter.java:62)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:523)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:511)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1614)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:511)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:301)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1572)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1569)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1502)
2018-11-05 13:55:34,681 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
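This LinkageError is the classic "Xerces hell": the job classpath carries its own copy of xercesImpl, whose DOM classes clash with the ones the JVM already loaded from the bootstrap classpath. You can locate the duplicate jars with something like the following (a sketch; the CDH parcel path is an assumption and may differ on your cluster):

# Xerces copies shipped with Druid and its extensions
find ${DRUID_HOME} -name 'xercesImpl*.jar'
# Xerces copies shipped with CDH (parcel layout assumed)
find /opt/cloudera/parcels/CDH/jars -name 'xercesImpl*.jar'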
Error 2: Jackson version conflict
2018-11-05 14:54:25,477 ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.VerifyError: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at com.fasterxml.jackson.datatype.guava.GuavaModule.setupModule(GuavaModule.java:22)
    at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:524)
    at io.druid.jackson.DefaultObjectMapper.<init>(DefaultObjectMapper.java:47)
    at io.druid.jackson.DefaultObjectMapper.<init>(DefaultObjectMapper.java:35)
    at io.druid.jackson.JacksonModule.jsonMapper(JacksonModule.java:46)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.google.inject.internal.ProviderMethod.get(ProviderMethod.java:104)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
    at com.google.inject.Scopes$1$1.get(Scopes.java:65)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
    at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:54)
    at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
    at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
    at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:84)
    at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
    at com.google.inject.Scopes$1$1.get(Scopes.java:65)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
    at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
    at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
    at com.google.inject.internal.SingleMethodInjector.inject(SingleMethodInjector.java:83)
    at com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:110)
    at com.google.inject.internal.MembersInjectorImpl$1.call(MembersInjectorImpl.java:75)
    at com.google.inject.internal.MembersInjectorImpl$1.call(MembersInjectorImpl.java:73)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
    at com.google.inject.internal.MembersInjectorImpl.injectAndNotify(MembersInjectorImpl.java:73)
    at com.google.inject.internal.Initializer$InjectableReference.get(Initializer.java:147)
    at com.google.inject.internal.Initializer.injectAll(Initializer.java:92)
    at com.google.inject.internal.InternalInjectorCreator.injectDynamically(InternalInjectorCreator.java:173)
    at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:109)
    at com.google.inject.Guice.createInjector(Guice.java:95)
    at com.google.inject.Guice.createInjector(Guice.java:72)
    at io.druid.guice.GuiceInjectors.makeStartupInjector(GuiceInjectors.java:60)
    at io.druid.indexer.HadoopDruidIndexerConfig.<clinit>(HadoopDruidIndexerConfig.java:106)
    at io.druid.indexer.HadoopDruidIndexerMapper.setup(HadoopDruidIndexerMapper.java:51)
    at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.setup(DetermineHashedPartitionsJob.java:225)
    at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.run(DetermineHashedPartitionsJob.java:280)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
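Here the jackson-datatype-guava module that Druid bundles (built against Jackson 2.4.x) is being linked against an older jackson-databind picked up from the Hadoop classpath, in which that deserialize method was still final, hence the VerifyError. To see every jar that actually carries the databind classes, a sketch like this works (the CDH parcel path is again an assumption):

# list jars on both sides that contain jackson-databind's ObjectMapper
for j in $(find ${DRUID_HOME}/lib /opt/cloudera/parcels/CDH/jars -name '*.jar'); do
  unzip -l "$j" 2>/dev/null | grep -q 'com/fasterxml/jackson/databind/ObjectMapper.class' && echo "$j"
done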
Solution
(Optional) Download a hadoop-client matching your cluster version
The hadoop-client shipped under ${DRUID_HOME}/hadoop-dependencies/hadoop-client is version 2.7.3 by default. To match your own Hadoop version instead, run the following command:
java -classpath "lib/*" io.druid.cli.Main tools pull-deps -r https://repository.cloudera.com/content/repositories/releases/ --clean -h org.apache.hadoop:hadoop-client:2.6.0-cdh5.14.0
Note: the download wipes the ${DRUID_HOME}/hadoop-dependencies/hadoop-client and ${DRUID_HOME}/extensions directories, so back them up first (a sketch follows).
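A minimal backup-and-verify sketch around pull-deps (directory names follow the defaults mentioned above):

# back up both directories that --clean will wipe
cp -r ${DRUID_HOME}/hadoop-dependencies ${DRUID_HOME}/hadoop-dependencies.bak
cp -r ${DRUID_HOME}/extensions ${DRUID_HOME}/extensions.bak
# after pull-deps completes, the CDH client should sit in its own version directory
ls ${DRUID_HOME}/hadoop-dependencies/hadoop-client/
# expect a 2.6.0-cdh5.14.0 entry here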
Modify the spec
{ "type": "index_hadoop", "spec": { "dataSchema": { "dataSource": "cxy_wikiticker3", "parser": { "type": "hadoopyString", "parseSpec": { "format": "json", "dimensionsSpec": { "dimensions": [ "channel", "cityName", "comment", "countryIsoCode", "countryName", "isAnonymous", "isMinor", "isNew", "isRobot", "isUnpatrolled", "metroCode", "namespace", "page", "regionIsoCode", "regionName", "user", { "name": "delta", "type": "long" }, { "name": "added", "type": "long" }, { "name": "deleted", "type": "long" } ] }, "timestampSpec": { "format": "auto", "column": "time" } } }, "metricsSpec": [], "granularitySpec": { "type": "uniform", "segmentGranularity": "day", "queryGranularity": "none", "intervals": [ "2015-09-12/2015-09-13" ], "rollup": false } }, "ioConfig": { "type": "hadoop", "inputSpec": { "type": "static", "paths": "/work/dc/examples/wikiticker-2015-09-12-sampled.json.gz" } }, "tuningConfig": { "type": "hadoop", "partitionsSpec": { "type": "hashed", "targetPartitionSize": 5000000 }, "ignoreInvalidRows": true, "jobProperties" : { "mapreduce.job.user.classpath.first": "true" } } }, "hadoopDependencyCoordinates": [ "org.apache.hadoop:hadoop-client:2.6.0-cdh5.14.0" ] }
The two key points in the spec above are "mapreduce.job.user.classpath.first": "true" in tuningConfig.jobProperties, which makes MapReduce tasks prefer the job's own jars over the cluster classpath, and the top-level hadoopDependencyCoordinates, which pins the ingestion job to the CDH hadoop-client pulled down above.
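For emphasis, here are those two fragments excerpted from the spec above:

"tuningConfig": {
  ...
  "jobProperties": {
    "mapreduce.job.user.classpath.first": "true"
  }
}
...
"hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.6.0-cdh5.14.0"]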
Test
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/hadoop.json http://hadoop.cxy7.com:8090/druid/indexer/v1/task
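On success the Overlord responds with the task id as JSON. You can then poll the task through the standard Overlord status endpoint (the task id below is a placeholder):

curl http://hadoop.cxy7.com:8090/druid/indexer/v1/task/<taskId>/status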
Reference
https://github.com/vihag/druid-hacks/tree/master/xerces-hell/cdh-5.10.2
