Problems encountered when loading data with Druid 0.12.3 on CDH 5.14.0


I recently ran into a few problems while loading data from HDFS into Druid. This post records them.

Version conflicts

Error 1: xercesImpl version conflict

2018-11-05 13:55:34,673 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.LinkageError: loader constraint violation: when resolving overridden method "org.apache.xerces.jaxp.DocumentBuilderImpl.newDocument()Lorg/w3c/dom/Document;" the class loader (instance of org/apache/hadoop/util/ApplicationClassLoader) of the current class, org/apache/xerces/jaxp/DocumentBuilderImpl, and its superclass loader (instance of <bootloader>), have different Class objects for the type org/w3c/dom/Document used in the signature
at org.apache.xerces.jaxp.DocumentBuilderFactoryImpl.newDocumentBuilder(Unknown Source)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2621)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2583)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2489)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1270)
at org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider.getRecordFactory(RecordFactoryProvider.java:49)
at org.apache.hadoop.mapreduce.TypeConverter.<clinit>(TypeConverter.java:62)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:523)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:511)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1614)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:511)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:301)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1572)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1569)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1502)
2018-11-05 13:55:34,681 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1

Error 2: jackson version conflict

2018-11-05 14:54:25,477 ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.VerifyError: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at com.fasterxml.jackson.datatype.guava.GuavaModule.setupModule(GuavaModule.java:22)
at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:524)
at io.druid.jackson.DefaultObjectMapper.<init>(DefaultObjectMapper.java:47)
at io.druid.jackson.DefaultObjectMapper.<init>(DefaultObjectMapper.java:35)
at io.druid.jackson.JacksonModule.jsonMapper(JacksonModule.java:46)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.google.inject.internal.ProviderMethod.get(ProviderMethod.java:104)
at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
at com.google.inject.Scopes$1$1.get(Scopes.java:65)
at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:54)
at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:84)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
at com.google.inject.Scopes$1$1.get(Scopes.java:65)
at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
at com.google.inject.internal.SingleMethodInjector.inject(SingleMethodInjector.java:83)
at com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:110)
at com.google.inject.internal.MembersInjectorImpl$1.call(MembersInjectorImpl.java:75)
at com.google.inject.internal.MembersInjectorImpl$1.call(MembersInjectorImpl.java:73)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
at com.google.inject.internal.MembersInjectorImpl.injectAndNotify(MembersInjectorImpl.java:73)
at com.google.inject.internal.Initializer$InjectableReference.get(Initializer.java:147)
at com.google.inject.internal.Initializer.injectAll(Initializer.java:92)
at com.google.inject.internal.InternalInjectorCreator.injectDynamically(InternalInjectorCreator.java:173)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:109)
at com.google.inject.Guice.createInjector(Guice.java:95)
at com.google.inject.Guice.createInjector(Guice.java:72)
at io.druid.guice.GuiceInjectors.makeStartupInjector(GuiceInjectors.java:60)
at io.druid.indexer.HadoopDruidIndexerConfig.<clinit>(HadoopDruidIndexerConfig.java:106)
at io.druid.indexer.HadoopDruidIndexerMapper.setup(HadoopDruidIndexerMapper.java:51)
at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.setup(DetermineHashedPartitionsJob.java:225)
at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.run(DetermineHashedPartitionsJob.java:280)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
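
Both traces suggest that the same class is being supplied by more than one jar on the classpath. As a rough diagnostic, the sketch below lists every jar under a directory that bundles a given class. The function name jars_containing is made up for illustration, and which directories to scan is an assumption about your installation.

```python
import sys
import zipfile
from pathlib import Path


def jars_containing(root, class_name):
    """Return paths of jars under `root` that contain `class_name`.

    `class_name` is a dotted name such as "org.w3c.dom.Document".
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(root).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files that are not valid zip archives.
            continue
    return hits


if __name__ == "__main__":
    # Usage: python find_class.py <directory> <dotted.class.Name>
    if len(sys.argv) == 3:
        for path in jars_containing(sys.argv[1], sys.argv[2]):
            print(path)
```

For example, scanning the Druid lib directory and the Hadoop dependency directory for org.w3c.dom.Document or com.fasterxml.jackson.databind.ObjectMapper should show which jars compete for those classes.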

Solution

(Optional) Download the Hadoop client that matches your local version

The hadoop-client shipped under ${DRUID_HOME}/hadoop-dependencies/hadoop-client is version 2.7.3 by default. To match your own Hadoop version, run the following command from ${DRUID_HOME}:

java -classpath "lib/*" io.druid.cli.Main tools pull-deps -r https://repository.cloudera.com/content/repositories/releases/ --clean -h org.apache.hadoop:hadoop-client:2.6.0-cdh5.14.0

Note: the download clears both the ${DRUID_HOME}/hadoop-dependencies/hadoop-client and ${DRUID_HOME}/extensions directories, so back them up first.
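
Since the download wipes existing directories, it may help to copy them aside beforehand. The following is a minimal sketch of such a backup. The helper name backup_dirs, the timestamped folder name, and the backup location are all assumptions, not part of Druid.

```python
import shutil
import time
from pathlib import Path


def backup_dirs(druid_home, backup_root,
                names=("hadoop-dependencies", "extensions")):
    """Copy the named directories under `druid_home` into a new
    timestamped folder below `backup_root`; return the copied paths."""
    dest = Path(backup_root) / time.strftime("druid-backup-%Y%m%d-%H%M%S")
    copied = []
    for name in names:
        src = Path(druid_home) / name
        if src.is_dir():
            # copytree creates `dest` and any missing parents itself.
            shutil.copytree(src, dest / name)
            copied.append(dest / name)
    return copied
```

Calling backup_dirs with your Druid home and a scratch directory before running pull-deps leaves a copy that can be moved back if an extension goes missing afterwards.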

Modify the spec

{
    "type": "index_hadoop",
    "spec": {
        "dataSchema": {
            "dataSource": "cxy_wikiticker3",
            "parser": {
                "type": "hadoopyString",
                "parseSpec": {
                    "format": "json",
                    "dimensionsSpec": {
                        "dimensions": [
                            "channel",
                            "cityName",
                            "comment",
                            "countryIsoCode",
                            "countryName",
                            "isAnonymous",
                            "isMinor",
                            "isNew",
                            "isRobot",
                            "isUnpatrolled",
                            "metroCode",
                            "namespace",
                            "page",
                            "regionIsoCode",
                            "regionName",
                            "user",
                            {
                                "name": "delta",
                                "type": "long"
                            },
                            {
                                "name": "added",
                                "type": "long"
                            },
                            {
                                "name": "deleted",
                                "type": "long"
                            }
                        ]
                    },
                    "timestampSpec": {
                        "format": "auto",
                        "column": "time"
                    }
                }
            },
            "metricsSpec": [],
            "granularitySpec": {
                "type": "uniform",
                "segmentGranularity": "day",
                "queryGranularity": "none",
                "intervals": [
                    "2015-09-12/2015-09-13"
                ],
                "rollup": false
            }
        },
        "ioConfig": {
            "type": "hadoop",
            "inputSpec": {
                "type": "static",
                "paths": "/work/dc/examples/wikiticker-2015-09-12-sampled.json.gz"
            }
        },
        "tuningConfig": {
            "type": "hadoop",
            "partitionsSpec": {
                "type": "hashed",
                "targetPartitionSize": 5000000
            },
            "ignoreInvalidRows": true,
            "jobProperties" : {
                "mapreduce.job.user.classpath.first": "true"
            }
        }
    },
    "hadoopDependencyCoordinates": [
        "org.apache.hadoop:hadoop-client:2.6.0-cdh5.14.0"
    ]
}
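
If several task files need the same change, it can be applied programmatically. This is only a sketch: patch_spec is a hypothetical helper, and it assumes the task follows the index_hadoop layout shown above.

```python
import json


def patch_spec(task, hadoop_client="org.apache.hadoop:hadoop-client:2.6.0-cdh5.14.0"):
    """Return a copy of a Hadoop index task with the user-classpath-first
    job property and the Hadoop client coordinate set."""
    patched = json.loads(json.dumps(task))  # deep copy via a JSON round trip
    tuning = patched.setdefault("spec", {}).setdefault(
        "tuningConfig", {"type": "hadoop"})
    job_props = tuning.setdefault("jobProperties", {})
    job_props["mapreduce.job.user.classpath.first"] = "true"
    patched["hadoopDependencyCoordinates"] = [hadoop_client]
    return patched


if __name__ == "__main__":
    # Small demonstration on a stripped-down task.
    task = {"type": "index_hadoop", "spec": {"tuningConfig": {"type": "hadoop"}}}
    print(json.dumps(patch_spec(task), indent=4))
```

The function leaves the input untouched and keeps any job properties that are already present.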

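The two parts that matter are the mapreduce.job.user.classpath.first entry under tuningConfig.jobProperties and the hadoopDependencyCoordinates list at the end of the task.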

Test

curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/hadoop.json http://hadoop.cxy7.com:8090/druid/indexer/v1/task
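
The same submission can be scripted. The sketch below mirrors the curl call using only the standard library. The function names are made up for illustration, and treating the reply as a JSON object is an assumption about the Overlord's response.

```python
import json
import urllib.request


def build_submit_request(overlord_url, spec_path):
    """Build a POST request that sends the task file to the Overlord."""
    with open(spec_path, "rb") as f:
        body = f.read()
    json.loads(body)  # fail early if the file is not valid JSON
    return urllib.request.Request(
        overlord_url.rstrip("/") + "/druid/indexer/v1/task",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(overlord_url, spec_path):
    """Send the task and return the decoded JSON reply."""
    request = build_submit_request(overlord_url, spec_path)
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())
```

With the host from the curl example, the call would be submit("http://hadoop.cxy7.com:8090", "quickstart/hadoop.json").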


References

https://github.com/vihag/druid-hacks/tree/master/xerces-hell/cdh-5.10.2

https://github.com/apache/incubator-druid/issues/2087