Integrations
Breadcrumbs

Generic JDBC - Hive

Standalone Hive

Support

Support with hive 2.3.9.

Configuration

Vendor

HIVE_APACHE

Connection format

The connection URL format for direct connection to hive server is jdbc:hive2://<databse_host>:<database_port>/<database>;principal=<principal>;ssl=true.

The connection URL format via HTTP is jdbc:hive2://<databse_host>:<database_port>/<database>;principal=<principal>;ssl=true;transportMode=http;httpPath=cliservice.

Permissions

User or role with SELECT permissions on the tables or views for which data sampling or metadata extraction is desired.


Hive in Cloudera environment

Support

Support in cdh environments from 6.0 to 6.3 or cdp 7.0 and 7.1 with hive 1.0 up to 3.1.

Configuration

Vendor

HIVE_CLOUDERA

Connection format

The connection URL format for direct connection to hive server is jdbc:hive2://<databse_host>:<database_port>/<database>;principal=<principal>;ssl=true.

The connection URL format via HTTP is jdbc:hive2://<databse_host>:<database_port>/<database>;principal=<principal>;ssl=true;transportMode=http;httpPath=cliservice.

The connection URL format via zookeeper is jdbc:hive2://<databse_host>:<database_port>/<database>;principal=<principal>;ssl=true;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2.

Permissions

User or role with SELECT permissions on the tables or views for which data sampling or metadata extraction is desired.


Specific adjustments for Hive

To execute JDBC queries against Hive, a series of adjustments must be made to the JVM

  • Use of Kerberos authorization

  • SSL

JVM SSL adjustments

Cloudera needs to secure connections with SSL to be able to use Kerberos. The default solution it uses is to deploy its own self-signed CA in a JKS container using properties analogous to these:

Bash
-Djavax.net.ssl.keyStore=/....../....../cm-auto-host_keystore.jks
-Djavax.net.ssl.keyStoreType=jks -Djavax.net.ssl.trustStore=/....../...../cm-auto-global_truststore.jks
-Djavax.net.ssl.trustStoreType=jks

There are two options

  • That the paths are provided to the integrator by the Client

  • Adding the Cloudera self-signed CA to the JVM's default certificate container

Provided paths

The plugin's JVM must be executed with the following parameters:

Parameter

Value

-Djavax.net.ssl.keyStore

Path where the private key store is located.

-Djavax.net.ssl.keyStoreType

jks

-Djavax.net.ssl.trustStore

Path where the certificate store is located.

-Djavax.net.ssl.trustStoreType

jks


Adding the self-signed CA to the JVM certificate container

The goal is to add the Cloudera self-signed certificate to Java's cacerts

keytool -importkeystore -srckeystore /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks  -destkeystore /etc/ssl/certs/java/cacerts -srcstorepass -deststorepass changeit -v

This approach has the advantage of not requiring additional parameters to be added to the JVM.

Kerberos adjustments for the JVM

It is necessary to provide the JVM with the mechanism for the automatic use of Kerberos tickets.

To do this we must launch the JVM with the following parameters.

  • -Djava.security.auth.login.config=security.config

  • -Djavax.security.auth.useSubjectCredsOnly=false

With the security.config file having the following structure:

com.sun.security.jgss.initiate {
   com.sun.security.auth.module.Krb5LoginModule required
   useTicketCache=true
   StoreKey=true
   useKeyTab=true
   keyTab="/home/ubuntu/usr_anjana.keytab"
   principal="usr_anjana@CDP.LOCAL";
};

The keytab and principal parameters must be supplied to us

Common errors

If the Hive tables are encrypted it returns this error log

No Route to Host from ip-10-202-21-147.eu-west-1.compute.internal/10.202.21.147 to ip-10-202-21-122.eu-west-1.compute.internal:8020 failed on socket timeout exception: java.net.NoRouteToHostException