To enable Kerberos for ZooKeeper, follow the steps below:

1) Set up an external ZooKeeper.

2) Create the file conf/jaas.conf, which contains the server keytab and principal:

Server {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/zookeeper/conf/zkpr.keytab"
  storeKey=true
  useTicketCache=false
  principal="zookeeper/localhost@EXAMPLE.COM";
};
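
If the principal and keytab referenced above do not exist yet, they can be created on the KDC first; a minimal sketch using MIT Kerberos kadmin.local with the same names as in the JAAS file:

kadmin.local -q "addprinc -randkey zookeeper/localhost@EXAMPLE.COM"
kadmin.local -q "ktadd -k /zookeeper/conf/zkpr.keytab zookeeper/localhost@EXAMPLE.COM"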

3) Create the file conf/java.env:

export JVMFLAGS="-Djava.security.auth.login.config=/zookeeper/conf/jaas.conf"
export JAVA_HOME=${JAVA_HOME}

4) Modify conf/zoo.cfg:

tickTime = 2000
dataDir = /zookeeper_data
clientPort = 2181
initLimit = 5
syncLimit = 2
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
jaasLoginRenew=3600000

5) Run kinit with your principal and keytab:

kinit <PRINCIPAL> -k -t <PATH_TO_KEYTAB>

6) Restart ZooKeeper.



You are now ready to use Kerberos-enabled ZooKeeper!
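
To verify from a client, the ZooKeeper CLI also needs a JAAS Client section (it can live in the same conf/jaas.conf); a minimal sketch, assuming a hypothetical client principal zkclient@EXAMPLE.COM with its keytab at /zookeeper/conf/zkclient.keytab:

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/zookeeper/conf/zkclient.keytab"
  storeKey=true
  useTicketCache=false
  principal="zkclient@EXAMPLE.COM";
};

Then point the CLI at it and connect (zkCli.sh reads CLIENT_JVMFLAGS):

export CLIENT_JVMFLAGS="-Djava.security.auth.login.config=/zookeeper/conf/jaas.conf"
bin/zkCli.sh -server localhost:2181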


Here are some of the errors you may face while setting up Kerberos-enabled Hadoop:

1) Be sure to validate your ticket and keytab file.
Ticket Validation:

klist

Output:
Ticket cache: FILE:/tmp/krb5cc_1001
Default principal: zookeeper/localhost@EXAMPLE.COM

Valid starting       Expires              Service principal
2017-05-22T18:40:52  2017-05-23T04:40:52  krbtgt/EXAMPLE.COM@EXAMPLE.COM
renew until 2017-05-29T18:40:52

Keytab validation:

kinit <PRINCIPAL> -k -t <KEYTAB_PATH>

It will return success if your keytab is valid.
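
You can also list the principals stored in a keytab to confirm the expected principal is present:

klist -k -t <KEYTAB_PATH>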

2) Caused by: javax.security.auth.login.LoginException: No key to store
 at com.sun.security.auth.module.Krb5LoginModule.commit(Krb5LoginModule.java:1072)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at javax.security.auth.login.LoginContext.invoke(LoginContext.java:762)
 at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203)
 at javax.security.auth.login.LoginContext$4.run(LoginContext.java:690)
 at javax.security.auth.login.LoginContext$4.run(LoginContext.java:688)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:687)
 at javax.security.auth.login.LoginContext.login(LoginContext.java:596)
 at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.init(KerberosAuthenticationHandler.java:169)
 ... 24 more
2014-06-07 21:11:33,511 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2014-06-07 21:11:33,512 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

Cause: The ticket has expired.

Solution: remove the stale ticket cache: rm -f /tmp/krb*
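
Then obtain a fresh ticket with the keytab, exactly as in the validation step above:

kinit <PRINCIPAL> -k -t <PATH_TO_KEYTAB>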

3) .keystore does not exist.

Cause: SSL is not set up correctly. Please follow the previous post:

Solution: http://lxpert.blogspot.in/2017/05/setting-up-kerberos-enabled-hadoop.html



For YARN, add the following properties to yarn-site.xml:

<!-- resource manager secure configuration info -->

<property>
  <name>yarn.resourcemanager.principal</name>
  <value><PRINCIPAL></value>
</property>

<property>
  <name>yarn.resourcemanager.keytab</name>
  <value><KEYTAB_PATH></value>
</property>

<!-- remember the principal for the node manager is the principal for the host this yarn-site.xml file is on -->

<!-- these (next four) need only be set on node manager nodes -->

<property>
  <name>yarn.nodemanager.principal</name>
  <value><PRINCIPAL></value>
</property>

<property>
  <name>yarn.nodemanager.keytab</name>
  <value><KEYTAB_PATH></value>
</property>

<!--<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>

<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>yarn</value>
</property> -->

<!-- OPTIONAL - set these to enable secure proxy server node -->

<property>
  <name>yarn.web-proxy.keytab</name>
  <value><KEYTAB_PATH></value>
</property>

<property>
  <name>yarn.web-proxy.principal</name>
  <value><PRINCIPAL></value>
</property>
<!--<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property> -->
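
For reference, here is a filled-in sketch of the ResourceManager pair, assuming the realm EXAMPLE.COM and a keytab at /etc/security/keytabs/yarn.service.keytab (both placeholders); Hadoop expands _HOST to the local hostname, so the same file works on every node:

<property>
  <name>yarn.resourcemanager.principal</name>
  <value>yarn/_HOST@EXAMPLE.COM</value>
</property>

<property>
  <name>yarn.resourcemanager.keytab</name>
  <value>/etc/security/keytabs/yarn.service.keytab</value>
</property>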


1) Add the following properties to hdfs-site.xml:

<!-- NameNode security config -->
<property>
  <name>dfs.namenode.keytab.file</name>
  <value><KEYTAB_PATH></value> <!-- path to the HDFS keytab -->
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value><PRINCIPAL></value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value><KEYTAB_PATH></value> <!-- path to the HDFS keytab -->
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value><PRINCIPAL></value>
</property>

<!-- Secondary NameNode config -->
<property>
  <name>dfs.secondary.namenode.keytab.file</name>
  <value><KEYTAB_PATH></value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.principal</name>
  <value><PRINCIPAL></value>
</property>

<!-- DataNode config -->
<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:1025</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:1027</value>
</property>
<property>
  <name>dfs.data.transfer.protection</name>
  <value>authentication</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.principal</name>
  <value><PRINCIPAL></value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.keytab</name>
  <value><KEYTAB_PATH></value> <!-- path to the HTTP keytab -->
</property>
<property>
  <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  <value>${dfs.web.authentication.kerberos.principal}</value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
  <value>${dfs.web.authentication.kerberos.principal}</value>
</property>

2) Add the following properties to core-site.xml:

<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value> <!-- A value of "simple" would disable security. -->
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>

Now we have to create the SSL configuration. This Kerberos-enabled Hadoop setup does not use jsvc: the DataNode listens on non-privileged ports with SASL data transfer protection, which requires dfs.http.policy to be HTTPS_ONLY, and that in turn needs a keystore.
Run the following command to generate one:

keytool -genkey -keyalg RSA -alias c6401 -keystore /tmp/keystore.jks -storepass bigdata -validity 360 -keysize 2048
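
Optionally, verify the generated keystore (standard keytool listing options):

keytool -list -v -keystore /tmp/keystore.jks -storepass bigdata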

3) Create ssl-client.xml in etc/hadoop:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>

<property>
  <name>ssl.client.truststore.location</name>
  <value>/tmp/keystore.jks</value>
  <description>Truststore to be used by clients like distcp. Must be
  specified.
  </description>
</property>

<property>
  <name>ssl.client.truststore.password</name>
  <value>bigdata</value>
  <description>Optional. Default value is "".
  </description>
</property>

<property>
  <name>ssl.client.truststore.type</name>
  <value>jks</value>
  <description>Optional. The keystore file format, default value is "jks".
  </description>
</property>

<property>
  <name>ssl.client.truststore.reload.interval</name>
  <value>10000</value>
  <description>Truststore reload check interval, in milliseconds.
  Default value is 10000 (10 seconds).
  </description>
</property>

<property>
  <name>ssl.client.keystore.location</name>
  <value>/tmp/keystore.jks</value>
  <description>Keystore to be used by clients like distcp. Must be
  specified.
  </description>
</property>

<property>
  <name>ssl.client.keystore.password</name>
  <value>bigdata</value>
  <description>Optional. Default value is "".
  </description>
</property>

<property>
  <name>ssl.client.keystore.keypassword</name>
  <value>bigdata</value>
  <description>Optional. Default value is "".
  </description>
</property>

<property>
  <name>ssl.client.keystore.type</name>
  <value>jks</value>
  <description>Optional. The keystore file format, default value is "jks".
  </description>
</property>

</configuration>

4) Create ssl-server.xml at the same path:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>

<property>
  <name>ssl.server.truststore.location</name>
  <value>/tmp/keystore.jks</value>
  <description>Truststore to be used by NN and DN. Must be specified.
  </description>
</property>

<property>
  <name>ssl.server.truststore.password</name>
  <value>bigdata</value>
  <description>Optional. Default value is "".
  </description>
</property>

<property>
  <name>ssl.server.truststore.type</name>
  <value>jks</value>
  <description>Optional. The keystore file format, default value is "jks".
  </description>
</property>

<property>
  <name>ssl.server.truststore.reload.interval</name>
  <value>10000</value>
  <description>Truststore reload check interval, in milliseconds.
  Default value is 10000 (10 seconds).
  </description>
</property>

<property>
  <name>ssl.server.keystore.location</name>
  <value>/tmp/keystore.jks</value>
  <description>Keystore to be used by NN and DN. Must be specified.
  </description>
</property>

<property>
  <name>ssl.server.keystore.password</name>
  <value>bigdata</value>
  <description>Must be specified.
  </description>
</property>

<property>
  <name>ssl.server.keystore.keypassword</name>
  <value>bigdata</value>
  <description>Must be specified.
  </description>
</property>

<property>
  <name>ssl.server.keystore.type</name>
  <value>jks</value>
  <description>Optional. The keystore file format, default value is "jks".
  </description>
</property>

</configuration>
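
After restarting HDFS with these configurations, a quick way to confirm that Kerberos is actually enforced (a sketch; placeholders as in the earlier steps):

kdestroy                                   # clear any cached ticket
hdfs dfs -ls /                             # should now fail with a GSS/Kerberos authentication error
kinit <PRINCIPAL> -k -t <KEYTAB_PATH>      # obtain a ticket from the keytab
hdfs dfs -ls /                             # should succeed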


HDFS stands for Hadoop Distributed File System. Compared to a traditional RDBMS, HDFS follows a distributed approach and has the following advantages:

1) Data does not need to be centralized; instead, it is distributed across the network. In a traditional RDBMS, data first has to be accumulated in one place in relational tables. In HDFS there is no need to gather the data in one place: the job is split and executed in parallel on the different nodes where the data is stored.

2) Fault tolerant: in HDFS, data is replicated according to the replication factor. By default, it is 3 (see the sketch after the example below).

3) HDFS works well with structured as well as unstructured data, while an RDBMS is designed only for structured data.

4) An RDBMS is by and large used for OLTP processing, while Hadoop is currently used for analytical processing and especially for big data workloads.

Example:
Plotting a customer's month-to-month electricity usage and comparing it with earlier months, with his or her neighbors, or even with other customers on the same street brings more awareness. However, running such a complex comparison by analyzing a vast amount of data can take several hours of processing time, and introducing Hadoop can improve processing performance by a factor of 10 to 100 or more.
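
For point 2 above, the cluster-wide default replication factor is controlled by dfs.replication in hdfs-site.xml, and it can also be changed per path from the command line; a small sketch, assuming a hypothetical HDFS path /data:

<property>
  <name>dfs.replication</name>
  <value>3</value> <!-- cluster-wide default -->
</property>

hdfs dfs -setrep -w 2 /data   # change the replication factor of /data to 2 and wait for blocks to be re-replicated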


Big Data, as the name implies, is a set of technologies that deals with huge amounts of data. While learning about big data, the following questions arise:

1) What is Big Data?
2) What are the sources of such huge amounts of data?
3) Why does the need for big data technologies arise now?
4) What are the different kinds of Big Data technologies?

Let's find the answers to these questions:

1) Big Data is a widely accepted term used to describe the exponential growth and availability of data, both structured and unstructured. Big Data may be as important to business, and to society, as the Internet has become. Big Data is defined on the basis of three Vs:

Volume: Many factors are responsible for the huge amounts of data. Data from satellites, social media, and GPS navigation systems are major sources of big data. The NY Stock Exchange generates around one terabyte of data per day, and Facebook alone hosts around 10 billion photos, among much else.
Velocity: Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors, and smart metering are driving the need to deal with torrents of data in near real time. Reacting quickly enough to cope with data velocity is a challenge for most organizations.
Variety: Data today comes in all types of formats: structured, numeric data in traditional databases; information created by line-of-business applications; unstructured text documents, email, video, audio, stock ticker data, and financial transactions. Managing, merging, and governing different varieties of data is something many organizations still struggle with.

2) Any huge amount of data can be a source of big data: the pictures and media you take on your phone and put on the cloud, media saved on the social web, data generated by stock markets, and much more all contribute to big data.

3) Big data is mostly used for analytics, and dealing with such huge amounts of data requires big data technologies. For storing huge amounts of data, instead of using one large machine we can use commodity servers. Many other factors contribute to the need for Big Data technologies, which we will discuss later.

4) Various big data technologies include Hadoop, HBase, Hive, NoSQL databases, and so on. Let's take up each technology separately.


TCL stands for Transaction Control Language. It is used to manage the different transactions occurring within a database. The statements included in this category are:

1) COMMIT
2) ROLLBACK

COMMIT:
COMMIT is used to permanently save the changes you have made. Syntax:

COMMIT;

ROLLBACK:
ROLLBACK is used to discard the changes you have made. Syntax:

ROLLBACK;

IMPORTANT POINTS:
1) In the case of MySQL, always use START TRANSACTION at the beginning.
2) COMMIT/ROLLBACK applies to the changes you have made since the START TRANSACTION statement.
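
A minimal MySQL sketch tying these points together, assuming a hypothetical accounts table:

START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- If both updates look correct, make them permanent:
COMMIT;
-- Or, to discard everything since START TRANSACTION:
-- ROLLBACK;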

