To enable Kerberos for ZooKeeper, follow the steps below:

1) Set up an external ZooKeeper.

2) Create the file conf/jaas.conf, which contains the server keytab and principal:

Server {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/zookeeper/conf/zkpr.keytab"
  storeKey=true
  useTicketCache=false
  principal="zookeeper/localhost@EXAMPLE.COM";
};
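
If the principal and keytab referenced above do not exist yet, they can be created on the KDC first; a minimal sketch using MIT Kerberos kadmin.local with the same names as in the JAAS file:

kadmin.local -q "addprinc -randkey zookeeper/localhost@EXAMPLE.COM"
kadmin.local -q "ktadd -k /zookeeper/conf/zkpr.keytab zookeeper/localhost@EXAMPLE.COM"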

3) Create the file conf/java.env:

export JVMFLAGS="-Djava.security.auth.login.config=/zookeeper/conf/jaas.conf"
export JAVA_HOME=${JAVA_HOME}

4) Modify conf/zoo.cfg:

tickTime = 2000
dataDir = /zookeeper_data
clientPort = 2181
initLimit = 5
syncLimit = 2
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
jaasLoginRenew=3600000

5) Run kinit with your principal and keytab:

kinit <PRINCIPAL> -k -t <PATH_TO_KEYTAB>

6) Restart ZooKeeper.



You are now ready to use Kerberos-enabled ZooKeeper!
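
To verify from a client, the ZooKeeper CLI also needs a JAAS Client section (it can live in the same conf/jaas.conf); a minimal sketch, assuming a hypothetical client principal zkclient@EXAMPLE.COM with its keytab at /zookeeper/conf/zkclient.keytab:

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/zookeeper/conf/zkclient.keytab"
  storeKey=true
  useTicketCache=false
  principal="zkclient@EXAMPLE.COM";
};

Then point the CLI at it and connect (zkCli.sh reads CLIENT_JVMFLAGS):

export CLIENT_JVMFLAGS="-Djava.security.auth.login.config=/zookeeper/conf/jaas.conf"
bin/zkCli.sh -server localhost:2181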


Here are some of the errors you may face while setting up Kerberos-enabled Hadoop:

1) Be sure to validate your ticket and keytab file.
Ticket Validation:

klist

Output:
Ticket cache: FILE:/tmp/krb5cc_1001
Default principal: zookeeper/localhost@EXAMPLE.COM

Valid starting       Expires              Service principal
2017-05-22T18:40:52  2017-05-23T04:40:52  krbtgt/EXAMPLE.COM@EXAMPLE.COM
renew until 2017-05-29T18:40:52

Keytab validation:

kinit <PRINCIPAL> -k -t <KEYTAB_PATH>

It will return success if your keytab is valid.
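
You can also list the principals stored in a keytab to confirm the expected principal is present:

klist -k -t <KEYTAB_PATH>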

2) Caused by: javax.security.auth.login.LoginException: No key to store
 at com.sun.security.auth.module.Krb5LoginModule.commit(Krb5LoginModule.java:1072)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at javax.security.auth.login.LoginContext.invoke(LoginContext.java:762)
 at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203)
 at javax.security.auth.login.LoginContext$4.run(LoginContext.java:690)
 at javax.security.auth.login.LoginContext$4.run(LoginContext.java:688)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:687)
 at javax.security.auth.login.LoginContext.login(LoginContext.java:596)
 at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.init(KerberosAuthenticationHandler.java:169)
 ... 24 more
2014-06-07 21:11:33,511 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2014-06-07 21:11:33,512 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

Cause: The ticket has expired.

Solution: remove the stale ticket cache: rm -f /tmp/krb*
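
Then obtain a fresh ticket with the keytab, exactly as in the validation step above:

kinit <PRINCIPAL> -k -t <PATH_TO_KEYTAB>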

3) .keystore does not exist.

Cause: SSL is not set up correctly. Please follow the previous post:

Solution: http://lxpert.blogspot.in/2017/05/setting-up-kerberos-enabled-hadoop.html



For YARN, add the following properties to yarn-site.xml:

<!-- resource manager secure configuration info -->

<property>
  <name>yarn.resourcemanager.principal</name>
  <value><PRINCIPAL></value>
</property>

<property>
  <name>yarn.resourcemanager.keytab</name>
  <value><KEYTAB_PATH></value>
</property>

<!-- remember the principal for the node manager is the principal for the host this yarn-site.xml file is on -->

<!-- these (next four) need only be set on node manager nodes -->

<property>
  <name>yarn.nodemanager.principal</name>
  <value><PRINCIPAL></value>
</property>

<property>
  <name>yarn.nodemanager.keytab</name>
  <value><KEYTAB_PATH></value>
</property>

<!--<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>

<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>yarn</value>
</property> -->

<!-- OPTIONAL - set these to enable secure proxy server node -->

<property>
  <name>yarn.web-proxy.keytab</name>
  <value><KEYTAB_PATH></value>
</property>

<property>
  <name>yarn.web-proxy.principal</name>
  <value><PRINCIPAL></value>
</property>
<!--<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property> -->
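
For reference, here is a filled-in sketch of the ResourceManager pair, assuming the realm EXAMPLE.COM and a keytab at /etc/security/keytabs/yarn.service.keytab (both placeholders); Hadoop expands _HOST to the local hostname, so the same file works on every node:

<property>
  <name>yarn.resourcemanager.principal</name>
  <value>yarn/_HOST@EXAMPLE.COM</value>
</property>

<property>
  <name>yarn.resourcemanager.keytab</name>
  <value>/etc/security/keytabs/yarn.service.keytab</value>
</property>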


1) Add the following properties to hdfs-site.xml:

<!-- NameNode security config -->
<property>
  <name>dfs.namenode.keytab.file</name>
  <value><KEYTAB_PATH></value> <!-- path to the HDFS keytab -->
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value><PRINCIPAL></value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value><KEYTAB_PATH></value> <!-- path to the HDFS keytab -->
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value><PRINCIPAL></value>
</property>

<!-- Secondary NameNode config -->
<property>
  <name>dfs.secondary.namenode.keytab.file</name>
  <value><KEYTAB_PATH></value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.principal</name>
  <value><PRINCIPAL></value>
</property>

<!-- DataNode config -->
<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:1025</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:1027</value>
</property>
<property>
  <name>dfs.data.transfer.protection</name>
  <value>authentication</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.principal</name>
  <value><PRINCIPAL></value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.keytab</name>
  <value><KEYTAB_PATH></value> <!-- path to the HTTP keytab -->
</property>
<property>
  <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  <value>${dfs.web.authentication.kerberos.principal}</value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
  <value>${dfs.web.authentication.kerberos.principal}</value>
</property>

2) Add the following properties to core-site.xml:

<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value> <!-- A value of "simple" would disable security. -->
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>

Now we have to create the SSL configuration. This Kerberos-enabled Hadoop setup does not use jsvc: the DataNode listens on non-privileged ports with SASL data transfer protection, which requires dfs.http.policy to be HTTPS_ONLY, and that in turn needs a keystore.
Run the following command to generate one:

keytool -genkey -keyalg RSA -alias c6401 -keystore /tmp/keystore.jks -storepass bigdata -validity 360 -keysize 2048
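
Optionally, verify the generated keystore (standard keytool listing options):

keytool -list -v -keystore /tmp/keystore.jks -storepass bigdata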

3) Create ssl-client.xml in etc/hadoop:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>

<property>
  <name>ssl.client.truststore.location</name>
  <value>/tmp/keystore.jks</value>
  <description>Truststore to be used by clients like distcp. Must be
  specified.
  </description>
</property>

<property>
  <name>ssl.client.truststore.password</name>
  <value>bigdata</value>
  <description>Optional. Default value is "".
  </description>
</property>

<property>
  <name>ssl.client.truststore.type</name>
  <value>jks</value>
  <description>Optional. The keystore file format, default value is "jks".
  </description>
</property>

<property>
  <name>ssl.client.truststore.reload.interval</name>
  <value>10000</value>
  <description>Truststore reload check interval, in milliseconds.
  Default value is 10000 (10 seconds).
  </description>
</property>

<property>
  <name>ssl.client.keystore.location</name>
  <value>/tmp/keystore.jks</value>
  <description>Keystore to be used by clients like distcp. Must be
  specified.
  </description>
</property>

<property>
  <name>ssl.client.keystore.password</name>
  <value>bigdata</value>
  <description>Optional. Default value is "".
  </description>
</property>

<property>
  <name>ssl.client.keystore.keypassword</name>
  <value>bigdata</value>
  <description>Optional. Default value is "".
  </description>
</property>

<property>
  <name>ssl.client.keystore.type</name>
  <value>jks</value>
  <description>Optional. The keystore file format, default value is "jks".
  </description>
</property>

</configuration>

4) Create ssl-server.xml at the same path:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>

<property>
  <name>ssl.server.truststore.location</name>
  <value>/tmp/keystore.jks</value>
  <description>Truststore to be used by NN and DN. Must be specified.
  </description>
</property>

<property>
  <name>ssl.server.truststore.password</name>
  <value>bigdata</value>
  <description>Optional. Default value is "".
  </description>
</property>

<property>
  <name>ssl.server.truststore.type</name>
  <value>jks</value>
  <description>Optional. The keystore file format, default value is "jks".
  </description>
</property>

<property>
  <name>ssl.server.truststore.reload.interval</name>
  <value>10000</value>
  <description>Truststore reload check interval, in milliseconds.
  Default value is 10000 (10 seconds).
  </description>
</property>

<property>
  <name>ssl.server.keystore.location</name>
  <value>/tmp/keystore.jks</value>
  <description>Keystore to be used by NN and DN. Must be specified.
  </description>
</property>

<property>
  <name>ssl.server.keystore.password</name>
  <value>bigdata</value>
  <description>Must be specified.
  </description>
</property>

<property>
  <name>ssl.server.keystore.keypassword</name>
  <value>bigdata</value>
  <description>Must be specified.
  </description>
</property>

<property>
  <name>ssl.server.keystore.type</name>
  <value>jks</value>
  <description>Optional. The keystore file format, default value is "jks".
  </description>
</property>

</configuration>
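
After restarting HDFS with these configurations, a quick way to confirm that Kerberos is actually enforced (a sketch; placeholders as in the earlier steps):

kdestroy                                   # clear any cached ticket
hdfs dfs -ls /                             # should now fail with a GSS/Kerberos authentication error
kinit <PRINCIPAL> -k -t <KEYTAB_PATH>      # obtain a ticket from the keytab
hdfs dfs -ls /                             # should succeed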


HDFS stands for Hadoop Distributed File System. Compared to a traditional RDBMS, HDFS follows a distributed approach and has the following advantages:

1) Data does not need to be centralized; instead, it is distributed across the network. In a traditional RDBMS, data first has to be accumulated in one place in relational tables. In HDFS there is no need to gather the data in one place: the job is split and executed in parallel on the different nodes where the data is stored.

2) Fault tolerant: in HDFS, data is replicated according to the replication factor. By default, it is 3 (see the sketch after the example below).

3) HDFS works well with structured as well as unstructured data, while an RDBMS is designed only for structured data.

4) An RDBMS is by and large used for OLTP processing, while Hadoop is currently used for analytical processing and especially for big data workloads.

Example:
Plotting a customer's month-to-month electricity usage and comparing it with earlier months, with his or her neighbors, or even with other customers on the same street brings more awareness. However, running such a complex comparison by analyzing a vast amount of data can take several hours of processing time, and introducing Hadoop can improve processing performance by a factor of 10 to 100 or more.
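
For point 2 above, the cluster-wide default replication factor is controlled by dfs.replication in hdfs-site.xml, and it can also be changed per path from the command line; a small sketch, assuming a hypothetical HDFS path /data:

<property>
  <name>dfs.replication</name>
  <value>3</value> <!-- cluster-wide default -->
</property>

hdfs dfs -setrep -w 2 /data   # change the replication factor of /data to 2 and wait for blocks to be re-replicated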


Big Data, as the name implies, is a set of technologies that deals with huge amounts of data. While learning about big data, the following questions arise:

1) What is Big Data?
2) What are the sources of such huge amounts of data?
3) Why does the need for big data technologies arise now?
4) What are the different kinds of Big Data technologies?

Let's find the answers to these questions:

1) Big Data is a widely accepted term used to describe the exponential growth and availability of data, both structured and unstructured. Big Data may be as important to business, and to society, as the Internet has become. Big Data is defined on the basis of three Vs:

Volume: Many factors are responsible for the huge amounts of data. Data from satellites, social media, and GPS navigation systems are major sources of big data. The NY Stock Exchange generates around one terabyte of data per day, and Facebook alone hosts around 10 billion photos, among much else.
Velocity: Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors, and smart metering are driving the need to deal with torrents of data in near real time. Reacting quickly enough to cope with data velocity is a challenge for most organizations.
Variety: Data today comes in all types of formats: structured, numeric data in traditional databases; information created by line-of-business applications; unstructured text documents, email, video, audio, stock ticker data, and financial transactions. Managing, merging, and governing different varieties of data is something many organizations still struggle with.

2) Any huge amount of data can be a source of big data: the pictures and media you take on your phone and put on the cloud, media saved on the social web, data generated by stock markets, and much more all contribute to big data.

3) Big data is mostly used for analytics, and dealing with such huge amounts of data requires big data technologies. For storing huge amounts of data, instead of using one large machine we can use commodity servers. Many other factors contribute to the need for Big Data technologies, which we will discuss later.

4) Various big data technologies include Hadoop, HBase, Hive, NoSQL databases, and so on. Let's take up each technology separately.


TCL stands for Transaction Control Language. It is used to manage the different transactions occurring within a database. The statements included in this category are:

1) COMMIT
2) ROLLBACK

COMMIT:
COMMIT is used to permanently save the changes you have made. Syntax:

COMMIT;

ROLLBACK:
ROLLBACK is used to discard the changes you have made. Syntax:

ROLLBACK;

IMPORTANT POINTS:
1) In the case of MySQL, always use START TRANSACTION at the beginning.
2) COMMIT/ROLLBACK applies to the changes you have made since the START TRANSACTION statement.
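
A minimal MySQL sketch tying these points together, assuming a hypothetical accounts table:

START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- If both updates look correct, make them permanent:
COMMIT;
-- Or, to discard everything since START TRANSACTION:
-- ROLLBACK;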

