SSH Configuration

The Hadoop control scripts rely on SSH to perform cluster-wide operations. For example, there is a script for stopping and starting all the daemons in the cluster. Note that the control scripts are optional: cluster-wide operations can be performed by other mechanisms, too (such as a distributed shell).

To work seamlessly, SSH needs to be set up to allow password-less login for the hadoop user from machines in the cluster. The simplest way to achieve this is to generate a public/private key pair and share it across the cluster using NFS.

First, generate an RSA key pair by typing the following in the hadoop user account:
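% ssh-keygen -t rsa -f ~/.ssh/id_rsa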

Even though we want password-less logins, keys without passphrases are not considered good practice (it’s OK to have an empty passphrase when running a local pseudo-distributed cluster, as described in Appendix A), so we specify a passphrase when prompted for one. We will use ssh-agent to avoid the need to enter the passphrase for each connection.

The private key is in the file specified by the -f option, ~/.ssh/id_rsa, and the public key is stored in a file with the same name with .pub appended, ~/.ssh/id_rsa.pub.

Next we need to make sure that the public key is in the ~/.ssh/authorized_keys file on all the machines in the cluster that we want to connect to. If the hadoop user’s home directory is an NFS filesystem, as described earlier, then the keys can be shared across the cluster by typing:
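% cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys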

If the home directory is not shared using NFS, then the public keys will need to be shared by some other means.

Test that you can SSH from the master to a worker machine by making sure ssh-agent is running, and then running ssh-add to add your key to the agent (you will be prompted for the passphrase once). You should then be able to ssh to a worker without entering the passphrase again.
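For example, where worker1 is a placeholder for one of your worker hostnames:

% eval $(ssh-agent)      # start the agent if it is not already running
% ssh-add ~/.ssh/id_rsa  # enter the passphrase once to load the key
% ssh worker1            # should now log in without prompting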

