Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. Many beginners find it difficult to install Hadoop on an Ubuntu system, so this post walks through the setup step by step.
Software Required:
- Java Development Kit (JDK 1.8); see the steps to install the JDK on Linux.
- ssh must be installed and sshd must be running in order to use the Hadoop scripts that manage remote Hadoop daemons. To install ssh on Ubuntu Linux: $ sudo apt-get install ssh
- Hadoop distribution: download a recent stable release from one of the Apache Download Mirrors.
After downloading the Hadoop archive, unpack it and edit etc/hadoop/hadoop-env.sh to define the root of your Java installation, for example export JAVA_HOME=/home/java_folder_name . You can then set the Hadoop environment variables by appending the following lines to your ~/.bashrc file:
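A minimal sketch of those entries, assuming Hadoop was unpacked to /home/<username>/hadoop (adjust the paths to match where you actually unpacked Hadoop and installed Java):

export HADOOP_HOME=/home/<username>/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin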
After editing the file, save it and run source ~/.bashrc from your home directory so the changes take effect.
Configuration File Set Up:
We will run Hadoop in pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process on a single machine. Edit the configuration files as shown below:
etc/hadoop/core-site.xml:
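A minimal sketch of core-site.xml for a single-node setup, following the Apache pseudo-distributed example (port 9000 is the usual HDFS default):

<configuration>
    <property>
        <!-- URI of the default file system (the local NameNode) -->
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>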
etc/hadoop/hdfs-site.xml:
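A minimal sketch of hdfs-site.xml, setting the replication factor to 1 since there is only one DataNode:

<configuration>
    <property>
        <!-- With a single DataNode, keep only one copy of each block -->
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>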
etc/hadoop/mapred-site.xml:
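A minimal sketch of mapred-site.xml, telling MapReduce to run on YARN (on Hadoop 3 you may also need to set mapreduce.application.classpath as described in the Apache single-node documentation):

<configuration>
    <property>
        <!-- Run MapReduce jobs on YARN instead of the local runner -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>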
etc/hadoop/yarn-site.xml:
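A minimal sketch of yarn-site.xml, enabling the shuffle service that MapReduce needs:

<configuration>
    <property>
        <!-- Auxiliary service used by MapReduce for the shuffle phase -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>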
Set up passphrase-less ssh:
- ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
- cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- chmod 0600 ~/.ssh/authorized_keys
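To check that passphrase-less login works, try connecting to your own machine; it should log you in without asking for a password:

$ ssh localhost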
Execution Steps:
The following instructions are to run a MapReduce job locally.
Format the filesystem:
$ bin/hdfs namenode -format
Start the NameNode and DataNode daemons (inside the hadoop/sbin directory):
$ start-dfs.sh
To start all the daemons (HDFS and YARN) at once, use:
$ start-all.sh (inside the hadoop/sbin directory)
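Note that start-all.sh is deprecated in recent Hadoop releases; the equivalent is to start HDFS and YARN separately:

$ start-dfs.sh
$ start-yarn.sh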
The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
Basic Hadoop Commands:
Make the HDFS directories required to execute MapReduce jobs:
hadoop fs -mkdir /user
hadoop fs -mkdir -p /user/<username>/<project_name>
Copy the input files into the distributed file system:
hadoop fs -put <input file> /user/<username>/<project_name>/
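As a quick test you can run one of the example jobs bundled with Hadoop, for instance wordcount (the jar version below is a placeholder; substitute the one matching your release, and note that the output directory must not already exist):

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-<version>.jar wordcount /user/<username>/<project_name>/<input file> /user/<username>/<project_name>/output
hadoop fs -cat /user/<username>/<project_name>/output/*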
You can check which daemons are running on your machine with the command:
$ jps
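In a working pseudo-distributed setup, the jps listing would typically include entries like the following (the process IDs will differ on your machine):

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps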
When you’re done, stop the daemons with:
$ stop-dfs.sh (inside the hadoop/sbin directory)
Or to stop all the daemons at once, use:
$ stop-all.sh (inside the hadoop/sbin directory)
Web interfaces:
Hadoop 2 HDFS (NameNode): http://localhost:50070
Hadoop 3 HDFS (NameNode): http://localhost:9870
YARN ResourceManager (applications): http://localhost:8088
MapReduce Job History Server: http://localhost:19888
If you get stuck or hit an error, feel free to leave a comment.