How to Install Hadoop on Ubuntu 18.04?

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. Many beginners find it difficult to install Hadoop on their Ubuntu system.

Software Required:

  1. Java Virtual Machine (JDK 1.8). See the steps to install the JDK on Linux.
  2. ssh must be installed and sshd must be running in order to use the Hadoop scripts that manage remote Hadoop daemons. To install ssh on Ubuntu Linux: $ sudo apt-get install ssh
  3. A Hadoop distribution file. Download a recent stable release from one of the Apache Download Mirrors.

After downloading the Hadoop archive, unpack it and edit etc/hadoop/hadoop-env.sh to define the root of your Java installation, e.g. export JAVA_HOME=/home/java_folder_name . You can then set the Hadoop environment variables by appending a few lines to your ~/.bashrc file.
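
A minimal sketch of those lines, assuming Hadoop was unpacked to /home/<username>/hadoop (adjust the path to your own install location):

# assumed unpack location; change to match your system
export HADOOP_HOME=/home/<username>/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin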

After editing the file, save it and run the command source ~/.bashrc from your home directory so the changes take effect.

Configuration File Set Up:

We will run Hadoop in pseudo-distributed mode, which requires editing the following configuration files:

etc/hadoop/core-site.xml:

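A minimal sketch for pseudo-distributed mode, following the Apache single-node setup guide; hdfs://localhost:9000 assumes the NameNode runs on this machine at the default port:

<configuration>
    <property>
        <!-- assumes a local NameNode on the default port -->
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>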

etc/hadoop/hdfs-site.xml:

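On a single machine there is only one DataNode, so the replication factor is usually lowered to 1 (sketch based on the Apache single-node guide):

<configuration>
    <property>
        <!-- one copy of each block; suitable only for a single-node setup -->
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>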

etc/hadoop/mapred-site.xml:

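To have MapReduce jobs run on YARN rather than the local job runner, the framework name is set to yarn (sketch based on the Apache single-node guide):

<configuration>
    <property>
        <!-- submit MapReduce jobs to YARN instead of running them locally -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>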

etc/hadoop/yarn-site.xml:

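YARN needs the MapReduce shuffle handler enabled as a NodeManager auxiliary service (sketch based on the Apache single-node guide):

<configuration>
    <property>
        <!-- lets NodeManagers serve map output to reducers -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>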

Set up passphraseless ssh:

  1. ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  2. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  3. chmod 0600 ~/.ssh/authorized_keys
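
Once the key is in place, you can verify that ssh no longer asks for a password before starting any daemons (if it still prompts, recheck the steps above):

$ ssh localhost
$ exit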

Execution Step:

The following instructions are to run a MapReduce job locally. 

Format the filesystem:

  $ bin/hadoop namenode -format

Start the NameNode and DataNode daemons (from inside the hadoop/sbin directory):

$ start-dfs.sh

To start all the daemons (HDFS and YARN) at once, use:

$ start-all.sh (inside the hadoop/sbin directory)
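
Note that start-all.sh is deprecated in recent Hadoop releases; the equivalent is to start HDFS and YARN with their own scripts:

$ start-dfs.sh
$ start-yarn.sh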

The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

Basic Hadoop Commands:
Make the HDFS directories required to execute MapReduce jobs:

$ hadoop fs -mkdir /user

$ hadoop fs -mkdir -p /user/<username>/<project_name>

Copy the input files into the distributed file system:

$ hadoop fs -put <input file> /user/<username>/<project_name>/
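
Once the input is in HDFS you can run one of the bundled example jobs end to end. This is only a sketch using the wordcount example: the jar file name depends on your Hadoop version, and the output directory must not already exist.

$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
      wordcount /user/<username>/<project_name>/<input file> /user/<username>/<project_name>/output
$ hadoop fs -cat /user/<username>/<project_name>/output/part-r-00000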

You can check which daemons are running on your machine with the command:

$ jps

When you’re done, stop the daemons with:

$ stop-dfs.sh (inside the hadoop/sbin directory)

Or, to stop all the daemons at once, use:

$ stop-all.sh (inside the hadoop/sbin directory)

 

Web interface:

Hadoop 2 HDFS NameNode UI: http://localhost:50070

Hadoop 3 HDFS NameNode UI: http://localhost:9870

YARN ResourceManager (applications): http://localhost:8088

MapReduce JobHistory server: http://localhost:19888


If you get stuck or run into an error, you can leave a comment.
