
UBUNTU HADOOP 2.2.0 INSTALLATION

sidcode 2013. 11. 14. 23:14

Installing Hadoop 2.2.0 on Ubuntu 13.10, single-node setup


0. Download Hadoop and move it to the install directory

   $ wget http://mirror.apache-kr.org/hadoop/common/stable/hadoop-2.2.0.tar.gz

      After the download finishes:

   $ sudo tar xvzf hadoop-2.2.0.tar.gz && sudo cp -r hadoop-2.2.0 /usr/local/hadoop
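
If the extraction and copy went well, the usual Hadoop 2.2.0 layout should now be sitting in /usr/local/hadoop (a quick sanity check; these directory names are what the tarball ships with):

   $ ls /usr/local/hadoop
   bin  etc  include  lib  libexec  sbin  share  ...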


1. Install JDK 7

  - Needed to run Hadoop.

$ sudo apt-get install openjdk-7-jdk
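
A quick check that the JDK landed on the PATH (the exact update number in the output will vary):

$ java -version     # should report an OpenJDK 1.7.0 runtime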


2. Create an account

  - Create a dedicated group and account for Hadoop; it makes administration and housekeeping easier.

  - Group: hadoop, account: hadoop

$ sudo addgroup hadoop && sudo adduser --ingroup hadoop hadoop 


With the hadoop account and group registered, set the ownership of the Hadoop directory we extracted and copied earlier:

$ sudo chown -R hadoop:hadoop /usr/local/hadoop
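
To confirm the ownership change took:

$ ls -ld /usr/local/hadoop     # owner and group should both read hadoop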


3. Install ssh

 - Hadoop manages its nodes over ssh, so install the server and restart it:

$ sudo apt-get install openssh-server && sudo /etc/init.d/ssh restart

If, of all things, a firewall problem crops up for you:

$ sudo ufw allow 22/tcp


4. Set up exports

 - So you don't have to export these by hand every time you log in.

$ su - hadoop

 

$ vi ~/.profile 

  Add the following:

 

# JDK 7.0

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/jre


# hadoop home

export HADOOP_HOME=/usr/local/hadoop


# hadoop bin

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin


After saving, load it with the command below:

$ source ~/.profile


Verify:

hadoop@sidcode-worlds:~$ echo $JAVA_HOME  -  $HADOOP_HOME  -  $PATH

/usr/lib/jvm/java-7-openjdk-amd64/jre - /usr/local/hadoop - /usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/usr/local/hadoop/bin





5. Create an RSA key for passwordless ssh login

hadoop@sidcode-worlds:~$ ssh-keygen -t rsa -P ""

Generating public/private rsa key pair.

Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):          (press Enter)

Created directory '/home/hadoop/.ssh'.

Your identification has been saved in /home/hadoop/.ssh/id_rsa.

Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.

The key fingerprint is:

d6:d3:a1:2a:ae:d1:94:ad:00:f8:6e:a1:5e:f9:31:bb hadoop@sidcode-worlds

The key's randomart image is:

+--[ RSA 2048]----+

|                 |

| .               |

|. .         .    |

| . .   o . o .   |

|  o . o S + .    |

| o ..+ o . .     |

|. oo.o+ .        |

|... .o+.         |

| .  .Eo          |

+-----------------+


After generating the key, copy the public key over; use cat >> since ~/.ssh/authorized_keys may already exist:

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && chmod 755 ~/.ssh/authorized_keys
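
One caveat: with sshd's StrictModes setting (on by default), passwordless login fails if ~/.ssh or authorized_keys is group- or world-writable. ssh-keygen already created ~/.ssh with mode 700, but if the login test below still prompts for a password, tightening the permissions is the first thing to try (600 is the conventional mode for authorized_keys):

$ chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys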


Connection test:

hadoop@sidcode-worlds:~$ ssh hadoop@localhost

The authenticity of host 'localhost (127.0.0.1)' can't be established.

ECDSA key fingerprint is (key value removed).

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.


The programs included with the Ubuntu system are free software;

the exact distribution terms for each program are described in the

individual files in /usr/share/doc/*/copyright.


Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by

applicable law.


hadoop@sidcode-worlds:~$ 




6. Configure Hadoop

$ vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh


Add or change the following:

# The java implementation to use.

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64


# The jsvc implementation to use. Jsvc is required to run secure datanodes.

#export JSVC_HOME=${JSVC_HOME}


#export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
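
With JAVA_HOME set here and the ~/.profile from step 4 loaded, the hadoop command itself should now run as the hadoop user (a quick smoke test):

$ hadoop version     # should print Hadoop 2.2.0 plus build details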



7. Create the Hadoop tmp directory

 - Adjust the chmod to fit whatever security policy you have in mind. This is the HDFS base temporary directory referenced from core-site.xml; put it wherever you like.

$ mkdir -p $HADOOP_HOME/tmp && chmod 750 $HADOOP_HOME/tmp



8. Edit the *-site.xml files

Edit core-site.xml

$ vi /usr/local/hadoop/etc/hadoop/core-site.xml

 Add the following. This is the Hadoop core configuration (for HDFS and MapReduce); pick whatever port range you like.

<configuration>

 <property>

  <name>hadoop.tmp.dir</name>

  <value>/usr/local/hadoop/tmp</value>

 </property>


 <property>

  <name>fs.default.name</name>

  <value>hdfs://localhost:37050</value>

 </property>

</configuration>
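
A side note: in Hadoop 2.x, fs.default.name is a deprecated alias for fs.defaultFS. The old key still works (with a deprecation warning in the logs); the equivalent using the newer key would be:

 <property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:37050</value>
 </property>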



Edit hdfs-site.xml

$ vi /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Add the following (inside the <configuration> tags). This configures the HDFS daemons (namenode and datanode settings):

 <property>

  <name>dfs.name.dir</name>

  <value>/usr/local/hadoop/dfs/name</value>

 </property>

 

 <property>

  <name>dfs.name.edits.dir</name>

  <value>${dfs.name.dir}</value>

 </property>

 

 <property>

  <name>dfs.data.dir</name>

  <value>/usr/local/hadoop/dfs/data</value>

 </property>
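
These two directories don't exist yet. The namenode format in step 9 creates dfs.name.dir itself and the datanode creates its directory on first start, but pre-creating them as the hadoop user (so the permissions come out right) does no harm:

$ mkdir -p /usr/local/hadoop/dfs/name /usr/local/hadoop/dfs/data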



Edit mapred-site.xml (only a template exists, so copy it first)

$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.tem* /usr/local/hadoop/etc/hadoop/mapred-site.xml

$ vi /usr/local/hadoop/etc/hadoop/mapred-site.xml

Add the following. This configures the MapReduce daemon (job tracker):

<configuration>

 <property>

  <name>mapred.job.tracker</name>

  <value>hdfs://localhost:37051</value>

 </property>

</configuration>
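
Worth knowing: mapred.job.tracker is a Hadoop 1.x property, and in 2.2.0 MapReduce normally runs on YARN rather than a standalone job tracker (the startup output in step 10 shows the YARN daemons coming up). If you later want MapReduce jobs submitted to YARN, the property for that is:

 <property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
 </property>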


$ vi /usr/local/hadoop/etc/hadoop/mapred-env.sh
Add the following:
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64




9. Format the namenode

* Please be careful whenever you run a namenode format.

* Run it on a fresh install. If the namenode is already in use and you run this again, it gets formatted, exactly as the word implies.

$ /usr/local/hadoop/bin/hadoop namenode -format
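
In 2.2.0 this invocation prints a DEPRECATED notice (the hdfs command has taken over the namenode subcommand), but it still works. As insurance against reformatting a live namenode, a guard like this (a sketch, assuming dfs.name.dir is /usr/local/hadoop/dfs/name as configured above) only formats when no metadata exists yet:

$ [ -d /usr/local/hadoop/dfs/name/current ] || /usr/local/hadoop/bin/hdfs namenode -format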



10. Start, verify, and stop

* Start

$ /usr/local/hadoop/sbin/start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

13/11/15 00:15:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

13/11/15 00:15:21 WARN fs.FileSystem: "localhost:37050" is a deprecated filesystem name. Use "hdfs://localhost:37050/" instead.

Starting namenodes on [localhost]

localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-sidcode-worlds.out

localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-sidcode-worlds.out

Starting secondary namenodes [0.0.0.0]

The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.

ECDSA key fingerprint is (key value removed)

Are you sure you want to continue connecting (yes/no)? yes

0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.

0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-sidcode-worlds.out

13/11/15 00:15:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

starting yarn daemons

starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-sidcode-worlds.out

localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-sidcode-worlds.out

hadoop@sidcode-worlds:/usr/local/hadoop/sbin$ 


* Verify

hadoop@sidcode-worlds:/usr/local/hadoop/etc/hadoop$ jps

21433 ResourceManager

21566 NodeManager

21280 SecondaryNameNode

21882 Jps

20946 NameNode

21096 DataNode
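
All six daemons are up. For a functional check beyond jps, you could create a home directory in HDFS and list the root (a minimal smoke test; /user/hadoop is just the conventional HDFS home for the hadoop user, nothing this setup requires):

$ hadoop fs -mkdir -p /user/hadoop
$ hadoop fs -ls /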


Checking the daemons from a browser



- Namenode

http://localhost:50070/
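
The YARN side has a web UI of its own; the ResourceManager listens on port 8088 by default.

- ResourceManager

http://localhost:8088/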



* Stop

$ /usr/local/hadoop/sbin/stop-all.sh




PS. How to add some sample data and set that up comes in the next post.. I have to get to work, yet here I am at 12:47.. doing this.. I'm losing it..