#title Apache Hadoop MapReduce

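The previous content of this wiki page was copied verbatim from English presentation slides and was of little help to newcomers, so it has been replaced with a translation of the official Apache Hadoop wiki. The sub-pages will be translated as time permits.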

The original of this page is the [http://wiki.apache.org/lucene-hadoop/ Lucene-hadoop Wiki].

= Apache Hadoop =
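[http://hadoop.apache.org/ Apache Hadoop] is a framework for running applications on a cluster built from a large number of low-cost servers. The framework gives applications reliability and data motion without their having to implement either themselves. Hadoop implements a computational paradigm named [http://wiki.apache.org/hadoop/HadoopMapReduce Map/Reduce], in which an application is divided into many small units of work, each of which can be executed on any server in the cluster. In addition, the framework includes a distributed file system, [http://wiki.apache.org/hadoop/DFS HDFS], which stores data on the compute servers and provides high aggregate bandwidth across the cluster. Both Map/Reduce and the distributed file system are designed so that the framework automatically handles the failure of individual servers in the cluster.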
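
The Map/Reduce paradigm described above can be illustrated with a minimal single-process sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. The sketch assumes a word-count task, where each input line is one small unit of work.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (a framework would do this step)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line is an independent unit of work. On a real cluster these
    # map calls could be scheduled on any server; here they run in sequence.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["hadoop runs map tasks", "hadoop runs reduce tasks"]))
```

Because `map_phase` depends only on its own input line and `reduce_phase` only on the values for its own key, the units of work are independent of one another, which is the property that lets a framework distribute them across servers.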

== General Information ==
 * [http://hadoop.apache.org/ Official Apache Hadoop website]: downloads, bug tracking, mailing lists, etc.
 * [http://wiki.apache.org/hadoop/ProjectDescription Overview of Apache Hadoop]
 * [http://wiki.apache.org/hadoop/FAQ Frequently asked questions]
 * [http://wiki.apache.org/hadoop/HadoopIsNot Misconceptions about Hadoop]
 * [http://wiki.apache.org/hadoop/Distribution Hadoop distributions]
 * Hadoop-related [http://wiki.apache.org/hadoop/HadoopPresentations presentations], [http://wiki.apache.org/hadoop/Books books], [http://wiki.apache.org/hadoop/HadoopArticles articles], and [http://wiki.apache.org/hadoop/Papers papers]
 * [http://wiki.apache.org/hadoop/PoweredBy PoweredBy]: a list of sites and applications that use Apache Hadoop
 * Support
  * [http://wiki.apache.org/hadoop/Help Hadoop community]
  * [http://wiki.apache.org/hadoop/Support People and companies for hire]
 * Hadoop community events and conferences
  * [http://wiki.apache.org/hadoop/HadoopUserGroups HadoopUserGroups (HUGs)] 
  * [http://wiki.apache.org/hadoop/HadoopSummit HadoopSummit]
 * [http://developer.yahoo.com/hadoop/tutorial/ Yahoo! Hadoop Tutorial]: a tutorial covering Hadoop setup, HDFS, and Map/Reduce
 * [http://www.cloudera.com/hadoop-training-basic Cloudera online Hadoop training]: video lectures, exercises, and a pre-configured [http://www.cloudera.com/hadoop-training-virtual-machine virtual machine]. Courses cover [http://www.cloudera.com/hadoop-training-programming-with-hadoop Hadoop], [http://www.cloudera.com/hadoop-training-mapreduce-algorithms Map/Reduce], [http://www.cloudera.com/hadoop-training-hive-introduction Hive], [http://www.cloudera.com/hadoop-training-pig-introduction Pig], and more

== User Documentation ==
 * [http://wiki.apache.org/hadoop/ImportantConcepts Important concepts]
 * [http://wiki.apache.org/hadoop/GettingStartedWithHadoop Getting started with Hadoop]
 * [http://wiki.apache.org/hadoop/QuickStart Quick start]
 * [http://hadoop.apache.org/core/docs/current/commands_manual.html Command-line options for the Hadoop shell scripts]
 * [http://wiki.apache.org/hadoop/HadoopOverview Hadoop code overview]
 * [http://wiki.apache.org/hadoop/TroubleShooting Troubleshooting]
 * Cluster setup
  * [http://wiki.apache.org/hadoop/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29 Running Hadoop on Ubuntu Linux (single-node cluster)] (a tutorial on installing, configuring, and running Hadoop on a single machine)
  * [http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_%28Single-Node_Cluster%29 Running Hadoop on OS X 10.5 64-bit (single-node cluster)]
  * [http://wiki.apache.org/hadoop/HowToConfigure How to configure Hadoop]
  * [http://wiki.apache.org/hadoop/WebApp%20URLs Monitoring the system through the web applications]
  * [http://wiki.apache.org/hadoop/NameNodeFailover Handling NameNode failure]
  * [http://wiki.apache.org/hadoop/GangliaMetrics How to get metrics into Ganglia]
  * [http://wiki.apache.org/hadoop/LargeClusterTips Tips for operating large clusters]
  * [http://wiki.apache.org/hadoop/VirtualCluster Building a cluster with virtual machines]
  * [http://wiki.apache.org/hadoop/DiskSetup Advice on disk setup]
  * [http://wiki.apache.org/hadoop/PerformanceTuning Performance tuning]
  * [http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html Hadoop Windows/Eclipse tutorial]

 * Map/Reduce
  * [http://wiki.apache.org/hadoop/HadoopMapReduce Hadoop Map/Reduce]
  * [http://wiki.apache.org/hadoop/HadoopMapRedClasses Hadoop Map/Reduce classes]
  * [http://wiki.apache.org/hadoop/HowManyMapsAndReduces How to estimate the number of maps and reduces needed]
  * [http://wiki.apache.org/hadoop/TaskExecutionEnvironment Task execution environment]
  * [http://wiki.apache.org/hadoop/HowToDebugMapReducePrograms Debugging Map/Reduce programs]

 * Examples
  * [http://wiki.apache.org/hadoop/WordCount WordCount]
  * [http://wiki.apache.org/hadoop/PythonWordCount Python Word Count]
  * [http://wiki.apache.org/hadoop/C++WordCount C/C++ Word Count]
  * [http://wiki.apache.org/hadoop/Grep Grep]
  * [http://wiki.apache.org/hadoop/Sort Sort]
  * [http://wiki.apache.org/hadoop/RandomWriter RandomWriter]
  * [http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample Reading from and writing to HDFS]
  
 * Amazon
  * [http://wiki.apache.org/hadoop/AmazonEC2 Running Hadoop on Amazon EC2]
  * [http://wiki.apache.org/hadoop/AmazonS3 Running Hadoop with Amazon S3]

 * Benchmarks
  * [http://wiki.apache.org/hadoop/HardwareBenchmarks Hardware benchmarks]
  * [http://wiki.apache.org/hadoop/DataProcessingBenchmarks Data processing benchmarks]

 * Subprojects
  * [http://wiki.apache.org/hadoop/Hbase HBase], a Bigtable-like structured storage system for Hadoop HDFS
  * [http://wiki.apache.org/pig/ Apache Pig] is a high-level data-flow language and execution framework for parallel computation. It is built on top of Hadoop Core.
  * [http://wiki.apache.org/hadoop/Hive Hive], a data warehouse infrastructure that allows SQL-like ad hoc querying of data (in any format) stored in Hadoop
  * [http://wiki.apache.org/hadoop/ZooKeeper ZooKeeper] is a high-performance coordination service for distributed applications.

 * Contrib
  * [http://wiki.apache.org/hadoop/HadoopStreaming HadoopStreaming] (Useful for using Hadoop with other programming languages)
  * [http://wiki.apache.org/hadoop/DistributedLucene DistributedLucene], a proposal for a distributed Lucene index in Hadoop
  * [http://wiki.apache.org/hadoop/MountableHDFS MountableHDFS], Fuse-DFS and other tools to mount HDFS as a standard filesystem on Linux (and some other Unix OSs)
  * [http://wiki.apache.org/hadoop/HDFS-APIs HDFS-APIs] in Perl, Python, PHP, etc.
  * [http://wiki.apache.org/hadoop/Chukwa Chukwa], a data collection, storage, and analysis framework

== Developer Documentation ==
 * [http://wiki.apache.org/hadoop/Roadmap Roadmap], listing release plans.
 * [http://wiki.apache.org/hadoop/HowToContribute HowToContribute]
 * [http://wiki.apache.org/hadoop/HowToDevelopUnitTests HowToDevelopUnitTests]
 * [http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment HowToSetupYourDevelopmentEnvironment]
 * [:CodeReviewChecklist: HowToCodeReview]
 * [http://wiki.apache.org/hadoop/CodeReviewChecklist Jira] usage guidelines
 * [http://wiki.apache.org/hadoop/HowToCommit HowToCommit]
 * [http://wiki.apache.org/hadoop/HowToRelease HowToRelease]
 * [http://wiki.apache.org/hadoop/HudsonBuildServer HudsonBuildServer]
 * [http://wiki.apache.org/hadoop/DevelopmentHints DevelopmentHints]
 * [http://wiki.apache.org/hadoop/ProjectSuggestions ProjectSuggestions]
 * [http://wiki.apache.org/hadoop/HadoopUnderIDEA Building/Testing under IntelliJ IDEA]

== Related Resources ==
 * [http://wiki.apache.org/nutch/NutchHadoopTutorial Nutch Hadoop Tutorial] (Useful for understanding Hadoop in an application context)
 * [http://www.alphaworks.ibm.com/tech/mapreducetools IBM MapReduce Tools for Eclipse] (An Eclipse plug-in that simplifies the creation and deployment of MapReduce programs)
 * The Hadoop IRC channel is #hadoop on irc.freenode.net.
 * [http://www.tom-doehler.de/wordpress/index.php/2007/12/19/spring-and-hadoop/ Using Spring and Hadoop] (Discussion of possibilities to use Hadoop and Dependency Injection with Spring)
 * [http://wiki.apache.org/hama Hama], a Distributed Matrix Computational Package based on Hadoop Map/Reduce
 * [http://heart.korea.ac.kr Heart], a Planet-Scale RDF Data Store and a Distributed Processing Engine
 * [http://lucene.apache.org/mahout Mahout], scalable Machine Learning algorithms using Hadoop
 * [http://opensolaris.org/os/project/livehadoop/ Live Hadoop], a three-node, distributed Hadoop cluster running on an !OpenSolaris live CD
 * [https://rc.usf.edu/trac/hadoop/wiki/SGEIntegration SGE Integration], a guide to tight integration of Hadoop with Sun Grid Engine