Apache Nutch MultiLingual Support

Japanese text can now be processed, following the NutchAnalysis.jj patch instructions that make Sen usable. Automatic language detection does not work yet.
Following "Creating a Japanese N-Gram Profile" should let Nutch detect Japanese automatically. Not tested yet.
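An n-gram profile identifies a language by comparing the most frequent character n-grams of a document with those of known languages. The sketch below illustrates only that general idea, using the rank-order "out-of-place" distance of Cavnar and Trenkle. It is not the code of Nutch's language-identifier plugin. The class name, the n = 1..3 range, the 300-entry cap (also used as the penalty for a missing n-gram) and the one-sentence training texts are all hypothetical choices.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy n-gram language identifier in the style of Cavnar & Trenkle (hypothetical;
// not Nutch's language-identifier plugin). A profile is a ranked list of frequent n-grams.
public class NGramSketch {
    static final int MAX_PROFILE = 300; // assumed profile size; also the missing-n-gram penalty

    // Count character n-grams (n = 1..3) of the padded, lower-cased text and keep the
    // MAX_PROFILE most frequent ones, most frequent first (ties broken alphabetically).
    static List<String> profile(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String s = " " + text.toLowerCase() + " ";
        for (int n = 1; n <= 3; n++) {
            for (int i = 0; i + n <= s.length(); i++) {
                counts.merge(s.substring(i, i + n), 1, Integer::sum);
            }
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(MAX_PROFILE)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // "Out-of-place" distance: for each document n-gram, add how far its rank is from
    // its rank in the language profile, or MAX_PROFILE if the language lacks it.
    static int distance(List<String> doc, List<String> lang) {
        Map<String, Integer> rank = new HashMap<>();
        for (int i = 0; i < lang.size(); i++) {
            rank.put(lang.get(i), i);
        }
        int d = 0;
        for (int i = 0; i < doc.size(); i++) {
            Integer r = rank.get(doc.get(i));
            d += (r == null) ? MAX_PROFILE : Math.abs(r - i);
        }
        return d;
    }

    // Return the language whose profile is closest to the text's profile.
    static String identify(String text, Map<String, List<String>> langs) {
        List<String> doc = profile(text);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, List<String>> e : langs.entrySet()) {
            int d = distance(doc, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, List<String>> langs = new HashMap<>();
        langs.put("ja", profile("これは日本語の文章です。日本語のテキストを使ってプロファイルを作ります。"));
        langs.put("en", profile("This is an English sentence. We build a profile from English text."));
        System.out.println(identify("日本語のページです", langs)); // prints "ja"
    }
}
```

Real profiles are built from much larger corpora. With two one-sentence samples this is only a toy demonstration of the ranking idea.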

How?

Follow the official Nutch wiki. The FAQ is also worth consulting.

The java.net articles "Introduction to Nutch, Part 1: Crawling" and "Introduction to Nutch, Part 2: Searching" are also worth reading.

Getting started

After installing the Nutch binary locally, create a directory with a seed URL list (here test/nutch, containing the site to crawl):

dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ mkdir test
dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ vi test/nutch
dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ cat test/nutch 
http://my.domain.name/

To keep the crawler from following external links, edit the URL filter as follows:
dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ vi conf/crawl-urlfilter.txt

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*my.domain.name/
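The filter file is read as an ordered list of rules: each non-comment line starts with "+" (accept) or "-" (reject) followed by a regular expression, the first rule whose pattern matches decides, and a URL matching no rule is rejected. The minimal sketch below re-implements that evaluation order to show why the single "+" rule keeps the crawl on one site. It assumes the first-match semantics described in the comments of Nutch's default filter file; the trailing "-." catch-all is assumed to mirror the end of that default file, and UrlFilterSketch is a hypothetical class, not Nutch's own URL filter plugin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of first-match-wins URL filtering as described in the comments
// of Nutch's default filter file; not Nutch's own filter implementation.
public class UrlFilterSketch {
    // One "+regex" (accept) or "-regex" (reject) line of the filter file.
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    UrlFilterSketch(String... lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank lines and comments are ignored
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    // The first rule whose pattern occurs in the URL decides; no match means reject.
    boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        UrlFilterSketch filter = new UrlFilterSketch(
                "# accept hosts in MY.DOMAIN.NAME",
                "+^http://([a-z0-9]*\\.)*my.domain.name/",
                "-."); // assumed catch-all reject, as at the end of Nutch's default file
        System.out.println(filter.accepts("http://my.domain.name/index.html"));     // true
        System.out.println(filter.accepts("http://www.my.domain.name/about.html")); // true
        System.out.println(filter.accepts("http://www.example.com/"));              // false
    }
}
```

The dots in my.domain.name are unescaped, so in regex terms each one matches any character; writing my\.domain\.name makes the rule slightly stricter.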

Problem and solution

The Nutch crawler kept throwing a NullPointerException without any detailed message. On investigation, settings required on top of the defaults were missing.

dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ vi conf/nutch-site.xml

After filling in the crawler (agent) information in the properties below, the crawl ran without problems. The exception appears to have been caused by these properties having empty default values.

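<configuration>
<property>
  <name>http.agent.name</name>
  <value>My Nutch Test</value>
</property>
<property>
  <name>http.agent.description</name>
  <value>Test</value>
</property>
<property>
  <name>http.agent.url</name>
  <value>no</value>
</property>
<property>
  <name>http.agent.email</name>
  <value>no</value>
</property>
</configuration>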
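Because an empty agent property only surfaced as a bare NullPointerException, a small pre-flight check can make the problem visible before crawling. The sketch below is a hypothetical helper, not part of Nutch: it parses a nutch-site.xml-style document with the JDK's DOM parser and lists which of the four http.agent.* properties above are missing or empty. It only inspects the XML it is given, not the merged configuration Nutch builds from its default and site files, and which of these properties a particular Nutch version strictly requires is left as an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical pre-flight check (not part of Nutch): report which http.agent.*
// properties are missing or empty in a nutch-site.xml-style document.
public class AgentConfigCheck {
    // Mirrors the four properties set above; a given Nutch version may require fewer.
    static final List<String> KEYS = Arrays.asList(
            "http.agent.name", "http.agent.description", "http.agent.url", "http.agent.email");

    // Collect <property><name>..</name><value>..</value></property> pairs.
    static Map<String, String> readProperties(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Map<String, String> props = new HashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            NodeList names = property.getElementsByTagName("name");
            NodeList values = property.getElementsByTagName("value");
            if (names.getLength() == 0) {
                continue; // malformed entry without a name
            }
            String value = values.getLength() == 0 ? "" : values.item(0).getTextContent().trim();
            props.put(names.item(0).getTextContent().trim(), value);
        }
        return props;
    }

    // Keys from KEYS that are absent or have an empty value, in KEYS order.
    static List<String> missing(Map<String, String> props) {
        List<String> out = new ArrayList<>();
        for (String key : KEYS) {
            String value = props.get(key);
            if (value == null || value.isEmpty()) {
                out.add(key);
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configuration>"
                + "<property><name>http.agent.name</name><value>My Nutch Test</value></property>"
                + "<property><name>http.agent.description</name><value></value></property>"
                + "</configuration>";
        System.out.println(missing(readProperties(xml)));
        // prints [http.agent.description, http.agent.url, http.agent.email]
    }
}
```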

last modified 2010-01-14 14:28:37