Apache Nutch MultiLingual Support

Japanese processing now works, following the NutchAnalysis.jj patching method that makes Sen usable. Automatic language detection does not work yet.
Building a Japanese N-Gram profile should allow Japanese to be detected automatically. This has not been tested yet.

How?

Follow the steps on the official nutch wiki. The FAQ also has material worth consulting.

The java.net articles Introduction to Nutch, Part 1: Crawling and Introduction to Nutch, Part 2: Searching are also worth reading.

Getting started

After installing the nutch binary locally, create the seed URL list:

dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ mkdir test
dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ vi test/nutch
dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ cat test/nutch 
http://my.domain.name/
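
The same seed list can be created without an interactive editor. This is only a sketch of the step above: `make_seed_list` is a hypothetical helper name, not part of Nutch, and the file name `nutch` inside the seed directory simply mirrors the session shown.

```shell
#!/bin/sh
# make_seed_list is a hypothetical helper, not part of Nutch.
# Usage: make_seed_list <seed-dir> <url>...
# Creates <seed-dir> if needed and writes one URL per line to <seed-dir>/nutch.
make_seed_list() {
    seed_dir=$1
    shift
    mkdir -p "$seed_dir"
    printf '%s\n' "$@" > "$seed_dir/nutch"
}
```

Calling `make_seed_list test http://my.domain.name/` from the nutch-0.8.1 directory should leave `test/nutch` with the same single line as above.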

To keep the crawler from following links out to external sites, edit the URL filter as follows.
dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ vi conf/crawl-urlfilter.txt

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*my.domain.name/
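
To get a feel for which URLs this rule lets through, the pattern can be tried out with `grep -E`. This is a rough sketch only: Nutch evaluates the filter with Java regular expressions, not grep, and `url_accepted` is a hypothetical helper. The leading `+` is dropped because in the filter file it only marks the line as an accept rule.

```shell
#!/bin/sh
# Pattern copied from conf/crawl-urlfilter.txt, without the leading "+".
PATTERN='^http://([a-z0-9]*\.)*my.domain.name/'

# url_accepted is a hypothetical helper, not part of Nutch.
# Returns 0 if the URL matches the accept pattern, 1 otherwise.
url_accepted() {
    printf '%s\n' "$1" | grep -Eq "$PATTERN"
}

for url in \
    'http://my.domain.name/' \
    'http://www.my.domain.name/docs/index.html' \
    'http://other.example.com/'
do
    if url_accepted "$url"; then
        echo "accept $url"
    else
        echo "reject $url"
    fi
done
```

Note that the dots in `my.domain.name` are unescaped, so each one matches any character. Writing them as `\.` would make the rule stricter.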

Problems and solutions

The nutch crawler kept throwing NullPointerException without any detailed message. After some digging, it turned out that settings required on top of the default configuration were missing.

dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ vi conf/nutch-site.xml

After filling in the crawler identification properties, the crawl ran without problems. The exception appears to have occurred because these properties are empty by default.

<property>
  <name>http.agent.name</name>
  <value>My Nutch Test</value>
</property>

<property>
  <name>http.agent.description</name>
  <value>Test</value>
</property>

<property>
  <name>http.agent.url</name>
  <value>no</value>
</property>

<property>
  <name>http.agent.email</name>
  <value>no</value>
</property>
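
Since the failure gave no useful message, a quick check of the config before crawling may save some time. The following is only a sketch: `check_agent_props` is a hypothetical helper, not part of Nutch, and it assumes each `<name>` line is followed directly by its `<value>` line (or shares a line with it). It also relies on `grep -A`, which GNU and BSD grep provide.

```shell
#!/bin/sh
# check_agent_props is a hypothetical helper, not part of Nutch.
# Usage: check_agent_props <config-file>
# Reports each http.agent.* property listed above whose <value> is
# missing or empty in the given file; returns 1 if any is, else 0.
check_agent_props() {
    conf_file=$1
    status=0
    for prop in http.agent.name http.agent.description \
                http.agent.url http.agent.email; do
        # Take the <name> line plus the line after it, then pull out
        # whatever sits between <value> and </value>.
        value=$(grep -F -A1 "<name>$prop</name>" "$conf_file" \
            | sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
        if [ -z "$value" ]; then
            echo "missing or empty: $prop"
            status=1
        fi
    done
    return $status
}
```

Running `check_agent_props conf/nutch-site.xml` prints one line per property that still needs a value. It only looks at the file it is given, so it says nothing about values inherited from other config files.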

last modified 2010-01-14 14:28:37