Describe cjftn0/2008-03-30

-At the meeting I reported that the 10cv average was about 70%, but this time the result came out slightly lower. I think the reason is that random sampling splits the initial samples into the data sets n1, n2, ..., n10, so the parameters (mean, variance) of each data set change every time a model is built, and this in turn affects every result. From now on I will build the model several times, compute the average model accuracy, and include it in the results. Thanks to the professor's guidance, I have newly realized a point I had been overlooking until now. ^^;;

-Method used to build the classification model
1. Reduce the feature dimension with msc.features.select
2. tree model; the feature selection method used is gini
3. 10 cross validation

-Methods tried
1. Used a different feature selection method: relifcat feature selection. Result: 10cv average of about 58%
2. Tried to build the model with rpart: vector allocation error or "too many element specified"
3. Adjusted the RemCorrcol and keepCol parameters of msc.features.select
0.98/0.72 => null vector (a cv with no selected features occurred)
0.98/0.75 => null vector (a cv with no selected features occurred)
0.98/0.8 => null vector (a cv with no selected features occurred)

-Idea obtained during these attempts
1. Found a feature common to the 10cv folds whose misclassification rate is 15% or lower. Also found that the 10cv folds in which this feature was not used have a high misclassification rate.
=> Build a model using the common feature and the frequently occurring features, taken from the full set of features selected in the 10cv folds whose misclassification rate is 15% or lower. [Plan for next week]

[Frequencies of the frequently occurring features and the common feature (the feature with a frequency of 5)]
From the 10cv folds produced by five runs of 10 cross validation, I selected only the folds that showed a misclassification rate of about 15% or lower, and computed the common feature and the frequencies of the features selected in them.
Feature            Frequency
"X228769_at"       common feature (5)
"X227094_at"       4
"X219821_s_at"     3
"X1557483_at"      3
"X1564190_x_at"    2
"X221572_s_at"     2
"X227356_at"       2
"X219429_at"       2
--------------------------------------------------
Classification tree:
tree(formula = class ~ ., data = iter10, na.action = na.pass, split = c("gini"), x = FALSE, y = TRUE)
Variables actually used in tree construction:
[1] "X219821_s_at"  "X1557483_at"   "X227094_at"    "X212707_s_at"
[5] "X234668_at"    "X220455_at"    "X206819_at"    "X1564190_x_at"
[9] "X228769_at"
Number of terminal nodes: 13
Residual mean deviance: 0.5868 = 69.24 / 118
Misclassification error rate: 0.1527 = 20 / 131
> a10 <-classError(p10, iter_t10[,num])
> a10
$misclassified
[1] 7 11
$errorRate
[1] 0.1538462
--------------------------------------------------
Classification tree:
tree(formula = class ~ ., data = iter7, na.action = na.pass, split = c("gini"), x = FALSE, y = TRUE)
Variables actually used in tree construction:
[1] "X219821_s_at"  "X210387_at"    "X221572_s_at"  "X1564190_x_at"
[5] "X228769_at"    "X227094_at"    "X227733_at"
Number of terminal nodes: 12
Residual mean deviance: 0.4563 = 54.3 / 119
Misclassification error rate: 0.1145 = 15 / 131
> a7 <-classError(p7, iter_t7[,num])
> a7
$misclassified
[1] 3 10
$errorRate
[1] 0.1538462
--------------------------------------------------
Classification tree:
tree(formula = class ~ ., data = iter10, na.action = na.pass, split = c("gini"), x = FALSE, y = TRUE)
Variables actually used in tree construction:
[1] "X230134_s_at" "X228769_at"   "X204712_at"   "X227094_at"   "X205695_at"
[6] "X227356_at"   "X1557483_at"  "X223517_at"
Number of terminal nodes: 13
Residual mean deviance: 0.4802 = 56.66 / 118
Misclassification error rate: 0.1069 = 14 / 131
> a10 <-classError(p10, iter_t10[,num])
> a10
$misclassified
[1] 7 9
$errorRate
[1] 0.1538462
--------------------------------------------------
Classification tree:
tree(formula = class ~ ., data = iter6, na.action = na.pass, split = c("gini"), x = FALSE, y = TRUE)
Variables actually used in tree construction:
[1] "X219821_s_at" "X219429_at"   "X243476_at"   "X228769_at"   "X222283_at"
[6] "X201611_s_at" "X1557483_at"  "X227094_at"
Number of terminal nodes: 13
Residual mean deviance: 0.4657 = 54.96 / 118
Misclassification error rate: 0.1221 = 16 / 131
> a6 <-classError(p6, iter_t6[,num])
> a6
$misclassified
[1] 5 10
$errorRate
[1] 0.1538462
--------------------------------------------------
Classification tree:
tree(formula = class ~ ., data = iter8, na.action = na.pass, split = c("gini"), x = FALSE, y = TRUE)
Variables actually used in tree construction:
[1] "X206766_at"   "X219429_at"   "X221572_s_at" "X226745_at"   "X228769_at"
[6] "X227356_at"   "X236717_at"   "X223147_s_at" "X226748_at"   "X222657_s_at"
Number of terminal nodes: 13
Residual mean deviance: 0.473 = 55.82 / 118
Misclassification error rate: 0.1145 = 15 / 131
> a8 <-classError(p8, iter_t8[,num])
> a8
$misclassified
[1] 5 11
$errorRate
[1] 0.1538462
--------------------------------------------------
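
The feature frequencies reported on this page can be re-derived from the "Variables actually used in tree construction" lists of the five trees above. The following is only a minimal base R sketch of that tally, not the procedure actually used here: the probe IDs are copied from the output, while the object names (tree_vars, run1 to run5, common, freq, frequent) are made up for illustration.

```r
# Variables used by each of the five trees, copied from the output above.
# The list names run1..run5 are hypothetical labels for this sketch.
tree_vars <- list(
  run1 = c("X219821_s_at", "X1557483_at", "X227094_at", "X212707_s_at",
           "X234668_at", "X220455_at", "X206819_at", "X1564190_x_at",
           "X228769_at"),
  run2 = c("X219821_s_at", "X210387_at", "X221572_s_at", "X1564190_x_at",
           "X228769_at", "X227094_at", "X227733_at"),
  run3 = c("X230134_s_at", "X228769_at", "X204712_at", "X227094_at",
           "X205695_at", "X227356_at", "X1557483_at", "X223517_at"),
  run4 = c("X219821_s_at", "X219429_at", "X243476_at", "X228769_at",
           "X222283_at", "X201611_s_at", "X1557483_at", "X227094_at"),
  run5 = c("X206766_at", "X219429_at", "X221572_s_at", "X226745_at",
           "X228769_at", "X227356_at", "X236717_at", "X223147_s_at",
           "X226748_at", "X222657_s_at")
)

# Features that appear in every tree (the "common feature").
common <- Reduce(intersect, tree_vars)

# Number of trees that use each feature, most frequent first.
# Each list holds a feature at most once, so a count equals a number of trees.
freq <- sort(table(unlist(tree_vars)), decreasing = TRUE)

# Features used by at least two of the five trees.
frequent <- freq[freq >= 2]

print(common)
print(frequent)
```

Keeping one character vector per tree makes it easy to append the variable lists of further low-error folds and rerun the tally.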