15.2.2 Term weighting
What value should each element of a vector in the vector space take?
Three kinds of frequency (Table 15.3)
term frequency $tf_{i,j}$ (number of occurrences of $w_i$ in $d_j$)
document frequency $df_i$ (number of documents that contain $w_i$)
collection frequency $cf_i$ (total number of occurrences of $w_i$ in the collection)
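The three counts above can be computed directly from tokenized text. The following is a minimal sketch, assuming a tiny made-up collection (`docs` and its contents are hypothetical, not data from Table 15.3) and documents already split into tokens.

```python
from collections import Counter

# Hypothetical toy collection; each document is a list of tokens.
docs = [
    ["insurance", "claim", "insurance", "policy"],
    ["try", "the", "policy"],
    ["try", "try", "the", "claim"],
]

# tf[j][w]: number of occurrences of word w in document j
tf = [Counter(doc) for doc in docs]

# df[w]: number of documents that contain w at least once
df = Counter(w for doc in docs for w in set(doc))

# cf[w]: total number of occurrences of w in the whole collection
cf = Counter(w for doc in docs for w in doc)

for word in ("insurance", "try"):
    print(word, [t[word] for t in tf], df[word], cf[word])
```

Note that $df_i \le cf_i$ always holds, since a word is counted at most once per document for $df_i$ but once per occurrence for $cf_i$.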
Dampening function
Human perception is roughly logarithmic, so the importance of a term is often expressed by a dampening function $f(tf)$.
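To illustrate what a dampening function does, here is a small sketch using two commonly used choices, $1 + \log(tf)$ and $\sqrt{tf}$. These particular functions and the names `damp_log` and `damp_sqrt` are assumptions made for illustration; the notes themselves only say $f(tf)$.

```python
import math

def damp_log(tf):
    """Return 1 + ln(tf) for tf > 0, and 0 when the term is absent."""
    return 1.0 + math.log(tf) if tf > 0 else 0.0

def damp_sqrt(tf):
    """Return the square root of the raw count."""
    return math.sqrt(tf)

# Raw counts grow by a factor of 100; the dampened values grow far more slowly.
for tf in (1, 2, 10, 100):
    print(tf, round(damp_log(tf), 2), round(damp_sqrt(tf), 2))
```

With either choice, a word that occurs 100 times is treated as more important than one that occurs once, but not 100 times more important.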
document frequency
Semantically important words (keywords) tend to be concentrated in a single document, so $df_i$ is small.
Semantically unimportant words (function words and the like) are distributed evenly, so $df_i$ is large.
Table 15.4
insurance is a more important word than try
tf.idf
inverse document frequency (idf)
$idf_i = \log(N / df_i)$, where $N$ is the number of documents
If $w_i$ occurs in only one document: $idf_i = \log N$
If $w_i$ occurs in every document: $idf_i = 0$
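The two boundary cases of inverse document frequency can be checked numerically. This is a minimal sketch, assuming the usual definition $idf_i = \log(N / df_i)$ with the natural logarithm; the function name `idf` and the collection size `N = 1000` are hypothetical.

```python
import math

def idf(df_i, n_docs):
    """Inverse document frequency log(N / df_i), for 1 <= df_i <= n_docs."""
    if not 1 <= df_i <= n_docs:
        raise ValueError("df_i must be between 1 and n_docs")
    return math.log(n_docs / df_i)

N = 1000  # hypothetical number of documents

print(idf(1, N))  # word found in a single document: equals log N
print(idf(N, N))  # word found in every document: equals 0
```

The base of the logarithm only rescales every idf value by the same constant, so it does not change the ranking of words.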
Combine term frequency and document frequency
Various forms of tf.idf
Table 15.5
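As a sketch of how term frequency and document frequency might be combined, the code below uses one widely used variant, $(1 + \log tf_{i,j}) \cdot \log(N / df_i)$. This is only one of many possible schemes and is not necessarily the one listed in Table 15.5; the function name `tfidf_weights` and the toy documents are assumptions for illustration.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Return one {word: weight} dict per document.

    weight = (1 + ln tf) * ln(N / df); words absent from a document get no entry.
    """
    n_docs = len(docs)
    # df[w]: number of documents that contain w
    df = Counter(w for doc in docs for w in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw counts within this document
        weights.append({
            w: (1.0 + math.log(tf[w])) * math.log(n_docs / df[w])
            for w in tf
        })
    return weights

# Hypothetical toy collection
docs = [
    ["insurance", "insurance", "claim"],
    ["try", "claim"],
    ["try", "the", "claim"],
]
weights = tfidf_weights(docs)
for j, w in enumerate(weights):
    print(j, {word: round(val, 3) for word, val in w.items()})
```

In this toy collection "claim" appears in every document and therefore receives weight 0, while "insurance", which is frequent in one document and absent elsewhere, receives the largest weight.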
1999-08-03