Program to compute feature strength in a text corpus.
A common practice in text processing is to statistically model the association between words and classes. Design a program to measure the strength of association between a term t and a class c using the two-way contingency table of t and c, where
- A is the number of times t and c co-occur.
- B is the number of times t occurs without c.
- C is the number of times c occurs without t.
- D is the number of times neither c nor t occurs.
- N is the total number of documents.
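The four cells need not be counted independently: once A is known, B, C, and D follow from the per-term total, the per-class total, and N. A minimal sketch, assuming counts are taken per document (a term counts once per document, which is an interpretation of "number of times") and using the hypothetical helper name `contingency_cells`:

```python
def contingency_cells(n_docs, docs_with_term, docs_in_class, docs_with_both):
    """Return (A, B, C, D) for one (term, class) pair.

    Assumption: every argument is a document count, not a raw token count.
    """
    a = docs_with_both            # t and c co-occur
    b = docs_with_term - a        # t occurs without c
    c = docs_in_class - a         # c occurs without t
    d = n_docs - a - b - c        # neither t nor c occurs
    return a, b, c, d


# Example: 5 documents, term in 2 of them, class on 3 of them, both in 1.
cells = contingency_cells(5, 2, 3, 1)
```

Deriving the cells this way means only three tallies per pair have to be stored, and A + B + C + D always equals N.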
Data: a text corpus of 10K data points. Each data point comprises a class (c) and a textual description; each word in the description can be treated as an individual term (t). In the sample lines, the class label is the last token.
Input format example
2007 - 2011 Polaris Outlaw 525 2x4 IRS Heavy Duty Rear Inboard CV Boot Kit Pair JP
NuMe Pentacle 2 in One Curler / Deep Waver 100% Tourmaline infused ceramic CN
Ladies Women Female Real 925 Silver Mesh Lab Diamond Promise Cocktail Ring Big JP
PANASONIC TC-P42C2 BOARD TNPA5066 (2)SN TNPA5066 2 5066AB JP
PLUG N PLAY CONSOLE SPONGEBOB SQUAREPANTS LOT 2 JAKKS PACIFIC GAMES 2003 2005 CN
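In the sample lines above, the final whitespace-separated token (JP or CN) looks like the class label and everything before it like the description. A minimal parsing sketch under that assumption, with no case folding or punctuation handling; `parse_line` is a hypothetical helper name:

```python
def parse_line(line):
    """Split one data point into (set of terms, class label).

    Assumptions: the class is the last whitespace-separated token and
    terms are compared exactly as written (case-sensitive).
    Returns None for lines that have no description.
    """
    tokens = line.split()
    if len(tokens) < 2:
        return None
    *words, label = tokens
    return set(words), label   # a set, so repeated words count once per line


terms, label = parse_line(
    "PANASONIC TC-P42C2 BOARD TNPA5066 (2)SN TNPA5066 2 5066AB JP"
)
```

Here `label` is `"JP"`, and the repeated word `TNPA5066` appears only once in `terms`.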
Chi-square (t,c) =
$$\chi^2(t, c) = \frac{N \, (AD - CB)^2}{(A + C)(B + D)(A + B)(C + D)}$$
Develop a program to compute the chi-square for every combination of term (t) and class (c). The program should output results accurate to 3 decimal places and should be designed for the minimum possible time and space complexity.
Output format example
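No sample output is reproduced here, so the layout below ("term class score", one pair per line) is only an assumption. As a rough sketch, assuming the standard two-way contingency form chi-square(t, c) = N (AD - CB)^2 / ((A + C)(B + D)(A + B)(C + D)), per-document counts, and the class label as the last token of each line, a single pass over the corpus collects all the tallies needed. The function names are hypothetical.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square for one 2x2 table; returns 0.0 when a margin is empty."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def score_corpus(lines):
    """Return {(term, class): chi-square} for every term/class combination."""
    n = 0
    class_docs = Counter()   # documents per class
    term_docs = Counter()    # documents containing each term
    both_docs = Counter()    # documents per (term, class) pair
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2:
            continue         # skip blank or label-only lines
        *words, label = tokens
        n += 1
        class_docs[label] += 1
        for term in set(words):
            term_docs[term] += 1
            both_docs[term, label] += 1

    scores = {}
    for term, t_docs in term_docs.items():
        for label, c_docs in class_docs.items():
            a = both_docs.get((term, label), 0)
            b = t_docs - a
            c = c_docs - a
            d = n - a - b - c
            scores[term, label] = chi_square(a, b, c, d)
    return scores


def format_scores(scores):
    """Render scores as 'term class value' with 3 decimal places."""
    return [f"{t} {c} {v:.3f}" for (t, c), v in sorted(scores.items())]


demo = score_corpus(["x y JP", "x CN", "y JP", "z CN"])
report = format_scores(demo)
```

Counting takes time proportional to the number of tokens, and scoring takes time proportional to the number of distinct terms times the number of classes, which is also the size of the required output.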
Time Limit: 1 sec
Source Limit: 50000 Bytes
Languages: C, CPP14, JAVA, PYTH, PYTH 3.6, PYPY, CS2, PAS fpc, PAS gpc, RUBY, PHP, GO, NODEJS, HASK, SCALA, D, PERL, FORT, WSPC, ADA, CAML, ICK, BF, ASM, CLPS, PRLG, ICON, SCM qobi, PIKE, ST, NICE, LUA, BASH, NEM, LISP sbcl, LISP clisp, SCM guile, JS, ERL, TCL, PERL6, TEXT, SCM chicken, CLOJ, FS