Motivation: The number of criminal cases in Taiwan reached 1.8 million in 2009. Each prosecutor must handle more than 211 cases per month, and complaints about overloading are loud and clear. About 70% of criminal cases involve drug abuse, public danger, larceny, and fraud. Although each of these cases has its own story, they are relatively simple compared with cases of homicide, corruption, etc., yet prosecutors still spend costly time handling them. In this paper we try to use text mining technology to provide a solution to this issue. Approach: We compare the police's investigation document for a criminal case with the court's judgment history, using the cosine similarity algorithm to calculate a similarity coefficient. Based on the highest coefficient, we find the closest judgment for this type of criminal case, which can then be used to decide on and generate a draft indictment for the prosecutor.
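The matching step described in the approach can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`term_vector`, `cosine_similarity`, `closest_judgment`), the raw term-frequency weighting, and the whitespace tokenization are all assumptions made for illustration. Real Chinese-language documents would first need word segmentation, and a production system would likely use a weighting scheme such as TF-IDF.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a term-frequency vector (assumption: whitespace-separated tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def closest_judgment(investigation_text, judgments):
    """Return (judgment_id, score) for the most similar past judgment.

    `judgments` is a hypothetical non-empty mapping of judgment id -> text.
    """
    query = term_vector(investigation_text)
    scored = (
        (jid, cosine_similarity(query, term_vector(text)))
        for jid, text in judgments.items()
    )
    return max(scored, key=lambda pair: pair[1])
```

Given a police investigation document and a dictionary of past judgments, `closest_judgment` returns the identifier of the judgment with the highest similarity coefficient, which could then serve as the template for a draft indictment.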