Defining Prediction Models for Three Types of Theft Crimes by Applying Data Mining Techniques (應用資料探勘技術建構竊盜犯罪預測模型)

  • Publication date:
  • Last updated: 109-05-13 (ROC calendar; 2020-05-13)
  • Views: 914

Theft is a prevalent and serious type of crime that causes fear, financial loss, and a poorer public perception of security. This study therefore examines three types of theft (burglary, motorcycle theft, and car theft) recorded in each village (里) of Taipei City, Taiwan, from 2012 to 2014. It applies data mining techniques to identify the important predictor variables for each type of theft and to build prediction models.

Data mining techniques are suited to variable selection when there are many independent variables and when both the interactions between variables and the form of the regression model are unknown, which makes them particularly suitable for building macro-level crime prediction models. The crime data were modeled in two ways, as crime frequency and as crime hot spot groups, and the results of the two approaches were analyzed, with regression analysis added for comparison. For crime frequency, we applied decision trees and random forests, and used a traditional linear regression model as a reference for comparing model fit. For crime hot spot groups, we divided the data into three groupings taken from the literature ("Ignatans hot spot", "Eck hot spot", and "Weisburd hot spot"), applied decision trees and random forests to find significant predictive factors, and used confusion matrices to evaluate the predictive ability of decision trees, random forests, and support vector machines.

The study uses secondary data: demographic and socio-economic variables from the Socio-Economic Database of the National Geographic Information System, and crime data from the Taipei City Police Department, together with the study it commissioned from Central Police University on the effectiveness of Taipei City's video surveillance system in crime investigation and prevention. The analysis software was ArcGIS and RStudio.

For crime frequency, the results show that:
(1) For burglary, areas with more independent living households, more move-in residents, higher CCTV density, and fewer common living households tend to have more thefts.
(2) For car theft, areas with a larger total area, more move-in residents, more common living households, a higher proportion of residents with low education levels, and a higher aging index tend to have more thefts.
(3) For motorcycle theft, areas with more independent living households, a higher proportion of residents with low education levels, higher CCTV density, and a higher aging index tend to have more thefts.
(4) Model comparison shows that the random forest model gives the best fit and explanatory power overall, although for burglary it is reported to perform worse than the traditional regression model.

For crime hot spot groups, the results show that:
(1) For burglary, areas with fewer common living households, fewer move-out residents, and more independent living households tend to have more thefts.
(2) For car theft, areas with more marriages and more residents with low education levels tend to have more thefts.
(3) For motorcycle theft, areas with more independent living households, fewer move-out residents, and higher CCTV density tend to have more thefts.
(4) According to the confusion matrices, the decision tree with the Weisburd hot spot grouping predicts burglary best, while the support vector machine with the Eck hot spot grouping predicts motorcycle theft and car theft best.

The study makes three suggestions. First, the authorities should run awareness campaigns and develop prevention strategies for high-risk villages and neighborhoods, targeting the potential influencing factors. Second, more dynamic and more accurate data should be added to strengthen explanatory power. Finally, combining qualitative interviews with data mining results would help investigate the causes of crime in depth.
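
The confusion-matrix evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a two-class labeling ("hotspot" versus "non-hotspot") and uses ten made-up village labels; none of the data, label names, or function names come from the study, which was carried out in RStudio rather than Python.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Return {actual_label: {predicted_label: count}} over the given labels."""
    counts = Counter(zip(actual, predicted))
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}

def accuracy(matrix):
    """Share of all cases on the diagonal (predicted label equals actual label)."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total if total else 0.0

def recall(matrix, label):
    """Share of the cases that truly carry `label` which were predicted as `label`."""
    row_total = sum(matrix[label].values())
    return matrix[label][label] / row_total if row_total else 0.0

# Hypothetical data: actual and predicted groups for ten villages.
# These values are illustrative assumptions, not results from the study.
LABELS = ["hotspot", "non-hotspot"]
actual = ["hotspot", "hotspot", "hotspot", "non-hotspot", "non-hotspot",
          "non-hotspot", "non-hotspot", "hotspot", "non-hotspot", "non-hotspot"]
predicted = ["hotspot", "non-hotspot", "hotspot", "non-hotspot", "non-hotspot",
             "hotspot", "non-hotspot", "hotspot", "non-hotspot", "non-hotspot"]

matrix = confusion_matrix(actual, predicted, LABELS)
for label in LABELS:
    print(label, matrix[label])
print("accuracy:", accuracy(matrix))
print("hotspot recall:", recall(matrix, "hotspot"))
```

Comparing classifiers, as the abstract describes for decision trees, random forests, and support vector machines, would amount to building one such matrix per model and per hot spot grouping and comparing the resulting summary measures. Which measure the study used to rank models is not stated in the abstract, so accuracy and recall here are only examples.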

Source: http://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/ccd=MtgIBv/record?r1=2&h1=5
