Association Rule Mining

Concept

In [2]:
library(arules)
library(dplyr)
In [41]:
par(family ="NanumBarunGothic")
In [3]:
tr <- read.delim("dataTransactions.tab", stringsAsFactors=FALSE)
head(tr)
  datetime          custid  store   product        brand       corner    import  amount  installment
1 2000-05-01 10:43  18313   신촌점  4104840008000  샤넬        화장품    1       113000  3
2 2000-05-01 11:00  18313   신촌점  2.7e+12        식품        일반식품  0       91950   3
3 2000-05-01 11:33  27222   신촌점  4545370944500  까사미아    가구      0       598000  3
4 2000-05-01 11:43  27222   신촌점  4500860043900  대아통상    기타      0       20100   1
5 2000-05-01 11:53  27222   신촌점  4538130048700  토이플러스  문화완구  0       24000   1
6 2000-05-01 12:00  27222   신촌점  4406010020474  베베        유아동복  0       28000   1
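  • Exclude 일반식품 (general groceries) and 화장품 (cosmetics).
    • These are the best-selling corners in the data.
    • If they are left in, they show up in almost every rule.
  • Keep only distinct (custid, corner) pairs so that no pair is duplicated.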
In [5]:
tr.filter <- tr %>%
  filter(!(corner %in% c("일반식품","화장품"))) %>%
  distinct(custid, corner)
In [6]:
head(tr.filter)
  custid  corner
1 27222   가구
2 27222   기타
3 27222   문화완구
4 27222   유아동복
5 47084   스포츠
6 31090   스포츠
  • Split the corner values by custid to form a list with one element per customer.
In [7]:
head(split(tr.filter$corner, tr.filter$custid))
$`10070`
  1. "유니캐주얼"
  2. "캐릭터캐주얼"
  3. "유아동복"
  4. "장신구"
  5. "영캐주얼"
  6. "스포츠"
  7. "니트단품"
$`10139`
  1. "스포츠"
  2. "영캐주얼"
  3. "엘레강스캐주얼"
  4. "문화완구"
  5. "섬유"
  6. "타운모피"
  7. "유아동복"
$`10208`
  1. "유아동복"
  2. "문화완구"
  3. "니트단품"
  4. "트래디셔널캐주얼"
  5. "유니캐주얼"
  6. "장신구"
  7. "조리욕실"
  8. "가전"
  9. "섬유"
  10. "스포츠"
  11. "영캐주얼"
  12. "도자기크리스탈"
  13. "가구"
  14. "기타"
  15. "캐릭터캐주얼"
  16. "침구수예"
  17. "타운모피"
  18. "피혁"
$`10275`
  1. "캐릭터캐주얼"
  2. "니트단품"
  3. "정장셔츠"
$`10350`
  1. "문화완구"
  2. "트래디셔널캐주얼"
  3. "섬유"
  4. "스포츠"
  5. "유니캐주얼"
  6. "캐릭터캐주얼"
  7. "장신구"
$`10425`
  1. "피혁"
  2. "니트단품"
In [19]:
# Split corner by custid, then coerce the resulting list to a transactions object.
trans <- as(split(tr.filter$corner, tr.filter$custid), "transactions")
trans
transactions in sparse format with
 487 transactions (rows) and
 24 items (columns)
In [8]:
# trans <- read.transactions("dataTransactions.tab", format = "single", sep="\t", cols = c(2,6), skip=1) 
# To mine the data without removing 일반식품 and 화장품, the file can be read directly as above.
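
The commented-out `read.transactions()` call reads a file in "single" format, where each line holds one (transaction ID, item) pair and `cols` names the two columns to use. Below is a minimal sketch of that format. The temporary file and its contents are hypothetical stand-ins, not the lecture data.

```r
library(arules)

# Hypothetical toy file: one (custid, corner) pair per line, tab-separated,
# with a header row that is skipped via skip = 1.
tmp <- tempfile(fileext = ".tab")
writeLines(c("custid\tcorner",
             "1\tA",
             "1\tB",
             "2\tA",
             "3\tC"), tmp)

# cols = c(1, 2): column 1 is the transaction ID, column 2 is the item.
toy <- read.transactions(tmp, format = "single", sep = "\t",
                         cols = c(1, 2), skip = 1)
inspect(toy)
```

In the lecture file the ID and item sit in columns 2 and 6, which is why the original call uses `cols = c(2,6)`.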

Example of Transactions from a Matrix

In [9]:
a_matrix <- matrix(
      c(1,1,1,0,0,
    1,1,0,0,0,
    1,1,0,1,0,
    0,0,1,0,1,
    1,1,0,1,1), ncol = 5)
In [10]:
dimnames(a_matrix) <-  list(
    c("a","b","c","d","e"),
    paste("Tr",c(1:5), sep = ""))

a_matrix
  Tr1 Tr2 Tr3 Tr4 Tr5
a   1   1   1   0   1
b   1   1   1   0   1
c   1   0   0   1   0
d   0   0   1   0   1
e   0   0   0   1   1
  • Transactions: the number of rows.
    • In the department-store data above, this corresponds to the number of customers.
  • Items: the number of columns.
    • This is the number of distinct items.
In [11]:
trans2 <-  as(a_matrix, "transactions")
trans2
transactions in sparse format with
 5 transactions (rows) and
 5 items (columns)
  • Each transaction stores the items it contains as a set (list-like, but not an R list).
In [12]:
inspect(trans2)
    items             transactionID
[1] {Tr1,Tr2,Tr3,Tr5} a
[2] {Tr1,Tr2,Tr3,Tr5} b
[3] {Tr1,Tr4}         c
[4] {Tr3,Tr5}         d
[5] {Tr4,Tr5}         e
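
To make the row/column convention concrete, here is a small sketch that rebuilds the same matrix and converts it both ways. The variable names `m`, `toy`, `sets`, and `back` are illustrative only.

```r
library(arules)

# Same 0/1 matrix as above: rows a..e become transactions, columns Tr1..Tr5 become items.
m <- matrix(c(1,1,1,0,0,
              1,1,0,0,0,
              1,1,0,1,0,
              0,0,1,0,1,
              1,1,0,1,1), ncol = 5,
            dimnames = list(c("a","b","c","d","e"), paste0("Tr", 1:5)))

toy <- as(m, "transactions")
dim(toy)                   # number of transactions, number of items

sets <- as(toy, "list")    # one character vector of item labels per transaction
sets[[3]]                  # items of the third transaction (row "c")

back <- as(toy, "matrix")  # logical incidence matrix, same layout as m
```

Coercing back to a matrix shows that the transactions object is just a sparse encoding of the incidence matrix.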

Example of Transactions from a data.frame

In [13]:
a_df <- data.frame(
    age = as.factor(c(6,8,7,6,9,5)),
    grade = as.factor(c(1,3,1,1,4,1)))
In [14]:
a_df
  age grade
1   6     1
2   8     3
3   7     1
4   6     1
5   9     4
6   5     1
In [15]:
trans3 <- as(a_df, "transactions")
inspect(trans3)
    items           transactionID
[1] {age=6,grade=1} 1
[2] {age=8,grade=3} 2
[3] {age=7,grade=1} 3
[4] {age=6,grade=1} 4
[5] {age=9,grade=4} 5
[6] {age=5,grade=1} 6
In [16]:
options(repr.plot.width=4,repr.plot.height=3)
image(trans3)

Continuing the Lecture

In [20]:
inspect(trans[1:2])
    items
[1] {니트단품,스포츠,영캐주얼,유니캐주얼,유아동복,장신구,캐릭터캐주얼}
[2] {문화완구,섬유,스포츠,엘레강스캐주얼,영캐주얼,유아동복,타운모피}
    transactionID
[1] 10070
[2] 10139
  • size(trans) returns the number of items in each transaction.
In [21]:
transactionInfo(trans[size(trans) > 20])
    transactionID
84  15968
420 42322
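
The same `size()` filter can be tried on a tiny example. This is a sketch with made-up baskets; `baskets`, `toy`, and `big` are hypothetical names.

```r
library(arules)

# Hypothetical baskets: names become transaction IDs, values are item labels.
baskets <- list(c1 = c("A", "B", "C"),
                c2 = "A",
                c3 = c("B", "C"))
toy <- as(baskets, "transactions")

size(toy)                    # number of items in each transaction

# Logical indexing keeps only the transactions with at least two items.
big <- toy[size(toy) >= 2]
transactionInfo(big)         # IDs of the transactions that passed the filter
```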
In [22]:
image(trans[1:5])
In [23]:
options(repr.plot.width=4,repr.plot.height=5)
image(sample(trans, 100, replace = FALSE), main = "matrix diagram") # Sampling
  • itemFrequency(trans, type="absolute") returns the same counts as table().
In [24]:
t(itemFrequency(trans, type="absolute"))
가구  가전  기타  니트단품  도자기크리스탈  디자이너부띠끄  문화완구  생활용품  섬유  수입명품  ...  유아동복  장신구  정장셔츠  조리욕실  침구수예  캐릭터캐주얼  타운모피  트래디셔널캐주얼  피혁  행사장
41    140   95    258       105             74              158       3         259   107       ...  222       178     149       165       74        212           33        189               323   13
In [25]:
table(tr.filter$corner)
            가구             가전             기타         니트단품
              41              140               95              258
  도자기크리스탈   디자이너부띠끄         문화완구         생활용품
             105               74              158                3
            섬유         수입명품           스포츠   엘레강스캐주얼
             259              107              281              163
        영캐주얼       유니캐주얼         유아동복           장신구
             235              266              222              178
        정장셔츠         조리욕실         침구수예     캐릭터캐주얼
             149              165               74              212
        타운모피 트래디셔널캐주얼             피혁           행사장
              33              189              323               13 
In [26]:
t(round(itemFrequency(trans)[order(itemFrequency(trans), decreasing = TRUE)],2))
피혁  스포츠  유니캐주얼  섬유  니트단품  영캐주얼  유아동복  캐릭터캐주얼  트래디셔널캐주얼  장신구  ...  가전  수입명품  도자기크리스탈  기타  디자이너부띠끄  침구수예  가구  타운모피  행사장  생활용품
0.66  0.58    0.55        0.53  0.53      0.48      0.46      0.44          0.39              0.37    ...  0.29  0.22      0.22            0.2   0.15            0.15      0.08  0.07      0.03    0.01
In [43]:
options(repr.plot.width=4,repr.plot.height=4)
itemFrequencyPlot(trans, support=0.2, cex.names=0.8,family = "HYsanB")

In [44]:
itemFrequencyPlot(trans, topN = 20, main = "support top 20 items")

  • rules <- apriori(trans, parameter=list(support=0.2, confidence=0.8), appearance=list(rhs="스포츠",default="lhs"))
  • The appearance argument restricts the output to rules whose right-hand side is 스포츠 (sports).
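
A minimal sketch of the `appearance` restriction, run on the small matrix example from earlier instead of the store data. The thresholds and the choice of `Tr1` as the target item are assumptions made for illustration.

```r
library(arules)

# Toy transactions (same matrix as the earlier example).
m <- matrix(c(1,1,1,0,0,
              1,1,0,0,0,
              1,1,0,1,0,
              0,0,1,0,1,
              1,1,0,1,1), ncol = 5,
            dimnames = list(c("a","b","c","d","e"), paste0("Tr", 1:5)))
toy <- as(m, "transactions")

# rhs = "Tr1": only Tr1 may appear on the right-hand side.
# default = "lhs": every other item may appear only on the left-hand side.
r <- apriori(toy,
             parameter  = list(support = 0.4, confidence = 0.8),
             appearance = list(rhs = "Tr1", default = "lhs"),
             control    = list(verbose = FALSE))
inspect(r)
```

Restricting the consequent at mining time does the same job as filtering afterwards with `subset(rules, rhs %in% ...)`, but avoids generating the unwanted rules in the first place.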
In [27]:
rules <- apriori(trans, parameter=list(support=0.2, confidence=0.8))
summary(rules)
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.8    0.1    1 none FALSE            TRUE       5     0.2      1
 maxlen target   ext
     10  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 97

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[24 item(s), 487 transaction(s)] done [0.00s].
sorting and recoding items ... [17 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 done [0.00s].
writing ... [70 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
set of 70 rules

rule length distribution (lhs + rhs):sizes
 2  3  4
 1 40 29

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    2.0     3.0     3.0     3.4     4.0     4.0

summary of quality measures:
    support         confidence          lift
 Min.   :0.2012   Min.   :0.8000   Min.   :1.233
 1st Qu.:0.2115   1st Qu.:0.8182   1st Qu.:1.283
 Median :0.2259   Median :0.8413   Median :1.353
 Mean   :0.2341   Mean   :0.8444   Mean   :1.383
 3rd Qu.:0.2464   3rd Qu.:0.8624   3rd Qu.:1.463
 Max.   :0.3265   Max.   :0.9160   Max.   :1.696

mining info:
  data ntransactions support confidence
 trans           487     0.2        0.8
  • Rule length counts the items on both sides of a rule:
    • length 2 (a->b): 1 rule
    • length 3 (a+b->c): 40 rules
    • length 4 (a+b+c->d): 29 rules
In [28]:
inspect(rules)
     lhs                                     rhs          support   confidence
[1]  {트래디셔널캐주얼}                   => {피혁}       0.3264887 0.8412698
[2]  {스포츠,엘레강스캐주얼}              => {피혁}       0.2012320 0.8596491
[3]  {스포츠,조리욕실}                    => {피혁}       0.2053388 0.8403361
[4]  {조리욕실,피혁}                      => {스포츠}     0.2053388 0.8196721
[5]  {영캐주얼,장신구}                    => {피혁}       0.2012320 0.8596491
[6]  {유니캐주얼,장신구}                  => {피혁}       0.2012320 0.8521739
[7]  {니트단품,장신구}                    => {섬유}       0.2032854 0.8114754
[8]  {니트단품,장신구}                    => {피혁}       0.2114990 0.8442623
[9]  {섬유,장신구}                        => {피혁}       0.2156057 0.8333333
[10] {스포츠,장신구}                      => {피혁}       0.2114990 0.8306452
[11] {영캐주얼,트래디셔널캐주얼}          => {유니캐주얼} 0.2053388 0.8264463
[12] {영캐주얼,트래디셔널캐주얼}          => {니트단품}   0.2094456 0.8429752
[13] {영캐주얼,트래디셔널캐주얼}          => {스포츠}     0.2053388 0.8264463
[14] {영캐주얼,트래디셔널캐주얼}          => {피혁}       0.2197125 0.8842975
[15] {유니캐주얼,트래디셔널캐주얼}        => {스포츠}     0.2299795 0.8115942
[16] {유니캐주얼,트래디셔널캐주얼}        => {피혁}       0.2464066 0.8695652
[17] {니트단품,트래디셔널캐주얼}          => {섬유}       0.2135524 0.8062016
[18] {니트단품,트래디셔널캐주얼}          => {스포츠}     0.2156057 0.8139535
[19] {니트단품,트래디셔널캐주얼}          => {피혁}       0.2381930 0.8992248
[20] {섬유,트래디셔널캐주얼}              => {스포츠}     0.2217659 0.8000000
[21] {섬유,트래디셔널캐주얼}              => {피혁}       0.2340862 0.8444444
[22] {스포츠,트래디셔널캐주얼}            => {피혁}       0.2628337 0.8951049
[23] {트래디셔널캐주얼,피혁}              => {스포츠}     0.2628337 0.8050314
[24] {섬유,유아동복}                      => {피혁}       0.2114990 0.8174603
[25] {영캐주얼,캐릭터캐주얼}              => {피혁}       0.2320329 0.8759690
[26] {유니캐주얼,캐릭터캐주얼}            => {피혁}       0.2299795 0.8358209
[27] {니트단품,캐릭터캐주얼}              => {피혁}       0.2464066 0.8633094
[28] {섬유,캐릭터캐주얼}                  => {피혁}       0.2505133 0.8413793
[29] {스포츠,캐릭터캐주얼}                => {피혁}       0.2464066 0.9160305
[30] {스포츠,영캐주얼}                    => {유니캐주얼} 0.2710472 0.8198758
[31] {영캐주얼,유니캐주얼}                => {피혁}       0.2956879 0.8421053
[32] {섬유,영캐주얼}                      => {니트단품}   0.2546201 0.8000000
[33] {니트단품,영캐주얼}                  => {피혁}       0.2874743 0.8484848
[34] {섬유,영캐주얼}                      => {피혁}       0.2689938 0.8451613
[35] {스포츠,영캐주얼}                    => {피혁}       0.2813142 0.8509317
[36] {니트단품,유니캐주얼}                => {피혁}       0.2813142 0.8303030
[37] {섬유,유니캐주얼}                    => {피혁}       0.2895277 0.8294118
[38] {스포츠,유니캐주얼}                  => {피혁}       0.3223819 0.8532609
[39] {니트단품,섬유}                      => {피혁}       0.2977413 0.8479532
[40] {니트단품,스포츠}                    => {피혁}       0.2936345 0.8461538
[41] {섬유,스포츠}                        => {피혁}       0.3100616 0.8388889
[42] {스포츠,유니캐주얼,트래디셔널캐주얼} => {피혁}       0.2094456 0.9107143
[43] {유니캐주얼,트래디셔널캐주얼,피혁}   => {스포츠}     0.2094456 0.8500000
[44] {섬유,영캐주얼,유니캐주얼}           => {니트단품}   0.2032854 0.8250000
[45] {니트단품,섬유,유니캐주얼}           => {영캐주얼}   0.2032854 0.8181818
[46] {니트단품,영캐주얼,유니캐주얼}       => {스포츠}     0.2114990 0.8240000
[47] {니트단품,스포츠,영캐주얼}           => {유니캐주얼} 0.2114990 0.8442623
[48] {니트단품,스포츠,유니캐주얼}         => {영캐주얼}   0.2114990 0.8174603
[49] {니트단품,영캐주얼,유니캐주얼}       => {피혁}       0.2238193 0.8720000
[50] {섬유,영캐주얼,유니캐주얼}           => {스포츠}     0.2032854 0.8250000
[51] {섬유,스포츠,영캐주얼}               => {유니캐주얼} 0.2032854 0.8181818
[52] {섬유,영캐주얼,유니캐주얼}           => {피혁}       0.2197125 0.8916667
[53] {섬유,영캐주얼,피혁}                 => {유니캐주얼} 0.2197125 0.8167939
[54] {스포츠,영캐주얼,유니캐주얼}         => {피혁}       0.2381930 0.8787879
[55] {영캐주얼,유니캐주얼,피혁}           => {스포츠}     0.2381930 0.8055556
[56] {스포츠,영캐주얼,피혁}               => {유니캐주얼} 0.2381930 0.8467153
[57] {니트단품,스포츠,영캐주얼}           => {섬유}       0.2012320 0.8032787
[58] {섬유,스포츠,영캐주얼}               => {니트단품}   0.2012320 0.8099174
[59] {니트단품,섬유,영캐주얼}             => {피혁}       0.2258727 0.8870968
[60] {섬유,영캐주얼,피혁}                 => {니트단품}   0.2258727 0.8396947
[61] {니트단품,스포츠,영캐주얼}           => {피혁}       0.2217659 0.8852459
[62] {섬유,스포츠,영캐주얼}               => {피혁}       0.2197125 0.8842975
[63] {섬유,영캐주얼,피혁}                 => {스포츠}     0.2197125 0.8167939
[64] {니트단품,섬유,유니캐주얼}           => {피혁}       0.2258727 0.9090909
[65] {니트단품,유니캐주얼,피혁}           => {섬유}       0.2258727 0.8029197
[66] {니트단품,스포츠,유니캐주얼}         => {피혁}       0.2279261 0.8809524
[67] {니트단품,유니캐주얼,피혁}           => {스포츠}     0.2279261 0.8102190
[68] {섬유,스포츠,유니캐주얼}             => {피혁}       0.2361396 0.8914729
[69] {섬유,유니캐주얼,피혁}               => {스포츠}     0.2361396 0.8156028
[70] {니트단품,섬유,스포츠}               => {피혁}       0.2320329 0.9040000
     lift
[1]  1.268416
[2]  1.296127
[3]  1.267008
[4]  1.420571
[5]  1.296127
[6]  1.284857
[7]  1.525824
[8]  1.272928
[9]  1.256450
[10] 1.252397
[11] 1.513080
[12] 1.591197
[13] 1.432311
[14] 1.333291
[15] 1.406571
[16] 1.311078
[17] 1.515908
[18] 1.410660
[19] 1.355797
[20] 1.386477
[21] 1.273203
[22] 1.349585
[23] 1.395197
[24] 1.232518
[25] 1.320733
[26] 1.260201
[27] 1.301646
[28] 1.268581
[29] 1.381136
[30] 1.501051
[31] 1.269676
[32] 1.510078
[33] 1.279294
[34] 1.274283
[35] 1.282984
[36] 1.251881
[37] 1.250537
[38] 1.286495
[39] 1.278493
[40] 1.275780
[41] 1.264826
[42] 1.373120
[43] 1.473132
[44] 1.557267
[45] 1.695551
[46] 1.428071
[47] 1.545698
[48] 1.694056
[49] 1.314749
[50] 1.429804
[51] 1.497949
[52] 1.344401
[53] 1.495408
[54] 1.324984
[55] 1.396105
[56] 1.550189
[57] 1.510412
[58] 1.528797
[59] 1.337511
[60] 1.585005
[61] 1.334721
[62] 1.333291
[63] 1.415582
[64] 1.370673
[65] 1.509737
[66] 1.328247
[67] 1.404187
[68] 1.344109
[69] 1.413518
[70] 1.362997
  • Many of these rules state the obvious.
    • In other words, we need to drill down to a finer level of detail (level down).
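
The quality measures in the listing can be checked by hand. The sketch below recomputes support, confidence, and lift for rule [1], {트래디셔널캐주얼} => {피혁}, from the item counts shown by `table(tr.filter$corner)`. The joint count of 159 is not printed anywhere in the output; it is an assumption derived from the reported support (0.3264887 × 487).

```r
n          <- 487   # number of transactions
count_lhs  <- 189   # transactions containing 트래디셔널캐주얼 (from table())
count_rhs  <- 323   # transactions containing 피혁 (from table())
count_both <- 159   # assumed joint count, implied by support 0.3264887 * 487

support    <- count_both / n                 # P(lhs and rhs)
confidence <- count_both / count_lhs         # P(rhs | lhs)
lift       <- confidence / (count_rhs / n)   # confidence relative to P(rhs)

round(c(support = support, confidence = confidence, lift = lift), 6)
```

A lift only slightly above 1 means the left-hand side adds little beyond the popularity of 피혁 itself, which bought by about 66% of customers.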
In [29]:
inspect(sort(rules, by = "lift")[1:30])
     lhs                                   rhs          support   confidence
[1]  {니트단품,섬유,유니캐주얼}         => {영캐주얼}   0.2032854 0.8181818
[2]  {니트단품,스포츠,유니캐주얼}       => {영캐주얼}   0.2114990 0.8174603
[3]  {영캐주얼,트래디셔널캐주얼}        => {니트단품}   0.2094456 0.8429752
[4]  {섬유,영캐주얼,피혁}               => {니트단품}   0.2258727 0.8396947
[5]  {섬유,영캐주얼,유니캐주얼}         => {니트단품}   0.2032854 0.8250000
[6]  {스포츠,영캐주얼,피혁}             => {유니캐주얼} 0.2381930 0.8467153
[7]  {니트단품,스포츠,영캐주얼}         => {유니캐주얼} 0.2114990 0.8442623
[8]  {섬유,스포츠,영캐주얼}             => {니트단품}   0.2012320 0.8099174
[9]  {니트단품,장신구}                  => {섬유}       0.2032854 0.8114754
[10] {니트단품,트래디셔널캐주얼}        => {섬유}       0.2135524 0.8062016
[11] {영캐주얼,트래디셔널캐주얼}        => {유니캐주얼} 0.2053388 0.8264463
[12] {니트단품,스포츠,영캐주얼}         => {섬유}       0.2012320 0.8032787
[13] {섬유,영캐주얼}                    => {니트단품}   0.2546201 0.8000000
[14] {니트단품,유니캐주얼,피혁}         => {섬유}       0.2258727 0.8029197
[15] {스포츠,영캐주얼}                  => {유니캐주얼} 0.2710472 0.8198758
[16] {섬유,스포츠,영캐주얼}             => {유니캐주얼} 0.2032854 0.8181818
[17] {섬유,영캐주얼,피혁}               => {유니캐주얼} 0.2197125 0.8167939
[18] {유니캐주얼,트래디셔널캐주얼,피혁} => {스포츠}     0.2094456 0.8500000
[19] {영캐주얼,트래디셔널캐주얼}        => {스포츠}     0.2053388 0.8264463
[20] {섬유,영캐주얼,유니캐주얼}         => {스포츠}     0.2032854 0.8250000
[21] {니트단품,영캐주얼,유니캐주얼}     => {스포츠}     0.2114990 0.8240000
[22] {조리욕실,피혁}                    => {스포츠}     0.2053388 0.8196721
[23] {섬유,영캐주얼,피혁}               => {스포츠}     0.2197125 0.8167939
[24] {섬유,유니캐주얼,피혁}             => {스포츠}     0.2361396 0.8156028
[25] {니트단품,트래디셔널캐주얼}        => {스포츠}     0.2156057 0.8139535
[26] {유니캐주얼,트래디셔널캐주얼}      => {스포츠}     0.2299795 0.8115942
[27] {니트단품,유니캐주얼,피혁}         => {스포츠}     0.2279261 0.8102190
[28] {영캐주얼,유니캐주얼,피혁}         => {스포츠}     0.2381930 0.8055556
[29] {트래디셔널캐주얼,피혁}            => {스포츠}     0.2628337 0.8050314
[30] {섬유,트래디셔널캐주얼}            => {스포츠}     0.2217659 0.8000000
     lift
[1]  1.695551
[2]  1.694056
[3]  1.591197
[4]  1.585005
[5]  1.557267
[6]  1.550189
[7]  1.545698
[8]  1.528797
[9]  1.525824
[10] 1.515908
[11] 1.513080
[12] 1.510412
[13] 1.510078
[14] 1.509737
[15] 1.501051
[16] 1.497949
[17] 1.495408
[18] 1.473132
[19] 1.432311
[20] 1.429804
[21] 1.428071
[22] 1.420571
[23] 1.415582
[24] 1.413518
[25] 1.410660
[26] 1.406571
[27] 1.404187
[28] 1.396105
[29] 1.395197
[30] 1.386477
In [30]:
rules.target <- subset(rules, rhs %in% "스포츠" & lift > 1.4)
In [31]:
inspect(sort(rules.target, by="confidence"))
     lhs                                   rhs      support   confidence
[1]  {유니캐주얼,트래디셔널캐주얼,피혁} => {스포츠} 0.2094456 0.8500000
[2]  {영캐주얼,트래디셔널캐주얼}        => {스포츠} 0.2053388 0.8264463
[3]  {섬유,영캐주얼,유니캐주얼}         => {스포츠} 0.2032854 0.8250000
[4]  {니트단품,영캐주얼,유니캐주얼}     => {스포츠} 0.2114990 0.8240000
[5]  {조리욕실,피혁}                    => {스포츠} 0.2053388 0.8196721
[6]  {섬유,영캐주얼,피혁}               => {스포츠} 0.2197125 0.8167939
[7]  {섬유,유니캐주얼,피혁}             => {스포츠} 0.2361396 0.8156028
[8]  {니트단품,트래디셔널캐주얼}        => {스포츠} 0.2156057 0.8139535
[9]  {유니캐주얼,트래디셔널캐주얼}      => {스포츠} 0.2299795 0.8115942
[10] {니트단품,유니캐주얼,피혁}         => {스포츠} 0.2279261 0.8102190
     lift
[1]  1.473132
[2]  1.432311
[3]  1.429804
[4]  1.428071
[5]  1.420571
[6]  1.415582
[7]  1.413518
[8]  1.410660
[9]  1.406571
[10] 1.404187
In [32]:
rule.interest <- subset(rules, items %in% c("장신구", "섬유"))
inspect(rule.interest[1:10])
     lhs                            rhs      support   confidence lift
[1]  {영캐주얼,장신구}           => {피혁}   0.2012320 0.8596491  1.296127
[2]  {유니캐주얼,장신구}         => {피혁}   0.2012320 0.8521739  1.284857
[3]  {니트단품,장신구}           => {섬유}   0.2032854 0.8114754  1.525824
[4]  {니트단품,장신구}           => {피혁}   0.2114990 0.8442623  1.272928
[5]  {섬유,장신구}               => {피혁}   0.2156057 0.8333333  1.256450
[6]  {스포츠,장신구}             => {피혁}   0.2114990 0.8306452  1.252397
[7]  {니트단품,트래디셔널캐주얼} => {섬유}   0.2135524 0.8062016  1.515908
[8]  {섬유,트래디셔널캐주얼}     => {스포츠} 0.2217659 0.8000000  1.386477
[9]  {섬유,트래디셔널캐주얼}     => {피혁}   0.2340862 0.8444444  1.273203
[10] {섬유,유아동복}             => {피혁}   0.2114990 0.8174603  1.232518
In [32]:
write(rules.target, file="arules.csv", sep=",", row.names=FALSE)
In [35]:
library(pmml) # PMML is an international standard document format for exchanging models
In [36]:
write.PMML(rules.target, file = "arules.xml")
"arules.xml"

Visualize Association Rules using arulesViz package

In [38]:
library(arulesViz)
In [39]:
plot(rules)
In [43]:
plot(sort(rules, by = "lift")[1:20], method = "grouped")

In [44]:
plot(rules, method = "graph", control = list(type="items"))

Exercise

In [3]:
data <- read.delim("shoppingmall.txt", stringsAsFactors=FALSE)
In [4]:
head(data,3)
  ID heel tee skirt knit jacket jewelry coat flat shorts blous
1  1    1   0     0    0      0       0    0    0      1     0
2  2    1   0     0    0      0       0    0    1      0     0
3  3    1   0     0    0      0       0    0    1      1     0
In [6]:
st <- as.matrix(data[,-1])
st[1:5,]
heel tee skirt knit jacket jewelry coat flat shorts blous
   1   0     0    0      0       0    0    0      1     0
   1   0     0    0      0       0    0    1      0     0
   1   0     0    0      0       0    0    1      1     0
   1   0     0    0      1       1    0    0      0     0
   1   0     0    0      0       0    0    0      0     0
In [7]:
trans <- as(st, "transactions")
In [46]:
inspect(trans[1:20])
     items
[1]  {heel,shorts}
[2]  {heel,flat}
[3]  {heel,flat,shorts}
[4]  {heel,jacket,jewelry}
[5]  {heel}
[6]  {heel,jewelry,shorts,blous}
[7]  {heel,jewelry}
[8]  {heel}
[9]  {heel,jacket,shorts,blous}
[10] {heel}
[11] {heel}
[12] {heel,tee,skirt,jacket,jewelry,shorts}
[13] {heel}
[14] {heel,shorts,blous}
[15] {heel}
[16] {heel,skirt}
[17] {heel}
[18] {heel,shorts}
[19] {heel,jewelry,blous}
[20] {heel,knit,blous}
In [49]:
options(repr.plot.width=4,repr.plot.height=3)
image(trans[1:5])
In [47]:
inspect(trans[1:2])
    items
[1] {heel,shorts}
[2] {heel,flat}
In [51]:
options(repr.plot.width=4,repr.plot.height=7)
image(sample(trans, 100, replace = FALSE), main = "matrix diagram")
In [52]:
rules <- apriori(trans, parameter=list(support=0.01, confidence=0.8))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.8    0.1    1 none FALSE            TRUE       5    0.01      1
 maxlen target   ext
     10  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 7

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 786 transaction(s)] done [0.00s].
sorting and recoding items ... [10 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 7 8 done [0.00s].
writing ... [869 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
In [53]:
summary(rules)
set of 869 rules

rule length distribution (lhs + rhs):sizes
  2   3   4   5   6   7   8
  5  45 185 306 233  83  12

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  2.000   4.000   5.000   5.167   6.000   8.000

summary of quality measures:
    support          confidence          lift
 Min.   :0.01018   Min.   :0.8000   Min.   :1.625
 1st Qu.:0.01272   1st Qu.:0.8333   1st Qu.:1.885
 Median :0.01654   Median :0.8889   Median :2.031
 Mean   :0.02155   Mean   :0.8947   Mean   :2.055
 3rd Qu.:0.02417   3rd Qu.:0.9474   3rd Qu.:2.196
 Max.   :0.10051   Max.   :1.0000   Max.   :4.345

mining info:
  data ntransactions support confidence
 trans           786    0.01        0.8
In [57]:
inspect(sort(rules, by = "lift")[1:20])
     lhs                                   rhs      support    confidence
[1]  {knit,jewelry,flat}                => {jacket} 0.01145038 0.8181818
[2]  {knit,jewelry,flat,blous}          => {jacket} 0.01145038 0.8181818
[3]  {tee,knit,flat}                    => {jacket} 0.01017812 0.8000000
[4]  {tee,knit,flat,blous}              => {jacket} 0.01017812 0.8000000
[5]  {tee,knit,jewelry,flat}            => {jacket} 0.01017812 0.8000000
[6]  {tee,knit,jewelry,flat,blous}      => {jacket} 0.01017812 0.8000000
[7]  {heel,tee,skirt,flat,shorts,blous} => {jacket} 0.01526718 0.8000000
[8]  {skirt,jewelry,coat}               => {tee}    0.01399491 1.0000000
[9]  {skirt,coat,shorts}                => {tee}    0.01781170 1.0000000
[10] {jewelry,coat,shorts}              => {tee}    0.01526718 1.0000000
[11] {skirt,jacket,coat,shorts}         => {tee}    0.01272265 1.0000000
[12] {jacket,coat,shorts,blous}         => {tee}    0.01272265 1.0000000
[13] {jacket,jewelry,coat,shorts}       => {tee}    0.01017812 1.0000000
[14] {heel,jacket,coat,shorts}          => {tee}    0.01399491 1.0000000
[15] {skirt,jewelry,coat,blous}         => {tee}    0.01145038 1.0000000
[16] {skirt,coat,shorts,blous}          => {tee}    0.01526718 1.0000000
[17] {skirt,jewelry,coat,shorts}        => {tee}    0.01272265 1.0000000
[18] {heel,skirt,jewelry,coat}          => {tee}    0.01272265 1.0000000
[19] {heel,skirt,coat,shorts}           => {tee}    0.01653944 1.0000000
[20] {jewelry,coat,shorts,blous}        => {tee}    0.01399491 1.0000000
     lift
[1]  4.345209
[2]  4.345209
[3]  4.248649
[4]  4.248649
[5]  4.248649
[6]  4.248649
[7]  4.248649
[8]  2.487342
[9]  2.487342
[10] 2.487342
[11] 2.487342
[12] 2.487342
[13] 2.487342
[14] 2.487342
[15] 2.487342
[16] 2.487342
[17] 2.487342
[18] 2.487342
[19] 2.487342
[20] 2.487342