A Survey of Sparse Data Handling on Web Marketing Data

Takumi Uchida, Kenichi Yoshida
[...] 3-29-1
E-mail: skillful.boy@gmail.com

Abstract: Web marketing is an important corporate activity, and its data are often high-dimensional and sparse. This sparseness hinders the analysis of web marketing data, and various approaches have been studied to alleviate the problem. For example, clustering of users and items is used to improve collaborative filtering results, and predictor-variable reduction is used to estimate users' responses to web advertisements. In this paper, we classify the purposes of web marketing into three objectives (Customer Attraction, Customer Retention, and Cross-Sales) and survey solutions to the sparse-data problem for each objective.

1 [...]

[...] [2] [...] Web [...] Customer Attraction, Customer Retention, and Cross-Sales [...] CRM [...] 3 [...]:

1) In Customer Relationship Management (CRM) [7], Customer Attraction corresponds to Direct Marketing, Customer Retention to Loyalty Programs, and Cross-Sales to Market basket analysis.
2) [...] Web [...] [2] [...]
3) [...] Web [...] 3 [...] Web [...] 3 [...]

2 Web [...]

Customer Attraction: [...] Computational Advertisement (e.g., [1, 3, 9]) [...] Web [...] % [...]
Customer Retention: [...] Loyalty Program [...] e [...]
Cross-Sales: [...] Web [...]
Figure 1: Purposes of web marketing, measures, and representative methods [...] Web [...]
  Purposes of web marketing: Customer Attraction, Customer Retention, Cross-Sales
  Measures: Computational Advertisement, Loyalty Program, Recommender System, Personalization
  Representative methods: Logistic Regression, Matrix Factorization, Content-Based Recommendations, Collaborative Filtering, Feature-based modeling, Similarity-based collaborative filtering

[...] Web [...] ([8, 10]) [...] Personalization [6] [...] Cross-Sales [...] Web [...] Customer Attraction, Customer Retention, Cross-Sales [...] Collaborative Filtering [...] Association rules [...] URL [...] Web [...] Figure 1 [...] Computational Advertisement [...] Personalization [...]

3 Computational Advertisement

Computational Advertisement (CA) [...] Web [...]

3.1 [...]

[...] [5] [...] Matrix factorization [...] Feature-based modeling [9] [...] Web [...] [11] [...] Section 3.2 [...] Similarity-based collaborative filtering [...] k-nearest neighbor [...] Personalization [...] Matrix factorization [...] SVD [...] Personalization [...]

3.2 [...]

[...] [11] [...] AIC [...] Web [...] Let p_i be the true CVR of ad segment i, c_i the number of conversions,
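and n_i the number of clicks, so that the observed CVR is c_i / n_i. Assume that c_i follows Binomial(n_i, p_i) and that p_i is estimated by c_i / n_i. The mean and variance of c_i are

$$ \mathrm{E}[c_i] = n_i p_i \tag{1} $$

$$ \mathrm{Var}[c_i] = n_i p_i (1 - p_i) \tag{2} $$

From (2), the variance of the CVR estimate \hat{p}_i is

$$ \mathrm{Var}[\hat{p}_i] = \mathrm{Var}\left[\frac{c_i}{n_i}\right] = \frac{n_i p_i (1 - p_i)}{n_i^2} = \frac{p_i (1 - p_i)}{n_i} \tag{3} $$

From (3), [...] CVR [...] p_i [...] [11] [...] the CVR error E_i of ad segment i is

$$ E_i = \frac{p_i (1 - p_i)}{n_i} \quad \left(\text{where } p_i = \frac{c_i}{n_i}\right) \tag{4} $$

[...] CVR [...] 20 [...] For example, targeting by {[...], [...]} x {20s, 30s, 40s} x {[...], [...], [...]} yields 2*3*3 = 18 ad segments [...] CVR [...] {[...], [...], 30s, 40s} [...]

Figure 2: [...] CVR [...]

Before merging (AIC: 1,000):

Segment ID   10-19   20-29   Area: Tokyo   Clicks   Conversions
ad01         0       0       0             500      0
ad02         0       0       1             600      2
ad03         0       1       0             450      4
ad04         0       1       1             300      2
ad05         1       0       0             800      1
ad06         1       0       1             950      2
ad07         1       1       0             50       0
ad08         1       1       1             150      2

After merging (AIC: 950):

Segment ID   10-19   20-29   Area: Tokyo   Clicks   Conversions
ad01+ad02    0       0       -             1,100    2
ad03+ad04    0       1       -             750      6
ad05+ad06    1       0       -             1,750    3
ad07+ad08    1       1       -             200      2

Procedure shown in the figure:
1. Perform a Poisson regression on the data in the first table, following the model log [...] = [...] + [...].
2. Search for an explanatory variable whose removal improves the AIC of the regression, and remove it.
3. Repeat steps 1 and 2 for as long as the AIC keeps improving.
Result: the number of ad segments decreases and the amount of data per segment increases.

[...] {[...]} x {20s, [...]} x {[...], [...], [...]} yields 1*2*3 = 6 segments instead of 18 [...] CVR [...] (Figure 2) [...] CVR [...] (in Figure 2, ad01 and ad02 [...]). The regression model is

$$ \log \mathrm{CVR}_i = \log \frac{c_i}{n_i} = \beta_0 + \sum_{j=1}^{J} \beta_j x_{ij} \tag{5} $$

In (5), x_{ij} is the j-th attribute of ad segment i [...] CVR [...] AIC [...] 80.97% [...] 64.5% [...]

4 Personalization

Personalization [...] Content-Based Recommendations (CBR) [...] Collaborative Filtering (CF) [...]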
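Returning to the CVR estimate of Section 3.2, the following minimal Python sketch evaluates c_i / n_i and the variance in equation (3) for the click and conversion counts shown in Figure 2, before and after merging segment pairs. The function names `cvr_stats` and `merge` are hypothetical, and reporting the square root of the variance as a standard error is an assumption made here for readability, not a definition taken from [11].

```python
import math

# (clicks n_i, conversions c_i) per ad segment, copied from Figure 2.
segments = {
    "ad01": (500, 0), "ad02": (600, 2), "ad03": (450, 4), "ad04": (300, 2),
    "ad05": (800, 1), "ad06": (950, 2), "ad07": (50, 0), "ad08": (150, 2),
}

def cvr_stats(clicks, conversions):
    """Return (CVR estimate, plug-in variance from eq. (3), its square root)."""
    p = conversions / clicks
    var = p * (1 - p) / clicks
    return p, var, math.sqrt(var)

def merge(segments, pairs):
    """Pool the clicks and conversions of each pair of segments."""
    merged = {}
    for a, b in pairs:
        clicks = segments[a][0] + segments[b][0]
        conversions = segments[a][1] + segments[b][1]
        merged[f"{a}+{b}"] = (clicks, conversions)
    return merged

# Pairs that differ only in the removed attribute, as in Figure 2.
pairs = [("ad01", "ad02"), ("ad03", "ad04"), ("ad05", "ad06"), ("ad07", "ad08")]
merged = merge(segments, pairs)

for name, (n, c) in {**segments, **merged}.items():
    p, var, se = cvr_stats(n, c)
    print(f"{name:10s} clicks={n:5d} conversions={c:2d} CVR={p:.4f} SE={se:.4f}")
```

In this data, ad01 and ad07 have no conversions, so the plug-in estimates of both the CVR and its variance are 0, which shows how little a sparse segment reveals about its true CVR. The merged segment ad01+ad02 has 1,100 clicks and 2 conversions, matching the second table of Figure 2.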
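CBR [...] Web [...] CF [...] k-nearest neighbor [...]

4.1 Collaborative Filtering

[...] Web [...] CF [...] Let u and v be users, i an item, r_u and r_v the rating vectors of u and v, r_{u,i} the rating of item i by user u, and \bar{r}_u the mean rating of user u. The similarity between u and v is

$$ s(u, v) = \frac{\sum_i (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_i (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_i (r_{v,i} - \bar{r}_v)^2}} \tag{6} $$

In the k-nearest neighbor method, let V be the set of neighbors v [...] item i. The predicted rating of item i for user u is

$$ p(u, i) = \bar{r}_u + \frac{\sum_{v \in V} s(u, v)(r_{v,i} - \bar{r}_v)}{\sum_{v \in V} s(u, v)} \tag{7} $$

[...] user-based [...] item-based [...] CF [...] CBR [...]

4.2 Matrix Factorization

[...] Web [...] Personalization [...] Matrix Factorization (MF) [...] [4] [...] Netflix Prize contest [...]

Table 1: Rating matrix R_um (rows User1 to User6, columns Movie1 to Movie3; [...])

User     Observed ratings
User1    5
User2    4
User3    2, 5
User4    1, 4
User5    4
User6    3, 2

[...] Ratings range from 1 to 5 (Table 1). [...] R_um [...] MF [...] The model is given by (8):

$$ r_{um} = \mu + b_{user}(u) + b_{movie}(m) + a_{um} \tag{8} $$

Here r_um is the rating of movie m by user u, \mu is [...], b_user(u) is [...] of user u, b_movie(m) is [...] of movie m, and a_um is the term [...] SVD [...] 4 [...] 4 [...] SVD [...] a_um [...] Writing base_um = \mu + b_user(u) + b_movie(m), (8) gives a_um from base_um as

$$ a_{um} = r_{um} - base_{um} \tag{9} $$

Let A be the matrix of a_um. With two factors f_1 and f_2, [...] F_u [...] F_m [...] A is expressed as in (10):

$$ A = F_u F_m^{T} \tag{10} $$

[...] R_um [...] base_um [...] A [...] fu, fm [...] (10) [...] Figure 3 [...]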
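Equations (8) to (10) can be sketched in Python as follows. The ratings are hypothetical toy values (not those of Table 1), the biases are estimated by simple sequential means, and the factors are fitted by stochastic gradient descent on the observed residuals only. All three choices, and all function names, are assumptions for illustration and not the procedure of [4].

```python
import random

# Hypothetical toy ratings on a 1-5 scale: (user, movie) -> rating.
ratings = {
    ("u1", "m1"): 5, ("u1", "m2"): 3,
    ("u2", "m1"): 4, ("u2", "m3"): 2,
    ("u3", "m2"): 2, ("u3", "m3"): 5,
    ("u4", "m1"): 1, ("u4", "m2"): 4,
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def fit_baseline(ratings):
    """Estimate mu, b_user and b_movie of eq. (8) by sequential means (assumption)."""
    mu = mean(ratings.values())
    users = {u for u, _ in ratings}
    movies = {m for _, m in ratings}
    b_user = {u: mean(r - mu for (uu, _), r in ratings.items() if uu == u)
              for u in users}
    b_movie = {m: mean(r - mu - b_user[u]
                       for (u, mm), r in ratings.items() if mm == m)
               for m in movies}
    return mu, b_user, b_movie

def residuals(ratings, mu, b_user, b_movie):
    """a_um = r_um - base_um (eq. (9)), for observed cells only."""
    return {(u, m): r - (mu + b_user[u] + b_movie[m])
            for (u, m), r in ratings.items()}

def predict(fu, fm, u, m):
    """Entry (u, m) of F_u F_m^T (eq. (10))."""
    return sum(x * y for x, y in zip(fu[u], fm[m]))

def factorize(a, k=2, steps=2000, lr=0.05, seed=0):
    """Fit k factors per user and per movie by SGD on the observed residuals."""
    rng = random.Random(seed)
    fu = {u: [rng.uniform(-0.1, 0.1) for _ in range(k)]
          for u in sorted({u for u, _ in a})}
    fm = {m: [rng.uniform(-0.1, 0.1) for _ in range(k)]
          for m in sorted({m for _, m in a})}
    for _ in range(steps):
        for (u, m), target in a.items():
            err = target - predict(fu, fm, u, m)
            for j in range(k):
                pu, qm = fu[u][j], fm[m][j]
                fu[u][j] += lr * err * qm
                fm[m][j] += lr * err * pu
    return fu, fm

def squared_error(a, fu, fm):
    """Sum of squared differences between a_um and its rank-k approximation."""
    return sum((t - predict(fu, fm, u, m)) ** 2 for (u, m), t in a.items())

mu, b_user, b_movie = fit_baseline(ratings)
a = residuals(ratings, mu, b_user, b_movie)
fu, fm = factorize(a)
print(f"mu = {mu:.2f}, squared error on observed cells = {squared_error(a, fu, fm):.4f}")
```

Only observed cells enter the fit, so the product of the fitted factors also yields values for unobserved (user, movie) pairs, which is what makes the factorization usable for sparse rating data.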
Figure 3: [...] equation (10) [...]

A (rows u_1 to u_6, columns m_1 to m_3; [...]):

         Observed entries
u_1      0.00
u_2      0.00
u_3      -0.58, 0.58
u_4      -0.88, 0.88
u_5      0.00
u_6      0.21, -0.21

F_u:

         f_1       f_2
u_1      fu_11     fu_12
u_2      fu_21     fu_22
u_3      fu_31     fu_32
u_4      fu_41     fu_42
u_5      fu_51     fu_52
u_6      fu_61     fu_62

F_m^T:

         m_1       m_2       m_3
f_1      fm_11     fm_21     fm_31
f_2      fm_12     fm_22     fm_32

[...] F_u, F_m [...] u_2 [...] m_1 [...] Let S be the set of (u, m) pairs [...] and let k index the factors. F_u and F_m are obtained by

$$ \min \sum_{(u,m) \in S} \left( a_{um} - \sum_{k} fu_{uk}\, fm_{mk} \right)^2 \tag{11} $$

[...] MF [...] A [...] k [...] Personalization [...] (11) [...]

5 [...]

[...] Web [...] Customer Attraction, Customer Retention, Cross-Sales [...] Computational Advertisement [...] Personalization [...] Computational Advertisement: Feature-based modeling [...] Personalization: MF (SVD) and CF (k-nearest neighbor) [...] Web [...] Computational Advertisement [...] Personalization [...] Web [...]

References

[1] Deepak Agarwal, Rahul Agrawal, Rajiv Khanna, and Nagaraj Kota. Estimating rates of rare events with multiple hierarchies through scalable log-linear models. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 213-222. ACM, 2010.
[2] Alex G. Büchner and Maurice D. Mulvenna. Discovering internet marketing intelligence through online analytical web usage mining. ACM SIGMOD Record, Vol. 27, No. 4, pp. 54-61, 1998.
[3] Patrali Chatterjee, Donna L. Hoffman, and Thomas P. Novak. Modeling the clickstream: Implications for web-based advertising efforts. Marketing Science, Vol. 22, No. 4, pp. 520-541, 2003.
[4] Y. Koren. The BellKor solution to the Netflix Grand Prize. 2009. http://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf.
[5] Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Springer Science & Business Media, 2007.
[6] Bamshad Mobasher, Robert Cooley, and Jaideep Srivastava. Automatic personalization based on web usage mining. Communications of the ACM, Vol. 43, No. 8, pp. 142-151, 2000.
[7] Eric W. T. Ngai, Li Xiu, and Dorothy C. K. Chau. Application of data mining techniques in customer relationship management: A literature review and classification. Expert Systems with Applications, Vol. 36, No. 2, pp. 2592-2602, 2009.
[8] Alexandrin Popescul, David M. Pennock, and Steve Lawrence. Probabilistic models for unified collaborative and content-based recommendation in sparse-data environments. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pp. 437-444. Morgan Kaufmann Publishers Inc., 2001.
[9] Matthew Richardson, Ewa Dominowska, and Robert Ragno. Predicting clicks: Estimating the click-through rate for new ads. In Proceedings of the 16th International Conference on World Wide Web, pp. 521-530. ACM, 2007.
[10] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Application of dimensionality reduction in recommender system - a case study. Technical report, DTIC Document, 2000.
[11] Takumi Uchida, Koken Ozaki, and Kenichi Yoshida. Toward a faithful bidding of web advertisement. In HCI in Business, pp. 112-118. Springer, 2014.