No. 178 A Clustering Method for Extracting Closely Similar Recipes in User-generated Recipe Sites 2016 3
Summary

Blog-style recipe portal sites such as Recipe blog and user-generated recipe sites such as Cookpad have become popular. These sites make it easy for users to post and browse information about food and recipes. For example, Recipe blog users post information such as trivia and health food. On Cookpad, each recipe consists of an ingredient list, images, and cooking directions. Because recipe sites are written by many people, they hold a large amount of information, which makes it difficult for users to comprehend the recipes on offer. We propose two methods for extracting recipe information from the internet. First, interest in health-conscious food has grown in recent years, and information about such food is easy to find. However, recipes that use health-conscious ingredients are often not everyday dishes, for example smoothies and potage. We therefore propose a method to extract alternative ingredients for health-conscious cooking. Second, numerous closely similar recipes are posted, deliberately or accidentally, on user-generated recipe sites. These recipes cause information overload and impede users' recipe searches. We therefore propose a clustering method to extract closely similar recipes in user-generated recipe sites. Together, the two methods extract alternative health-conscious ingredients and closely similar recipes from recipe sites, which makes recipe search easier for users.
1 1 FOODIES 2 3 4 2 1 5 2 46,000 2016 1

1 http://www.recipe-blog.jp/
2 FOODIES http://recipe.foodiestv.jp/
3 http://cookpad.com/
4 http://recipe.rakuten.co.jp/
5 http://www.nisshin-oillio.com/report/report/images/120723/120723.pdf
[1] 2 3 4 5 2 Google 6 7 8 [2] [3] [4] word2vec Teng [5] 2 Forbes [6]

6 Google http://www.google.co.jp/landing/recipes/
7 http://nestle.jp/recipe/
8 http://recipe.gnavi.co.jp/
[7] Web [8] Geleijnse [9] Pinxteren [10] [11] [12] DP Wang [13] Li [14] [15] Kuo [16] [17] [18] Web n-gram [19] & [20] 2 3
1: 3 1 1 2 4
3 4 5 3.1 1 2 1 n 3 n 9 44 100 10 90 9 http://www.nii.ac.jp/dsc/idr/rakuten/rakuten.html 10 100 - http://chefgohan.gnavi.co.jp/base100/ 5
1: 1 1 2 3.2 2 1 2 3 6
2: α=5 β=50 4 3 2013 10 1 2014 9 31 1 11 65,192 12 2 3 2 3 11 http://ameblo.jp/ 12 http://fooddb.mext.go.jp/ 7
2: A C B1 3.3 4 (IPC) A23L( ) A47J() H05B( 8
3: B1 C 4: ) (1994 2009 ) 91,736 5 3.2 3.4 Google Web N 1 [21] [22] [22] 3 9
5: 6: 1,000 3.4.1 [23] 240 1,000 71 6 (1) 1,000 7 (2) (3) 10
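7: ...

Dice Dice [24] Dice (4) 3 Dice Dice

\begin{align}
\mathrm{dice}(X, Y) &= \frac{2\,|X \cap Y|}{|X| + |Y|} \tag{1} \\
\mathrm{cosine}(x, y) &= \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}} \tag{2} \\
\mathrm{euclid}(x, y) &= \sqrt{\sum_i (x_i - y_i)^2} \tag{3} \\
\mathrm{manhattan}(x, y) &= \sum_i |x_i - y_i| \tag{4}
\end{align}

x Dice y Dice i 1,000 3.2 Dice 5 8 8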
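The four measures named in equations (1) to (4) can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: it assumes that `dice` takes two sets and that the other three functions take equal-length numeric vectors. The zero-denominator handling is an added convention, and the sample inputs are hypothetical.

```python
import math


def dice(X, Y):
    """Dice coefficient of two sets: 2|X ∩ Y| / (|X| + |Y|)."""
    X, Y = set(X), set(Y)
    if not X and not Y:
        return 0.0  # convention chosen here for two empty sets (assumption)
    return 2 * len(X & Y) / (len(X) + len(Y))


def cosine(x, y):
    """Cosine similarity of two equal-length vectors."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return num / den if den else 0.0  # zero vector: return 0.0 (assumption)


def euclid(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Manhattan (city-block) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Hypothetical inputs, for illustration only.
print(dice({"a", "b"}, {"b", "c"}))
print(cosine([1, 0], [1, 0]))
print(euclid([0, 0], [3, 4]))
print(manhattan([1, 2], [4, 0]))
```

Note that Dice and cosine are similarities (larger means closer), while the Euclidean and Manhattan values are distances (smaller means closer), so rankings based on them sort in opposite directions.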
8: Dice 5

Dice   0.035   0.014   0.012   0.006   0.006
Dice   0.146   0.025   0.022   0.019   0.012
Dice   0.01    0.006   0.004   0.004   0.002

Dice 9 9

3.4.2 Web N

Web N 1 [21] N Web 200 20 7 3.4.1
9: 5 7 N Google Web N 1 [21] 20 7 3.2 7 5 10 10 7 Dice 7 11 11 13
10: 7 5 9178 2446 1331 833 870 82 76 52 23 20 871 355 150 112 66 3.5 2 7 12 12 14
11: 7 5 4 15
12: 4.1 2 4.1.1 1 20 8 5 16
13: 2.73 2.65 2.38 2.68 4.15 3.44 4.58 4.43 2.55 2.50 5 25 5 1 2 3 4 5 25 5 13 13 4.1.2 2 12 25 50 1 2 3 3 14 5 17
3: 1 2 3 4 5 14 5 5 4 83.7% 5 4 80% 10 75% 18
14: 4 5

        (%)     (%)     (%)
        91.7    75.0    83.3    41.6
de      83.3    91.7    91.7    25.0
        58.3    100.0   100.0   41.6
        25.0    100.0   75.0    83.3
de      25.0    91.7    83.3    16.7
        91.7    83.3    91.7    25.0
        75.0    91.7    83.3    33.3
!?      100.0   75.0    75.0    8.3
        75.0    83.3    75.0    25.0
        50.0    75.0    75.0    50.0
        83.3    83.3    83.3    25.0
!       66.7    91.7    75.0    50.0
        66.7    75.0    83.3    58.3
        91.7    91.7    83.3    8.3
!!      91.7    83.3    91.7    33.3
20!     66.7    91.7    91.7    8.3
        16.7    75.0    75.0    8.3
        75.0    83.3    91.7    8.3
        91.7    75.0    75.0    50.0
        83.3    83.3    83.3    8.3
        83.3    75.0    83.3    33.3
        66.7    91.7    75.0    25.0
        75.0    75.0    100.0   41.7
        66.7    75.0    83.3    25.0
?       75.0    75.0    100.0   25.0
(%)     69.3    83.7    83.3    30.3

20 % 83.3% 70% 30.3%
4.2 4 4 4 13 75.1% 26.4% 56.5% 5 1. 2. 3. 4. 5. 6. 13 http://www.maruha-nichiro.co.jp/news center/research/pdf/20130227 recipe cyousa.pdf 20
4: 4.2.1 4.1 1 Repeated Bisection [25] Repeated Bisection 4.2.3 4.2.2 1 1 1 1 4.1 21
5:

RF-IIF (Recipe Frequency-Inverted Ingredient Frequency) [26] S RF IIF S RF IIF

\begin{equation}
S_{RF\text{-}IIF_{i,m}} = \alpha \log \frac{R_m}{R_{it,m}} + \beta \log \frac{R_m}{R_{io,m}} \tag{5}
\end{equation}

i m R_m m R_{it,m} m i t R_{io,m} m i o α i β i 1 1 Repeated Bisection
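As a reading aid for equation (5), the sketch below evaluates a score of the form α·log(R_m / R_it,m) + β·log(R_m / R_io,m). It rests on several assumptions: that the equation has this form, that the three R values are positive counts, and that the logarithm is natural. The function name `s_rf_iif` and the sample numbers are hypothetical. The default weights follow the α = 1.0 and β = 0.5 setting reported in Section 4.4.2.

```python
import math


def s_rf_iif(r_m, r_it_m, r_io_m, alpha=1.0, beta=0.5):
    """Weighted sum of two log ratios, following the assumed form of equation (5).

    r_m, r_it_m and r_io_m stand for R_m, R_it,m and R_io,m; all three are
    assumed to be positive counts. The log base (natural) is an assumption.
    """
    if min(r_m, r_it_m, r_io_m) <= 0:
        raise ValueError("all counts must be positive")
    return alpha * math.log(r_m / r_it_m) + beta * math.log(r_m / r_io_m)


# Hypothetical counts, for illustration only.
print(s_rf_iif(r_m=1000, r_it_m=10, r_io_m=200))
```

Under this reading, the score grows as either denominator count shrinks relative to R_m, and β controls how much the second ratio contributes relative to the first.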
4.2.3 Repeated Bisection

6: Repeated Bisection

Repeated Bisection bayon 14 CLUTO 15 K-means k = 2 n 1 n Repeated Bisection 6 1 1 2 2 3 4 5 6 5

4.3

PHP 16 Solr 17 18

14 Bayon - a simple and fast clustering tool - Google Project Hosting http://code.google.com/p/bayon/
15 CLUTO - Software for Clustering High-Dimensional Datasets http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview
16 PHP http://php.net/
17 Apache Solr http://lucene.apache.org/solr/
18 http://www.nii.ac.jp/dsc/idr/cookpad/cookpad.html
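The idea behind Repeated Bisection, applying K-means with k = 2 again and again until the desired number of clusters is reached, can be sketched as follows. This is an illustrative toy, not the bayon or CLUTO implementation. It assumes Euclidean distance on small numeric tuples, deterministic seeding, and a "split the largest cluster next" rule. All function names and the sample points are hypothetical.

```python
def _dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def _mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def bisect(points, iters=20):
    """Split points into two groups with 2-means (K-means, k = 2)."""
    # Deterministic seeding (assumption): first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: _dist2(p, c0))
    left, right = [], []
    for _ in range(iters):
        left = [p for p in points if _dist2(p, c0) <= _dist2(p, c1)]
        right = [p for p in points if _dist2(p, c0) > _dist2(p, c1)]
        if not left or not right:
            break  # degenerate split, e.g. all points identical
        new0, new1 = _mean(left), _mean(right)
        if new0 == c0 and new1 == c1:
            break  # centroids stopped moving
        c0, c1 = new0, new1
    return left, right


def repeated_bisection(points, n_clusters):
    """Bisect one cluster at a time until n_clusters clusters exist."""
    clusters = [list(points)]
    while len(clusters) < n_clusters:
        clusters.sort(key=len)
        target = clusters.pop()  # split the largest cluster (assumption)
        if len(target) < 2:
            clusters.append(target)
            break
        left, right = bisect(target)
        if not left or not right:
            clusters.append(target)
            break
        clusters.extend([left, right])
    return clusters


# Hypothetical 2-D points forming three visible groups.
pts = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (20, 1)]
for cluster in repeated_bisection(pts, 3):
    print(cluster)
```

Each pass adds exactly one cluster, so reaching n clusters takes n - 1 bisections. Production tools typically work on sparse document vectors and choose the cluster to split with a criterion function rather than by size.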
bayon 7 a 7 b 7 c 7 d 7 7 a 7 b 7 c 7 d 4.4 3 S RF IIF 4.4.1 1 Repeated Bisection Repeated Bisection Repeated Bisection GibbsLDA++ 19 1 15 Repeated Bisection Repeated Bisection Bayon 1.0 N R GibbsLDA++ N L Repeated Bisection N R 15 GibbsLDA++ α = 50/N L β=0.1 1,000 16 3 16 GibbsLDA++ 3 1-3 Repeated Bisection 1 2 19 GibbsLDA++ http://gibbslda.sourceforge.net/ 24
7: 25
15: N_L N_R

          N_L    N_R
8,562     250    247
3871      200    183
5643      250    233

1 2 Repeated Bisection

4.4.2 2 S RF IIF

S RF IIF α β α=1.0 0.1 1 0.1 17 Bayon 1.0 F 18 18 β 0.5 F β=0.5

4.4.3 3

2 17 1 19 19 1
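Section 4.4.1 gives the GibbsLDA++ setting α = 50/N_L, so the α that corresponds to each N_L value in Table 15 follows by simple arithmetic. The helper below is only a worked illustration; the name `lda_alpha` is hypothetical.

```python
def lda_alpha(n_l):
    """Return the alpha hyperparameter for N_L topics, using alpha = 50 / N_L."""
    if n_l <= 0:
        raise ValueError("N_L must be positive")
    return 50.0 / n_l


# N_L values that appear in Table 15.
alphas = {n_l: lda_alpha(n_l) for n_l in (250, 200)}
print(alphas)
```

With N_L = 250 this gives α = 0.2, and with N_L = 200 it gives α = 0.25.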
16: 3

      GibbsLDA++   Repeated bisection
1     0.13136      0.99987
      0.11073      0.00754
      0.09697      0.00660
2     0.19656      0.99999
      0.13123      0.00386
      0.08967      0.00386
3     0.13484      0.99983
      0.11810      0.00920
      0.10130      0.00920
4     0.18139      0.99994
      0.18139      0.00566
      0.08768      0.00566

      GibbsLDA++   Repeated bisection
1     0.10287      0.99991
      0.03541      0.00523
      0.03541      0.00262
2     0.11051      0.97670
      0.11051      0.10310
      0.09686      0.10310
3     0.09064      0.99871
      0.09064      0.03318
      0.07578      0.02709
4     0.13066      0.99927
      0.07891      0.02498
      0.05304      0.02040

      GibbsLDA++   Repeated bisection
1     0.10818      0.95634
      0.09742      0.00263
      0.05436      0.00263
2     0.07013      0.81649
      0.05942      0.00828
      0.04872      0.06296
3     0.07581      0.81649
      0.04687      0.01424
      0.03530      0.00282
4     0.27663      0.94721
      0.05966      0.00472
      0.02515      0.00472
17: S RF IIF

5,885     135
28,525    230
8,446     146
9,147     142
5,284     98

18: F

β                         F
0.1    0.4118   0.7778   0.5385
0.2    0.3333   0.7778   0.4667
0.3    0.3684   0.7778   0.5
0.4    0.4667   0.7778   0.5833
0.5    0.5833   0.7778   0.6667
0.6    0.4118   0.7778   0.5685
0.7    0.3889   0.7778   0.5185
0.8    0.4615   0.6667   0.5455
0.9    0.3529   0.6667   0.4615
1      0.4444   0.4444   0.4444

19 1 20 20 1 8 8 0.6 0 1 2 1 2-3 2 5
19: 1 2 3! 1 2 3!!!!? 1 2 3?!! 1 2 3!!? 1 2 3! 2 1 1 29
20: 1 2 3 & 1 2 3!!!..15..!! 1 2 3 15 1 2 3 () 1 2 3!,, & 7 30
8: S RF IIF 2 1 1 2 100cc 500cc 31
26 4 28 3 2 DE 2014 160 (SIG-DBS) 2014,, 7 (DEIM2015) 2015 ARG 4 Web 2015 8 (DEIM2016) 2016 (to appear) 32
, Vol. 8, No. 2, pp. 73-87, 2016.
7 (DEIM2015) 2015
8 (DEIM2016) 2016 (to appear)
Shunsuke Hanai, Hidetsugu Nanba, Akiyo Nadamoto, Clustering for Closely Similar Recipes to Extract Spam Recipes in User-generated Recipe Sites, The 17th International Conference on Information Integration and Web-based Applications & Services (iiWAS 15), December 11-13, Brussels, Belgium, pp. 252-256, 2015.
160 (SIG-DBS) 2014

[1] pp.364-371 1997.
[2] Shidochi, Y., Takahashi, T., Ide, I. and Murase, H. Finding replaceable materials in cooking recipe texts considering characteristic cooking actions, Proc. ACM multimedia 2009 workshop on Multimedia for cooking and eating activities, pp. 9-14, 2009.
[3] vol.113, no.214, DE2013-36, pp. 19-24, 2013.
[4] word2vec, vol.114, no.204, DE2014-31, pp. 41-46, 2014.
[5] Teng, C., Lin, Y. and Adamic, L. A. Recipe recommendation using ingredient networks, Proc. 4th International Conference on Web Science, pp. 298-307, 2011.
[6] Forbes, P. and Zhu, M. Content-boosted matrix factorization for recommender systems: experiments with recipe recommendation, Proc. 5th ACM conference on Recommender systems, pp. 261-264, 2011.
[7] WWW 16 2010
[8] Vol.8 No.4 pp.1-6 2010
[9] Geleijnse, G., Nachtigall, P., van Kaam, P. and Wijgergangs, L. A personalized recipe advice system to promote healthful choices, Proc. 16th international conference on Intelligent user interfaces, pp. 437-438, 2011.
[10] Pinxteren, Y. V., Geleijnse, G. and Kamsteeg, P. Deriving a recipe similarity measure for recommending healthful meals, Proc. 16th international conference on Intelligent user interfaces, pp. 105-114, 2011.
[11] 4 D9-2, 2012.
[12] 14 pp.959-962 2008.
[13] Wang, L., Li, Q., Li, N., Li, G. and Yang, Y. Substructure similarity measurement in Chinese recipes, Proc. 17th International Conference on World Wide Web, pp. 979-988, 2008.
[14] Li, Q., Chen, W. and Yu, L. Community-based recipe recommendation and adaptation in peer-to-peer networks, Proc. 4th International Conference on Ubiquitous Information Management and Communication, pp. 18:1-18:6, 2010.
[15] Mori, S., Sasada, T., Yamakata, Y. and Yoshino, K. A machine learning approach to recipe text processing, Proc. 1st Cooking with Computer Workshop, pp. 29-34, 2012.
[16] Kuo, F., Li, C., Shan, M. and Lee, S. Intelligent menu planning: recommending set of recipes by ingredients, Proc. ACM multimedia 2012 workshop on Multimedia for cooking and eating activities, pp. 1-6, 2012.
[17] Yamakata, Y., Imahori, S., Sugiyama, Y., Mori, S. and Tanaka, K. Feature extraction and summarization of recipes using flow graph, Proc. 5th International Conference on Social Informatics, pp. 241-254, 2013.
[18] Web, D 90(11) 2989-2999 2007
[19] Web,. ET 111.332: pp. 1-6 2011
[20] ET, 110.453: pp. 119-124 2011
[21] Web N, 2007.
[22] BMFT 2010
[23] Twitter Web vol.114 no.204 DE2014-31 pp. 19-24 2014
[24] Twitter DEIM Forum 2014 B6-6 2014
[25] Zhao, Y. and Karypis, G. Comparison of agglomerative and partitional document clustering algorithms, Proc. SIAM Workshop on Clustering High-dimensional Data and its Applications, pp. 83-93, 2002.
[26] vol.31, no.3, pp.70-78, 2013