Social Network Mining
Social networks: Semantic Web, KM. Our lives are enormously influenced by our relations to others. SNS: Mixi, MySpace, LiveJournal, Yahoo! 360. FOAF, Web, weblogs. Web mining, social network mining. Figure: a social network of the WWW2006 organizers.
Extracted social network for the JSAI community (Japanese Society for Artificial Intelligence)
Relations: project, coauthorship, laboratory, co-attendance at a conference
How to make the network? Nodes: conference participants / members of a society. We need a list of names and affiliations. Edges: obtained from Web information. Using a search engine, the co-occurrence of two persons' names on the Web is measured. Relation types (edge labels): Coauthor, Lab, Proj, Conf. Also: keywords of each researcher, clusters of researchers.
Query with two names: 742 hits. Publication page -> coauthorship relation. My homepage. Laboratory page -> same-laboratory relation.
Co-occurrence (hit counts in the JP domain)
Yutaka Matsuo (X) AND Mitsuru Ishizuka (Y1): 124 hits
Yutaka Matsuo (X) AND Riichiro Mizoguchi (Y2): 11 hits
X: 500 hits, Y1: 791 hits, Y2: 813 hits
Jaccard coefficient: |X ∩ Y1| / |X ∪ Y1| = 124 / (791 + 500 - 124) = 0.11
Jaccard coefficient: |X ∩ Y2| / |X ∪ Y2| = 11 / (813 + 500 - 11) = 0.008
Simpson (overlap) coefficient: |X ∩ Y1| / min(|X|, |Y1|) = 124 / 500 = 0.248
Simpson (overlap) coefficient: |X ∩ Y2| / min(|X|, |Y2|) = 11 / 500 = 0.022
X and Y1 have a stronger relation than X and Y2.
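As a minimal sketch, both coefficients can be computed directly from the hit counts on this slide. The function names `jaccard` and `overlap` are illustrative and not taken from the original system; each takes the hit counts for X, for Y, and for the conjunctive query X AND Y.

```python
def jaccard(hits_x, hits_y, hits_xy):
    """Jaccard coefficient |X ∩ Y| / |X ∪ Y| from search-engine hit counts."""
    union = hits_x + hits_y - hits_xy
    return hits_xy / union if union > 0 else 0.0


def overlap(hits_x, hits_y, hits_xy):
    """Simpson (overlap) coefficient |X ∩ Y| / min(|X|, |Y|)."""
    smaller = min(hits_x, hits_y)
    return hits_xy / smaller if smaller > 0 else 0.0


# Hit counts from the slide: X = Matsuo, Y1 = Ishizuka, Y2 = Mizoguchi.
X, Y1, Y2 = 500, 791, 813
XY1, XY2 = 124, 11

print(round(jaccard(X, Y1, XY1), 3))  # 0.106
print(round(jaccard(X, Y2, XY2), 3))  # 0.008
print(round(overlap(X, Y1, XY1), 3))  # 0.248
print(round(overlap(X, Y2, XY2), 3))  # 0.022
```

Under either measure the X-Y1 pair scores roughly ten times higher than the X-Y2 pair.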
Rel(X, Y) > TH, with Jaccard = |X ∩ Y| / |X ∪ Y|. AI research community, Computer Science community.
Web, Email, FOAF, Publication. Rel(X, Y) > TH, with Jaccard = |X ∩ Y| / |X ∪ Y|, subject to |X| > K, |Y| > K. Semantic Web community.
Web, face-to-face community. Rel(X, Y) > TH, with overlap = |X ∩ Y| / min(|X|, |Y|), subject to |X| > K, |Y| > K. AI
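The thresholding rule Rel(X, Y) > TH with |X| > K and |Y| > K could be sketched as follows. This is an illustration under assumptions: the function name `build_edges`, the dictionary layout, and the values chosen for TH and K are hypothetical; the hit counts reuse the Matsuo / Ishizuka / Mizoguchi example from the co-occurrence slide.

```python
from itertools import combinations


def build_edges(hits, pair_hits, th, k, measure="overlap"):
    """Return (x, y, rel) for every pair whose relation strength exceeds th.

    hits:      name -> hit count for the name alone
    pair_hits: (name_a, name_b) with name_a < name_b -> hit count for both
    k:         minimum single-name hit count; rarer names are skipped
    """
    edges = []
    for x, y in combinations(sorted(hits), 2):
        nx, ny = hits[x], hits[y]
        if nx <= k or ny <= k:
            continue  # too few hits for a reliable estimate
        nxy = pair_hits.get((x, y), 0)
        if measure == "jaccard":
            denom = nx + ny - nxy
        else:
            denom = min(nx, ny)
        rel = nxy / denom if denom > 0 else 0.0
        if rel > th:
            edges.append((x, y, rel))
    return edges


hits = {"Matsuo": 500, "Ishizuka": 791, "Mizoguchi": 813}
pair_hits = {("Ishizuka", "Matsuo"): 124, ("Matsuo", "Mizoguchi"): 11}

# TH = 0.1 and K = 100 are assumed values chosen only for this example.
print(build_edges(hits, pair_hits, th=0.1, k=100))
```

With these assumed thresholds only the Ishizuka-Matsuo edge survives. The K condition matters mostly for the overlap coefficient, where a name with very few hits can otherwise produce an inflated score.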
Referral Web (H. Kautz et al., 1997): Jaccard. Flink (P. Mika, 2004): Free Email, FOAF, Web; Jaccard; "Semantic Web OR Ontology"; ISWC04 SW challenge award, ISWC05 best paper. A. McCallum et al. (2004-): e-mail, Web. [Harada04] NTT, [Faloutsos04], [Kees04]
T. Finin (U. Maryland): DBLP, FOAF; COI (Conflict of Interest). L. Adamic (Friends and neighbors on the Web): Email, SNS, Blog. Staab (Karlsruhe): PANKOW; Hearst patterns, e.g. "mountains such as Mt. Fuji, Mt. Akagi"; Web; Self-annotating Web.
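The "such as" pattern quoted on this slide can be illustrated with a small regular expression. This is only a sketch of the general Hearst-pattern idea, not PANKOW itself; the function name `hearst_such_as` and the splitting rules are assumptions made for the example.

```python
import re

# "<category> such as <item>, <item>, ..." up to a semicolon or line break.
PATTERN = re.compile(r"(\w+) such as ([^;\n]+)")
# Items are separated by commas and/or the words "and" / "or".
SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def hearst_such_as(text):
    """Return (category, [instances]) for each 'such as' match in text."""
    results = []
    for match in PATTERN.finditer(text):
        category = match.group(1)
        # Drop a sentence-final period but keep abbreviations like "Mt."
        listed = match.group(2).strip().rstrip(".")
        items = [item for item in SEPARATOR.split(listed) if item]
        results.append((category, items))
    return results


print(hearst_such_as("mountains such as Mt. Fuji, Mt. Akagi"))
# [('mountains', ['Mt. Fuji', 'Mt. Akagi'])]
```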
"ubiquitous OR pervasive"; "Semantic Web OR Ontology"; Yutaka Matsuo (University of Tokyo OR National Institute of Advanced Industrial Science and Technology OR AIST). [Bekkerman05] [Li05] [Guha] [Lloyd05] [Marlin05] [Bollegara06] [Mann03] [Wacholder97]. X -> 3G1-5 "Extracting Key Phrases to Disambiguate Personal Names on the Web"
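A query of the form shown on this slide (a name qualified by alternative affiliations or topic words) might be assembled as below. The helper name `build_query` is hypothetical, and the quoting and `OR` syntax are assumptions about the search engine's query language rather than a documented API.

```python
def build_query(name, affiliations=(), topics=()):
    """Build a query: quoted name plus optional OR-groups that narrow it."""
    parts = [f'"{name}"']
    for group in (affiliations, topics):
        if group:
            alternatives = " OR ".join(f'"{term}"' for term in group)
            parts.append(f"({alternatives})")
    return " ".join(parts)


print(build_query("Yutaka Matsuo", affiliations=["University of Tokyo", "AIST"]))
# "Yutaka Matsuo" ("University of Tokyo" OR "AIST")
```

Adding the affiliation group is one simple way to exclude pages about other people who share the same name.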
NLP: Web as corpus (A. Kilgarriff, M. Baroni @ U. of Bologna); Web counts for NLP (Lapata, U. Sheffield); TOEFL (Turney, National Research Council); (IJCAI05)
Small pseudocodes. GoogleHit: returns the number of hits for a query. GoogleTop: returns the top k documents for a query.
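To make the two primitives concrete, here is a toy stand-in that runs over an in-memory list of strings instead of a real search engine. Everything in it is an assumption for illustration: the names `google_hit` and `google_top`, the sample pages, and the simple substring matching, which models no ranking at all.

```python
# Toy "Web": each string stands in for one page (invented sample data).
PAGES = [
    "Yutaka Matsuo and Mitsuru Ishizuka, coauthored paper",
    "Ishizuka laboratory members: Mitsuru Ishizuka, Yutaka Matsuo",
    "Riichiro Mizoguchi, ontology engineering",
    "Workshop program: Yutaka Matsuo, Riichiro Mizoguchi",
]


def google_hit(terms, pages=PAGES):
    """Number of pages that contain every term (a conjunctive query)."""
    return sum(all(term in page for term in terms) for page in pages)


def google_top(terms, k, pages=PAGES):
    """First k pages that contain every term; list order replaces ranking."""
    matches = [page for page in pages if all(term in page for term in terms)]
    return matches[:k]


print(google_hit(["Yutaka Matsuo"]))                      # 3
print(google_hit(["Yutaka Matsuo", "Mitsuru Ishizuka"]))  # 2
print(len(google_top(["Yutaka Matsuo"], k=2)))            # 2
```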
X Y Y.. Name disambiguation Node expansion
Advanced algorithms A1: A2: O(n^2) O(n) A3:
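The slide text for A1 to A3 did not survive, so the following is only one plausible reading of the O(n^2) versus O(n) contrast: querying every pair of names costs a quadratic number of queries, while fetching the top k pages for each name and checking which other names appear there costs one query per name. The function names and the `top_docs` callback (a hypothetical stand-in for GoogleTop) are assumptions.

```python
from itertools import combinations


def all_pairs(names):
    """Naive strategy: one co-occurrence query per pair, n*(n-1)/2 in total."""
    return list(combinations(sorted(names), 2))


def candidate_pairs(names, top_docs, k=10):
    """One top-k query per name (n queries); keep pairs seen on the same page."""
    names = sorted(names)
    pairs = set()
    for x in names:
        for doc in top_docs(x, k):
            for y in names:
                if y != x and y in doc:
                    pairs.add(tuple(sorted((x, y))))
    return sorted(pairs)


# Invented toy data: the pages a search for each name would return.
TOY_RESULTS = {
    "Ann": ["Ann and Bob wrote a paper"],
    "Bob": ["Bob works with Ann"],
    "Cid": [],
    "Dan": ["Dan and Cid organized a workshop"],
}


def toy_top_docs(name, k):
    return TOY_RESULTS.get(name, [])[:k]


print(len(all_pairs(TOY_RESULTS)))                 # 6
print(candidate_pairs(TOY_RESULTS, toy_top_docs))  # [('Ann', 'Bob'), ('Cid', 'Dan')]
```

Only the candidate pairs would then need a co-occurrence query, at the cost of missing pairs that never appear together in the top k pages.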
GoogleCooc, GoogleTop. Web 1000 Web, GoogleTop [Anagnostopoulos05, Bar-Yossef06]. GoogleCooc. Search engine for NLP [Cafarella05]. Relate-Identify
Relate-Identify
Initial data of persons (Name, Affiliation, Research topic, Publication)
Improved identification: name disambiguation; record linkage; centrality, clustering
IDENTIFY: StructuralEquiv, ExtractKeywords
RELATE (extract relation): GetSocialNet, ClassifyRelation, ContextSim
Matrices: X (adjacency matrix), W (affiliation matrix)
For integration of multiple social networks, see Y. Matsuo et al., "Spinning social networks for Semantic Web", to appear in AAAI-06.
Social Network Mining 3D4-3 Web 1A3-4 Web
Web Identify-Relate