AWS Gunosy AWS Summit Tokyo 2018/06/01
About me
- 米田 武 / Takeshi Yoneda / マスタケ
- GitHub/Twitter: @mathetake
- 2017/03/31: MSc. in Mathematics at Osaka University
- 2017/04/01~: Machine learning engineer at Gunosy Inc.
- Apply mathematics to
  - Recommendation System
  - Machine learning
  - Optimization problems
  - data engineering
-- AWS
10,000/day+ AWS
Examples: etc
- Amazon S3
- AWS Lambda
- Amazon DynamoDB (NoSQL)
- Amazon Kinesis
- DynamoDB Accelerator (DAX), for DynamoDB
- Amazon EMR (Hadoop)
1.
Local Popularity() / => => #1.
Articles Title: Content: Title: Content: Title: Content: #1.
M #1.
lpush/rtrim #1.
u1 u2 u3 u6 u5 #1.
u1 u2 u3 u5 u6 #1.
Architecture (source: Gunosy, http://tech.gunosy.io/entry/realtime-vectorization-with-dynamodb)
- Click logs stream [Kinesis stream] -> (Trigger) Click logger [Lambda] -> (Update) Click logs table [DynamoDB]
- Click logs table [DynamoDB] -> (Trigger) UserVectorizer [Lambda] -> (Put) UserVector table [DynamoDB]
- UserVectorizer [Lambda] <- (BatchGet) ArticleVector table [DynamoDB] via [DAX]
- Crawler [EC2] -> (Trigger) ArticleVectorizer [EC2] -> (Put) ArticleVector table [DynamoDB]
Client app -> Log stream [Kinesis stream] -> Fluentd server [EC2]
- The Fluentd server filters the log stream and forwards the click logs to a separate Click logs stream [Kinesis stream].
- Flow: Log stream [Kinesis stream] -> Fluentd server [EC2] -> Click logs stream [Kinesis stream] -> (Trigger) Click logger [Lambda] -> (Update) Click logs table [DynamoDB]
Click logs stream -> Click logger [AWS Lambda]
- The Kinesis stream triggers Click logger.
- Click logger updates Click logs table [DynamoDB] so that it holds the latest M logs: the new log is appended to the list (rpush) and the oldest log is removed (ltrim).
- List layout: Log 1, Log 2, Log 3, ..., Log M-1, Log M
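The rpush/ltrim update on the click log list can be illustrated with a minimal sketch in plain Python. It only models the list semantics (append at the tail, drop the oldest entries beyond M); it is not the DynamoDB update used in the talk, and the function name `push_click_log` and the string log entries are hypothetical.

```python
from typing import List


def push_click_log(logs: List[str], new_log: str, max_len: int) -> List[str]:
    """Append new_log at the tail (rpush) and keep only the newest max_len logs (ltrim)."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    updated = logs + [new_log]   # rpush: the newest log goes to the end
    return updated[-max_len:]    # ltrim: drop the oldest logs from the head


if __name__ == "__main__":
    logs: List[str] = []
    for i in range(1, 6):
        logs = push_click_log(logs, f"log{i}", max_len=3)
    print(logs)
```

Keeping the list bounded means the item size stays roughly constant no matter how many clicks arrive.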
Click logs table -> UserVectorizer [AWS Lambda]
- A PUT to Click logs table [DynamoDB] is delivered through DynamoDB Streams and triggers the UserVectorizer Lambda.
- Flow: Click logs table [DynamoDB] -> (Trigger) UserVectorizer [Lambda] -> (Put) UserVector table [DynamoDB], with (BatchGet) from ArticleVector table [DynamoDB] via [DAX]
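As a rough illustration of what a Lambda triggered by DynamoDB Streams receives, the sketch below pulls the click list per user out of a stream event. The event shape (`Records`, `eventName`, `dynamodb.NewImage` with typed values such as `{"S": ...}` and `{"L": [...]}`) follows the DynamoDB Streams record format. The attribute names `user_id` and `logs`, and the function name, are assumptions made for this example and do not come from the slides.

```python
from typing import Any, Dict, List


def extract_clicked_articles(event: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map user id -> clicked article ids from a DynamoDB Streams event.

    Assumes a hypothetical item layout: `user_id` (string) and `logs` (list of strings).
    NewImage is only present when the stream view type includes new images.
    """
    result: Dict[str, List[str]] = {}
    for record in event.get("Records", []):
        # Only inserts and updates carry a new version of the item.
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue
        image = record.get("dynamodb", {}).get("NewImage")
        if not image or "user_id" not in image:
            continue
        user_id = image["user_id"]["S"]
        result[user_id] = [entry["S"] for entry in image.get("logs", {}).get("L", [])]
    return result
```

A handler would call this on the incoming `event` and then vectorize each user from the returned article ids.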
UserVectorizer -> (DAX) -> ArticleVector table [DynamoDB]
- UserVectorizer GETs article vectors from DynamoDB through DAX.

DAX (Amazon DynamoDB Accelerator)
- In-memory cache for DynamoDB
- Suited to read-heavy workloads
- The DAX SDK client talks to the DAX cluster over TCP
- Access path: Lambda -> DAX -> DynamoDB
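The slides do not state how UserVectorizer combines the article vectors it reads. One common choice, used here purely as an assumption, is the element-wise mean of the vectors of the clicked articles. The sketch below works on plain dictionaries rather than on DAX responses, and the function name is hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def mean_user_vector(
    clicked_ids: Sequence[str],
    article_vectors: Dict[str, List[float]],
) -> Optional[List[float]]:
    """Average the vectors of the clicked articles (assumed scheme, not from the talk).

    Articles without a stored vector are skipped; returns None if none are found.
    """
    vectors = [article_vectors[a] for a in clicked_ids if a in article_vectors]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

Because every update re-reads up to M article vectors, the read side is much heavier than the write side, which fits the read-heavy profile DAX is aimed at.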
Crawler [EC2] -> (Trigger) ArticleVectorizer [EC2] -> (Put) ArticleVector table [DynamoDB]
- After crawling, ArticleVectorizer [EC2] puts the article vector into ArticleVector table.
2.
Kinesis / DynamoDB / DAX / lambda / #2.
= #2.
(, etc.) t a - - b #2.
t a - - b #2.
t ab a - - b #2.
t ba a - - b #2.
50msec or die. https://tenshoku.mynavi.jp/it-engineer/knowhow/naoya_sushi/05
in 50 msec
- Timeline figure, 0 to 50 (msec): Background; UserVector table; GET user vector (DynamoDB)
- 25msec / request*
*Application Load Balancer `TargetResponseTime`
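A latency budget like this is often enforced by putting a deadline on the user vector lookup. The sketch below is one way to do that, not the implementation from the talk. The fallback value returned on timeout, the pool size, and the names `get_with_deadline` and `fetch` are all assumptions for illustration.

```python
import concurrent.futures
from typing import Callable, List, Optional

# Shared pool: a `with ThreadPoolExecutor()` block would wait for slow calls
# on exit, which would defeat the deadline.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def get_with_deadline(
    fetch: Callable[[str], List[float]],
    user_id: str,
    deadline_sec: float = 0.050,
    fallback: Optional[List[float]] = None,
) -> Optional[List[float]]:
    """Run fetch(user_id) in the background and give up after deadline_sec."""
    future = _POOL.submit(fetch, user_id)
    try:
        return future.result(timeout=deadline_sec)
    except concurrent.futures.TimeoutError:
        # The lookup keeps running in the pool, but the caller no longer waits for it.
        return fallback
```

Here `fetch` stands in for whatever reads the UserVector table. On timeout the caller can serve a non-personalized result instead of blocking the response.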
API and DataLake (diagram built up over several slides)
- Components: API [EC2], DataLake, Hive Metastore, digdag, ~MB
- s3 (MB)
- EMR (Elastic MapReduce)
DataLake (diagram): Remote Hive Metastore, RDS, digdag, airflow, ~MB
(Remote) Hive Metastore
- Holds the table metadata for data stored on HDFS*
- One Metastore is shared by multiple engines (Spark/Hive/Presto)
- Metastore backend: Amazon RDS for MySQL

    CREATE EXTERNAL TABLE `users`(
      `id` bigint,
      `enabled` boolean,
      `admin_enabled` boolean,
      `created_at` string,
      `updated_at` string
    )
    STORED AS PARQUET
    LOCATION 's3://hogefuga-log/hive/user.db/users'
    TBLPROPERTIES ('parquet.compress'='snappy');

*HDFS = Hadoop Distributed File System. With the Hadoop S3 filesystem, s3 can be used as a file system in the same way as hdfs.
DataLake (same diagram), slide by slide:
- Spark on EMR: Metastore, SparkQL
- Hive Metastore: Presto
- Airflow
- s3 / Metastore
S3 + Hive Metastore = Awesome!
- HQL: Drop Table
- `SELECT column1, column2 FROM hive.db.hoge WHERE dt = 20180601`
- S3: write (= s3 put)
- EMR, Hive Metastore
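Since writing a table here amounts to an S3 put, the object key decides which partition a file belongs to. The sketch below builds a key in the Hive-style `column=value` directory layout for the `dt` column used in the query above. It assumes the table is partitioned by `dt`, which the slides do not show explicitly. The prefix, file name, and function name are hypothetical.

```python
def partition_object_key(table_prefix: str, dt: str, filename: str) -> str:
    """Build an S3 object key under a Hive-style `dt=<value>` partition directory."""
    return f"{table_prefix.strip('/')}/dt={dt}/{filename}"


if __name__ == "__main__":
    # Hypothetical prefix and file name, for illustration only.
    print(partition_object_key("hive/db/hoge", "20180601", "part-0000.parquet"))
```

A new partition directory still has to be registered in the Metastore (for example with `ALTER TABLE ... ADD PARTITION` or `MSCK REPAIR TABLE`) before queries can see it.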
- Lambda, Kinesis stream, DynamoDB, DAX
- DAX
- API: DynamoDB & s3
- S3 + Hive Metastore & S3
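We are hiring!
Example open positions:
- Data analysis engineer: design / collection / analysis of statistics such as service KPIs
- Machine learning / natural language processing engineer: development of algorithms that make full use of mathematical models, including the technologies above
- App development engineer: development of iOS / Android apps
- Server-side engineer: server-side development for each product
https://gunosy.co.jp/recruit/requirements/engineer/
Search: "Gunosy 採用"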