Good Data Lake Labs on AWS

Here are some good hands-on labs you can try on AWS for Big Data workloads built on a data lake strategy.

Please have a look at them here:
– simple lab: https://bit.ly/2wk6q6O
– moderate lab: https://shorturl.at/gk037
– longer lab (see the diagram below): https://github.com/aws-samples/amazon-serverless-datalake-workshop (it creates a custom instructions page, e.g. https://s3.us-east-1.amazonaws.com/starxforce-ecommerce-datalake-ingestionbucket-ip2te8auqgxv/instructions/instructions.html)

[Diagram: data lake architecture]

Note: The SQL below is just my quick demo for lab purposes (please ignore it):

=== Athena ===

SELECT state,
       request AS page,
       COUNT(request) AS totalviews
FROM zipcodesdata z
JOIN joindatasets m ON z.zipcode = m.zip
GROUP BY state, request
ORDER BY state
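
To see what this query computes without an AWS account, here is a minimal local sketch that runs the same join and aggregation on SQLite. The table layouts and sample rows are my own assumptions for illustration (the real lab tables are defined by the workshop and may have more columns), and SQLite is only a stand-in for Athena.

```python
import sqlite3

# In-memory stand-in for the lab tables. The schemas and rows below are
# hypothetical sample data, not the actual workshop dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zipcodesdata (zipcode TEXT, state TEXT);
CREATE TABLE joindatasets (zip TEXT, request TEXT);
INSERT INTO zipcodesdata VALUES ('10001', 'NY'), ('94105', 'CA');
INSERT INTO joindatasets VALUES
    ('10001', '/Dogs'), ('10001', '/Dogs'),
    ('10001', '/Cats'), ('94105', '/Dogs');
""")

# Same shape as the Athena query: join on zip code, then count
# page requests per (state, page). Sorting by page as well keeps
# the output order deterministic.
QUERY = """
SELECT state,
       request AS page,
       COUNT(request) AS totalviews
FROM zipcodesdata z
JOIN joindatasets m ON z.zipcode = m.zip
GROUP BY state, request
ORDER BY state, page
"""

rows = conn.execute(QUERY).fetchall()
for state, page, totalviews in rows:
    print(state, page, totalviews)
```

Each output row is one state and page combination with the number of matching requests, which is the result shape the Athena query returns in the lab.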

=== Redshift Spectrum ===

SELECT COUNT(*) AS TotalCount
FROM "weblogs"."useractivity"
WHERE request LIKE '%Dogs%';

SELECT username, COUNT(timestamp) 
FROM local_weblogs.useractivity
GROUP BY username;

SELECT username, COUNT(timestamp) 
FROM weblogs.useractivity
GROUP BY username;

SELECT ua.username, first_name, last_name, COUNT(timestamp)
FROM local_weblogs.useractivity ua
INNER JOIN weblogs.userprofile up ON ua.username = up.username
GROUP BY ua.username, first_name, last_name
LIMIT 100;

SELECT * FROM local_weblogs.useractivity_byuser LIMIT 100;

Kind Regards,
Doddi Priyambodo
