Friday, December 7, 2018

Takeaways and highlights from AWS re:Invent 2018

I had an opportunity to attend the seventh installment of AWS re:Invent. It was indeed a large gathering, with more than 50,000 in attendance. Despite the size of the crowds, the conference was very well run. I was not able to reserve seats in advance for all the workshops and sessions I was interested in, but I was able to attend most of them by queuing up in the walk-up line. Here are some of the takeaways and highlights.

Machine Learning was front and center

AWS provides ML capabilities at three levels of abstraction.
  1. Fully managed services such as Amazon Rekognition, Amazon Polly, Amazon Comprehend (NLP), Alexa, etc. AWS also introduced Amazon Textract, a smart OCR service.
  2. Managed execution of pre-built or custom ML models. SageMaker fills this role. 150+ machine learning algorithms are being made available in the AWS Marketplace.
  3. Infrastructure for running ML tools such as MXNet, PyTorch, TensorFlow, etc.
  • Several new sub-services for SageMaker were announced. For models that require manual effort to label training data, AWS introduced SageMaker Ground Truth, which handles labeling of data via Mechanical Turk or other workforces.
  • AWS also introduced SageMaker RL, which trains models through rewards accumulated over time (reinforcement learning). To promote this service, AWS introduced DeepRacer, a fully autonomous 1/18th-scale race car driven by reinforcement learning, to help developers gain hands-on working knowledge of SageMaker RL.
  • Amazon SageMaker Neo enables machine learning models to be trained once and run anywhere, in the cloud or at the edge, with optimal performance.
  • AWS also announced AWS Inferentia, a new inference chip (yes, a custom-built processor) that promises to significantly reduce the time it takes to draw inferences from an ML model.
  • ML Insights works with Amazon QuickSight, a BI service for interactive dashboards. ML Insights adds ML-powered anomaly detection, ML-powered forecasting, and auto-narratives (automatically generated text descriptions) to QuickSight dashboards.
  • Amazon Personalize is a managed service for building and consuming recommendation models. It does the heavy lifting needed to design, train, and deploy a machine learning model under the covers.
  • AWS RoboMaker provides a robotics development environment for application development (based on the open-source robotics software framework Robot Operating System, ROS), a robotics simulation service to accelerate application testing, and a robotics fleet management service for remote application deployment, update, and management.

AWS IoT services matured and can now connect to more things

  • AWS IoT Core can now ingest data directly, bypassing the MQTT message broker, by having the thing publish data to the $aws/rules/ruleName topic. This eliminates the additional time and cost of publishing data to an IoT topic before it reaches the Rules engine for the desired processing.
  • AWS IoT SiteWise opens up AWS IoT to data from industrial devices. It runs on a gateway that resides in the customer's facilities and automates the process of collecting and organizing industrial equipment data.
  • AWS IoT Events - a managed service to detect patterns in IoT data and respond accordingly.
  • AWS IoT Things Graph can be used to connect devices and web services to build IoT applications. With this service, one can define interactions between them to build multi-step automation applications.

Serverless computing saw some important improvements

  • AWS Lambda now natively supports Ruby.
  • Lambda now supports custom runtimes (any runtime that can run on Linux) via the Lambda Runtime API. C++ and Rust are now supported on Lambda using this new feature. Some other languages that third parties have enabled on AWS Lambda are Erlang, Elixir, COBOL, and PHP. This feature will certainly encourage migration of legacy code to Lambda.
  • A new feature of Lambda called Layers allows Lambda functions to share code and data. For example, if several Lambda functions use a common library, that library does not need to be deployed (duplicated) with each of these Lambda functions. Instead, the library can be packaged once as a layer and referenced by each function.
  • Step Functions (orchestration of Lambda functions) can now invoke many AWS managed services (such as DynamoDB, AWS Batch, Amazon SQS, and Amazon SageMaker) directly in a defined workflow.
  • A new service called AWS Serverless Application Repository allows cataloging, discovery, and assembly of serverless applications from existing Lambda functions.
  • Lambda functions can now be placed behind an Application Load Balancer. This allows Lambda functions to be invoked directly via HTTP/HTTPS without having to use the API Gateway.
  • Firecracker is a lightweight virtualization technology based on KVM. Amazon uses this technology internally for its AWS Lambda offering as well. According to AWS, it allows "launching of lightweight micro-virtual machines (microVMs) in non-virtualized environments in a fraction of a second, taking advantage of the security and workload isolation provided by traditional VMs and the resource efficiency that comes along with containers".
  • AWS App Mesh - for monitoring and controlling communication across microservices running on AWS, such as on ECS, EKS, and Kubernetes on EC2.
  • API Gateway now supports WebSockets. For single-page apps, live updates from the server are usually sent over WebSockets. This makes API Gateway more desirable as a backend for interactive SPAs.
  • SNS now supports filtering of messages published to a given SNS topic. This can help discard undesirable messages at the SNS service level, thus reducing traffic to a configured SNS subscriber such as an AWS Lambda function or a microservice.
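SNS message filtering is configured as a filter policy attached to an SNS subscription; messages whose attributes do not match the policy are not delivered to that subscriber. A minimal sketch (the attribute names and values are hypothetical):

```json
{
  "eventType": ["order_placed", "order_cancelled"],
  "price_usd": [{ "numeric": [">=", 100] }]
}
```

With this policy on a subscription, only messages carrying a matching eventType message attribute (and a price_usd of at least 100) reach that subscriber.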

Databases and Storage

Amazon Aurora, a MySQL- and PostgreSQL-compatible managed database service, was featured prominently in Werner's keynote. Amazon has famously vowed to get rid of all its Oracle databases. I imagine Aurora will replace a good number of these databases.
  • Amazon Aurora added a Global Database feature that is designed for applications with a global footprint. It allows a single Aurora database to span multiple AWS regions, with fast replication to enable low-latency global reads and disaster recovery from region-wide outages. I imagine one of the main motivations for adding this feature was to match the globally distributed storage model of Microsoft's Azure Cosmos DB.
  • Amazon DynamoDB added ACID-compliant transactions across multiple tables within a given AWS region. This is important for applications that need to store data reliably across multiple tables in a single transaction. DynamoDB also added an on-demand pricing model in which the application does not need upfront capacity planning (read/write capacity units).
  • Amazon Timestream is a new database offering optimized for storing time-series data and is more cost-effective for it than other storage options such as RDS. This is an attractive option for storing large amounts of streaming data, such as telemetry from IoT devices.
  • Amazon had previously introduced AWS Glue to discover and catalog structured and unstructured data to aid in the building of a data lake. Amazon has now introduced AWS Lake Formation, which sits on top of AWS Glue and makes the job of configuring data sources and governing the source data much simpler.
  • S3 added Intelligent-Tiering, which automatically moves data between S3 pricing/availability tiers based on object access patterns.
  • AWS Transfer for SFTP is a new fully managed SFTP service for S3. It allows access to data stored in S3 buckets through the SFTP protocol.
  • Amazon introduced Amazon FSx for Lustre, a fully managed file system built on Lustre, a file system used for large-scale cluster computing. Similarly, Amazon FSx for Windows File Server delivers a managed Windows file system (supporting SMB, NTFS, and Active Directory) for use with workloads on Windows Server.
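The DynamoDB transactions mentioned above are exposed through a new TransactWriteItems API call, which succeeds or fails as a unit. A sketch of the request shape, with hypothetical table and attribute names:

```json
{
  "TransactItems": [
    {
      "Put": {
        "TableName": "Orders",
        "Item": { "OrderId": { "S": "o-1001" }, "Status": { "S": "PLACED" } }
      }
    },
    {
      "Update": {
        "TableName": "Inventory",
        "Key": { "SkuId": { "S": "sku-9" } },
        "UpdateExpression": "SET stock = stock - :one",
        "ConditionExpression": "stock >= :one",
        "ExpressionAttributeValues": { ":one": { "N": "1" } }
      }
    }
  ]
}
```

If the condition on the Inventory update fails, the Put to Orders does not happen either.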

Amazon finally gets into the Blockchain game

Two Blockchain related services were announced.
  1. Amazon Managed Blockchain is a fully managed service that makes it easy to create and manage scalable blockchain networks using the popular open-source frameworks Hyperledger Fabric and Ethereum.
  2. Amazon Quantum Ledger Database (QLDB) is a purpose-built ledger database that provides a complete and verifiable history of application data changes. The database is append-only/immutable (it can't be edited) and cryptographically verifiable (to ensure contents have not been tampered with).

Finally, there were new offerings in the area of DevOps and Security

  • AWS CodeDeploy now supports Blue/Green deployments for AWS Fargate and Amazon ECS.
  • AWS Security Hub enables AWS customers to centrally view and manage security alerts and automate compliance checks within and across AWS accounts.
  • AWS Control Tower helps create and maintain secure, well-architected multi-account AWS environments with respect to configuration of organizations, federated access, centralized logging, IAM auditing, and workflows for provisioning new accounts.
  • AWS Well-Architected Tool can review state of workloads and compare them to the latest AWS architectural best practices.
  • AWS Outposts (coming later in 2019) - an on-premises hardware offering developed jointly by Amazon and VMware. It is fully managed, maintained, and supported by AWS to deliver access to the latest AWS services on the customer's site. It brings native AWS services, infrastructure, and operating models to virtually any data center, co-location space, or on-premises facility.
Well, that's all for this year. I believe we are nowhere near utilizing the full potential of AI, machine learning, and IoT data. I have no doubt we will see many more outstanding innovations in these areas in the near future. We are now well beyond dynamic websites (Web 1.0) and mobile computing (Web 2.0). Warp speed to AI, ML, and IoT (Web 3.0).

Saturday, October 20, 2018

Java Threading

Implement a class that implements Runnable (requires a run() method):

// This code can reside inside or outside the class that implements Runnable
Thread thread = new Thread(<instance of class that implements Runnable>);

thread.start();  // executes the run() method on a new thread

public class SomeProcess implements Runnable {

  Thread runner;

  SomeProcess() {
    if (runner == null) {
      runner = new Thread(this); // looks for the run() method in this class
      runner.start();            // executes run() on a new thread
    }
  }

  public void run() {
    Thread thisThread = Thread.currentThread();

    // thread terminates when run() ends
  }
}

OR

// Keep run() looping while this.runner points at the current thread;
// set this.runner to null from anywhere else to terminate the thread.
public void run() {
  Thread thisThread = Thread.currentThread();
  while (runner == thisThread) {
    // do work here
  }
  // thread terminates when run() ends
}

Creating with an anonymous inner class

Runnable runner = new Runnable() {
  public void run() {
    // code for the run() method
  }
};

Creating with a Closure

Runnable has only one required method, so we can define it as a closure/lambda:
Runnable runner = () -> { /* code for the run() method */ };
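Putting the pieces together, a minimal complete example (the class and variable names are mine) that starts a thread from a lambda Runnable and waits for it to finish:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class RunnableDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean ran = new AtomicBoolean(false);

        // Runnable defined as a lambda; run() simply records that it executed
        Runnable runner = () -> ran.set(true);

        Thread thread = new Thread(runner);
        thread.start(); // executes run() on a new thread
        thread.join();  // wait for the thread to terminate

        System.out.println(ran.get()); // true
    }
}
```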

Java Data Structures

BitSet - for representing binary data (indexed 0/1 values)

ArrayList

Stack

HashMap - dictionary, key-value pairs

Generics - create strongly typed data structures
ArrayList<String>
HashMap<String, Float>
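A short sketch exercising the structures above (class and variable names are mine):

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Stack;

public class DataStructuresDemo {
    public static void main(String[] args) {
        // BitSet: indexed 0/1 values; unset bits read as false
        BitSet flags = new BitSet();
        flags.set(3);
        System.out.println(flags.get(3) + " " + flags.get(4)); // true false

        // ArrayList typed via generics
        ArrayList<String> names = new ArrayList<>();
        names.add("Ada");
        names.add("Grace");
        System.out.println(names.get(1)); // Grace

        // Stack: last in, first out
        Stack<Integer> stack = new Stack<>();
        stack.push(1);
        stack.push(2);
        System.out.println(stack.pop()); // 2

        // HashMap: dictionary of key-value pairs, typed via generics
        HashMap<String, Float> prices = new HashMap<>();
        prices.put("apple", 0.50f);
        System.out.println(prices.get("apple")); // 0.5
    }
}
```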

Java Access Modifiers

  • public - accessible from any class
  • protected - accessible within the same package and from subclasses
  • (default, no modifier) - accessible within the same package only
  • private - accessible only within the declaring class


Wednesday, October 3, 2018

AWS IoT - Device Connectivity

Devices connect to AWS IoT Core using one of the supported protocols. Data from the device is transported to AWS IoT as a JSON document.

Protocols

The message broker supports the MQTT protocol for publish and subscribe and the HTTPS protocol for publish only. The message broker also supports MQTT over the WebSocket protocol.
  • MQTT with client-certificate authentication: ports 8883, 443
  • HTTPS with client-certificate authentication: port 8443
  • HTTPS with SigV4 authentication: port 443
  • MQTT over WebSocket with SigV4 authentication: port 443

Connectivity from Device using Client Certificate

Resources on Device

  1. X.509 certificate specific to the device (establishes the device identity; equivalent to a username in classic authentication). An X.509 certificate is a document that is used to prove ownership of the public key embedded in the certificate. The CA creates the certificate and signs it with the CA's private key. Anyone can then validate the device certificate by checking its digital signature with the CA's public key. (.pem/.cer file)
  2. Private key corresponding to the device's X.509 certificate (for signing communication).
  3. Root certificate for the AWS IoT server (to verify the authenticity of the certificate returned by AWS IoT to the device; answers the question: am I talking to the real AWS IoT server?). On the AWS IoT Button, this certificate is already baked in. (.pem file)
  4. Client connectivity to the Internet: Wi-Fi SSID and password.
  5. AWS IoT server endpoint (region-specific; a single endpoint serves multiple devices). For example: abc.iot.us-east-1.amazonaws.com

Resources on the AWS IoT Server

  1. X.509 certificate specific to the device, with an associated certificate ID.
Each connected device is represented as a thing. Each thing has a unique ARN. For example:
arn:aws:iot:us-east-1:995042574424:thing/iotbutton_G030MD045XXXXX

Communication Flow

It is a two-step process.

1) Establish secure communication between the device and the AWS IoT server. This is just like connecting to a secure website. The server sends its certificate to the client. The client wants to make sure it is talking to the real AWS IoT, so it verifies that the server certificate is authentic by using the AWS IoT service root certificate present on the device. The public key embedded in the root certificate is used to validate the digital signature on the server-provided certificate. The client and server then negotiate a shared secret and use it to encrypt communication.

2) Next, the device identifies itself to the AWS IoT server. The device sends a copy of its device certificate to the server. The device also calculates a hash over the data sent to the server, signs it with its private key, and sends the result as the digital signature. AWS IoT is now in possession of the device's public key (which was in the device certificate) and the digital signature. It uses the device's public key to check the validity of the digital signature. By using the unique identifier of the certificate, it knows exactly which device is establishing an MQTT session. From then on, all messages between the device and AWS IoT are secured using the shared secret (for efficiency).
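The signature check in step 2 is standard public-key verification: sign with the private key, verify with the public key. A minimal sketch using Java's built-in java.security API (toy keys generated in place; this is an illustration of the mechanism, not actual AWS IoT code):

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

public class SignatureDemo {
    public static void main(String[] args) throws Exception {
        // Device side: a key pair (in real IoT this is provisioned along
        // with the device certificate).
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair pair = gen.generateKeyPair();

        byte[] message = "hello from device".getBytes(StandardCharsets.UTF_8);

        // Device signs the message with its private key.
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(pair.getPrivate());
        signer.update(message);
        byte[] sig = signer.sign();

        // Server side: verify using the public key from the device certificate.
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(pair.getPublic());
        verifier.update(message);
        System.out.println(verifier.verify(sig)); // true
    }
}
```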

AWS Resource Access

The AWS resources that a device is allowed to access are specified by associating a policy with the device's certificate on the AWS IoT server.

For example, the following policy allows the device to publish to a device-specific MQTT topic.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "iot:Publish",
      "Effect": "Allow",
      "Resource": "arn:aws:iot:us-east-1:995042574424:topic/iotbutton/G030MD045XXXXX"
    }
  ]
}

1. Messages are published to a device-specific MQTT topic

2. The Rules engine picks up the message from the topic and then publishes the data to the configured destination (under the Act tab).

Data Flow

Data received from a device by the message broker on the AWS IoT server can be routed to another AWS resource by the AWS IoT Rules engine:
  1. DynamoDB
  2. Kinesis
  3. Lambda
  4. S3
  5. SNS
  6. SQS

Monday, April 23, 2018

React app with Redux and Redux Observables

Technologies

  • React js (for UI)
  • Redux (for state management; actions and reducers that set state)
  • React-Redux (for connecting Redux to React; redux actions and arbitrary properties that can be set as properties of the component; Provider component makes the store available to all components)
  • Redux Observable (for linking RxJS to Redux; applyMiddleware to redux store; takes care of RxJS subscriptions and un-subscribe)
  • RxJS (Reactive programming based on observables)

Packages

yarn add redux react-redux rxjs redux-observable

// rxjs requires tslib
yarn add tslib

The Idea

  • A Redux action is dispatched
  • It is received by the reducer
  • Since the Redux Observable middleware is applied, the action is then also received by the Epic (the Redux Observable action handler)
  • The Epic is now free to either handle that action or not, based on the action.type
  • The Epic can fire an entirely new action that is then received by the reducer
  • Which in turn may possibly be handled by another Epic (or not)

Data Retrieval Flow

Epics (Redux Observables) are very good at handling asynchronous operations, errors, and mapping results.
  • A Redux action is dispatched (for example, by clicking a button)
  • The action is handled by the reducer (which can possibly place a loading message in the store, which in turn may be displayed to the user by a React component)
  • The same action can then be optionally handled by an Epic (Redux Observable middleware)
  • The Epic can perform the asynchronous data fetch operation (Ajax, for example)
  • When finished, the Epic can produce another action with the retrieved data
  • That action is then handled by the reducer, which may remove the loading message from the store and set the retrieved value in the store
  • The retrieved value in the store may then be reflected in the React component
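The flow above can be sketched without any dependencies. A real redux-observable Epic expresses this as an RxJS stream over the action stream (using operators such as ofType and mergeMap), but the contract is the same: an action comes in, a follow-up action comes out. All action types and the fetchUser helper below are hypothetical:

```javascript
// Stand-in for an Ajax call; a real app would fetch from a server.
const fetchUser = (id) => Promise.resolve({ id, name: 'user-' + id });

// Dependency-free sketch of the epic contract: receive an action,
// asynchronously produce a follow-up action for the reducer.
const fetchUserEpic = async (action) => {
  if (action.type !== 'FETCH_USER') return null; // not handled by this epic
  try {
    const payload = await fetchUser(action.id);
    return { type: 'FETCH_USER_FULFILLED', payload }; // reducer stores the data
  } catch (err) {
    return { type: 'FETCH_USER_REJECTED', error: err.message };
  }
};
```

In real redux-observable code, the same mapping is written as a stream, which is what gives Epics their easy cancellation and error handling.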

Sunday, April 22, 2018

Git Help

usage: git [--version] [--help] [-C <path>] [-c name=value]
           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
           [-p | --paginate | --no-pager] [--no-replace-objects] [--bare]
           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
           <command> [<args>]

These are common Git commands used in various situations:

start a working area (see also: git help tutorial)
   clone      Clone a repository into a new directory
   init       Create an empty Git repository or reinitialize an existing one

work on the current change (see also: git help everyday)
   add        Add file contents to the index
   mv         Move or rename a file, a directory, or a symlink
   reset      Reset current HEAD to the specified state
   rm         Remove files from the working tree and from the index

examine the history and state (see also: git help revisions)
   bisect     Use binary search to find the commit that introduced a bug
   grep       Print lines matching a pattern
   log        Show commit logs
   show       Show various types of objects
   status     Show the working tree status

grow, mark and tweak your common history
   branch     List, create, or delete branches
   checkout   Switch branches or restore working tree files
   commit     Record changes to the repository
   diff       Show changes between commits, commit and working tree, etc
   merge      Join two or more development histories together
   rebase     Reapply commits on top of another base tip
   tag        Create, list, delete or verify a tag object signed with GPG

collaborate (see also: git help workflows)
   fetch      Download objects and refs from another repository
   pull       Fetch from and integrate with another repository or a local branch
   push       Update remote refs along with associated objects

'git help -a' and 'git help -g' list available subcommands and some
concept guides. See 'git help <command>' or 'git help <concept>'
to read about a specific subcommand or concept.