Microsoft’s new database for globally-distributed applications

Azure Cosmos DB is a superset of the existing DocumentDB service, and Microsoft is transitioning all existing DocumentDB customers to Azure Cosmos DB, free of charge.

The system is designed to scale horizontally, whilst maintaining impressive performance and reliability. This is backed by a confident and generous service-level agreement.

Microsoft says that it can deliver single-digit millisecond latency at the 99th percentile, meaning 99 percent of requests complete in under 10 milliseconds. It also says that 99.99 percent of all requests will “complete successfully,” and it promises 99.99 percent availability.
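To put the 99.99 percent figure in perspective, the short sketch below converts an availability percentage into a downtime budget. It is plain arithmetic, not anything taken from Microsoft's SLA text; the function name and the 30-day and 365-day periods are assumptions chosen for illustration.

```python
def downtime_budget_minutes(availability_percent: float, period_days: float) -> float:
    """Minutes of unavailability permitted in a period at a given availability target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    minutes_in_period = period_days * 24 * 60
    return unavailable_fraction * minutes_in_period


if __name__ == "__main__":
    # Illustrative periods only; an actual SLA defines its own measurement window.
    for label, days in (("30-day month", 30), ("365-day year", 365)):
        budget = downtime_budget_minutes(99.99, days)
        print(f"{label}: {budget:.2f} minutes")
```

At 99.99 percent, that works out to roughly 4.3 minutes in a 30-day month, or about 52.6 minutes over a year.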

This new effort from Redmond is extremely versatile, and can handle pretty much every type of data you’re likely to throw at it, including key-value, document, columnar, and graph types, in a variety of environments, including AI and IoT.

Azure Cosmos DB also plays nice with several NoSQL APIs, including MongoDB, DocumentDB SQL, Gremlin, and Azure Table Storage. The Gremlin and Table Storage APIs are currently in preview.

Another strength of Azure Cosmos DB is the ease and speed with which data can be replicated across Azure regions, allowing developers to respond quickly to regional surges in traffic. This elasticity doesn’t come at the cost of application downtime.

Amazon Kinesis Data Analytics Can Now Detect Hotspots in Real-Time Data Streams

Posted On: Mar 19, 2018

Starting today, Amazon Kinesis Data Analytics supports real-time hotspot detection, which allows you to automatically detect regions of high density in your data, like a high concentration of vehicles on a highway indicating traffic bottlenecks, surging rideshare requests in a certain area indicating a popular event, or higher sales of products within a category indicating feature similarity. Detecting hotspots like these can help you gain actionable insights quickly and react to changing customer and business needs promptly.

To get started, simply call the Kinesis Data Analytics hotspot function from your Kinesis application. The hotspot function automatically builds and trains an appropriate machine learning model to identify subsections of your data streams that need attention. It identifies and reports one or more bounding boxes of these subsections in real time.

Kinesis Data Analytics hotspot detection is unsupervised, meaning that the function does not require you to label the data for model training. In addition, Kinesis Data Analytics automatically updates the model behind the scenes to adapt to changes in the data stream. For more information including sample code and hotspot visualization, see Detecting Hotspots on a Stream in the Kinesis Data Analytics developer guide.
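As a rough illustration of what "regions of high density" reported as bounding boxes can look like, here is a toy sketch. It is not the model Kinesis Data Analytics uses, and it does not call any AWS API: it simply counts points per fixed-size grid cell and reports the cells that meet a threshold. The names `find_hotspots`, `cell_size`, and `min_points` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def find_hotspots(points: List[Point], cell_size: float, min_points: int) -> List[Box]:
    """Return bounding boxes of grid cells that contain at least min_points points.

    Assumption: a fixed grid stands in for the learned model; this is a
    simplification for illustration, not the service's algorithm.
    """
    # Map every point to the integer coordinates of the grid cell it falls in.
    counts = Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )
    boxes: List[Box] = []
    for (cx, cy), n in sorted(counts.items()):
        if n >= min_points:
            boxes.append(
                (cx * cell_size, cy * cell_size, (cx + 1) * cell_size, (cy + 1) * cell_size)
            )
    return boxes


if __name__ == "__main__":
    # Three points cluster near the origin; the other two are isolated.
    sample = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.1), (5.5, 5.5), (9.0, 1.0)]
    print(find_hotspots(sample, cell_size=1.0, min_points=3))
```

No labels are supplied anywhere, which mirrors the unsupervised framing above: dense regions are inferred from the points alone. A streaming version would also need to age out old points, which this sketch omits.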

Kinesis Data Analytics is the easiest way to process data streams in real time with standard SQL without having to learn new programming languages or processing frameworks. Kinesis Data Analytics is available in the US East (N. Virginia), US West (Oregon), and EU (Ireland) regions. To get started, visit the Kinesis Data Analytics management console.

Apache Spark to power Microsoft big data analytics

Microsoft is making a serious commitment to the open source Apache Spark cluster computing framework.

After dipping its toes into the Spark ecosystem last year, the company today launched a number of Spark-based services out of preview and announced that the on-premises version of R Server for Hadoop (which uses the increasingly popular open source R language for big data analytics and modeling) is now powered by Spark.

In addition, Microsoft announced that R Server for HDInsight (essentially the cloud-based version of R Server) is coming out of preview later this summer, and that Spark for Azure HDInsight is now generally available with support for managed Spark services from Hortonworks. Power BI, Microsoft’s suite of business intelligence tools, will now also support Spark Streaming, allowing users to push real-time data from Spark right into Power BI.