#PASSSummit day 2 keynote live blog

It’s the day 2 keynote! Rimma Nehme, a brilliant speaker, is presenting this keynote, titled “Globally Distributed Databases Made Simple.” I’ll be updating this for the next hour or so… let’s go.

Update: And we’re done! Wow. You’re gonna have to go watch the video, for sure, and so am I. That was a LOT of information, well presented. I won’t even say “technical glitches aside”: those were AV issues, and she handled them with perfect grace.

Note: I’ve added a couple of side-note-style comments in italics throughout…

The abstract says, in part,

Public clouds are quickly making massive-scale computing capabilities available to an ever-larger population of developers and data professionals. These computing capabilities are no longer a playground restricted to a small handful of large-scale internet services organizations. … In my talk this year, I will provide a deep dive understanding into Azure Cosmos DB, Microsoft’s globally distributed, multi-model database service that was 7 years in the making.

We start with a community video, with attendees talking about the Summit itself. These are nice.

Welcoming

It’s Kilt Day here at the Summit, and Grant Fritchey (PASS president) takes the stage in a lovely red tartan and dark sporran. He’s talking about who makes up the PASS community, what PASS does, and what events are coming up (several online 24 Hours of PASS events next year, among others).

Grant welcomes the VP of marketing, Denise McInerney, who is here to talk about PASS’s global reach. There are over 2,000 viewers watching on PassTV right now!

Here comes the PASSion Award announcement. It recognizes the outstanding volunteer of the year. It’s… Roberto Fonseca!

More takeaways:

  • You can buy recordings of the PASS Summit sessions. I recommend doing this.
  • The event evals ARE important. Download the PASS events app, find PASS Summit 2017, and do your evals!
  • PASS Summit 2018 registration is now open! http://PASSsummit.com

And now, Dr. Rimma Nehme!

Dr. Nehme (who I may end up calling Rimma, because she’s very approachable) is the group product manager for Cosmos DB at Microsoft.

Rimma first gave a PASS keynote in 2014. Today we’re looking at Cosmos DB… at globally distributed databases.

Let’s do some takeaway style notes as we go along:

  • “90% of the world’s data was created in the last two years alone.” Wow. In the next 3-5 years, we’re looking at at least 50 times more.
  • “Data never sleeps.” Every 60 seconds, 204 million emails are generated.
  • Data is interconnected. Buying a Coke in Seattle can have an effect in a country in South America.
  • So we’re looking at HUGE, constantly changing, globally affecting data and databases.

  • “‘Project Florence’ is the blueprint of what is known today as Azure Cosmos DB.” Slide: “The entire distributed database system built from the ground…” circa 2010.
  • How should the DB be designed for the cloud?
    • Turnkey global distribution
    • Guaranteed low latency at the 99th percentile worldwide
    • Guaranteed high availability within the region and globally
    • Guaranteed consistency
    • Elastically scale throughput/storage any time, on demand, globally.
    • Comprehensive SLAs
    • Operate at low cost
    • Iterate and query without worrying about schema and index management
    • Provide a variety of data model and API choices.
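The “99th percentile” bullet is worth a closer look, because a percentile target is stricter than an average. Here’s a minimal Python sketch of checking a latency sample against a p99 target using nearest-rank percentiles; the sample latencies and the 10 ms target are made-up numbers for illustration, not figures from the keynote.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Multiply before dividing so an integer p (e.g. 99) gives an exact rank.
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_p99_target(latencies_ms, target_ms):
    """True if the 99th-percentile latency is within the target."""
    return percentile(latencies_ms, 99) <= target_ms


# Made-up sample: 98 fast requests plus 2 slow outliers (milliseconds).
latencies = [4.0] * 98 + [25.0, 40.0]
print(sum(latencies) / len(latencies))    # the mean looks comfortably fast
print(percentile(latencies, 99))          # 25.0
print(meets_p99_target(latencies, 10.0))  # False
```

The mean here is under 5 ms, yet the slowest 2% of requests take 25 ms or more, which is exactly what a percentile guarantee is meant to catch.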

“Even though it’s perceived as a new service, it’s 7 years in the making.” Don’t I know it. Minion Enterprise is over eight years old; we’ve been selling it for about 2-3 years.

Rimma put up a fairly complex architecture slide, and then put up a set of colorful scribbles all over it. One excellent thing she does: make tongue-in-cheek observations when she breaks a presentation rule (here, the overly complex slide). That makes it WORK, speakers.

There are 42 Azure regions across the world. Ooh, a virtual tour of the data centers! We love this stuff. We’re seeing external pictures of data centers, which isn’t quite as impressive… oh, here’s the inside. Yep, that’s a lot of racks.

“The basic concept inside Cosmos DB is a container, a representation for the data with a data model… a table, a collection of documents, a graph” and more. From the slide:

  • DB account/DB may span clusters, regions
  • DB is scaled out in terms of containers
  • Designed to scale throughput and storage independently.

There simply isn’t time to come up with proper headings for this live blog.

This is a fast presentation, but it’s easier to digest (especially compared to yesterday’s keynote) because it’s a good, contiguous story. We’re not going to get every aspect of every technical detail, but we get a sense of all of it. Again: this woman is an absolute professional presenter. This will be worth rewatching on PASS TV, for a few reasons.

Here’s an article and video that discusses some of what Dr. Nehme is speaking on right now.

Now, partitioning best practices. Now, resource model summary. (Hey, I’m not going to try to learn and summarize this at once!)
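I didn’t catch the best-practices slide word for word, but the core idea behind hash partitioning is easy to sketch: each item’s partition key is hashed to pick a physical partition, and the container’s provisioned throughput is shared across those partitions. The sketch below is a simplification (a SHA-256 hash modulo the partition count, and an assumed even throughput split), not Cosmos DB’s actual implementation; `partition_for` and `ru_per_partition` are hypothetical helpers.

```python
import hashlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key to a partition index, deterministically.

    Uses SHA-256 rather than Python's built-in hash(), which is salted
    per process for strings and so would not be stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def ru_per_partition(provisioned_ru: int, num_partitions: int) -> float:
    """Assumes throughput is split evenly across partitions."""
    return provisioned_ru / num_partitions


# Hypothetical workload: 1,000 distinct user IDs spread over 4 partitions.
spread = Counter(partition_for(f"user-{i}", 4) for i in range(1000))
print(sorted(spread.items()))
print(ru_per_partition(10_000, 4))  # 2500.0
```

The design point: a key with many distinct values spreads load evenly, while a single hot key is capped by its own partition’s share (2,500 RU/s in this example) even when the container as a whole has 10,000 RU/s provisioned.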

Adding regions for “turnkey global distribution” is a click away. In an Azure portal, you can pick where your data should be. That’s quite nice, of course. It looks pretty easy to set failover priority, too.

Dr. Nehme: “You can simulate a regional outage. (Don’t go crazy with this, guys.)”
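Setting failover priorities and simulating an outage boil down to a simple rule: write to the highest-priority region that is still available. Here’s a minimal sketch of that rule; the `Region` class and `pick_write_region` helper are hypothetical, and the region list is just an example configuration, not the portal’s or any SDK’s API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Region:
    name: str
    failover_priority: int  # in this sketch, 0 = preferred write region
    available: bool = True


def pick_write_region(regions):
    """Return the available region with the lowest failover priority."""
    candidates = [r for r in regions if r.available]
    if not candidates:
        raise RuntimeError("no region is available")
    return min(candidates, key=lambda r: r.failover_priority)


# Example configuration (hypothetical).
regions = [Region("West US", 0), Region("East US", 1), Region("North Europe", 2)]
print(pick_write_region(regions).name)  # West US

# Simulate a regional outage in the preferred region.
regions[0] = replace(regions[0], available=False)
print(pick_write_region(regions).name)  # East US
```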

Oh good, “policy-based geo-fencing”. Different parts of the world have requirements as to where the data can and can’t be stored, so they’ve implemented this to help. Very nice. That was a huge concern YEARS ago when MS couldn’t tell us where data in Tha Cloud would be stored.
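A residency policy like this is, at heart, an allow-list check run before data is placed in a region. This is a hypothetical sketch (the policy table, data-set names, and `disallowed_regions` helper are all made up for illustration); it fails closed, so a data set with no policy entry may not be placed anywhere.

```python
# Hypothetical residency policy: data set -> regions it may be stored in.
RESIDENCY_POLICY = {
    "eu-customer-data": {"North Europe", "West Europe"},
    "public-catalog": {"West US", "East US", "North Europe", "West Europe"},
}


def disallowed_regions(dataset, requested_regions, policy=RESIDENCY_POLICY):
    """Return the requested regions that the policy does not allow, sorted.

    Fails closed: a data set with no policy entry is allowed nowhere."""
    allowed = policy.get(dataset, set())
    return sorted(r for r in requested_regions if r not in allowed)


print(disallowed_regions("eu-customer-data", ["West Europe", "East US"]))
# ['East US']
print(disallowed_regions("unknown-data", ["West US"]))
# ['West US']
```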

Backups in the cloud…elastic scaleout…

Resource governance… I’m definitely falling behind. I do like this on the slide: “Resource Governance cannot be an afterthought.” There’s a complex process to evaluate the cost of a query, to help with the resource governance. (On the slide, a read is one request unit (RU), while a delete is 2 and a query is 4.)
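Here’s the slide’s cost table turned into a quick budget check. It’s a sketch under loose assumptions: real RU charges vary with item size, indexing, and query complexity, so the flat 1/2/4 costs below are only the slide’s illustrative numbers, and `total_ru` and `fits_budget` are hypothetical helpers.

```python
# Illustrative per-operation costs from the keynote slide (request units).
RU_COST = {"read": 1, "delete": 2, "query": 4}


def total_ru(ops_per_second):
    """RU/s consumed by a workload given as {operation: count per second}."""
    return sum(RU_COST[op] * count for op, count in ops_per_second.items())


def fits_budget(ops_per_second, provisioned_ru_per_s):
    """True if the workload stays within the provisioned throughput."""
    return total_ru(ops_per_second) <= provisioned_ru_per_s


workload = {"read": 500, "delete": 50, "query": 100}
print(total_ru(workload))           # 1000
print(fits_budget(workload, 1000))  # True
print(fits_budget(workload, 800))   # False
```

In Cosmos DB, requests beyond the provisioned throughput are rate-limited (HTTP 429), which is why this kind of back-of-the-envelope arithmetic matters when you provision.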

Transparent horizontal partitioning, responsive partition management operations. Looks like you can decide what partition will provide what number of RUs, and there appears to be magic involved because I can’t quite keep up. Oh good, an example!

During peak season, elastically provision more resources on demand, and then when you don’t need them (after the peak) turn them off. This is better than on prem (she says) in terms of not having to buy and keep the huge server. Of course, this has been the appeal of the cloud all along; get what you need when you need it, and no more. Just, you know…remember to turn off the extra stuff after the busy times are done.

Nope, still no time for article headings…

“Guaranteed low latency” slide has very nice things:

On a customer application example, “This is literally the speed of light.” I do like that the speed of data globally really is the speed of light.

CAP Theorem

This is on consistency models. But, you guessed it…

Slide: “Consistency models in Azure Cosmos DB”. Most real-life apps don’t fall neatly into either extreme, strong or eventual consistency. Instead, there are 5 well-defined consistency levels with clear tradeoffs: Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual.
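One way to keep the five levels straight is as an ordered scale, strongest to weakest, plus the extra rule that makes Bounded Staleness “bounded”: a read may trail the latest write by at most K versions or T seconds, whichever limit is hit first. This is a simplified sketch for intuition; the `Consistency` enum and helper functions are hypothetical, not part of any Cosmos DB SDK.

```python
from enum import IntEnum


class Consistency(IntEnum):
    """The five levels from the slide; a lower value means stronger guarantees."""
    STRONG = 1
    BOUNDED_STALENESS = 2
    SESSION = 3
    CONSISTENT_PREFIX = 4
    EVENTUAL = 5


def at_least_as_strong(a: Consistency, b: Consistency) -> bool:
    """True if level `a` guarantees at least as much as level `b`."""
    return a <= b


def within_staleness_bound(lag_versions: int, lag_seconds: float,
                           max_versions: int, max_seconds: float) -> bool:
    """Bounded staleness, simplified: a replica is acceptable only while it
    trails the latest write by no more than either limit."""
    return lag_versions <= max_versions and lag_seconds <= max_seconds


print(at_least_as_strong(Consistency.SESSION, Consistency.EVENTUAL))      # True
print(within_staleness_bound(3, 1.5, max_versions=10, max_seconds=5.0))   # True
print(within_staleness_bound(12, 1.5, max_versions=10, max_seconds=5.0))  # False
```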

Video time: Dr. Leslie Lamport, “Foundations of Cosmos DB.”

Oh it wouldn’t play. I’ll get the video after the keynote. Oh here it is!

Moving on

“Offering consistency for a price”. Different consistency levels have different costs.

Object Model

If you model data in the relational way, you collocate all the parameters in one document. How do you query that? With a SQL query, which behind the scenes is represented with an ARS representation. It can be represented as a tree, and so can the results.

At a global scale, schema management becomes a conversation nonstarter. Schema-agnostic indexing… okay, again, this is going by at light speed. She’s mentioned a couple of times that data is ingested and indexed automatically.
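The “index everything without a schema” idea becomes less magical once you treat each JSON document as a tree and index every root-to-leaf path. Here’s a toy Python version of that idea; it’s my simplification (real Cosmos DB indexing is far more compact and handles arrays, ranges, and storage very differently), and `paths` and `build_index` are hypothetical names.

```python
from collections import defaultdict


def paths(doc, prefix=()):
    """Yield (path, leaf_value) for every root-to-leaf path in a JSON-like tree."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from paths(value, prefix + (key,))
    elif isinstance(doc, list):
        for i, value in enumerate(doc):
            yield from paths(value, prefix + (i,))
    else:
        yield "/" + "/".join(str(part) for part in prefix), doc


def build_index(docs):
    """Inverted index: (path, value) -> ids of documents containing it.
    No schema is declared; whatever shape each document has gets indexed."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for path, value in paths(doc):
            index[(path, value)].add(doc_id)
    return index


docs = {
    "a": {"name": "Ann", "city": "Seattle", "tags": ["kilt", "sql"]},
    "b": {"name": "Bo", "address": {"city": "Seattle"}},  # a different shape
}
index = build_index(docs)
print(index.get(("/city", "Seattle"), set()))          # {'a'}
print(index.get(("/address/city", "Seattle"), set()))  # {'b'}
print(index.get(("/tags/0", "kilt"), set()))           # {'a'}
```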

She’s skipping the physical index organization, and I cannot help but think we’re grateful, and look forward to studying it later. Well, those who are going to dig into Azure Cosmos DB are looking forward to studying!

She keeps saying, “Let me go a little bit faster here…” By now we are good-naturedly laughing along.

Slide: Query processing:

  1. Support for multiple query language mappings via a compact Query IL grammar.
  2. Ability to call in and out of JavaScript contexts during execution
  3. Resource governed execution
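“Resource governed execution” is easiest to picture as a query that stops once it has spent its RU budget and hands back a continuation so the caller can resume. This is a hypothetical sketch, not Cosmos DB’s engine: the flat cost of 1 RU per scanned item, the `Page` type, and the integer continuation token are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Page:
    results: List[Any]
    ru_charged: int
    continuation: Optional[int]  # position to resume from; None when finished


def run_governed(items: List[Any], predicate: Callable[[Any], bool],
                 ru_budget: int, ru_per_item: int = 1, start: int = 0) -> Page:
    """Scan items from `start`, charging ru_per_item per item scanned,
    and stop before the budget would be exceeded."""
    if ru_per_item > ru_budget:
        raise ValueError("budget too small to make progress")
    results, charged = [], 0
    for i in range(start, len(items)):
        if charged + ru_per_item > ru_budget:
            return Page(results, charged, continuation=i)  # out of budget
        charged += ru_per_item
        if predicate(items[i]):
            results.append(items[i])
    return Page(results, charged, continuation=None)


def run_all(items, predicate, ru_budget):
    """Keep resuming from the continuation until the scan is complete."""
    pages, token = [], 0
    while token is not None:
        page = run_governed(items, predicate, ru_budget, start=token)
        pages.append(page)
        token = page.continuation
    return pages


pages = run_all(list(range(10)), lambda x: x % 2 == 0, ru_budget=4)
for page in pages:
    print(page.results, page.ru_charged, page.continuation)
# [0, 2] 4 4
# [4, 6] 4 8
# [8] 2 None
```

The guard against a budget smaller than one item’s cost matters: without it, the continuation would never advance and `run_all` would loop forever.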

https://twitter.com/DataSic/status/926126743727099905

Support for multiple APIs, formats, and wire protocols. I’d be interested to hear from the field on how well this works, vs how big of a pain in the butt it is.

In Conclusion…

“Why should you care?” The one who survives is “the one who is most adaptable to change”.

Try Cosmos DB for free. Cool.

And we’re done!

1 thought on “#PASSSummit day 2 keynote live blog”

  1. Greg Moore

    Some awesome stuff there and great job trying to keep up with Dr. Nehma, especially as she’d then go, “well now I need to go faster….”. Wow. Mind Blown.
