It’s the day 2 keynote! Rimma Nehme – a brilliant speaker – is presenting this keynote, titled “Globally Distributed Databases Made Simple.” I’ll be updating this post for the next hour or so… let’s go.
Update: And we’re done! Wow. You’re gonna have to go watch the video, for sure, and so am I. That was a LOT of information, well presented. Not even “technical glitches aside”… those were AV issues, and she handled them with perfect grace.
Note: I’ve added a couple of side-note style commentaries in italics throughout…
The abstract says, in part,
Public clouds are quickly making massive-scale computing capabilities available to an ever-larger population of developers and data professionals. These computing capabilities are no longer a playground restricted to a small handful of large-scale internet services organizations. … In my talk this year, I will provide a deep dive understanding into Azure Cosmos DB, Microsoft’s globally distributed, multi-model database service that was 7 years in the making.
We start with a community video, with attendees talking about the Summit itself. These are nice.
Welcoming
It’s Kilt Day here at the Summit, and Grant Fritchey (PASS president) takes the stage in a lovely red tartan and dark sporran. He’s talking about who makes up the PASS community, what PASS does, and what events are coming up (several online 24 Hours of PASS events next year, among others).
Kiltday @sqlpass #PASSsummit pic.twitter.com/0QunBDq6pz
— Erwin de Kreuk #MVP (@ErwindeKreuk) November 2, 2017
Most important advice today: Have fun + make connections 😀 #PASSsummit #SQLFamily pic.twitter.com/yeJvil0PGo
— Cathrine Wilhelmsen (@cathrinew) November 2, 2017
Grant welcomes the VP of Marketing, Denise McInerney, who is up to talk about PASS’s global reach. There are over 2,000 viewers watching on PASS TV right now!
Here comes the PASSion Award announcement, PASS’s award for the outstanding volunteer of the year. It’s… Roberto Fonseca!
More takeaways:
- You can buy recordings of the PASS Summit sessions. I recommend doing this.
- The event evals ARE important. Hit http://PASSsummit.com, download the PASS Events app, find PASS Summit 2017, and do your evals!
- PASS Summit 2018 registration is now open! http://PASSsummit.com
And now, Dr. Rimma Nehme!
Dr. Nehme (who I may end up calling Rimma, because she’s very approachable) is the group product manager for Cosmos DB at Microsoft.
Rimma first gave a PASS keynote in 2014. Today we’re looking at CosmosDB…at globally distributed databases.
Warning: @rimmanehme's keynote will be transformational, educational and sometimes funny. There might be momsplaining 😀 #PASSsummit
— Cathrine Wilhelmsen (@cathrinew) November 2, 2017
Let’s do some takeaway style notes as we go along:
- “90% of the world’s data was created in the last two years alone.” Wow. In the next 3-5 years, we’re looking at at least 50 times more.
- “Data never sleeps.” Every 60 seconds, 204 million emails are being generated.
- Data is interconnected. Buying a coke in Seattle can have an effect in a country in South America.
Can you imagine how fast the data do grow? @rimmanehme on the stage about global distributed scale #keynote #PASSSummit pic.twitter.com/sUDiHslxhr
— Kamil Nowinski (@nowinskik.bsky.social) (@NowinskiK) November 2, 2017
I remember when a few gigabytes *seemed* like a huge database. #passsummit
— Tim Mitchell (@Tim_Mitchell) November 2, 2017
- So we’re looking at HUGE, constantly changing, globally affecting data and databases.
"Imagine…" #PASSsummit pic.twitter.com/SG4s30jB3Z
— MidnightDBA (@MidnightDBA) November 2, 2017
- “‘Project Florence’ is the blueprint of what is known today as Azure Cosmos DB.” Slide: “The entire distributed database system built from the ground…” circa 2010.
- How should the DB be designed for the cloud?
  - Turnkey global distribution
  - Guaranteed low latency at the 99th percentile, worldwide
  - Guaranteed high availability within the region and globally
  - Guaranteed consistency
  - Elastically scale throughput and storage at any time, on demand, globally
  - Comprehensive SLAs
  - Operate at low cost
  - Iterate and query without worrying about schema and index management
  - Provide a variety of data model and API choices
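“Guaranteed low latency at the 99th percentile” is a promise about the tail, not the average. As a rough illustration, here’s a minimal Python sketch, assuming you have a list of measured request latencies in milliseconds, that computes a percentile with the nearest-rank method and checks it against a target. The function names and the 10 ms target are placeholders of mine, not numbers from the slide.

```python
import math


def percentile_nearest_rank(samples_ms, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Nearest rank: smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_target(samples_ms, target_ms, pct=99):
    """True if the pct-th percentile latency is within target_ms."""
    return percentile_nearest_rank(samples_ms, pct) <= target_ms


samples = [4] * 98 + [9, 40]  # 100 hypothetical requests, in ms
print(percentile_nearest_rank(samples, 99))         # 9
print(meets_latency_target(samples, target_ms=10))  # True
```

With 98 requests at 4 ms, one at 9 ms and one at 40 ms, the p99 is 9 ms: that single 40 ms outlier doesn’t break a 10 ms p99 target, though it would break a “max latency” target.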
Indeed! #PASSsummit pic.twitter.com/M8dUN8IgEm
— MidnightDBA (@MidnightDBA) November 2, 2017
“If it was easy everyone would do it”. #PASSSummit
— Andre Ranieri😷 (@sqlinseattle) November 2, 2017
“Even though it’s perceived as a new service, it’s 7 years in the making.” Don’t I know it. Minion Enterprise is over eight years old; we’ve been selling it for about 2-3 years.
Rimma put up a fairly complex architecture slide, and then put a set of colorful scribbles all over it. One excellent thing she does: she makes tongue-in-cheek observations when she breaks a presentation rule (here, the overly complex slide). That makes it WORK, speakers.
There are 42 Azure regions across the world. Ooh, a virtual tour of the data centers! We love this stuff. We’re seeing external pictures of data centers, which aren’t quite as impressive… oh, here’s the inside. Yep, that’s a lot of racks.
“The basic concept inside Cosmos DB is a container, a representation for the data with a data model… a table, a collection of documents, a graph,” and more. From the slide:
- DB account/DB may span clusters, regions
- DB is scaled out in terms of containers
- Designed to scale throughput and storage independently
CosmosDB hardware architecture #Passsummit pic.twitter.com/0qNU7laaKK
— Gail Shaw (@SQLintheWild) November 2, 2017
There simply isn’t time to come up with proper headings for this live blog
This is a fast presentation, but it’s easier to digest (especially compared to yesterday’s keynote) because it’s a good, coherent story. We’re not going to get every aspect of every technical detail, but we get a sense of all of it. Again: this woman is an absolute professional presenter. This will be worth rewatching on PASS TV, for a few reasons.
Here’s an article and video that discusses some of what Dr. Nehme is speaking on right now.
#PASSsummit #CosmosDB is easy to get started with too. If I can do it, I know you can.
— Grant Fritchey (@GFritchey) November 2, 2017
Now, partitioning best practices. Now, resource model summary. (Hey, I’m not going to try to learn and summarize this at once!)
Partitioning Best Practices #CosmosDB #PassSummit pic.twitter.com/2eRFtZhcuQ
— Denis Gobo (@DenisGobo) November 2, 2017
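For anyone who, like me, wants a mental model before rewatching the partitioning slides: Cosmos DB spreads data across partitions based on a partition key. Below is a toy Python sketch, assuming a simple hash-then-modulo scheme (the SHA-256 hash, the fixed partition count, and the helper names are my assumptions, not Cosmos DB internals), that shows why a commonly cited practice is to pick a key with many distinct, evenly used values.

```python
import hashlib
from collections import Counter


def partition_for(key, partition_count):
    """Toy model: hash the partition-key value, then take it modulo the partition count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def load_by_partition(keys, partition_count):
    """Number of items that land on each partition."""
    counts = Counter(partition_for(k, partition_count) for k in keys)
    return [counts[p] for p in range(partition_count)]


def hottest_share(keys, partition_count):
    """Fraction of all items on the busiest partition (ideal is 1 / partition_count)."""
    loads = load_by_partition(keys, partition_count)
    return max(loads) / sum(loads)


order_ids = [f"order-{i}" for i in range(10_000)]  # high cardinality
countries = ["US"] * 9_000 + ["DE"] * 1_000         # low cardinality, skewed
print(round(hottest_share(order_ids, 10), 3))
print(hottest_share(countries, 10))
```

Keying 10,000 orders by a unique order id spreads them roughly evenly (close to 10% per partition); keying them by country, with only two values and 9,000 of them “US”, piles at least 90% of the items onto a single partition.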
Adding regions for “turnkey global distribution” is a click away. In the Azure portal, you can pick where your data should be. That’s quite nice, of course. It looks pretty easy to set failover priority, too.
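Failover priority is essentially an ordered list of regions. Here’s a minimal sketch of what “fail over to the next region in priority order” means, assuming a hypothetical `pick_write_region` helper (not an Azure API) and an illustrative region list:

```python
def pick_write_region(failover_priority, unavailable):
    """Hypothetical helper: the first region, in priority order, that is still up.

    failover_priority lists region names; index 0 is the current write region.
    unavailable is a set of region names that are currently down.
    """
    for region in failover_priority:
        if region not in unavailable:
            return region
    raise RuntimeError("no region left to fail over to")


priority = ["West US", "East US", "North Europe"]  # illustrative order
print(pick_write_region(priority, unavailable=set()))        # West US
print(pick_write_region(priority, unavailable={"West US"}))  # East US
```

If West US is marked unavailable, writes move to East US; if every region is down, there’s nothing left to fail over to, and the sketch raises.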
This model effectively turns the entire planet into a big data center. Deploy to one or more regions with just a few clicks. #passsummit
— Tim Mitchell (@Tim_Mitchell) November 2, 2017
Dr. Nehme: “You can simulate a regional outage. (Don’t go crazy with this, guys.)”
Oh good, “policy-based geo-fencing”. Different parts of the world have requirements as to where the data can and can’t be stored, so they’ve implemented this to help. Very nice. That was a huge concern YEARS ago when MS couldn’t tell us where data in Tha Cloud would be stored.
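Geo-fencing boils down to checking requested regions against a policy. This is a hypothetical Python sketch (the `GeoFencePolicy` class and the EU-only region set are my own illustration, not the Cosmos DB API) of that kind of check:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoFencePolicy:
    """Hypothetical policy object: the only regions this data may be replicated to."""
    allowed_regions: frozenset

    def violations(self, requested_regions):
        """Requested regions the policy does not allow, in the order requested."""
        return [r for r in requested_regions if r not in self.allowed_regions]

    def permits(self, requested_regions):
        """True if every requested region is allowed."""
        return not self.violations(requested_regions)


eu_only = GeoFencePolicy(frozenset({"North Europe", "West Europe"}))
print(eu_only.violations(["West Europe", "East US"]))    # ['East US']
print(eu_only.permits(["North Europe", "West Europe"]))  # True
```

An EU-only policy flags East US in a request for West Europe plus East US, which is exactly the “where is my data stored?” question we couldn’t get answered years ago.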
Backups in the cloud…elastic scaleout…
Plan for CosmosDB – true Active/Active multi-master read-write database. That's … impressive. #PASSsummit
— Gail Shaw (@SQLintheWild) November 2, 2017
#cosmosdb backup is not for typical disaster recovery, as there is geo-replication available #PASSsummit
— Markus Ehrenmüller-Jensen (@MEhrenmueller) November 2, 2017
“Optionally evict old data.” Henceforth, I shall refer to archival as eviction. #PASSsummit #lostourlease #everythingmustgo
— Brent Ozar (@BrentO) November 2, 2017
Resource governance… I’m definitely falling behind. I do like this line on the slide: “Resource Governance cannot be an afterthought.” There’s a complex process to evaluate the cost of a query, to help with the resource governance. (On the slide, a read costs 1 request unit (RU), a delete 2, and a query 4.)
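To make the RU arithmetic concrete, here’s a minimal Python sketch using the per-operation numbers from the slide (1 RU per read, 2 per delete, 4 per query). Treat those as illustrative: real charges vary with item size, indexing, and query complexity. The helper names are hypothetical.

```python
# Per-operation costs as shown on the keynote slide. Illustrative only: real RU
# charges depend on item size, indexing policy, and query complexity.
RU_COST = {"read": 1, "delete": 2, "query": 4}


def workload_ru_per_second(ops_per_second):
    """Total RU/s for a mix of operations, e.g. {"read": 500, "query": 50}."""
    return sum(RU_COST[op] * rate for op, rate in ops_per_second.items())


def is_throttled(ops_per_second, provisioned_ru_per_second):
    """True if the mix needs more RU/s than provisioned, so requests get rate-limited."""
    return workload_ru_per_second(ops_per_second) > provisioned_ru_per_second


mix = {"read": 500, "delete": 20, "query": 50}
print(workload_ru_per_second(mix))  # 740
print(is_throttled(mix, 1_000))     # False
print(is_throttled(mix, 700))       # True
```

500 reads, 20 deletes, and 50 queries per second works out to 500 + 40 + 200 = 740 RU/s, which fits in 1,000 RU/s of provisioned throughput but not in 700. In Cosmos DB, requests over the provisioned rate get rate-limited (HTTP 429), which is the “cost of poor configuration” the tweets below are warning about.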
Fully resource governed stack #PASSsummit pic.twitter.com/RCsrY3vj7P
— Dr. Victoria Holt #DataToboggan (@victoria_holt) November 2, 2017
Transparent horizontal partitioning, responsive partition management operations. Looks like you can decide what partition will provide what number of RUs, and there appears to be magic involved because I can’t quite keep up. Oh good, an example!
RUs can be dynamically moved to meet demand, for example, from one time zone to the next to match local spikes in utilization. #passsummit
— Tim Mitchell (@Tim_Mitchell) November 2, 2017
Be careful how you configure throughput RU. There is a cost related to poor configuration. #SQLPASS #PASSSummit
— SQLElLídre (@DSFNet) November 2, 2017
During peak season, elastically provision more resources on demand, and then when you don’t need them (after the peak) turn them off. This is better than on-prem (she says) in terms of not having to buy and keep the huge server. Of course, this has been the appeal of the cloud all along: get what you need when you need it, and no more. Just, you know… remember to turn off the extra stuff after the busy times are done.
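The peak-season argument is easy to put numbers on. Here’s a minimal sketch, assuming made-up hourly demand and treating cost as simply proportional to provisioned RU/s per hour (actual Azure billing granularity and rates differ), comparing provisioning for peak all day with scaling to demand:

```python
def ru_hours(hourly_demand, fixed_capacity=None):
    """Provisioned RU/s summed over the hours in hourly_demand.

    fixed_capacity=None models scaling to demand every hour (elastic);
    a number models holding that capacity all day (provisioned for peak).
    """
    if fixed_capacity is None:
        return sum(hourly_demand)
    if fixed_capacity < max(hourly_demand):
        raise ValueError("fixed capacity is below peak demand")
    return fixed_capacity * len(hourly_demand)


demand = [1_000] * 20 + [10_000] * 4  # hypothetical day: 20 quiet hours, 4-hour spike
elastic = ru_hours(demand)                              # 60,000
peak_all_day = ru_hours(demand, fixed_capacity=10_000)  # 240,000
print(elastic / peak_all_day)  # 0.25
```

With 20 quiet hours at 1,000 RU/s and a 4-hour spike at 10,000 RU/s, tracking demand uses 60,000 RU-hours against 240,000 for holding peak capacity all day: a quarter of the cost, provided you actually scale back down.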
Nope, still no time for article headings…
“Guaranteed low latency” slide has very nice things:
#CosmosDB latency guarantee #PASSSummit pic.twitter.com/jJDXzcprMC
— Denis Gobo (@DenisGobo) November 2, 2017
On one customer application example: “This is literally the speed of light.” I do like that the speed of data, globally, really is the speed of light.
I hear: Data automatically indexed upon ingestion, customer doesn’t need to specify anything. I think: TELL ME MORE TELL ME MORE #PASSSummit
— Kendra Little (@Kendra_Little) November 2, 2017
CAP Theorem
This is on consistency models. But, you guessed it…
She is actually packing a lot into it. Will need to watch multiple times
— Mala Mahadevan (@sqlmal) November 2, 2017
Red pill (strong consistency, higher latency) or blue pill (eventual consistency, low latency)? #passsummit
— Tim Mitchell (@Tim_Mitchell) November 2, 2017
Slide: “Consistency models in Azure Cosmos DB.” Most real-life apps don’t fall neatly into either of those two categories. Instead, there are 5 well-defined consistency levels with clear tradeoffs: Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual.
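Here’s a toy Python model of the five levels, assuming a single ordered stream of writes numbered by version, to show what each level lets a replica serve. It’s my simplification, not how Cosmos DB implements them; in particular, because every replica here applies writes in order, Consistent Prefix and Eventual behave the same. The function and parameter names are hypothetical.

```python
from enum import IntEnum


class Consistency(IntEnum):
    """The five levels from the slide, numbered so that stronger is higher."""
    EVENTUAL = 1
    CONSISTENT_PREFIX = 2
    SESSION = 3
    BOUNDED_STALENESS = 4
    STRONG = 5


def replica_read_allowed(level, replica_version, latest_version,
                         session_version=0, max_lag=0):
    """Toy check: may a read be served from a replica at replica_version?

    Versions count writes applied in order, so every replica holds a prefix of
    the write history. session_version is the newest version this client has
    written or read; max_lag is the staleness bound K, in versions.
    """
    lag = latest_version - replica_version
    if level is Consistency.STRONG:
        return lag == 0
    if level is Consistency.BOUNDED_STALENESS:
        return lag <= max_lag
    if level is Consistency.SESSION:
        return replica_version >= session_version  # read your own writes
    return True  # CONSISTENT_PREFIX and EVENTUAL in this in-order model


latest, replica = 10, 8
print(replica_read_allowed(Consistency.STRONG, replica, latest))  # False
print(replica_read_allowed(Consistency.BOUNDED_STALENESS, replica, latest, max_lag=2))  # True
print(replica_read_allowed(Consistency.SESSION, replica, latest, session_version=9))  # False
```

With the latest write at version 10 and a replica at version 8, Strong refuses the read, Bounded Staleness allows it only if the bound K is at least 2, and Session allows it only if this client hasn’t already seen anything newer than version 8.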
The most common consistency model used in CosmosDB, is Session, not eventual. #PassSummit
— Gail Shaw (@SQLintheWild) November 2, 2017
Link to paper @rimmanehme mentioned in #PASSsummit keynote https://t.co/hVs7nN8XPu
— Louis Davidson (@drsql) November 2, 2017
TLA+ makes you pseudocode our your database logic to find out where it will break. It’s so cool: https://t.co/HcsxjZXd57 #PASSsummit
— Brent Ozar (@BrentO) November 2, 2017
Video time: Dr. Leslie Lamport, “Foundations of Cosmos DB.”
Oh, it wouldn’t play. I’ll get the video after the keynote. Oh, here it is!
#FoundationsOfAzureCosmosDBWithDrLeslieLamport https://t.co/7aGFUOgmAT #SQLPASS #PASSSummit
— SQLElLídre (@DSFNet) November 2, 2017
Moving on
“Offering consistency for a price”. Different consistency levels have different costs.
Object Model
If you model data in the relational way, you colocate all the parameters in one document. How do you query that? With a SQL query, which behind the scenes is represented in ARS (atom-record-sequence) form. That can be represented as a tree, and so can the results.
At a global scale, schema management becomes a conversation non-starter. Schema-agnostic indexing… okay, again, this is going by at light speed. She’s mentioned a couple of times that data is ingested and indexed automatically.
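As a rough mental model only, assuming documents are JSON trees and the index is keyed by (path, value), here’s a toy Python sketch of “index everything on ingestion, no schema declared”: every leaf gets indexed as the document arrives. It isn’t Cosmos DB’s actual index format; the `paths` and `PathIndex` names are mine.

```python
from collections import defaultdict


def paths(value, prefix=""):
    """Yield (path, leaf_value) pairs for every leaf in a JSON-like tree."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from paths(child, f"{prefix}/{key}")
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from paths(child, f"{prefix}/{i}")
    else:
        yield prefix, value


class PathIndex:
    """Toy inverted index from (path, value) to document ids; no schema needed."""

    def __init__(self):
        self._postings = defaultdict(set)

    def ingest(self, doc_id, doc):
        # Every leaf is indexed on ingestion; nothing is declared up front.
        for path, leaf in paths(doc):
            self._postings[(path, leaf)].add(doc_id)

    def lookup(self, path, value):
        """Sorted ids of documents whose leaf at path equals value."""
        return sorted(self._postings.get((path, value), set()))


index = PathIndex()
index.ingest("a", {"name": "Ann", "address": {"city": "Seattle"}, "tags": ["sql", "pass"]})
index.ingest("b", {"name": "Bo", "address": {"city": "Seattle", "zip": "98101"}})
print(index.lookup("/address/city", "Seattle"))  # ['a', 'b']
print(index.lookup("/address/zip", "98101"))     # ['b']
```

Two documents with different shapes can both be found by `/address/city`, and a field only one of them has (`/address/zip`) just works, with no schema change required.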
#PASSsummit #complicated pic.twitter.com/SDKynOWVnK
— Gail Shaw (@SQLintheWild) November 2, 2017
She’s skipping the physical index organization, and I cannot help but think we’re grateful, and look forward to studying it later. Well, those who are going to dig into Azure Cosmos DB are looking forward to studying!
She keeps saying, “Let me go a little bit faster here…” By now we are good-naturedly laughing along.
Slide: Query processing:
- Support for multiple query lang mappings via a compact Query IL grammar.
- Ability to call in and out of JavaScript contexts during execution
- Resource governed execution
https://twitter.com/DataSic/status/926126743727099905
Support for multiple APIs, formats, and wire protocols. I’d be interested to hear from the field on how well this works vs. how big of a pain in the butt it is.
Create UDFs with JavaScript? I have mixed feelings about that. #passsummit
— Tim Mitchell (@Tim_Mitchell) November 2, 2017
Them's some mighty big claims. #PASSsummit pic.twitter.com/lIXCDUMuwl
— MidnightDBA (@MidnightDBA) November 2, 2017
And predictive fault detection, just to add more complexity. Try to edict that a node will fail and do something before it does. #PASSsummit
— Gail Shaw (@SQLintheWild) November 2, 2017
In Conclusion…
“Why should you care?” The one who survives is “the one who is most adaptable to change”.
Try Cosmos DB for free. Cool.
Try #CosmosDB for free today! (no credit card or azure account needed) #PASSsummit
— Markus Ehrenmüller-Jensen (@MEhrenmueller) November 2, 2017
And we’re done!
Some awesome stuff there, and great job trying to keep up with Dr. Nehme, especially as she’d then go, “Well, now I need to go faster…” Wow. Mind blown.