Designing Data-Intensive Applications - Data Models: Relational vs Document

1:53:53
 
By Allen Underwood, Michael Outlaw, and Joseph Zack.

We’re comparing data models as we continue our deep dive into Designing Data-Intensive Applications as Coach Joe is ready to teach some basketball, Michael can’t pronounce 6NF, and Allen measured some geodesic distances just this morning.

For those reading these show notes via a podcast player, this episode’s full show notes can be found at https://www.codingblocks.net/episode123 where you can also join in on the conversation.

Sponsors
  • Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard.
  • Educative.io – Level up your coding skills, quickly and efficiently. Visit educative.io/codingblocks to get 20% off any course or, for a limited time, get 50% off an annual subscription.
  • ABOUT YOU – One of the fastest growing e-commerce companies, headquartered in Hamburg, Germany, and looking for motivated team members like you. Apply now at aboutyou.com/job.
Survey Says

Which data model do you prefer?

Take the survey at: https://www.codingblocks.net/episode123.

News
  • We thank everyone that took a moment to leave us a review:
    • iTunes: BoulderDude333, the pang1, fizch26
  • Hurry up and get your tickets now for NDC { London }, January 27th – 31st, where Allen will be giving his talk, Big Data Analytics in Near-Real-Time with Apache Kafka Streams. This is your chance to kick him in the shins on the other side of the pond. (ndc-london.com)
  • Sign up for your chance to kick Joe in the shins at the South Florida Software Developers Conference 2020, February 29th, where he will be giving his talk, Streaming Architectures by Example. (fladotnet.com)
  • Want a chance to kick all three Coding Blocks hosts in the shins? Sign up for the 15th Annual Orlando Code Camp & Tech Conference, March 28th, and grab some swag while you're there. (orlandocodecamp.com)
Data Models
  • Data models are one of the most important pieces of developing software.
    • They dictate how the software is written.
    • And they dictate how we think about the problems we’re solving.
  • Software is typically written by stacking layers of modeling on top of each other.
    • We write objects and data structures to reflect the real world.
    • These then get translated into a format that can be persisted: JSON, XML, relational tables, graph databases, etc.
      • The people that built the storage engine had to determine how to model the data on disk and in memory to support things like search, fast access, etc.
        • Even further down, those bits have to be converted to electrical current, pulses of light, magnetic fields and so on.
  • Complex applications commonly have many layers: APIs built on top of APIs.
    • What’s the purpose of these layers? Each layer hides the complexity of the layer below it.
      • The abstractions allow different groups of people (potentially with completely different skillsets) to work together.
  • There are MANY types of data models, all with different usages and needs in mind.
    • It can take a LOT of time and effort to master just a single model.
    • Data models have a HUGE impact on how you write your applications, so it’s important to choose one that makes sense for what you’re trying to accomplish.
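To make the stacked layers a bit more concrete, here is a minimal Python sketch. The `Person` class, its fields, and the helper names are made up for illustration and are not from the episode or the book. It models a real-world thing as an object, translates it into JSON for persistence, and then hands it off as bytes to whatever layer sits below.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Person:
    # Layer 1 (hypothetical): an application object modeling the real world.
    name: str
    city: str


def to_json(person: Person) -> str:
    # Layer 2: translate the object into a general-purpose format (JSON here).
    return json.dumps(asdict(person), sort_keys=True)


def from_json(text: str) -> Person:
    # Reverse translation when the data is read back.
    return Person(**json.loads(text))


person = Person(name="Ada", city="London")
encoded = to_json(person)

# Layer 3 and below: bytes handed to a storage engine, then to hardware.
# Those details stay hidden from the code above.
raw_bytes = encoded.encode("utf-8")
restored = from_json(raw_bytes.decode("utf-8"))
```

Each function only knows about the layer directly beneath it, which is the point of the abstraction: the `Person` code never has to care how the bytes end up on disk.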
Relational Model vs Document Model
  • The best-known data model today is probably the relational model that SQL is based on.
  • The relational model was proposed by Edgar Codd back in 1970.
  • The relational model organizes data into relations (i.e. tables in SQL) where each relation contains an unordered collection of tuples (i.e. rows in SQL).
    • People originally doubted it would work, but its dominance has lasted since the mid-1980s, which the author points out is basically an eternity in software.
  • Origins were based in business data processing, particularly transaction processing.
  • There have been a number of competing data storage and querying approaches over the years.
    • Network and Hierarchical models in 70’s and 80’s,
    • Object databases were competitors in the late 80’s and early 90’s,
    • XML databases,
    • Basically, there have been a number of competitors over the years, but nobody has dethroned the relational database.
  • Almost everything you see and use today has some sort of relational database working behind it.
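As a small illustration of the "relation = table, tuple = row" idea above, here is a sketch using Python's built-in `sqlite3` module. The `hosts` table and its contents are hypothetical, chosen only for the example. Because a relation is an unordered collection, the sketch compares the raw result as a set and asks for an explicit `ORDER BY` when order matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A relation (table) with two attributes (columns). Hypothetical schema.
conn.execute("CREATE TABLE hosts (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Each inserted row is one tuple in the relation.
conn.executemany(
    "INSERT INTO hosts (id, name) VALUES (?, ?)",
    [(1, "Allen"), (2, "Joe"), (3, "Michael")],
)

# Without ORDER BY the model promises no particular order,
# so treat the result as a set of tuples.
unordered = set(conn.execute("SELECT id, name FROM hosts").fetchall())

# When order matters, ask for it explicitly.
ordered = conn.execute("SELECT name FROM hosts ORDER BY name").fetchall()

conn.close()
```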
NoSQL
  • NoSQL is the latest competitor to Relational Databases.
    • It was originally intended as a catchy Twitter hashtag for a meetup about open source, distributed, non-relational databases.
    • It has since been retroactively reinterpreted as “Not Only SQL”.
  • What needs does NoSQL aim to address?
    • The need for greater scalability than traditional RDBMSs can typically achieve, including very large datasets and very high write throughput.
    • The desire for FOSS (free and open source software), as opposed to very expensive, commercial RDBMSs.
    • Specialized query operations that are not supported well in the relational model.
    • Frustration with the restrictiveness of relational schemas, and a desire for more dynamic and/or expressive data models.
  • Different applications (or even different pieces of the same application) have different needs and may require different data models. For that reason, it’s very likely that NoSQL won’t replace SQL, but rather it’ll augment it.
    • This is referred to as polyglot persistence.
Object-Relational Mismatch
  • Most applications today are written in an object-oriented programming language.
  • There’s typically a translation layer required to map the relational data models to an object model.
    • The disconnect between models can be referred to as impedance mismatch.
  • Frameworks like ActiveRecord, Hibernate, Entity Framework, etc., can reduce the boilerplate code needed for the translation but typically don’t fully hide the impedance mismatch issues.
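Here is a rough Python sketch of what that translation layer looks like when written by hand, next to a document-style alternative. The `Profile` class, the `users` and `jobs` tables, and the function names are all hypothetical, invented for this example. It is not how any of the ORMs named above work internally, only a picture of the mismatch they try to paper over.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Profile:
    # Hypothetical object model: one object that owns a list of job titles.
    user_id: int
    name: str
    jobs: List[str] = field(default_factory=list)


def save_relational(conn: sqlite3.Connection, profile: Profile) -> None:
    # The single object has to be split across two tables.
    conn.execute(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        (profile.user_id, profile.name),
    )
    conn.executemany(
        "INSERT INTO jobs (user_id, title) VALUES (?, ?)",
        [(profile.user_id, title) for title in profile.jobs],
    )


def load_relational(conn: sqlite3.Connection, user_id: int) -> Profile:
    # ...and stitched back together by hand when reading.
    row = conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    titles = [
        r[0]
        for r in conn.execute(
            "SELECT title FROM jobs WHERE user_id = ? ORDER BY rowid", (user_id,)
        )
    ]
    return Profile(user_id=row[0], name=row[1], jobs=titles)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE jobs (user_id INTEGER REFERENCES users(id), title TEXT)")

profile = Profile(user_id=1, name="Ada", jobs=["Engineer", "Architect"])
save_relational(conn, profile)
from_tables = load_relational(conn, 1)

# Document style: the nested structure is stored as-is in one JSON document.
document = json.dumps(asdict(profile))
from_document = Profile(**json.loads(document))
```

The relational path needs two statements to write and two queries to read, while the document path is a single serialize/deserialize step. That gap is the boilerplate an ORM reduces, though the underlying difference in shape between objects and tables is still there.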
Resources We Like
  • Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon)
  • Grokking the System Design Interview (Educative.io)
  • Monitor Azure DevOps workflows and pipelines with Datadog (Datadog)
  • Monitor Amazon EKS on AWS Fargate with Datadog (Datadog)
  • Best practices for tagging your infrastructure and applications (Datadog)
  • Introducing: Educative Subscriptions (Educative.io)
  • Santosh Hari – Not all data is created equal: NoSQL (YouTube)
  • TIOBE Index (tiobe.com)
  • Database Schema for Multiple Types of Products
Tip of the Week
  • Got data? Use DataGrip. One tool for many databases. (JetBrains)
  • KafkaHQ – A Kafka GUI for topics, data, consumer groups, schema registry and more. (GitHub)
  • Grafka – A GraphQL interface for Apache Kafka (GitHub)
  • Use Google Maps to measure geodesic distances (citylab.com)
  • How to undo (almost) anything with Git (GitHub)
  • Will Save the Galaxy for Food by Yahtzee Croshaw (Amazon)
