SlideShare a Scribd company logo
1 of 19
Download to read offline
Miguel Angel Nieto
miguel.nieto@mongodb.com
Technical Services Engineer, MongoDB
Query Planner
The Question
● I worked for 6 years as MySQL Technical Support
Engineer.
● A large percentage of cases from customers were related
to bad query plans/wrong index selection.
● Query Planning is a complex piece of code with many
knobs that can be tuned.
● When I started working at MongoDB I found that the
number of cases on that topic was very very (very) low. So
I asked myself:
Why?
Plan selection in other databases
● Traditional databases use a statistics approach to choose
the best plan:
○ The information about data distribution is not
accurate.
○ It is estimated by reading data with random dives in
the index tree (MySQL).
○ When some prerequisites are met (like number of
modified rows) statistics are automatically
recalculated.
Plan selection in MongoDB
● MongoDB uses a empirical method:
○ If there is no cached plan, then all viable execution plans,
based on the available indexes, are created.
○ MongoDB runs the query multiple times, one for each query
plan and benchmarks them. It chooses the one that
provides the best performance.
○ Once done, the plan is cached.
■ Future queries with the same shape will re-use this
plan rather than re-running the candidate plans.
■ For each such query the performance of the cached
plan is evaluated. If the plan's performance decreases
beyond a given threshold, it is evicted from the cache
and the candidate test phase runs again. This is known
as re-planning (SERVER-15225)
Benchmarking the plans
● All possible plans are executed in round-robin fashion.
● It gathers execution metrics and then provide a score to each
plan.
● Sort the plans by score and choose the best one.
Execution Metrics (I)
● These are the metrics:
works
advanced
needTime
isEOF
Execution Metrics (II)
● Number of works:
■ The planner asks each plan for the next document, via a
call to work().
■ If the plan can supply a document, it responds with
'advanced'. Otherwise, the plan responds with
'needsTime'.
● If all documents have been retrieved, then isEOF = 1.
Early stop of query execution
● The query could be expensive, so there are limits to early
stop the execution. Execution stop if:
○ The maximum number of works has been reached.
○ The requested number of documents has been
retrieved (advanced).
○ We get isEOF (the resultSet has no more documents).
works
work()
isEOF advanced
Break
Number of work() calls before stopping
● internalQueryPlanEvaluationWorks = 10000
For large collections we take a fraction of the number of
documents:
● internalQueryPlanEvaluationCollFraction = 0.3
Then, get the maximum value.
internalQueryPlanEvaluationWorks
internalQueryPlanEvaluationCollFraction numRecords
works
Number of documents to retrieve before
stopping
● internalQueryPlanEvaluationMaxResults = 101
● query.getQueryRequest().getNToReturn()
○ Used in the old OP_QUERY protocol.
○ Drivers set 'ntoreturn' to min('batchSize', 'limit') in
order to fake the lack of 'limit' or 'batchSize'
mechanism in the protocol.
● query.getQueryRequest().getLimit()
○ Used in OP_QUERY protocol from 3.2 onwards.
getNToReturn
advanced getNToReturn internalQueryPlanEvaluationMaxResults
getLimit
advanced getLimit internalQueryPlanEvaluationMaxResults
advanced internalQueryPlanEvaluationMaxResults
advanced
Pick the best plan, count the scores
● baseScore = 1
● Productivity = queryResults / workUnits
● TieBreak (very small number) = min(1.0 / (10 * workUnits), 1e-4)
● noFetchBonus (covered index) = TieBreak or 0
● noSortBonus (blocking sort) = TieBreak or 0
● noIxisectBonus (avoiding index intersection) = TieBreak or 0
● tieBreakers = noFetchBonus + noSortBonus +
noIxisectBonus
● eofBonus (if during plan execution all possible documents are retrieved) = 0 | 1
Replanning: Automatic Plan Cache Eviction
● The stored data keep changing, it could possible that the
cached plan is not the best one anymore.
● While the cached plan is being used, MongoDB re-runs
the trial period for that plan and keeps a count of the
work() function calls.
● If the new trial period takes more than 10 times as many
works() as the original trial period, it evicts the plan from
the cache and re-tests all candidate plans to pick a new
winner.
● internalQueryCacheEvictionRatio = 10
maxWorksBeforeReplan internalQueryCacheEvictionRatio cachedWorks
currentWorks maxWorksBeforeReplan
replan()
Plans are not always cached
● In the following situations, the execution plan is not
cached:
○ Collection scan without sort()
○ hint()
○ min()
○ max()
○ explain()
○ Tailable cursors (they don’t use indexes)
○ snapshot()
○ A single viable plan
Query Planner Troubleshoot Example (I)
● We check all query shapes:
listQueryShapes
Query Planner Troubleshoot Example (II)
● Get the execution plan for that query:
getPlansByQuery
solution
score
works
isEOF
Query Planner Troubleshoot Example (III)
● Remove the query plan for a particular query:
clearPlansByQuery
"plans": [ ]
Query Planner Troubleshoot Example (IV)
● Remove all query plans on a particular collection:
clear
Query Planner Troubleshoot
● There are Plan Cache methods that can be used for
troubleshooting:
https://docs.mongodb.com/manual/reference/method/js-plan-cache/
● Check all query shapes:
○ db.collection.getPlanCache().listQueryShapes()
● Get the plan for a particular query:
○ db.collection.getPlanCache().getPlansByQuery(
<query>, <projection>, <sort> )
● Clean the plans for a particular query:
○ db.collection.getPlanCache().clearPlansByQuery()
● Clean all plans:
○ db.collection.getPlanCache().clear()
Thanks!

More Related Content

What's hot

F9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
F9: A Secure and Efficient Microkernel Built for Deeply Embedded SystemsF9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
F9: A Secure and Efficient Microkernel Built for Deeply Embedded SystemsNational Cheng Kung University
 
A Journey from Hexagonal Architecture to Event Sourcing - SymfonyCon Cluj 2017
A Journey from Hexagonal Architecture to Event Sourcing - SymfonyCon Cluj 2017A Journey from Hexagonal Architecture to Event Sourcing - SymfonyCon Cluj 2017
A Journey from Hexagonal Architecture to Event Sourcing - SymfonyCon Cluj 2017Carlos Buenosvinos
 
From object oriented to functional domain modeling
From object oriented to functional domain modelingFrom object oriented to functional domain modeling
From object oriented to functional domain modelingMario Fusco
 
Indexing and Performance Tuning
Indexing and Performance TuningIndexing and Performance Tuning
Indexing and Performance TuningMongoDB
 
Linux Kernel - Virtual File System
Linux Kernel - Virtual File SystemLinux Kernel - Virtual File System
Linux Kernel - Virtual File SystemAdrian Huang
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation FrameworkCaserta
 
MongoDB World 2019: The Sights (and Smells) of a Bad Query
MongoDB World 2019: The Sights (and Smells) of a Bad QueryMongoDB World 2019: The Sights (and Smells) of a Bad Query
MongoDB World 2019: The Sights (and Smells) of a Bad QueryMongoDB
 
NoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenNoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenLorenzo Alberton
 
RxNetty vs Tomcat Performance Results
RxNetty vs Tomcat Performance ResultsRxNetty vs Tomcat Performance Results
RxNetty vs Tomcat Performance ResultsBrendan Gregg
 
MySQL User Group NL - MySQL 8
MySQL User Group NL - MySQL 8MySQL User Group NL - MySQL 8
MySQL User Group NL - MySQL 8Frederic Descamps
 
Integration of neutron, nova and designate how to use it and how to configur...
Integration of neutron, nova and designate  how to use it and how to configur...Integration of neutron, nova and designate  how to use it and how to configur...
Integration of neutron, nova and designate how to use it and how to configur...Miguel Lavalle
 
Cql – cassandra query language
Cql – cassandra query languageCql – cassandra query language
Cql – cassandra query languageCourtney Robinson
 
MongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDB
MongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDBMongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDB
MongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDBMongoDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBScyllaDB
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonIgor Anishchenko
 
Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)
Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)
Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)Gavin Guo
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBScyllaDB
 
Reverse Engineering Malicious Javascript
Reverse Engineering Malicious JavascriptReverse Engineering Malicious Javascript
Reverse Engineering Malicious JavascriptYusuf Motiwala
 

What's hot (20)

F9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
F9: A Secure and Efficient Microkernel Built for Deeply Embedded SystemsF9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
F9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
 
Lazy java
Lazy javaLazy java
Lazy java
 
A Journey from Hexagonal Architecture to Event Sourcing - SymfonyCon Cluj 2017
A Journey from Hexagonal Architecture to Event Sourcing - SymfonyCon Cluj 2017A Journey from Hexagonal Architecture to Event Sourcing - SymfonyCon Cluj 2017
A Journey from Hexagonal Architecture to Event Sourcing - SymfonyCon Cluj 2017
 
From object oriented to functional domain modeling
From object oriented to functional domain modelingFrom object oriented to functional domain modeling
From object oriented to functional domain modeling
 
Indexing and Performance Tuning
Indexing and Performance TuningIndexing and Performance Tuning
Indexing and Performance Tuning
 
Linux Kernel - Virtual File System
Linux Kernel - Virtual File SystemLinux Kernel - Virtual File System
Linux Kernel - Virtual File System
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
 
MongoDB World 2019: The Sights (and Smells) of a Bad Query
MongoDB World 2019: The Sights (and Smells) of a Bad QueryMongoDB World 2019: The Sights (and Smells) of a Bad Query
MongoDB World 2019: The Sights (and Smells) of a Bad Query
 
NoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenNoSQL Databases: Why, what and when
NoSQL Databases: Why, what and when
 
MongoDB (Advanced)
MongoDB (Advanced)MongoDB (Advanced)
MongoDB (Advanced)
 
RxNetty vs Tomcat Performance Results
RxNetty vs Tomcat Performance ResultsRxNetty vs Tomcat Performance Results
RxNetty vs Tomcat Performance Results
 
MySQL User Group NL - MySQL 8
MySQL User Group NL - MySQL 8MySQL User Group NL - MySQL 8
MySQL User Group NL - MySQL 8
 
Integration of neutron, nova and designate how to use it and how to configur...
Integration of neutron, nova and designate  how to use it and how to configur...Integration of neutron, nova and designate  how to use it and how to configur...
Integration of neutron, nova and designate how to use it and how to configur...
 
Cql – cassandra query language
Cql – cassandra query languageCql – cassandra query language
Cql – cassandra query language
 
MongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDB
MongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDBMongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDB
MongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased Comparison
 
Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)
Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)
Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
Reverse Engineering Malicious Javascript
Reverse Engineering Malicious JavascriptReverse Engineering Malicious Javascript
Reverse Engineering Malicious Javascript
 

Similar to MongoDB Query Planner Empirical Method Chooses Best Execution Plan

Mongo nyc nyt + mongodb
Mongo nyc nyt + mongodbMongo nyc nyt + mongodb
Mongo nyc nyt + mongodbDeep Kapadia
 
Our Story With ClickHouse at seo.do
Our Story With ClickHouse at seo.doOur Story With ClickHouse at seo.do
Our Story With ClickHouse at seo.doMetehan Çetinkaya
 
Indexing and Query Performance in MongoDB.pdf
Indexing and Query Performance in MongoDB.pdfIndexing and Query Performance in MongoDB.pdf
Indexing and Query Performance in MongoDB.pdfMalak Abu Hammad
 
SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?Brent Ozar
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeIdo Shilon
 
Job Queues Overview
Job Queues OverviewJob Queues Overview
Job Queues Overviewjoeyrobert
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDBElieHannouch
 
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15MLconf
 
10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConfXavier Amatriain
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systemsXavier Amatriain
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB
 
Mongodb Performance
Mongodb PerformanceMongodb Performance
Mongodb PerformanceJack
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartMukesh Singh
 
complexity.pptx
complexity.pptxcomplexity.pptx
complexity.pptxDr.Shweta
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroDaniel Marcous
 
Embedded based retrieval in modern search ranking system
Embedded based retrieval in modern search ranking systemEmbedded based retrieval in modern search ranking system
Embedded based retrieval in modern search ranking systemMarsan Ma
 
Big data @ uber vu (1)
Big data @ uber vu (1)Big data @ uber vu (1)
Big data @ uber vu (1)Mihnea Giurgea
 
Spark Pitfalls meetup UnderscoreIL
Spark Pitfalls meetup UnderscoreILSpark Pitfalls meetup UnderscoreIL
Spark Pitfalls meetup UnderscoreILlioron22
 

Similar to MongoDB Query Planner Empirical Method Chooses Best Execution Plan (20)

Mongo nyc nyt + mongodb
Mongo nyc nyt + mongodbMongo nyc nyt + mongodb
Mongo nyc nyt + mongodb
 
Our Story With ClickHouse at seo.do
Our Story With ClickHouse at seo.doOur Story With ClickHouse at seo.do
Our Story With ClickHouse at seo.do
 
Indexing and Query Performance in MongoDB.pdf
Indexing and Query Performance in MongoDB.pdfIndexing and Query Performance in MongoDB.pdf
Indexing and Query Performance in MongoDB.pdf
 
SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ waze
 
Job Queues Overview
Job Queues OverviewJob Queues Overview
Job Queues Overview
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDB
 
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
 
10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems
 
Lecture1
Lecture1Lecture1
Lecture1
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
 
Searching Algorithms
Searching AlgorithmsSearching Algorithms
Searching Algorithms
 
Mongodb Performance
Mongodb PerformanceMongodb Performance
Mongodb Performance
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
 
complexity.pptx
complexity.pptxcomplexity.pptx
complexity.pptx
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to hero
 
Embedded based retrieval in modern search ranking system
Embedded based retrieval in modern search ranking systemEmbedded based retrieval in modern search ranking system
Embedded based retrieval in modern search ranking system
 
Big data @ uber vu (1)
Big data @ uber vu (1)Big data @ uber vu (1)
Big data @ uber vu (1)
 
Spark Pitfalls meetup UnderscoreIL
Spark Pitfalls meetup UnderscoreILSpark Pitfalls meetup UnderscoreIL
Spark Pitfalls meetup UnderscoreIL
 

More from Miguel Angel Nieto (14)

MySQL 5.6 GTID in a nutshell
MySQL 5.6 GTID in a nutshellMySQL 5.6 GTID in a nutshell
MySQL 5.6 GTID in a nutshell
 
MySQL - High Availability - Load Balacing - Cluster
MySQL - High Availability - Load Balacing - ClusterMySQL - High Availability - Load Balacing - Cluster
MySQL - High Availability - Load Balacing - Cluster
 
Curso SMTP avanzado
Curso SMTP avanzadoCurso SMTP avanzado
Curso SMTP avanzado
 
Apache avanzado
Apache avanzadoApache avanzado
Apache avanzado
 
Mysql Administracion
Mysql AdministracionMysql Administracion
Mysql Administracion
 
Replicación Mysql
Replicación MysqlReplicación Mysql
Replicación Mysql
 
Tomcat y Jboss
Tomcat y JbossTomcat y Jboss
Tomcat y Jboss
 
Curso SMTP
Curso SMTPCurso SMTP
Curso SMTP
 
Curso básico Linux
Curso básico LinuxCurso básico Linux
Curso básico Linux
 
Curso Squid avanzado
Curso Squid avanzadoCurso Squid avanzado
Curso Squid avanzado
 
Apache
ApacheApache
Apache
 
Nfs, Nis, DHCP
Nfs, Nis, DHCPNfs, Nis, DHCP
Nfs, Nis, DHCP
 
Monitorización
MonitorizaciónMonitorización
Monitorización
 
Administración Zimbra
Administración ZimbraAdministración Zimbra
Administración Zimbra
 

Recently uploaded

why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfrahulyadav957181
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 

Recently uploaded (20)

why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdf
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 

MongoDB Query Planner Empirical Method Chooses Best Execution Plan

  • 1. Miguel Angel Nieto miguel.nieto@mongodb.com Technical Services Engineer, MongoDB Query Planner
  • 2. The Question ● I worked for 6 years as MySQL Technical Support Engineer. ● A large percentage of cases from customers were related to bad query plans/wrong index selection. ● Query Planning is a complex piece of code with many knobs that can be tuned. ● When I started working at MongoDB I found that the number of cases on that topic was very very (very) low. So I asked myself: Why?
  • 3. Plan selection in other databases ● Traditional databases use a statistics approach to choose the best plan: ○ The information about data distribution is not accurate. ○ It is estimated by reading data with random dives in the index tree (MySQL). ○ When some prerequisites are met (like number of modified rows) statistics are automatically recalculated.
  • 4. Plan selection in MongoDB ● MongoDB uses a empirical method: ○ If there is no cached plan, then all viable execution plans, based on the available indexes, are created. ○ MongoDB runs the query multiple times, one for each query plan and benchmarks them. It chooses the one that provides the best performance. ○ Once done, the plan is cached. ■ Future queries with the same shape will re-use this plan rather than re-running the candidate plans. ■ For each such query the performance of the cached plan is evaluated. If the plan's performance decreases beyond a given threshold, it is evicted from the cache and the candidate test phase runs again. This is known as re-planning (SERVER-15225)
  • 5. Benchmarking the plans ● All possible plans are executed in round-robin fashion. ● It gathers execution metrics and then provide a score to each plan. ● Sort the plans by score and choose the best one.
  • 6. Execution Metrics (I) ● These are the metrics: works advanced needTime isEOF
  • 7. Execution Metrics (II) ● Number of works: ■ The planner asks each plan for the next document, via a call to work(). ■ If the plan can supply a document, it responds with 'advanced'. Otherwise, the plan responds with 'needsTime'. ● If all documents have been retrieved, then isEOF = 1.
  • 8. Early stop of query execution ● The query could be expensive, so there are limits to early stop the execution. Execution stop if: ○ The maximum number of works has been reached. ○ The requested number of documents has been retrieved (advanced). ○ We get isEOF (the resultSet has no more documents). works work() isEOF advanced Break
  • 9. Number of work() calls before stopping ● internalQueryPlanEvaluationWorks = 10000 For large collections we take a fraction of the number of documents: ● internalQueryPlanEvaluationCollFraction = 0.3 Then, get the maximum value. internalQueryPlanEvaluationWorks internalQueryPlanEvaluationCollFraction numRecords works
  • 10. Number of documents to retrieve before stopping ● internalQueryPlanEvaluationMaxResults = 101 ● query.getQueryRequest().getNToReturn() ○ Used in the old OP_QUERY protocol. ○ Drivers set 'ntoreturn' to min('batchSize', 'limit') in order to fake the lack of 'limit' or 'batchSize' mechanism in the protocol. ● query.getQueryRequest().getLimit() ○ Used in OP_QUERY protocol from 3.2 onwards. getNToReturn advanced getNToReturn internalQueryPlanEvaluationMaxResults getLimit advanced getLimit internalQueryPlanEvaluationMaxResults advanced internalQueryPlanEvaluationMaxResults advanced
  • 11. Pick the best plan, count the scores ● baseScore = 1 ● Productivity = queryResults / workUnits ● TieBreak (very small number) = min(1.0 / (10 * workUnits), 1e-4) ● noFetchBonus (covered index) = TieBreak or 0 ● noSortBonus (blocking sort) = TieBreak or 0 ● noIxisectBonus (avoiding index intersection) = TieBreak or 0 ● tieBreakers = noFetchBonus + noSortBonus + noIxisectBonus ● eofBonus (if during plan execution all possible documents are retrieved) = 0 | 1
  • 12. Replanning: Automatic Plan Cache Eviction ● The stored data keep changing, it could possible that the cached plan is not the best one anymore. ● While the cached plan is being used, MongoDB re-runs the trial period for that plan and keeps a count of the work() function calls. ● If the new trial period takes more than 10 times as many works() as the original trial period, it evicts the plan from the cache and re-tests all candidate plans to pick a new winner. ● internalQueryCacheEvictionRatio = 10 maxWorksBeforeReplan internalQueryCacheEvictionRatio cachedWorks currentWorks maxWorksBeforeReplan replan()
  • 13. Plans are not always cached ● In the following situations, the execution plan is not cached: ○ Collection scan without sort() ○ hint() ○ min() ○ max() ○ explain() ○ Tailable cursors (they don’t use indexes) ○ snapshot() ○ A single viable plan
  • 14. Query Planner Troubleshoot Example (I) ● We check all query shapes: listQueryShapes
  • 15. Query Planner Troubleshoot Example (II) ● Get the execution plan for that query: getPlansByQuery solution score works isEOF
  • 16. Query Planner Troubleshoot Example (III) ● Remove the query plan for a particular query: clearPlansByQuery "plans": [ ]
  • 17. Query Planner Troubleshoot Example (IV) ● Remove all query plans on a particular collection: clear
  • 18. Query Planner Troubleshoot ● There are Plan Cache methods that can be used for troubleshooting: https://docs.mongodb.com/manual/reference/method/js-plan-cache/ ● Check all query shapes: ○ db.collection.getPlanCache().listQueryShapes() ● Get the plan for a particular query: ○ db.collection.getPlanCache().getPlansByQuery( <query>, <projection>, <sort> ) ● Clean the plans for a particular query: ○ db.collection.getPlanCache().clearPlansByQuery() ● Clean all plans: ○ db.collection.getPlanCache().clear()