Query Engine
Verifiable database results
Overview
Our Query Engine is an essential component in our data processing system, acting as an intermediary between enterprise clients and data storage. Its primary function is to interpret and transform incoming queries into a format that can be efficiently executed against the underlying data warehouse and proof system.
Query Transformation
The Query Engine is responsible for converting high-level, composite client queries into simpler and more manageable units, making use of set operations such as unions and intersections.
Key Functions:
- Decomposition: Breaks down complex queries into simpler sub-queries that can be more easily processed.
- Optimization: Analyzes queries to determine an efficient execution plan, which may involve reordering operations.
- View Transformation: Translates external representations of data (as understood by the client) into the corresponding internal representations used within the data storage system.
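The decomposition step can be illustrated with a small sketch. It assumes a composite query is represented as a tree of AND/OR nodes over simple predicates, and that each sub-query yields a set of row ids that are then combined with intersections and unions. The names `Leaf`, `And`, `Or`, and `evaluate` are hypothetical and are not part of the Query Engine's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Union

Row = dict  # hypothetical record type, for illustration only


@dataclass(frozen=True)
class Leaf:
    """A simple sub-query: one predicate applied to each row."""
    name: str
    predicate: Callable[[Row], bool]


@dataclass(frozen=True)
class And:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Or:
    left: "Query"
    right: "Query"


Query = Union[Leaf, And, Or]


def evaluate(query: Query, rows: dict[int, Row]) -> set[int]:
    """Return ids of matching rows, combining sub-query results with set operations."""
    if isinstance(query, Leaf):
        return {row_id for row_id, row in rows.items() if query.predicate(row)}
    left = evaluate(query.left, rows)
    right = evaluate(query.right, rows)
    if isinstance(query, And):
        return left & right  # intersection
    return left | right      # union


rows = {
    1: {"age": 25, "country": "DE"},
    2: {"age": 17, "country": "DE"},
    3: {"age": 40, "country": "FR"},
}
adults = Leaf("adults", lambda r: r["age"] > 18)
german = Leaf("german", lambda r: r["country"] == "DE")
french = Leaf("french", lambda r: r["country"] == "FR")

# "adults AND (german OR french)" decomposed into three simple sub-queries
composite = And(adults, Or(german, french))
result = evaluate(composite, rows)  # {1, 3}
```

Because each leaf is evaluated independently, an optimizer is free to reorder or reuse sub-queries before the set operations are applied.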
Transformation Example:
Given a user-facing query such as "count users where age > 18", the Query Engine performs the following transformations:
- Date of Birth calculation: Converts the age comparison into a date of birth comparison, as the internal data storage uses dates of birth rather than ages. Because older users have earlier dates of birth, the direction of the comparison flips.
- Date Conversion: Translates the age threshold into the corresponding date threshold. For example:
  - External View: Age > 18
  - Internal View: Date of Birth < 2005-08-12 (assuming the current date is 2023-08-12)
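The view transformation above can be sketched in a few lines. This is a minimal illustration, not the Query Engine's implementation: the function names and the `date_of_birth` column name are assumptions, and it follows the mapping in the example, which treats "age > 18" as "born more than 18 years before the current date".

```python
from datetime import date


def years_before(today: date, years: int) -> date:
    """Return the calendar date `years` years before `today`."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # `today` is Feb 29 and the target year is not a leap year;
        # falling back to Feb 28 is one possible convention.
        return today.replace(year=today.year - years, day=28)


def age_gt_to_dob_predicate(age: int, today: date) -> str:
    """Rewrite the external 'age > N' into an internal date-of-birth comparison."""
    cutoff = years_before(today, age)
    # Older users have earlier birth dates, so '>' on age becomes '<' on the date.
    return f"date_of_birth < '{cutoff.isoformat()}'"


predicate = age_gt_to_dob_predicate(18, today=date(2023, 8, 12))
# predicate == "date_of_birth < '2005-08-12'"
```

Passing `today` explicitly keeps the conversion deterministic, which matters if the same query must produce the same internal form for both the warehouse and the proof system.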
Benefits
- Efficiency: By breaking down queries, the Query Engine can reduce computational overhead and improve response times for both the Warehouse and Proof System components.
- Flexibility: The Query Engine can handle a variety of query types and structures, accommodating diverse client needs.
Query Cache
Our Query Engine also uses a Query Cache to speed up query processing. The cache stores a salted hash of each query alongside its corresponding result, so the result of a recurring query can be retrieved without repeating the entire computation. Cached entries are transient: each has a refresh/expiration time. This short lifecycle keeps the cache fresh and eventually consistent with the underlying data. The cache reduces latency and makes the system more responsive, especially for frequently requested queries.
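A cache of this kind might look like the sketch below. It is illustrative only: the class and method names are hypothetical, HMAC-SHA256 stands in for the salted hash (the construction actually used is not stated here), and an in-memory dictionary stands in for whatever store backs the real cache.

```python
import hashlib
import hmac
import time
from typing import Any, Callable, Optional


class QueryCache:
    """In-memory cache keyed by a salted hash of the query text, with expiry."""

    def __init__(self, salt: bytes, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._salt = salt
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def _key(self, query: str) -> str:
        # Assumption: HMAC-SHA256 keyed with the salt plays the role of the salted hash.
        return hmac.new(self._salt, query.encode("utf-8"), hashlib.sha256).hexdigest()

    def put(self, query: str, result: Any) -> None:
        """Store a result together with its expiration time."""
        self._entries[self._key(query)] = (self._clock() + self._ttl, result)

    def get(self, query: str) -> Optional[Any]:
        """Return the cached result, or None if it is missing or expired."""
        key = self._key(query)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, result = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it so the caller recomputes
            return None
        return result


# Usage with a controllable clock so expiry can be shown without sleeping.
now = [0.0]
cache = QueryCache(salt=b"example-salt", ttl_seconds=60, clock=lambda: now[0])
cache.put("count users where age > 18", 1234)
hit = cache.get("count users where age > 18")   # 1234
now[0] = 61.0
miss = cache.get("count users where age > 18")  # None, the entry has expired
```

Only the hash is used as the key, so the raw query text is not needed for a lookup. The expiry check on read is what bounds how stale a served result can be.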