About SlicingDice Technology
Learn more about which technologies power SlicingDice, and how it handles your data.
SlicingDice runs on a combination of in-house components and a few third-party services. Each one and its purpose is listed below.
- API - Python-based API developed to handle all SlicingDice's orchestration upon request.
- ShannonDB - The home-grown analytics database that stores and queries all the data.
- Apache Kafka - Message broker and buffer for data received from the API.
- Apache Zookeeper - Distributed tool used for managing shared configurations and storing their metadata.
- MySQL - Relational database used to store some customer information, metadata and permissions.
- Aerospike - Key-Value NoSQL database that stores API information that needs fast responses. Also used to cache queries performed and the results from saved queries.
- Backblaze B2 - External storage provider used for backup.
Network and monitoring
- CloudFlare - Used for DNS, load balancing and DDoS protection.
- Pingdom - Used to test and monitor API and internal services availability.
- Sentry - Used to report errors automatically.
- PagerDuty - Used to test and monitor internal services and dispatch support engineers.
- Stripe - Used to handle customer billing information and processing.
- BigML - Machine Learning platform embedded in SlicingDice that provides a selection of robustly engineered ML algorithms proven to solve real-world problems.
As SlicingDice developed its own physical database technology from the ground up, all data is stored in its own hashed and encoded binary format, making it harder for unauthorized parties to read the stored data in its original form.
From an infrastructure perspective, SlicingDice strictly follows recommended approaches in server hardening and sensitive information management. All servers are accessed exclusively using KVM over IP provided by its infrastructure partners.
It also stores a copy of all data on external object storage providers, such as Backblaze B2. Even if all the servers go down, the data remains safely stored.
SlicingDice is compliant with the highest security standards and regulations.
Infrastructure and Redundancy
SlicingDice Physical Data Warehouse uses several infrastructure providers, such as OVH, Hetzner, Amazon Web Services, Alibaba and Microsoft Azure.
As it uses bare metal dedicated servers for cost and performance reasons, server nodes can fail at any time. This means that it's absolutely necessary for SlicingDice's operations to have data redundancy. SlicingDice currently achieves a high level of redundancy and availability by:
- Replicating its customers' data across 3 different data centers (or availability zones);
- Making hourly backups and storing them on a local backup server;
- Storing a full daily copy of its backed-up data on remote backup providers.
Besides having all these redundancy measures, SlicingDice also constantly performs unexpected actions and shutdowns on its production environment, similar to the Netflix Chaos Monkey approach, in order to test the resiliency of its services.
Data durability is one of the hardest things to guarantee in databases. There are many databases that claim to be ACID, but in reality are not.
Wrong or incomplete query answers can lead to wrong business decisions, which can end up being really expensive and damaging. Because of that, SlicingDice adopts several measures to ensure data durability.
Every time you send an insertion request to your database, SlicingDice's platform (API) receives it and immediately sends it to one of its Kafka clusters. The platform withholds the insertion confirmation until it can confirm that your request was correctly stored on at least three nodes (3 replicas) across at least two Kafka clusters: one in the same datacenter/availability zone that received the insertion request, and another in a different, remote datacenter/availability zone.
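The confirmation rule above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not SlicingDice's actual implementation: the `ReplicaAck` record, the `can_confirm` function and the cluster, datacenter and node names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaAck:
    """Hypothetical record of one node confirming it stored the message."""
    cluster: str      # Kafka cluster name (illustrative)
    datacenter: str   # datacenter / availability zone of that cluster
    node: str         # node inside the cluster


def can_confirm(acks, receiving_datacenter, min_replicas=3, min_clusters=2):
    """Return True once the acknowledgements satisfy the rule in the text:
    at least 3 replicas, on at least 2 clusters, with one cluster in the
    receiving datacenter and another in a remote one."""
    nodes = {(a.cluster, a.node) for a in acks}
    clusters = {a.cluster for a in acks}
    has_local = any(a.datacenter == receiving_datacenter for a in acks)
    has_remote = any(a.datacenter != receiving_datacenter for a in acks)
    return (
        len(nodes) >= min_replicas
        and len(clusters) >= min_clusters
        and has_local
        and has_remote
    )


local_only = [
    ReplicaAck("kafka-a", "dc1", "n1"),
    ReplicaAck("kafka-a", "dc1", "n2"),
    ReplicaAck("kafka-a", "dc1", "n3"),
]
mixed = local_only[:2] + [ReplicaAck("kafka-b", "dc2", "n1")]

print(can_confirm(local_only, "dc1"))  # three replicas, but no remote cluster
print(can_confirm(mixed, "dc1"))       # three replicas across two datacenters
```

Under these assumptions, three replicas in a single local cluster are not enough; the request is only confirmed once a remote cluster has also stored it.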
SlicingDice currently has several independent data centers from different providers, in different countries and different availability zones, that operate simultaneously in a high-availability configuration. That means that two data centers or availability zones can fail and the service will continue to support data insertion and querying.
Once your data has been correctly inserted on one of the ShannonDB nodes, it is automatically replicated to another two nodes, located in the other two datacenters or availability zones.
Additionally, SlicingDice constantly performs remote backups of all data stored on it, so in the event of major hardware failures affecting all its datacenters, it is still able to recover all data.
Unfortunately, data corruption is very common when data is moved or modified, for all types of databases and technology providers. This is not acceptable for SlicingDice.
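As a rough illustration of why three replicas in three datacenters tolerate two simultaneous failures, the following sketch enumerates every pair of failed datacenters. The datacenter names and both helper functions are hypothetical and only model the layout described above.

```python
from itertools import combinations

# Hypothetical datacenter names, used only for illustration.
DATACENTERS = ["dc-a", "dc-b", "dc-c"]


def replica_placement(first_dc, datacenters=DATACENTERS):
    """First copy lands in first_dc; one more copy goes to each of two other datacenters."""
    others = [dc for dc in datacenters if dc != first_dc]
    return [first_dc] + others[:2]


def is_available(placement, failed):
    """Data stays reachable while at least one replica's datacenter is up."""
    return any(dc not in failed for dc in placement)


placement = replica_placement("dc-b")

# Any two datacenters can fail and one replica is still reachable.
survives_two_failures = all(
    is_available(placement, set(pair)) for pair in combinations(DATACENTERS, 2)
)

# Losing all three at once is the case remote backups are meant to cover.
survives_total_failure = is_available(placement, set(DATACENTERS))

print(placement, survives_two_failures, survives_total_failure)
```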
SlicingDice's data durability testing framework
The code coverage for SlicingDice and ShannonDB is higher than 98%, and it is taken very seriously in the development process. To reach 98%, the SlicingDice development team took a radical approach: building a database testing framework to be used as the source of truth when validating its system.
ShannonDB was built to perform analytical queries, so the team didn't know in advance what the users' queries would look like. For example:
- How many columns they would use in a query;
- What combination of column types they would use in the same query;
- How the system would behave if they applied multiple boolean operations on top of multiple time-series columns, while also combining non-time-series columns.
So the team decided to build a database testing framework, which is basically a simpler and lighter version of ShannonDB that can generate test data and also store it for comparison purposes.
The database testing framework works like this:
- Define the types of columns to test, how many different values will be inserted (whether they will actually be used in queries or are just there to stress the system) and, finally, for how many Entity IDs this generated data will be inserted.
- For each type of column defined, the database testing framework first generates all the data and sends it to be inserted on ShannonDB, also storing a copy of the generated data for itself for later comparison.
- Once all the data has been completely inserted on ShannonDB, the framework automatically generates all the possible combinations of supported queries based on the columns declared previously.
- These queries are then issued to ShannonDB and the obtained results are compared to the expected results based on the data stored on the test database.
- For a ShannonDB version to be declared ready for production, it has to be tested with all the existing column types and supported query operations. If a single query fails with a difference of even a single ID, the version is rejected until it is corrected.
Testing framework numbers. Test configuration:
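The workflow above can be sketched as a small, self-contained Python program. It is a toy model under stated assumptions, not the real framework: `ReferenceStore`, `DroppingStore`, `generate_data`, `generate_queries` and `run_suite` are hypothetical names, the system under test is any object exposing `insert` and `query`, and only one- and two-condition `and`/`or` queries are generated.

```python
import itertools
import random


class ReferenceStore:
    """Simple in-memory source of truth: column -> value -> set of entity IDs."""

    def __init__(self):
        self.index = {}

    def insert(self, entity_id, column, value):
        self.index.setdefault(column, {}).setdefault(value, set()).add(entity_id)

    def query(self, conditions, operator):
        """conditions: list of (column, value); operator: 'and' or 'or'."""
        sets = [self.index.get(c, {}).get(v, set()) for c, v in conditions]
        if operator == "and":
            return set.intersection(*sets)
        return set.union(*sets)


class DroppingStore(ReferenceStore):
    """Deliberately faulty stand-in that silently loses entity 0."""

    def insert(self, entity_id, column, value):
        if entity_id != 0:
            super().insert(entity_id, column, value)


def generate_data(columns, entity_ids, matched, garbage, rng):
    """Yield (entity_id, column, value) rows; garbage values are never queried."""
    for entity_id in entity_ids:
        for column in columns:
            yield entity_id, column, rng.choice(matched)
            yield entity_id, column, rng.choice(garbage)


def generate_queries(columns, matched):
    """All one- and two-condition combinations under 'and' / 'or'."""
    single = [(c, v) for c in columns for v in matched]
    for size in (1, 2):
        for conditions in itertools.combinations(single, size):
            for operator in ("and", "or"):
                yield list(conditions), operator


def run_suite(system, columns, n_entities=20, seed=0):
    """Insert generated data into both stores, then compare every query result."""
    rng = random.Random(seed)
    matched, garbage = ["m0", "m1"], ["g0", "g1"]
    reference = ReferenceStore()
    for row in generate_data(columns, range(n_entities), matched, garbage, rng):
        system.insert(*row)
        reference.insert(*row)  # keep a copy for comparison
    failures = []
    for conditions, operator in generate_queries(columns, matched):
        if system.query(conditions, operator) != reference.query(conditions, operator):
            failures.append((conditions, operator))  # one differing ID is enough
    return failures


if __name__ == "__main__":
    columns = ["string_test_column", "numeric_test_column"]
    print(len(run_suite(ReferenceStore(), columns)))       # correct stand-in
    print(len(run_suite(DroppingStore(), columns)) > 0)    # faulty stand-in is caught
```

A build would be accepted only when `run_suite` returns an empty list; the faulty stand-in shows that losing a single entity is enough to produce failures.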
- Entity IDs: 1,000
- Matched Values: 1,000
- Garbage Values: 1,000
- Column Types: All
- Query Types: All
- Days: 7 (distributing the generated data in 7 different days, as this affects the time-series queries)
- 3,646,986 unique insertion messages sent to ShannonDB (520,998 messages per day)
- 45,696 unique queries, each expecting a different result (6,528 queries per day)
========== Insertion Statistics ==========
INFO: Quantity of insertion commands: 520998
INFO: Quantity of columns inserted: 4164994
INFO: Quantity of columns per type:
    string_test_column: 440000
    time_series_decimal_test_column: 494998
    time_series_string_test_column_2: 16000
    boolean_test_column: 456000
    decimal_not_overwrite_test_column: 4000
    time_series_decimal_test_column_2: 16000
    time_series_numeric_test_column: 494998
    bitmap_test_column: 120000
    numeric_not_overwrite_test_column: 4000
    numeric_test_column: 482000
    string_not_overwrite_test_column: 4000
    time_series_string_test_column: 464998
    decimal_test_column: 258000
    range_test_column: 456000
    uniqueid_test_column: 208000
    date_not_overwrite_test_column: 4000
    date_test_column: 222000
    time_series_numeric_test_column_2: 16000
    bitmap_not_overwrite_test_column: 4000
The team inserted data and ran queries for multiple days and in between also tested other things that could affect consistency, such as restarting the server, moving shards between nodes, killing the process unsafely (kill -9) and so on.
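The reported figures are internally consistent, which can be checked with a short script. The numbers are copied from the statistics above; the variable names are only illustrative.

```python
# Per-type column counts, copied from the insertion statistics.
columns_per_type = {
    "string_test_column": 440000,
    "time_series_decimal_test_column": 494998,
    "time_series_string_test_column_2": 16000,
    "boolean_test_column": 456000,
    "decimal_not_overwrite_test_column": 4000,
    "time_series_decimal_test_column_2": 16000,
    "time_series_numeric_test_column": 494998,
    "bitmap_test_column": 120000,
    "numeric_not_overwrite_test_column": 4000,
    "numeric_test_column": 482000,
    "string_not_overwrite_test_column": 4000,
    "time_series_string_test_column": 464998,
    "decimal_test_column": 258000,
    "range_test_column": 456000,
    "uniqueid_test_column": 208000,
    "date_not_overwrite_test_column": 4000,
    "date_test_column": 222000,
    "time_series_numeric_test_column_2": 16000,
    "bitmap_not_overwrite_test_column": 4000,
}

DAYS = 7
total_columns = sum(columns_per_type.values())
total_insertions = 520_998 * DAYS   # insertion messages per day x days
total_queries = 6_528 * DAYS        # queries per day x days

print(total_columns == 4_164_994)     # matches "Quantity of columns inserted"
print(total_insertions == 3_646_986)  # matches the unique insertion messages
print(total_queries == 45_696)        # matches the unique queries
```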