Like many other software companies, we frequently face the problem of collecting, storing and analyzing statistics. As our clients know, our products can register a large number of events every second. That is why we were very interested in ClickHouse, a new product that has recently entered the market.
ClickHouse is a tool developed by the Russian IT company Yandex to meet the challenges of Yandex.Metrica, the second-largest web analytics service in the world. ClickHouse can analyze data that is updated in real time, and it handles extremely large volumes of data in a stable and sustainable manner. Since becoming publicly available, the product has been actively expanding its capabilities in response to user requests.
So what are the main advantages of ClickHouse? Well, it allows you to…
- Run more queries in the same amount of time
- Test more hypotheses
- Slice and dice your data in many more ways
- Look at your data from new angles
- Discover new dimensions
ClickHouse lets companies add servers to their clusters as needed, with little additional time or effort. This adaptable tool scales well both vertically and horizontally.
You may be wondering how ClickHouse would handle a situation where you need to add a new dimension or metric to an aggregated data structure. The dimension itself is easy enough to add, but what happens to earlier periods, when you most likely no longer have the original data needed to recompute them?
ClickHouse answers this question with its philosophy of keeping all of the raw data. To make that practical, it offers strong compression along with very fast query processing. The ClickHouse site publishes benchmarks that let users compare its performance with that of its competitors: https://clickhouse.yandex/benchmark.html
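To make the argument for keeping raw data concrete, here is a minimal Python sketch (not ClickHouse code). The event fields (`day`, `country`, `browser`) and the `aggregate` helper are hypothetical. It shows that when raw events are retained, a dimension added later can still be computed for past periods, which a store holding only pre-aggregated counts cannot do.

```python
from collections import Counter

# Hypothetical raw events. A pre-aggregated store that kept only
# (day, country) counts would have lost "browser" for these periods.
RAW_EVENTS = [
    {"day": "2017-01-01", "country": "DE", "browser": "Firefox"},
    {"day": "2017-01-01", "country": "DE", "browser": "Chrome"},
    {"day": "2017-01-01", "country": "US", "browser": "Chrome"},
    {"day": "2017-01-02", "country": "US", "browser": "Chrome"},
]


def aggregate(events, dims):
    """Count events grouped by the given dimension names."""
    return Counter(tuple(event[d] for d in dims) for event in events)


# Original report: events per day and country.
by_country = aggregate(RAW_EVENTS, dims=("day", "country"))

# Later a new dimension (browser) is required. Because the raw rows were
# kept, past periods can be recomputed instead of being left empty.
by_country_browser = aggregate(RAW_EVENTS, dims=("day", "country", "browser"))

if __name__ == "__main__":
    print(by_country)
    print(by_country_browser)
```

The design choice being illustrated is simply that aggregation happens at query time over stored raw rows, so the set of dimensions is not fixed in advance.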
All in all, ClickHouse looks like a one-stop answer to many common problems: the volume of collected data, query processing speed, data freshness and availability, and so on.
Some users may be put off by the fact that ClickHouse does not allow users to delete or modify data. Data can be deleted by dropping partitions, but each partition holds a month's worth of data, which is a huge amount for large systems. Furthermore, ClickHouse does not support transactions. When a query performs aggregation, the intermediate results must fit into the RAM of a single server, so be prepared to provision hardware with plenty of memory.
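The following plain Python sketch models why partition-level deletion is so coarse. It assumes rows are partitioned by month using a YYYYMM key; `partition_key`, `partition_rows` and `drop_partition` are hypothetical helpers for illustration, not ClickHouse APIs.

```python
from collections import defaultdict
from datetime import date


def partition_key(d: date) -> int:
    """Monthly partition key in YYYYMM form, e.g. 2017-03-15 -> 201703."""
    return d.year * 100 + d.month


def partition_rows(rows):
    """Group (date, value) rows into monthly partitions."""
    parts = defaultdict(list)
    for d, value in rows:
        parts[partition_key(d)].append((d, value))
    return parts


def drop_partition(parts, key):
    """Deletion works only at partition granularity: the whole month goes."""
    return parts.pop(key, [])


rows = [
    (date(2017, 3, 1), "a"),
    (date(2017, 3, 31), "b"),
    (date(2017, 4, 1), "c"),
]
parts = partition_rows(rows)
removed = drop_partition(parts, 201703)
```

In this model, the only way to remove the single row from 1 March is to drop partition 201703, which also removes the row from 31 March.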
ClickHouse also has a fantastic mechanism for working with external dictionaries, which can be loaded from different sources (for example, a CSV file, a MySQL database, MongoDB, or any ODBC source). However, be careful! For example, the dictionary identifier must be a number that fits into UInt64.
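Given the UInt64 restriction, it may be worth validating identifiers before using them as a dictionary source. Below is a minimal Python sketch, assuming identifiers arrive as strings (for example, fields of a CSV file); `fits_uint64` and `to_dictionary_key` are hypothetical helpers.

```python
UINT64_MAX = 2**64 - 1


def fits_uint64(key) -> bool:
    """True if key is an integer (not a bool) in the range [0, 2**64 - 1]."""
    return isinstance(key, int) and not isinstance(key, bool) and 0 <= key <= UINT64_MAX


def to_dictionary_key(raw: str) -> int:
    """Convert a raw identifier string to a UInt64-compatible int, or raise."""
    try:
        key = int(raw)
    except ValueError:
        raise ValueError(f"not a numeric id: {raw!r}") from None
    if not fits_uint64(key):
        raise ValueError(f"id out of UInt64 range: {key}")
    return key
```

Negative numbers, values of 2**64 or more, and non-numeric strings are rejected, so such identifiers would likely need to be remapped to unsigned integers before loading.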
All in all, this new product shows a lot of promise. Even though its community is still young, it already offers great capabilities for storing and analyzing statistics. If you have lots of stats, definitely give this tool a try!