{"id":35701,"date":"2021-07-08T11:37:52","date_gmt":"2021-07-08T08:37:52","guid":{"rendered":"https:\/\/orbitsoft.com\/blog\/?p=35701"},"modified":"2023-06-30T16:06:40","modified_gmt":"2023-06-30T13:06:40","slug":"high-load-systems-development-for-data-processing","status":"publish","type":"post","link":"https:\/\/orbitsoft.com\/blog\/high-load-systems-development-for-data-processing\/","title":{"rendered":"High-load systems development for data processing"},"content":{"rendered":"\n<p>If a business makes money on the Internet, it is always &#8211; in one way or another &#8211; collecting and analyzing data: for example, which types of products users look at or skip, or who is watching a video, together with the type of device and the time of day. Such data is needed to understand how to develop one\u2019s product.<\/p>\n\n\n\n<p>The more data there is, the more difficult it is to process. Because of this, high-load systems become necessary. Using an example from our development team, we\u2019ll tell you how we design such solutions. In April 2021, our system processed 2.4 billion impressions and 408 million clicks with no failures.<\/p>\n\n\n<div class=\"wp-block-lazyblock-heading lazyblock-heading-2RCuO\"><h2 class=\"article__h\">The business task is to increase conversion from content display<\/h2><\/div>\n\n\n<p>One of the ways for an online business to make money is to manage content so that users take targeted actions as often as possible. Content in this case is a general term. It can mean:<\/p>\n\n\n\n<ul>\n<li>videos<\/li>\n\n\n\n<li>news<\/li>\n\n\n\n<li>goods or services<\/li>\n<\/ul>\n\n\n\n<p>The target action also depends on the business. In one case it could be viewing a certain number of videos. In another, it might be signing up for trial lessons from the mailing list.<\/p>\n\n\n\n<p>Content management requires data on how the user works with the content. 
Maybe a user skips the product cards with the highest margins for the company, or studies them but doesn\u2019t put them in the basket. All this is needed to understand how and what to show to the user.<\/p>\n\n\n\n<p>Raw data gives us little information by itself. It needs to be saved and analyzed. For example, you can:<\/p>\n\n\n\n<ul>\n<li>collect statistical reports according to data segments and metrics<\/li>\n\n\n\n<li>compare these metrics for different periods<\/li>\n\n\n\n<li>group data by categories: countries, cities, device types, or time of day<\/li>\n<\/ul>\n\n\n\n<p>Without such details, it\u2019s difficult to determine why some content is not working as planned, and what to do to solve the problem.<\/p>\n\n\n\n<p>When there is not a lot of data, it can be processed manually, or with the help of simple programs. But as the user base grows, the amount of data grows too, and it is no longer possible to process, say, a billion records this way.<\/p>\n\n\n\n<p>To handle a larger load, dedicated data collection and analysis systems are designed. The main requirement for them is to withstand high loads.<\/p>\n\n\n<div class=\"wp-block-lazyblock-heading lazyblock-heading-1HPiRw\"><h2 class=\"article__h\">OrbitSoft experience: how a data collection and analysis system operates<\/h2><\/div>\n\n\n<p>We have experience in running systems that handle such high loads. Using the example of one of our projects, we will show our approach to development.<\/p>\n\n\n\n<p><strong>About the Project.<\/strong> Every day the ad network registers events: bids that users place, ad impressions, ad clicks. 
In order for analysts to work with the data, it needs to be brought into a readable format &#8211; processed and loaded into a program with a clear interface.<\/p>\n\n\n\n<p><strong>System load.<\/strong> In April 2021 the ad network system processed:<\/p>\n\n\n\n<ul>\n<li>2.4 billion impressions<\/li>\n\n\n\n<li>408 million clicks<\/li>\n\n\n\n<li>1240 conversions<\/li>\n<\/ul>\n\n\n\n<p><strong>Analysis.<\/strong> To design the system, we looked at the type and amount of data, the expected load growth, the results to be obtained, and the budget and resource limitations. After that, development began.<\/p>\n\n\n<div class=\"wp-block-lazyblock-steps lazyblock-steps-Z1fwKLH\"><div class=\"styled-block\">\n  <div class=\"styled-block__main\">\n          <h3 class=\"styled-block__title\">\n        Algorithm of data collection and analysis in the ad network      <\/h3>\n        <ul class=\"steps__list\">\n              <li class=\"steps__item\">\n          <div class=\"steps__title\">\n            Clicks, bids, impressions          <\/div>\n                            <\/li>\n              <li class=\"steps__item\">\n          <div class=\"steps__title\">\n            File creation and compression          <\/div>\n                            <\/li>\n              <li class=\"steps__item\">\n          <div class=\"steps__title\">\n            Statistics calculation          <\/div>\n                            <\/li>\n              <li class=\"steps__item\">\n          <div class=\"steps__title\">\n            Recording of the results          <\/div>\n                            <\/li>\n              <li class=\"steps__item\">\n          <div class=\"steps__title\">\n            Data is transferred to MySQL          <\/div>\n                            <\/li>\n          <\/ul>\n  <\/div>\n  <\/div><\/div>\n\n\n<p><strong>System operation scheme:<\/strong><\/p>\n\n\n\n<ul>\n<li>The advertising platform registers events with Apache Kafka.<\/li>\n\n\n\n<li>Collector Service monitors the 
queue of events in Apache Kafka, enriches each event with additional data, and saves the events to .csv files.<\/li>\n\n\n\n<li>Once a file is complete, Collector Service compresses it and uploads it to HDFS, into a dedicated directory for input data.<\/li>\n\n\n\n<li>Every 5 minutes Stats Server checks for new raw files in the HDFS input directory.<\/li>\n\n\n\n<li>Apache Hadoop calculates statistics based on the input data and writes the results to files in a dedicated directory on HDFS.<\/li>\n\n\n\n<li>Stats Server picks up the calculated statistics from HDFS and exports the data to MySQL.<\/li>\n\n\n\n<li>Stats Server moves the processed input files to an archive, also located on HDFS. Files are grouped by day &#8211; this is necessary so that statistics can be recalculated for a certain period if there were any errors in the scripts.<\/li>\n<\/ul>\n\n\n<div class=\"wp-block-lazyblock-figure lazyblock-figure-ZB0gF7\"><figure class=\"article__figure \">\n        <div class=\"article__figure-img\" >\n        <img decoding=\"async\" src=\"https:\/\/orbitsoft.com\/blog\/wp-content\/uploads\/2021-05-16_22-39-49.png\" alt=\"\">\n    <\/div>\n                <figcaption>Such a system can be deployed both on physical hardware and in the cloud.<\/figcaption>\n    <\/figure><\/div>\n\n\n<p><strong>Results.<\/strong> All components of the architecture are horizontally scalable and provide fault tolerance.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Component<\/strong><\/td><td><strong>What it provides<\/strong><\/td><\/tr><tr><td>Apache Kafka distributed message broker<\/td><td>Processes up to several hundred thousand events per second<br>Messages get through even if one of the servers goes down<br>Adapts to the growth of the load: it\u2019s easy to add new servers<\/td><\/tr><tr><td>Collector Service daemon in Go<\/td><td>Processes tens of thousands of messages per second<br>Adapts to the growth of the load: it can run several copies 
of the service in parallel and distribute work between them<\/td><\/tr><tr><td>Hadoop Distributed File System<\/td><td>Stores data securely<br>Returns the data that was requested<br>Records what was sent to the system<br>Blocks are replicated across data nodes, so data is not lost<\/td><\/tr><tr><td>Stats Server daemon in Java<\/td><td>Fast computation with the MapReduce model: the data set is divided into parts, the parts are processed in parallel, and the results are combined into one<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n<div class=\"wp-block-lazyblock-important-block lazyblock-important-block-dhik0\"><div class=\"important-box\">\n  <div class=\"important-box__main\">\n          <p class=\"important-box__h\">Experts will answer your questions<\/p>\n      \t<p><span style=\"color: #333333;\"><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">OrbitSoft experts answer questions from developers, business owners and managers. Ask about anything, even if you are just curious.<\/span><\/span><\/span><\/p>\n<p><span style=\"color: #333333;\"><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">If you want us to analyze your situation or share our experience, write to <\/span><\/span><\/span><a href=\"mailto:Anna.mandrikina@orbitsoft.com\"><span style=\"color: #2987fa;\"><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">anna.mandrikina@orbitsoft.com.<\/span><\/span><\/span><\/a><\/p>  <\/div>\n  <\/div><\/div>\n\n<div class=\"wp-block-lazyblock-banner lazyblock-banner-Z2p5LGX\"><div \n  class=\"banner\n   \n  \" \n  >\n    <div class=\"banner__body\">\n        <h2 class=\"banner__h\">How can we\u00a0help?<\/h2>\n        <div class=\"banner__content\">\n            <p><strong>Artificial intelligence technology.<\/strong>\u00a0This technology helps customize ad\u00a0impressions for your audience, analyzes large amounts of\u00a0data fast and error-free, and much more.<\/p>\n<p><strong>Advertising management software<\/strong> 
&#8211; ready-made or\u00a0custom business solutions.<\/p>\n<p><strong>Ad\u00a0buying solutions:<\/strong>\u00a0DSP, SSP, and Ad\u00a0Exchange.<\/p>\n<p><strong>Development of\u00a0various products.<\/strong>\u00a0For example, an\u00a0application, an\u00a0online store, a payment system, or\u00a0a\u00a0video platform.<\/p>\n<p>There are over 100 digital specialists on\u00a0our team: developers, testers, designers, and project managers. We\u00a0have the resources to\u00a0work with systems of\u00a0any complexity.<\/p>        <\/div>\n                            <div \n              class=\"banner__button button js-form-modal\n               button_style_light-on-promo2\">\n              Order development                          <\/div>\n            <\/div>\n    <div class=\"banner__photo\">\n        <img decoding=\"async\" src=\"https:\/\/orbitsoft.com\/blog\/wp-content\/uploads\/friendship.png\" alt=\"\" class=\"banner__img\">\n    <\/div>\n<\/div><\/div>","protected":false},"excerpt":{"rendered":"<p>If a business makes money on the Internet, it is always &#8211; in one way or another &#8211; collecting and analyzing data: for example, which types of products users look at or skip, or who is watching a video, together with the type of device and the time of day. 
Such data [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":35707,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[195],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>High-load systems development for data processing - OrbitSoft Blog<\/title>\n<meta name=\"description\" content=\"Case of high-load Ad management system developing: in a month it processed 2.4 billion impressions and 408 million clicks with no failures\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/orbitsoft.com\/blog\/high-load-systems-development-for-data-processing\/\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"High-load systems development for data processing - OrbitSoft Blog\" \/>\n<meta name=\"twitter:description\" content=\"Case of high-load Ad management system developing: in a month it processed 2.4 billion impressions and 408 million clicks with no failures\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/orbitsoft.com\/blog\/wp-content\/uploads\/Frame-16.jpg\" \/>\n<meta name=\"twitter:creator\" content=\"@orbitsoft\" \/>\n<meta name=\"twitter:site\" content=\"@orbitsoft\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"High-load systems development for data processing - OrbitSoft Blog","description":"Case of high-load Ad management system developing: in a month it processed 2.4 billion impressions and 408 million clicks with no failures","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/orbitsoft.com\/blog\/high-load-systems-development-for-data-processing\/","twitter_card":"summary_large_image","twitter_title":"High-load systems development for data processing - OrbitSoft Blog","twitter_description":"Case of high-load Ad management system developing: in a month it processed 2.4 billion impressions and 408 million clicks with no failures","twitter_image":"https:\/\/orbitsoft.com\/blog\/wp-content\/uploads\/Frame-16.jpg","twitter_creator":"@orbitsoft","twitter_site":"@orbitsoft","twitter_misc":{"Written by":"admin","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/orbitsoft.com\/blog\/high-load-systems-development-for-data-processing\/","url":"https:\/\/orbitsoft.com\/blog\/high-load-systems-development-for-data-processing\/","name":"High-load systems development for data processing - OrbitSoft Blog","isPartOf":{"@id":"https:\/\/orbitsoft.com\/blog\/#website"},"datePublished":"2021-07-08T08:37:52+00:00","dateModified":"2023-06-30T13:06:40+00:00","author":{"@id":"https:\/\/orbitsoft.com\/blog\/#\/schema\/person\/e515b3fa91e283750477594c4f028b7b"},"description":"Case of high-load Ad management system developing: in a month it processed 2.4 billion impressions and 408 million clicks with no 
failures","breadcrumb":{"@id":"https:\/\/orbitsoft.com\/blog\/high-load-systems-development-for-data-processing\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/orbitsoft.com\/blog\/high-load-systems-development-for-data-processing\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/orbitsoft.com\/blog\/high-load-systems-development-for-data-processing\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/orbitsoft.com\/blog\/"},{"@type":"ListItem","position":2,"name":"High-load systems development for data processing"}]},{"@type":"WebSite","@id":"https:\/\/orbitsoft.com\/blog\/#website","url":"https:\/\/orbitsoft.com\/blog\/","name":"OrbitSoft Blog","description":"Discover the latest in news and resources for OrbitSoft","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/orbitsoft.com\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/orbitsoft.com\/blog\/#\/schema\/person\/e515b3fa91e283750477594c4f028b7b","name":"admin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/orbitsoft.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/b1b269c579caf059f82b6d114c63fc49?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b1b269c579caf059f82b6d114c63fc49?s=96&d=mm&r=g","caption":"admin"},"url":"https:\/\/orbitsoft.com\/blog\/author\/admin\/"}]}},"_links":{"self":[{"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/posts\/35701"}],"collection":[{"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/comments?
post=35701"}],"version-history":[{"count":12,"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/posts\/35701\/revisions"}],"predecessor-version":[{"id":36591,"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/posts\/35701\/revisions\/36591"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/media\/35707"}],"wp:attachment":[{"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/media?parent=35701"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/categories?post=35701"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/tags?post=35701"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}