{"id":36475,"date":"2023-04-26T10:58:01","date_gmt":"2023-04-26T07:58:01","guid":{"rendered":"https:\/\/orbitsoft.com\/blog\/?p=36475"},"modified":"2023-12-13T12:44:45","modified_gmt":"2023-12-13T09:44:45","slug":"images-quality","status":"publish","type":"post","link":"https:\/\/orbitsoft.com\/blog\/images-quality\/","title":{"rendered":"How data science saved hundreds of hours of work time"},"content":{"rendered":"<div class=\"wp-block-lazyblock-case lazyblock-case-5USMX\"><div class=\"styled-block\">\n  <div class=\"styled-block__main\">\n          <h3 class=\"styled-block__title\">\n        In brief      <\/h3>\n        <ul class=\"case__list\">\n            \n                    <li class=\"case__item\">\n              \n          <span class=\"case__order\">01<\/span>\n          <div class=\"case__body\">\n            <div class=\"case__title\">\n              <span>Customer<\/span>\n            <\/div>\n            <p><span style=\"font-weight: 400;\">Video aggregator, generates income from ad impressions<\/span><\/p>          <\/div>\n        <\/li>\n            \n                    <li class=\"case__item\">\n              \n          <span class=\"case__order\">02<\/span>\n          <div class=\"case__body\">\n            <div class=\"case__title\">\n              <span>Problem<\/span>\n            <\/div>\n            <p><span style=\"font-weight: 400;\">Each video has several preview images. Some of them are blurry. 
Users do not open such videos and do not watch the ads in them, so the company loses income<\/span><\/p>          <\/div>\n        <\/li>\n            \n                    <li class=\"case__item\">\n              \n          <span class=\"case__order\">03<\/span>\n          <div class=\"case__body\">\n            <div class=\"case__title\">\n              <span>Task<\/span>\n            <\/div>\n            <p><span style=\"font-weight: 400;\">Apply data science methods: develop a Python module that sorts images by quality to find and remove all poor-quality images<\/span><\/p>          <\/div>\n        <\/li>\n            \n                    <li class=\"case__item\">\n              \n          <span class=\"case__order\">04<\/span>\n          <div class=\"case__body\">\n            <div class=\"case__title\">\n              <span>Result<\/span>\n            <\/div>\n            <ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Conducted research and proposed an approach to image quality assessment. Together with the customer, we determined the criteria, i.e., which previews are considered high-quality and which are not<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Developed a Python module that sorts images by degree of blur<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">The module analyzed 50 million images: it found 4.7 million blurry ones and 4.5 million of very good quality. 
The rest are of acceptable quality<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Spent a week from technical specification to result<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">The customer was able to remove all low-quality previews from the site and increase the number of views<\/span><\/li>\n<\/ul>          <\/div>\n        <\/li>\n          <\/ul>\n  <\/div>\n  <\/div><\/div>\n\n<div class=\"wp-block-lazyblock-heading lazyblock-heading-Z1Oqp0I\"><h2 class=\"article__h\">Customer: video aggregator <\/h2><\/div>\n\n\n<p>The company owns a video platform that collects entertaining videos from many other sites. When a visitor selects a video to watch, the aggregator redirects them to the content owner\u2019s website. The platform is rewarded when users watch ads on the source sites.<\/p>\n\n\n<div class=\"wp-block-lazyblock-figure lazyblock-figure-1JvvoE\"><figure class=\"article__figure \">\n        <div class=\"article__figure-img\" >\n        <img decoding=\"async\" src=\"https:\/\/orbitsoft.com\/blog\/wp-content\/uploads\/\u0421\u043d\u0438\u043c\u043e\u043aeng.png\" alt=\"Screenshot from the platform\">\n    <\/div>\n                <figcaption><em>The company does not shoot its own videos, but collects videos from other sites.<\/em><\/figcaption>\n    <\/figure><\/div>\n\n\n<p>There are over 13 million videos hosted on the platform. The company receives preview images from the content owners. Each video has between 1 and 10 of them, for a total of more than 50 million images.<\/p>\n\n\n<div class=\"wp-block-lazyblock-heading lazyblock-heading-2uSwsu\"><h2 class=\"article__h\">Issue: Some of the video preview images are of poor quality <\/h2><\/div>\n\n\n<p>Some of the previews were of poor quality, and the platform didn\u2019t perform any verification before downloading. 
Users didn\u2019t click on videos with blurry previews and therefore didn\u2019t watch ads, so the company was losing revenue.<\/p>\n\n\n\n<p>The customer decided to remove all low-quality images, but doing so manually at this scale is labor-intensive. Looking at each image for just one second would take about 14 thousand hours. The customer turned to OrbitSoft to automate the process.<\/p>\n\n\n<div class=\"wp-block-lazyblock-heading lazyblock-heading-3vc5b\"><h2 class=\"article__h\">Solution: developed a Python module that assessed image quality  <\/h2><\/div>\n\n\n<p>High-quality images are sharp; low-quality images are blurry. Our task was to teach the algorithm to mathematically determine the degree of image blur. We started by studying the specialized literature on the subject.<\/p>\n\n\n\n<p>For example, the article <a href=\"http:\/\/isp-utb.github.io\/seminario\/papers\/Pattern_Recognition_Pertuz_2013.pdf\" rel=\"nofollow\">Analysis of focus measure operators for shape-from-focus<\/a> reviews 36 methods for calculating a blur index. We noted the options that are easiest to implement.<\/p>\n\n\n\n<p>A good solution is described in the article Diatom autofocusing in brightfield microscopy: A comparative study. In this variant, a single image channel is taken, and the absolute value of the Laplace operator is calculated. Based on this method, we developed a Python module that determines the amount of blur in an image.<\/p>\n\n\n<div class=\"wp-block-lazyblock-heading3 lazyblock-heading3-26UoXz\"><h3 class=\"article__h3\">How the module determines the amount of blur<\/h3><\/div>\n\n\n<p>Each picture is made up of pixels. If the image is blurry, the borders of objects show a smooth gradient. That is, the color of neighboring pixels changes gradually. 
If the image is sharp, the boundaries between objects are distinct, and the color value of neighboring pixels on the border of objects changes abruptly.<\/p>\n\n\n<div class=\"wp-block-lazyblock-figure lazyblock-figure-Zcn52s\"><figure class=\"article__figure \">\n        <div class=\"article__figure-img\" >\n        <img decoding=\"async\" src=\"https:\/\/orbitsoft.com\/blog\/wp-content\/uploads\/\u0440\u0430\u0437\u043c\u044b\u0442\u0438\u0435.png\" alt=\"Comparison of blurry and sharp images\">\n    <\/div>\n                <figcaption><em>The top image is blurry: it is difficult to find the border between the shirt and the background. The color of the pixels flows smoothly from dark red to light red. The bottom image is sharp: there is a distinct border between the shirt and the background. Neighboring pixels are very different in color<\/em><\/figcaption>\n    <\/figure><\/div>\n\n\n<p>To determine the blurriness of an image, the Python module measures how sharply pixel values change, using the Laplace operator:<\/p>\n\n\n\n<p>1. Takes one color channel of an image and convolves it with the following kernel:<\/p>\n\n\n<div class=\"wp-block-lazyblock-figure lazyblock-figure-ZRhSpE\"><figure class=\"article__figure  article__figure_no-shadow\">\n        <div class=\"article__figure-img\" >\n        <img decoding=\"async\" src=\"https:\/\/orbitsoft.com\/blog\/wp-content\/uploads\/\u044f\u0434\u0440\u043e.png\" alt=\"3x3 Laplacian kernel\">\n    <\/div>\n        <\/figure><\/div>\n\n\n<p>2. Calculates the variance of the result, i.e., the standard deviation squared.<\/p>\n\n\n\n<p>3. Uses the variance as a coefficient for judging the degree of blur: the sharper the image, the greater the coefficient.<\/p>\n\n\n\n<p>All previews on the platform are in color, and each color has its own gradient. When there are many colors, the calculation takes a long time. 
Therefore, we decided to convert the images to monochrome (grayscale).<\/p>\n\n\n<div class=\"wp-block-lazyblock-heading3 lazyblock-heading3-ZlTJRc\"><h3 class=\"article__h3\">How the module converts images to monochrome<\/h3><\/div>\n\n<div class=\"wp-block-lazyblock-figure lazyblock-figure-Z1tNAjU\"><figure class=\"article__figure \">\n        <div class=\"article__figure-img\" >\n        <img decoding=\"async\" src=\"https:\/\/orbitsoft.com\/blog\/wp-content\/uploads\/color-wheels-to-black.jpg\" alt=\"Color wheel\">\n    <\/div>\n                <figcaption><em>To evaluate the blur of a color picture, you need to determine the gradation of each of the colors. In monochrome images, only shades of gray are used.<\/em><\/figcaption>\n    <\/figure><\/div>\n\n\n<p>To convert color images to monochrome, we needed a library that can work with color. We tested several Python libraries and settled on OpenCV.<\/p>\n\n\n\n<p>OpenCV is a computer vision library with which you can process, analyze, and classify images. It supports all popular file formats and works well with image resolution, which is just what we needed for our task.<\/p>\n\n\n<div class=\"wp-block-lazyblock-heading3 lazyblock-heading3-Z1nXrBV\"><h3 class=\"article__h3\">How the module evaluates image quality<\/h3><\/div>\n\n\n<p>The result of the algorithm is a coefficient that takes a value from 9 to 2000. The higher the value, the better the image quality.<\/p>\n\n\n\n<p>An important step was to determine the threshold value of this coefficient: which images are considered sufficiently sharp, and which are already blurry. 
We agreed with the customer to treat previews with a coefficient of more than 100 as good quality.<\/p>\n\n\n<div class=\"wp-block-lazyblock-figure lazyblock-figure-Z1Ja4C\"><figure class=\"article__figure \">\n        <div class=\"article__figure-img\" >\n        <img decoding=\"async\" src=\"https:\/\/orbitsoft.com\/blog\/wp-content\/uploads\/photo_2023-02-03_09-41-14.jpeg\" alt=\"script\">\n    <\/div>\n                <figcaption><em>The script analyzed the quality of each preview, adjusting for the number of pixels, and counted the images with a coefficient above and below the threshold<\/em><\/figcaption>\n    <\/figure><\/div>\n\n<div class=\"wp-block-lazyblock-heading lazyblock-heading-Z1HGH6q\"><h2 class=\"article__h\">Result: the module found 4.7 million poor-quality images, and 4.5 million excellent ones <\/h2><\/div>\n\n<div class=\"wp-block-lazyblock-figure lazyblock-figure-bj29J\"><figure class=\"article__figure \">\n        <div class=\"article__figure-img\" >\n        <img decoding=\"async\" src=\"https:\/\/orbitsoft.com\/blog\/wp-content\/uploads\/tg_image_1134922659eng.png\" alt=\"Scheme\">\n    <\/div>\n                <figcaption><em>How the image quality evaluation module works:<\/em><br><em>1. Converts the previews uploaded to the platform to monochrome<\/em><br><em>2. Calculates the blur coefficient<\/em><br><em>3. Assigns each image to one of two groups, depending on whether the coefficient is above or below the threshold value<\/em><\/figcaption>\n    <\/figure><\/div>\n\n\n<p>Data science methods allowed us to automate image quality assessment. Manually, it would have taken about 14 thousand working hours. 
It took us less than 40 hours, from setting the task to obtaining the result.<\/p>\n\n\n\n<p>The Python module analyzed 50 million images and divided them into groups:<\/p>\n\n\n\n<ol>\n<li>4.7 million blurry images, with a coefficient from 0 to 100<\/li>\n\n\n\n<li>The rest are reasonably sharp; of these, 4.5 million are of very good quality, with a coefficient of more than 1000<\/li>\n<\/ol>\n\n\n\n<p>Over time, the customer removed all images of poor quality from the site. The number of views on the platform increased.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The company owns a video platform that collects entertaining videos from many other sites. When a visitor selects a video to watch, the aggregator redirects them to the content owners\u2019 website. The platform is rewarded when users watch ads on source sites. There are over 13 million videos hosted on the platform. From content owners, [&hellip;]<\/p>\n","protected":false},"author":214,"featured_media":36486,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[196],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How data science saved hundreds of hours of work time - OrbitSoft Blog<\/title>\n<meta name=\"description\" content=\"We developed a module that analyzed 50MM images and found all the blurry ones, a task that manually would have taken more than 350 weeks\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/orbitsoft.com\/blog\/images-quality\/\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"How data science saved hundreds of hours of work time - OrbitSoft Blog\" \/>\n<meta name=\"twitter:description\" content=\"We developed a module that analyzed 50MM images 
and found all the blurry ones, a task that manually would have taken more than 350 weeks\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/orbitsoft.com\/blog\/wp-content\/uploads\/tg_image_162109899-1.jpeg\" \/>\n<meta name=\"twitter:creator\" content=\"@orbitsoft\" \/>\n<meta name=\"twitter:site\" content=\"@orbitsoft\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"elevina\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How data science saved hundreds of hours of work time - OrbitSoft Blog","description":"We developed a module that analyzed 50MM images and found all the blurry ones, a task that manually would have taken more than 350 weeks","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/orbitsoft.com\/blog\/images-quality\/","twitter_card":"summary_large_image","twitter_title":"How data science saved hundreds of hours of work time - OrbitSoft Blog","twitter_description":"We developed a module that analyzed 50MM images and found all the blurry ones, a task that manually would have taken more than 350 weeks","twitter_image":"https:\/\/orbitsoft.com\/blog\/wp-content\/uploads\/tg_image_162109899-1.jpeg","twitter_creator":"@orbitsoft","twitter_site":"@orbitsoft","twitter_misc":{"Written by":"elevina","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/orbitsoft.com\/blog\/images-quality\/","url":"https:\/\/orbitsoft.com\/blog\/images-quality\/","name":"How data science saved hundreds of hours of work time - OrbitSoft Blog","isPartOf":{"@id":"https:\/\/orbitsoft.com\/blog\/#website"},"datePublished":"2023-04-26T07:58:01+00:00","dateModified":"2023-12-13T09:44:45+00:00","author":{"@id":"https:\/\/orbitsoft.com\/blog\/#\/schema\/person\/f96c7f7c1bcb1cdf7e1750794548b6fa"},"description":"We developed a module that analyzed 50MM images and found all the blurry ones, a task that manually would have taken more than 350 weeks","breadcrumb":{"@id":"https:\/\/orbitsoft.com\/blog\/images-quality\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/orbitsoft.com\/blog\/images-quality\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/orbitsoft.com\/blog\/images-quality\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/orbitsoft.com\/blog\/"},{"@type":"ListItem","position":2,"name":"How data science saved hundreds of hours of work time"}]},{"@type":"WebSite","@id":"https:\/\/orbitsoft.com\/blog\/#website","url":"https:\/\/orbitsoft.com\/blog\/","name":"OrbitSoft Blog","description":"Discover the latest in news and resources for OrbitSoft","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/orbitsoft.com\/blog\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/orbitsoft.com\/blog\/#\/schema\/person\/f96c7f7c1bcb1cdf7e1750794548b6fa","name":"elevina","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/orbitsoft.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/9f569b41ea8902fc571542fc77005a24?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/9f569b41ea8902fc571542fc77005a24?s=96&d=mm&r=g","caption":"elevina"},"url":"https:\/\/orbitsoft.com\/blog\/author\/elevina\/"}]}},"_links":{"self":[{"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/posts\/36475"}],"collection":[{"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/users\/214"}],"replies":[{"embeddable":true,"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/comments?post=36475"}],"version-history":[{"count":6,"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/posts\/36475\/revisions"}],"predecessor-version":[{"id":36840,"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/posts\/36475\/revisions\/36840"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/media\/36486"}],"wp:attachment":[{"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/media?parent=36475"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/categories?post=36475"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/orbitsoft.com\/blog\/wp-json\/wp\/v2\/tags?post=36475"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}