A software error, or bug, is a defect that makes an application behave differently from what the user expects: for example, it freezes, displays incorrect data, or charges a user more than once for a single purchase. Most bugs are caused by errors in the code, incorrect integration with other systems, or incorrect data entry.
Most bugs can be fixed before users encounter them. That's why automated and manual tests are run before the application is launched. But some errors appear only after launch. If they aren't fixed quickly, users leave and the company loses money. To prevent this, you need monitoring systems: they help you find the cause of an error and eliminate it quickly.
In brief
- Project: VPN app with a paid subscription
- Problem: the more time spent fixing bugs in the application, the more users leave
- Task: develop a solution that helps fix errors in the application quickly, so the business loses less money
- Results:
  - We connected the Elasticsearch log viewing system and a crash reporting system
  - The number of technical support requests decreased by a factor of 30
  - Since 2022, there hasn't been a single major glitch in the application
Project: Paid VPN app
A Canadian company runs an adult video platform. The company turned to OrbitSoft to avoid losing users in countries where such content is blocked. In 10 months, we developed a VPN application for five operating systems. The client retained its audience and even gained additional revenue: the application is used not only by the platform's subscribers but also by other customers.
Read more about the project in the article «VPN application: how we fixed other people's mistakes and launched the project». We described how we connected online payments, the notification system, and the CRM in the case study «How to set up external integration using a VPN application as an example». In this article, we focus on organizing technical support and integrating error monitoring systems.
Problem: the more time spent fixing app bugs, the more users quit
When we launched the application, we connected a technical support system (helpdesk) to it. This is the main channel for bug reports and user feedback: users write to technical support when they encounter a bug. We ran into two problems:
- In tickets, people describe the problem in their own words. Most often they write something like «the page doesn't open», «the application freezes», or «I paid for a subscription, but nothing happened». From such descriptions, developers can't always tell what caused the error, and asking all the follow-up questions needed to find the cause takes a long time.
- Users are reluctant to answer questions about bugs. If you ask them to do something complicated, for example, edit the registry or turn VPN services on and off, they won't bother: it's easier for them to remove the application and install another one.
We set up systems that speed up bug fixing in the application
To speed up the processing of user requests and give technical support more information about emerging problems, we decided to implement two systems:
- The log viewer system collects information about user activities
- The crash reporting system generates detailed crash reports for developers
The log viewer shows user actions
Logs are files that record user actions and server operations. For example, if a user paid for a subscription but it wasn't activated, the log shows which IP address and payment system the user paid from, how the payment request was passed to the payment system, and where in this chain the data could have got stuck. The developer opens the file, studies the information, and understands what needs to be fixed so the payment error doesn't happen again.
At first, the VPN application's logs were stored on a single server, and developers searched them manually. Later, we made the architecture more robust by deploying a cluster of three servers. This lets the application scale without the risk of freezing under a high volume of users.
With the new architecture, sorting through logs manually became inconvenient. For example, when a payment notification arrives from the payment system, one server places it in a queue and another server processes it. To trace the notification's entire path through the logs, a developer had to connect to each server, use UNIX commands to select log files for a certain period, and then open and read each of them. This took a lot of time, so we decided to automate the process.
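To make the payment example concrete, here is a minimal sketch, assuming each step of the payment chain is written as one JSON log line. The schema (field names `stage`, `ip`, `provider`), the stage names, and the helpers `log_event` and `last_completed_stage` are hypothetical illustrations, not the project's actual log format. The second helper finds the last step a user's payment reached, which points at where the data got stuck.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Hypothetical names for the steps of the payment chain described above.
PAYMENT_STAGES = [
    "payment_started",
    "request_sent_to_provider",
    "provider_notification_received",
    "subscription_activated",
]


def log_event(stage: str, user_id: str, ip: str, provider: str) -> str:
    """Serialize one payment step as a JSON log line (hypothetical schema)."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "stage": stage,
        "user_id": user_id,
        "ip": ip,              # which IP address the user paid from
        "provider": provider,  # which payment system was used
    })


def last_completed_stage(lines: list[str], user_id: str) -> Optional[str]:
    """Return the furthest consecutive stage this user's payment reached, or None."""
    records = (json.loads(line) for line in lines)
    reached = {r["stage"] for r in records if r["user_id"] == user_id}
    last = None
    for stage in PAYMENT_STAGES:
        if stage not in reached:
            break  # first missing step: the chain stopped here
        last = stage
    return last


logs = [
    log_event("payment_started", "u42", "203.0.113.7", "ExamplePay"),
    log_event("request_sent_to_provider", "u42", "203.0.113.7", "ExamplePay"),
]
print(last_completed_stage(logs, "u42"))  # request_sent_to_provider
```

Here the request reached the payment system but no notification came back, so the developer would look at the provider side of the chain first.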
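The manual procedure described above (collect each server's logs, then follow one notification across them in time order) can be sketched as a merge. This is a minimal sketch, assuming a hypothetical `"<ISO-8601 UTC timestamp> <message>"` line format with fixed precision, so timestamps sort correctly as strings; the server names and the `notif=N123` identifier are illustrative only.

```python
import heapq
from typing import Iterator


def parsed(server: str, lines: list[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (timestamp, server, message) from one server's time-sorted log lines.

    Assumes a hypothetical "<ISO-8601 UTC timestamp> <message>" format.
    """
    for line in lines:
        ts, message = line.split(" ", 1)
        yield ts, server, message


def trace(logs_by_server: dict[str, list[str]], needle: str) -> list[tuple[str, str, str]]:
    """Merge logs from all servers in time order and keep entries mentioning `needle`."""
    streams = [parsed(server, lines) for server, lines in logs_by_server.items()]
    # heapq.merge lazily merges already-sorted inputs; tuples compare by timestamp first.
    return [entry for entry in heapq.merge(*streams) if needle in entry[2]]


logs = {
    "server-1": ["2023-05-10T12:00:01Z notif=N123 received from payment system",
                 "2023-05-10T12:00:04Z notif=N124 received from payment system"],
    "server-2": ["2023-05-10T12:00:02Z notif=N123 placed in queue"],
    "server-3": ["2023-05-10T12:00:03Z notif=N123 processing failed: timeout"],
}
for ts, server, message in trace(logs, "notif=N123"):
    print(ts, server, message)
```

The merge only works if each server's file is already in time order and the clocks agree; a central log store removes the need to fetch the files from each server first.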
Automating log analysis
We used a ready-made log monitoring solution: Elasticsearch. It collects information from all three servers into a single storage and provides a convenient interface for developers. There's no need to connect to the servers and open log files: you can search by keywords and see the results right in your browser.
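A keyword search like the one described above maps onto Elasticsearch's Query DSL. The sketch below is a minimal illustration, assuming the logs are indexed with a `message` text field and an `@timestamp` date field; those field names, the `vpn-logs-*` index pattern, and the local URL are assumptions, not the project's configuration. It uses the standard `bool`/`match`/`range` query and the `_search` endpoint over plain HTTP.

```python
import json
import urllib.request


def build_log_search(keyword: str, since: str = "now-1d", until: str = "now",
                     size: int = 50) -> dict:
    """Query DSL body: full-text match on the keyword within a time window.

    The field names "message" and "@timestamp" are assumptions about the log mapping.
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {"message": keyword}}],
                "filter": [{"range": {"@timestamp": {"gte": since, "lte": until}}}],
            }
        },
        "sort": [{"@timestamp": {"order": "asc"}}],
        "size": size,
    }


def search_logs(base_url: str, index: str, body: dict) -> list[dict]:
    """POST the query to the _search endpoint and return the matching documents."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return [hit["_source"] for hit in result["hits"]["hits"]]


# Requires a reachable cluster; URL and index pattern are hypothetical:
# for doc in search_logs("http://localhost:9200", "vpn-logs-*", build_log_search("N123")):
#     print(doc["@timestamp"], doc["message"])
```

Putting the time window in `filter` rather than `must` keeps it out of relevance scoring, which fits a log search sorted by time anyway.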
The crash reporting system generates error reports
We chose a ready-made crash reporting library for .NET, the platform the application is written on. We connected it to the application and specified the addresses that user messages should be sent to. Setting up this system took us only a day.
How the report is generated:
- The application shows an error, and the user sends a request to technical support.
- The library generates an error report and emails it to the administrator. The report shows which part of the application and which code caused the error, and it includes log files with the user's activity, which can be compared with the logs of our API server.
The crash reporting system also collects user feedback: it sends an email asking users to rate the application and specific VPN servers. This builds statistics on server performance, so we can see which servers are problematic and fix errors there too.
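The project used an unnamed .NET library, so the following is only a Python analogue of the same idea, not that library's API. It is a minimal sketch, assuming the report is an email whose subject names the failing function, whose body is the stack trace, and which carries the user's recent activity log as an attachment. The addresses and the `build_crash_report` helper are hypothetical; actually sending the message (for example with `smtplib.SMTP(...).send_message(msg)`) is left out.

```python
import traceback
from email.message import EmailMessage


def build_crash_report(exc: BaseException, activity_log: list[str],
                       admin_email: str) -> EmailMessage:
    """Build an email describing where the crash happened, with the user's log attached."""
    frames = traceback.extract_tb(exc.__traceback__)
    where = frames[-1] if frames else None  # innermost frame: the code that failed
    location = f"{where.filename}:{where.lineno} in {where.name}" if where else "unknown"

    msg = EmailMessage()
    msg["Subject"] = f"Crash report: {type(exc).__name__} at {location}"
    msg["From"] = "crash-reporter@example.com"  # hypothetical sender
    msg["To"] = admin_email
    # Body: the full stack trace, i.e. which part of the code caused the error.
    msg.set_content("".join(traceback.format_exception(type(exc), exc, exc.__traceback__)))
    # Attachment: user activity, to be compared with the API server's logs.
    msg.add_attachment("\n".join(activity_log).encode("utf-8"),
                       maintype="text", subtype="plain", filename="user_activity.log")
    return msg


def activate_subscription(user_id: str) -> None:
    raise RuntimeError("payment confirmation missing")  # simulated failure


try:
    activate_subscription("u42")
except RuntimeError as exc:
    report = build_crash_report(exc, ["12:00:01 opened Subscription page",
                                      "12:00:07 tapped Pay"], "admin@example.com")
    print(report["Subject"])
```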
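Turning individual ratings into per-server statistics is a simple aggregation. This minimal sketch assumes ratings arrive as `(server, score)` pairs on a 1 to 5 scale; the server names, the `3.5` threshold, and the minimum of three votes are hypothetical choices, not values from the project. Servers with too few votes are skipped so that one angry rating doesn't flag a server.

```python
from collections import defaultdict
from statistics import mean


def problematic_servers(ratings: list[tuple[str, int]], threshold: float = 3.5,
                        min_votes: int = 3) -> list[tuple[str, float]]:
    """Return (server, average rating) for servers rated below the threshold, worst first."""
    by_server: dict[str, list[int]] = defaultdict(list)
    for server, score in ratings:
        by_server[server].append(score)
    flagged = [
        (server, round(mean(scores), 2))
        for server, scores in by_server.items()
        if len(scores) >= min_votes and mean(scores) < threshold
    ]
    return sorted(flagged, key=lambda item: item[1])


ratings = [("de-1", 5), ("de-1", 4), ("de-1", 5),
           ("us-2", 2), ("us-2", 3), ("us-2", 1),
           ("ca-1", 1)]  # ca-1 has only one vote, so it is not flagged yet
print(problematic_servers(ratings))  # [('us-2', 2)]
```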
Results
Integrating error monitoring systems into an application helps you quickly find the cause of a bug and fix it. For the VPN application, we set up and connected two systems:
- The Elasticsearch log viewing system. With it, developers don't have to manually go through many log files to fix a bug. All information is collected in one storage with a convenient web interface.
- The crash reporting system automatically generates an error report that helps the developer quickly understand which part of the application caused the crash.
We didn't develop the monitoring systems from scratch but integrated third-party solutions we know well, which shortened the project timeline.
Feedback helped improve the application: we refined it based on crash reports, and now only minor bugs reach tech support. For example, users of the Windows application used to send 2–3 emails per day to technical support; after the system was implemented, they send 1–2 emails per month. Since 2022, the app hasn't had a single major outage.
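As a quick check on the Windows figures, assuming a 30-day month, the drop in support emails works out to:

$$
\frac{2 \times 30}{2} = 30 \quad\text{to}\quad \frac{3 \times 30}{1} = 90
$$

So for the Windows application alone, the reduction is roughly 30- to 90-fold. The lower end matches the factor of 30 given in the summary.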