The Importance of Being Earnest in Monitoring Your Virtual Desktop User Experience
By Yury Magalif, Chief Architect Managed Services Cloud Computing, CDI
Before you can dynamically allocate resources in virtualized environments, you must monitor the user experience. In this blog, discover how and learn why it is even more important than it was in the physical environment.
Did you recently complete a long-awaited project to upgrade your network and virtualize your PCs, data centers, and infrastructure?
I’m guessing you might be facing some challenges with monitoring how it went and how users are enjoying (or NOT enjoying) their virtual desktop experience.
While your IT Director does glow a bit more radiantly walking down the hallway and whistle a bit more frequently in the elevator now that bulky physical desktops are gone, you still need to troubleshoot problems and optimize performance.
Plus, the new CIO wants a report validating that your infrastructure changes were, and will continue to be, a sound investment, and the executive team wants to know in advance about any performance bottlenecks.
They ultimately want to snapshot, quantify, and track changes in the user experience for all users, on all devices, 24/7!
Monitoring the Virtual Desktop User Experience
In the past, physical machines offered IT shops the opportunity to customize the user experience (UX). Christine in marketing had more RAM than Bill in accounting, and Ramesh in services had access to more network storage than either of them.
But with virtual machines, many shops do not monitor the user experience and instead adopt a policy where all 20,000 employees get exactly the same virtual desktop: same processor, same RAM, same configuration, and same access to resources.
As you might have imagined, Ramesh would be cursing your IT staff through a support chat app, and Bill would be overwhelmed.
Christine just walked out.
In other words, without monitoring the user experience, this failed policy would:
- Upgrade low-demand users who did not require access to advanced resources.
- Downgrade high-demand power users who previously enjoyed a superior level of service.
Therefore, remember this important rule of virtualization – because you can dynamically allocate and throttle resources, monitoring the user experience is even more important than it was in the physical environment.
Four Reasons Why You Should Monitor the User Experience
- Constant adjustments require usage data for maximum optimization. Monitoring helps you discover areas of improvement.
- Users would otherwise experience issues and wrongly assign blame to virtualization.
- Monitoring opens up opportunities for automation, enhanced data collection, and dynamic real-time reallocation of resources.
- Monitoring is now much easier to do in virtual environments than it was in physical ones.
How to Measure the Virtual User Experience
A rule of thumb in this business is that the virtual user experience must be at least as good as the physical experience. We can’t declare victory until a clear majority of users share that assessment.
Naturally, you may be wondering, how do we measure that objectively?
Let me address three primary methods below.
1.) Delays and Crashes
First, establish a rubric or benchmark based on a standard set of factors. Track the following three parameters and chart trends over time:
- App Load Delay
- Login Delay
- App Not Responding (ANR) and Similar Crashes
Delays and crashes are strong indicators of user frustration level. In any given range of time (three weeks, three days, or three hours), these numbers are going to point to the issue. Remember, lower numbers are better when it comes to measuring load times and crashes. Four crashes are better than 40 and a three-second load time is better than 30 seconds.
Like the indicators economists use to describe trends in the business market, these are lagging indicators. Examples from economics include existing home sales, jobless claims, and new jobs for the past month. Lagging indicators reliably report on events that have already occurred.
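To make the idea of charting these three parameters over time concrete, here is a minimal Python sketch. It assumes you can export daily samples from your monitoring tool as simple tuples; the record layout, the function name `weekly_trend`, and the sample values are all hypothetical and not tied to any product.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical daily samples:
# (day, login delay in seconds, app load delay in seconds, crash count)
samples = [
    (date(2024, 1, 1), 20.0, 4.0, 1),
    (date(2024, 1, 3), 24.0, 6.0, 0),
    (date(2024, 1, 8), 30.0, 8.0, 3),
    (date(2024, 1, 10), 34.0, 10.0, 2),
]


def weekly_trend(records):
    """Group samples by ISO week; return average delays and total crashes per week."""
    buckets = defaultdict(list)
    for day, login_s, app_load_s, crashes in records:
        iso = day.isocalendar()
        buckets[(iso[0], iso[1])].append((login_s, app_load_s, crashes))

    trend = []
    for week in sorted(buckets):
        rows = buckets[week]
        trend.append({
            "week": week,  # (ISO year, ISO week number)
            "avg_login_s": mean(r[0] for r in rows),
            "avg_app_load_s": mean(r[1] for r in rows),
            "crashes": sum(r[2] for r in rows),
        })
    return trend


trend = weekly_trend(samples)
```

Plotting or tabulating `trend` week over week shows whether the numbers are moving in the right direction, which for all three parameters means down.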
2.) Technical Metrics
Second, track the following four major technical metrics:
- Disk Storage
- Network Traffic
Trending data from those four metrics add up and empirically point to general environment issues that contribute to user frustration.
To continue our economic metaphor, these are leading indicators, such as bond yields or new housing starts. They reflect current conditions and offer insight into what might occur, provided we can quickly assess the data and make accurate predictions. For example, don’t cut over to a new enterprise app that uses a lot of RAM if two-thirds of desktops are reporting out-of-memory issues.
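The cutover example can be expressed as a simple go/no-go check. This is only a sketch: the function name `safe_to_cut_over` and the 10 percent default threshold are assumptions chosen for illustration, and you would pick a threshold that suits your own environment.

```python
def safe_to_cut_over(desktops_total, desktops_with_oom, max_oom_fraction=0.10):
    """Return True when the share of desktops reporting out-of-memory
    issues is at or below the allowed fraction (hypothetical threshold)."""
    if desktops_total <= 0:
        raise ValueError("desktops_total must be positive")
    if not 0 <= desktops_with_oom <= desktops_total:
        raise ValueError("desktops_with_oom must be between 0 and desktops_total")
    return desktops_with_oom / desktops_total <= max_oom_fraction
```

With two-thirds of desktops reporting out-of-memory issues, `safe_to_cut_over(300, 200)` returns `False`, so the RAM-hungry app should wait.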
3.) User Experience Feedback Surveys
Third, conduct user experience feedback surveys. Because each response is highly subjective and swayed by the user’s current mood, you’ll need participation from many users before the results reliably reflect the population as a whole.
You might include the following survey questions:
- How would you rate the speed of your virtual desktop?
- Would you consider any of the applications you use to be slow?
- If YES, please list which apps are slow and the time of day when they are slow.
- Which applications that you have used in the past three months have crashed?
- How often did each application crash?
Consult with your data scientist or marketing team to carefully construct the questions in your survey. For best results, you want to invest up front in getting the first survey as accurate as possible, and consistently track future results.
Skip the attempt to build a custom solution in-house. A few commercial tools are available to help you collect user experience data. Most solutions provide views with metrics that track architecture specs, infrastructure changes, desktops, laptops, workstations, kiosks, terminals, other devices, users, and apps.
Market tools include:
- Liquidware Stratusphere UX: The reliable established market leader in this segment.
- Lakeside SysTrack: A good tool for automated reports and dashboards.
- ControlUp: Their real-time product includes a responsive dashboard that helps you resolve issues quickly.
- Nexthink: Another real-time product with historical usage and IT service performance records, visualizations, actionable dashboards, reporting, and feedback surveys integrated.
These solutions also include built-in root cause analysis and problem identification.
They all tend to be strong at monitoring crashes, delays, and metrics; however, they typically lack an end-user survey feedback function. Nexthink is an exception. It delivers on all three points I made in the previous section, including surveys, but has some other disadvantages such as configuration requirements and cost.
When it comes to evaluating the costs and features of these competitors, I invite you to compare and decide for yourself. I will suggest that you can likely conduct the surveys yourself using SurveyMonkey, SurveyGizmo, GetFeedback, or another popular online survey tool.
Data Collection Tips
- Collect metrics and feedback data for as large a user pool as possible with a consistent number of users. For example, if you cannot survey all 15,000 employees, poll 1,000 every quarter. If you can do it every 60 days or monthly, that’s even better. You also want to have data before a change to serve as a baseline, and after a change to make comparisons. For example, immediately before and immediately after a shift from physical to virtual desktops.
- Run the delay, crash, and technical metrics tools as often as possible. You want them capturing data almost constantly. Compare the data every month, examine reports, and look for trends.
- It’s also important to note that all the tools I mention are strictly for monitoring. They don’t perform any corrective actions. You could script your own, but most organizations today are cautious about building yet another in-house custom solution when the cloud promises so much, from automation to updates.
- Corrective automation tools are available for servers, but not for virtual desktops. Some real-time server resource allocation features exist in Turbonomic and VMware vRealize Operations Manager/Automation.
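For the survey-sampling tip above (polling 1,000 of 15,000 employees), you can estimate how far a survey percentage might sit from the true company-wide figure. The sketch below uses the standard margin-of-error formula for a proportion with a finite population correction. It assumes a simple random sample, a 95 percent confidence level (z = 1.96), and the worst-case proportion of 0.5; the function name is hypothetical.

```python
import math


def margin_of_error(sample_size, population, z=1.96, p=0.5):
    """Approximate margin of error for a surveyed proportion.

    Assumes a simple random sample. z=1.96 corresponds to roughly
    95 percent confidence; p=0.5 is the most conservative proportion.
    """
    if not 0 < sample_size <= population:
        raise ValueError("sample_size must be between 1 and population")
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    if population == 1:
        return 0.0
    # Finite population correction: shrinks the error when the sample
    # is a sizable share of the whole population.
    fpc = math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error * fpc


moe = margin_of_error(1000, 15000)
```

Under these assumptions, polling 1,000 of 15,000 employees gives a margin of error of roughly three percentage points, which is usually tight enough to compare one quarter against the next.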
Evaluate the Trends
After collecting the data, examine any trends. If you see an increase in crashes, delays, help desk tickets, and other common issues, the overall user experience at your organization is in trouble. As in a crime drama or forensics TV show, go into analysis mode to determine why.
Use the feedback surveys to substantiate the trends. It works both ways; you can also use the metrics to support a trend in user feedback results.
For users who report poor performance, your survey should also ask them to specify when it occurs. If you can, try to pinpoint a two-hour window. Then, focus on that time and try to determine a root cause. You also have the names and machine IDs to go on.
Other forensic analysis tips:
- Analyze just two or three users: They will reveal findings representative of a larger audience. Troubleshooting forensics for dozens of users will yield too many data points and too much variability.
- Focus on snippets of user experience feedback: For example, three users reported crashes while using the same streaming app at the same time.
- Look for patterns: For example, every 30 days you notice a block of days with high disk utilization metrics. Run another report for just that week and look for trends and sustained peaks. Within those peaks focus on just three hours, then one hour.
- Filter out false positives: For example, after an upgrade to a new application, the metrics suddenly show insufficient RAM for everyone; however, a patch due next week fixes a known memory leak.
- Memory is critical: The most common issues center around insufficient resources. Users often need more RAM. It’s typically more important than processor speed or flash storage.
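Narrowing from a week to three hours and then to one hour, as described in the pattern tip above, amounts to finding the busiest window in a series of readings. Here is a minimal sliding-window sketch. The hourly disk utilization values are made up for illustration, and `busiest_window` is a hypothetical helper rather than a feature of any monitoring product.

```python
def busiest_window(readings, width):
    """Return (start_index, average) for the contiguous run of `width`
    readings with the highest average value."""
    if width <= 0 or width > len(readings):
        raise ValueError("width must be between 1 and len(readings)")

    window_sum = sum(readings[:width])
    best_start, best_sum = 0, window_sum
    for start in range(1, len(readings) - width + 1):
        # Slide the window one step: add the new reading, drop the oldest.
        window_sum += readings[start + width - 1] - readings[start - 1]
        if window_sum > best_sum:
            best_start, best_sum = start, window_sum
    return best_start, best_sum / width


# Hypothetical disk utilization (percent) for each hour of one day, 00:00 to 23:00.
hourly_disk_pct = [20, 18, 15, 15, 16, 20, 35, 50, 65, 80, 92, 95,
                   90, 70, 60, 55, 50, 45, 40, 35, 30, 25, 22, 20]

three_hour_start, three_hour_avg = busiest_window(hourly_disk_pct, 3)
one_hour_start, one_hour_avg = busiest_window(hourly_disk_pct, 1)
```

In this made-up data the busiest three hours start at 10:00 and the single busiest hour is 11:00, which is where you would look first for a root cause.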
After running monthly reports and tracking the trends, narrow your analysis window and draw your conclusions. It’s typical to prioritize the corrective actions that you want to make.
For example, after identifying a storage bottleneck or memory issue that impacts 500 users, you might choose to allocate more memory to the top 50 and monitor that change for a few days.
A perception issue also plays a role. Studies show that users do not notice an improvement unless it represents at least a 20 percent increase over the previous state. In other words, don’t spread resource allocation adjustments so thin that each user is given a two percent incremental bump every six months. They won’t even notice the change. Better to boldly introduce a 20 percent increase today. Your users will definitely notice the improvement.
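A little arithmetic shows why concentrating an upgrade beats spreading it thin. The sketch below assumes a hypothetical pool of 80 GB of spare RAM and desktops that currently have 8 GB each; both function names are made up for illustration.

```python
def noticeable_upgrade_count(extra_gb, current_gb_per_user, min_increase_pct=20):
    """How many users can receive at least `min_increase_pct` percent more RAM
    from a pool of `extra_gb`. Integer math avoids float rounding surprises."""
    if extra_gb < 0 or current_gb_per_user <= 0 or min_increase_pct <= 0:
        raise ValueError("inputs must be positive (extra_gb may be zero)")
    return (extra_gb * 100) // (current_gb_per_user * min_increase_pct)


def even_spread_pct(extra_gb, current_gb_per_user, users):
    """Percent increase each user gets if the pool is split evenly."""
    if current_gb_per_user <= 0 or users <= 0:
        raise ValueError("current_gb_per_user and users must be positive")
    return 100 * extra_gb / (current_gb_per_user * users)


focused = noticeable_upgrade_count(80, 8)   # users who get a 20 percent bump
diluted = even_spread_pct(80, 8, 500)       # percent bump if split across 500 users
```

With these assumed numbers, the same 80 GB gives 50 users a 20 percent increase, or all 500 users a two percent increase that nobody will notice.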
Monitor changes and look for new patterns for at least two full weeks after a significant change. Compare data before, during, and after the change. Look at variances expressed in units and as percentages. Make sure your audience, staff, and customers are aware of the changes. User engagement is helpful.
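Expressing a change both in units and as a percentage is straightforward once you have a baseline. This sketch assumes you have already summarized each period into a dictionary of metric names and values; the metric names and numbers are hypothetical.

```python
def change_report(before, after):
    """For each metric present in both periods, return
    (change in units, change in percent). Percent is None when the
    baseline is zero, because the ratio is undefined."""
    report = {}
    for name, baseline in before.items():
        if name not in after:
            continue
        delta = after[name] - baseline
        pct = None if baseline == 0 else 100 * delta / baseline
        report[name] = (delta, pct)
    return report


# Hypothetical summaries for the two weeks before and after a change.
before = {"avg_login_s": 30.0, "weekly_crashes": 40}
after = {"avg_login_s": 24.0, "weekly_crashes": 10}

report = change_report(before, after)
```

Here the login delay drops by 6 seconds (20 percent) and weekly crashes drop by 30 (75 percent). Negative values are good news for these metrics.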
Finally, quantify the cost of slow performance in terms of its financial and political impacts:
When 500 users experience slow applications every day for a week, the lost productivity is significant. On a recent CDI engagement, we found an anti-virus process that crawled along very slowly during peak work hours. There was no need to impact users like this when the process could run after midnight.
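A rough way to put a number on lost productivity is to multiply affected users by time lost and by a loaded hourly cost. Every input in this sketch is an assumption you would replace with your own figures; the 10 minutes per day and the 40 per hour rate are purely illustrative and do not come from the engagement described above.

```python
def lost_productivity_cost(users, minutes_lost_per_day, days, hourly_cost):
    """Estimate the cost of slow performance as
    users x hours lost x loaded hourly cost."""
    if min(users, minutes_lost_per_day, days, hourly_cost) < 0:
        raise ValueError("inputs must not be negative")
    hours_lost = users * minutes_lost_per_day * days / 60
    return hours_lost * hourly_cost


# Hypothetical: 500 users each lose 10 minutes a day for a 5-day week,
# at an assumed loaded cost of 40 per hour.
weekly_cost = lost_productivity_cost(500, 10, 5, 40.0)
```

Even with these modest assumptions, one slow week costs well over 16,000 in lost time, which is a useful figure to bring to a budget conversation.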
Another financial example involves a hospital billing department. The accounts receivable team would face a severe challenge if slow network speeds prevented new billings from going out on time.
A critical medical procedure might require MRI images in the next 20 minutes while the patient remains under anesthesia. Now is not the time for performance delays.
Slow physical or virtualized environments also carry legal risks. A firm might be sued for losses involving delays for thousands of users.
Slow performance and a poor user experience do not reflect well on the brand. Company executives and account managers want to look their best when showcasing new product demos. In these situations, some of your IT staff may receive phone calls from frustrated callers demanding a fix or your resignation.
Performance is no joke, especially when you factor in contractual service agreements and the competitive dynamics of the cloud economy. A sub-standard user experience impacts your bottom line and perception in the news and social media.
In the long run, prevention pays for itself, so fund your performance fixes and attack the next set of bugs early and often. Equipping your staff with faster performance is essential for business.
The Final Word
People expect robust, fast, responsive computing devices. They want to leverage powerful networks, platforms, and applications to increase their productivity. When a weak link in the system arises, it can snowball, and user productivity can dramatically decline or drop off altogether.
In the physical realm, you can still go buy a better laptop.
But in the virtual realm, monitoring the user experience is essential to identify pain points and make the right adjustments.