The superannuation and investment mobile app I’ve been working on over the last year has finally been released. It’s been on the app store for just over a month now* and this blog is about how we are using metrics to help keep tabs on the quality of our app.
The average app store rating is one useful metric to keep track of. We are aiming to keep it above 4 stars and we are also monitoring the feedback raised for future feature enhancement ideas. I did an analysis of the average app store reviews of other superannuation apps here to get a baseline of what the industry average is. If we are better than the industry average, we have a good app.
Analytics in mobile apps
We are using Adobe Analytics to track page views and interactions for our web and mobile apps. On previous mobile app teams I’ve used mParticle and Mixpanel. The framework doesn’t really matter here, but I’ve found Adobe Workspace to be a great tool for insights once you know how to use it. Adobe also has plenty of online tutorials for building your own dashboards.
App versions over time
Here’s our app usage over time broken down by app version:
We have version 1.1 on the app store and released 1.0 nearly 2 months ago. We did an internal beta release with version 0.5.0. If anyone on the old versions tries to log in they’ll see a forced update view.
Crashes are a fact of life for any mobile app team because so many different variables go into them. However, keeping track of crashes and aiming for a low crash rate is worthwhile.
With version 1.1 we improved our crash rate on Android from 2.77% to 0.11%. You can also run a UI exerciser called monkey from the command line against your Android emulator to try to find more crashes. With a command along the lines of adb shell monkey -p your.package.name -v 1000, I can send 1,000 random UI events to the emulator.
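If you would rather kick off monkey from a script than type the command by hand, a minimal Python sketch might look like this. The package name com.example.superapp is a placeholder, and the sketch assumes adb is on your PATH with an emulator already running.

```python
import subprocess


def build_monkey_command(package: str, event_count: int = 1000) -> list[str]:
    """Build the adb command that sends random UI events to one app."""
    if event_count <= 0:
        raise ValueError("event_count must be positive")
    # -p restricts monkey to the given package, -v prints progress as it runs.
    return ["adb", "shell", "monkey", "-p", package, "-v", str(event_count)]


def run_monkey(package: str, event_count: int = 1000) -> int:
    """Run monkey against the connected emulator and return its exit code."""
    result = subprocess.run(build_monkey_command(package, event_count))
    return result.returncode


if __name__ == "__main__":
    # Hypothetical package name; swap in your own app's id before running.
    print(" ".join(build_monkey_command("com.example.superapp")))
```

Keeping the command building separate from the running makes it easy to check the command without a device attached.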
I can also keep an eye on how many error messages are seen. The spike in Android app error messages was me throwing the chaos monkey at our production build for a bit. However, when there is a spike in both Android and iOS, I know to ask, “was there something wrong with our backend that day?”
Test Vs Prod – page views
If every page has one event being tracked, we can compare our upcoming release candidate against production; say we see that 75 page views were triggered on the test build and we compare this to the 100 page views we can see in production. We can then say we’ve tested 75% of the app and haven’t seen any issues so far.
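As a rough illustration of that comparison, here is a small Python sketch. The page names are made up, and it assumes you can export the distinct pages seen on the test build and in production as two sets.

```python
def page_view_coverage(test_pages: set[str], prod_pages: set[str]) -> float:
    """Percentage of pages seen in production that the test build also hit."""
    if not prod_pages:
        raise ValueError("no production pages to compare against")
    covered = test_pages & prod_pages
    return 100 * len(covered) / len(prod_pages)


# Made-up page names standing in for the tracked page view events.
prod = {f"page_{n}" for n in range(100)}
test = {f"page_{n}" for n in range(75)}

print(f"{page_view_coverage(test, prod):.0f}% of production pages exercised")
# prints "75% of production pages exercised"
```

The set difference prod - test would list the pages nobody has touched yet, which is often more useful than the percentage itself.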
There’s no need to aim for 100% coverage. Our unit tests do cover every screen, but because they run on the internal CI network those events are never sent to Adobe. We have over 500 unit/UI tests on both Android and iOS (not that the number of tests is a good metric; it’s an awful one, by the way).
But if you’ve tested the main flows through your app and that’s gotten you to 50% or 75% coverage, you are approaching diminishing returns. What are the chances of finding a new bug? Or a new bug that someone cares about?
You could spend that extra hour or two getting to 90-95% but you could also be doing more useful stuff with your time. You should read my risk based framework if you are interested in finding out more.
If you are working on a new feature or flow of your app, you can measure how many people actually complete the task. E.g. first time log in, how many people actually log in successfully? How many people lock their accounts? If you are trying to improve this process you can track to see if the rates improve or decline.
You could also measure satisfaction after a task is completed and ask for feedback, a quick out of 5 score along the lines of, “did this help you? was it easy to achieve?”. You can put a feedback section somewhere in your app.
The tip of the iceberg
The metrics and insights I’ve shared with you are just a small subset of everything we are tracking, and they are only a small part of our overall test strategy. Adobe has also been useful for digging into mobile device and operating system breakdowns. There are many ways you can cut the data to provide useful information.
What metrics have you found useful for your team and getting a gauge on quality? What metrics didn’t work as well as you had hoped?
This is not financial advice and the views expressed in this blog are my own. They are not reflective of my employer’s views.
Bugasura is an Android app and a Chrome extension. It helps with keeping track of exploratory testing sessions and comes with screenshot annotation and Jira integration.
Here are a couple of screenshots of the Android app in action, being used for an exploratory session on our test app.
First I selected the testing session:
While I’m testing I see this Bugasura overlay which I can tap to take a screenshot and write up a bug report on the spot:
Here’s their bug reporting flow:
And here’s a testing report after I finished my exploratory testing where I can push straight to Jira if I want:
Here’s the sample report link (caveat, the screenshots attached to the bug are now public information on the internet, so there’s a privacy concern right there), but OMG, the exploratory session recorded the whole flow too (so a developer could see exactly what I did to find that bug).
Here’s that bug report in Chrome paused at screen 13 out of 18:
Some caveats I’ve found so far: the test report is public (not private by default), so you wouldn’t want to include screenshots of private or confidential information.
Bugasura only works on Android and Chrome. There isn’t an iOS version, but I guess with some remote device access running through Chrome it could work? We use Gigafox’s Mobile Device Cloud at work to access a central server of mobile devices and I imagine Bugasura could work with it.
Also I think they may have misspelt Elisabeth’s name in her quote.
This blog post reflects my opinions only and does not reflect the views held by my employer.
My team is going through a beta release for our mobile app to get early feedback. We’ve noticed that our Android app is struggling compared to iOS. It seems that the extra hurdle of signing up for the Android beta program impacts installations. Naturally we’d expect Android engagement to lag a little behind iOS based on the Aussie mobile usage market analysis, but there is still a significant drop.
We have 428 iOS installs and 99 Android installs, so Android accounts for about 19% of all installs (99 of 527). We have roughly 75-80% successful registrations and return log ins once people actually figure out how to install the app.
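For anyone wanting to check the arithmetic behind that 19%, here it is as a short Python snippet using the install counts above. The function name is just for illustration.

```python
def share_of_installs(platform_installs: int, total_installs: int) -> float:
    """Percentage of all installs that belong to one platform."""
    if total_installs <= 0:
        raise ValueError("total_installs must be positive")
    return 100 * platform_installs / total_installs


ios_installs = 428
android_installs = 99
total_installs = ios_installs + android_installs

android_share = share_of_installs(android_installs, total_installs)
print(f"Android: {android_installs} of {total_installs} installs ({android_share:.0f}%)")
# prints "Android: 99 of 527 installs (19%)"
```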
Google Groups vs TestFlight
We are using Google Groups to manage the distribution of the Android beta app, and because it’s harder to use than TestFlight for iOS we’ve gotten fewer installations. It’s fascinating how an extra hurdle in the sign up process can impact installations.
Our Android numbers appear to be higher than usual here, but I think that’s to do with the timeframe I’m collecting these numbers over. A few people installed the Android app before we officially started the beta release and I think they’ve been counted in these statistics.
Do you remember how the world freaked out over the potential Y2K bug when the year was changing from 1999 to 2000? A large mitigation factor was a huge outsourcing effort to India, and it helped to establish India as the global IT giant it is today. So when not many bugs eventuated it was a bit anticlimactic.
Globally, $308 billion was spent on compliance and testing, and it helped build more robust systems that survived the system crashes from the 9/11 terrorist attacks.
Well, the Y2K bug would only impact systems that used 2 digits to represent the year, e.g. using the DDMMYY format to save on memory.
And the world updated its systems everywhere. Businesses did a pretty good job of patching that bug before it became an issue. Bugs still did come up, but the world didn’t end.
Sometimes the fix was to push it out
One fix people implemented was a fixed window: the two-digit years 00 to 20 were treated as 2000 to 2020, and anything higher as 19xx. That only pushed the problem out, and we’ve had some cases this year of this bug coming into effect. You can read more about this 2020 Y2K bug here.
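A minimal Python sketch of that kind of windowing fix, assuming a pivot year of 20, shows why the problem comes back. The function name and the pivot parameter are illustrative only, not taken from any particular system.

```python
def expand_two_digit_year(yy: int, pivot: int = 20) -> int:
    """Expand a two-digit year using a fixed window.

    Years from 00 up to and including the pivot are read as 20xx,
    everything above the pivot is read as 19xx.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 2000 + yy if yy <= pivot else 1900 + yy


print(expand_two_digit_year(0))   # 2000
print(expand_two_digit_year(20))  # 2020
print(expand_two_digit_year(21))  # 1921, the window has run out
print(expand_two_digit_year(99))  # 1999
```

Moving the pivot later buys more time, but any fixed window eventually hits the same wall.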
There’s another bug for 2038
However, did you know there is another Y2K-style bug due in the year 2038? Many systems count time as the number of seconds since 1 January 1970, stored in a signed 32 bit integer. In January 2038 that counter overflows and wraps around to a negative number, which reads as a date back in 1901. This is more likely to impact cheap embedded systems with limited memory or old legacy systems.
If we switch to a 64 bit counter, our sun will explode before we get the same issue. It’s like going from IPv4 (we are already running out of IP addresses) to IPv6 for the internet.
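To make the 2038 numbers concrete, here is a small Python sketch that assumes the counter is a signed 32 bit integer. Python integers don’t overflow on their own, so the wrap_int32 helper is there to imitate what the hardware would do.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Wrap an integer the way a signed 32-bit counter would."""
    return (value + 2**31) % 2**32 - 2**31


last_good = EPOCH + timedelta(seconds=INT32_MAX)
after_overflow = EPOCH + timedelta(seconds=wrap_int32(INT32_MAX + 1))

print(last_good)       # 2038-01-19 03:14:07+00:00
print(after_overflow)  # 1901-12-13 20:45:52+00:00

# A signed 64-bit counter lasts roughly 292 billion years.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
print(round((2**63 - 1) / SECONDS_PER_YEAR / 1e9))  # 292
```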
I like to use mind maps to help me test. Mind maps are a visual way of brainstorming different ideas. On a previous mobile team I printed and laminated this mind map to bring along to every planning session to help remind me to ask questions like, “What about accessibility? Automation? Security? or Performance?”:
As I go through exploratory testing (or pair testing), I’ll tick things off as I go and take session notes. Often this will involve having conversations with people, sometimes bugs are raised. Here is a quick mind map I’ll use for log in testing:
Heuristics for testing
This mind map approach can be combined with a heuristic test strategy approach or a mnemonic test approach. A heuristic is a rule of thumb that helps you solve problems; heuristics often have gaps because no mental model is perfect.
SFDPOT is a common mnemonic developed by James Bach, who also developed Rapid Software Testing (RST), a context driven methodology. James developed his RST course with Michael Bolton.
We truly live in a global and interconnected society. But have you tested your app using a Right to Left (RTL) language such as Arabic? This blog post is a reflection on some of the design considerations to keep in mind when accommodating this.
Why does this matter?
Arabic is one of the top 5 spoken languages in the world with around 300 million speakers and it is the third most spoken language in Australia. Even if you only release apps for the Australian market, someone out there will have Arabic set as their default device language. It’s ok if you haven’t translated your app, but you should check that these people can still use it.
How do I test this?
On Android, enable developer options and select “Force RTL layout direction”. On my Samsung S10 this is what my screen and options look like after enabling this option:
In Xcode you can change the build target language to a Pseudo RTL language to see how your app renders in this way without having to change the language on your device.
You don’t actually need to render your keypads right to left. In fact, it’s more jarring to render numbers in an RTL arrangement, because ATMs and phone pads are left to right in Arabic. Most Arabic speakers are used to globalised number pads. Samsung has an in-depth article on when RTL should be applied.
When I have RTL rendering set on my Android phone, the log in PIN screen and phone call functionality are in LTR. However, some of my banking apps render their PIN pads in RTL.
Common RTL Issues
I was pleasantly surprised to find out how many of my apps weren’t broken when I switched to RTL rendering. Facebook, Twitter and email still look very good. Some apps (like my calculator) do not make sense to render RTL and they remain LTR:
Bug One: Overlapping labels
You will have to watch out for overlapping labels, like in the Domain app here:
Bug Two: Visuals don’t match written language
Another issue is when your text is rendered RTL but the visual cue is still LTR, like the shaded bar representing visitors to my blog by country in this WordPress statistics view:
Bug Three: Menus that animate from the side
In the app I’m helping build, the side menu renders pretty funkily in RTL mode. I can’t show you a screenshot of this behaviour, but it’s probably the quirkiest RTL bug I’ve seen. If you find an app with bad side menu behaviour in RTL, please share your screenshots with me.
But here are some screenshots of the CommSec app on Android (LTR on the left and RTL on the right for comparison):
Bug Four: Icons aren’t flipped
Often icons have a direction associated with them, like the walking person when you get Google Maps directions. Sometimes it can look a little odd when they aren’t flipped correctly (as if they are walking backwards).
Have you seen these bugs before?
Please let me know your thoughts or experiences in supporting RTL languages. I’d love to hear your stories.
Test strategy, what a funny concept. Now this strategy isn’t going to help you win any battles (this is where the word strategy comes from after all) but for lack of a better well understood term, this blog post is a reflection on what I imagine will work for my team*.
*disclaimer: what might work for my team might not work for yours. People are amazingly diverse and your team and company context is fundamentally different. Also this here is a wish list of what I think will work. It’s subject to change as we learn and evolve.
First let’s set the scene. Our scrum team includes 1 Android developer, 1 iOS developer, 2 back end developers, 2 business analysts (1 is our scrum master), 2 testers and a team/tech lead. We are changing our team structure and I’ve come on board as a software engineer in test. Our team closely collaborates with the design team and they are included in our group email threads but don’t come to our retros. We have a 10 day sprint cycle that looks a little like this:
We have a daily standup, a few kick off meetings at the start of the sprint to lock in what we are working on for the next 2 weeks, some mid sprint review/next sprint refinement sessions and a few meetings at the end that help tie up what we’ve completed. Consider this a crash course in Scrum Agile if you will. Not everyone is required to attend all of these meetings and I won’t be covering these meetings in detail in this blog post.
Get to the Test Strategy
Yes I know, that was a rambling tangent but the context is important. Before I get into the good bits I’ll ask you a question;
Why do we even bother with testing?
Some people say, “to ensure the product works as expected” or, “to find bugs”, “it’s my job to test things” and these are all ok answers but they miss the point a little. Here’s my answer:
We test to get feedback on the state of the product. To help us answer the question, “are there any known risks with shipping this product to production?”
Paraphrased from conversations with Michael Bolton, the tester not the singer
Every part of the following strategy is all tied into facilitating feedback. The more timely and accurate the feedback the better.
Testing and quality are a team responsibility; it’s not just up to one person to be the quality gate keeper. My role is to help facilitate feedback.
Layer One: The product design feedback loop
This is all a little out of scope of my team’s day to day activities, but this is how our design team tests if we are building the right thing that users need.
This might involve researching our market for current trends. How many of our customers care about their superannuation? What is their financial literacy? What types of problems are they facing? What are our competitors doing and how does their experience deliver value?
Eventually someone will need to start sketching out some design ideas. What’s the user flow through a particular feature?
This won’t happen for every new design, for example log in hasn’t gone through this process. Our new big features will go through this type of testing. This helps get feedback on the design and layout. Does it all make sense?
Design and user story creation
Out of all of this work, the design team and the business analyst will eventually work together to create acceptance criteria, refine the UI and get the rest of the team up to speed with the context of a feature. Our user stories and designs are usually shared on a Confluence page and linked to Jira tasks. We use a GIVEN WHEN THEN structure for our user stories.
Layer Two: The code feedback loop
All testing is exploratory in nature; it’s front and centre and it runs across everything we do. Chaos engineering is a type of exploratory testing, and so is building the code locally. We use our skills, plans and judgement to determine when and how much testing is needed at any point.
When we do the code review we will do exploratory testing based on the risk of the feature, time boxed to a session or two depending on what has been built. We will look at the user stories, brainstorm any more edge cases and consider if they are worth testing. We check if the experience of the feature makes sense and if there are any ways people can get into some sticky unexpected situations.
As a developer builds a feature, they will create unit tests based on the user acceptance criteria. Developers will use the tools they are most comfortable with to write these tests. If you’d like to read more, Martin Fowler has this blog post on unit testing.
I have my visual risk board next to our team which we use to prioritise how much testing we build at this layer. We use Espresso for Android and XCUITest for the iOS app.
“Why not Appium?”, I hear you ask
Simple: when test code lives in a repository outside of your production code, you decrease collaboration with the whole team. Also, you can’t easily run your Appium tests during pre commit testing or locally as a developer. You can follow the interactive visual risk for UI automation exercise here to understand more.
When a new API is being developed, I’ll often pair with the developer to do a code review. We will talk about the architecture, brainstorm testing ideas, do a bit of testing (usually through Postman if we are testing an API) and chat about test coverage. Is it adequate? Is there anything missing? Can we see the tests fail under the expected conditions?
If it’s a front end feature, I’ll check out the code locally and use a different emulator/simulator than what the developer uses. I’ll give the feature a good shake out and check the test coverage. I’ll also test for accessibility if it’s a new front end feature.
For our mobile app, we are able to do most of our code review testing without ever talking to a backend. The engineers have built some mock servers into our apps; when the app would call an API, our mock server returns a canned response. This helps us test that the UI and the flow hang together even when test environments aren’t available. If you’d like to read more, check out this article on mock testing for Android or this one for iOS.
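The idea of a canned response can be sketched in a few lines of Python. This is not how our apps do it (they are native Android and iOS); the AccountApi interface, the payload fields and the render_balance function are all invented for the example.

```python
import json
from typing import Protocol


class AccountApi(Protocol):
    def get_balance(self, account_id: str) -> dict: ...


class MockAccountApi:
    """Stands in for the backend and always returns a canned response."""

    CANNED_BALANCE = '{"accountId": "demo-123", "balance": 1050.25, "currency": "AUD"}'

    def get_balance(self, account_id: str) -> dict:
        # No network call: parse the canned JSON as if it came from the server.
        return json.loads(self.CANNED_BALANCE)


def render_balance(api: AccountApi, account_id: str) -> str:
    """The 'UI' code under test; it cannot tell a mock from the real API."""
    data = api.get_balance(account_id)
    return f"{data['currency']} {data['balance']:,.2f}"


print(render_balance(MockAccountApi(), "demo-123"))  # AUD 1,050.25
```

Because the UI code only depends on the interface, the same screen can be exercised with the mock during code review and with the real API later on.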
We have different pipelines for different applications. We are using TeamCity as our Continuous Integration tool. Generally all of our unit tests and UI tests will be run. Maybe our contract tests. I have a few other ideas to increase the value from our build pipelines that I’ll talk about in Chaos Testing. If our main builds start failing, we won’t release the software.
We don’t necessarily focus on doing device testing for each feature that comes through. I try to pick a different emulator/simulator than what the developers use. I always make sure features get tested on a Samsung. If a feature is rated 3 stars in our risk analysis, we will spend more time testing it on a wide variety of devices. We currently have an on premise mobile device cloud server delivered by Mobile Labs. If you don’t have a device cloud, you could set up your own device farm.
Samsung has a wide market saturation and they always do funky stuff to the android UI. The android emulators are awesome at vanilla android. However, most people out there aren’t using vanilla Android :(.
We are moving towards having contract testing in place that lets us know if an API starts to break. If someone changes the JSON payload in an API, our contract will break and they will know they have more stuff to clean up. We don’t have contract testing for our mobile app yet, but some of our downstream microservices are starting to build these. If you’d like to find out more, read this article by Martin Fowler.
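As a very rough Python sketch of the idea, the check below compares a JSON payload against the fields a consumer expects. The field names are invented, and a real setup would more likely use a dedicated contract testing tool than a hand rolled check like this.

```python
# Hypothetical contract: the fields and types the mobile app relies on.
EXPECTED_CONTRACT = {"accountId": str, "balance": float, "currency": str}


def contract_violations(payload: dict, contract: dict) -> list[str]:
    """Return a description of every field that breaks the contract."""
    problems = []
    for field, expected_type in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(f"wrong type for {field}: {type(payload[field]).__name__}")
    return problems


good = {"accountId": "demo-123", "balance": 1050.25, "currency": "AUD"}
renamed = {"accountId": "demo-123", "amount": 1050.25, "currency": "AUD"}

print(contract_violations(good, EXPECTED_CONTRACT))     # []
print(contract_violations(renamed, EXPECTED_CONTRACT))  # ['missing field: balance']
```

Extra fields in the payload are ignored on purpose, since a consumer usually only cares about the fields it actually reads.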
We have an integration test environment where our code is being constantly deployed into. Sometimes it can mean an API is down because it’s being deployed. We do a lot of our API testing in this environment.
On Android there’s a command line tool called monkey, which acts as our chaos monkey. It is a UI exerciser: it throws random user input at your UI to try to find where it crashes. I’m hoping to include this in a build pipeline for an overnight build: run it for a few hours on an Android device and see if it crashes, then the next night do the same thing on a different device/OS combination. This will give us reasonable device coverage over a sprint. I don’t know of a similar tool for iOS. You can read more about chaos engineering on Wikipedia.
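One simple way to rotate through device and OS combinations night by night is sketched below in Python. The device names are placeholders, and how the chosen device actually gets handed to the overnight build is left out because it depends on your pipeline and device cloud.

```python
from datetime import date, timedelta

# Hypothetical device pool; replace with whatever your device cloud exposes.
DEVICE_POOL = [
    "samsung-s10-android-10",
    "pixel-3-android-9",
    "samsung-a50-android-9",
    "nokia-6-android-8",
]


def device_for_night(run_date: date, devices: list[str]) -> str:
    """Pick tonight's device by cycling through the pool one day at a time."""
    if not devices:
        raise ValueError("device pool is empty")
    return devices[run_date.toordinal() % len(devices)]


# Preview which device each of the next few nightly runs would use.
start = date.today()
for offset in range(len(DEVICE_POOL)):
    night = start + timedelta(days=offset)
    print(night.isoformat(), device_for_night(night, DEVICE_POOL))
```

With four devices in the pool, every device gets at least two overnight runs in a ten day sprint.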
Layer Three: Shipping the product feedback loop
A few days before the end of the sprint, our team and invited guests will sit down and do some exploratory testing on the features that have just been built. If anyone wants to explore a new API that’s been built, they can. If they’ve had their head in unit tests lately, they have the chance to explore some of the new UI. You can read how to run a bug bash to find out more. If major bugs are found here we won’t release the software. We might do a mob programming session when we don’t have enough features for a bug bash.
On the last day of the sprint we will demo our features to a broader audience. Feedback is gathered and turned into Jira items/research for the design team.
Then we release to internal staff members. Many other companies call this “eating your own dog food”. This gives people the chance to raise more feedback before we put the product in front of customers. You can read more on Wikipedia here.
We can release our app to our high value or digitally savvy customers who want to be ahead of the curve. This is a customer engagement strategy as well as a test strategy.
Percentage roll outs
The Google Play store allows you to do percentage rollouts. Say you roll out the new version to 5% of users, then monitor production for any new crashes or customer complaints. If it’s all smooth for a few days, you can continue the rollout to 50% and then 100%. If major bugs do occur, the Google Play store allows you to halt the rollout. The Apple App Store has a similar feature.
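The decision at each stage can be written down as a tiny rule. The Python sketch below uses the 5%, 50% and 100% stages from above; the 0.5% crash rate threshold is an assumption for illustration, not a number we have agreed on, and the function does not talk to any app store.

```python
ROLLOUT_STAGES = [5, 50, 100]


def next_rollout_step(current_percent: int, crash_rate: float, max_crash_rate: float = 0.5) -> str:
    """Suggest what to do with a staged rollout given the latest crash rate (%)."""
    if crash_rate > max_crash_rate:
        return "halt"
    later_stages = [stage for stage in ROLLOUT_STAGES if stage > current_percent]
    return f"increase to {later_stages[0]}%" if later_stages else "complete"


print(next_rollout_step(5, crash_rate=0.11))    # increase to 50%
print(next_rollout_step(5, crash_rate=2.77))    # halt
print(next_rollout_step(100, crash_rate=0.11))  # complete
```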
Monitoring in production
What metrics should be communicated back to the team? How can we respond to issues in production? I like this quote from a 5 minute google talk back in 2007:
Sufficiently Advanced Monitoring is Indistinguishable from Testing
Layer Four: Supporting the product feedback loop
We should support all of the devices that 80% of our market uses. We will support Android 6 (Marshmallow) and above, and iOS 11 and above. There probably will be some obscure Android devices out there that don’t play nice with our app. Android is a beast like that.
Facilitating customer feedback
There will be an easy way for customers to provide feedback in app. I have some ideas on how to make that experience better, but there are privacy concerns to consider. We will also be monitoring our Google Play and Apple App Store reviews for bugs.
Someone should be monitoring all of this feedback, attempting to reproduce bugs if customers are facing them and raising them in the team’s backlog for prioritisation next sprint.
Soap Opera Testing
Maybe in the future, we could try some soap opera testing with the business? Soap opera testing is a condensed and over dramatised approach to testing. What are the wackiest scenarios our customers have actually tried? How does our system break? You can read more about this exercise here.
Why the layers?
Consider each of these layers like a net. It won’t catch everything, bugs in production will still happen. But when we have all of these feedback loops layered on top of each other, we get a pretty tight net, where hopefully no major issues get into production.
What about auditing or compliance?
Our source of truth is the code, Jira and Confluence. When we have all of it integrated, we can prove we tested a feature thoroughly without too much extra overhead. An auditor’s mindset and a tester’s mindset are very similar. Testers are concerned with product risks, auditors are concerned with business risk.
Their main question is, “did you do what you say you do? Did you follow your process?” and, “Is the existing process adequate?”.
Where are your test cases?
Michael Bolton has a 7 part series on breaking the test case addiction. You can read part one here. You don’t need test cases to prove you did adequate testing. They create unnecessary overhead that detracts from adding business value.
What else is missing?
Security testing is not included in this test strategy. Neither is performance testing. Getting these included can be challenging. I’m open to your suggestions on how I can incorporate this type of feedback in a timely manner.
Wow, I’m two thirds of the way through my #100DaysOfLinkedIn marketing campaign. Here is an update of how I’ve adapted and grown over that time. You can read up on the launch of the campaign and a halfway through update too.
Before starting this campaign, I had 1400 connections. I’ve now seen this grow to over 2200 at the time of writing this blog. I’ve written to every single one of those 800 new connections. EVERY. SINGLE. ONE. I’m now up to my third version of a template message. Here it is:
How are you? Thanks for connecting. What are some of the challenges facing you these days?
It’s short and sweet. Most people don’t respond but I’ve been able to organise a few key meetings with this approach, organise a few testers Meetup events and score a job with a startup.
Reconnecting with Sydney testers
I have around 200-500 QA/Testing professionals based in Sydney in my network. I’ve been reconnecting with them to see what events they are interested in. Here is my template message reaching out to these people:
Have you been to a Sydney Testers event recently? We’ve got a few events lined up that might interest you:
I don’t generally automate these messages because that is against LinkedIn’s terms and conditions. I actually have a Google doc full of these template messages that I copy and paste into LinkedIn and add the person’s name at the start.
Can I get to 5000 connections? Maybe not by the time I finish this campaign but I will continue to use the techniques learned from this campaign after I finish. If I get to over 5000 connections I will become the most connected software tester that I know.
What is your goal for starting a test automation initiative and what is it that you want to accomplish?
Who do you envision participating in your test automation initiative and in what capacity?
How do you plan for the execution of this strategy?
Chapter 2 – Creating a Culture for Test Automation Success
how to get people on board
how to help them understand their place in the automation strategy
how to enable them to do what’s needed
Chapter 3 – Developing for Test Automatability
The test automation pyramid is a mental model for thinking about the level at which you should build your automated tests. It’s a little contentious because it doesn’t acknowledge exploratory testing, but it’s a model that is widely used. James Bach has a blog post on the round earth model that could also be used.
Chapter 4 – Tooling for Test Automation
We discuss how to choose the right tools for your initiative. Before choosing your tools, it’s important to consider who will be using them.
Chapter 5 – Future-proofing Your Test Automation Efforts
Without a clear strategy in mind, many teams make the mistake of automating their tests for their current situation. Perhaps you’re just starting out and you have a dozen or so tests that run locally. You don’t see the issues that your poorly written test code can surface.
Become familiar with design patterns that are especially beneficial for test automation projects such as:
Page Object Model
Here is a getting started guide to Page Object Model (POM) architecture with C# and Selenium:
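To show the shape of the pattern without needing a browser, here is a stripped down sketch in Python rather than C#. FakeDriver is a made up stand-in for a real Selenium driver, and the element ids are invented; the point is only that the test talks to the page object and never to the locators.

```python
class FakeDriver:
    """Stand-in for a browser driver so the sketch runs without Selenium."""

    def __init__(self) -> None:
        self.fields: dict[str, str] = {}
        self.clicked: list[str] = []

    def type_into(self, element_id: str, text: str) -> None:
        self.fields[element_id] = text

    def click(self, element_id: str) -> None:
        self.clicked.append(element_id)


class LoginPage:
    """Page object: the only place that knows the login screen's element ids."""

    USERNAME = "username"
    PASSWORD = "password"
    SUBMIT = "log-in"

    def __init__(self, driver: FakeDriver) -> None:
        self.driver = driver

    def log_in(self, username: str, password: str) -> None:
        self.driver.type_into(self.USERNAME, username)
        self.driver.type_into(self.PASSWORD, password)
        self.driver.click(self.SUBMIT)


# The test reads as user intent; locators stay hidden inside the page object.
driver = FakeDriver()
LoginPage(driver).log_in("sam", "correct horse")
print(driver.clicked)  # ['log-in']
```

If the log in button’s id changes, only LoginPage needs updating and every test that logs in keeps working.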
Chapter 6 – Scaling Your Test Automation
Writing automated tests that run perfectly against one environment is challenging in and of itself. But what about when you’re ready to scale your one suite of tests to run in multiple environments, browsers, or devices?
Chapter 7 – Measuring the Value of Your Test Automation
Many automation projects fail due to unrealistic expectations. To avoid this, it’s best to identify expectations early and communicate these expectations to the entire team. What’s your expected Return on Investment?
Bonus Material: Writing readable code
Hands on exercise
We finished the day with a live code example of getting started with Page Object Model architecture using C# and Selenium. I would share the video with you, but I was experiencing issues with OBS and was unable to record it :(.