About a year ago at ING, we started receiving reports from users who said they had been deregistered from the Mobile Banking app without warning. They were shown the Registration screen and had to go through the whole registration process again to regain access to their accounts.

When users register in our Mobile Banking app, we store some simple settings in the User Defaults, whereas private or confidential data is stored in the Keychain. Some of the flags that end up in User Defaults indicate whether the user is registered or not, and in what context. A combination of flags, from both User Defaults and Keychain, is used on launch to determine if the user lands on the Registration or the Login screen.
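A launch check of this kind can be sketched roughly as follows. This is only an illustration: the key names, the `SecretStore` protocol (a stand-in for a Keychain wrapper) and the exact way the flags are combined are assumptions, not the app's real code.

```swift
import Foundation

// Hypothetical key names; the real app's flags are not shown in this post.
enum DefaultsKey {
    static let isRegistered = "isRegistered"
    static let registrationContext = "registrationContext"
}

// Stand-in for a Keychain wrapper, so the sketch stays self-contained.
protocol SecretStore {
    func hasCredentials() -> Bool
}

enum LaunchScreen: Equatable {
    case registration
    case login
}

// Combines flags from User Defaults and the Keychain to pick the first screen.
func launchScreen(defaults: UserDefaults, secrets: SecretStore) -> LaunchScreen {
    let registered = defaults.bool(forKey: DefaultsKey.isRegistered)
    let hasContext = defaults.string(forKey: DefaultsKey.registrationContext) != nil
    // If any single flag is missing, the user falls back to Registration.
    return (registered && hasContext && secrets.hasCredentials()) ? .login : .registration
}
```

With a check like this, a lost User Defaults key is indistinguishable from a user who never registered, which is why missing keys show up as a sudden deregistration.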

We analysed the issue and quickly found that it was caused locally rather than triggered by the API or by some user action. Somewhere in the code, the app would decide that it needed to show the Registration screen instead of the Login screen.

One thing did grab our attention when looking at the incidents: only customers using a 32-bit iOS device were reporting this issue. iOS 11 is not available for 32-bit devices, so the issue only occurred on iOS 10 (from 10.2 up to 10.3.2). We were not sure whether this was related to the processor architecture or simply to 32-bit devices being slower in general.

Another interesting fact was that Apple had apparently fixed a User Defaults issue in iOS 11:

NSUserDefaults Data Loss Fix

Starting in iOS 9.3, and in subsequent releases of iOS and macOS, NSUserDefaults could fail to load data if more than roughly 250 separate apps (including separate reinstalls of the same app) had been launched since the last reboot. This has been corrected.

While it was great that Apple had addressed this issue, we were still stuck with the 32-bit devices, as these cannot upgrade to iOS 11.

Some time later we were able to examine the devices of several affected users. We inspected the sandbox directory where the documents and User Defaults are stored. Surprisingly, the User Defaults plist was incomplete: a big chunk of the keys had been removed, while other keys were unaffected. Nothing in our codebase removes those keys, so the loss had to come from User Defaults itself. We also found multiple empty plist files with prefixed names, which pointed in the same direction. However, there was still no explanation for why it was only occurring on 32-bit devices.

As it turns out, User Defaults did have its share of known issues. We didn't know whether any of them were related to ours, and the biggest obstacle was that we could not reproduce the problem.

We tried a few things which could potentially solve our issue, and split them over several releases:

  • Analyse the diffs between the last several releases and look for something related to User Defaults that had been changed recently.
  • Search for smells in our codebase that could cause an issue with respect to User Defaults, especially on 32-bit devices.
  • Add multiple tracking events (e.g. when User Defaults returns nil on launch).
  • Double-check potential misuse of the User Defaults API (e.g. calling synchronize redundantly).
  • Go thoroughly through the source code of external libraries that use User Defaults. We even temporarily disabled an external analytics library for 32-bit devices, attempting to rule that out as a possible cause.
  • Centralise the code that reads from or writes to User Defaults or the Keychain. Prevent reading from or writing to User Defaults when the device is locked, when the app is in the background, or when it is launched in the background by another process (e.g. a significant location change) while the device is locked. Additionally, we made correct use of protectedDataAvailable.
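The last two ideas (tracking and a central, guarded access point) could look something like the sketch below. It is not our production code: `GuardedDefaults`, the event names and the injected closures are hypothetical. The availability check is injected so the example runs without UIKit; in an app it would typically return `UIApplication.shared.isProtectedDataAvailable`.

```swift
import Foundation

// Thrown when storage must not be touched, so callers never confuse
// "could not read right now" with "the value is missing".
struct ProtectedDataUnavailable: Error {}

// Hypothetical single gateway for all User Defaults access.
final class GuardedDefaults {
    private let defaults: UserDefaults
    // Assumption: in an app this returns UIApplication.shared.isProtectedDataAvailable.
    private let isProtectedDataAvailable: () -> Bool
    // Hypothetical tracking hook (e.g. forwards to an analytics client).
    private let track: (String) -> Void

    init(defaults: UserDefaults,
         isProtectedDataAvailable: @escaping () -> Bool,
         track: @escaping (String) -> Void = { _ in }) {
        self.defaults = defaults
        self.isProtectedDataAvailable = isProtectedDataAvailable
        self.track = track
    }

    func string(forKey key: String) throws -> String? {
        try ensureAvailable(operation: "read", key: key)
        let value = defaults.string(forKey: key)
        // Record unexpected nils so lost keys become visible in monitoring.
        if value == nil { track("defaults_nil:\(key)") }
        return value
    }

    func set(_ value: String, forKey key: String) throws {
        try ensureAvailable(operation: "write", key: key)
        defaults.set(value, forKey: key)
    }

    // Refuses any access while protected data is unavailable and reports it.
    private func ensureAvailable(operation: String, key: String) throws {
        guard isProtectedDataAvailable() else {
            track("defaults_\(operation)_blocked:\(key)")
            throw ProtectedDataUnavailable()
        }
    }
}
```

Throwing instead of returning nil is a deliberate choice in this sketch: the caller can then postpone the launch decision rather than treat a locked device as an unregistered user.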

Unfortunately, we were still receiving incidents.

Last Resort: File Storage

Several months later, we still had no clue. Our last resort was to switch from User Defaults to a custom file storage solution where we would have more control over loading and saving the data from disk.
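To give an idea of what such a store involves, here is a minimal sketch. The `FileSettingsStore` name, the JSON format and the flat string dictionary are assumptions made for illustration; the actual solution is not shown in this post.

```swift
import Foundation

// Hypothetical file-backed replacement for a small set of settings.
final class FileSettingsStore {
    private let fileURL: URL

    init(fileURL: URL) {
        self.fileURL = fileURL
    }

    // Loads all settings. A missing file means "nothing saved yet";
    // any other failure is thrown so the caller can tell the two apart.
    func load() throws -> [String: String] {
        guard FileManager.default.fileExists(atPath: fileURL.path) else {
            return [:]
        }
        let data = try Data(contentsOf: fileURL)
        return try JSONDecoder().decode([String: String].self, from: data)
    }

    // Writes atomically: the data goes to a temporary file that then
    // replaces the old one, so an interrupted write cannot truncate it.
    func save(_ settings: [String: String]) throws {
        let data = try JSONEncoder().encode(settings)
        try data.write(to: fileURL, options: .atomic)
    }
}
```

The main gain over User Defaults is that read and write failures surface as errors at a point in the code we control, instead of silently producing missing keys.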

However, we serve millions of users on a daily basis. Doing this migration as a big bang change would have been risky, especially since we could not guarantee it would fix the issue. So we decided to migrate gradually.

We started the migration with customers using an iPhone 5 and monitored the events and any reported incidents. After a while, we saw that we were no longer receiving incidents from iPhone 5 users. In the following release, we migrated all 32-bit devices to the custom file storage solution, and in the release after that we also migrated the 64-bit devices. We haven’t seen the problem surface again.
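A cohort-based migration along these lines might be sketched as below. The function name, the key list and the use of hardware model identifiers to define the cohort are assumptions; the model string is passed in so the example stays self-contained.

```swift
import Foundation

// Copies the listed keys from User Defaults into a file, once, for devices
// in the rollout cohort. Returns true when the file is the source of truth
// after the call, and false when the device should keep using User Defaults.
func migrateIfNeeded(defaults: UserDefaults,
                     fileURL: URL,
                     keys: [String],
                     deviceModel: String,
                     cohort: Set<String>) throws -> Bool {
    // The file itself is the migration marker, so the result does not
    // depend on a flag stored in User Defaults.
    if FileManager.default.fileExists(atPath: fileURL.path) {
        return true
    }
    guard cohort.contains(deviceModel) else {
        return false
    }

    var snapshot: [String: String] = [:]
    for key in keys {
        if let value = defaults.string(forKey: key) {
            snapshot[key] = value
        }
    }
    // The User Defaults values are left in place, which keeps a rollback possible.
    let data = try JSONEncoder().encode(snapshot)
    try data.write(to: fileURL, options: .atomic)
    return true
}
```

Widening the rollout then only means extending `cohort` in the next release, for example from the iPhone 5 identifiers to all 32-bit models and finally to every device.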

Conclusion

It feels great to have finally solved this issue. On the other hand, we didn’t actually find the root cause. Instead, we were forced to work around it. To this day, we still don’t know what was causing these corrupt User Defaults files on 32-bit devices.

User Defaults is a solid API that was created 25 years ago. Be aware, though, that User Defaults data can be lost, so use it only for data that you can afford to lose. This is also what Apple recommends, as noted here:

Quinn “The Eskimo!” (Apple Developer Relations)

Before using user defaults to store some data you should ask yourself “How would the user react if this data was lost?”.

If the reaction is “sigh I have to go and change a preference”, go ahead and use user defaults. If the reaction involves pitchforks and flaming torches, do the work to manage this data yourself.