Offline / Online resilience

Every once in a while I’m surprised to notice that some people haven’t yet realized how important it is to develop a system with “unplugged” functionality in mind.
 
Albeit less often than business executives, even some close friends of mine, whom I consider super smart software engineers, sometimes seem to underestimate the relevance of this topic.
 
I think this happens because there is no yes-or-no answer to this question, but rather an ‘it depends‘ sort of answer. So I thought, why not write it down and see what others have to say.
 
What I really mean by “offline capable” is any sort of system (or subsystem) that is capable of continuing to provide value even when its connection to the main system or network is broken.
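
To make that more concrete, here is a minimal sketch, in TypeScript, of one way such a subsystem can keep providing value while disconnected: local writes go into an outbox and are synced later. The names here (OutboxQueue, Operation, the send callback) are illustrative assumptions on my part, not any particular library’s API.

```typescript
// Minimal sketch of an "offline capable" write path: the app keeps
// accepting operations locally and flushes them once connectivity returns.
// All names are illustrative, not a real framework API.

type Operation = { id: string; payload: unknown };

class OutboxQueue {
  private pending: Operation[] = [];

  // Always succeeds locally, regardless of connectivity.
  enqueue(op: Operation): void {
    this.pending.push(op);
  }

  // Called whenever we detect that we are back online.
  async flush(send: (op: Operation) => Promise<void>): Promise<void> {
    while (this.pending.length > 0) {
      const op = this.pending[0];
      try {
        await send(op);       // push the operation to the main system
        this.pending.shift(); // drop it only after the server confirmed
      } catch {
        break;                // still offline (or server error): retry later
      }
    }
  }
}
```

The contract matters more than the data structure: local writes never fail just because the network is down, and synchronization becomes a separate, retryable concern.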
 
 
The most common argument I hear is:

Soon infrastructure will be at a point where everything is connected and online all the time, so why should we bother with data locality, sync mechanisms, and ‘offline’ operation modes, when we can just wait until everyone and everything is connected?

 
Patterns repeat themselves at different scales, and this also applies to data management.
 
Nature itself is full of repeating patterns; that’s how we came up with math in the first place.
 
An analogy I came up with to counter the ‘always-connected’ argument above is the following.
 
Let’s take a look at how a computer processing unit is organized:

If you take a look at the picture, you’ll see that a large area of the unit is dedicated to memory (cache).
 
Why do we have those few megabytes of fast and expensive memory there, if all the data is in RAM anyway? Why do we even have RAM, if we can always find the data on permanent storage? Why not just wait until external storage gets as fast as a processor’s L3 cache?
 
As you probably figured, it’s all about speed and data locality. It is not an infrastructure problem; it’s a matter of being efficient.
 
The closer the data is to where it will be used, the faster and more efficient your process or system or app or whatever will be.
 
This applies not only to computing but to any process in general (a construction worker can only work as fast as his materials are available for use, unless he is lazy).
It turns out that the same thing that applies to computer processors also applies to how data is distributed across the Wide Area Network, and to how your project is designed to manage data.
The problem is that it is not so easy to properly manage data locality.
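
To make the parallel concrete, here is a rough sketch of the same hierarchy applied at the application level: check the closest, fastest tier first and only fall back to slower, more distant ones, just like L1 cache, RAM, and disk. The tiers and function names (fetchFromLocalStore, fetchFromOrigin) are hypothetical stand-ins for whatever local store and remote API your system actually uses.

```typescript
// Sketch of an application-level read path that mirrors the memory hierarchy:
// in-memory map (fast, tiny) -> local store (slower, larger) -> remote origin.
// fetchFromLocalStore and fetchFromOrigin are hypothetical placeholders.

const memoryCache = new Map<string, string>();

async function fetchFromLocalStore(key: string): Promise<string | null> {
  // e.g. IndexedDB, SQLite, or a file on disk (stubbed out here)
  return null;
}

async function fetchFromOrigin(key: string): Promise<string> {
  // e.g. an HTTP call to the main system (stubbed out here)
  return `value-for-${key}`;
}

async function read(key: string): Promise<string> {
  const hot = memoryCache.get(key);             // nearest tier: the "L1" of the app
  if (hot !== undefined) return hot;

  const local = await fetchFromLocalStore(key); // local store: still works offline
  if (local !== null) {
    memoryCache.set(key, local);
    return local;
  }

  const remote = await fetchFromOrigin(key);    // origin: needs the network
  memoryCache.set(key, remote);
  return remote;
}
```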
 
Luckily for us though, distributed computing problems have been studied for many years and applied in real life in data centers and distributed database systems.
 
The least we can do is grasp the proven concepts used in such systems and apply them to our own solutions during development.
However, not all data is created equal, and only by analyzing those patterns, algorithms, and data structures can we derive a more optimal data flow.
 
In the beginning I mentioned “it depends”, because we really need to examine the nature of the data to balance the tradeoffs of what should be kept where, and for how long. But it is never simply a matter of infrastructure. It is a matter of what to cache, and for how long.
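
To close with one last sketch of that tradeoff: a tiny per-entry TTL cache, where data whose nature tolerates staleness gets a long lifetime and volatile data expires quickly. The class and the lifetimes below are purely illustrative assumptions, not a recommendation for any specific system.

```typescript
// Tiny per-entry TTL cache: "what to cache, and for how long".
// Different kinds of data get different lifetimes depending on their nature.

interface Entry<V> {
  value: V;
  expiresAt: number; // epoch milliseconds
}

class TtlCache<V> {
  private entries = new Map<string, Entry<V>>();

  set(key: string, value: V, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: Date.now() + ttlMs });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key); // stale: drop it and force a refetch
      return undefined;
    }
    return entry.value;
  }
}

// Illustrative lifetimes: a product catalog can be hours old, a stock quote should not.
const cache = new TtlCache<string>();
cache.set("catalog", "rarely changes", 6 * 60 * 60 * 1000); // 6 hours
cache.set("stock-quote", "changes constantly", 5 * 1000);   // 5 seconds
```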