Why Decentralized Storage?
Today, the majority of online user data is stored on centralized server networks in the cloud. In contrast, decentralized storage leverages a distributed network, which reduces the reliance on any single server or data center, makes user data more resistant to hacks and outages, and gives users more control over their data.
- Verifiability: the integrity of all data is verified with cryptographic hashing, so you can trust that you always get the data you’re looking for.
- Resilience: data remains retrievable regardless of where it is stored, even if some nodes go offline.
- Efficient Retrieval: chunks of data can be retrieved from multiple sources simultaneously, improving download speeds.
How Decentralized Storage Works
Representing and Addressing Data
Data is addressed by its contents (content addressing), rather than by a location such as a URL or IP address (location addressing). Data is atomized into 256 KB chunks, each of which is assigned a unique identifier derived from its cryptographic hash, called a Content Identifier (CID).
Why content addressing?
- The CID of the data received can be computed and compared to the CID requested, to verify that the data is what was requested.
- Any difference in the content will produce a different CID.
- The same content added to two different IPFS nodes using the same settings will produce the same CID.
- CIDs are short, regardless of the size of their underlying content.
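The properties above can be illustrated with a minimal sketch. It assumes a simplified identifier: the helper `toy_cid` is hypothetical and returns only the hex SHA-256 digest of the content, whereas a real CID also encodes extra metadata (such as the hash function used), so these values will not match CIDs produced by an IPFS node.

```python
import hashlib


def toy_cid(data: bytes) -> str:
    """Hypothetical stand-in for a CID: the hex SHA-256 digest of the content."""
    return hashlib.sha256(data).hexdigest()


def verify(requested_cid: str, received: bytes) -> bool:
    """Recompute the identifier of the received data and compare it to the request."""
    return toy_cid(received) == requested_cid


original = b"hello decentralized storage"
cid = toy_cid(original)

# The same content always produces the same identifier.
assert toy_cid(b"hello decentralized storage") == cid

# Any difference in the content produces a different identifier.
assert toy_cid(b"hello decentralized storagE") != cid

# A receiver can check that what arrived is what was requested.
assert verify(cid, original)
assert not verify(cid, b"tampered data")

# The identifier has the same length whatever the size of the content.
assert len(toy_cid(b"x" * 1_000_000)) == len(cid)
```

Because verification needs only the data and the identifier, the receiver does not have to trust the node that served the data.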
Why atomize data?
Atomizing data provides us with the following advantages:
- Deduplication: we do not need to duplicate and store identical chunks in the network, thus optimizing storage requirements.
- Piecewise Transfer: we can retrieve data block by block, identifying any errors before fetching the whole content object.
- Seeking: we only need to fetch the exact chunks we need, thus optimizing bandwidth requirements.
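As a rough illustration of deduplication and seeking, the sketch below splits data into fixed-size chunks and stores each chunk under its hash. The names `chunk` and `BlockStore` are hypothetical, the store is an in-memory dictionary rather than a network of nodes, and the keys are plain SHA-256 digests rather than real CIDs.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # 256 KB, the chunk size stated in the text


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockStore:
    """Hypothetical in-memory store that keys each chunk by its hash."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, block: bytes) -> str:
        key = hashlib.sha256(block).hexdigest()
        # An identical chunk maps to the same key, so it is stored only once.
        self.blocks[key] = block
        return key

    def add_file(self, data: bytes) -> list[str]:
        """Store every chunk of a file and return the ordered list of keys."""
        return [self.put(c) for c in chunk(data)]


store = BlockStore()
file_a = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
file_b = b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE

keys_a = store.add_file(file_a)
keys_b = store.add_file(file_b)

# Deduplication: four chunks were added, but the shared "A" chunk is stored once.
assert len(store.blocks) == 3
assert keys_a[0] == keys_b[0]

# Seeking: fetch only the second chunk of file_a without touching the first.
second_chunk = store.blocks[keys_a[1]]
assert second_chunk == b"B" * CHUNK_SIZE
```

The ordered key list returned by `add_file` is what lets a reader jump straight to the chunk it needs.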
Persisting Data
Data chunks are saved on IPFS nodes in the network and pinned. Pinning ensures that data persists on the network and remains available for retrieval: pinned data is exempt from routine garbage collection.
Retrieving Data
To retrieve data, we use the relevant CID to fetch the chunks and construct the Merkle DAG (the hash-linked graph that ties the chunks together).
- Content routing: we first need to identify which network nodes can provide the CIDs we need. A node cannot locate data in the network with a CID alone; it also needs the IP addresses and ports of the peers that hold the data. It finds these providers either by querying the Kademlia Distributed Hash Table (DHT) or by asking already-connected peers via Bitswap.
- Block fetching: the IPFS node fetches the chunks of the Merkle DAG from those peers.
- Verification: the IPFS node hashes each chunk it receives and checks that the result matches the CID it requested.
- Local access: once all chunks are obtained, the Merkle DAG can be constructed, making the data accessible.
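The four steps above can be sketched as a toy, in-memory model. Everything here is an assumption made for illustration: peers are plain dictionaries, the `providers` table stands in for a DHT lookup, no networking takes place, and the root block is a JSON list of child identifiers, which is a simplified two-level DAG and not the format IPFS actually uses. The helpers `toy_cid`, `publish`, `fetch_verified`, and `retrieve` are hypothetical names.

```python
import hashlib
import json


def toy_cid(block: bytes) -> str:
    """Hypothetical stand-in for a CID: the hex SHA-256 digest of a block."""
    return hashlib.sha256(block).hexdigest()


def publish(data: bytes, chunk_size: int, peer: dict) -> str:
    """Store the chunks and a root block listing them on a peer; return the root id."""
    links = []
    for i in range(0, len(data), chunk_size):
        block = data[i:i + chunk_size]
        cid = toy_cid(block)
        peer[cid] = block
        links.append(cid)
    root = json.dumps(links).encode()
    root_cid = toy_cid(root)
    peer[root_cid] = root
    return root_cid


def fetch_verified(cid: str, providers: dict) -> bytes:
    """Content routing, block fetching, and verification for a single block."""
    for peer in providers.get(cid, []):    # content routing: who claims to have it?
        block = peer.get(cid)              # block fetching
        if block is not None and toy_cid(block) == cid:  # verification
            return block
    raise LookupError(f"no provider returned valid data for {cid}")


def retrieve(root_cid: str, providers: dict) -> bytes:
    """Fetch the root block, follow its links, and reassemble the content."""
    links = json.loads(fetch_verified(root_cid, providers))
    return b"".join(fetch_verified(cid, providers) for cid in links)


honest_peer: dict = {}
root_cid = publish(b"abcdefghij", 4, honest_peer)

# A second peer advertises the same identifiers but serves corrupted blocks.
corrupt_peer = {cid: b"garbage" for cid in honest_peer}
providers = {cid: [corrupt_peer, honest_peer] for cid in honest_peer}

# The corrupted blocks fail verification, so the data comes from the honest peer.
assert retrieve(root_cid, providers) == b"abcdefghij"
```

Listing the corrupted peer first shows why verification matters: a block that does not hash to the requested identifier is discarded and the next provider is tried.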