An Introduction to Tokenization

What is Tokenization?

Tokenization is a form of fine-grained data protection that substitutes a sensitive data element with a non-sensitive equivalent that has no extrinsic or exploitable meaning or value. This non-sensitive replacement is called a token.

Put another way, tokenization replaces sensitive data with an irreversible, non-sensitive placeholder (a token) and securely stores the original, sensitive data outside of its original environment.

Tokenization is popular amongst banks and other financial institutions as it allows these entities to keep sensitive data, like credit card numbers and bank account numbers, safe while still being able to store and use the information.

Tokens can be created in a variety of ways (two of which are sketched after this list), such as by using:

  • A mathematically reversible cryptographic function with a key.
  • A nonreversible function such as a hash function.
  • An index function or randomly generated number.
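
For illustration, here is a minimal Python sketch of the last two approaches. The function names and key are made up for this example, and a production system would rely on vetted cryptographic tooling rather than ad-hoc code.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = b"example-hmac-key"  # illustrative only

def hash_token(value: str) -> str:
    # Non-reversible approach: an HMAC of the value cannot be inverted to recover the input.
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()

def random_token(length: int = 16) -> str:
    # Randomly generated token: no relationship at all to the original value,
    # so a separate mapping (a vault) is needed to ever look the original back up.
    return "".join(secrets.choice("0123456789") for _ in range(length))
```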

There are two types of tokenization solutions: ones with token vaults and ones without. Vaulted tokenization utilizes a secure, centralized database, or tokenization “vault,” to store a mapping between the tokenized sensitive data and the corresponding token. 
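
As a rough illustration (not a production design), a vault-backed flow can be sketched in a few lines of Python, with an in-memory dictionary standing in for the hardened, access-controlled vault database:

```python
import secrets

_vault: dict[str, str] = {}  # toy stand-in for the tokenization vault

def tokenize(sensitive_value: str) -> str:
    token = "tok_" + secrets.token_hex(8)   # random token, unrelated to the input
    _vault[token] = sensitive_value         # vault stores the token -> value mapping
    return token

def detokenize(token: str) -> str:
    # De-tokenization is a vault lookup; without access to the vault, the token is worthless.
    return _vault[token]

card = "4111111111111111"
t = tokenize(card)
assert detokenize(t) == card
```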

On the other hand, vaultless tokenization generates the token solely via an algorithm, so when de-tokenization is required, the original value can be recovered from the token itself without a vault lookup. In recent years, vaultless tokenization has become the preferred approach, as it offers a number of significant benefits (a brief sketch follows the list below), such as:

  • Reduced latency.
  • Reduced compliance scope and cost (PCI DSS, GDPR, etc.).
  • Significantly enhanced security over token vaults.
  • Vastly smaller storage footprint of sensitive data.
  • Reduced costs and resources for ongoing compliance, as there is no token database to maintain.
  • Requires little to no architectural changes and works well with legacy systems.
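
To make the vaultless idea concrete, the toy Python sketch below derives a reversible digit-for-digit token purely from a secret key, so no mapping is ever stored. This is a simplified, insecure illustration of the concept; real vaultless products rely on vetted format-preserving encryption (for example, NIST FF1/FF3-1).

```python
import hashlib

KEY = b"demo-key"  # illustrative only

def _keystream(length: int) -> list[int]:
    # Derive a digit keystream from the secret key (toy construction, not secure).
    digest = hashlib.sha256(KEY).hexdigest()
    return [int(c, 16) % 10 for c in digest[:length]]

def tokenize(digits: str) -> str:
    # The token is computed purely from the algorithm and the key; nothing is stored.
    ks = _keystream(len(digits))
    return "".join(str((int(d) + k) % 10) for d, k in zip(digits, ks))

def detokenize(token: str) -> str:
    # Running the algorithm in reverse recovers the original value, no vault lookup needed.
    ks = _keystream(len(token))
    return "".join(str((int(d) - k) % 10) for d, k in zip(token, ks))

assert detokenize(tokenize("4111111111111111")) == "4111111111111111"
```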

Tokens can also be broken down by how they are used (both types are sketched after this list):

  • Single-use tokens: represent a single transaction and are processed much faster than multi-use tokens. However, because every transaction generates a new token, they require significant storage capacity.
  • Multi-use tokens: a multi-use token always represents the same data element (e.g. a credit card number) and may be used for multiple transactions.
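
The difference can be sketched in Python as follows; the helper names, key, and in-memory vault are illustrative only:

```python
import hashlib
import hmac
import secrets

KEY = b"demo-key"             # illustrative only
_vault: dict[str, str] = {}   # toy stand-in for a vault

def single_use_token(card_number: str) -> str:
    # A fresh random token for every transaction: fast to issue, but each call adds
    # another vault entry, so storage grows with transaction volume.
    token = secrets.token_hex(8)
    _vault[token] = card_number
    return token

def multi_use_token(card_number: str) -> str:
    # Deterministic: the same card number always maps to the same token, so the token
    # can be reused across transactions (e.g. recurring billing or fraud analytics).
    return hmac.new(KEY, card_number.encode(), hashlib.sha256).hexdigest()[:16]

assert single_use_token("4111111111111111") != single_use_token("4111111111111111")
assert multi_use_token("4111111111111111") == multi_use_token("4111111111111111")
```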

Tokenization vs. Encryption  

Technically speaking, tokenization can be considered a form of encryption. However, while tokenization replaces a piece of information with a random and unrelated set of characters called a token, traditional encryption methods use an algorithm to encode sensitive data. While it is not possible to extract sensitive data from a token, encrypted data can be decrypted and read using what is known as an encryption key.
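
The contrast can be shown with a short Python sketch, assuming the third-party cryptography package is installed; the vault entry is a made-up example:

```python
from cryptography.fernet import Fernet  # third-party 'cryptography' package

# Encryption: reversible by anyone who holds the key.
key = Fernet.generate_key()
cipher = Fernet(key)
ciphertext = cipher.encrypt(b"4111111111111111")
assert cipher.decrypt(ciphertext) == b"4111111111111111"

# Tokenization: the token bears no mathematical relationship to the original value;
# recovering it means looking it up wherever the mapping is kept, not applying a key.
vault = {"tok_1f93d2a1": "4111111111111111"}
assert vault["tok_1f93d2a1"] == "4111111111111111"
```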

Where and when to apply tokenization vs. encryption can be incredibly nuanced - many organizations actually use a combination of both. While encryption is easier to scale and synchronize, tokens tend to require significantly fewer computational resources to process.

In addition, because encryption is reversible, encrypted data is still considered “sensitive data” by some regulatory bodies. Tokens, on the other hand, have no mathematical relationship to the real data they represent, so even if they are breached they cannot be reversed back to the real data values. Therefore, tokens are not considered sensitive data and are not subject to the same regulatory standards as encrypted data.