Opportunities for Blockchain in Analytics and BI: What You Should Know as a Business Analyst
Blockchain (or block chain) is a technology that has recently received a great deal of news coverage owing to its association with cryptocurrencies. While the run on Bitcoin and the like may or may not be a fad, the fact remains that blockchain as a technology has much wider applications and the potential to become one of the greatest disruptors since the advent of the Internet.
Blockchain can fundamentally alter electronic communication, with the potential to affect all sorts of transaction processing systems. Its impact beyond the core transactional systems is still unclear, but in this blog post, I attempt to do some crystal-ball gazing and put forward some ideas.
Let’s first cut through the hype and look at a fairly straightforward analogy in order to understand the technology.
Say we are trying to track asset allocation and have a database table that tracks asset ownership using a field structure like this:

| Asset Number | Asset Owner | Created By | Created On | Updated By | Updated On |
| --- | --- | --- | --- | --- | --- |
| 123 | JSMITH | ADMIN | 10/10/2017 | FC_USER | 12/12/2017 |
This record tells us that the asset identified by the number ‘123’ is owned by user JSMITH, since at least 12/12/2017. If we were to enhance this into a table that helps keep track of the full ownership history, then it could look like this:

| Asset Number | Asset Owner | Created By | Created On |
| --- | --- | --- | --- |
| 123 | WDALE | ADMIN | 10/10/2017 |
| 123 | JSMITH | FC_USER | 12/12/2017 |
Assuming that there are only two records for asset number “123,” these two are essentially giving us the full history of the ownership of this asset.
Note that the absence of the “Update” tracking field indicates that each record is immutable, meaning it is never changed. But how do we ensure that? One way is to add a hash:

| Asset Number | Asset Owner | Created By | Created On | Hash key |
| --- | --- | --- | --- | --- |
| 123 | WDALE | ADMIN | 10/10/2017 | ASDFGH |
| 123 | JSMITH | FC_USER | 12/12/2017 | 1LKJHG |
The hash key is merely a fixed-length representation of data of any arbitrary length. So, after the first record is created, you would take the first four fields, concatenate them into one string, then pass the resultant string through a hash function (e.g. the HASH function in the DBMS_CRYPTO package in Oracle) to get a hash key. This hash key represents your string from the first four fields.
If someone edits a field in that record, then the record will no longer match the hash key. Thus, the hash key provides a level of confidence as to the quality and integrity of the record.
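The idea can be tried out in a few lines of Python. This is only a minimal sketch: it assumes SHA-256 as the hash function (the example above mentions Oracle's DBMS_CRYPTO instead), the `record_hash` helper and the `|` separator are illustrative choices, and a real digest is 64 hexadecimal characters rather than the short placeholder keys shown in the table.

```python
import hashlib

def record_hash(asset_number, asset_owner, created_by, created_on):
    """Concatenate the four fields and return a SHA-256 hex digest."""
    # The separator is an assumption: it keeps field boundaries unambiguous,
    # so ("12", "3X") and ("123", "X") do not produce the same string.
    payload = "|".join([asset_number, asset_owner, created_by, created_on])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Hash key stored alongside the first record when it is created.
stored = record_hash("123", "WDALE", "ADMIN", "10/10/2017")

# An untouched record reproduces the stored hash key...
untouched_ok = record_hash("123", "WDALE", "ADMIN", "10/10/2017") == stored
# ...while a record with an edited owner field does not.
tampered_ok = record_hash("123", "MALLORY", "ADMIN", "10/10/2017") == stored

print(untouched_ok, tampered_ok)  # True False
```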
But isn’t it easy enough for said someone to also edit the hash key to match the record? Yes, it is. The next step in securing our table would be to “chain” the records together. We would do this by storing the hash key of a record in the next record:

| Asset Number | Asset Owner | Created By | Created On | Previous Hash Key | Hash key |
| --- | --- | --- | --- | --- | --- |
| 123 | WDALE | ADMIN | 10/10/2017 | | POIUY |
| 123 | JSMITH | FC_USER | 12/12/2017 | POIUY | QWERTY |
The first record has no previous hash key (for obvious reasons). In blockchain parlance, this would be the “genesis block.”
In the second record, the previous hash key comes from the first record, and the hash key for the second record is built using the previous hash key field as well.
Now consider the ramifications: if I were to (maliciously) edit the first record, I would have to update the hash key of the first record. I would then have to update the previous hash key and hash key fields of the second record, and continue doing this for each and every subsequent record in the table (not just the records for this asset, but the records for all assets in the table). This makes it significantly more difficult to make a fraudulent change. But is it impossible? No. That’s where the next step comes in: distribution.
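To make the chaining concrete, here is a small Python sketch. It again assumes SHA-256, and the `block_hash`, `build_chain`, and `first_invalid_block` helpers are hypothetical names invented for this illustration. It builds the two-record chain from the table, then shows that editing the first record and "fixing" only its own hash key is caught at the second record.

```python
import hashlib

def block_hash(fields, previous_hash):
    """Hash a record's fields together with the previous record's hash key."""
    payload = "|".join(list(fields) + [previous_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_chain(rows):
    """Turn a list of records into a chain of linked blocks."""
    chain, previous_hash = [], ""  # the genesis block has no previous hash key
    for fields in rows:
        current = block_hash(fields, previous_hash)
        chain.append({"fields": list(fields),
                      "previous_hash": previous_hash,
                      "hash": current})
        previous_hash = current
    return chain

def first_invalid_block(chain):
    """Return the index of the first block that fails verification, or None."""
    previous_hash = ""
    for i, block in enumerate(chain):
        if block["previous_hash"] != previous_hash:
            return i  # link to the preceding block is broken
        if block_hash(block["fields"], block["previous_hash"]) != block["hash"]:
            return i  # block contents no longer match its own hash key
        previous_hash = block["hash"]
    return None

chain = build_chain([
    ("123", "WDALE", "ADMIN", "10/10/2017"),
    ("123", "JSMITH", "FC_USER", "12/12/2017"),
])
print(first_invalid_block(chain))  # None

# Maliciously edit the first record and recompute only its own hash key.
chain[0]["fields"][1] = "MALLORY"
chain[0]["hash"] = block_hash(chain[0]["fields"], chain[0]["previous_hash"])
print(first_invalid_block(chain))  # 1
```

The forger would have to keep going and rewrite every later block as well, which is exactly the extra work described above.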
Suppose that every party interested in this asset has a copy of this table. If you now needed to make an entry in this table, you would send a request to every party, get an “approval” from all of them, and then make the entry in all the tables at once (or make the entry and send it to all the others to “approve”).
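The approval round described above could be sketched roughly as follows. This is a toy illustration only: the party names, the `propose_entry` function, and the approval rule are all made up for the example, and real blockchain platforms rely on far more elaborate mechanisms to keep their nodes in sync.

```python
def propose_entry(ledgers, entry, approves):
    """Append `entry` to every party's copy, but only if all parties approve."""
    if all(approves(party, entry) for party in ledgers):
        for copy in ledgers.values():
            copy.append(dict(entry))  # each party stores its own copy
        return True
    return False

# Three hypothetical parties, each holding its own copy of the table.
ledgers = {"bank_a": [], "bank_b": [], "bank_c": []}

def approves(party, entry):
    # Hypothetical rule: every party rejects an entry that has no owner.
    return bool(entry["asset_owner"])

ok = propose_entry(ledgers, {"asset_number": "123", "asset_owner": "WDALE"}, approves)
rejected = propose_entry(ledgers, {"asset_number": "123", "asset_owner": ""}, approves)

print(ok, rejected, [len(copy) for copy in ledgers.values()])  # True False [1, 1, 1]
```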
That is basically what a blockchain is. Each record in our table is a block. It can contain any number of fields (usually including an ID field), a field containing the hash key of the previous block or record, and a hash key of all the fields in itself – including the hash key of the previous block or record.
Each block is immutable, meaning that once it is created it cannot be edited. The blockchain itself is distributed over multiple nodes, each of which holds the entire database.
Of course, we also need a mechanism to enable the distribution and for the distributed nodes to stay in sync. And there are several mechanisms to do that.
Securing the Data
So, we’ve addressed the integrity of the data, but what about security (from prying eyes, perhaps)? This is especially important considering the distributed nature of the “database.” Well, you can encrypt each block. Moreover, each party that adds a block to the chain can encrypt it with its own key. So, a hypothetical malicious hacker who wanted to edit a block would need not just to edit each and every block in the chain, but also to obtain each and every key that was used to encrypt the blocks of the chain.
Public Versus Private Blockchains
Cryptocurrencies drive the popular imagination when it comes to use cases for blockchain. They use what is called a “public” blockchain, whereas enterprises would typically use a “private” blockchain. The differences between the two are fundamental. Some of the biggest issues with Bitcoin are transaction speed and processing power. Both of these are non-issues on a private blockchain. Bitcoin’s transaction speed problem does not even stem from its public nature; it is simply a legacy of its humble beginnings (other, more recent currencies have addressed the speed issue, for instance). The processing power requirements, on the other hand, are driven by the need for proof of work, which is unique to a public blockchain.
All of this is simply to say that private blockchains do not have the drawbacks of long transaction times or heavy processing needs.
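To get a feel for why proof of work consumes processing power, here is a deliberately simplified Python sketch. The `mine` function and its `difficulty` parameter are illustrative assumptions. Bitcoin itself uses double SHA-256 and compares the result against a numeric target rather than counting leading zero characters, but the principle of repeated trial hashing is the same.

```python
import hashlib

def mine(block_data, difficulty):
    """Try nonces until the SHA-256 hex digest starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}|{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1  # no shortcut exists: the only strategy is to keep guessing

# Each extra zero multiplies the expected number of attempts by 16.
nonce, digest = mine("123|JSMITH|FC_USER|12/12/2017", difficulty=3)
print(nonce, digest)
```

Finding the nonce takes many attempts, while checking it takes a single hash. A private blockchain, where the participants are known, can skip this search entirely.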
Potential Enterprise Applications
Consider a set of banks that need to communicate asset transfers between each other electronically. They would set up a private blockchain to serve as a common distributed database. If a bank needs to make a transaction, it updates the database, and the changes are synchronized immediately across the entire distribution. This is the “distributed ledger” technology that enables public cryptocurrencies as well as asset transfer mechanisms in private markets.
This blog from Oracle outlines four ideas including autonomous marketplaces; trade assets like invoices; secure business transactions; and enhancing product quality and recall effectiveness. Such applications would be enabled on distributed ledger platforms like this one from Oracle.
Blockchain security can essentially be applied to any sort of transaction processing system (e.g., order processing, general ledger, inventory tracking), and software vendors are moving rapidly to do this.
Aspects of Blockchain Relevant to BI and Analytics from an Analyst’s Perspective
Since blockchain primarily affects the facts in a star schema in transactional systems, we can’t see many ways in which it could fundamentally change business intelligence (BI) and analytics applications. However, technology changes have a way of making their way from the transaction processing layer to the analytical systems layer, as big trends like client-server, cloud, and NoSQL have shown. So, what are some aspects of blockchain that could make a difference to an analyst?
Blockchain does not make fraud impossible, but it makes it hard by making detection easier. Ease is relative though, and the speed of fraud detection becomes a crucial factor in defining success for platforms as well as applications. Thus, using analysis to detect fraud is likely to become the first major application for BI and analytics in blockchain.
Processing hashes might need to become more common, especially in the ETL (Extract, Transform, Load) process. Cryptographic hashing has the same problem as encryption in general: the longer it takes to compute, the longer it takes to break via brute force, and thus the more secure it is. But security comes at the cost of processing power.
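The trade-off between hashing cost and security can be seen with a quick experiment. The sketch below uses PBKDF2 from Python's standard library, a key-stretching function that simply repeats the hashing step many times. It is a stand-in chosen for illustration, not necessarily what any blockchain platform or ETL tool uses, and the `stretched_hash` helper and the salt value are hypothetical.

```python
import hashlib
import time

def stretched_hash(value, salt, iterations):
    """Hash `value` with PBKDF2-HMAC-SHA256 using the given iteration count."""
    return hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), salt, iterations).hex()

salt = b"demo-salt"  # illustrative only; real salts should be random
for iterations in (1_000, 100_000):
    start = time.perf_counter()
    digest = stretched_hash("123|WDALE|ADMIN|10/10/2017", salt, iterations)
    elapsed = time.perf_counter() - start
    # More iterations cost the legitimate user more time per hash,
    # and cost a brute-force attacker that same time on every guess.
    print(f"{iterations:>7} iterations: {elapsed:.4f}s  {digest[:12]}...")
```

An ETL job that has to verify or compute millions of such hashes per load would feel this cost directly.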
A block in a blockchain is immutable (it never changes). It’s not uncommon to deal with immutable data today, but in a scenario where it is taken for granted, we might see new techniques come into use.
A general understanding of the data and the nature of the application it’s coming from is crucial for analysts to architect adequate systems. Therefore, it is necessary for analysts to stay in touch with the developments in this area.