*How are balances distributed across addresses for different cryptocurrencies? Well-known statistical metrics (Gini, Theil, Nakamoto) can be used to quantify and compare cryptocurrencies. They also make it possible to track trends over time and to compare crypto with the real economy.*

**Statistics of Inequality**

Balances on a blockchain are distributed unequally across addresses: some holders are rich, some are poor. The same inequality appears naturally in every financial system.

Inequality in wealth can be quantified «at large», across the wealth or income of a whole country’s population. Mathematicians have proposed several ways to express inequality as a single number, taking the raw balance distribution as input.

The rationale is that a single number (an index or coefficient) lets us compare one country with another in terms of how evenly wealth is distributed across the population. Tracking that number over time also reveals trends and events.

The same approach can be applied to cryptocurrency: in most cases the balance of each address can be calculated from the transaction history, which is publicly available.

**Why Is It Important?**

Knowing and comparing wealth distribution indexes for cryptocurrencies is important for several reasons:

- Indexes let us test our assumptions about the «decentralization» of cryptocurrencies and compare them with reference numbers from the real, fiat-money economy.
- Trends show how the distribution changes over time and sometimes reveal important events, such as a large redistribution of funds.
- Statistical measures of well-known cryptocurrencies can also serve as the experimental basis for developing new protocols, especially concerning proof-of-stake consensus algorithms.
- Too much inequality may weaken the security of a cryptocurrency in certain cases, and may indicate weak usage patterns and adoption.

**Gini, Theil, Nakamoto…**

There are several indexes to quantify inequality:

### Gini

The Gini coefficient is based on the cumulative distribution of balances, ordered from the lowest to the highest.

Ordering balances along the horizontal axis and plotting the cumulative sum of these balances on the vertical axis gives us a Lorenz curve. The lower balances are on the left side of the curve, and it rises toward the right end, where the top balances are located.

100% on the vertical axis is the total supply of the coin. 100% on the horizontal axis is the total number of coin holders.

The Gini coefficient is the area between the straight diagonal line (which corresponds to an ideally equal distribution) and the actual Lorenz curve, divided by the whole area under that line. It is 0 when all balances are equal and approaches 1 when a single address holds everything.

The Lorenz curve is a useful visualization that already supports some judgements on its own, and Gini is a single value describing it.
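The construction above can be sketched in a few lines of Python. This is a minimal illustration, not the SQL used for the study: the function names `lorenz_curve` and `gini` are our own, and the area under the curve is approximated with the trapezoid rule, with both axes normalized to 1.

```python
def lorenz_curve(balances):
    """Return the Lorenz curve as (share of holders, share of supply) points."""
    ordered = sorted(balances)          # lowest balances first
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, balance in enumerate(ordered, start=1):
        running += balance
        points.append((i / n, running / total))
    return points


def gini(balances):
    """Gini = (area between diagonal and curve) / (area under diagonal)."""
    points = lorenz_curve(balances)
    area_under_curve = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area_under_curve += (x1 - x0) * (y0 + y1) / 2   # trapezoid rule
    # The area under the diagonal is 0.5, so the ratio simplifies to 1 - 2 * area.
    return 1 - 2 * area_under_curve


print(gini([1, 1, 1, 1]))      # equal balances give 0.0
print(gini([0, 0, 0, 100]))    # one holder owns everything: 0.75 for four addresses
```

With only four addresses the maximum is 0.75 rather than 1; the upper bound is (n - 1) / n, which approaches 1 as the number of addresses grows.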

### Theil Index

The Gini index depends on ordering the balances, which makes it sensitive to how the data is sampled. Sorting the balances and accumulating them is also computationally expensive on a large database.

The Theil index is based on information entropy. It measures how far the entropy of the observed balance shares falls below the maximum entropy, which is reached when all balances are equal. The index is therefore zero for perfectly equal balances and grows as the balances become more different.
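As a rough sketch, the Theil T index can be computed in a single pass without sorting. The function name `theil_index` is our own, and we assume the standard definition T = (1/N) Σ (x_i / μ) ln(x_i / μ), where μ is the mean balance; zero balances contribute nothing to the sum but still count toward N.

```python
import math


def theil_index(balances):
    """Theil T index: 0 for equal balances, ln(N) when one address holds all."""
    n = len(balances)
    mean = sum(balances) / n
    total = 0.0
    for balance in balances:
        if balance > 0:                      # x * ln(x) tends to 0 as x -> 0
            ratio = balance / mean
            total += ratio * math.log(ratio)
    return total / n


print(theil_index([5, 5, 5]))        # 0.0
print(theil_index([0, 0, 0, 100]))   # equals math.log(4), the maximum for N = 4
```

Because the maximum grows with ln(N), Theil values from sets of different sizes are not directly comparable unless they are normalized.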

### Nakamoto Index

The concept behind the Nakamoto index is simple: what is the smallest number of addresses that together hold 51% or more of the total coin supply? Only the top addresses are needed to calculate it, as it does not take into account the rest of the balance distribution.

This index can be extended to a more generic case: the number of addresses holding, say, 90% or 99% of the total supply.
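A minimal sketch of this calculation, including the generalized threshold, might look as follows. The function name `nakamoto_index` and the `threshold` parameter are our own naming.

```python
def nakamoto_index(balances, threshold=0.51):
    """Smallest number of top addresses holding at least `threshold` of supply."""
    total = sum(balances)
    running = 0.0
    # Walk from the richest address down until the threshold is reached.
    for count, balance in enumerate(sorted(balances, reverse=True), start=1):
        running += balance
        if running >= threshold * total:
            return count
    return len(balances)


balances = [50, 30, 10, 5, 5]
print(nakamoto_index(balances))          # 2: the top two addresses hold 80%
print(nakamoto_index(balances, 0.85))    # 3: the top three addresses hold 90%
```

The full sort is used here only for brevity. In practice it is enough to fetch the top addresses, since the loop stops as soon as the threshold is reached.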

**Data Crunching**

The raw data was queried from the BigQuery public datasets and the bloxy.info databases.

The SQL query is composed of two parts:

- First, we query balances by grouping incoming and outgoing transactions per address. An example of a query for Bitcoin is seen in our graph.
- Then, using these balances, we calculate all the coefficients on the total set of holders, or on the set of N top addresses.

The calculation can be done for a specific date by keeping only the transactions prior to that date.
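The first step can be illustrated in Python instead of SQL. This is only a sketch under simplifying assumptions: the `(day, sender, receiver, amount)` tuple layout and the function name `balances_at` are hypothetical and do not reflect the BigQuery schema, and Bitcoin's input/output model is reduced to simple account-style transfers.

```python
from collections import defaultdict
from datetime import date


def balances_at(transfers, as_of):
    """Net balance per address from all transfers dated on or before `as_of`."""
    balances = defaultdict(float)
    for day, sender, receiver, amount in transfers:
        if day > as_of:
            continue                      # ignore transfers after the cut-off date
        if sender is not None:            # None marks newly issued coins
            balances[sender] -= amount    # outgoing
        balances[receiver] += amount      # incoming
    # Keep only addresses that still hold a non-zero amount.
    return {addr: bal for addr, bal in balances.items() if bal > 0}


transfers = [
    (date(2018, 1, 1), None, "a", 50),
    (date(2018, 2, 1), "a", "b", 20),
    (date(2018, 3, 1), "b", "c", 20),
]
print(balances_at(transfers, date(2018, 2, 15)))   # {'a': 30.0, 'b': 20.0}
```

The resulting dictionary of balances is the input for the second step, where the coefficients are calculated.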

**Lorenz Curves**

The calculated curve for Ethereum looks very smooth. This is because it shows the cumulative amount on the Y axis, which grows only by a small fraction with every address on the X axis. The area between the green line and the curve determines the Gini coefficient.

If we take more addresses into consideration (up to 53

**Gini Coefficients**

The overlaid Lorenz curves for ETH, BTC and DASH clearly show how the shape of a curve affects the Gini coefficient. Ether has the lowest curve for small balances on the left side of the graph, while BTC and especially DASH are much higher there.

This is because BTC and DASH have more addresses in the lower and medium balance ranges, which together hold a larger share of the supply than the corresponding addresses in Ether.

*Note: The ‘amounts’ and ‘balances’ throughout this study take into account only the given tokens or coins (ETH/BTC, etc.). No figures or graphs here depend on the market price of the coins. Hence, comparing BTC, ETH and DASH is fair in the context of these metrics.*

This allows coins to be compared with each other and over time, even if their market ‘value’ is significantly different.

*How do the big wallets on exchanges affect the Gini coefficient? Calculations show that the impact is not so great. Excluding exchange balances reduces the Gini number, but not significantly.*

**Gini Trends**

**Dash** has the lowest Gini (0.457), and it has been decreasing from the very beginning, since 2015. It has the most uniform distribution of balances across addresses among these coins.

**Bitcoin** is the next most uniform coin (**0.672**), holding a stable Gini number over the last years.

**Bitcoin Cash**, which is forked from Bitcoin, has a Gini that trends towards a more unequal distribution compared to Bitcoin.

**Ethereum, Ethereum Classic** and **LTC** trail the others on this graph, having very unequal distributions with high Gini coefficients.

Gini coefficients for Bitcoin and Ethereum did not change significantly over the last several years. Despite the big price changes and certain large fund movements, we do not see any effect on the overall structure of the balance distribution.

The early years after a new coin appears do show certain dynamics, clearly visible in 2010-2012 for Bitcoin, 2012-2014 for Ethereum and lately for Dash and new forks. These changes, however, are very unlikely to be related to market activity and are instead related to the initial coin distribution between miners, funds and exchanges, which takes shape over a two-year period.

*We used just 10K top addresses for this analysis. The reason for this is that all coins have a very large number of low-balance addresses, which cause very high Gini numbers, close to one. For comparison and trend building, it is more convenient to have a broader range of Gini factors.*
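The effect of the long tail can be reproduced on synthetic data. The sketch below is an illustration only: the balances are invented, the helper `gini_top` is our own name, and Gini is computed with the equivalent rank-based formula G = 2 Σ i·x_i / (n Σ x_i) - (n + 1) / n over balances sorted in ascending order.

```python
def gini(balances):
    """Gini coefficient via the rank-based formula on ascending balances."""
    ordered = sorted(balances)
    n = len(ordered)
    weighted = sum(rank * b for rank, b in enumerate(ordered, start=1))
    return 2 * weighted / (n * sum(ordered)) - (n + 1) / n


def gini_top(balances, top_n):
    """Gini coefficient restricted to the `top_n` largest balances."""
    return gini(sorted(balances, reverse=True)[:top_n])


# Synthetic example: 100 large holders plus a long tail of tiny balances.
whales = [1000 - i for i in range(100)]
tail = [0.001] * 100_000
balances = whales + tail

print(gini_top(balances, 100))   # low: the large holders are similar to each other
print(gini(balances))            # close to 1: the tail dominates the holder count
```

This is why the choice of N matters: the same coin can look fairly equal among its top addresses and extremely unequal over the full address set.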

*How many addresses to consider?*

*What if we take more than the top 10K addresses to calculate Gini? The question makes sense, as the 10,000th address in the Bitcoin network holds as much as 154 BTC, and in Ethereum the 10,000th address holds as much as 700 ETH. Both amounts are equivalent to more than $1 million, which is quite large. In the 10K-address analysis we skip all «ordinary» wallets and practically take only the «big whales». This is not fair. Let’s fix this!*

*To give a better sense of how Gini depends on the number of addresses taken into account, we did additional calculations, presented in the tables:*

*Conclusions we can draw from these numbers:*

- *BTC consistently outperforms ETH in terms of distribution uniformity: the Gini coefficient is lower for BTC than for ETH on all ranges of top addresses. The results and trends that we calculated for 10K addresses are likely to be valid.*
- *The difference in Gini on the whole address set is barely noticeable (0.989 versus 0.998). It means that if we take all addresses holding non-zero amounts, the distribution is highly non-uniform. Both coins, ETH and BTC, have a long «tail» of addresses holding low balances (below ~$1K).*
- *Most of the difference between the coins is evident in the range of the top 10K to 100K addresses.*

**Nakamoto Index Trends**

**Dash** and **Bitcoin** have the highest Nakamoto indexes (~3000 and ~5000, respectively). Any smaller number of top addresses does not hold enough to reach 51% of the total supply. This is, of course, good news for these coins.

**Bitcoin Cash** has shown an even steeper decline on the Nakamoto graph after the forks. Since the graph uses a log scale, the difference in the Nakamoto index between them is very significant: approximately 5 times!

**Ethereum**, **LTC** and **ETC** have much lower indexes (70 – 300), meaning that the majority of coins are concentrated in the hands of just tens to hundreds of addresses. Interestingly, Ethereum Classic dipped sharply on the Nakamoto index in 2017, to numbers below 10, but recovered later.

We see more dynamics between years in the Nakamoto index graph compared to the Gini coefficient graph in the previous section. The Nakamoto index is much more sensitive to local changes of balances and can easily change due to just a few large transactions from big wallet addresses. The Gini index takes into consideration much more information from many addresses, and is much more stable.

The Nakamoto index is probably better suited for capturing events and immediate dynamics, and the Gini index is better for comparisons and long-term trends.

**The Relationship Between Indexes**

Several metrics were calculated on a daily basis for Ethereum to see how they change at different time granularities and to check the correlation between them.

The Gini and Theil indexes behave very similarly, and only the range of values is very different. However, the actual value of the Theil index is not so important, as the entropy level is relative by definition.

On the other hand, the Nakamoto index is inversely correlated with the Gini (and Theil) indexes, which is natural: the more uniform the distribution, the more addresses are needed to accumulate 51% of the supply.

However, a strong correlation between the indexes can only be expected when the balance distributions are of a similar kind. One hypothesis is that all cryptocurrencies follow very similar distributions, so the indexes can be used interchangeably.

To check this hypothesis, we plotted each coin by specific year on the Nakamoto-Gini plane:

All coin points, except Dash, reside on a diagonal line connecting **high Nakamoto, low Gini** with **low Nakamoto, high Gini**.

For these coins, this trend means a strong inverse relation between the Nakamoto index and the Gini index.

For **Dash**, this is not the case: all its points lie on a horizontal line in the top-left area of the graph. The Nakamoto index is not related to the Gini metric at all for Dash. We expect that Dash has a totally different distribution of balances across addresses, causing this behavior. The Lorenz curve for Dash also looked quite different, especially in the area of lower balances.

**What About Tokens?**

There are so many tokens (ERC-20, ERC-721) in circulation on Ethereum that we built indexes for them all simultaneously and placed the majority of them on this plot:

It is clearly seen that the overwhelming majority of all tokens are positioned in the range of the Gini coefficient **from 0.9 to 1.0** and the Nakamoto index **below 100**.

In other words, they are pretty bad in terms of equality of balances, even in comparison with Ether, appearing as the large beige circle on the right. The bubble size on this plot is proportional to the holder count of the token.

There are only a few tokens on the graph that can be considered ‘equally distributed’. Every such case must be carefully investigated. For example, a token airdrop campaign can easily change the distribution and create an artificially low Gini coefficient.

In general, tokens have a very unnatural and artificial distribution, with high Gini and Theil metrics.
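The airdrop effect is easy to demonstrate on invented numbers. In the sketch below, the holder balances, the `airdrop` helper and its parameters are all hypothetical, and we assume the airdropped tokens come from outside the counted holders (for example, from a treasury that is not in the list).

```python
def gini(balances):
    """Gini coefficient via the rank-based formula on ascending balances."""
    ordered = sorted(balances)
    n = len(ordered)
    weighted = sum(rank * b for rank, b in enumerate(ordered, start=1))
    return 2 * weighted / (n * sum(ordered)) - (n + 1) / n


def airdrop(balances, recipients, amount):
    """Return balances after sending `amount` to `recipients` new addresses."""
    return list(balances) + [amount] * recipients


# One dominant holder and nine small ones, before and after an airdrop.
before = [1_000_000] + [1_000] * 9
after = airdrop(before, recipients=10_000, amount=100)

print(round(gini(before), 3))   # about 0.891
print(round(gini(after), 3))    # about 0.502
```

The dominant holder still owns about half of the supply after the airdrop, yet the Gini coefficient drops sharply because thousands of identical small balances now make up the rest.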

**Analysis of Top 10K addresses in BTC/ETH**

**Percentage of total BTC in circulation held by the top 10,000 addresses**

*What percentage of total BTC in circulation is held by the top 10,000 addresses in Bitcoin? How has that changed in August 2018, August 2017 and back to 2010? (I want to show that it hasn’t changed much in years)*

**How many addresses hold that same percentage of Ethereum**

*How many addresses hold that same percentage of Ethereum? (this is to demonstrate that Ethereum is more concentrated, although I completely understand there are many fewer addresses on the Ethereum blockchain)*

The answer is 642 addresses. The cumulative balance distribution looks as follows:

In comparison, the BTC distribution looks as follows:

**How long the addresses have been in existence**

*One last question: of the top 10K addresses in Bitcoin, can we see how long the addresses have been in existence? I.e. showing the benefits (they’re rich!) of being an early adopter*

On average, 800 days. The distribution by full years of life for the top 10K is:

**Comparing Crypto and Real Macroeconomics**

The World Bank investigated Gini factors in a set of countries over a period of years. They used household data, which is already averaged compared to personal data.

The range of Gini is from 0.25 for Ukraine in 2016 to 0.63 for South Africa in 2014.

If we place some countries and crypto onto the same axis for Gini, we will see the following picture:

However, the direct comparison of a Gini factor for crypto with a country population is hard because:

- The unit of analysis for cryptocurrencies is the address, which is not equivalent to the people or households used for the analysis of a country’s population.
- The balances on addresses do not necessarily reflect ‘wealth’ or ‘income’ as these terms are used in the real economy.
- Some addresses belong to organizations (exchanges or crypto loans).

Taking these factors into consideration, we expect that our Gini estimates for crypto are biased toward higher values, compared to a country analysis.

When taking this bias into consideration, Dash and Bitcoin look pretty good in this comparison.

**Conclusions**

- The inequality of balance distributions between holders of major crypto coins and tokens can be estimated using the well-known metrics: Theil, Gini, Nakamoto. The results show the differences between coins and the dynamics over time.
- Bitcoin and Dash have less inequality of balances compared to LTC, ETH, ETC and almost all Ethereum tokens. This is confirmed by all indexes.
- Ethereum tokens have a very unequal distribution, with most balances concentrated in top wallets. This is a very common pattern, and only a minority of tokens do not fall into this category, for example Livepeer, Rebellious, Hydro.
- In many cases, Theil, Gini and Nakamoto are correlated and can be used interchangeably. The Gini and Theil indexes are best for comparison studies, while the Nakamoto index is best for tracking events and movements of large funds. The Theil and Nakamoto indexes are also easier to calculate than Gini.
- Comparing crypto metrics of inequality with real macroeconomics is hard because of the different aggregation methods used. In general, the distribution of cryptocurrencies is much more unequal compared to the distribution of money across a country’s population.

**Resources used for the research**

- BigQuery scripts, https://github.com/kir8/big-query-gini-research
- Gini coefficient, https://en.wikipedia.org/wiki/Gini_coefficient
- Theil index, https://en.wikipedia.org/wiki/Theil_index
- Evgeny Medvedev, Calculating Gini Coefficient in BigQuery with SQL, 2019, https://medium.com/google-cloud/calculating-gini-coefficient-in-bigquery-3bc162c82168
- GINI index (World Bank estimate), https://data.worldbank.org/indicator/si.pov.gini