Setup
Installation
To install the tokenomics decentralization analysis tool, simply clone this GitHub repository:
git clone https://github.com/Blockchain-Technology-Lab/tokenomics-decentralization.git
The tool is written in Python 3, therefore a Python 3 interpreter is required in order to run it locally.
The requirements file lists the dependencies of the project. Make sure you have all of them installed before running the scripts. To install all of them in one go, run the following command from the root directory of the project:
python -m pip install -r requirements.txt
Execution
The tokenomics decentralization analysis tool is a CLI tool. To run the tool simply do:
python run.py
The execution is controlled and parameterized by the configuration file
config.yaml
as follows:
metrics
defines the metrics that will be computed in the analysis. By
default, all supported metrics are included here (to add support for a new metric
see the conributions
page).
ledgers
defines the ledgers that will be analyzed. By default, all supported
ledgers are included here (to add support for a new ledger see the conributions
page).
execution_flags
defines flags that control the data handling (all set to false by default):
force_map_addresses
: if set to true, the address mapping data from the directorymapping_information
is re-computed; you should set this flag to true if the mapping data has been updated since the last execution for the given ledger
analyze_flags
defines various analysis-related flags:
clustering_sources
: a list of sources that should be used to compute the address mapping information. If empty, no clustering takes place and the addresses are treated as distinct entities. The list should contain any combination of the following options (case sensitive): "Explorers", "Staking Keys", "Multi-input transactions".top_limit_type
: a string that can take one of two values (absolute
orpercentage
) that enables applying a threshold on the addresses that will be consideredtop_limit_value
: the value of the top limit that should be applied; if 0, then no limit is used (regardless of the value oftop_limit_type
); if the type isabsolute
, then thetop_limit_value
should be an integer (e.g., if set to 100, then only the 100 wealthiest entities/addresses will be considered in the analysis); if the type ispercentage
the thetop_limit_value
should be a value between 0 and 1 (e.g., if set to 0.50, then only the top 50% of wealthiest entities/addresses will be considered)exclude_contract_addresses
: a boolean value that enables the exclusion of contract addresses from the analysisexclude_below_fees
: a boolean value that enables the exclusion of addresses, the balance of which at the analyzed point in time was less than the average transaction feeexclude_below_usd_cent
: a boolean value that enables the exclusion of addresses, the balance of which at the analyzed point in time was less than $0.01 (based on the historical price information in the directoryprice_data
)
snapshot_dates
and granularity
control the snapshots for which an analysis
will be performed. granularity
is a string that can be empty or one of day
, week
,
month
, year
. If granularity is empty, then snapshot_dates
define the exact
time points for which an analysis will be conducted, in the form YYYY-MM-DD.
Otherwise, if granularity is set, then the two farthest entries in
snapshot_dates
define the timeframe over which the analysis will be conducted,
at the set granular rate. For example, if the farthest points are 2010
and
2023
and the granularity is set to month
, then (the first day of) every
month in the years 2010-2023 (inclusive) will be analyzed.
input_directories
and output_directories
are both lists of directories that
define the source of data. input_directories
defines the directories that
contain raw address balance information, as obtained from BigQuery or a full
node (for more information about this see the data collection
page).
output_directories
defines the directory to store the output files of the
analysis and the plots.
Finally, plot_parameters
contains various parameters that control whether plots will be produced for the results and for which configurations.