Internal Hardware Optimization, Hardware Profiling Database, and Dynamic Work Unit Normalization

in #gridcoin, 6 years ago

Abstract

 

A GRC research reward distribution mechanism based on a normalization of work units (WU) is introduced. The motivations for this mechanism are described in [1], and this proposal generalizes the results from a previous proposal [2]. The goal of this mechanism is to remove the necessity of rewarding every project the same amount of research-minted GRC, and to give crunchers rewards as closely related to their hardware's contribution as possible. This is achieved via a three-step process: 1) optimizing hardware on individual project applications; 2) building a hardware profiling database with network-aggregated information from (1); 3) establishing a work unit normalization across projects.

Acknowledgements

 

I would primarily like to thank @hotbit for the work on this topic that they published in [3] and [4], which challenged my original proposition and informed and motivated the changes presented here. I would also like to thank @h202 and @nexusprime for their comments on @hotbit's articles and my own, which also raised considerations that influenced this proposal. Lastly, I would like to thank the members of the Fireside Chat who listened to my ramblings on this topic, especially my co-hosts @jringo and @TheGoblinPopper, and @jamescowens and @jringo for their opinions pre-publication.

Introduction

 

The primary motivation for designing this mechanism is to align research-minted GRC rewards as closely as possible to the value that the hardware actually produces. The primary shortcoming of the current GRC reward mechanism is that it distributes the same amount of GRC to every project, creating the possibility of earning much more GRC by crunching some projects than others [1], and creating a vast discrepancy in magnitude-based voting despite similar hardware contributions. This effectively lowers rewards for crunchers of more popular projects, and potentially creates a dichotomy in project selection for the cruncher between their preferred project and the most profitable project that they can crunch. Reducing this tension by making rewards for similar hardware contributions more equal across projects would allow crunchers to focus more on the value of the projects being crunched, rather than on the profitability of projects.

Other motivations include:

  • determining a stable, predictable amount of computational output, and energy, required to mint a GRC
  • allocation of GRC to projects subject to available WU (which is also addressed by a robust greylisting mechanism and Total Credit Delta)

A direct FLOP to GRC reward mechanism was suggested in [2], where the normalization between CPU FLOPs, GPU FP32 FLOPs, and GPU FP64 FLOPs was determined by an Equivalence Ratio (ER), which was the ratio of the weighted average FLOP/J across all hardware on the network. It maintained the property that the most efficient hardware is still rewarded the most GRC. However, the ER fell short in a number of ways that will be covered in the following Sections, and in fact is a special case of the more general solution described in Section 3.

Another motivation mentioned in [2] was to design a mechanism that distributes equal rewards for the same hardware contribution regardless of the project that a user crunches with that hardware. This motivation was in fact misguided: primarily because of the hardware's architecture and the way that the application code is written, some hardware will be naturally better on some project applications than others. The nature of this mistake is elucidated in Sections 1 and 3. In Section 2, an outline for a community-built hardware profiling database is introduced. Such a database would have several benefits to the Gridcoin community.

The principle of the normalization can be understood from the answer to the question: If all of the computing power on the network was focused on App A, how many WU/h would the network achieve? What if it was all focused on App B? App C? And so on. The ratio of these WU/h for each application is the normalization, which must be adjusted dynamically since the set of hardware on the network does not remain constant. While the normalization can still be constructed using just the method described in Section 3, Sections 1 and 2 address some hardware-related issues regarding normalizing WU and have benefits of their own.

1. Internal Hardware Optimization

 

Consider a list of the WU/h for each application that some hardware completes under the default settings provided by the projects (the choice of hours as a time unit is arbitrary). If we optimize that hardware for WU/h, for each application, the improvement in performance will not necessarily scale equally for each application - some applications/projects will benefit more from optimization of a particular hardware than others. Note that this is different than optimizing for WU/J, which would be optimizing for energy efficiency. While optimizing for WU/J is most desirable, it substantially complicates the normalization in Section 3 by potentially having solutions which only partially utilize the hardware. Furthermore, optimizations for WU/h and WU/J have a good chance of being the same, or having WU/h optimization be relatively close to the WU/J optimization after converting units.

To demonstrate the disproportionate scaling, consider CPU A, which before optimization accomplishes 1 WU/h of Application 1 (on average), and 3 WU/h of Application 2 (on average). After optimization, however, CPU A accomplishes 2 WU/h of App 1, and 4 WU/h of App 2. The increase in performance for App 1 is 100%, but for App 2 is only 33%. Adjusting other settings can likewise improve or inhibit the performance of hardware relative to what that hardware would achieve under default project settings.
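The arithmetic behind the CPU A example can be checked with a few lines of Python. This is only an illustrative sketch; the function name percent_gain is introduced here and is not part of any Gridcoin or BOINC tooling.

```python
def percent_gain(before: float, after: float) -> float:
    """Relative improvement in WU/h, as a percentage of the pre-optimization rate."""
    return (after - before) / before * 100.0


# CPU A figures from the example above: (default WU/h, optimized WU/h) per application.
cpu_a = {"App 1": (1.0, 2.0), "App 2": (3.0, 4.0)}

gains = {app: percent_gain(before, after) for app, (before, after) in cpu_a.items()}

for app, gain in gains.items():
    print(f"{app}: +{gain:.0f}%")
```

The same optimization effort doubles the output on App 1 but adds only about a third on App 2, which is the disproportionate scaling described above.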

Furthermore, while CPU A can output 100% more WU/h on App 1 if it is optimized, suppose that CPU B can only achieve 40% more on App 1 after optimization. Thus, both an individual hardware, and different hardwares relative to each other, scale unequally in performance after optimization on every project.

Normalization requires us to compare different hardwares, but the fact that a single hardware can perform differently under different settings, operating systems, etc. raises a complication. What is the base level performance of any given hardware? The solution proposed here is to assume that the hardware is being used under WU/h optimizations. The way to determine these optimized settings would be to systematically test a given CPU at varying settings on every App that CPU can crunch.
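One way to picture the systematic testing is as an exhaustive sweep over candidate settings, keeping the configuration with the highest WU/h for each App. The sketch below assumes a hypothetical benchmark callable that runs an App under given settings and returns the measured WU/h; the toy_benchmark function and the "threads" setting are made-up stand-ins chosen to reproduce the CPU A numbers, not real measurements.

```python
from itertools import product


def best_settings(benchmark, apps, grid):
    """For each app, try every combination in `grid` and keep the highest WU/h.

    `benchmark(app, settings)` is a hypothetical callable returning WU/h.
    `grid` maps a setting name to the list of values to try.
    Returns {app: (settings_dict, wu_per_hour)}.
    """
    names = list(grid)
    results = {}
    for app in apps:
        best = None
        for values in product(*(grid[name] for name in names)):
            settings = dict(zip(names, values))
            rate = benchmark(app, settings)
            if best is None or rate > best[1]:
                best = (settings, rate)
        results[app] = best
    return results


def toy_benchmark(app, settings):
    """Stand-in for a real measurement; the numbers are invented for illustration."""
    threads = settings["threads"]
    if app == "App 1":
        return 0.5 * threads              # scales with thread count
    return 4.0 if threads == 2 else 3.0   # App 2 peaks at 2 threads


optimal = best_settings(toy_benchmark, ["App 1", "App 2"], {"threads": [1, 2, 4]})
print(optimal)
```

A full sweep grows quickly with the number of settings, so a real tool would likely need to prune the grid; that choice is left open here.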

The normalization of WU for a given piece of hardware would be the ratio of optimal WU/h that the hardware achieves on every application. This hardware-specific normalization answers the question: If I focused all of the computational power of this CPU on App 1, how many WU/h would I achieve? What about App 2? App 3? The normalization arises from the maximum amount of WU/h that a given piece of hardware is capable of achieving on every App it can crunch. For example, the CPU A described above has a WU normalization of 2 WU App 1 : 4 WU App 2 (ignoring the time unit, which is not necessary here).
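Read as an exchange rate, the hardware-specific normalization says how many WU of one App this hardware could complete in the time it takes to complete one WU of another. A minimal sketch, using the optimized CPU A figures and a helper name (exchange_rate) introduced only for this illustration:

```python
def exchange_rate(optimal_wu_per_hour, from_app, to_app):
    """WU of `to_app` the hardware could finish in the time of one WU of `from_app`."""
    return optimal_wu_per_hour[to_app] / optimal_wu_per_hour[from_app]


# Optimized WU/h for CPU A, taken from the example in this section.
cpu_a_optimal = {"App 1": 2.0, "App 2": 4.0}

rate = exchange_rate(cpu_a_optimal, "App 1", "App 2")
print(f"On CPU A, 1 WU of App 1 corresponds to {rate:g} WU of App 2")
```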

Note that overclocking and other such modifications should not be considered as part of optimization, as it should not be assumed that a cruncher is willing and/or able to modify their hardware in this manner. However, because of the structure of the normalization proposed in Section 3, successfully doing such modifications will still yield more WU and thus GRC to the cruncher who does them.

2. Hardware Profiling Database

 

A proper organization of this hardware performance information would result in a hardware profiling database. Several other members of the community have worked on similar constructions. The most closely related work here is @nexusprime's QuickMag, @parejan's CPU/GPU performance series, and the BOINC project WUProp.

Benefits of this database include:

  • Gridcoin and BOINC crunchers could use this database (or a corresponding tool) to increase the energy efficiency of their hardware
  • Knowledge of hardware performance on project applications could lead to valuable insights for project administrators regarding how they design their code, and indeed what hardware to purchase if they choose to do so, leading to both faster results and lower energy consumption
  • Researchers currently outside the BOINC ecosystem could use such a database for the same purposes as current project administrators, potentially introducing more researchers into the BOINC community
  • The community would be using the resources we have closest at hand - our hardware - to create something of value to researchers, and potentially create the world's largest, publicly accessible hardware profiling database
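To make the idea of the database more concrete, here is a rough sketch of what a single submitted record and a simple aggregation step might look like. The field names, the ProfileReport class, and the choice of the median as the aggregate are all assumptions made for illustration; the article does not specify a schema, and the sample reports are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class ProfileReport:
    hardware: str       # CPU or GPU model name
    app: str            # project application
    settings: str       # description of the configuration that was tested
    wu_per_hour: float  # measured throughput under those settings


def build_profile(reports):
    """Return {hardware: {app: best median WU/h over all reported settings}}."""
    samples = defaultdict(list)
    for r in reports:
        samples[(r.hardware, r.app, r.settings)].append(r.wu_per_hour)

    profile = defaultdict(dict)
    for (hardware, app, _settings), rates in samples.items():
        typical = median(rates)  # assumption: median damps outlier reports
        profile[hardware][app] = max(profile[hardware].get(app, 0.0), typical)
    return {hardware: dict(apps) for hardware, apps in profile.items()}


# Invented reports, chosen to reproduce the CPU A figures from Section 1.
reports = [
    ProfileReport("CPU A", "App 1", "default", 1.0),
    ProfileReport("CPU A", "App 1", "default", 1.0),
    ProfileReport("CPU A", "App 1", "tuned", 1.9),
    ProfileReport("CPU A", "App 1", "tuned", 2.0),
    ProfileReport("CPU A", "App 1", "tuned", 2.1),
    ProfileReport("CPU A", "App 2", "default", 3.0),
    ProfileReport("CPU A", "App 2", "tuned", 4.0),
]

profile = build_profile(reports)
print(profile)
```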

3. Normalization of Work Units

 

Assuming that hardware is used under WU/h optimizations, we can construct a normalization of WU across projects and applications. A fixed normalization does not exist, because the normalization between WU depends on the universe of hardware that is being considered, i.e. the hardware that is online and crunching. Since hardware goes online and offline frequently, a dynamic WU normalization is required.

This phenomenon can be illustrated with two CPUs and two projects. Consider the CPUs described in Section 1. Let CPU B, when optimized, perform 3 WU/h of App 1, and 2 WU/h of App 2. If we consider just the first CPU, we can accomplish a static amount of WU all the time, and for the reasons explained in Section 1, the normalization would simply be 2 WU App 1 : 4 WU App 2. However, if we only consider the second CPU, then the normalization would be 3 WU App 1 : 2 WU App 2. One can see from this how the universe of hardware being considered affects the normalization. Thus, the normalization must be adjusted dynamically; under current network characteristics, probably every superblock.

Furthermore, we can expand on the single CPU example from Section 1. Considering only a single CPU, our normalization is the result of the answer to the question: If I dedicated all of the computing power of this one CPU to some application, how many WU/h of that application would I achieve? The logical equivalent when considering a heterogeneous computing environment would be to ask: If I dedicated all of my hardware - in this case, all of the hardware on the network - to App 1, how many WU/h would I achieve? Applying the same question to the other applications, and using the logic in Section 1, yields the desired WU normalization.

Finishing the two CPU example described above, the normalization would be 5 WU App 1 : 6 WU App 2, which is just the sum of the individual contributions of each CPU (2 + 3 WU/h of App 1, and 4 + 2 WU/h of App 2).
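The two CPU example, and its dependence on which hardware is online, can be reproduced with a short sketch. The function name network_normalization is introduced here for illustration; the WU/h figures are the optimized values for CPU A and CPU B given in Sections 1 and 3.

```python
def network_normalization(online_hardware):
    """Sum each app's optimal WU/h over all hardware currently online.

    `online_hardware` maps a hardware name to {app: optimal WU/h}.
    """
    totals = {}
    for rates in online_hardware.values():
        for app, rate in rates.items():
            totals[app] = totals.get(app, 0.0) + rate
    return totals


optimal = {
    "CPU A": {"App 1": 2.0, "App 2": 4.0},
    "CPU B": {"App 1": 3.0, "App 2": 2.0},
}

only_a = network_normalization({"CPU A": optimal["CPU A"]})
only_b = network_normalization({"CPU B": optimal["CPU B"]})
both = network_normalization(optimal)

print("CPU A only:", only_a)
print("CPU B only:", only_b)
print("Both CPUs: ", both)
```

Re-running the function whenever the set of online hardware changes (for example, once per superblock) is what makes the normalization dynamic.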

Note that the ER was introduced in terms of FLOPs, whereas in this section, the desired ratio is expressed in terms of WU. If one assumes a linear mapping from WU to FLOPs with the same constants for all Applications, this approach makes sense. However, the fact that such a linear mapping is not the case (see [4]) is a shortcoming of the ER, which is just a special case of the more general solution described here.

4. Notes About Incomplete Network Information

 

In reality, carrying out the above process for all of the hardware on the network is highly impractical. To have a truly accurate normalization, we would need to test the whole population of hardware. However, a normalization exists even with just one CPU and one GPU (assuming the GPU has FP64 capabilities). If a large enough sample is considered, the actual distribution can be approximated quite well. Furthermore, hardware is not uniformly distributed, i.e. some CPUs and GPUs are more common than others, and taking advantage of this fact significantly reduces the amount of work required to obtain an accurate representation of the hardware on the network.
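As a rough sketch of how the uneven distribution of hardware could be exploited: if each common model is profiled once, the network-wide normalization can be estimated by weighting that model's profile by the number of such devices online. The function name and the device counts below are hypothetical; only the per-CPU WU/h figures come from the earlier sections.

```python
def estimate_normalization(profiles, model_counts):
    """Estimate network WU/h per app from per-model profiles and device counts.

    Models without a profile contribute nothing, which is one source of
    approximation error in this sketch.
    """
    totals = {}
    for model, count in model_counts.items():
        for app, rate in profiles.get(model, {}).items():
            totals[app] = totals.get(app, 0.0) + count * rate
    return totals


profiles = {
    "CPU A": {"App 1": 2.0, "App 2": 4.0},
    "CPU B": {"App 1": 3.0, "App 2": 2.0},
}

# Hypothetical census: three CPU A devices, one CPU B, one unprofiled model.
model_counts = {"CPU A": 3, "CPU B": 1, "CPU C": 1}

estimate = estimate_normalization(profiles, model_counts)
print(estimate)
```

Profiling the most common models first covers the largest share of the network for the least testing effort.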

Conclusion

 

Implementation of this mechanism could happen in several ways. The primary concern raised by normalizing WU and distributing rewards accordingly is that crunching less popular projects would no longer be incentivized with greater rewards. Mechanisms can be designed to still incentivize less popular projects to be crunched - for example, a percent of the reward kitty can be set aside to be distributed to all projects equally, while the rest of the kitty is distributed according to the normalization method (a similar proposal was made in the comments of [5]). This recreates the problem of our current mechanism, but on a much smaller scale. If there is to be some mechanism which ensures that less popular projects still get crunched, I would prefer a mechanism which guaranteed a minimum amount of crunching power to each whitelisted project, but did not reward crunchers of less popular projects with more GRC.
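The split-kitty idea can be sketched as follows. The article does not specify how WU are converted into a share of the kitty, so this sketch assumes one plausible reading: each project's completed WU are divided by its normalization weight, and the normalized pool is shared in proportion to the result. The function name, the 10% equal fraction, and the WU counts are hypothetical.

```python
def split_kitty(kitty, equal_fraction, wu_done, normalization):
    """Return the GRC allocated to each project.

    A fraction `equal_fraction` of the kitty is split equally between projects;
    the remainder is split in proportion to normalized work (assumed here to be
    completed WU divided by the project's normalization weight).
    """
    projects = list(wu_done)
    equal_pool = kitty * equal_fraction
    normalized_pool = kitty - equal_pool

    work = {p: wu_done[p] / normalization[p] for p in projects}
    total_work = sum(work.values())
    if total_work == 0:
        raise ValueError("no work was completed, so the normalized pool cannot be shared")

    return {
        p: equal_pool / len(projects) + normalized_pool * work[p] / total_work
        for p in projects
    }


# Normalization from the two CPU example; kitty size and WU counts are invented.
allocation = split_kitty(
    kitty=100.0,
    equal_fraction=0.1,
    wu_done={"App 1": 50.0, "App 2": 30.0},
    normalization={"App 1": 5.0, "App 2": 6.0},
)
print(allocation)
```

Setting equal_fraction to 0 gives the pure normalization-based distribution, and setting it to 1 gives a mechanism that rewards every project equally.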

An interesting problem arises if we consider crunching multiple projects on the same CPU at the same time - what if some combination of different projects would actually be the most efficient use of that CPU? I have chosen to ignore this possibility since it is complicated and unlikely; however, if true, the mechanism would still reward the cruncher who uses the appropriate settings with more GRC. This effect is ignored for some of the same reasons, and with the same consequences, as overclocking hardware is ignored.

Discussion of how to obtain the information required for this process is omitted from this article. I am working on an information-gathering method which at the least would have a low chance of being successfully manipulated, but there is likely more than one valid option.

My misguided search for a normalization that allowed hardware to achieve the same amount of GRC on every application has led me to some potentially novel game-theoretic problems. The solutions that address these problems would attempt to reconcile financial incentives with personal project preferences, and I have been working on them for a few months. If these problems are novel, then the solutions might be publishable. If they are not novel (e.g. they have been solved theoretically in some paper which I have not yet uncovered in my literature search), applying existing theoretical solutions to this practical problem will still have great benefits for Gridcoin and BOINC, such as greatly improving the energy efficiency of network-wide use of hardware and creating new mechanisms by which Gridcoin users can contribute even more to their favorite projects.

References

 

[1]: Towards an Incentive-Compatible Magnitude Distribution

[2]: Researching a FLOP and Energy Based Model for GridCoin Reward Mechanism

[3]: Time and Work Based Model for GridCoin Reward Mechanism

[4]: Counting FLOPSs in a FLOPS - part 1

[5]: Ranking GridCoin Projects + Introduction to Nash Equilibrium


GRC donations can be sent to: S4Mv1TLp5gfoC9NseEG7e2yX6ACEaAnV37

Please see my Steemit blog for my up-to-date donation address.

We will be discussing this paper and its implications on The Fireside on November 15th!
