The tangle idea, isn't it something similar to sharding? Sharding is on the road map of both Ethereum and Bitcoin Cash - and probably many others as well.
As far as I've understood, every node needs to keep the full transaction history of all the tokens they have, meaning it may not scale with time. (An anecdote: at my previous workplace we built new software from scratch to replace the old one, and we did extensive stress testing. I don't remember exactly, but I think we got about 1000 TPS, which was rather good, as we only expected something like 50-100 TPS during traffic peaks. However, all the stress testing was done with a hierarchical product catalogue that was just two levels deep and contained dozens of items. As soon as we went live we observed mediocre performance: our software didn't scale when the catalogue was many levels deep and had thousands of products.)
"As far as I've understood, every node needs to keep the full transaction history of all the tokens they have, meaning it may not scale with time"
In IOTA there are a couple of different types of nodes. The most prevalent right now are full nodes. Full nodes receive and record all incoming transactions; however, they perform what is called a "snapshot" every now and then, which erases all data except the addresses that currently have value. After each snapshot, the database stored on full nodes right now is only around 20MB.
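To make the snapshot idea concrete, here is a rough sketch of what "erase everything except addresses that have value" amounts to. This is not IOTA's actual code; the transaction format (a dict with "address" and "value") and the function name are made up purely for illustration.

```python
from collections import defaultdict

def snapshot(transactions):
    """Collapse a transaction history into the current non-zero balances.

    Hypothetical format: each transaction is a dict with an "address"
    and a signed "value" (positive = received, negative = spent).
    """
    balances = defaultdict(int)
    for tx in transactions:
        balances[tx["address"]] += tx["value"]
    # Keep only addresses that still hold value; the history is discarded.
    return {addr: bal for addr, bal in balances.items() if bal > 0}

history = [
    {"address": "A", "value": 100},
    {"address": "A", "value": -100},  # A spends everything
    {"address": "B", "value": 60},
    {"address": "C", "value": 40},
]
state = snapshot(history)
```

The point is that what remains after the snapshot grows with the number of funded addresses, not with the number of transactions ever made.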
Perma-nodes (not out yet) will store the entire history of the tangle. This is what will allow all transactions to be undeniably verified forever.
Swarm-nodes (not out yet) will consist of multiple machines that don't have the capacity to store the entire tangle on their own, each storing a part of it, and working together to act like one perma-node.
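Since swarm-nodes are not out yet, their real design is unknown. The sketch below only illustrates the general idea of several machines each storing a part of the data and answering lookups together. The hash-based partitioning, the `Swarm` class and `shard_for` are all assumptions for illustration, not anything IOTA has published.

```python
import hashlib

def shard_for(tx_hash: str, num_machines: int) -> int:
    """Pick which machine stores a transaction (hypothetical scheme)."""
    digest = hashlib.sha256(tx_hash.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines

class Swarm:
    """A group of small stores that together behave like one big store."""

    def __init__(self, num_machines: int):
        # Each dict stands in for one machine's local database.
        self.machines = [dict() for _ in range(num_machines)]

    def store(self, tx_hash: str, tx: dict) -> None:
        index = shard_for(tx_hash, len(self.machines))
        self.machines[index][tx_hash] = tx

    def lookup(self, tx_hash: str):
        index = shard_for(tx_hash, len(self.machines))
        return self.machines[index].get(tx_hash)

swarm = Swarm(3)
for i in range(30):
    swarm.store(f"tx{i}", {"value": i})
```

Each machine holds only a fraction of the transactions, but any transaction can still be found by asking the machine responsible for its hash.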
These techniques used together will allow all data for IOTA to be stored efficiently over time. It is possible that perma/swarm nodes might be paid a small fee for storing information permanently.