Category Archives: Models

The dynamics of UFO sightings

The Economist reports on UFO sightings:

[Figure: UFO sighting data from The Economist]

This deserves a model:

[Model diagram: UFOs]

UFOs.vpm (Vensim published model, requires Pro/DSS or the free Reader)

The model is a mixed discrete/continuous simulation of an individual sleeping, working and drinking. This started out as a multi-agent model, but I realized along the way that sleeping, working and drinking is a fairly ergodic process on long time scales (at least with respect to UFOs), so one individual with a distribution of behaviors over time or simulations is as good as a population of agents.
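To make the idea concrete, here's a minimal Python sketch of the concept (my own illustration, not the published Vensim model): one individual cycles through sleeping, working, and out-and-about/drinking states over the day, and the chance of reporting a UFO depends on being awake, being outdoors, and how much alcohol is on board. All rates and schedules are invented for illustration.

```python
import random

random.seed(0)

def state(hour):
    """Crude daily schedule for one individual (assumed, for illustration)."""
    if hour < 7 or hour >= 23:
        return "asleep"
    if 9 <= hour < 17:
        return "working"              # indoors, lurking near the water cooler
    return "out"                      # commuting, socializing, maybe drinking

def sighting_rate(hour):
    base = {"asleep": 0.0, "working": 0.02, "out": 0.1}[state(hour)]
    drinks = max(0, hour - 19) * 0.5 if state(hour) == "out" else 0
    return base * (1 + drinks)        # alcohol multiplies the reporting rate

# Monte Carlo over many simulated days -> distribution of sightings by hour
counts = [0] * 24
for day in range(10_000):
    for hour in range(24):
        if random.random() < 0.05 * sighting_rate(hour):
            counts[hour] += 1
print(counts)
```

Running many days of one such individual stands in for a population of agents, which is the ergodicity argument above.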

The model replicates the data somewhat faithfully:

[Figure: distribution of sightings by time of day, model vs. data]

The model shows a morning peak (people awake but out and about) and a workday dip (inside, lurking near the water cooler) but the data do not. This suggests to me that:

  • Alcohol is the dominant factor in sightings.
  • I don’t party nearly enough to see a UFO.

Actually, now that I’ve built this version, I think the interesting model would have a longer time horizon, to address the non-ergodic part: contagion of sightings across individuals.

h/t Andreas Größler.

Early economic dynamics: Samuelson’s multiplier-accelerator

Paul Samuelson’s 1939 analysis of the multiplier-accelerator is a neat piece of work. Too bad it’s wrong.

Interestingly, this work dates from a time in which the very idea of a mathematical model was still questioned:

Contrary to the impression commonly held, mathematical methods properly employed, far from making economic theory more abstract, actually serve as a powerful liberating device enabling the entertainment and analysis of ever more realistic and complicated hypotheses.

Samuelson should be hailed as one of the early explorers of a very big jungle.

The basic statement of the model is very simple:

[Figure: Samuelson's national income equations]

In quasi-System Dynamics notation, that looks like:

[Figure: the multiplier-accelerator in quasi-SD notation]

A caveat:

The limitations inherent in so simplified a picture as that presented here should not be overlooked. In particular, it assumes that the marginal propensity to consume and the relation are constants; actually these will change with the level of income, so that this representation is strictly a marginal analysis to be applied to the study of small oscillations. Nevertheless it is more general than the usual analysis.

Samuelson hand-simulated the model (it's fun – once – but he runs four scenarios):

[Figure: hand-simulated scenarios]

Samuelson then solves the discrete time system to identify four regions with different behavior: goal seeking (exponential decay to a steady state), damped oscillations, unstable (explosive) oscillations, and unstable exponential growth or decline. He nicely maps the parameter space:

[Figure: parameter space]

[Figure: behavior by parameter region]

So where's the problem?

The first problem is not so much of Samuelson's making as a limitation of the pre-computer era. The essential simplification of the model for analytic solution is:

[Figure: the simplified, reduced-form equation]

This is fine, but it’s incredibly abstract. Presented with this equation out of context – as readers often are – it’s almost impossible to posit a sensible description of how the economy works that would enable one to critique the model. This kind of notation remains common in econometrics, to the detriment of understanding and progress.
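Written out, the reduced form combines the consumption and accelerator equations into a single second-order difference equation, Y[t] = g + α(1+β)·Y[t-1] − αβ·Y[t-2]. Here's a minimal sketch iterating it, with one illustrative parameter pair from each of the four behavior regions (the specific parameter choices are mine, not Samuelson's):

```python
def samuelson(alpha, beta, g=1.0, periods=40):
    """Iterate Y[t] = g + alpha*(1+beta)*Y[t-1] - alpha*beta*Y[t-2]."""
    Y = [0.0, g]          # crude start-up: no income, then one period of spending
    for _ in range(periods - 2):
        Y.append(g + alpha * (1 + beta) * Y[-1] - alpha * beta * Y[-2])
    return Y

cases = {
    "damped, non-oscillatory": (0.8, 0.1),
    "damped oscillations":     (0.5, 1.0),
    "explosive oscillations":  (0.6, 2.0),
    "explosive growth":        (0.8, 4.0),
}
for name, (a, b) in cases.items():
    print(f"{name:24s} alpha={a}, beta={b}:",
          [round(y, 2) for y in samuelson(a, b)[:10]])
```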

At the first SD conference, Gil Low presented a critique and reconstruction of the MA model that addressed this problem. He reconstructed the model, providing an operational description of the economy that remains consistent with the multiplier-accelerator framework.

[Figure: Low's operational reconstruction]

The mere act of crafting a stock-flow description reveals problem #1: the basic multiplier-accelerator doesn't conserve stuff.

[Figures: inventory and capital stock structure]

Non-conservation of stuff leads to problem #2. When you do implement inventories and capital stocks, the period of multiplier-accelerator oscillations moves to about two decades – far from the 3-7 year period of the business cycle that Samuelson originally sought to explain. This occurs in part because the capital stock, with a 15-year lifetime, introduces considerable momentum. You simply can't discover this problem in the original multiplier-accelerator framework, because too many physical and behavioral time constants are buried in the assumptions associated with its two parameters.

Low goes on to introduce labor, finding that variations in capacity utilization do produce oscillations of the required time scale.

[Figure: short-term oscillations]

I think there's a third problem with the approach as well: discrete time. Discrete time notation is convenient for matching a model to data sampled at regular intervals, but the economy is not even remotely close to operating in discrete annual steps. Moreover, a one-year step is dangerously close to the 3-year period of the business cycle phenomenon of interest, so it's a distinct possibility that some of the oscillatory tendency is an artifact of discrete time sampling. Improper oscillations can be detected analytically, but with discrete time notation it's not easy to apply the simple heuristic of halving the time step to test stability: depending on how the model is implemented, halving the step merely compresses the time axis or creates problems with implicit time constants. Halving the time step and switching to RK4 integration illustrates these issues:

[Figure: half time step and RK4 comparison]
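A minimal sketch of why the heuristic misfires (my own illustration, not Low's or Samuelson's model): a pure difference equation has no explicit time step, so "halving dt" can only relabel the index and compress the time axis, whereas a continuous formulation with an explicit time constant converges to the same real-time trajectory as the step shrinks.

```python
# 1) Difference equation: no dt appears, so halving the step just relabels
#    index t -- the numbers are unchanged, the time axis compresses.
def diff_eq(alpha=0.5, beta=1.0, g=1.0, steps=12):
    y = [0.0, g]
    for _ in range(steps):
        y.append(g + alpha * (1 + beta) * y[-1] - alpha * beta * y[-2])
    return y

y = diff_eq()
print("interpreted as 1-year steps:  ", [(t, round(v, 2)) for t, v in enumerate(y)][:6])
print("interpreted as 0.5-year steps:", [(t / 2, round(v, 2)) for t, v in enumerate(y)][:6])

# 2) Continuous first-order adjustment dY/dt = (Y* - Y)/tau, Euler-integrated:
#    halving dt leaves the real-time trajectory (and stability) essentially
#    unchanged, because the time constant tau is explicit.
def euler(target=1.0, tau=2.0, dt=1.0, t_end=6.0):
    y, t = 0.0, 0.0
    while t < t_end - 1e-9:
        y += dt * (target - y) / tau
        t += dt
    return round(y, 3)

print("dt=1.0:", euler(dt=1.0), " dt=0.5:", euler(dt=0.5), " dt=0.25:", euler(dt=0.25))
```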

It seems like a no-brainer, that economic dynamic models should start with operational descriptions, continuous time, and engineering state variable or stock flow notation. Abstraction and discrete time should emerge as simplifications, as needed for analysis or calibration. The fact that this has not become standard operating procedure suggests that the invisible hand is sometimes rather slow as it gropes for understanding.

The model is in my library.

See Richardson’s Feedback Thought in Social Science and Systems Theory for more history.

Are all models wrong?

Artem Kaznatcheev considers whether Box’s slogan, “all models are wrong,” should be framed as an empirical question.

Building on the theme of no unnecessary assumptions about the world, @BlackBrane suggested … a position I had not considered before … for entertaining the possibility of a mathematical universe:

[Box's slogan is] an affirmative statement about Nature that might in fact not be true. Who’s to say that at the end of the day, Nature might not correspond exactly to some mathematical structure? I think the claim is sometimes guilty of exactly what it tries to oppose, namely unjustifiable claims to absolute truth.

I suspect that we won’t learn the answer, at least in my lifetime.

In a sense, the appropriate answer is “who cares?” Whether or not there can in principle be perfect models, the real problem is finding ones that are useful in practice. The slogan isn’t helpful for this. (NIPCC authors seem utterly clueless as well.)

In a related post, AK identifies a 3-part typology of models that suggests a basis for guidance:

  • “Insilications – In physics, we are used to mathematical models that correspond closely to reality. All of the unknown or system dependent parameters are related to things we can measure, and the model is then used to compute dynamics, and predict the future value of these parameters. …
  • Heuristics – … When George Box wrote that “all models are wrong, but some are useful”, I think this is the type of models he was talking about. It is standard to lie, cheat, and steal when you build these sort of models. The assumptions need not be empirically testable (or even remotely true, at times), and statistics and calculations can be used to varying degree of accuracy or rigor. … A theorist builds up a collection of such models (or fables) that they can use as theoretical case studies, and a way to express their ideas. It also allows for a way to turn verbal theories into more formal ones that can be tested for basic consistency. …
  • Abstractions – … These are the models that are most common in mathematics and theoretical computer science. They have some overlap with analytic heuristics, except are done more rigorously and not with the goal of collecting a bouquet of useful analogies or case studies, but of general statements. An abstraction is a model that is set up so that given any valid instantiation of its premises, the conclusions necessarily follow. …”

The social sciences are solidly in the heuristics realm, while a lot of science is in the insilication category. The difficulty is knowing where the boundary lies. Actually, I think it's a continuum rather than a set of discrete categories. One can get some hint by looking at the problem context for models. For example:

| | Known state variables? | Reality checks (conservation laws, etc.)? | Data per concept? | Structural information from more granular observations or models? | Experiments? | Computation? |
| Physics | yes | lots | lots | yes | yes | often easy |
| Climate | yes | some | some | for many things | not at scale | limited |
| Economics | no | some | some – flaky | microfoundations often lacking or unused | not at scale | limited |

(Ironically, I’m implying a model here, which is probably wrong, but hopefully useful.)

A lot of our most interesting problems are currently at the heuristics end of the spectrum. Some may migrate toward better model performance, and others probably won’t – particularly models of decision processes that willfully ignore models.

Facebook Reloaded 2013

Facebook has climbed out of its 2012 doldrums to a market cap of $115 billion today. So, I’ve updated my user tracking and valuation model, just for kicks.

As in my last update, user growth continues to modestly exceed the original estimates. The user “carrying capacity” now is about 1.35 billion users, vs. 0.95 billion originally (K950 on the graph) and 1.07 billion in 2012 – within the range of scenarios I originally ran, but well above the “best guess”. My guess is that the model will continue to underpredict for a while, because this is an inevitable pitfall of using a single diffusion process to represent what is surely the aggregate of several processes – stationary vs. mobile, different regions and demographics, etc. Of course, in the long run, users could also go down, which the basic logistic model can’t represent.

You can see what’s going on if you plot growth against users – the right tail doesn’t go to zero as fast as the logistic assumes:
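For intuition, a pure logistic implies that net growth plotted against users is a parabola that falls back to zero at the carrying capacity K; a fatter right tail in the data is the signature of several aggregated diffusion processes. A minimal sketch, with illustrative parameters (the 0.5/yr fractional growth rate is my assumption, K is the current best-guess ceiling):

```python
def logistic_growth(users, r=0.5, K=1.35e9):
    """Net user growth implied by a pure logistic."""
    return r * users * (1 - users / K)

for frac in (0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    u = frac * 1.35e9
    print(f"users = {u/1e9:4.2f}B -> growth = {logistic_growth(u)/1e6:6.0f} M/yr")
```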

User growth probably isn’t a huge component of valuation, because these are modest differences on a percentage basis. Marginal users may be less valuable as well.

With revenue per user at a constant $7/user/year, and 30% margins, and the current best-guess model, FB is now worth $35 billion. What does it take to get to the ballpark of current market capitalization? Here’s one way:

  • The carrying capacity ceiling for users continues to grow to 2 billion, and
  • revenue per user rises to $25/user/year

This preserves some optimistic base case assumptions:

  • The risk-free interest rate takes 5 more years to rise substantially above 0 to a (still low) long term rate of 3%
  • Margins stay at 30% as in 2009-2011 (vs. 18% y.t.d.)
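Pulling those assumptions together, here's a rough back-of-envelope NPV sketch of the valuation logic (my own simplification, not the published Vensim model). The 1.15 billion starting user base, 0.3/yr diffusion rate, 5% risk premium, 25-year horizon, and the absence of a terminal value are all my own illustrative choices:

```python
import math

def npv(K=2.0e9, users=1.15e9, growth=0.3, arpu=25.0, margin=0.30,
        premium=0.05, years=25, dt=0.25):
    t, value, discount = 0.0, 0.0, 1.0
    while t < years:
        riskfree = 0.0 if t < 5 else 0.03                # rate rises after 5 yr
        discount *= math.exp(-(riskfree + premium) * dt)
        value += users * arpu * margin * discount * dt   # discounted cash flow
        users += dt * growth * users * (1 - users / K)   # logistic user growth
        t += dt
    return value

print(f"rough value ≈ ${npv()/1e9:.0f} billion")
```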

Think it’ll happen?

facebook 3 update 2.vpm

Random rein control

An interesting article in PLOS one explores the consequences of a system of random feedbacks:

The Emergence of Environmental Homeostasis in Complex Ecosystems

The Earth, with its core-driven magnetic field, convective mantle, mobile lid tectonics, oceans of liquid water, dynamic climate and abundant life is arguably the most complex system in the known universe. This system has exhibited stability in the sense of, bar a number of notable exceptions, surface temperature remaining within the bounds required for liquid water and so a significant biosphere. Explanations for this range from anthropic principles in which the Earth was essentially lucky, to homeostatic Gaia in which the abiotic and biotic components of the Earth system self-organise into homeostatic states that are robust to a wide range of external perturbations. Here we present results from a conceptual model that demonstrates the emergence of homeostasis as a consequence of the feedback loop operating between life and its environment. Formulating the model in terms of Gaussian processes allows the development of novel computational methods in order to provide solutions. We find that the stability of this system will typically increase then remain constant with an increase in biological diversity and that the number of attractors within the phase space exponentially increases with the number of environmental variables while the probability of the system being in an attractor that lies within prescribed boundaries decreases approximately linearly. We argue that the cybernetic concept of rein control provides insights into how this model system, and potentially any system that is comprised of biological to environmental feedback loops, self-organises into homeostatic states.

To get a handle on how this works, I replicated the model (see my library).

The basic mechanism of the model is rein control, in which multiple unidirectional forces on a system act together to yield bidirectional feedback control. By analogy, the reins on a horse can only pull in one direction, but with a pair of reins, it’s possible to turn both left and right.

In the model, there’s a large random array of reins, consisting of biotic feedbacks that occur near a particular system state. In the simple one-dimensional case, when you add a bunch of these up, you get a 1D vector field that looks like this:

If this looks familiar, there’s a reason. What’s happening along the E dimension is a lot like what happens along the time dimension in pink noise: at any given point, the sum of a lot of random impulses yields a wiggly net response, with a characteristic scale set by the time constant (pink noise) or the niche width of the biotic components (rein control).
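Here's a minimal sketch of that mechanism (my own, not the paper's code or my Vensim replication): sum many random, unidirectional rein responses, each a Gaussian bump in a narrow niche of the environmental state E, and scan the net force for sign changes. Niche width, the number of reins, and the push strengths are arbitrary illustrative values.

```python
import random, math

random.seed(1)
N, width = 200, 0.05                     # number of reins, niche width (assumed)
reins = [(random.uniform(0, 1),          # niche center
          random.choice((-1, 1)) * random.random())  # push direction & strength
         for _ in range(N)]

def net_force(E):
    return sum(s * math.exp(-((E - c) / width) ** 2) for c, s in reins)

# Scan E and report sign changes of the net force (approximate equilibria).
Es = [i / 1000 for i in range(1001)]
F = [net_force(E) for E in Es]
for a, b, fa, fb in zip(Es, Es[1:], F, F[1:]):
    if fa * fb < 0:
        kind = "stable" if fb < fa else "tipping"
        print(f"equilibrium near E ≈ {b:.3f} ({kind})")
```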

What this yields is an alternating series of unstable (tipping) points and stable equilibria. When the system is perturbed by some external force, the disturbance shifts the aggregate response, as below. Generally, a few stable points may disappear, but the large features of the landscape are preserved, so the system resists the disturbance.

With a higher-dimensional environmental state, this creates convoluted basins of attraction:

This leads to a variety of conclusions about ecological stability, for which I encourage you to have a look at the full paper. It’s interesting to ponder the applicability and implications of this conceptual model for social systems.

Positive feedback drives email list meltdown

I’m on an obscure email list for a statistical downscaling model. I think I’ve gotten about 10 messages in the last two years. But today, that changed.

List traffic (data in red).

Around 7 am, there were a couple of innocuous, topical messages. That prompted someone who’d evidently long forgotten about the list to send an “unsubscribe me” message to the whole list. (Why people can’t figure out that such missives are both ineffective and poor list etiquette is beyond me.) That unleashed a latent vicious cycle: monkey-see, monkey-do produced a few more “unsub” messages. Soon the traffic level became obnoxious, spawning more and more ineffectual unsubs. Then, the brakes kicked in, as more sensible users appealed to people to quit replying to the whole list. Those messages were largely lost in the sea of useless unsubs, and contributed to the overall impression that things were out of control.

People got testy:

I will reply to all to make my point.

Has it occurred to any of you idiots to just reply to Xxxx Xxxx rather than hitting reply to all. Come on already, this is not rocket science here. One person made the mistake and then you all continue to repeat it.

By about 11, the fire was slowing, evidently having run out of fuel (list ignoramuses), and someone probably shut it down by noon – but not before at least a hundred unsubs had flown by.

Just for kicks, I counted the messages and put together a rough-cut Vensim model of this little boom-bust cycle:

unsub.mdl unsub.vpm

This is essentially the same structure as the Bass Diffusion model, with a few refinements. I think I didn’t quite capture the unsubscriber behavior. Here, I assume that would-be unsubscribers, who think they’ve left the list but haven’t, at least quit sending messages. In reality, they didn’t – in blissful ignorance of what was going on, several sent multiple requests to be unsubscribed. I didn’t explicitly represent the braking effect (if any) of corrective comments. Also, the time constants for corrections and unsubscriptions could probably be separated. But it has the basics – a positive feedback loop driving growth in messages, and a negative feedback loop putting an end to the growth. Anyway, have fun with it.
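For those without Vensim, here's a rough Python analogue of the same boom-bust structure (my sketch, not the posted model): oblivious list members exposed to unsub traffic are triggered to send their own “unsubscribe me” message, then fall silent. All parameter values are invented for illustration.

```python
def meltdown(members=500.0, contagion=0.01, msg_life=0.5, dt=0.01, hours=8.0):
    oblivious, traffic = members, 1.0              # one seed message at 7 am
    out = []
    for step in range(int(hours / dt)):
        t = step * dt
        triggering = contagion * oblivious * traffic   # reinforcing loop
        oblivious = max(oblivious - dt * triggering, 0.0)
        # each triggered member adds a message; messages age out of attention
        traffic += dt * (triggering - traffic / msg_life)
        if step % int(1 / dt) == 0:
            out.append((t, round(traffic, 1)))
    return out

for t, msgs in meltdown():
    print(f"{t:.0f} h after the first unsub: ~{msgs} messages in play")
```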

Computing and networks have solved a lot of problems, like making logistics pipelines visible, but they’ve created as many new ones. The need for models to improve intuition and manage new problems is as great as ever.

Spot the health care smokescreen

A Tea Party presentation on health care making the rounds in Montana claims that life expectancy is a smoke screen, and it’s death rates we should be looking at. The implication is that we shouldn’t envy Japan’s longer life expectancy, because the US has lower death rates, indicating superior performance of our health care system.

Which metric really makes the most sense from a systems perspective?

Here’s a simple, 2nd order model of life and death:

From the structure, you can immediately observe something important: life expectancy is a function only of parameters, while the death rate depends on the system states as well. In other words, life expectancy reflects the expected life trajectory of a person, given structure and parameters, while the aggregate death rate weights parameters (cohort death rates) by the system state (the distribution of population between old and young).

In the long run, the two metrics tell you the same thing, because the system comes into equilibrium such that the death rate is the inverse of the life expectancy. But people live a long time, so it might take decades or even centuries to achieve that equilibrium. In the meantime, the death rate can take on any value between the death rates of the young and old cohorts, which is not really helpful for understanding what a new person can expect out of life.

So, to the extent that health care performance is visible in the system trajectory at all, and not confounded by lifestyle choices, life expectancy is the metric that tells you about performance, and the aggregate death rate is the smokescreen.
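A rough Python analogue of the 2nd-order structure (my sketch, not the linked Vensim model) makes the point numerically: life expectancy is computed from the parameters alone, while the crude death rate only converges to 1/(life expectancy) as the age distribution equilibrates. The cohort rates and maturation time below are illustrative.

```python
def run(init_young=50.0, init_old=50.0, births=2.0,
        maturation_time=40.0, dy=0.005, do=0.05, dt=0.25, years=200):
    # life expectancy implied by the parameters alone
    leave_young = dy + 1.0 / maturation_time
    life_exp = 1.0 / leave_young + (1.0 / maturation_time) / leave_young / do
    young, old = init_young, init_old
    for step in range(int(years / dt) + 1):
        deaths = young * dy + old * do
        if step % int(25 / dt) == 0:
            print(f"t={step*dt:5.0f}  crude death rate={deaths/(young+old):.4f}"
                  f"  1/(life expectancy)={1/life_exp:.4f}")
        maturing = young / maturation_time
        young += dt * (births - young * dy - maturing)
        old   += dt * (maturing - old * do)

run()
```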

Here’s the model: LifeExpectancyDeathRate.mdl or LifeExpectancyDeathRate.vpm

It’s initialized in equilibrium. You can explore disequilibrium situations by varying the initial population distribution (Init Young People & Init Old People), or testing step changes in the death rates.

What a real breakthrough might look like

It’s possible that a techno fix will stave off global limits indefinitely, in a Star Trek future scenario. I think it’s a bad idea to rely on it, because there’s no backup plan.

But it’s equally naive to think that we can return to some kind of low-tech golden age. There are too many people to feed and house, and those bygone eras look pretty ugly when you peer under the mask.

But this is a false dichotomy.

Some techno/growth enthusiasts talk about sustainability as if it consisted entirely of atavistic agrarian aspirations. But what a lot of sustainability advocates are after, myself included, is a high-tech future that operates within certain material limits (planetary boundaries, if you will) before those limits enforce themselves in nastier ways. That’s not really too hard to imagine; we already have a high tech economy that operates within limits like the laws of motion and gravity. Gravity takes care of itself, because it’s instantaneous. Stock pollutants and resources don’t, because consequences are remote in time and space from actions; hence the need for coordination.

Et tu, Groupon?

Is Groupon overvalued too? Modeling Groupon actually proved a bit more challenging than my last post on Facebook.

Again, I followed in the footsteps of Cauwels & Sornette, starting with the SEC filing data they used, with an update via google. C&S fit a logistic to Groupon’s cumulative repeat sales. That’s actually the end of a cascade of participation metrics, all of which show logistic growth:

The variable of greatest interest with respect to revenue is Groupons sold. But the others also play a role in determining costs – it takes money to acquire and retain customers. Also, there are actually two populations growing logistically – users and merchants. Growth is presumably a function of the interaction between these two populations. The attractiveness of Groupon to customers depends on having good deals on offer, and the attractiveness to merchants depends on having a large customer pool.

I decided to start with the customer side. The customer supply chain looks something like this:

Subscribers data includes all three stocks, cumulative customers is the right two, and cumulative repeat customers is just the rightmost.
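As a stand-in for the diagram, here's a minimal sketch of that chain (my own simplification, not the actual model): subscribers who haven't bought yet, first-time customers, and repeat customers, with logistic subscriber growth feeding the chain. The carrying capacity, growth rate, and conversion fractions are invented for illustration.

```python
def groupon_chain(K=50e6, r=1.5, first_buy=0.4, go_repeat=0.3, dt=0.1, years=4):
    not_yet, first_time, repeat = 1e5, 0.0, 0.0    # initial subscriber seed
    for step in range(int(years / dt) + 1):
        subs = not_yet + first_time + repeat
        if step % int(1 / dt) == 0:
            print(f"yr {step*dt:3.1f}: subscribers {subs/1e6:6.2f}M, "
                  f"cum. customers {(first_time + repeat)/1e6:6.2f}M, "
                  f"cum. repeat customers {repeat/1e6:6.2f}M")
        signup = r * subs * (1 - subs / K)         # logistic subscriber growth
        to_first = first_buy * not_yet             # first purchase per year
        to_repeat = go_repeat * first_time         # become a repeat customer
        not_yet += dt * (signup - to_first)
        first_time += dt * (to_first - to_repeat)
        repeat += dt * to_repeat

groupon_chain()
```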


Time to short some social network stocks?

I don’t want to wallow too long in metaphors, so here’s something with a few equations.

A recent arXiv paper by Peter Cauwels and Didier Sornette examines market projections for Facebook and Groupon, and concludes that they’re wildly overvalued.

We present a novel methodology to determine the fundamental value of firms in the social-networking sector based on two ingredients: (i) revenues and profits are inherently linked to its user basis through a direct channel that has no equivalent in other sectors; (ii) the growth of the number of users can be calibrated with standard logistic growth models and allows for reliable extrapolations of the size of the business at long time horizons. We illustrate the methodology with a detailed analysis of facebook, one of the biggest of the social-media giants. There is a clear signature of a change of regime that occurred in 2010 on the growth of the number of users, from a pure exponential behavior (a paradigm for unlimited growth) to a logistic function with asymptotic plateau (a paradigm for growth in competition). [...] According to our methodology, this would imply that facebook would need to increase its profit per user before the IPO by a factor of 3 to 6 in the base case scenario, 2.5 to 5 in the high growth scenario and 1.5 to 3 in the extreme growth scenario in order to meet the current, widespread, high expectations. [...]

I’d argue that the basic approach, fitting a logistic to the customer base growth trajectory and multiplying by expected revenue per customer, is actually pretty ancient by modeling standards. (Most system dynamicists will be familiar with corporate growth models based on the mathematically-equivalent Bass diffusion model, for example.) So the surprise for me here is not the method, but that forecasters aren’t using it.

Looking around at some forecasts, it’s hard to say what forecasters are actually doing. There’s lots of handwaving and blather about multipliers, and little revelation of actual assumptions (unlike the paper). It appears to me that a lot of forecasters are counting on big growth in revenue per user, and not really thinking deeply about the user population at all.

To satisfy my curiosity, I grabbed the data out of Cauwels & Sornette, updated it with the latest user count and revenue projection, and repeated the logistic model analysis. A few observations:

I used a generalized logistic, which has one more parameter, capturing possible nonlinearity in the decline of the growth rate of users with increasing saturation of the market. Here’s the core model:
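As a rough stand-in for that core structure, here's a minimal sketch of a generalized (Richards-type) logistic. The extra exponent ν is one common way to parameterize the nonlinearity, so treat the functional form and parameters as illustrative rather than the fitted model:

```python
def gen_logistic(u0, r, K, nu, dt=0.05, years=15):
    """du/dt = r*u*(1 - (u/K)**nu); nu = 1 recovers the plain logistic."""
    u, path = u0, []
    for step in range(int(years / dt) + 1):
        if step % int(1 / dt) == 0:
            path.append(round(u / 1e6, 1))         # users in millions, yearly
        u += dt * r * u * (1 - (u / K) ** nu)
    return path

# illustrative parameters, not fitted values
print("nu=0.5:", gen_logistic(u0=1e6, r=1.0, K=1e9, nu=0.5))
print("nu=2.0:", gen_logistic(u0=1e6, r=1.0, K=1e9, nu=2.0))
```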
