Models are formal structures represented in mathematics and diagrams that help us to understand the world. Mastery of models improves your ability to reason, explain, design, communicate, act, predict, and explore.
The models we cover fall into three classes: simplifications of the world, mathematical analogies, and exploratory, artificial constructs.
In sum, when our thinking is informed by diverse logically consistent, empirically validated frames, we are more likely to make wise choices.
To people who use models, the rise of model thinking has an even simpler explanation: models make us smarter.
The models share three common characteristics: First, they simplify, stripping away unnecessary details, abstracting from reality, or creating anew from whole cloth. Second, they formalize, making precise definitions. Models use mathematics, not words. Third, all models are wrong.
Models are wrong because they simplify. They omit details. By considering many models, we can overcome the narrowing of rigor by crisscrossing the landscape of the possible.
To rely on a single model is hubris. It invites disaster. To believe that a single equation can explain or predict complex real-world phenomena is to fall prey to the charisma
of clean, spare mathematical forms.
Hierarchy

To sketch the argument for many-model thinking, we begin with a query from poet and dramatist T. S. Eliot: “Where is
the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?” To that we might add, where is the information we have lost in all this data?
Plato defined knowledge as justified true belief. More modern definitions refer to it as understandings of correlative, causal, and logical relationships. Knowledge organizes information. Knowledge often takes model form.
Atop the hierarchy lies wisdom, the ability to identify and apply relevant knowledge. Wisdom requires many-model thinking.
We will model people as either rule-based actors or rational actors. Within the set of rule-based actors, we consider those who act based on simple fixed rules and those who act based on adaptive rules.
The rational-actor model assumes that people make optimal choices given a payoff or utility function.
he relied on heuristics.
learning and higher stakes increase rationality has ample empirical and experimental support.6
Hyperbolic discounting has been put forward as a reason why people run up credit card debts, eat unhealthy foods, have unprotected sexual relations, and fail to save for retirement.
The zero intelligence rule accepts any offer that produces a higher payoff. It never takes a stupid (i.e., utility-reducing) action.
A seller following a zero intelligence rule would randomly pick a price above her value. A buyer would purchase any good with a price below her value. When we encode those behaviors in a computer model, we find that in markets zero-intelligence traders produce nearly efficient outcomes.
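A minimal simulation in the spirit of the zero-intelligence traders described above. The market structure here, random bilateral matching with budget-constrained random bids and asks, is an illustrative assumption, not the book's exact setup:

```python
import random

def zi_market(num_traders=100, max_value=100, rounds=10_000, seed=0):
    """Zero-intelligence double auction: random bids and asks constrained
    only so that no trader ever accepts a losing trade."""
    rng = random.Random(seed)
    buyer_values = [rng.uniform(0, max_value) for _ in range(num_traders)]
    seller_costs = [rng.uniform(0, max_value) for _ in range(num_traders)]
    buyers = list(range(num_traders))
    sellers = list(range(num_traders))
    surplus = 0.0
    for _ in range(rounds):
        if not buyers or not sellers:
            break
        b, s = rng.choice(buyers), rng.choice(sellers)
        bid = rng.uniform(0, buyer_values[b])          # never bid above value
        ask = rng.uniform(seller_costs[s], max_value)  # never ask below cost
        if bid >= ask:  # trade occurs; both sides exit the market
            surplus += buyer_values[b] - seller_costs[s]
            buyers.remove(b)
            sellers.remove(s)
    # best possible surplus: match highest values with lowest costs
    best = sum(v - c for v, c in zip(sorted(buyer_values, reverse=True),
                                     sorted(seller_costs)) if v > c)
    return surplus / best

efficiency = zi_market()
```

Even with no intelligence beyond the no-loss constraint, the simulated market captures a large share of the available surplus.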
adaptive rules require a utility or payoff function.
adaptive-rule models exhibit ecological rationality—better rules come to
predominate.
We have four options: equilibrium, cycles, randomness, or complexity.
if the model produces randomness at the macro level, the individuals probably cannot learn anything.
A similar logic applies to models that produce complex patterns. In these cases, we would assume that people continue to adapt new rules, but we would not necessarily assume that they can choose optimally.
confront complexity with an ensemble of simple rules.
models that produce cycles or equilibria create a stationary environment. We therefore might expect that people can learn—that no one would continually take a suboptimal action.
optimal behavior may be an unrealistic assumption, particularly in complex situations.
Given the uncertainties, we should err on the side of more models rather than fewer.
People are diverse, purposive, adaptive, biased, socially influenced, and possess a degree of agency.
Our aim is to construct many models that as an ensemble will be useful.
It tells us that when we add up or average random variables, we can expect to obtain a normal distribution.
Normal distributions all look identical, with approximately 68% of all outcomes lying within one standard deviation of the mean, 95% within two standard deviations, and more than 99% within three standard deviations.
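The 68/95/99.7 rule can be checked numerically by summing uniform random variables, which the central limit theorem says will be approximately normal (the sample sizes here are arbitrary):

```python
import random
import statistics

def clt_sample(n_sums=20_000, terms=12, seed=1):
    """Sum independent uniforms; by the central limit theorem the sums
    are approximately normally distributed."""
    rng = random.Random(seed)
    sums = [sum(rng.random() for _ in range(terms)) for _ in range(n_sums)]
    mu = statistics.fmean(sums)
    sigma = statistics.stdev(sums)
    # fraction of outcomes within k standard deviations of the mean
    within = lambda k: sum(abs(x - mu) <= k * sigma for x in sums) / n_sums
    return within(1), within(2), within(3)

w1, w2, w3 = clt_sample()
```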
To produce a long-tailed distribution requires non-independence, often in the form of positive feedbacks.
A power-law distribution5 defined over the interval [xmin, ∞) can be written as follows: p(x) = C · x^(−a), where the exponent a > 1 determines the length of the tail, and the constant term C ensures the distribution has a total probability of one.
The special case of power laws with an exponent equal to 2 is known as the Zipf distribution.
preferential attachment model,
self-organized criticality model,
forest fire model,
The key assumptions for self-organization to critical states are that pressure increases smoothly, like water flowing into the lake, and that pressure decreases in bursts, including possibly large events.
Empirical studies show that social effects create bigger winners.
long-tailed distributions arise because of feedbacks and interdependencies.
The probability that someone writes a patent correlates with that person’s mathematical abilities.
an increase in opportunities creates an incentive for risk
functional relationships between variables. That relationship could be linear, concave, convex, or S-shaped, or it could include threshold effects.
To prove causality, we need to run an experiment where we manipulate the independent variable and see if the dependent variable changes.
pay for skill; do not pay for luck. Better-run corporations do in fact pay less for luck.
skill differences are small, and thus luck matters.
big-coefficient thinking builds in conservatism.
the big coefficient becomes smaller as we try to exploit it.
Big coefficients are good. Evidence-based action is wise, but we must also keep our eyes open to big new ideas as well.
Most phenomena of interest are not linear.
growth and positive feedbacks produce convexity
diminishing returns and negative feedbacks produce concavity.
increasing slope:
exponential growth model,
A value of a resource at time t, Vt, that has an initial value of V0 and grows at a rate R can be written as follows: Vt = V0 · (1 + R)^t
half-life model,
If every H periods half of the remaining quantity decays, then after t periods the following holds: Proportion Remaining ≈ (1/2)^(t/H)
Our utility or value from almost all goods exhibits diminishing returns.
When we assume concavity, we imply a preference for diversity and risk aversion.
Convexity implies risk-loving:
Cobb-Douglas Model Given L workers and K units of capital, the total output equals: Output = Constant · L^a · K^(1−a), where a is a real number between 0 and 1 capturing the relative importance of labor.
Simple Growth Model
Production Function: O(t) = 100 · √M(t)
Investment Rule: I(t) = s · O(t)
Consumption-Investment Equation: O(t) = C(t) + I(t)
Machine Accumulation Equation: M(t + 1) = M(t) + I(t) − d · M(t)
O(t) = output, M(t) = machines, I(t) = investment, C(t) = consumption, s = savings rate, and d = depreciation rate
Solow* Growth Model Total output in the economy is given by the following equation: Output = A · √(L · K), where L denotes the amount of labor, K denotes the amount of physical capital, and A represents the level of technology. The long-run equilibrium output, O∗, is given by the equation:13 O∗ = A² · L · (s/d), where s and d denote the savings and depreciation rates.
The strong center establishes property rights and rule of law. Pluralism prevents capture by the elite, who often prefer the status quo and may not embrace innovation, which can be destructive.
A central takeaway from this chapter is that intuition becomes insufficient once we include nonlinearities.
Without models, we can usually infer what goes up and what goes down, but we lack understanding of the shape of functional relationships.
As a result, we often make linear extrapolations—
Cooperative Games A cooperative game consists of a set of N players and a value function that assigns a value to any subset S ⊆ N, V(S). These subsets are called coalitions. The value of the coalition consisting of no players equals zero, V(∅) = 0; the value of all N players, V(N), equals the total value of the game.
Shapley Value Given a cooperative game {N, V}, the Shapley value is defined as follows: let O represent all N! orderings in which the N players could arrive and be added to a group. For each ordering in O, define the added value of player i to be the change in the value function that occurs when player i is added. Player i’s Shapley value equals the average of her added values over all orderings in O.
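The definition above translates directly into code. The three-player “glove game” below is a hypothetical example, not from the text: player 0 holds a left glove, players 1 and 2 each hold a right glove, and a coalition is worth 1 if it can form a pair:

```python
from itertools import permutations

def shapley_values(players, value):
    """Average each player's added value over every arrival ordering."""
    totals = {p: 0.0 for p in players}
    n_orders = 0
    for order in permutations(players):
        coalition = frozenset()
        before = value(coalition)
        for p in order:
            coalition = coalition | {p}
            after = value(coalition)
            totals[p] += after - before  # player p's added value
            before = after
        n_orders += 1
    return {p: t / n_orders for p, t in totals.items()}

# hypothetical glove game: player 0 holds a left glove, players 1 and 2
# each hold a right glove; a coalition is worth 1 if it can form a pair
def glove_value(S):
    return 1 if 0 in S and (1 in S or 2 in S) else 0

vals = shapley_values([0, 1, 2], glove_value)
```

Player 0's Shapley value is 2/3, players 1 and 2 each get 1/6, and the full allocation axiom holds: the values sum to V(N) = 1.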
Shapley Value: Axiomatic Basis The Shapley value uniquely satisfies the following axioms: Zero property: If a player’s added value equals zero for any coalition, the player’s value equals zero. Fairness/Symmetry: If two players have the same added value for any coalition, then those players have the same value. Full allocation: The sum of the values of the players equals the total value of the game, V(N). Additivity: Given two games defined over the same set of players with the value functions V and V′, the value of a player in the game (V + V′) equals the sum of that player’s values in V and V′.
the probability of someone getting credit for an idea equals 1 divided by the number of people who propose the idea.
Adding extras makes existing members expendable and drives their last-on-the-bus (LOTB) values to zero. We see this in practice. Employers hire excess workers to reduce worker power. Manufacturing firms rely on multiple competing suppliers of intermediate goods. Governments award contracts to keep multiple contractors in business.
Network Statistics
Degree: The number of neighbors (also the number of edges) of a node.
Path length: The minimum number of edges that must be traversed to get from one node to another.
Betweenness: The number of paths of minimal length connecting two other nodes that pass through a node.
Clustering coefficient: The percentage of a node’s pairs of neighbors that are also connected by an edge.
Monte Carlo Method for Random Networks To test whether a network with N nodes and E edges is random, we create a large number of random networks with N nodes and E edges and calculate distributions for degree, path length, clustering coefficient, and betweenness. We then perform standard statistical tests to accept or reject the hypothesis that the network’s statistics could have been drawn from the simulated distributions.4
Quality and Degree Network Formation Model Create d disconnected nodes. In each period t create a new node with quality Qt drawn from a distribution F. Connect that node to d other nodes based on the degree of those nodes. If Dit denotes the degree of node i at time t, the probability of choosing node i given N nodes equals:
P(node i chosen) = (Qi + Dit) / ((Q1 + D1t) + (Q2 + D2t) + … + (QN + DNt))
The Friendship Paradox If any two nodes in a network differ in their degree, on average a node has lower degree than its neighbors. In other words, on average, people’s friends are more popular than they are.8
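A quick numerical check of the friendship paradox: compare the average degree with the average degree of a randomly chosen friend, i.e., an endpoint of a random edge. The Erdős–Rényi construction and parameters are illustrative assumptions:

```python
import random

def friendship_paradox_gap(n=200, p=0.05, seed=2):
    """Build an Erdos-Renyi random graph, then compare mean degree with
    the mean degree of a random friend (endpoint of a random edge)."""
    rng = random.Random(seed)
    degree = [0] * n
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                edges.append((i, j))
                degree[i] += 1
                degree[j] += 1
    mean_degree = sum(degree) / n
    # a random friend is a uniformly chosen endpoint of a uniform edge
    friend_degrees = ([degree[i] for i, j in edges] +
                      [degree[j] for i, j in edges])
    mean_friend_degree = sum(friend_degrees) / len(friend_degrees)
    return mean_degree, mean_friend_degree

md, mfd = friendship_paradox_gap()
```

Because high-degree nodes appear on more edges, the random friend's expected degree exceeds the population's mean degree whenever degrees differ.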
These random friends might also be thought of as weak ties—people who connect you to other communities of people. Our weak ties, the random friends in our network, play an important informational role by connecting communities with diverse interests and information. Hence, sociologists speak of the strength of weak ties.
Six Degrees of Separation Assume each node has 100 clique friends (C), all of whom are friends with one another, and 20 random friends (R), who have no friends in common with the node.
Degree one: C + R = 120
Degree two: CR + RC + RR = 2000 + 2000 + 400 = 4400
Degree three: CRC + CRR + RCR + RRC + RRR = 328,000
Degree four: 17,360,000
Degree five: > 1 billion
Degree six: > 20 billion
The number of friends of degree three, their diversity, and their relative proximity make them an important asset. They can provide new information and job opportunities. These are the people most likely to help a person find a job, facilitate a move to a new city, or become a life or business partner.
Broadcast Model It+1 = It + Pbroad · St where Pbroad denotes the broadcast probability, and It and St equal the number informed and susceptible at time t.
Initially, I0 = 0 and S0 = NPOP.
Fitting the Broadcast Model to Data
Period 1: I1 = 20,000 = Pbroad · NPOP
Period 2: I2 = 36,000 = 20,000 + Pbroad · (NPOP − 20,000)
Solution:2 Pbroad = 0.2 and NPOP = 100,000
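The solution follows by solving the two equations by hand; a small sketch, assuming only the two data points given:

```python
def fit_broadcast(i1, i2):
    """Solve I1 = P * NPOP and I2 = I1 + P * (NPOP - I1) for P and NPOP.
    From the second equation, I2 - I1 = P * (NPOP - I1) = I1 - P * I1,
    so P = (2 * I1 - I2) / I1 and NPOP = I1 / P."""
    p = (2 * i1 - i2) / i1
    npop = i1 / p
    return p, npop

p, npop = fit_broadcast(20_000, 36_000)
```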
Diffusion Model It+1 = It + Pdiffuse · St · (It / NPOP), where Pdiffuse = Pspread · Pcontact.
Bass Model It+1 = It + Pbroad · St + Pdiffuse · St · (It / NPOP), where Pbroad = probability of broadcast and Pdiffuse = probability of diffusion.
SIR model (susceptible, infected, recovered),
SIR Model
St+1 = St − Pcontact · Pspread · St · (It / NPOP)
It+1 = It + Pcontact · Pspread · St · (It / NPOP) − Precover · It
Rt+1 = Rt + Precover · It
where Pspread, Pcontact, and Precover equal the probability of spreading the disease, the probability of contact, and the probability of recovery.
A disease with an R0 (the basic reproduction number, Pcontact · Pspread / Precover) greater than 1 can spread through the population. Diseases with R0’s less than 1 dissipate.
Estimates of R0 do not assume that people change their behavior in response to a disease.
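A discrete-time sketch of SIR difference equations with illustrative, not empirical, parameter values:

```python
def sir_step(S, I, R, npop, p_contact, p_spread, p_recover):
    """One period: infections require contact between susceptible and
    infected people; a fraction of the infected recover each period."""
    new_infections = p_contact * p_spread * S * I / npop
    recoveries = p_recover * I
    return S - new_infections, I + new_infections - recoveries, R + recoveries

def r0(p_contact, p_spread, p_recover):
    """Basic reproduction number."""
    return p_contact * p_spread / p_recover

# illustrative parameters: R0 = 0.5 * 0.5 / 0.1 = 2.5 > 1,
# so the disease spreads and then burns out
npop, S, I, R = 10_000, 9_990, 10, 0
peak = I
for _ in range(300):
    S, I, R = sir_step(S, I, R, npop, 0.5, 0.5, 0.1)
    peak = max(peak, I)
```

With R0 above 1 the infected count surges well past its initial level before dissipating, and a substantial share of the population ends up in the recovered state.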
Information is the resolution of uncertainty.
Low entropy corresponds to low uncertainty and little information being revealed.
We can also use entropy to distinguish between the four classes of outcomes: equilibrium, periodicity, complexity, and randomness.
In the absence of a controlling or regulating force, some populations may drift toward maximal entropy.
Information Entropy Given a probability distribution (p1, p2,…, pN), the information entropy, H2, equals: H2 = −(p1 · log2(p1) + p2 · log2(p2) + … + pN · log2(pN)) Note: the subscript 2 denotes the use of the base 2 logarithm.
Axiomatic Foundations: Entropy The above class of entropy measures uniquely satisfies the following four axioms:
Symmetric, continuous function: H(σ(p)) = H(p) for any σ that permutes the probabilities.
Maximization: H(p) is maximized at pi = 1/N for all i.
Zero Property: H(1, 0, 0,…, 0) = 0.
Decomposability: If the outcomes are partitioned into categories, entropy equals the entropy of the distribution across categories plus the weighted average of the entropies within the categories.
Wolfram’s four classes: equilibrium, cyclic (periodic), random, and complex.
Equilibrium outcomes have no uncertainty, and therefore, have an entropy equal to zero.
Cyclic (or periodic) processes have low entropy that does not change with time, and perfectly random processes have maximal entropy.
Complexity has intermediate entropy—it lies between ordered and random.
Thus, if we are writing a model that assumes a distribution of website hits or market shares, in the absence of data an exponential distribution is a natural assumption.
Maximal Entropy Distributions Uniform distribution: Maximizes entropy given a range, [a, b]. Exponential distribution: Maximizes entropy given a mean, μ. Normal distribution: Maximizes entropy given a mean, μ, and a variance, σ2.
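Entropy in bits is a one-line computation; the examples below confirm that the uniform distribution maximizes it and that a degenerate (equilibrium) distribution has entropy zero:

```python
from math import log2

def entropy(ps):
    """Information entropy in bits; zero-probability outcomes
    contribute nothing to the sum."""
    return -sum(p * log2(p) for p in ps if p > 0)

uniform4 = entropy([0.25] * 4)           # maximal for 4 outcomes: 2 bits
certain = entropy([1, 0, 0, 0])          # equilibrium: 0 bits
skewed = entropy([0.7, 0.1, 0.1, 0.1])   # intermediate
```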
maximizing entropy given constraints results in a normal distribution. So, when we see a normal distribution, it could be the result of entropy maximization.
The architect Christopher Alexander shows how geometric properties such as strong centers, thick boundaries, and non-separateness can produce complex, living buildings, neighborhoods, and cities.
We learn that one-dimensional and two-dimensional random walks return to their starting point infinitely often, while a three-dimensional random walk need not return home at all.
We also learn that the time between returns to zero for a one-dimensional random walk will follow a power-law distribution. This finding, which we might be tempted to dismiss as a mathematical curiosity, can explain the life spans of species and firms.
Bernoulli Urn Model Each period, a ball is randomly drawn from an urn containing G gray and W white balls. The outcome equals the ball’s color. The ball is returned to the urn prior to the next period’s draw. Let P = G/(G + W) denote the proportion of gray balls. Given N draws, we can calculate the expected number of gray balls chosen, NG, and its standard deviation, σNG: NG = P · N and σNG = √(N · P · (1 − P))
A Simple Random Walk Vt = Vt−1 + R(−1, 1), where Vt denotes the value of the random walk at time t, V0 = 0, and R(−1, 1) is a random variable that is equally likely to equal −1 or 1. The expected value of a random walk in any period equals zero, and its standard deviation equals √t, where t equals the number of periods.7
In one and two dimensions, a random walk returns to its origin infinitely many times. In three dimensions, it wanders off forever.
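Both claims about the simple random walk, mean zero and standard deviation √t, can be checked by simulation (trial counts are arbitrary):

```python
import random
import statistics

def random_walk_final(t, rng):
    """Final value of a simple random walk after t steps of +1/-1."""
    return sum(rng.choice((-1, 1)) for _ in range(t))

def walk_stats(t=400, trials=5_000, seed=3):
    rng = random.Random(seed)
    finals = [random_walk_final(t, rng) for _ in range(trials)]
    return statistics.fmean(finals), statistics.pstdev(finals)

walk_mean, walk_sd = walk_stats()  # expect mean near 0, sd near sqrt(400) = 20
```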
If the economy grew by 3% per year, in half a century, the economy would increase 4-fold.
What appears to be a trend might well be random.
The Polya Process An urn contains one white ball and one gray ball. Each period a ball is drawn randomly and returned to the urn along with an additional ball of the same color as the one drawn. The color of the ball drawn denotes the outcome.
The Balancing Process An urn contains one white ball and one gray ball. Each period a ball is drawn randomly and returned to the urn along with an additional ball of the color opposite to the color drawn. The color of the ball denotes the outcome.
The balancing process captures sequences of decisions or actions that include pressures toward equal allocation.
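Simulating both urn processes side by side shows the contrast: balancing runs all settle near one-half, while Polya runs settle at widely different proportions (period and seed counts are arbitrary):

```python
import random

def run_urn(periods, reinforce, seed):
    """Urn starts with one gray and one white ball. If reinforce is True,
    add a ball of the drawn color (Polya); if False, add a ball of the
    opposite color (balancing). Returns the final gray proportion."""
    rng = random.Random(seed)
    gray, white = 1, 1
    for _ in range(periods):
        drew_gray = rng.random() < gray / (gray + white)
        if drew_gray == reinforce:
            gray += 1
        else:
            white += 1
    return gray / (gray + white)

balancing = [run_urn(2_000, False, s) for s in range(20)]
polya = [run_urn(2_000, True, s) for s in range(20)]
spread_balancing = max(balancing) - min(balancing)
spread_polya = max(polya) - min(polya)
```

Each Polya run converges to its own long-run proportion, which can be anywhere in (0, 1); each balancing run is pulled back toward one-half.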
The local majority model always converges to an equilibrium, while the Game of Life, depending on its initial configuration, can produce any class of outcome: equilibria, cycles, complexity, or randomness.
Local Majority Model Each cell on a two-dimensional square grid is in one of two states: on or off. Each cell has eight neighbors: the cells adjacent to it horizontally, vertically, and diagonally.2 In each period, a cell is chosen randomly.3 The cell changes its state if and only if five or more of its neighbors are in the other state.
Pure Coordination Games In a pure coordination game, each player chooses one of two actions, A or B. If both players choose the same action, each receives a payoff of 1. If they choose different actions, each receives a payoff of zero.

        A       B
A     1, 1    0, 0
B     0, 0    1, 1

A pure coordination game has two efficient equilibria: both players choose A or both players choose B. It also has an inefficient equilibrium, in which each player randomizes between A and B. We can reinterpret the local majority model with each cell being a player who must choose a common action to play against her eight neighbors. If players can change their action only when randomly activated, a player could increase her payoff by choosing the action that matches a majority of her neighbors’ actions. Such a strategy is called a myopic best response because it does not take into account the likely future actions of the neighbors. A player with five neighbors who have chosen B could increase her payoff in the short term by switching from A to B, but if the player and her neighbors are surrounded by a sea of other players choosing A, then she might have a higher expected payoff by staying with A. The key takeaway is that the behavioral rule in the local majority model, though an assumed rule, can be rooted in a game theoretic model.
The Paradox of Coordination
If people coordinate locally, then global configurations will be patchy and diverse.
The Game of Life Each cell on a two-dimensional square grid is either alive (on) or dead (off). Each cell’s neighbors consist of the eight adjacent cells on the grid. Cells update their states synchronously using two rules: Life rule: A dead cell with exactly three live neighbors becomes alive. Death rule: A live cell with fewer than two or more than three live neighbors dies.
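The two rules fit in a few lines using a set of live-cell coordinates; the classic “blinker” pattern demonstrates a cycle of period two:

```python
from collections import Counter

def life_step(live):
    """One synchronous update: count the live neighbors of every cell,
    then apply the life and death rules."""
    counts = Counter((x + dx, y + dy)
                     for x, y in live
                     for dx in (-1, 0, 1)
                     for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # born with exactly 3 live neighbors; survive with 2 or 3
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

blinker = {(0, 0), (1, 0), (2, 0)}  # three live cells in a row
step1 = life_step(blinker)  # flips to a vertical row of three
step2 = life_step(step1)    # flips back: a cycle of period two
```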
The beauty of mathematics only shows itself to more patient followers.
Lyapunov Theorem Given a discrete time dynamical system consisting of the transition rule xt+1 = G(xt), the real-valued function F(xt) is a Lyapunov function if F(xt) ≥ M for all xt and if there exists an A > 0 such that F(xt+1) ≤ F(xt) − A whenever xt+1 ≠ xt. If F is a Lyapunov function for G, then starting from any x0, there exists a t∗, such that G(xt∗) = xt∗, and the system attains an equilibrium in finite time.
Self-Organizing Activities Model A city offers A activities. Each day consists of L time periods. Each person in a large population of size M chooses a routine: an order in which to participate in L of the A activities across the L time periods. A person’s congestion equals the number of other people who choose the same activities as her at the same times.
Any Markov model with a finite set of states, fixed transition probabilities between them, the potential to move from any state to any other in a
series of transitions, and no fixed cycles between states converges to a unique equilibrium.
Perron-Frobenius Theorem A Markov process converges to a unique statistical equilibrium provided it satisfies four conditions: Finite set of states: S = {1, 2,…, K}. Fixed transition rule: The probabilities of moving between states are fixed, for example, the probability of transitioning from state A to state B equals P(A, B) in every period. Ergodicity (state accessibility): The system can get from any state to any other through a series of transitions.
Noncyclic: The system does not produce a deterministic cycle through a sequence of states.
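A sketch of convergence to a unique statistical equilibrium, using a hypothetical two-state process whose transition probabilities are invented for illustration:

```python
def markov_step(dist, P):
    """One transition: new_dist[j] = sum over i of dist[i] * P[i][j]."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def iterate(P, start, iters=200):
    dist = start[:]
    for _ in range(iters):
        dist = markov_step(dist, P)
    return dist

# hypothetical two-state process: state 0 stays with probability 0.8,
# state 1 switches to state 0 with probability 0.6
P = [[0.8, 0.2],
     [0.6, 0.4]]
from_state0 = iterate(P, [1.0, 0.0])
from_state1 = iterate(P, [0.0, 1.0])
```

Both starting points converge to the same distribution, (0.75, 0.25), illustrating that the initial state does not matter when the theorem's conditions hold.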
That leaves the restriction to fixed transition probabilities between states as the assumption least likely to hold. Thus, the model says that when history matters, underlying structural forces must change transition probabilities (or change the set of states).
If you have low brand loyalty, you tend to have low sales.
If we think of PageRank as an algorithm, we realize that we can use it to produce rankings of any network.
modeler’s selection of the states proves critical. The choice of states determines the transition probabilities between those states.
A Markov decision model amends a Markov model by including actions.
Using systems dynamics models, we can often identify the causes of complexity. When a system includes both positive and negative feedbacks, it can produce complexity.
Lotka-Volterra Model An ecosystem consists of H hares and F foxes. The population of hares grows at rate g and the population of foxes dies off at rate d. When hares and foxes meet, hares die off at rate a and foxes grow at rate b. These assumptions produce the following differential equations:3
dH/dt = g · H − a · H · F
dF/dt = b · H · F − d · F
These equations have an extinction equilibrium (F = H = 0), as well as an interior equilibrium given by the equations H∗ = d/b and F∗ = g/a.
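A simple Euler-method sketch of the dynamics dH/dt = g·H − a·H·F and dF/dt = b·H·F − d·F (the parameter values are illustrative):

```python
def lotka_volterra(h0, f0, g=0.5, d=0.5, a=0.01, b=0.01, dt=0.01, steps=5_000):
    """Euler integration of the hare-fox system."""
    H, F = h0, f0
    path = [(H, F)]
    for _ in range(steps):
        dH = (g * H - a * H * F) * dt
        dF = (b * H * F - d * F) * dt
        H, F = H + dH, F + dF
        path.append((H, F))
    return path

# interior equilibrium: H* = d/b = 50, F* = g/a = 50
eq = lotka_volterra(50, 50)      # starts at equilibrium, stays there
cycle = lotka_volterra(80, 30)   # starts off-equilibrium, orbits around it
```

Started exactly at the interior equilibrium the populations never move; started anywhere else, hares and foxes chase each other around it in cycles.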
agent-based models—
In a threshold-based model, an individual takes one of two actions, depending on whether an aggregate variable exceeds a threshold.
The business succeeded because the founders were able to bootstrap a sufficient number of initial renters so that a double riot ensued. They constructed the tail, and the tail wagged the dog.
Schelling’s Party Model Each of N individuals has an observable type A or B. Each person randomly chooses one of two rooms. At each moment a person moves to the other room with probability p. Person i has a tolerance threshold Ti and leaves her room if the percentage of people in the room of her type falls below that threshold.
(i) women who exit a profession choose a new profession with more women, and (ii) women leave professions at a higher rate than men.
Schelling’s Segregation Model N individuals, each of whom has a type A or B, are randomly arranged on an M-by-M checkerboard with room for open spaces. Each person i has a tolerance threshold, Ti, and relocates to a random new location if the percentage of the people of her same type on the eight neighboring squares falls below her threshold.
Ping-Pong Model Each entity in a population of size N randomly takes an initial positive (+1) or negative (−1) action. The initial state of the system, S0, is set equal to zero. All future states of the system, St, equal the average action plus a random variable εt drawn from {−1, +1}: St = (A1(t) + A2(t) + … + AN(t))/N + εt. Each entity i has a response threshold Ti > 0 drawn uniformly from the interval [0, RANGE]. An entity takes the same action as before if the magnitude of the state, |St|, is less than its threshold and takes an action to reduce the magnitude of the state otherwise: If |St| ≤ Ti, Ai(t + 1) = Ai(t); otherwise, Ai(t + 1) = −sign(St).
system with only positive feedbacks will either
blow up or collapse. A system with only negative feedbacks will either stabilize or cycle. A system with both positive feedbacks and negative feedbacks has the potential to produce complexity.
We assume that each person holding this asset has a crash threshold. If the price of the asset falls more than the crash threshold in a given day, the investor sells the asset, taking her money out of the market. This rule captures the behavior of trend or noise traders and creates a version of the riot model.
Spatial Competition Model An alternative consists of N spatial attributes: a = (a1, a2,…, aN). An individual is represented by an ideal point: x = (x1, x2,…, xN). The payoff (utility) to an individual from an alternative equals π(x, a) = C − (x1 − a1)² − (x2 − a2)² − … − (xN − aN)², where C > 0 is a constant. Example: x = (3, 4, 6), a = (2, 1, 8), C = 20: π(x, a) = 20 − (3 − 2)² − (4 − 1)² − (6 − 8)² = 6
The multiple cut lines carve up the space of ideal points into three regions, known as Voronoi neighborhoods,
If we continue to apply this logic, we see that candidates should converge on that point. This result is known as the median voter theorem.
Hedonic Competition Model An alternative consists of N valence attributes: v = (v1, v2,…, vN). Individual preferences are captured by weights w = (w1, w2,…, wN) assigned to the attributes. The payoff (utility) to an individual from an alternative equals π(w, v) = w1 · v1 + w2 · v2 + … + wN · vN. Example: w = (3, 1, 2), v = (4, 2, 5): π(w, v) = 4 · 3 + 2 · 1 + 5 · 2 = 24.
The key will be to distinguish between crowded markets, with a large number of products in a low-dimensional attribute space, and a sparse market, where there are few competitors.
market competition creates an incentive for differentiation,
Hedonic attribute model: This model explains a good’s value based on intrinsic attributes.
Coordination model: This model explains prices as socially constructed.
Predictive models: This model explains prices as forecasts of future value.
A Nash equilibrium of a game is a pair of strategies such that each player’s strategy is optimal given the strategy of the other player.
Sports are zero-sum: one team (or player) wins and one loses.
Any non-randomness can be exploited.
iterative elimination of dominated strategies.
the unique Nash equilibrium.
backward induction: we start at the end nodes and choose the optimal action at each.
We may want to compete to try to deter entry in the other markets.
The Effort Game Each of N players chooses an effort level expressible in monetary terms to win a prize of value M. The probability that a player wins the prize equals her effort divided by the total effort of all players. If Ei equals the effort level of player i, her probability of winning is given by the following equation:1 P(i wins) = Ei / (E1 + E2 + … + EN) Equilibrium: Ei = M · (N − 1) / N²
We can see the effects on individual and total effort by increasing the number of players. Here, the findings are less intuitive. According to the model, individual players’ effort levels decrease but the total effort by all players increases. Thus, the model implies that efforts by organizers of research grant opportunities, architectural competitions, and essay contests to attract large numbers of entrants may, paradoxically, produce lower-quality winners because in the larger contests, participants have less incentive to put in effort.
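The comparative statics can be checked directly, assuming the standard symmetric contest solution in which each player exerts effort M·(N − 1)/N²:

```python
def equilibrium_effort(n_players, prize):
    """Symmetric equilibrium of the effort game, assuming each player's
    effort equals prize * (N - 1) / N**2 (the standard contest solution)."""
    individual = prize * (n_players - 1) / n_players**2
    total = n_players * individual  # equals prize * (N - 1) / N
    return individual, total

ind2, tot2 = equilibrium_effort(2, 100)
ind10, tot10 = equilibrium_effort(10, 100)
```

With two players each exerts 25 for a total of 50; with ten players each exerts only 9, yet total effort rises to 90, matching the claim that individual effort falls while total effort grows.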
By looking across time, we can discern if people change their behaviors to fit in with their friends (peer effect), or if they change their friends and retain their behaviors (sorting).
four mechanisms that enable cooperation: repetition, reputation, local clustering, and group selection.
The game can then be expressed with three variables: a reward, R, from cooperating, a temptation, T, to defect, and a sucker’s payoff, S, from being exploited
Repetition Maintains Cooperation In the repeated Prisoners’ Dilemma, Grim Trigger maintains cooperation if the probability of continued play, P, exceeds the ratio of the difference between the temptation payoff, T, and the reward payoff, R, to the temptation payoff:5 P > (T − R) / T
Each of these implications reveals an intuitive route to more cooperation: increase the reward, make continued interaction more probable, and reduce the temptation to defect.
We can extend the model to include a community of people who monitor the behavior of one another and punish people who deviate.
emergence or evolution of cooperation.
evolve is that the payoff from cooperating exceeds the payoff from defecting
To understand how bootstrapping could occur, we need more elaborate models that allow for local learning, evolution, and group selection. We turn to those now.
Cooperative Action Model A population of N individuals consists of cooperators and defectors connected in a network. Cooperation incurs a cost C and produces a benefit B to the other player for each interaction. Defecting produces no cost or benefit. The ratio of cooperative advantage, B/C, captures the potential gains from cooperation.
Clustering Bootstraps Cooperation If the neighbors of an open node include a cooperator of degree D with K cooperating neighbors and all non-cooperators of the empty node have no cooperating neighbors, then the open node becomes a cooperator if and only if the ratio of cooperative advantage exceeds the ratio of the degree to the number of cooperating neighbors:13 B/C > D/K
individual selection favors defection but group selection favors cooperation.
And an individual working at a firm may fare worse by building talents useful only to her current employer, yet if she does, her firm will be able to outcompete others.
Splitting employees into teams that compete against one another and allocating bonuses and opportunities based on team performance creates the possibility for inducing cooperative behavior.
To avoid endless punishments following a mistake, other strategies—such as Win Stay, Lose Shift—are more forgiving.
as we study cooperation, we should keep in mind that it need not be for the common good.
The provision of public education, physical and mental health care, infrastructure, public safety, a justice system, and national defense are all collective action problems, as are managing global fisheries, combating climate change, and in particular reducing the amount of carbon in the atmosphere.
A Collective Action Problem In a collective action problem, each of N individuals chooses to free ride (f) or contribute (c) to a collective action. An individual’s payoff depends on her own action and the total number of cooperators. Individuals receive a higher payoff from free riding, Payoff(f, C) > Payoff(c, C +1), but the sum of payoffs is maximized when everyone contributes.
A Public Good Provision Problem N people each allocate an income I > N between a public good (PUBLIC) and a private good (PRIVATE) that each cost $1 per unit. Each person has the following utility function: Utility = 2 · √PUBLIC + PRIVATE
Socially optimal allocation: PUBLIC = N² (if N = 100, each person contributes $100).
Equilibrium allocation: PUBLIC = 1 (if N = 100, each person contributes $0.01).4
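A numeric check, assuming each person's utility equals 2·√PUBLIC + PRIVATE (a functional form consistent with the stated allocations; the income level is an illustrative assumption):

```python
from math import sqrt

def total_utility(x_each, n=100, income=200):
    """Everyone contributes x_each; utility_i = 2*sqrt(PUBLIC) + PRIVATE_i."""
    public = n * x_each
    return n * (2 * sqrt(public) + income - x_each)

def best_response(others_total, income=200):
    """A self-interested person contributes until the marginal return
    1/sqrt(PUBLIC) falls to 1, i.e., until total PUBLIC reaches 1."""
    return max(0.0, min(income, 1.0 - others_total))

# social optimum: each contributes $100 (PUBLIC = N**2 = 10,000);
# equilibrium: total PUBLIC = 1, so each contributes only $0.01
```

The check confirms the gap: total utility peaks when each person contributes $100, yet no individual will contribute more than what brings total provision to 1.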
Public Good Provision Among Altruists N people have altruistic preferences with weight α on aggregate utility:
Equilibrium pure altruists (α = 1): PUBLIC = N²
Equilibrium general solution:6 PUBLIC = (1 + α · (N − 1))²
(For intermediate values of α, equilibrium PUBLIC lies between 1 and N².)
A Congestion Model M of N people choose to use a resource. Their utility can be written as follows: Utility(M) = B − θ · M, where B denotes the maximal benefit, and θ is a congestion parameter. The remaining (N − M) people abstain and receive utility of zero.9
Socially optimal: M = B/(2θ), creating total utility B²/(4θ)
Nash equilibrium: M = B/θ, Utility(B/θ) = 0
Multiple Congestible Goods M people go to Park 1 and (N − M) go to Park 2. To account for Park 2 being larger, utilities are as given below:10
Park 1: Utility(M) = N − M
Park 2: Utility(N − M) = 3N − 3 · (N − M)
Socially optimal: M = N/2, creating total utility N²
Nash equilibrium: M = N/4, creating total utility 3N²/4
Renewable Resource Extraction Model Let R(t) denote the amount of a renewable resource at the start of period t. Let C(t) equal the total amount consumed in period t, and g denote the growth rate of the resource. The amount of the resource in period t + 1 is given by the following difference equation: 11
R(t + 1) = (1 + g)[R(t) − C(t)] The equilibrium consumption level, which holds the resource stock constant: C* = g · R/(1 + g)
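The equilibrium level follows from setting R(t + 1) = R(t). A quick sketch, with assumed values g = 0.1 and R(0) = 1000:

```python
g, R0 = 0.1, 1000.0   # assumed growth rate and initial stock

def step(R, C):
    # R(t+1) = (1 + g) * (R(t) - C(t))
    return (1 + g) * (R - C)

# Setting R(t+1) = R(t) gives the equilibrium consumption C* = g*R/(1+g)
C_star = g * R0 / (1 + g)
R = R0
for _ in range(50):
    R = step(R, C_star)
assert abs(R - R0) < 1e-6       # consuming C* holds the stock constant

R = R0
for _ in range(50):
    R = step(R, C_star + 1)
assert R < R0                   # consuming more than C* depletes the stock
```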
Pareto Efficiency Within a set of outcomes, an outcome is Pareto dominated if there exists an alternative that everyone prefers. All other outcomes are Pareto efficient.2
Revenue Equivalence Theorem Any auction in which bidders have independent private values drawn from a known, common distribution produces the same expected revenue for the seller and the same expected payoffs for the buyers, provided that each bidder makes a bid that maximizes her expected payoff, the bidder with the highest bid always wins the object, and a bidder who has a value of zero has an expected payoff of zero.
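The theorem can be illustrated by simulation. The sketch below assumes five bidders with values uniform on [0, 1] and the standard equilibrium bid of (n − 1)/n times one's value in the first-price auction; both formats then yield the same expected revenue.

```python
import random
random.seed(0)

def expected_revenues(n=5, trials=100_000):
    first = second = 0.0
    for _ in range(trials):
        values = sorted(random.random() for _ in range(n))
        # Second-price auction: the winner pays the second-highest value.
        second += values[-2]
        # First-price auction: with values uniform on [0, 1], the
        # equilibrium bid is (n - 1)/n times one's value.
        first += (n - 1) / n * values[-1]
    return first / trials, second / trials

f, s = expected_revenues()
assert abs(f - s) < 0.01  # both approach (n - 1)/(n + 1) = 2/3
```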
A Public Project Decision Problem Let (V1, V2,···, VN) denote the monetary values that N people attach to a public project with cost C. The project should be undertaken if and only if C < V1 + V2 + ··· + VN
Majority-Vote Equal Sharing Individuals vote for or against undertaking the project. If a majority vote for the project, the project is undertaken and each pays a cost C/N. As the following example shows, this mechanism can violate efficiency and voluntary participation.
The Pivot Mechanism
Individual i submits a valuation for a project of cost C. If the sum of the individual valuations exceeds the cost, then the project is undertaken. Individual i pays no tax if the valuations submitted by the others sum to at least C, and otherwise pays a tax equal to C minus the sum of the other valuations. The mechanism is incentive compatible (each individual’s dominant strategy is to submit her true valuation), efficient, and individually rational. It also implements the efficient outcome in dominant strategies. As the following example shows, this mechanism can violate budget balance: Example: (V1, V2, V3) = (60, 120, 150) and C = 300. The project should be undertaken given that 300 < 60 + 120 + 150. Individual 1 pays taxes of 30, the cost minus the sum of the other valuations (300 − 270); individual 2 pays taxes of 90; and individual 3 pays taxes of 120. The taxes sum to 240, less than the cost of the project.
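The example can be verified in a few lines; `pivot_mechanism` is a hypothetical helper name.

```python
def pivot_mechanism(values, cost):
    # Undertake the project if the reported valuations cover the cost;
    # each person pays only the shortfall left by the others (if any).
    if sum(values) < cost:
        return False, [0] * len(values)
    taxes = [max(0, cost - (sum(values) - v)) for v in values]
    return True, taxes

undertaken, taxes = pivot_mechanism([60, 120, 150], 300)
assert undertaken and taxes == [30, 90, 120]
assert sum(taxes) == 240   # budget balance fails: 240 < 300
```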
Honest people don’t hide their deeds.
Thorstein Veblen introduced the concept of conspicuous consumption: he observed that rather than buy goods that bring direct enjoyment or practical utility, people often make choices to signal their social status.
For signals to function, they must be costly or verifiable.
Discrete Signals Model A population of size N consists of S strong types and W weak types whose costs of sending a signal are c and C respectively, with c < C. Those members of the population who send the signal equally divide a benefit of B > 0. The model has three possible outcomes: Pooling (C < B/N): Both types signal. Separating (c < B/S and B/S < C): Only strong types signal. Partial pooling (c < B/N < C < B/S): Strong types and a fraction of the weak types signal.
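A short classifier makes the three outcomes concrete. The thresholds below follow from the benefit shares (B/N if everyone signals, B/S if only strong types signal); treat both the boundaries, which are reconstructed here, and the parameter values as assumptions.

```python
def signaling_outcome(B, S, W, c, C):
    # Classify the equilibrium of the discrete signals model.
    # B/N = share if all signal; B/S = share if only strong types signal.
    N = S + W
    if C < B / N:
        return "pooling"            # even weak types profit from signaling
    if c < B / S < C:
        return "separating"         # only strong types signal
    if c < B / N < C < B / S:
        return "partial pooling"    # strong plus a fraction of weak signal
    return "no signaling"

assert signaling_outcome(B=100, S=10, W=90, c=0.1, C=0.5) == "pooling"
assert signaling_outcome(B=100, S=10, W=90, c=1, C=20) == "separating"
assert signaling_outcome(B=100, S=10, W=90, c=0.5, C=5) == "partial pooling"
```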
Separation with Continuous Signals A population of size N consists of S strong types and W weak types with per-unit costs of signaling c and C > c, respectively. The individuals who send the largest signal split a benefit B. Any signal of magnitude at least B/C separates the strong types, as mimicking it would cost a weak type at least the entire benefit. If CW ≥ cN, then all of the strong types separate. If not, a partial pooling equilibrium exists in which a portion of the strong types signal.2
We take actions to signal our fitness, wealth, intelligence, and generosity.
Spending money to signal product quality is sometimes referred to as burning money. Burning money attracts buyers much as the peacock’s feathers attract mates.
In a game, an action’s payoff depends on the action of the other player or players. In that setting, both learning rules can favor risk-dominant equilibrium outcomes over efficient ones.
Reinforcement Learning
That extra capacity allows humans to consider counterfactuals when learning, a phenomenon left out of the reinforcement learning model.
Edward Thorndike found that cats learned faster when he increased the reward. He called this the law of effect.2 This finding has a neurological explanation. Repetition of an activity builds neurological pathways that induce that same behavior in the future.
Rewards that far exceed past or expected outcomes produce faster learning in people.
A Reinforcement Learning Model A collection of alternatives {A, B, C, D,…, N} have associated rewards {π(A), π(B), π(C), π(D),…, π(N)} and a set of strictly positive weights {w(A), w(B), w(C), w(D),…, w(N)}. The probability of choosing alternative K is as follows: P(K) = w(K)/[w(A) + w(B) + ··· + w(N)]. After choosing K, w(K) increases by γ · P(K) · (π(K) − A), where γ > 0 equals the rate of adjustment and A < maxK π(K) equals the aspiration level.5
Reinforcement Learning Works In the learning-the-best-alternative framework, reinforcement learning with the aspiration level set equal to the average earned reward (eventually) almost always selects the best alternative.
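This result can be sketched in a simulation with assumed rewards and rate of adjustment. Following the statement above, the aspiration level is tracked as the running average of earned rewards, and a small floor keeps every weight strictly positive.

```python
import random
random.seed(1)

rewards = {"A": 1.0, "B": 2.0, "C": 5.0}   # assumed rewards; C is best
weights = {k: 1.0 for k in rewards}
gamma = 0.05                               # assumed rate of adjustment
aspiration, t = 0.0, 0

def choose():
    # sample an alternative with probability proportional to its weight
    r = random.random() * sum(weights.values())
    for k, w in weights.items():
        r -= w
        if r <= 0:
            return k
    return k

for _ in range(20_000):
    k = choose()
    p_k = weights[k] / sum(weights.values())
    # w(K) moves by gamma * P(K) * (reward - aspiration); a small floor
    # keeps every weight strictly positive
    weights[k] = max(1e-6, weights[k] + gamma * p_k * (rewards[k] - aspiration))
    t += 1
    aspiration += (rewards[k] - aspiration) / t  # running average reward

assert max(weights, key=weights.get) == "C"
```

Once the aspiration rises above the rewards of the inferior alternatives, their weights shrink and nearly all of the probability shifts to the best alternative.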
The most widely studied model of social learning, replicator dynamics, assumes that the probability of taking an action depends on the product of its reward and its popularity.
Replicator Dynamics A collection of alternatives {A, B, C, D,…, N} have associated rewards {π(A), π(B), π(C), π(D),…, π(N)}. The actions of a population at time t can be written as a probability distribution across the N alternatives: (Pt(A), Pt(B),…, Pt(N)). The probability distribution changes according to the replicator equation: Pt+1(K) = Pt(K) · π(K)/π̄t, where π̄t = Pt(A) · π(A) + Pt(B) · π(B) + ··· + Pt(N) · π(N) equals the average reward in period t.
Replicator dynamics produce less path dependence than reinforcement learning.
Replicator Dynamics Learns the Best In learning the best from a finite set of alternatives, replicator dynamics with an infinite population converges to the entire population choosing the best alternative.
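The replicator equation can be iterated directly; with assumed rewards, the distribution converges to the best alternative.

```python
rewards = {"A": 1.0, "B": 2.0, "C": 5.0}   # assumed rewards; C is best
P = {k: 1 / 3 for k in rewards}            # initially uniform population

def replicate(P):
    # P_{t+1}(K) = P_t(K) * pi(K) / (average reward)
    avg = sum(P[k] * rewards[k] for k in P)
    return {k: P[k] * rewards[k] / avg for k in P}

for _ in range(100):
    P = replicate(P)

assert abs(P["C"] - 1.0) < 1e-3            # the best alternative takes over
assert abs(sum(P.values()) - 1.0) < 1e-9   # still a probability distribution
```

Because each share is multiplied by its reward relative to the average, the highest-reward alternative grows every period until it absorbs the entire population.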
The Spiteful Man and the Magic Lamp A spiteful man finds a bronze lamp while on an archeological expedition. He rubs the lamp and a genie appears. The genie proclaims, “I will grant you one wish for anything that you desire, and because I am a benevolent genie, I will give everyone you know double what I give you.” The man ponders the proposition, grabs a stick, and says, “Poke out one of my eyes.”
The Generous/Spiteful Game
Each of N players chooses to be generous G or spiteful S. Payoff(G, NG) = 1 + 2 · NG Payoff(S, NG) = 2 + 2 · NG
Individual learning leads people to choose the better action, so people learn a dominant action if one exists. Social learning leads people to choose actions that perform well relative to other actions.
In bandit problems, rewards from alternatives are distributions rather than fixed amounts.
Bernoulli Bandit Problems Each of a collection of alternatives {A, B, C, D,…, N} has an unknown probability of producing a successful outcome, {pA, pB, pC, pD,…, pN}. In each period, the decision-maker chooses an alternative, K, and receives a successful outcome with probability pK.
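Epsilon-greedy, one simple rule of thumb for such problems (not the optimal policy), can be sketched with assumed success probabilities.

```python
import random
random.seed(2)

p = {"A": 0.3, "B": 0.5, "C": 0.7}   # unknown success probabilities (assumed)
successes = {k: 0 for k in p}
pulls = {k: 0 for k in p}
epsilon = 0.1                        # exploration rate

def rate(k):
    # observed success rate, with a neutral default for untried arms
    return successes[k] / pulls[k] if pulls[k] else 0.5

for _ in range(10_000):
    if random.random() < epsilon:
        k = random.choice(list(p))   # explore a random alternative
    else:
        k = max(p, key=rate)         # exploit the best observed arm
    pulls[k] += 1
    successes[k] += random.random() < p[k]

# The decision-maker settles on the alternative with the highest p
assert max(pulls, key=pulls.get) == "C"
```

The rule balances exploration (learning the probabilities) against exploitation (choosing the best alternative so far), the central trade-off in bandit problems.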
Bayesian Multi-Armed Bandit Problems A collection of alternatives {A, B, C, D,…, N} have associated reward distributions {f (A), f (B), f (C), f (D),…, f (N)}. The decision-maker has prior beliefs over each distribution. In each period, the decision-maker chooses an alternative, receives a reward, and calculates new beliefs based on the reward.
The Gittins Index: Example To show how to compute Gittins indices, we consider
the following example with two alternatives. Alternative A produces a fixed but initially unknown reward in {0, 80}, with 0 and 80 equally likely. Alternative B produces a fixed but initially unknown reward in {0, 60, 120}, with each equally likely. We assume that the decision-maker wants to maximize reward over ten periods.
NK Model An object consists of N bits, s ∈ {0, 1}N. The value of an object is V(s) = V1(s1, {s1K}) + V2(s2, {s2K}) + ··· + VN(sN, {sNK}), where {siK} equals a randomly selected set of K bits other than bit i, and each Vi assigns a random number drawn from the interval [0, 1] to every configuration of bit i and its K linked bits. K = 0: Results in a linear function of the bits. K = N − 1: Any bit change produces a new random contribution from every bit.
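A minimal sketch of the model, with assumed N = 8 and K = 2, generating the random contribution tables lazily:

```python
import random
random.seed(3)

N, K = 8, 2   # assumed sizes

# Each bit i interacts with K randomly chosen other bits; every
# configuration of bit i and its K neighbors gets a random value in
# [0, 1], generated lazily and cached.
neighbors = {i: random.sample([j for j in range(N) if j != i], K)
             for i in range(N)}
tables = {i: {} for i in range(N)}

def contribution(i, s):
    key = (s[i],) + tuple(s[j] for j in neighbors[i])
    return tables[i].setdefault(key, random.random())

def value(s):
    return sum(contribution(i, s) for i in range(N))

s = tuple(random.randint(0, 1) for _ in range(N))
flipped = s[:3] + (1 - s[3],) + s[4:]
# With K = 2, flipping one bit redraws only the contributions of that bit
# and of the bits linked to it, not all N of them.
assert 0 <= value(s) <= N and value(s) != value(flipped)
```

Raising K toward N − 1 links every bit to every other, so a single flip redraws all N contributions and the landscape becomes maximally rugged.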
A Model of Opioid Approval Multi-Armed Bandit Model To demonstrate their efficacy, opioids were tested against placebos. In clinical trials, patients were randomly assigned to take either opioids or a placebo. The assignment of the opioid can be modeled as one arm of a two-armed bandit and the placebo as the other arm. At the end of treatment, each trial is classified as a success or a failure. Clinical tests found that patients who received opioids experienced (statistically) significantly less pain. Tests on patients who had hip replacements, dental surgery, and cancer treatments all found that opioids outperform placebos.
Transition-to-Addiction Model Markov Model A three-state Markov model reveals a nonlinear relationship between transitions to addiction and overall addiction rates. The model’s states represent people not
in pain, people using opioids, and addicts. We estimate the transition probabilities between those states, which we represent as arrows. The model on the left assumes that 1% of people who use opioids become addicted and that 10% of addicts revert to the no-pain state. It also assumes that 20% of the people in the no-pain state become opioid users. In equilibrium, only 2.2% of the population are addicts. To account for longer prescriptions, the model on the right assumes that 2.5% of people who use opioids become addicted and that 5% of addicts revert to the no-pain state. It also assumes that 20% of the people in the no-pain state become opioid users. Now, in equilibrium, 10% of the population are addicts.1
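The equilibria above are the stationary distribution of the Markov chain. The sketch below computes one; because the text does not state every transition probability, the matrix fills the gaps with assumed values (opioid users who do not become addicted return to the no-pain state), so its equilibrium need not match the chapter's exact figures.

```python
# States: no pain (NP), opioid use (OU), addicted (AD).  Probabilities not
# stated in the text are assumed; each row sums to 1.
P = {
    "NP": {"NP": 0.80, "OU": 0.20, "AD": 0.00},
    "OU": {"NP": 0.99, "OU": 0.00, "AD": 0.01},
    "AD": {"NP": 0.10, "OU": 0.00, "AD": 0.90},
}

dist = {"NP": 1.0, "OU": 0.0, "AD": 0.0}
for _ in range(1000):
    dist = {s: sum(dist[r] * P[r][s] for r in P) for s in P}

# The distribution settles to a stationary (equilibrium) distribution
nxt = {s: sum(dist[r] * P[r][s] for r in P) for s in P}
assert all(abs(dist[s] - nxt[s]) < 1e-9 for s in P)
assert abs(sum(dist.values()) - 1.0) < 1e-9
```

Raising the use-to-addiction probability or lowering the recovery probability shifts the stationary distribution nonlinearly toward the addicted state, which is the model's central lesson.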
Paths to Heroin Addiction Systems Dynamics Model A population of people in pain produces opioid users and addicts. People on opioids flow into the no-pain state and also flow into the addict state. Addicts, in turn, can become heroin users. One reason that people use heroin is that they can no longer get opioids. Thus, as the flow of opioids increases, so does the number of heroin users.