Transcript of an Economic Forum
Anticipating Crises: Model Behavior or Stampeding Herds
International Monetary Fund, IMF Meeting Hall
Washington, D.C.
Thursday, November 1, 2001
Panelists
Carmen Reinhart, Senior Policy Advisor, IMF Research Department (Moderator)
Peter Garber, Global Strategist, Deutsche Bank
Kristin Forbes, MIT and U.S. Treasury
Eduardo Borensztein, Chief, Strategic Issues Division, IMF Research Department
DR. REINHART: Predicting crises through early warning systems is something that is intellectually challenging, to put it mildly. Policy makers are interested in such systems because they want to help their own country, or the countries for which they have surveillance responsibilities, and the financial markets are interested in them because there's always a good chance to make a big buck.
We have three distinguished speakers with us today to discuss early warning systems: Peter Garber, from Deutsche Bank and formerly of Brown University, who has also had a longstanding association with the International Monetary Fund in the Capital Markets Unit; Kristin Forbes, who is currently at the U.S. Treasury and is a professor at MIT's Sloan School of Management; and our very own Eduardo Borensztein, who has spearheaded the early warning research here at the IMF.
Each participant will have 15 minutes. Peter Garber is known for going over his time limit, so please feel free to throw paper clips, pens, pencils, et cetera, when Peter hits his 15-minute mark. There is no other way to get him to stop.
DR. GARBER: And expect me to throw them back at you too.
[Laughter.]
DR. GARBER: I'll talk today about the early warning model that Deutsche Bank has developed. It's an example of the type of model created by financial institutions; there are several of them out there. It's a subcategory of the general kinds of models that look for crises per se. Our particular definition of a crisis is one based on asset price movements, in particular exchange rate movements of a particular size. The interest in the financial markets, of course, is to try to forecast big market movements to cover event risks and market risks. An event risk is generally a large price movement in a foreign exchange market, which can create a very large loss for an investment house or an investor if it somehow goes unrecognized or unforeseen.
Our model is called the "Alarm Clock," and it's a 1-month-ahead forecasting model for large events in foreign exchange and in domestic local-currency short-term interest rates. There's generally a trade-off between an exchange rate event and an interest rate event because interest rates are used to defend currencies.
So, although you might have a crisis, the outcome of that crisis is uncertain and one's behavior during that crisis depends on whether you think the crisis will result in a big exchange rate devaluation or an interest rate defense, which preserves the exchange rate, but increases domestic interest rates.
Whether you want to be in the country or outside it hinges strongly on which outcome you think is going to happen; you might even regard the crisis as an opportunity and rush money in instead of rushing money out.
This particular model was technically developed by Mazen Soueid, who was a graduate student of mine. He built it for us a couple of years ago, but he has since come to the Fund and has no relationship to it now. It is now being updated and further developed by Robin Lumsdaine, who is our head of Flow Research at Deutsche Bank.
So what is the "Alarm Clock" model? It's a short-term, as I said, 1-month-ahead forecasting model. Pardon the dimness of this transparency, but I don't have slides. It's a cross-sectional time-series panel data set of 19 emerging market countries, with the time series starting, at the earliest, in 1985. It covers only emerging markets; there are no industrial countries included in the model. Its focus is on those emerging markets that are considered to be the most important in the financial markets.
We assume one global model for all of the countries, with some specializations for particular regions, rather than separate country models. The reason for combining all of the countries into a single model is that there just aren't that many events, big movements in the exchange rates, although one has the feeling that they're happening every week. In a sample of monthly observations, they just don't occur that often, and pooling spares us from the statistical problems of estimating on single countries.
The particular technical twist that we add is that we use a two-equation system simultaneously estimating the event probabilities in exchange rates and local interest rates. I won't get into the technical details here.
The countries that are included are contained in this list, so it's fairly spread out across the different emerging market regions: emerging Asia, Latin America, and Europe. Some of the countries are hardly emerging market countries any more, such as the Czech Republic. Singapore is hard to keep in there, except that its economy depends strongly on countries that are themselves emerging market countries. Some of the countries we've just recently added were left out of our original estimation procedures to serve as out-of-sample cross-checks on the forecasting power of the models.
As for how we define events, we estimate different models for different definitions of event size. For exchange rates, an event can be a 5-percent-or-more devaluation in a month, a 10-percent-or-more devaluation in a month, 15 percent, and so on.
And for each definition of an event, we have a set of separate model estimations, and one can focus on a particular event size if one wishes. We don't really advertise one event size over another. Ten percent is a usual size of event that people are interested in. Our own risk management team defines a 10-percent or more movement in the exchange rate in a month as an event, and that is what we use to calculate capital to hold against possible events.
The interest rate event is defined as a 500-basis-point, that is, a 5-percentage-point, increase within a month in the short-term domestic interest rate, or a thousand basis points for Turkey, because it is a high-inflation country where 5-percentage-point interest rate movements can occur on a daily basis during this period.
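[A minimal sketch of how such event flags might be constructed from monthly data. The helper and its series names are hypothetical, for illustration only, not Deutsche Bank's implementation:]

```python
import pandas as pd

def flag_events(fx: pd.Series, rates: pd.Series,
                fx_threshold: float = 0.10,
                rate_threshold_bp: float = 500.0) -> pd.DataFrame:
    """Flag monthly exchange rate and interest rate 'events'.

    fx    : month-end local-currency price of a dollar (a rise = devaluation)
    rates : month-end short-term domestic interest rate, in percent
    """
    # Devaluation event: month-on-month depreciation at or above the threshold
    fx_event = fx.pct_change() >= fx_threshold
    # Interest rate event: a rise of >= 500 bp (5 percentage points) in a month
    rate_event = rates.diff() >= rate_threshold_bp / 100.0
    return pd.DataFrame({"fx_event": fx_event, "rate_event": rate_event})

# Per the definition above, a high-inflation country such as Turkey
# would use rate_threshold_bp=1000 instead.
```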
What distinguishes the Alarm Clock from other academic and commercial models is its exclusive focus on emerging markets. We have a fairly short data set. As I say, we calculate probabilities over a range of possible event sizes and estimate exchange rate and interest rate events simultaneously. In addition, our model produces a set of probabilities of events in the next month, but those probabilities need to be interpreted, or converted, into actions. In other words: do you want to go into the currency or get out of it? So we overlay on the basic probabilities a set of action triggers to determine whether a probability is high enough that one would want to get out of the currency.
The variables that go into our models consist of widely available macro data. For the exchange rate model these data consist of changes in domestic credit, changes in industrial production, changes in the stock market over the last month, the level of the real exchange rate relative to some norm (whether or not the currency is overvalued), some contagion variables and regional effects, plus whether or not there's a simultaneous money market, or interest rate, event in the particular country. Those data go into this black box, I guess it's a white box, labeled "exchange rate event," and out comes an estimated probability forecast. We employ somewhat different variables for an interest rate event.
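[The talk doesn't spell out the estimator, and the actual Alarm Clock is a two-equation simultaneous system; as a rough single-equation stand-in, one can picture a pooled logit over country-months with the regressors Garber lists. A sketch under that simplifying assumption, with hypothetical column names:]

```python
import pandas as pd
import statsmodels.api as sm

def fit_fx_event_model(panel: pd.DataFrame):
    """Pooled logit of a binary fx-event flag on the macro variables described.

    `panel` has one row per country-month, with columns along the lines of:
    d_credit (domestic credit growth), d_indprod (industrial production
    growth), stock_ret (last month's stock return), rer_gap (real exchange
    rate relative to a norm), contagion (regional dummy), and rate_event
    (simultaneous money-market event flag).
    """
    X = sm.add_constant(panel[["d_credit", "d_indprod", "stock_ret",
                               "rer_gap", "contagion", "rate_event"]])
    return sm.Logit(panel["fx_event"].astype(float), X).fit(disp=False)

# fitted.predict(X_new) would then give the 1-month-ahead event
# probabilities on which the action triggers are overlaid.
```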
So the output is a set of probabilities, but then we also have to worry about how those probabilities are interpreted, which requires the choice of action triggers. The way we do this is to consider an investment or trading strategy of a particular sort. We assume that you're going to put, say, a million dollars into or out of a particular currency, and we do that for all 19 currencies. Then we choose, for all of those countries, a particular trigger size that maximizes, over the sample, the profit and loss one would earn over the cost of funds.
An alternative method, which we use in the interest rate equation, is to trade off the Type I and Type II errors that emerge from the selection of an action. A Type I error occurs when the model calls for an event and it doesn't occur. There's a loss there: since interest spreads are usually positive, there's a cost-of-carry spread for being short. A Type II error occurs when you call a nonevent and yet you get a big event, such as a big depreciation. If you call for a nonevent, you're going into the country to get a relatively high interest rate, but you get killed because the exchange rate suddenly depreciates by 25 percent that month.
Generally, for the exchange rate models we choose a trigger to maximize profit and loss over the sample.
Now there are two types of strategies that one might assume: long/short versus long/out. In the first case, if you like the country, you're in; if you don't like the country, you're not only out, you're short: you've borrowed the currency and sold it. The other is long/out; that is, if you like the country, you're in, but otherwise you don't short the country, you just stay out.
One can look at a sequence of possible event triggers for the entire sample, and that is what we list along the horizontal axis here. For each trigger level, you can calculate what you would have made in the sample had you used that trigger to decide whether to get into or out of the currency; you can calculate what your profit and loss would be. If we select, for example, the 10-percent devaluation level, we would choose an event trigger of about 2.1 percent. That basically maximizes the in-sample profit and loss.
So for each possible devaluation level, we select an event trigger, and then we have a calculation of a P&L for each one under the two different strategies. There are five different devaluation levels that one might be interested in when deciding whether to be in or out, and these often produce contradictory signals. What we generally recommend to our clients is to settle on the one level you're most comfortable with and work with that; or maybe you would prefer a majority voting rule across the range, where if three of them say get out and two say stay in, you get out. The choice of a trigger is relatively arbitrary; in other words, it depends entirely on whether the risk profile of a client is quite aggressive or quite conservative.
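[A stylized reconstruction of the trigger search from the description above, not the bank's code: for each candidate trigger, simulate the long/short rule over the sample, compute the P&L net of carry, and keep the trigger that maximizes it.]

```python
import numpy as np

def best_trigger(probs, fx_returns, carry, grid=None):
    """Grid-search the action trigger that maximizes in-sample P&L.

    probs      : 1-month-ahead event probabilities from the model
    fx_returns : realized monthly depreciation (+ means devaluation)
    carry      : local interest carry earned per month while long
    Long/short rule: long the currency (earn carry, lose on depreciation)
    when prob < trigger; short it otherwise.
    """
    grid = np.linspace(0.005, 0.10, 40) if grid is None else grid
    probs, fx_returns, carry = map(np.asarray, (probs, fx_returns, carry))

    def pnl(trigger):
        long_leg = probs < trigger
        return np.where(long_leg, carry - fx_returns, fx_returns - carry).sum()

    pnls = np.array([pnl(t) for t in grid])
    return grid[pnls.argmax()], pnls.max()

# A majority-vote overlay across, say, five devaluation levels could be:
# exit = sum(p >= t for p, t in zip(probs_by_level, triggers_by_level)) >= 3
```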
In our monthly publication we produce a set of probabilities, say, for Mexico, for different levels of exchange rate movements, along with the action triggers that we have calculated for all of the countries. In the particular case of Mexico, we note that it comes in above the action triggers for all of the devaluation levels except the 5-percent level, so this is really a negative view on Mexico.
Russia, on the other hand, is consistently below the trigger level, so this would be a message that says stay in the ruble, regardless of what's happening on the borders of Russia.
In addition, we produce, just for information, the historical probability series for particular-size events. Consider, for example, Venezuela, which we always have coming in below the trigger level at the 5-percent devaluation level, and almost always coming in above the trigger level at the 25-percent level. What this says, in general, is: Venezuela isn't going to go, but when it goes, it's going to go in a big way.
Note that the 25-percent devaluation probabilities that we're calculating here are relatively small. They're on the order of 1-percent probability or, maybe for the big ones, a 7- or 8-percent probability of a big 25-percent devaluation over the next month.
The reason such calculations are useful is that, even if you don't get a big event, you might get a relatively large depreciation of the currency, just not of event size, over the course of the month, so that if you were short the currency, you would still make a fairly substantial profit; that's what matters when you're looking at a profit-and-loss criterion.
On performance: the in-sample performance is good, as you might expect, since you're optimizing in-sample to do well, although it's not quite a data-mining exercise. What's interesting is the out-of-sample performance. The model went live in January of 2000, so from that point on we were producing 1-month-ahead probabilities of events in these countries. The model itself is reestimated: the parameters are reestimated every 6 months, and at that time any major changes are implemented. So every 6 months there's effectively a new model making the live forecasts, across a sequence of different model parameters.
Let's focus on the 10-percent level. There were about 360 country-months, 360 observations: 19 countries, although not all of the countries were in the sample at the beginning, over 20 months. So it's a little fewer than 400.
Of those country-months at the 10-percent level, this model, given the event triggers that we chose, called for 48 events and 321 nonevents. Now how many of those happened? Of the events that the model predicted, six happened. This was a relatively tranquil period compared to earlier ones, and there were 363 nonevents. So, basically, this model is overcalling, overpredicting events, obviously, 48 to 6.
One frequently asked question is: of the events that actually occurred, how many of them did you call? That's where these Type I and Type II errors come in. If you called an event and it did not occur, that is called a Type I error. We called 48 events and 6 occurred; I think we've got a wrong number on the slide here. Of the events that we called, how many turned out to be nonevents? Obviously, 42 did. So this says that we've got an error rate of 67 percent, which I guess is in the ballpark, the right order of magnitude.
Now for the Type II error: of the nonevents that we called, we might also want to know how many turned out to be events, and here 13 percent of those were in error, so one minus the Type II error is 87 percent. In the in-sample estimations, you get a much closer relationship between Type I and Type II errors. This is a criterion that is often used, but again we focus on the profit and loss that one would earn. The profit and loss that we earned from these calls, in the investment strategy we implemented, was about the same order of magnitude as we had in-sample. So we view this as a reasonably good performance for this type of model, especially since it's just a 1-month-ahead model, and it's difficult to catch the timing.
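[The arithmetic behind those rates reduces to a 2x2 table of calls versus outcomes; a sketch using the error definitions given in the talk. The example counts are illustrative only, since, as the speaker notes, the slide's own figures are muddled:]

```python
def error_rates(hits, false_alarms, misses, correct_rejections):
    """Type I / Type II error rates from a 2x2 confusion table.

    Following the usage above: a Type I error is calling an event that
    does not occur; a Type II error is calling a nonevent and getting one.
    """
    type1 = false_alarms / (hits + false_alarms)    # share of event calls
    type2 = misses / (misses + correct_rejections)  # share of nonevent calls
    return type1, type2

# Illustration: with 48 event calls of which 6 verified, the false-alarm
# share of event calls would be 42/48 = 0.875.
```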
Now, a caveat in using this particular model is that different models are used to estimate each devaluation level. Although we produce a picture of probabilities at each devaluation level, and it almost looks like a probability distribution, it is not, because you can actually get situations where the plotted probabilities start to turn up. Under the logic of probability, the probability of a 25-percent-or-higher devaluation should be lower than the probability of a 5-percent-or-higher devaluation; in fact, sometimes you can calculate a probability of a 25-percent-or-higher devaluation that's higher than the probability of a 5-percent-or-higher devaluation, though that happens very rarely.
Equity markets are treated as exogenous, or predetermined, here, and it would be desirable to actually forecast equity markets. The model does not include variables to indicate political events, although equity markets themselves often capture political events. I've mentioned the data frequency and revision periods, and I guess I'll stop there.
DR. REINHART: Well, I hope you're impressed with my forecasting accuracy. You see he did go over the 15 minutes, but not by much.
DR. GARBER: I'm glad that you were deterred by my retaliatory threats.
DR. REINHART: Kristin Forbes will speak next.
DR. FORBES: [In progress.] —is U.S. Treasury, and that is on purpose. These are my personal views. These do not represent the opinions of U.S. Treasury in any way.
Having said that, my comments will come from a very different angle than Peter's or our other panelists'. I'm not going to talk about what goes into the models, what variables should be included, or what econometric techniques should be used to estimate them. Instead, I want to focus on what these models tell us: what countries do they predict are vulnerable, do they give us a consistent message, and is that message useful to policy makers in assessing country vulnerability?
More specifically, I will make three main points. First, I'll briefly mention the three private-sector models that form the basis of my analysis; second, I'll look at the results of the models and analyze how correlated they are, whether these different models give us a consistent message; and third, I'll ask what these models actually warn us of (they are, after all, early warning models) and then offer some conclusions.
First, the models that will form the basis of the rest of my comments. I'm going to focus on three private-sector models. These are three publicly available models, so no one will get into confidentiality problems, three models again that are accessible to all of us. These are the Credit Suisse First Boston Emerging Markets Risk Indicator Model or EMRI, the Deutsche Bank Alarm Clock, which Peter just told us about in detail, and the Goldman Sachs Watch model, GS Watch.
Each of these models covers a different sample of countries. There are 16 countries covered in all three models, and I'm going to focus all of my results on that sample of 16. This is the list of the countries; it includes most of the at-risk countries many of us are following right now and most of the major emerging markets.
I am also going to focus my discussion on recent results. I'm not going to look at the historical performance of these models during, say, the Asian crisis, the Russian crisis. I just want to focus on how good these models are today, and specifically from January 2001 through October 2001.
So what I wanted to do first was look at how correlated the results are. Do these models give us some sort of similar message, even though they're each constructed slightly differently and have different definitions, et cetera, et cetera.
So, first, I took all of the models and looked at which countries, out of that sample of 16, they predicted were most vulnerable in each given month in 2001. This chart has each of the months of 2001 on the left axis and each of the three models at the top, and each country listed in the chart is the one that model predicted as the most vulnerable during that time period.
If you glance at it quickly, you think there's a lot of overlap. You're seeing a lot of the same countries over and over, so that's a good thing; they're sending a consistent message. But if you look more closely, you'll realize most of the overlap runs down the columns, not across them. Some models consistently predict that certain countries will have a crisis. For example, the CSFB model predicts that Taiwan and Poland are most vulnerable, and the top spot basically switches between those two countries. GS Watch has Poland, Turkey, and then Venezuela as the most vulnerable countries.
That is positive in the sense that at least each model gives a consistent message month to month, but look across the columns and there are not a lot of commonalities. There are only a few cases where even two of the three models predict the same country as the most vulnerable in a given month. For example, in March, two of the three models predict that Poland is the most vulnerable; in April, two of the three predict Turkey. But other than a few overlaps like that, there is not a lot of consistency. There's no month when, say, all three models say Turkey is going to have a crisis.
Next, I did the same sort of analysis for which country is least vulnerable out of the 16 in the sample, and here the results are even less consistent. Again, you do get consistency within each model: in the CSFB model, Russia is the least vulnerable every month in the sample, and GS Watch focuses on the Philippines, Peru, and South Africa as least vulnerable. But, again, look across the columns for any given month: there is not one case where the same country comes up as least vulnerable in all three models.
A couple of times you get agreement, such as in February, when Russia comes up as least vulnerable in two of the three models. But other than that, again, there aren't a lot of common predictions about which countries are the least vulnerable.
So, after looking at those charts, I was a little pessimistic. These models aren't sending out a consistent message on which country is the most vulnerable or the least vulnerable. But to be fair, I'm just looking at the single highest and single lowest prediction from each model, and that's a tough criterion. If one model, say CSFB, says Russia is the least vulnerable, and the Deutsche Bank Alarm Clock says Russia is the second least vulnerable, that would still be a pretty consistent message, but it wouldn't show up in this sort of comparison.
So what I did next was take each model's predictions each month and split them into three groups. One group is the most vulnerable countries, a quarter of the sample, or four countries; then a middle group of countries with moderate vulnerability; and then a lowest group, the four countries, or quarter of the sample, with the lowest vulnerability.
Then I wanted to compare across groups. Is, say, Turkey in the most-vulnerable group across all three models? Even if it's ranked most vulnerable in one model, second most vulnerable in another, and third most vulnerable in the third, as long as Turkey is in that top group there would be some consistency across model predictions. So this is a slightly less stringent test than the last one.
When I did this vulnerability-overlap comparison, the results weren't much better. Actually, these columns are reversed. Starting with the left column, that's the overlap in the bottom group, the least-vulnerable group. What the number says is, in, say, January, is there even one country that is in the low-vulnerability group across all three models? For example, is Russia in the low-vulnerability group in all three? And in every month, there is not a single country that falls in the low-vulnerability class across all three models.
When you do the same analysis for the top-vulnerability group, the most-vulnerable countries, you get a little bit of overlap. In February and March, Poland is in the most-vulnerable category, the top quarter of the sample, for all three models; Turkey, and this is reassuring, is in the most-vulnerable category in all three models in April; Argentina is in the most-vulnerable category for all three models in July. But other than those four examples, there is no other case where there's even one country in the top group, and we're not talking about the top one or two countries, but the top quarter of the sample, in all three models. So, again, not a lot of consistency across models, even in this less stringent test.
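[The group-overlap test is naturally expressed as a set intersection over each model's top (or bottom) quartile; a sketch with hypothetical inputs:]

```python
def common_extreme(rankings, k=4, top=True):
    """Countries in every model's k most (or least) vulnerable group.

    rankings maps model name -> list of countries ordered from most
    vulnerable to least vulnerable (16 countries, so k=4 is a quartile).
    """
    groups = [set(order[:k]) if top else set(order[-k:])
              for order in rankings.values()]
    return set.intersection(*groups)

# If Turkey sits in the top quartile of all three models' April rankings,
# common_extreme(april_rankings) returns a set containing "Turkey".
```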
One more try at getting some consistency out of these models, to see if they're giving us a somewhat consistent message. Instead of ranking predictions of vulnerability on an ordinal basis (one, two, three, four) or even by these rough groups of most vulnerable, moderate, and least vulnerable, what I did next was take the country scores, the numerical predictions of risk, and see whether those are correlated.
That took a little bit of juggling, because the different models have different rankings and different ways of calculating their probabilities. So we converted all of the model predictions about vulnerability into a scaled index: for each model, the output is normalized so that 50 is the mean and about 99 percent of the values fall between zero and 100. Basically, an index from zero to 100 is a way of thinking about it.
Then you can compare the output of the models and calculate correlations: how correlated are the output results of these different models? If you do this for some months, say, April 2001, it starts to look like there is some correlation in the models, though not an awful lot. Deutsche Bank and the CSFB model have a 36-percent correlation in their numerical ratings of country vulnerability; Goldman Sachs and CSFB are not quite as good, with a 25-percent correlation in what they're predicting. Again, not super high, but at least they're positive. But then I performed the same comparison across the different models in May, and now they're all negative. There's not even a positive correlation between any two of the three models in terms of what they're predicting for country vulnerability.
Do the same thing for all of the months in 2001, and the correlations are somewhere in between: zero correlation between what the Deutsche Bank model and the CSFB model are predicting in terms of country vulnerability, and positive but still pretty low correlations between some of the other pairs.
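[The rescaling described, a mean of 50 with roughly 99 percent of values between 0 and 100, amounts to a z-score transformation; the spread of about 19 index points per standard deviation below is a back-of-the-envelope choice that puts plus or minus 2.58 standard deviations at 0 and 100. Note that correlations are unaffected by such affine rescaling; the scaling matters only for comparing levels across models:]

```python
import pandas as pd

def scale_index(raw: pd.Series, sd_points: float = 19.4) -> pd.Series:
    """Map a model's raw vulnerability scores to a 0-100-style index.

    Centered at 50; with sd_points ~ 50/2.58, about 99 percent of a
    normal distribution lands between 0 and 100.
    """
    return 50 + sd_points * (raw - raw.mean()) / raw.std()

def model_correlations(outputs: pd.DataFrame) -> pd.DataFrame:
    """Pairwise correlations of the scaled indices, one column per model."""
    return outputs.apply(scale_index).corr()

# For a given month, `outputs` would have one row per country and one
# column per model, e.g. ["CSFB", "DBAC", "GS_Watch"].
```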
So, to sum up all of that, I'll admit that this is not a very positive outlook. Maybe one of these models is very good, and the others aren't good. I'm not saying they're all bad, but they are definitely not sending a consistent message. They're sending very different signals in terms of what countries are vulnerable to a crisis.
For the last part of my comments, I want to talk about what these models are warning us about; they are, after all, early warning models. To do that, here is a table that again has the scaled indices, where a higher number means more vulnerable, for each of the models, for every other month through 2001. To the right is the average of the three models.
If you look at general trends in vulnerability across these months, the average vulnerability of all 16 countries for each model, you don't see an awful lot of pattern. For example, look at the CSFB model. In February, vulnerability is a rather low 45; by August, it has jumped up to 57, saying that the average vulnerability of emerging markets has increased. But then look at Goldman Sachs: vulnerability in February was 62, much higher, and it has fallen as of August, down to 51. So vulnerability increased from February to August in the CSFB model and decreased in the Goldman Sachs model.
But what I found most confusing when I first looked at these numbers was comparing the August and October results for vulnerability. In August, global growth was slowing down a bit, but emerging markets weren't looking too bad; export growth had slowed somewhat, but for the global economy, no one was sure if there'd be a short recession, a long recession, or if growth would pick up pretty quickly.
And then came September 11, which changed the world and our predictions about global growth. Global growth is now expected to be much slower. Export growth has slowed dramatically in a number of emerging markets. Risk spreads have increased dramatically. Many countries are paying much higher interest rates on debt. I would think that country vulnerability has increased substantially, especially emerging market vulnerability has increased between August and October.
But the one clear message these models are sending out, the one thing they finally agree on, is that vulnerability has decreased. In each model, vulnerability in August was higher than in October; the mean has also fallen. In some cases it isn't a big fall, but I would a priori have expected a large jump in vulnerability as of October, and instead you're seeing the opposite, a drop in vulnerability.
So I found that initially confusing, counterintuitive, and I looked a little more closely at what these models are predicting for individual countries. This is the average output for each country as of August and October across all three models, so, again, just an average of the scaled scores, with the change in the far right column.
As you can see for Argentina, vulnerability fell, according to these models—probably not what most of us would have guessed ahead of time. Brazil, vulnerability has fallen a little bit. For Mexico, Korea, Taiwan, countries that have been affected significantly by the slowdown in U.S. growth, exports have fallen substantially; these countries are all less vulnerable now despite that.
Turkey is one country where vulnerability has gone up, but just by a very small amount, which is again somewhat counterintuitive. Venezuela is another example: a country highly dependent on oil exports, and oil prices have fallen quite a bit since August, yet vulnerability has fallen according to the models.
But if you look at these more closely and think about what the models were designed for, these indicators actually aren't that bad. These models were designed to predict depreciations. They're designed to predict currency crises, which are often defined as mildly as a 5-percent depreciation of the currency. In between August and October, many of these currencies, obviously not Argentina, but many of the others, have depreciated quite a bit.
So, as of August, there probably was a decent probability of a depreciation in some of these countries. So these models were warning that there is a significant probability of a 5-percent depreciation. Since then, these currencies have depreciated, in many cases, by 5 percent or more, and so there is less likelihood of a depreciation in the future. So, in that sense, these models aren't really that bad. They're doing what they were designed to do. They're predicting currency depreciations.
But that leads to my more fundamental concern with these models: what are they really warning us about? Again, they're fairly good at what they were designed for, predicting currency crises and depreciations. In an era of fixed exchange rates, that was important, because a depreciation under fixed exchange rates generally meant a major currency crisis, often tied to a banking crisis, a balance of payments crisis, et cetera.
But now, in this era when most countries have flexible exchange rates (granted, they undoubtedly intervene, and there's a fear of floating and all of that), countries do allow their exchange rates to adjust to pressure. So is predicting an exchange rate depreciation really what we are interested in as policy makers? I'd almost see a depreciation as a healthy sign: a country is adjusting, exports will increase, it's adjusting to external imbalances.
So here is what I would like to see, in terms of direction, for these early warning models. Currency depreciations might be important for people like Peter, who bet on currency movements and can make money off them; but for the rest of us, who are concerned about country vulnerability and balance of payments or debt financing crises, depreciations really aren't what we care about in assessing vulnerability. What we should care about, and what I'd like to see more work go into, is models predicting things such as external financing difficulties and financial system vulnerabilities, just as a few examples.
So, to conclude: the lack of agreement across the different models' results, the first part of my talk, is disconcerting. I find it very hard to rationalize how private-sector models that largely use the same variables and are trying to predict very similar things come up with such different predictions, month after month, about who is vulnerable and who is not.
I also hope the people working on these models in the future rethink their goal, what the models are designed for. I think it would be much more productive to focus on crises other than depreciations of, say, 5 percent.
But to end on a slightly more positive note, I think the models are a useful starting point for more careful country analysis. You clearly can't take the numbers they give you as a prediction of which country is going to have a crisis, but it is useful to look at the output of different models and really dig into them: why does one model predict a major depreciation while another doesn't? What are the differences? What is driving the different predictions? That is a very useful start to a more careful analysis.
DR. REINHART: Thank you, Kristin.
Next is Eduardo Borensztein.
DR. BORENSZTEIN: Thank you, Carmen.
Peter gave you a picture of one of the models, and then Kristin criticized it and also other models in the presentation. So I will tell you now who is right.
[Laughter.]
DR. BORENSZTEIN: In fact, rather than presenting our work, which you can see summarized in the Occasional Paper that's been distributed, I'd like to take a step back and look at EWS models by assessing their strengths and weaknesses, in order to better determine what we might expect from these models in analyzing vulnerabilities.
As I said, I've been working in this business for a while, and I meet people in the elevators; some of them are here. They ask me, what are you working on now, and I say, well, we're doing early warning system models. You get two kinds of reactions. Some people say, oh, that's great; if you do this right, we can put the Fund on automatic pilot. We don't need to worry about anything else; you can take care of us. Of course, a somewhat larger number of people look at you and say: yeah, that's like predicting the stock market, right?
I think the truth is somewhere in between. I don't think it's quite like predicting the stock market, but also it's not as reliable as a naive interpretation may say.
What are the problems, beginning with the weaknesses? The first weakness, in my view, is this: how can you even think of setting up a model that would work for all different crises? Every crisis is different. You go from country to country: Mexico had certain problems with debt denominated in dollars; in Asia, the problems appeared to lie in the financial sector. And of course there are tens of other crises that derive from very different causes when you study them one by one.
But I don't view this as a fatal weakness, since the real utility of early warning system models, in my opinion, derives not from their ability to predict crises but from their ability to help us identify pertinent symptoms: what happens, for example, when a country is approaching a situation where its currency is going to have a big drop, or it's going to lose a lot of reserves, or come under severe foreign exchange market pressure, however measured.
The symptoms may be the same even if the ultimate causes are very different. For some reason, people tend to draw analogies to medical problems. Mike Mussa [former head of the IMF's Research Department], for example, used to say that what we're doing is taking the temperature of the patient and seeing whether it is high or not. If it's high, that's all we're asking from the models. Now, the reason behind such symptoms, to return to the medical analogy, may be found in the stomach, the lungs, or what have you. In any event, we use early warning systems to help us better determine where those financial symptoms might lie.
What are these symptoms? Well, there are things like reserves starting to drop, a very high current account deficit, an exchange rate that looks overvalued (some of the traditional fundamentals you would look at), but also some indicators of vulnerability itself, such as a very high level of short-term debt relative to reserves. As we know, particularly from the Asian cases, that is a symptom of vulnerability. So, even though we don't have a theory, we may still look at the symptoms, and we may have some early warning of a crisis developing.
Beyond all this, of course, there are many crisis situations that result from self-fulfilling attacks. A country may be in a situation where market investors or speculators conclude at one point that it is doing well, but then turn around and decide that the country is no longer a good risk, so they all pull out, and in doing so help precipitate the crisis. So there can be this sort of dual-equilibrium situation, and we really don't know what is going to decide whether the country stays safe or goes into a crisis.
We can say that countries are vulnerable to such situations: having a low level of reserves relative to short-term liabilities, for example, can make you vulnerable to a speculative attack. But it's a lot more difficult to say when this would happen; the exact timing of the crisis is much harder to predict. In this sense, the models that focus on short-term prediction, like the private-sector models with their 1- to 3-month horizons, certainly have a much harder time than models with a 1- or 2-year horizon in looking at—
DR. BORENSZTEIN: [In progress.] —problem.
The final weakness, and a common criticism of EWS models, is that, because of this lack of a strong theoretical underpinning, they may end up being a big exercise in data mining; namely, if you search long enough over different variables and different specifications, you're going to find something that works well.
That's true, and I don't think there is an answer to that except to look at the performance of the models outside the sample. You can always estimate the models, and with sufficient time and computer resources you're going to get a really good fit. The question is how the model performs after that point, as time goes by. Peter was showing some out-of-sample performance, and we have done that as well with Andy Berg and Cathy Pattillo. There's a forthcoming Working Paper that addresses these issues; we have analyzed the out-of-sample performance of the model, which is certainly not as successful as it was in-sample, but good enough to demonstrate some predictive power.
But what are the good things about EWS models, and where do I think we should use them?
First, and I think this is very important, they provide us with a systematic, objective, and consistent way of looking at country vulnerability. Sometimes it's too systematic and too mechanical, and you may find situations like the one Kristin was complaining about: why didn't the models react in October, after the attacks of September 11, given where the world economy may be going? That has not yet been reflected in the models; they don't cover all eventualities or a broad set of variables.
But I think that's also a strength, because by being objective and mechanical, EWSs avoid biases that analysts may have. Of course, the competing, or complementary, method is the analysis that economists do of a country's situation. The problem with country analysis is that it sometimes tends to be biased. We tend to think that, for example, Korea is a big success story and for sure is never going to get into trouble, when, if you looked at some of the numbers and indicators in a more mechanical fashion, it would flash up as being in a very vulnerable position. So I think the consistency is certainly a strength of EWS models.
Second, such models also provide us with a logical way to process indicators of vulnerability. I think we all agree that if we're serious about preventing crises, there is a whole series of data we should be looking at, a whole number of indicators we have to watch on a regular basis. But what do we conclude from looking at 25 different indicators for a country, some of them going up, some coming down, and so forth? You need a method to process all of the indicators into a single measure of risk, and that is what EWS models do, in some statistically optimal way.
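[One minimal way to picture processing many indicators into a single risk measure, purely illustrative and not the Fund's actual methodology, is a weighted sum of standardized indicators, with weights taken from an estimated crisis-probability model:]

```python
import pandas as pd

def composite_risk(indicators: pd.DataFrame, weights: pd.Series) -> pd.Series:
    """Collapse many vulnerability indicators into one score per country.

    indicators : one row per country, one column per indicator (reserve
                 losses, current account deficit, short-term debt to
                 reserves, real exchange rate overvaluation, ...)
    weights    : per-indicator weights, for instance coefficients from an
                 estimated crisis-probability model.
    """
    z = (indicators - indicators.mean()) / indicators.std()  # standardize
    return z.mul(weights, axis=1).sum(axis=1)                # weighted sum
```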
Finally, of course, whether EWS models are worthwhile or not depends on their performance. As I have already said, there is a difference between in-sample and out-of-sample performance, and it's not unusual to see a relatively high number of false alarms, for example, especially in a period that is relatively calm, as the last couple of years have been; even if it doesn't look that way, there have been a lot fewer crises than average.
But, of course, the relevant question here is: compared to what? People look at things like spreads and rating agencies' ratings of countries as possible indicators of vulnerability. It is clear that EWS models do a lot better than variables like that. I am not necessarily saying that markets are wrong or inefficient; spreads are probably reflecting other things in addition to the possibility of a currency crisis. But a spread is not a great indicator if what you are worried about is predicting currency crises; I wouldn't recommend using spreads.
We have also looked at more sophisticated predictions, such as those produced by the Economist Intelligence Unit, which assesses the risk of a depreciation, of a currency crisis, over a couple of years. Again, the models, particularly in some periods, for example during the Asian crisis, do a lot better than these other analyses.
So, even if the performance of early warning systems in absolute terms is not stunning, it's not clear what alternatives can easily beat these models. In any event, one would not use them as a unique, complete tool, deciding to put the Fund on automatic pilot, running the model, seeing which country comes to the top of the list, and doing something about that country.
Clearly, it's a first step, a first round that tells you which countries may be vulnerable. That has to be complemented by analysis, by more traditional surveillance work, and as such it has, I think, an important role to play within vulnerability analysis.
Thank you.
DR. REINHART: Before I open the floor for questions, I have one brief remark. Eduardo and the previous speakers have alluded to this: the weakness of early warning system models is that they are mechanical, and the advantage of early warning system models is that they're mechanical. In saying this, I would remind all of you that probably the four most expensive words in history are, "This time it's different." Policy makers use that: the exchange rate is overvalued, the current account deficit is worsening, you're losing reserves, but, no, this time it's different because of the administration in place.
And in investing it's the same way. When price-earnings ratios go way beyond any historical norm, "No, no, this time it's different because it's the information technology revolution or whatever." These mechanical models are a good way of reminding us that there are certain empirical regularities associated with crises or with events.
My second and final remark is that, in some sense, I'm not surprised there's heterogeneity in the predictions of the models, because not all of these models are trying to predict the same things. There are early warning models out there trying to pin down banking crises. There are other models really looking for currency crises: extreme events that entail not only exchange rate depreciations but substantial reserve losses and increases in interest rates, not simply exchange rate fluctuations of a certain magnitude. They are built for different purposes.
From a policy standpoint, an ideal indicator is one that predicts the crises far ahead of time so as to enable the policy maker to engage in preemptive policy action. So an ideal indicator signals months ahead.
From where Peter sits or Kristin Forbes sits, the early warning could be a couple of weeks because policy isn't involved, it's reshuffling—
DR. GARBER: You can get out in that time.
DR. REINHART: You can get out much quicker. So the point I'm making is that the outcomes are all so different because the design is different and what they're trying to predict and what they're used for are also very different.
With that, let's open for questions.
QUESTION: I need a clarification from Dr. Garber and a question to follow.
In your Alarm Clock model, are the action triggers coming out of the model quantitatively? Are they a quantitative result of the model?
DR. GARBER: No, no, no. The basic model is there to estimate—
QUESTION: Probabilities.
DR. GARBER: What you might regard as objective probabilities, given that the model is right. The action triggers are a set of overlays on top of that. Since the action is discontinuous, you're in or you're out, it makes sense to convert what is a continuous variable, the probability, into a discontinuous action. That conversion is based on an investment strategy, an appetite for risk, a profit-and-loss criterion, or maybe a desire not to miss a crisis if one occurs, even if you don't really care about missing the noncrises. That depends subjectively on the requirements of the client or of our own risk management.
QUESTION: So the client will decide ultimately?
DR. GARBER: The action triggers that we provide are just kind of indicative based on a fairly arbitrary trading strategy.
QUESTION: Then, as a question, perhaps as a second-layer question to Ms. Forbes' observations, when you provide this data to your clients, how many of them, in your judgment, are you able to move? Because actions speak louder than words. You provide the words, these are the observations, but ultimately the person whose money is riding on that action is going to decide.
DR. GARBER: That depends entirely on the nature of the client. Some clients have their own ideas of things; maybe they like the probabilities that you've provided, but then they take those, ignore what you tell them, and take their own actions.
QUESTION: But the market outcome is not determined by what you tell them; ultimately, it is determined by what they do. So, as a ballpark, do you have an idea how many of them you're able to move? Because if they don't move, as Mr. Borensztein said, maybe nothing will happen.
So that's a very important question. It's a second-layer complement to Ms. Forbes' usefulness of the models' observations.
Thank you.
DR. GARBER: Do you want—is that a comment or—
QUESTION: I'm finished. So I can finally sit down.
[Laughter.]
DR. GARBER: In terms of what the clients will do, as I was saying, some of them aren't interested even in the probability calculation because they have their own research staff, so they don't want to hear about it. Others are happy to see the probabilities because they want to know what Deutsche, what a big bank, is thinking and telling other clients as well, but they'll make of it what they want and ignore your action triggers.
A third set of clients might take it seriously and might move funds. It's hard for me to say exactly whether a particular trade—they will never tell you that a trade is triggered by this. Trades are triggered by—big trades are triggered by a very convincing story and not by a machine.
Our own risk management uses it as one straw in the wind among other indicators that they've gotten; in particular, internal flow data that they see, knowledge about market positioning, knowledge about the fact that a major corporate in a particular country is desperately begging for cash and is about to go bankrupt, all of that. So this will just be a little piece.
They need it because mechanically, for risk-limit positions which they impose, they need an objective probability. They really ignore the action triggers. All they want is a probability, and so they take the probability as relatively objective.
QUESTION: [Off microphone.] But, eventually, that probability is further subjectively evaluated in a very different realm of probability evaluations. The standard models will not explain this.
DR. GARBER: These "objective probabilities" are based on publicly available macro data for which there are lags. So there is nothing private about those data, and all these models do is convert those into a number. In addition, there is a lot more information that certainly a big bank and generally a large client would have because the large client talks to all of the big banks as well. So there is a lot more information that's combined with those numbers to come to an investment decision.
If all of the numbers are ringing, and everybody's model is ringing, and that is consistent with a bunch of other things, then they might move. But clients are not going to move just based on this sort of information, because there is no story there, really.
QUESTION: [Off microphone.] Dr. Borensztein and Dr. Reinhart, I appreciated your comments at the end, that there are many models out there and that there are [inaudible] who model different things, currency prices, [inaudible]. There are also, prior to these, older models that banks use to measure political risk; Bank of America has a political [inaudible] model.
I actually wanted to direct the question to Dr. Forbes. Your exercise of looking at three private-sector banking models [inaudible] data for 10 months of the year 2001 is interesting because it was done very well and gave us conclusions.
I'm actually a little more interested in other thinking you might be doing right now: the Frankel-Rose model, the Kaminsky-Lizondo-Reinhart model. There was a model published by the Institute for International Economics in the summer of 2000 that I think, Dr. Reinhart, you might have been—but there are these other models that are large models, attempting to be robust.
If you were to select three of these nonprivate early warning system models and attempt the same exercise over the last 10 months, have you done this, and is there greater robustness or [inaudible] among the think-tank models than among the private-sector models?
DR. FORBES: That's an excellent question, and I'll admit I haven't done it, partially because the data are not as easily available as for the private-sector models, and partially because, I believe, when you get into the other models, what they're trying to predict is more diverse.
At least the three private-sector models I focused on are predicting 5-percent depreciations over a 1-month or 3-month horizon, and I believe the other models have more variation in what they predict, so comparisons might not be as accurate. But I haven't done it; it would be a useful exercise.
DR. BORENSZTEIN: As part of our work in this area, we've been tracking various models over the last 2 years, including the Kaminsky-Lizondo-Reinhart model, KLR, or at least the version that we implemented so as to be able to update and follow it; an in-house model, described in the Occasional Paper, called the Developing Country Studies Division model, DCSD; and a few of the private-sector models.
We've examined Frankel-Rose but haven't tracked it on an ongoing basis; we have done some tests of it in the past.
We haven't done any actual correlation of forecasts, and we haven't looked, in a summary, systematic way, at how much in agreement or disagreement the models have been. We have tried to sort of visually decide when the models were in agreement, because that certainly strengthens the message. They don't always agree; even though they're based on the same ideas and the same approach, to some extent, they do produce different forecasts.
I think the test that Kristin Forbes was putting forward, in the sense of asking which one is the most vulnerable country, is a pretty tough standard.
I think if one looks at a longer sample, I'd be surprised to find negative correlation; my impression is that there is positive correlation, and there might be higher correlation between models that are more similar than between models that are more different. But going forward, it might make sense to think of ways of more systematically combining different forecasts and seeing whether there is any value added in that activity. I am not sure how to engage in it, but my colleagues Cathy Pattillo and Andy Berg, I'm sure, will figure out how to do it.
QUESTION: I was wondering if you could, no one of you specifically, but whoever wants to, say a little bit more about contagion. How is it modeled in the short-term forecasting models of the private sector? Is it just captured by dummies in a sort of statistically inferior way, or is it modeled in some other form? And how does contagion matter in longer-term early warning system models? Is it relevant at all? I mean, is contagion potentially a problem for a country which is not otherwise identified as being vulnerable?
And so I'm not quite sure I fully understand the role of contagion either in the short-term or in the longer-term models.
DR. BORENSZTEIN: Let me start, and you can add.
A number of models have tried to incorporate some contagion variables, by and large by looking at the number of crises somewhere else and perhaps weighting those crises by how close the links between the countries are, say, trade links or financial links, or simply by regional links. These are the short-horizon models, by and large.
My own feeling is that contagion is not a very robust variable. There are episodes where you have huge contagion; Russia is probably the best example. But when Brazil had its crisis in '99, I think everybody was expecting major contagion throughout Latin America, and very little happened. Anyway, this is the way it's formulated in some models, and Peter can tell me what they did in the Alarm Clock model.
But in terms of the longer-term models, we haven't even tried it, because we can't figure out how. Contagion is usually something that happens very soon, almost simultaneously or within a month or two, and the long-horizon model tries to forecast crises over the next 24 months.
So we would really need to predict contagion: predict which country will have a crisis first, and which country it will then be contagious to. It would just be too complicated.
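[The weighting scheme Borensztein describes, crises elsewhere weighted by the closeness of trade or financial links, can be written down compactly; a hypothetical sketch:]

```python
import numpy as np

def contagion_index(crisis_flags: np.ndarray, links: np.ndarray) -> np.ndarray:
    """Link-weighted count of crises elsewhere, per country, for one month.

    crisis_flags : length-N 0/1 vector, crisis underway in each country
    links        : N x N matrix of link strengths (trade shares, common-
                   lender exposures, or simple regional dummies)
    """
    links = links.copy()
    np.fill_diagonal(links, 0.0)  # a country does not 'infect' itself
    return links @ crisis_flags
```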
DR. GARBER: We do have contagion variables, which typically are regional dummies, feeding from the bigger countries to the smaller countries in the region, plus one that has to do with stock-market contagion, which seems to signal a ripple effect.
I distrust these a lot because I view them almost as a way of forcing a better fit, and I think they have no value at all in forecasting, primarily because, even when there is contagion in various episodes, its nature changes from episode to episode. There is almost no regularity, unless it's a trade-based effect, which is something real.
If it's a financial market-based contagion having to do with the way businesses are organized or risk is managed, then the next time around there is going to be a different organization of risk or business or different risk control. So I view it almost as a random disturbance that is built on top of everything else.
DR. REINHART: Let me add that today, as we speak, there is a widespread perception of an increased aversion to risk, a loss of appetite for risky assets. That is something that is very difficult to model systematically, as Peter mentioned, in any of these kinds of models.
Peter mentioned trade links. There is another type of contagion, if you will, that is relatively reliably measured, and that is contagion via a common bank lender. In work that Graciela Kaminsky and I did on crisis contagion and confusion, we indeed find some parallels between the contagion channels when looking at banking clusters, at who is borrowing from Japanese banks and European banks. There were a lot of parallels between that Asian episode and the Latin American episode of the early '80s, where the vehicle of contagion was the U.S. banks.
QUESTION: I have a somewhat technical question.
In a sense, you are forecasting the exchange rate. As you know from your previous incarnation as Professor Garber, and as we have recently been reminded here, one of the most difficult things in all of this business is forecasting exchange rates. What is new here that makes us more confident that we will be able to forecast exchange rates, when an entire literature has said that that's almost impossible to do?
And, B, if indeed you are going to do it, why are you using this discrete-type model? Why not just model continuous exchange rate movements and use straight time-series methods, Kalman filters and the like, which are indeed designed to do forecasting?
DR. GARBER: I actually don't view this as an exchange rate forecasting model, because we're not forecasting the continuous exchange rate. We're calculating the probability distribution, or a part of the probability distribution, of a thing called an "event," which is an exchange rate movement of a particular size.
So, based on an observable set of macro variables mixed with a not-too-outrageous econometric technique, out comes a number which is a probability. It's not a forecast of the exchange rate; it's the probability that you will have an event of a particular size or larger.
QUESTION: [Off microphone.] You could do that from a continuous model also. I mean, you could run a [inaudible]—
DR. GARBER: You could, but basically risk management doesn't want to do that. The event approach just eliminates a lot of noise in the exchange rate that you're not really interested in, which would otherwise clutter things.
So out comes the probability. Now, generally, the probabilities are quite low, especially for the big events, and even for the smaller events it's rare to see a probability higher than 15 or 20 percent for the next month. So, in general, you're not really forecasting.
If the probability of an exchange rate movement of 5 percent or more in the next month is .2, then there is a .8 probability that you're going to have a movement of less than 5 percent, and that is not a forecast of the exchange rate. It establishes the risk environment that you might be in, but it's not a point estimate, a point forecast, of the exchange rate.
MR. : [Off microphone.] Well, thank you very, very much for a stimulating presentation.
Thank you very much for coming.
[Applause.]
[Whereupon, the Economic Forum was adjourned.]