
## Abstract

Rating agencies cluster companies in rating categories to signal their creditworthiness. The rating is based on qualitative and quantitative factors and often is a mix of public and private information. Market prices, either asset swap spreads or credit default swap premiums, reflect the market perception of creditworthiness (default probability) and loss given default. Assuming a recovery rate, we use the (risk-neutral) default probabilities to cluster issuers into (rating) groups, applying the well-developed technique of regime switching. We test the model over the period 2004–2014 on issues such as in-sample likelihood, forecasting accuracy, and rating stability. The model allows market participants to rate a company’s credit risk directly, complementary to the ratings issued by credit rating agencies.

Credit rating agencies (CRAs) use proprietary models to classify the likelihood that an issuer will be able to repay its debt obligations on a timely basis. The models are based on qualitative and quantitative factors drawn from either public or private information. Fitch Ratings [2014] states that the opinions (ratings) may in turn be based on (1) disclosed management projections, (2) a trend (sector or economic), or (3) historical performance. Although the CRAs’ models are unknown, the process to classify the likelihoods is simple: first the likelihood of a default is estimated, and then these likelihoods of default are classified (clustered) in buckets of creditworthiness (AAA, AA, A, BBB, etc.).

Although we have said that ratings are clustered probabilities of defaults and the market sees CRAs as complementary, there are some important differences in the approaches used by the CRAs in deriving ratings (see Ghosh [2013]). S&P’s rating is a pure indication of the probability of default of a bond, while Moody’s credit rating is based on expected loss, reflecting both the probability of default and the loss given default. Hence, one should expect Moody’s to assign lower ratings to industries with typically higher loss given defaults (e.g., Industrials and Financials) or when the economy is in a recession (as one expects higher loss given defaults when companies have to sell their assets at fire-sale prices under adverse market conditions).

Fitch Ratings [2014, pg. 4] states that its opinions (ratings) “typically attempt to assess the likelihood of repayment at the ultimate/final maturity.” However, it also states that “credit ratings express risk in a relative rank order, which is to say that they are ordinal measures of credit risk and are not predictive of a specific frequency of default or loss.”

To cluster the probabilities of default into rating groups, it is first necessary to choose how to measure the probability of default. A good candidate is the physical-world probability of default obtained from a structural model, for example the expected default frequency (EDF) of the KMV model (Crosbie and Bohn [2003]). Structural models are based on two variables: the firm’s leverage and the firm’s volatility. However, these variables are not directly observable and are reported with a delay.^{1} Another candidate is the risk-neutral default probability that we can infer from credit default swap (CDS) spreads. A default probability inferred from CDS spreads is instantaneously available (no lag), but it embeds a risk premium that is documented to be time varying (Berndt et al. [2008]). Therefore, after adjusting for the risk premium, the CDS spread should be the best measure of a directly observed probability of default.

Our idea is to consider a rating for a company as a state or regime. By doing so, we can cluster the probabilities of default with the regime-switching model of Hamilton [1989]. In this model, we can cluster all bonds into a particular number of regimes and identify the moments of the regimes, as well as classify an individual issuer into a certain regime.

**IMPLIED RATINGS**

While CRAs have served investors since 1909, when John Moody published the first bond ratings on railroads companies (see White [2010, p. 211]), market-based ratings have a much shorter history. It was not until the introduction of a capital market instrument, the CDS, that academics and practitioners started to look at what the market implies about a company’s credit risk.^{2}
Breger, Goldberg, and Cheyette [2003] appear to be the first to explicitly discuss market-implied ratings. They augmented the information in the CRAs’ ratings by mapping bond spreads onto these ratings. For each rating, they determined the lower and upper bounds for bond spreads that minimize misclassification. Later, CRAs applied similar techniques to map CDS premiums onto their ratings.

**Standard and Poor’s**

In 2009, S&P introduced a measure it refers to as Market Derived Signals (MDS) based on CDS contracts (see Bergman et al. [2009]). S&P estimates a piecewise linear regression for each rating class, where the dependent variable is the logarithm of the CDS spread, as follows (this is the expression for the basic model for U.S. domiciled non-financial corporate firms):
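The regression itself did not survive extraction; the following is a sketch consistent with the variable definitions below, not S&P’s exact specification (which is piecewise by rating class and may include additional dummies):

```latex
\ln(\mathit{CDS}_i) \;=\; \beta_0 \;+\; \beta_1\,\mathit{CreditWatch}_i
\;+\; \beta_2\,\mathit{DocumentType}_i \;+\; \beta_3\,\mathit{GICS}_i \;+\; \varepsilon_i
```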

where *CreditWatch* indicates whether the particular bond is under review at S&P, *DocumentType* indicates the type of contractual clause for the CDS,^{3} and *GICS* is the MSCI Global Industry Classification Standard. All variables are dummy variables (either 0 or 1). The βs are the parameters to be estimated. The constant β_{0} can therefore be interpreted as the benchmark (CDS) spread on a five-year CDS for that rating. S&P estimates this equation each day, and it yields a benchmark value for a bond with those characteristics.

Investors can then judge whether the error term (ε) is reasonable or too large. Note that a large error term can also be explained by factors that are not included in the regression, such as differences in recovery rates, liquidity premiums, and risk premiums. The estimated regression coefficients change each day, and therefore so does the value of the benchmark spread. S&P also developed a model to estimate probabilities of default via a structural model (see Baldassarri, Chen, and Heinrichs [2012]). These probabilities are mapped into letter scores equivalent to the S&P rating nomenclature.

**Moody’s**

Moody’s developed what it refers to as *Market Implied Ratings* (MIR),^{4} which translate CDS prices, bond spreads, and equity market (structural model) information into Moody’s rating nomenclature. Moody’s calculates the rating gap between the CDS-implied rating and the Moody’s rating. For the CDS-implied rating, Moody’s uses the five-year tenor and calculates the median CDS premium per rating category. That is, Moody’s takes all AAA-rated CDSs and calculates the median and the bandwidth. A specific CDS is then mapped on the CDS spread curve to determine its implied rating, and this implied rating is compared with Moody’s assigned credit rating.

**Fitch**

Fitch has a CDS Implied Rating (CDS-IR) methodology^{5} that divides the CDS universe into three sub-models based on region; within each region the most prevalent restructuring form is used (Modified for Americas/Oceania, Modified-Modified for Europe/Africa, and Full for Asia). It smooths the data via an exponentially weighted moving average to filter out short-term noise and then applies a non-parametric mapping to the five-year senior unsecured CDS. It then estimates the boundary spreads that optimally span the ratings. This method is similar to that of Breger, Goldberg, and Cheyette [2003], as discussed above, and in effect minimizes an objective function.

**Non-CRAs-Based Implied Ratings**

All the models described above share the need for the CDS spread and the CRAs’ credit ratings as critical inputs to estimate the implied rating. However, the CDS spread is the product of the risk-neutral probability of default and the loss given default (LGD), while the CRAs’ ratings are based on real-world probabilities. So in mapping CDS spreads into agency credit ratings, we neglect risk premiums in the default probability and possible (sector) variations in the LGD.

Creal, Gramacy, and Tsay [2014] estimated implied ratings without CRA ratings as a critical input. They use the full term structure of (risk-neutral) probabilities of default (based on CDS spreads and assumed LGDs) to cluster bonds with a finite (Markov) mixture model into rating categories.^{6} Their model is multivariate as it includes the full term structure of default likelihoods and has a Markov transition matrix that varies in time to account for changing market conditions. One is able to quantify the uncertainty of a bond being classified in a certain state, based on the well-known Markov regime-switching toolset for predicting, filtering, and smoothing. Creal, Gramacy, and Tsay [2014] estimate the model with Markov Chain Monte Carlo (MCMC) simulations; they demonstrate that their ratings are an earlier indication of a default than CRA ratings.

**THE MODEL**

Our model differs from Creal, Gramacy, and Tsay [2014] in five important ways. First, we apply a univariate instead of a multivariate finite Markov mixture model. Econometricians refer to such a univariate finite Markov mixture model as a regime-switching model (see Hamilton [1989]), which is more tractable than the multivariate version. Second, we estimate the (risk-neutral) transition matrix by following the Jarrow, Lando, and Turnbull [1997] model (JLT hereafter). Third, we estimate the default probability by varying the risk premiums in three distinct ways (level, slope, and curvature), while Creal, Gramacy, and Tsay [2014] use a logistic function. In our model we therefore have a direct relation between the physical and the risk-neutral transition matrix. Fourth, our model requires fewer parameters to estimate. Finally, we include more external state variables to estimate the time-varying transition matrix and report the impact of these variables.

The JLT model implies the risk-neutral transition matrix from three inputs: a physical transition matrix, credit spreads, and the CRAs ratings. The model relates the physical-transition matrix and the risk-neutral transition matrix via the risk premium. By estimating the JLT model in a regime-switching framework, we enhance the model in two ways. First, the JLT model estimates the risk premium based on the assumption that the ratings of the CRAs are accurate. However, if they are not accurate, the risk premium will be misestimated as well. Therefore, we will imply the risk premium and the ratings simultaneously based on the physical world transition matrix.

Second, the JLT model explains the risk premium per CRA rating, but does not explain the differences among issuers within the same CRA rating category.^{7} Because we estimate the ratings in a regime-switching model, we also assign probabilities of being in a certain regime to each issuer and therefore explain the differences in risk premiums among them.

In the next subsection we describe the model setup that is based on the regime-switching model of Hamilton [1989], while in the subsequent subsection we describe how we estimate the risk premium based on the JLT model in the case of a constant risk-neutral transition matrix. In this section’s final subsection we describe how we estimate the risk premium in the case of a time-varying risk-neutral transition matrix.

**Regime-Switching Model**

We define *N* states in which a bond can reside. Such a state is equivalent to a rating category. Let state 1 be the state with the lowest probability of default (and therefore the highest credit quality) and state *N* be the state with the highest probability of default (and therefore the lowest credit quality). The probability π_{k}(*t*) that firm *k* will default at time *t* is

π_{k}(*t*) = *q*_{s,N} + ε_{s}  (1)

where *q*_{s,N} is the probability of default of state *s*, and ε_{s} is the error term of state *s*, assumed to be normally distributed N(0, σ_{s}²).

The actual state *s* in equation (1) where the bond resides is a non-observed random variable that can take values from 1 up to *N*. A probabilistic model lies behind the changes in *s* and can be specified by an *N*-state Markov chain:

Pr(*s*_{t} = *j* | *s*_{t−1} = *i*) = *q*_{ij}  (2)

where *q*_{ij} is the probability that the bond transitions from state *i* in period *t* − 1 to state *j* in period *t*. As *q*_{ij} is a probability, its value is restricted to the range [0, 1].

Since we have defined *N* different states, the full *N* × *N* transition matrix of the *q*_{ij} is:

*Q* = [*q*_{1,1} … *q*_{1,N}; *q*_{2,1} … *q*_{2,N}; …; 0 0 … 1]  (3)

where *q*_{1,N} is the probability that a bond currently in state 1 defaults (moves to state *N*) between now and the next period. Default state *N* is an absorbing state, so the last row of *Q* contains a 1 in the bottom-right cell and zeros elsewhere. Since all rows have to sum to 1, the last column of *Q* is given once we know all the other cells. The *N* × *N* transition matrix therefore reduces to a matrix of size (*N* − 1) × (*N* − 1).
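This structure can be sketched as follows; the transition values are hypothetical placeholders, not estimates from the article, and serve only to show the absorbing default state and the rows-sum-to-one constraint:

```python
import numpy as np

def make_transition_matrix(N, stay=0.90, up=0.04, down=0.05):
    # Illustrative N-state transition matrix with an absorbing default state N.
    # stay/up/down are hypothetical probabilities of remaining, upgrading one
    # state, or downgrading one state; the default probability fills the rest.
    Q = np.zeros((N, N))
    for i in range(N - 1):
        Q[i, i] = stay
        if i > 0:
            Q[i, i - 1] = up                 # one-state upgrade
        if i < N - 2:
            Q[i, i + 1] = down               # one-state downgrade
        Q[i, N - 1] = 1.0 - Q[i].sum()       # default probability (last column)
    Q[N - 1, N - 1] = 1.0                    # default is absorbing: row (0, ..., 0, 1)
    return Q

Q = make_transition_matrix(8)
```

Because the last column is implied by the other cells, only the top-left (*N* − 1) × (*N* − 1) block carries free parameters, as the text notes.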

We need to infer the model parameters θ based on the available information Ω_{t}. The available information Ω_{t} contains the time series of default probabilities of all bonds: Ω_{t} = {π_{k}(*t*), π_{k}(*t* − 1), …, π_{k}(1), π_{k}(0)}, and the parameter set θ that we need to estimate contains the transition matrix *Q*, the means (µ), and the standard deviations (σ) of the regimes^{8}: θ = (σ_{1}:σ_{N}, *Q*).

To estimate θ, we follow the regime-switching model of Hamilton [1989, 2008]. The model is estimated iteratively for *t* = 1, 2, …, *T* and the inference takes the form of probabilities:

ξ_{k,j}(*t*) = Pr(*s*_{t} = *j* | Ω_{t}; θ)  (4)

where ξ_{k,j}(*t*) indicates the probability that bond *k* is in state *j* in period *t*, given all information up to *t*. For each bond *k*, the vector ξ_{k}(*t*) is of length *N* and contains the probabilities that the bond is in each of the *N* regimes. The vector ξ_{k}(*t*) sums to unity for each *k* and *t*, because the probabilities have to sum to 1.

The inference is performed iteratively for *t* = 1, 2, …, *T*, with step *t* taking as input the values of ξ_{k,j}(*t* − 1).^{9} We then estimate the density η of each regime *j* for each bond *k* at *t* as follows:

η_{k,j}(*t*) = 1/(σ_{j}√(2π)) · exp(−(π_{k}(*t*) − *q*_{j,N})² / (2σ_{j}²))  (5)

where *j* can take values 1, 2, …, *N*, and *q*_{j,N} and σ_{j} are the mean and standard deviation of each regime as defined above.

Now that we have the densities of each regime given by equation (5), the probability of the bond being in that regime given by equation (4) and the state transition matrix given by equation (3), we can calculate the conditional density of bond *k* at each *t* as follows:

To estimate the model, we compute the sum of the logarithms over all *k* and each *t*:

and estimate the model by maximizing the likelihood.

The difference between (7) and the regime-switching model of Hamilton [1989] is the number of time series. Hamilton’s model had only one time series (real GNP growth), while we have *K* time series (one per CDS issuer) that are driven by the same process. The number *K* also varies through time, as companies default, merge, or emerge. Therefore, we divide (7) in each period by the number of observations *K*.
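The filtering recursion described above can be sketched as a minimal Hamilton filter run over *K* issuer series at once, with the per-period log-likelihood divided by *K* as the text prescribes. This is an illustration under the article’s normal-density assumption, not the authors’ estimation code:

```python
import numpy as np

def hamilton_filter(pi, Q, mu, sigma, xi0):
    """Hamilton (1989) filter applied to K default-probability series at once.
    pi: (T, K) observed default probabilities; Q: (N, N) transition matrix;
    mu, sigma: (N,) regime means and standard deviations; xi0: (K, N) initial
    state probabilities. Returns filtered probabilities (T, K, N) and the
    log-likelihood averaged over the K issuers."""
    T, K = pi.shape
    xi = np.zeros((T, K, len(mu)))
    loglik = 0.0
    prev = xi0
    for t in range(T):
        pred = prev @ Q                                  # propagate xi(t-1) through Q
        dens = np.exp(-0.5 * ((pi[t][:, None] - mu) / sigma) ** 2) \
               / (sigma * np.sqrt(2 * np.pi))            # normal density of each regime
        joint = pred * dens                              # state probability times density
        f = joint.sum(axis=1)                            # conditional density of each bond
        loglik += np.log(f).sum() / K                    # per-period average over K issuers
        prev = joint / f[:, None]                        # updated state probabilities
        xi[t] = prev
    return xi, loglik
```

Maximizing the returned log-likelihood over the model parameters then corresponds to the estimation step around equation (7).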

After each iteration we update the probability vector ξ:

The estimation is an iterative process, and therefore we need to initialize equation (4) at the starting point *t* = 0. Hamilton [1989] initialized the model with the ergodic transition probabilities, but we are unable to do so for two reasons. First, our transition matrix is reducible (non-ergodic). Second, we have multiple (*K*) observations (CDS issuers) per period, which should all have different weights as long as their probabilities of default differ.

An issuer with a low probability of default should have a weight vector ξ_{k}(*t*) with its mass on the left side (exposure to states with a low probability of default), while an issuer with a high probability of default should have a weight vector ξ_{k}(*t*) with its mass on the right side (exposure to states with a high probability of default).

To set the initial exposure ξ_{k,j}(*t* − 1) of a bond we follow two steps. First we search for the weights *w* for each issuer *k* such that the weighted sum of the states’ default probabilities *q* equals the default probability of the issuer (π_{k}):

Σ_{j} *w*_{k,j} *q*_{j,N} = π_{k}(*t*)  (9)

where each *w*_{k,j} is restricted to be positive and the weights *w*_{k,j} sum to 1 over all *j*. These weights give us the bond’s exposure to the different regimes given exact pricing of the default probability.

Then, as a second step, we multiply the weights by the *Q* matrix to obtain the vector of initial weights:

ξ_{k}(*t* − 1) = *w*_{k} *Q*  (10)

If we take, for example, a bond that is exactly priced in equation (9) with a 50% weight *w*_{k,1} on the first regime and a 50% weight *w*_{k,2} on the second regime (all other weights equal to zero), the initial weight ξ(*t* − 1) is set by taking 50% of the first row of *Q* plus 50% of the second row of *Q*. The initial transition probabilities of this bond are therefore exactly in the middle of the transition probabilities of regime 1 and regime 2.
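The two-step initialization can be sketched as follows. Equation (9) alone does not pin down unique weights, so the bracketing-interpolation rule used here (all weight on the two regimes whose default probabilities straddle the issuer’s) is an assumption consistent with the article’s 50/50 example:

```python
import numpy as np

def initial_weights(pi_k, q_default, Q):
    """Initialize a bond's regime exposure (two-step sketch).
    pi_k: the issuer's observed default probability; q_default: increasing
    default probabilities q_{s,N} of the non-default states; Q: transition
    matrix over those states. Puts all weight on the two adjacent regimes
    that bracket pi_k, then multiplies by Q."""
    q = np.asarray(q_default, dtype=float)
    w = np.zeros(len(q))
    if pi_k <= q[0]:
        w[0] = 1.0
    elif pi_k >= q[-1]:
        w[-1] = 1.0
    else:
        j = np.searchsorted(q, pi_k)            # q[j-1] < pi_k <= q[j]
        a = (q[j] - pi_k) / (q[j] - q[j - 1])   # weight on the lower-risk regime
        w[j - 1], w[j] = a, 1.0 - a
    return w, w @ Q                             # (weights, initial exposure vector)
```

By construction the weights are positive, sum to one, and price the issuer’s default probability exactly, matching the restrictions stated for equation (9).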

**Constant Risk-Neutral Transition Matrix**

It is well documented in the literature that (1) there is a large difference between the probabilities of default calculated from historical data (physical probabilities, *P*) and those implied by prices (risk-neutral probabilities, *Q*) and (2) these differences fluctuate over time.^{10} If we assume that the historical transition matrix *P* is relatively stable, the fluctuations in time come from market participants’ pricing of bonds. The source of the fluctuation is therefore the risk-neutral transition matrix *Q*.

In this section, we first assume that there is a risk premium, but this risk premium is constant. We need to estimate ?, which consists of *Q* and s. As we have seen in equation (3), to estimate the matrix *Q* we need to estimate (*N* - 1) × (*N* - 1) parameters, while we need (*N* - 1) parameters for s.^{11} If we set *N* = 8,^{12} the estimated set of parameters ? consists of 56 parameters: 49 parameters for *Q* and 7 parameters for s. With such a large number of parameters, the system is difficult to estimate, so we need some reduction in the number of parameters. We will do so by imposing some structure on both *Q* and s.

To impose some structure on *Q* we follow the JLT model. The JLT model assumes that the risk-neutral probabilities depend on the physical world probabilities:

*q*_{ij} = λ_{i} *p*_{ij}  for *i* ≠ *j*  (11)

where λ_{i} is the (constant) risk premium. Note that (1) the risk premium depends only on the current state (*i*) and not on the prospect state (*j*), and (2) the risk premium is defined such that if λ_{i} = 1 there is no risk premium and the risk-neutral probability equals the physical world probability.

The full matrix of physical transition probabilities *p*_{ij} is *P*. The sum of each row in *P* equals one, so if the risk premium differs from one in equation (11), the rows of the resulting *Q* matrix would no longer sum to one. Therefore, the JLT model adjusts the diagonal elements such that each row sums to one. We can write the risk-neutral matrix as follows (see Jarrow, Lando, and Turnbull [1997, p. 489]):

*Q* = *I* + Λ(*P* − *I*)

where *I* is the *K* × *K* identity matrix and Λ is a *K* × *K* diagonal matrix with the risk premia λ on its diagonal. To avoid negative probabilities, a restriction is imposed on the risk premium, since the diagonal elements of *Q* cannot be smaller than zero.
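A minimal sketch of this adjustment, assuming the matrix form Q = I + Λ(P − I) implied by the text, with the non-negativity restriction on the diagonal checked explicitly:

```python
import numpy as np

def risk_neutral_matrix(P, premia):
    """JLT adjustment: Q = I + Lam @ (P - I), with Lam = diag(premia).
    Off-diagonal risk-neutral probabilities are premia[i] * P[i, j]; the
    diagonal is adjusted so each row still sums to one. Raises if a premium
    is so large that a diagonal element of Q turns negative."""
    P = np.asarray(P, dtype=float)
    Lam = np.diag(premia)
    Q = np.eye(len(P)) + Lam @ (P - np.eye(len(P)))
    if np.any(np.diag(Q) < 0):
        raise ValueError("risk premium too large: negative diagonal in Q")
    return Q
```

With all premia equal to one, the function returns *P* unchanged, mirroring the no-risk-premium case of equation (11).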

To ensure that the probabilities of default are well separated (economically meaningful, reducing misclassification risk) and increasing (no-arbitrage), and to reduce the number of parameters to be estimated, Creal, Gramacy, and Tsay [2014] impose structure on the means *q*_{s,N} so that collectively they depend on only one parameter, via a logistic function.^{13} We considered this function as well, but chose a different approach, because in their setup there is no direct relation between the risk-neutral default probability *Q* and the physical default probability *P* as there is in expression (11).

The last column of *Q* is equal to:

*q*_{s,N} = λ_{s} *p*_{s,N}

The risk-neutral default probability is equal to the physical default probability times a risk premium. To guarantee that the regimes are well separated and increasing, we need to impose structure on the risk premia. Litterman and Scheinkman [1991] show that the yield curve can move in three distinct ways: level, slope, and curvature. We follow this idea for the movements of the risk premia and use the parametric shapes proposed by Nelson and Siegel [1987] for the level (*L*), slope (*S*), and curvature (*C*):^{14}

*L*_{j} = 1,  *S*_{j} = (1 − e^{−κj})/(κj),  *C*_{j} = (1 − e^{−κj})/(κj) − e^{−κj}

where *j* indicates the regime and κ is a fixed parameter that we set equal to 0.1.^{15} We orthogonalize and normalize the factors. Cross-sectionally this setup gives orthogonal results, but in the time series the factors are correlated, as noted by Diebold and Li [2006]. We choose this form and inspect the correlation matrix later. The risk premiums can then be estimated as a function of these level, slope, and curvature movements:

λ_{j} = exp(β_{1}*L*_{j} + β_{2}*S*_{j} + β_{3}*C*_{j})

We choose the exponential form so that when the βs are all zero, the risk premium is one and the risk-neutral matrix is equal to the physical matrix. We need to estimate three parameters: β_{1}, β_{2}, and β_{3}. We have now reduced the number of parameters to be estimated for the *Q* matrix from 49 to 3 when we set *K* equal to 8.
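The factor construction can be sketched as follows, assuming the standard Nelson-Siegel loadings with the fixed parameter set to 0.1; using a QR decomposition here is one possible implementation of the orthogonalize-and-normalize step, not necessarily the authors’ exact procedure:

```python
import numpy as np

def ns_factors(N, kappa=0.1):
    """Nelson-Siegel level/slope/curvature loadings over regimes j = 1..N,
    orthogonalized and normalized via QR (orthonormal columns)."""
    j = np.arange(1, N + 1)
    level = np.ones(N)
    slope = (1 - np.exp(-kappa * j)) / (kappa * j)
    curve = slope - np.exp(-kappa * j)
    F, _ = np.linalg.qr(np.column_stack([level, slope, curve]))
    return F

def risk_premia(betas, F):
    # Exponential form: all betas zero -> a premium of one for every regime.
    return np.exp(F @ np.asarray(betas, dtype=float))

F = ns_factors(7)
```

Setting the three βs to zero returns a premium of one per regime, reproducing the nesting property described in the text.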

The estimated parameter set θ consists not only of *Q* but also of σ. We also want to reduce the number of parameters for σ, by assuming that the variance is a function of the default probability, with higher-rated bonds having lower variance:^{16}

σ_{s} = *q*_{s,N} / τ

where τ is the parameter to be estimated in the model and *q*_{s,N} is the default probability of state *s*. The parameter τ determines the level of σ_{s}. The default probability *q*_{s,N} increases from state 1 to the next state, and σ_{s} increases with it, but if τ > 1, not as much as the default probability. Therefore, τ can be seen as a kind of t-value of the ratio of default probability to standard deviation.

We now need to estimate four parameters for the model with a constant risk-neutral transition matrix.

**Time-Varying Risk-Neutral Transition Matrix**

Given the evidence of time-varying risk premiums, we replace the risk premium λ in equation (11) with a risk premium that is a function of time:

*q*_{ij}(*t*) = λ_{i}(*t*) *p*_{ij}  (19)

The time-varying risk-neutral transition probability *q*_{ij}(*t*) is thus a function of a time-varying risk premium λ_{i}(*t*) and the constant physical transition matrix.

Hull, Predescu, and White [2005] show that the risk premium λ_{i}(*t*) decreases as credit quality declines. However, the real-world probability of default *p*_{ij} increases as credit quality declines. So equation (19) could result in risk-neutral default probabilities that (1) are no longer in the same rank order as the physical probabilities and (2) are no longer well separated. Therefore, we need some structure and include in equation (17) an external state variable:

λ_{i}(*t*) = exp((β_{1} + γ_{1}*I*(*t*))*L*_{i} + (β_{2} + γ_{2}*I*(*t*))*S*_{i} + (β_{3} + γ_{3}*I*(*t*))*C*_{i})

where *I*(*t*) is the (demeaned) exogenous indicator and γ_{1}, γ_{2}, and γ_{3} are the additional parameters to be estimated. Note that the constant risk premium model is nested in the time-varying risk premium model if we set the γs to zero. In our empirical section we will test the same exogenous indicator variables as in Figlewski, Frydman, and Liang [2012].^{17}
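A sketch of the time-varying premium under one plausible reading of the specification, in which the demeaned indicator shifts the coefficients on the level, slope, and curvature loadings; the coefficient names and the factor matrix here are illustrative assumptions:

```python
import numpy as np

def time_varying_premia(betas, gammas, indicator, F):
    """Time-varying risk premia per regime: the factor coefficients move with
    a demeaned exogenous indicator I(t). With gammas all zero this collapses
    to the constant-premium model. F holds per-regime factor loadings."""
    coef = np.asarray(betas, dtype=float) + np.asarray(gammas, dtype=float) * indicator
    return np.exp(F @ coef)
```

The nesting property is immediate: zero γs reproduce the constant premia for any value of the indicator.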

**DATA**

To estimate our model, we need CDS spreads, data for the exogenous indicators, and a real-world transition matrix. To test our model, we need CRAs ratings.

**CDS Pricing Data**

We use the Markit database for CDS spreads. This global database contains daily CDS spreads for corporates from January 2001 to June 2014. The CDS spreads in the database are composite spreads, based on contributions from different brokers. Each reference name has different entries per date because currency, clause, and seniority can differ. We restrict ourselves to entities that have CDS spreads denominated in U.S. dollars (USD) and are senior in tier. Additionally, we restrict ourselves to corporate issuers from the United States. The database contains the full term structure of CDS spreads. However, for simplicity we focus on the one-year probability of default and therefore use only the one-year CDS spreads.

When we apply the above filter to the database of CDS spreads, we have a maximum of four different spreads per calendar day, as there are different contract clauses. CDS contracts pay out whenever a credit event happens. The International Swaps and Derivatives Association (ISDA) has defined six events that can trigger a CDS payout^{18} (see Markit [2008]). The most common three events are default, failure to pay, and restructuring. The first two are obvious events but restructuring can be interpreted differently by market participants. Therefore ISDA has defined four restructuring clauses: Full Restructuring (CR)^{19}, Modified Restructuring (MR)^{20}, Modified-Modified Restructuring (MM),^{21} and No-Restructuring (XR).^{22} A CDS can in principle be written on any of these clauses.

Exhibit 1 shows the development of the credit market over time. We show the number of issues written on each clause and also include the number of unique quoted issues for contracts that have both a CDS spread entry and a recovery rate entry^{23} in the Markit Corporation database. As of January 3, 2001, the first date in our database, we have only 89 unique contracts. The prevalent contract was modified restructuring (MR), as all unique contracts were quoted on the basis of this clause.

The CDS market showed a strong growth in the number of contracts between 2001 and 2008: the number of unique CDS spreads peaked at 997 in April 2008 and most contracts were still quoted in modified restructuring (975), although full-restructuring (713) and no-restructuring (845) followed closely.

After April 2008 the number of unique contracts fell, but it stabilized after April 2009 at about 850 unique contracts. Moreover, since April 2009, the prevalent document type in the United States has been “no restructuring” (see Markit [2009]). By the end of April 2009, the unique number of priced contracts was 873, with most contracts now quoted in terms of no-restructuring (862) and a bit fewer in modified-restructuring (859). For the last date in our database, June 30, 2014, the unique number of contracts is 750, of which 746 are quoted under the no-restructuring clause and 732 under the modified-restructuring clause, with only a minority under CR (275) and MM (188).

The two differences between the contract types are (1) the protection one gets in case of a restructuring and (2) the choice of the deliverable in the case of a payout. The document clauses that bring the most protection to the insurance taker are the ones that include restructuring (CR, MR, and MM) and are therefore the most expensive, while the document clause that brings the least protection (XR) is the cheapest. Additionally, the loss given default (LGD) in the case of a restructuring is the highest under CR and the lowest under MR. Therefore, the CDS spreads should satisfy the following condition: CR > MM > MR > XR (see also Packer and Zhu [2005]).

We can calculate the probability of default of a reference entity by the credit triangle:

π_{j}(*t*) = *CDS*_{j}(*t*) / (1 − ρ_{j}(*t*))  (21)

where π_{j}(*t*) is the default probability of firm *j* at time *t*, ρ_{j}(*t*) is the recovery rate, and *CDS*_{j}(*t*) is the CDS spread.

The Markit Corporation database also contains a field that shows the consensus trader expectation of the recovery rate ρ. Therefore, we can test equation (21) and gauge whether the credit triangle holds for the different contract types. Contract types that include restructuring (CR, MR, and MM) should have different recovery rates and CDS spreads, but their default probabilities should be equal, as they all insure the same risk. The default probability of contracts under the XR clause should be lower.

Exhibit 2 shows the average difference per year between the default probability of a contract with a CR clause and the three other clauses for those issuers that have quotes in both clauses. It is clear that in recent years (2011–2014) the average difference in price between the contracts that include restructuring is zero (as expected), while the average default probability of CR contracts is marginally higher than the XR contracts (as expected). In prior years, we observe larger-than-expected differences between MR and MM clauses and CR. If we believe that the credit triangle holds, then the explanation could be that the recovery rate has a term structure and the reported consensus trader’s expectation is the recovery rate for the most liquid tenor (five years). However, given the abrupt change, we believe that something else caused the difference prior to 2011.

We can either use the consensus trader’s expectation or, as market practitioners do, assume a fixed 40% recovery rate. Creal, Gramacy, and Tsay [2014] chose the consensus trader’s expectation, but we choose the fixed 40% recovery rate because the alternative would introduce another source of uncertainty into the model: how good are the consensus traders’ forecasts?^{24} Consequently, we assume a 40% recovery rate when calculating the probability of default in equation (21).
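With the fixed 40% recovery rate, the credit triangle of equation (21) reduces to a one-liner:

```python
def default_probability(cds_spread, recovery=0.40):
    """Credit triangle: annual default probability implied by a CDS spread
    (as a decimal, e.g. 0.006 = 60 bps) and an assumed recovery rate.
    The article fixes recovery at 40%, the common market convention."""
    return cds_spread / (1.0 - recovery)
```

For example, a 60 bps spread with 40% recovery implies a one-year default probability of 1%.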

We choose to work with the default probability that is estimated with the MR clause,^{25} because this includes restructuring and has the highest insurance coverage over time.

We apply weekly averaging of the default probabilities to enrich and clean our database. We take Wednesday as our analysis day and take all daily quotes from the Thursday of the week before until that Wednesday. This cleans the database (smoothing liquidity issues) and enriches the database (i.e., more data points).
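The Thursday-to-Wednesday weekly averaging maps directly onto an anchored weekly resampling rule in pandas; the daily series below is synthetic, for illustration only:

```python
import numpy as np
import pandas as pd

# Illustrative daily default-probability series averaged into weekly
# observations: each bin runs Thursday through Wednesday and is labeled
# by its Wednesday, matching the article's convention.
dates = pd.date_range("2014-06-02", "2014-06-30", freq="B")   # business days
daily = pd.Series(np.linspace(0.010, 0.012, len(dates)), index=dates)
weekly = daily.resample("W-WED").mean()
```

Averaging within each window smooths day-to-day liquidity noise while keeping one observation per issuer per week.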

As can be seen in Exhibit 1, the number of different CDS spreads is limited at the beginning of 2001. Therefore we decided to start our dataset on December 31, 2004. The database ends June 30, 2014, providing us with 9.5 years of daily observations and 1,536 unique CDS tickers.

**Exogenous Indicator Variables**

We choose to test the same macroeconomic variables as defined in Figlewski, Frydman, and Liang [2012]. Our selection of these variables is based on (1) their grouping of the variables in different categories and (2) their reasoning for using lags and smoothing in variables.

We have listed the different variables and sources in Exhibit 3. The variables are grouped in three main categories: (1) those related to the health of the economy (“General Macroeconomic conditions”), (2) those that indicate if the economy is improving or not (“Direction of the economy”), and (3) financial market indicators (“Financial Market conditions”).

As much as possible, we use the same sources and definitions as in Figlewski, Frydman, and Liang [2012]. For the corporate bond default rate, however, instead of using the percentage of U.S. corporations that defaulted over the past 12 months as provided by Moody’s KMV, we use the percentage of companies defaulted in the Markit CDS database based on S&P ratings. We will discuss the S&P ratings below. For the stock market indicators (S&P 500 return, S&P 500 volatility, and Russell 2000 return), we use Bloomberg as our source instead of Wharton Research Data Services.

Figlewski, Frydman, and Liang [2012] argue that macroeconomic variables might not have an instantaneous effect on the number of defaults because, for example, slower growth in the economy would not immediately lead to more defaults, but would impact the financial health of companies, and therefore defaults, with some lag. One solution could be to add a series of lagged variables, but this would lead to too many coefficients to estimate. Therefore, Figlewski, Frydman, and Liang [2012] chose to use exponentially declining weights. If *X*_{t-k} is the macroeconomic variable at time *t* with lag *k*, then the smoothed series becomes:

X̂_{t} = Σ_{k=0}^{K} d^{k} X_{t-k} / Σ_{k=0}^{K} d^{k}
where d is the decay factor. We follow Figlewski, Frydman, and Liang [2012], who set d to 0.88 and *K* to 18.^{26} Effectively, the weight of a specific macroeconomic variable at lag *k* is:

w_{k} = d^{k} / Σ_{j=0}^{K} d^{j}
Figlewski, Frydman, and Liang [2012] base their study on monthly data, while in this study we use weekly data. Consequently, we need to convert the decay factor from a monthly to a weekly basis. We derived an exact relationship for moving from monthly to weekly smoothing while keeping the effective decay equal:^{27}

d_{weekly} = (d_{monthly})^{1/f} (24)
where *f* is the numeric indicator of the time horizon of the original series. In our example, we want to go from a monthly (four-week) to a weekly indicator and therefore *f* equals four. Figlewski, Frydman, and Liang [2012] use a decay factor of 0.88, so by equation (24) the weekly decay factor becomes 0.97. The sum of the first four weekly decay weights then equals the first monthly decay weight, the sum of the fifth through eighth weekly weights equals the second monthly weight, and so on.
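The smoothing and the monthly-to-weekly conversion can be sketched in Python; the function names and the numerical grouping check are ours, not from the original study:

```python
# Sketch of the exponential smoothing and decay-factor conversion described
# above; helper names are illustrative assumptions.

def weekly_decay(d_monthly: float, f: int = 4) -> float:
    """Convert a monthly decay factor to a weekly one: the f-th root keeps
    the grouped, normalized weekly weights equal to the monthly weights."""
    return d_monthly ** (1.0 / f)

def smooth(series, d, K):
    """Exponentially weighted average of the last K+1 observations.
    series holds exactly K+1 observations, newest last (X_t = series[-1])."""
    weights = [d ** k for k in range(K + 1)]
    return sum(w * x for w, x in zip(weights, reversed(series))) / sum(weights)

d_m = 0.88
d_w = weekly_decay(d_m)            # ~0.9686, i.e. 0.97 rounded

# Check the grouping property for K = 18 monthly lags: the normalized
# weekly weights of month j sum to the normalized monthly weight of month j.
K_m = 18
K_w = 4 * (K_m + 1) - 1            # same horizon expressed in weeks
mw = [d_m ** k for k in range(K_m + 1)]
ww = [d_w ** k for k in range(K_w + 1)]
mw = [w / sum(mw) for w in mw]
ww = [w / sum(ww) for w in ww]
grouped = [sum(ww[4 * j:4 * j + 4]) for j in range(K_m + 1)]
```

With d_monthly = 0.88 the conversion reproduces the 0.97 weekly factor quoted above, and `grouped` matches `mw` element by element, which is the grouping property the text describes.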

The correlation between the macro factors is shown in Exhibit 4. Most of the correlations are moderate, with some exceptions. Figlewski, Frydman, and Liang [2012] reported high correlations for some of the same variables in their sample—for example, between the S&P 500 Index and the Russell 2000 Index—while others are higher in our sample (unemployment and the three-month yield). The main reason for this difference is that we studied a different time period (2005–2013 in our sample versus 1981–2002 in Figlewski, Frydman, and Liang [2012]). Moreover, our sample includes the credit crisis that began in 2008. As in Figlewski, Frydman, and Liang [2012], we retain these highly correlated variables in our study because it is unclear which are the best to use in our model.

**Real-World Transition Matrix**

As in JLT, we use a realized transition matrix to calibrate our model. More specifically we use the average one-year U.S. corporate transition rates reported by Standard & Poor’s that include realized transitions from 1981 to 2011 (S&P [2012]). This matrix is shown in Panel A of Exhibit 5. We modify this matrix in three ways. First, we follow JLT and exclude the withdrawn ratings (NR) from the published matrix (see Panel B in Exhibit 5).^{28} Second, also following JLT, we assign a default probability to the highest credit class. Third, we assume that bonds residing in a particular rating category either remain in that same category, have an upgrade or a downgrade, or go into default. This cleans most of the off-diagonal entries, as can be seen in Panel C of Exhibit 5.
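The three modifications can be sketched on a toy matrix (Python; the numbers are made up and much coarser than the S&P matrix in Exhibit 5, and we read "an upgrade or a downgrade" as a move to the adjacent category):

```python
# Toy sketch of the three transition-matrix modifications; helper names,
# the epsilon value, and the matrix entries are illustrative assumptions.

def drop_nr(row, nr_index):
    """Step 1: remove the withdrawn-rating (NR) column and renormalize."""
    kept = [p for i, p in enumerate(row) if i != nr_index]
    s = sum(kept)
    return [p / s for p in kept]

def assign_top_default(matrix, eps=1e-4):
    """Step 2: give the highest credit class a small default probability
    (eps is a placeholder value) and renormalize that row."""
    top = list(matrix[0])
    top[-1] = max(top[-1], eps)
    s = sum(top)
    matrix[0] = [p / s for p in top]
    return matrix

def keep_neighbors_and_default(matrix):
    """Step 3: per row, keep only the stay, one-notch-up, one-notch-down,
    and default probabilities; zero the rest and renormalize."""
    out = []
    for i, row in enumerate(matrix):
        kept = [0.0] * len(row)
        for j in (i - 1, i, i + 1, len(row) - 1):
            if 0 <= j < len(row):
                kept[j] = row[j]
        s = sum(kept)
        out.append([p / s for p in kept])
    return out

# Toy matrix: three rating classes plus a default column.
m = [[0.90, 0.07, 0.02, 0.01],
     [0.05, 0.85, 0.08, 0.02],
     [0.02, 0.10, 0.80, 0.08]]
m = keep_neighbors_and_default(assign_top_default(m))
```

Each step renormalizes its rows so the result stays a proper transition matrix.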

**S&P Ratings**

To test our model, we require data on the credit quality development of companies. A CDS is a derivative that is written on an underlying bond (the reference obligation) and therefore we can monitor the development of the credit quality of this reference obligation. To do so, we first match each CDS ticker with its reference obligation (ISIN) in the Markit Reference Entity Data (Markit RED) Obligation file.^{29} Not all CDSs are in the Markit RED file, as Markit scrubs the issuers based on general criteria.^{30} We found at least one reference obligation for 1,114^{31} out of 1,536 tickers. A CDS ticker could have several reference obligations as the underlying bonds mature or are called. Then we matched each reference obligation that we found with the corresponding S&P long-term issue rating.^{32} There are 2,929 reference obligations specified for the 1,114 tickers. We found an S&P issue rating for 2,901 bonds, so we have a rating history for 1,098 out of 1,114 tickers.

The S&P rating service primarily follows an issuer-pay model^{33}: the firm charges issuers a fee for providing a rating opinion. Some companies choose not to have any rating at all or choose a different CRA (e.g., Moody’s or Fitch). This last group also includes McGraw Hill, the parent company of S&P, which has no S&P rating because of a potential conflict of interest.

The S&P rating classification scheme is based on 10 different letter grades, ranging from AAA to D,^{34} where AAA denotes a debt obligation with the least credit risk and D denotes that the debt obligation is in the default state. In addition, S&P may modify the letter grades AA to CCC with a plus (“+”) or a minus (“-”) to show their relative standing (see S&P [2012, pg. 5]), so that there are in total 22 different grades or regimes.^{35}

If a ticker has more than one reference obligation at a certain point in time, there can be a conflict in the rating between the bonds. For example, the CDS ticker BELLO has two reference obligations in the Markit RED file (6.75% on May 5, 2013, and 8.00% on November 15, 2016). Before June 17, 2011, the rating of the 6.75% bond was two notches below that of the 8.00% bond, but after that date the rating was the same.

The 2016 issue has much stronger covenants (negative pledge, change-of-control protection, limitation of debt, and restricted payments). There are several cases like this in the database, and since the lowest-rated bond will probably serve as the indicator for credit risk, we use the lowest-rated bond for the credit rating of the CDS ticker. We follow this rule as long as the issuer has not defaulted on the bond issue. If the issuer defaults, we exclude it from the database as of the next day (we assign NR). If another reference bond for that ticker has not defaulted, we include the rating for that bond as of the next day.^{36}
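The selection rule can be sketched as follows (Python; the 22-grade scale ordering and the record layout are assumptions for illustration):

```python
# Sketch of the lowest-rating selection rule for a CDS ticker; the scale
# list and the input format are illustrative assumptions.

SCALE = ["AAA", "AA+", "AA", "AA-", "A+", "A", "A-", "BBB+", "BBB", "BBB-",
         "BB+", "BB", "BB-", "B+", "B", "B-", "CCC+", "CCC", "CCC-",
         "CC", "C", "D"]
RANK = {r: i for i, r in enumerate(SCALE)}

def ticker_rating(bond_ratings):
    """Given the S&P issue ratings of a ticker's reference obligations on
    one date, return the lowest non-defaulted rating; 'NR' if all defaulted."""
    live = [r for r in bond_ratings if r != "D"]
    if not live:
        return "NR"                              # excluded as of the next day
    return max(live, key=lambda r: RANK[r])      # largest rank = lowest rating
```

For example, two reference bonds rated BB and B+ give the ticker a B+ rating, and a ticker whose only remaining reference bond has defaulted gets NR.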

Exhibit 6 shows the results of the matching of tickers, reference entities, S&P ratings, and CDS spreads. The numbers of tickers with a priced CDS and with an S&P rating are shown in rows A and B. The number of tickers with both a priced CDS and an S&P rating is reported in row C.

We monitor the S&P ratings for the tickers in row C during the year and count the number of times we find a D (default). Row D in Exhibit 6 indicates that we found 92 defaults for the tickers and their reference bonds for the years 2006 to 2014. Some tickers in the database defaulted multiple times. One ticker defaulted four times, 2 tickers defaulted three times, and 10 tickers defaulted twice.^{37} Energy Future Holdings Corporation (EFHC), the ticker with the reference obligation that defaulted four times according to S&P, defaulted only once according to the International Swaps and Derivatives Association (ISDA).

ISDA makes binding decisions on credit events that may trigger obligations for CDS contracts. The resulting decisions are included in the Markit reference file but are also publicly available.^{38} ISDA does not consider as defaults so-called distressed exchanges (see, for example, Altman and Kuehne [2012, pg. 19]), in which the holder of the bond agrees to write down a large part of the notional in order to avoid bankruptcy filings and potentially larger future losses. Often, as indicated by Altman and Kuehne [2012], such write-downs are just a short-term repair and do not prevent further losses. S&P considers these distressed exchanges as defaults and therefore marks the event with a D, reinitiating the rating the next day at a higher grade.

Distressed exchanges and subsequent further write-downs are also reported in row E in Exhibit 6 (Single Year Defaults), where we only allow tickers to default once a year. Six bonds defaulted twice in the same year in 2008. To test the forecasting performance of the model in the empirical section we monitor the rating of the ticker at the end of year *T* and then check if the firm defaults in year *T* + 1. Therefore we only include the first default appearance in year *T* + 1. Additionally we need to have a rating for that ticker at the end of year *T* (see row F) and we need a priced CDS at the end of the year *T* (see row G). Combining the availability of a rating and a priced CDS reduces the number of observed defaults to 50 (see row H).
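The counting rule behind rows E and H can be sketched as follows (Python; the record layout is an illustrative assumption):

```python
# Sketch of the default counting in Exhibit 6: at most one default per
# ticker per year (row E), counted only if the ticker had both an S&P
# rating and a priced CDS at the previous year-end (row H). Field names
# and input shapes are illustrative assumptions.

def count_defaults(default_events, rated_and_priced):
    """default_events: iterable of (ticker, year) default observations.
    rated_and_priced: set of (ticker, year) pairs that had an S&P rating
    and a priced CDS at the end of `year`. Returns the row-H style count."""
    seen = set()
    count = 0
    for ticker, year in default_events:
        if (ticker, year) in seen:
            continue                    # keep only the first default in a year
        seen.add((ticker, year))
        if (ticker, year - 1) in rated_and_priced:
            count += 1
    return count
```

A ticker that defaults twice in 2008 is counted once, and only if it carried both a rating and a priced CDS at the end of 2007.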

We divide the number of defaults reported in row H by the number of tickers in the universe as shown in row C to get the default percentage for that year.

Since we did not observe any defaults in our database in 2007 and 2013, we validated our data by cross-checking several sources. First, we compared the number of defaults we found with the ones reported in the *2014 Annual Global Corporate Default Study* of S&P [2015]. We report the number in row J in Exhibit 6. In 2007, the global default rate according to S&P [2015] was below average (0.37%), while in 2013 it was 1.06%.

The S&P default study covers global corporate debt obligations, including both loans and corporate bonds. In this study, however, we include only USD-denominated senior corporate bonds issued by companies from the United States. Therefore, we checked the results of the CDS auctions in 2013 for the tickers that had a credit event.^{39} Ten auctions were held, of which nine were relevant, as they concerned credit events that happened in 2013.^{40} Three of the nine credit events involved U.S. companies; however, all of these auctions were so-called Loan Credit Default Swap (LCDS) auctions, where the underlying was not a corporate bond but a secured loan. Therefore, we conclude that the default-free years 2007 and 2013 are a result of the narrower sub-universe covered by our CDS data compared with the S&P universe.

**EMPIRICAL RESULTS**

We need to answer the following four questions when we estimate the model: (1) What is the best specification of the regime-switching model when we classify the issuers in-sample? (2) Do the ratings from these estimated (in-sample) models have any predictive power for future defaults? (3) Does the estimated model produce stable ratings? (4) How do these three questions interrelate? We answer these questions in the subsections that follow.

**In-Sample Log Likelihood**

We choose to set up the model with 8 regimes. We choose 8 instead of 10 regimes because S&P combines the ratings CCC, CC, and C in its physical transition matrix (see Panel A of Exhibit 5).

**Constant risk premium.** We estimate the model at each year-end with weekly data and vary the length of the sample period in either a rolling one-year estimation window or an expanding estimation window that starts in 2005. For example, at the end of 2010 we include for the rolling model all 52 weeks for that year, while for the expanding window we included the 312 weeks starting January 2005 and ending 2010. We do so because we expect that the parameters of the rolling-window model will vary more than those of the expanding model, and so we will have some insight on the variability of the parameters.

Exhibit 7 shows the results of our estimates for the four parameters (three βs and τ) in equation (13) and equation (14) for the two sample sizes. The rolling model shows a few interesting results.

First, the log likelihood is not equal each year, suggesting that the model is able to classify the companies’ default probabilities better in some years than in others. The model with the rolling window shows the lowest likelihood in 2009. An explanation is that the issuers’ default probabilities were more dispersed in 2009. A higher dispersion leads to a lower likelihood, as can be seen in equation (5); the likelihood is higher whenever the default probabilities are closer to the state means and the state standard deviations are lower. When the issuers’ default probabilities are more dispersed, they are further from the state means and the states have higher volatilities. It is evident from Exhibit 7 that when the log likelihood is lower (such as in 2009), the parameter that governs the volatility, τ, is lower. In equation (18) we see that the smaller τ is, the higher the standard deviation.

Second, the level parameter that indicates the parallel shock to the risk premiums is generally positive, so that risk premia are above 1, except in some years (2006, 2007, and 2013). The years 2006 and 2007 were known for low credit spreads, which is reflected in the level parameter. When the credit crisis started, the level parameter changed from negative to positive, reaching its highest value at the end of 2009.

Third, the sign of the slope parameter is always negative, and since the slope factor given by expression (15) is defined as upward sloping, this negative sign indicates higher risk premiums for the highest-quality companies and lower risk premiums for the lowest-quality companies. This is similar to what Hull, Predescu, and White [2005] reported, in that the ratio of the risk neutral to physical world probabilities decreases if the credit quality declines.

Fourth, the curvature parameter is negative and the curvature factor is positive at the extremes and negative for the middle segment, resulting in slightly lower extremes and a higher middle part.

Fifth, the value of the τ parameter is relatively stable through time and, except for 2009, well above 3. This indicates that the regimes all have low variances compared to the default probability and that the different regimes are well separated, as there is not much overlap among the distributions. To gain more confidence in the estimated parameters, we also calculate the standard errors via the empirical Hessian matrix.^{41} Assuming normality for the model, we calculate the t-statistic for each estimator by dividing the estimated parameter by its standard error, as shown in brackets in Exhibit 7. It can be seen that the parameters generally have high t-statistics, indicating that they are all significantly different from zero.

Looking at the expanding model in Exhibit 7, we see that for 2005, the two estimates are equal as the samples are identical, but they are different for the other samples. As we move in time, the expanding sample becomes larger and the estimated parameters for the expanding model start to stabilize in comparison to the rolling model. Also the *t*-statistic is much higher for the expanding period as the sample becomes longer and the parameters stabilize.

We can also use the empirical Hessian matrix to inspect the correlation between the estimated parameters. Panel A of Exhibit 8 depicts the average correlation matrix for the rolling sample, and Panel B reports the average results for the expanding sample. In both cases, the correlation is low. The maximum correlation is 0.41, found in the expanding sample between β_{2} and β_{3}.

The essence of the regime-switching model is to estimate the means of the different regimes. It is not so easy to interpret these means directly from the estimated parameters in Exhibit 7 and therefore we provide the averages in Panel A of Exhibit 9 for the different years for the rolling sample model. For easy interpretation, we chose to map the regimes directly on the S&P nomenclature: the regime with the lowest default probability is given the alphanumeric rating AAA. To have an idea about the risk premium, we also included the S&P physical default probability (from Panel C in Exhibit 5) in the last row of Panel A in Exhibit 9. We see that investment-grade regimes (ratings AAA to BBB) have higher default probabilities than those of S&P (a positive risk premium) in almost all cases. For high-yield regimes (BB to C), however, the results are more ambiguous: in the years with low credit spreads (2006 and 2007) their means are substantially lower than the S&P physical world probabilities, while for the years of the credit crisis (2008 and 2009) the means were much higher for regimes BB and B, but still lower for regime CCC/C.

For the regimes B and CCC/C, the average default probability is lower than the physical world probabilities. For regime B there are at least some years (2008 and 2009) where the default probability is higher than the physical world probabilities, but for regime CCC/C all the individual years have a lower default probability than the physical world. How can we interpret these low default probabilities for the CCC/C regime and does this for example mean that CCC/C rated bonds have a negative risk premium and are too expensive? To answer that question, we compare the result with the S&P ratings.

In Panel B of Exhibit 9, we report the average default probability for each S&P rating at the year-end, creating a table similar to Panel A. We observe in Panel B of Exhibit 9 that the (risk-neutral) default probabilities vary dramatically and that the matrix is inconsistent over the years. Most dramatic is year-end 2008, where the average default probability of AAA is nearly 4% and the CCC/C category has a default probability of 96.15%. An inconsistency can also be seen at year-end 2008, where the default probability of AA rated bonds is higher than that of A rated bonds. These two observations can be seen as a sign that S&P was slow in adjusting ratings in 2008.

The year 2008 was clearly an outlier, so therefore we also show the average excluding 2008 in the row above the bottom row. When we compare this average with the physical world probabilities in the last row, we observe a similar pattern as in Panel A of Exhibit 9. The risk premiums for investment-grade bonds are all positive (on average) in each individual year, but for the lower grade categories the results are ambiguous. The average default probability for the CCC/C regime is (especially when we exclude 2008) lower than the physical world probability but there are some years (2008 and 2011) where the default probability is higher.

So the pattern that we see in Panel A of Exhibit 9 for our model is also present in the S&P data in Panel B of the table, which could thus indicate that higher-yield bonds are on average expensive or that the physical world probabilities are more time-varying. We have taken the global average S&P transition matrix over the years 1981 to 2011 and there is evidence that the physical default probability varies over time. In good credit years (2006 and 2007) there is a lower probability of default and therefore the physical default probability is also lower.

There is, however, one important difference between the categorization of companies in our model and the categorization by S&P. S&P assigns a company to a single rating, while in our regime-switching model the default probability is a mixture of distributions. For example, RadioShack Corporation (RSH) at the end of 2013 had a CDS spread of 1140 bps, a default probability of 19.0%, and an S&P rating of CCC-. In our regime-switching model, on the other hand, RSH had probabilities of being in three regimes. It had a 6% probability of being in regime B, an 80.5% probability of being in regime CCC/C, and a 13.5% probability of being in regime D. Since this last regime had a default probability of 100% and since a company can have probabilities of being in more than one regime, we believe that the mean in our model of the CCC/C regime is normally lower than the results from the S&P transition matrix. Also, realize that we gave an example for the constant risk premium model with a rolling sample of one year and therefore other models and specifications might have other results.

**Time-varying risk premiums.** The model with the time-varying risk premiums in expression (20) has three more parameters (γ_{1}, γ_{2}, and γ_{3}) than the constant risk premium model, representing the level, slope, and curvature factors of the exogenous indicator variables; the constant risk premium model is nested within it (if γ_{1}, γ_{2}, and γ_{3} are set to zero). We perform the same tests as with the constant risk premium: we run the model at each year-end and distinguish between a 52-week rolling window and an expanding window that includes all weeks from the beginning of our dataset.

We could show the results for each and every year-end, but that would produce a lot of tables since we have 14 exogenous indicator variables. Instead, Exhibit 10 shows only the result for the expanding sample at the end of 2013. This is probably the most interesting as it includes the full sample and shows the impact of the exogenous indicator variables over time. In the first row we repeat the result for the model with the constant risk premium that is nested in the time-varying risk premium model when we set the parameters for the exogenous indicator variables to zero. The column labeled “likelihood” shows the value of log-likelihood of the different model specifications. We observe that the inclusion of the Chicago Fed National Activity Index (CFNAI) indicator raises the likelihood the most, from 2193 to 2217, while the 10-year Treasury raises the likelihood only from 2193 to 2198.

A change in the log-likelihood does not necessarily mean that the model fits better. Therefore, we perform a likelihood-ratio test, reported in the last column of Exhibit 10, to gauge whether the difference between the two likelihoods is statistically significant. We test whether the addition of the three parameters (γ_{1}, γ_{2}, and γ_{3}) collectively results in a statistically significant improvement in the model’s fit. Since we test the three parameters together, we set the number of degrees of freedom to three. The higher the test statistic, the lower the p-value (shown in the row below) for the null hypothesis that the model without exogenous indicator variables explains the data equally well. It can be seen that adding the exogenous indicator variables is statistically significant at the 5% level for all exogenous indicator variables.
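As an illustration of the test, consider the log-likelihood improvement reported for the CFNAI indicator, from 2193 to 2217. The sketch below (Python; helper names are ours) uses the closed-form chi-squared tail for three degrees of freedom, so no statistics library is needed:

```python
import math

def chi2_sf_df3(x: float) -> float:
    """Survival function P(X > x) of a chi-squared variable with 3 degrees
    of freedom, using the closed form available for odd degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

def lr_test(ll_restricted: float, ll_full: float):
    """Likelihood-ratio statistic and p-value for 3 jointly added parameters."""
    stat = 2.0 * (ll_full - ll_restricted)
    return stat, chi2_sf_df3(stat)

# Constant-premium model vs. the model with the CFNAI indicator (Exhibit 10):
stat, p = lr_test(2193.0, 2217.0)   # stat = 48; p is effectively zero
```

A statistic of 48 is far beyond the 5% critical value of about 7.81 for three degrees of freedom, so the improvement from adding the CFNAI indicator is highly significant.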

Besides looking at the significance of including all three parameters together, we also look at each of the three parameters individually. We could do this by performing the log-likelihood test for each parameter individually, but instead we assess the significance of each parameter via the inverse of the Hessian matrix, as we did for the constant risk premium model. We see in Exhibit 10 that the curvature parameter γ_{3} is not significant for any of the exogenous indicator variables. Either the level or the slope parameter (γ_{1} or γ_{2}), on the other hand, is significant for each and every exogenous indicator variable.

Only one indicator variable (Credit Spread) shows that both the level and slope parameters are significant, all other indicators being either significant for the level parameter or the slope parameter. The unemployment rate, for example, has a significant negative parameter for the slope. Therefore, if the unemployment rate is higher than the sample mean, the mean of the higher credit quality regimes (AAA, AA, and A) is higher, and the mean of the lower credit quality regimes (BB, B, and CCC/C) is lower. This is a bit counterintuitive as one would expect the means of all regimes to increase. One explanation could be that in this mixture model, in times of high unemployment, the maximum likelihood estimator gives more weight (probability) to regime D. This regime has a fixed default probability of 100% and therefore is not dependent on the unemployment. As a result of this higher weight assigned to regime D, regime CCC/C has a lower weight and a lower default probability.

The indicator with the highest likelihood-ratio test (i.e., CFNAI) has a negative parameter for the level. As the indicator measures overall activity and inflationary pressures, a negative parameter indicates that if economic activity is higher than the sample mean, the mean of all the regimes goes down. This is intuitive because when the economic environment is good, there is less probability of a default.

We leave it to further research to determine the effect of combining several exogenous indicator variables (for example, the CFNAI level and the unemployment slope) and to investigate whether any other indicators have a significant parameter for the curvature. In the next section we look at the forecasting power of the different models.

**Accuracy Ratio/Forecasting Power**

The cumulative accuracy profile (CAP) and its accuracy ratio (AR) are often used to measure the relative accuracy of a rating model (see Cantor [2003]). The CAP is a graph that plots the cumulative default probability of the model (or CRA) on the horizontal axis against the actual realizations (cumulative percentage of defaults) on the vertical axis. The default probabilities on the horizontal axis are ranked from riskiest to safest and are grouped in buckets. Three lines are plotted in the graph: the perfect power curve, the random power curve, and the model power curve.

The perfect power curve shows the outcome for the perfect model: if there are *n* defaults, the perfect power curve reaches 100% on the vertical axis at the point that is equal to *n* divided by the total number of companies. The random power curve shows the 45-degree line where each point has the same value on the horizontal axis as on the vertical axis. The model power curve shows the performance of the model or CRAs. If the model adds value, the line should lie between the random power curve and the perfect power curve.

The accuracy ratio, or Gini coefficient, is derived from the CAP; it measures how much of the area between the perfect power curve and the random power curve is captured by the model.^{42} The Gini coefficient can be interpreted like a correlation coefficient: if it is +1, all defaults are concentrated in the lowest rating categories (CCC/C); if it is 0, there is no relation; and if it is -1, all defaults are concentrated in the highest rating categories (AAA).
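A minimal accuracy-ratio computation can be sketched as follows (Python; higher score means riskier, ties are ignored for brevity, and the function name is ours):

```python
# Sketch of a CAP-based accuracy ratio (Gini): the area between the model
# power curve and the random 45-degree line, divided by the area between
# the perfect power curve and that line.

def accuracy_ratio(scores, defaulted):
    """scores: riskiness indicator per obligor (higher = riskier).
    defaulted: matching booleans; assumes at least one default.
    Returns the accuracy ratio in [-1, 1]."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n = len(scores)
    d = sum(defaulted)
    x = y = auc = 0.0                 # trapezoidal area under the model curve
    for i in order:
        x_new = x + 1.0 / n
        y_new = y + (1.0 / d if defaulted[i] else 0.0)
        auc += (y + y_new) / 2.0 * (x_new - x)
        x, y = x_new, y_new
    auc_perfect = 1.0 - d / (2.0 * n)  # perfect curve hits 100% at x = d/n
    return (auc - 0.5) / (auc_perfect - 0.5)

# A model that ranks all defaulters as riskiest scores +1.
perfect = accuracy_ratio(list(range(10, 0, -1)), [True, True] + [False] * 8)
```

Ranking all defaulters first yields +1, a reversed ranking yields -1, and a random ranking scores near 0, matching the interpretation above.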

**Constant risk premium.** The most granular classification that we have for the S&P ratings is the 10 letter grades complemented with the notches, for a total of 22 different grades. To compare our implied ratings with the S&P ratings, we need a similar granularity. We therefore proceed as follows: we rank the S&P ratings from AAA to D and assign to each a numeric value from 1 to 22. For the implied models on a particular date, a company has a probability of being in each of the eight regimes. We multiply each regime probability by the numeric value that corresponds to the S&P rating without a notch. So, regime 1 corresponds with AAA and therefore a numeric value of one, regime 2 corresponds with AA and therefore a numeric value of three, etc. We let regime 7 correspond with CC, as in the S&P transition matrix it is the aggregated rating of CCC, CC, and C, so it has a numeric value of 20. Regime 8 corresponds with D, so it has a numeric value of 22. The sum of the products of the probabilities and the numeric values gives the numeric rating of the company, which we round down to the closest integer, so all companies have a rating ranging from 1 to 21.
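The mapping can be sketched in Python (the cap at 21 is our reading of the stated rating range). Using the RadioShack regime probabilities given earlier (6% B, 80.5% CCC/C, 13.5% D), the weighted value is 19.97, which rounds down to 19:

```python
import math

# Sketch of the regime-to-numeric-rating mapping: each of the 8 regimes is
# mapped to the numeric value of the corresponding un-notched S&P grade on
# the 22-point scale (AAA = 1, AA = 3, ..., CCC/C mapped to CC = 20, D = 22).
REGIME_VALUES = [1, 3, 6, 9, 12, 15, 20, 22]

def numeric_rating(regime_probs):
    """Probability-weighted numeric rating, rounded down; capped at 21 per
    the stated 1-to-21 range (the cap is our assumption)."""
    value = sum(p * v for p, v in zip(regime_probs, REGIME_VALUES))
    return min(math.floor(value), 21)

# RadioShack (RSH) at end-2013, regime probabilities as reported earlier:
rsh = [0.0, 0.0, 0.0, 0.0, 0.0, 0.06, 0.805, 0.135]
```

`numeric_rating(rsh)` gives 19, which on the 22-point scale corresponds to CCC-, the same grade S&P assigned to RSH at the end of 2013.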

We construct three model power curves: the implied model with a rolling window, the implied model with an expanding window, and the S&P ratings. For each curve, we first calculate the CAP at each year-end and then aggregate all information of the year-ends from 2005 to 2013.^{43} We only include the first default of a company (since companies can default several times in a year) and sort on the rating indicator.

Exhibit 11 shows the CAP curves. We have in total 5,537 observations at the year-ends and 50 actual defaults. The perfect power curve therefore shows on the horizontal axis that after approximately 1%^{44} of the rated universe the vertical axis is already at 100% of the defaults (i.e., 50/50). The model power curves of both implied rating curves and S&P lie between the perfect power curve and the random power curve, indicating that they contain information. Both implied rating model power curves lie above the S&P model power curve, indicating that the CDS market gives more information about future defaults than S&P ratings. The S&P model power curve performed particularly poorly in the higher-rated categories. Lehman was rated A+ in December 2007 though it defaulted in the next year and already had a CDS default probability of 2.58% in December 2007; Washington Mutual had a rating of A- in December 2007 though it defaulted in 2008 and had a CDS default probability of 8.15% in December 2007.

The Gini coefficients show in a single number which default indicator is more accurate.^{45} The constant risk premium model estimated with a rolling sample has the highest Gini coefficient (94.4%), followed by the constant risk premium model estimated on an expanding sample (93.9%); the S&P ratings perform less well (86.8%).

**Time-varying risk premium.** Exhibit 12 shows the forecasting power for the different models year by year and also aggregated over the years. The first row, which reports the Gini coefficient for the S&P rating, shows what we have already seen in the CAP curve in the previous section: S&P ratings are not as accurate as the implied model. Especially during the credit crisis, the Gini coefficients were low. In 2007, the Gini coefficient for the S&P ratings was only 29.9%. That is, the classification of ratings at the end of 2007 had almost no predictive power for defaults in 2008 (remember that 0% is equal to the random model).

Observe also that because we had no defaults in our database for 2007 and 2013 (see also Exhibit 6), we have no Gini coefficient for the years 2006 and 2012.^{46} In the last three columns we show the aggregated results, just as we did in the previous section when we constructed the CAP curve. The constant risk premium model with a rolling sample had higher Gini coefficients than the S&P ratings, as can be seen in Exhibit 12 in the column labeled Gini. To judge whether this difference is statistically significant, we carry out a statistical test for the difference between the two rating models using a statistic developed by DeLong, DeLong, and Clarke-Pearson [1988].^{47} This statistic was developed to test whether the differences between receiver operating characteristic (ROC) curves are different from zero, but since there is a direct relationship between the ROC and the Gini coefficient (see, for example, Engelmann, Hayden, and Tasche [2003]), the resulting p-values are the same. The column labeled “p-value1” in Exhibit 12 shows that the constant risk premium model with a rolling sample is different from the S&P ratings only at a significance level of 9%.

When we add exogenous indicator variables to the constant risk premium model with a rolling sample, the Gini coefficient increases on average by 0.7% over all the indicators. Two models (10-Year Treasury Yield and Credit Spread) show a Gini coefficient improvement of more than 1%. We can also use the DeLong, DeLong, and Clarke-Pearson [1988] statistic to test whether the Gini coefficients for the models that include an exogenous indicator are significantly different from that of the constant risk model. The last column (labeled “p-value2”) shows the p-values for this null hypothesis. It shows that for the rolling sample the 10-Year Treasury Yield is significant at the 5% level and the Credit Spread at the 10% level.

The constant risk premium model with an expanding sample performed better than the S&P rating but worse than the rolling sample. The p-value for the hypothesis that the constant risk premium model with an expanding sample gives the same information as the S&P ratings is 11%. When we add the exogenous indicator variables, the Gini coefficients increase on average by 0.5%, and the p-values for the hypothesis that these models give the same information as the S&P ratings decrease. The indicators that had high significance in Exhibit 10 perform especially well. The S&P Volatility indicator increases the Gini coefficient by more than 1% compared to the constant model; the p-value that this model gives the same information as the S&P rating is only 7%. The last column in Exhibit 12 (labeled “p-value2”) shows that the addition of the exogenous indicators yields no statistically different Gini coefficients compared to the constant model.^{48}

**Stability of Ratings**

There is a trade-off between the accuracy of credit ratings and the stability of credit ratings (see Cantor and Mann [2006]). When two rating systems have the same accuracy, investors prefer the rating system with the higher stability because rating changes affect behavior and thus have an economic impact. Several measures are available to determine the stability of a rating system, including the frequency of rating changes, the frequency of large rating changes, and the frequency of rating reversals (Cantor [2003]), but in this article we will focus on changes in the average rating of the universe.

In Exhibit 13 we report the average ratings for the universe. As before, we first translate the S&P ratings and the implied models to a numeric rating and then calculate the average numeric rating for the universe. We then convert the numeric rating back into the S&P nomenclature. The S&P ratings show remarkable stability: the average of the S&P ratings is always around BBB, especially if we compare this to the average of the default probabilities at the bottom of Exhibit 13.

The ratings of our implied models are in the subsequent rows. It is evident that the implied ratings move much more in line with the default probabilities. There are several years where the ratings of the implied model were close to the S&P rating (2007, 2009, 2010, and 2012), but in other years they are higher (2005, 2006, and 2013), slightly lower (2011), or much lower (2008).

Especially in 2008, when the CDS premiums skyrocketed, the implied ratings were much lower than the S&P ratings. We have seen in Exhibit 12 that these lower ratings from the implied model were justified by its higher Gini-coefficient, and especially by the mediocre Gini-coefficient of the S&P ratings. CRA ratings are said to be “through the cycle” ratings, while market-based ratings are said to be “point in time” ratings. The benefit is that the implied ratings are more accurate, as can be seen from Exhibit 12, but the cost, as seen in Exhibit 13, is that the ratings are much more volatile (see, for example, Kiff, Kisser, and Schumacher [2013]).

We measure the stability of the ratings by the standard deviation of the yearly average numeric upgrade or downgrade of the universe. As we know that 2008 was a very extreme year, we compute the standard deviation both over all the years (2005–2013) and over the years after the credit crisis (2009–2013). A standard deviation of 3 implies that the average universe rating changes one full letter grade per year. The standard deviation for the full period is much higher than in the past five years. The standard deviation for the full sample varies from 1.6 to 3.6, while in the past five years it varies from 0.6 to 3.1.
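
The stability measure described above can be sketched as follows. The letter-to-number mapping (AAA = 1 through D = 22 on the modifier scale of endnote 35, so one full letter grade spans three notches, consistent with a standard deviation of 3 meaning one letter grade per year) follows the article; the three-issuer rating histories are illustrative placeholders, not the article's universe.

```python
import numpy as np

# Modifier scale from endnote 35: AAA = 1 ... D = 22 (one letter grade = 3 notches)
GRADES = ["AAA", "AA+", "AA", "AA-", "A+", "A", "A-", "BBB+", "BBB", "BBB-",
          "BB+", "BB", "BB-", "B+", "B", "B-", "CCC+", "CCC", "CCC-", "CC", "C", "D"]
NUM = {g: i + 1 for i, g in enumerate(GRADES)}

def universe_average(ratings):
    """Average numeric rating of the universe in one year."""
    return float(np.mean([NUM[r] for r in ratings]))

def stability(yearly_ratings):
    """Standard deviation of the year-over-year change in the universe-average
    numeric rating; a value of 3.0 means one full letter grade per year."""
    avgs = np.array([universe_average(year) for year in yearly_ratings])
    return float(np.std(np.diff(avgs), ddof=0))

# Illustrative universe of three issuers over four years (not the article's data)
history = [["A",    "BBB+", "BB"],
           ["A-",   "BBB",  "BB-"],
           ["BBB+", "BB+",  "B+"],   # a crisis year: broad downgrades
           ["A-",   "BBB",  "BB"]]
print(round(stability(history), 2))
```

The average numeric rating can be mapped back to the nearest letter grade in `GRADES` to report the universe average in S&P nomenclature, as in Exhibit 13.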

The implied models that were estimated with a rolling sample showed a higher rating stability than the implied models with an expanding sample. When the CDS spreads rise, the β parameters of the model with a rolling sample adapt quickly to the higher spreads. As a result, the means of the different regimes increase rather than the ratings of companies changing. The β parameters of the implied model with the expanding window tend to stabilize over time (see Exhibit 7), and therefore changes in the CDS spreads result in changes of companies’ ratings while the mean of each regime stays at the same level.
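
The contrast between the two sampling schemes can be illustrated with a simple mean of a made-up spread series: the rolling mean adapts quickly to a level shift, while the expanding mean remains anchored by the earlier observations. The 12-observation window mirrors the article's 12-month rolling sample; the series itself is an assumption for illustration only.

```python
import numpy as np

def rolling_mean(x, window=12):
    """Mean over the last `window` observations (the rolling sample)."""
    x = np.asarray(x, dtype=float)
    return float(x[-window:].mean())

def expanding_mean(x):
    """Mean over all observations so far (the expanding sample)."""
    return float(np.mean(x))

# Illustrative spread series: 24 calm months at 100 bp, then 12 crisis months at 400 bp
spreads = [100.0] * 24 + [400.0] * 12

print(rolling_mean(spreads))    # -> 400.0, tracks the new crisis level
print(expanding_mean(spreads))  # -> 200.0, still anchored by the calm months
```

In the implied rating model the same mechanism acts on the regime means: under the rolling sample the means shift with the market, so issuer ratings stay put; under the expanding sample the means barely move, so the spread shock shows up as rating changes instead.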

**CONCLUSION**

Using a regime-switching model, CDS spreads, and a selection of exogenous indicators, in this article we developed an implied credit rating model. We did so by calibrating different specifications of the model based on two different sample sizes: a rolling window of 12 months and an expanding sample that includes all data from the start of the dataset. For the period 2005–2013, the different specifications of the implied models performed better than the S&P ratings in terms of Gini-coefficients, while the implied models estimated with a rolling window performed better than the models estimated with an expanding window.

When we appraised the models based on rating stability, S&P ratings were the most stable, followed by the implied model estimated with a rolling window and then the implied model estimated with an expanding window.

The forecasting performance and rating stability of the implied models that were estimated with a rolling window showed coherence: the best models in terms of rating stability, that is, the ones using Unemployment, the 10-Year Treasury Yield, or the Credit Spread as exogenous indicator, were also the best models in terms of Gini-coefficient. These specifications were the best implied models over the period 2005–2013.

Since the CDS market is still a young market, we could only use 9.5 years of data to test the implied ratings, while there is more than 100 years of history for credit ratings. The weak performance of the S&P ratings in our sample (especially in 2005 to 2008) could therefore be just an outlier. What we did observe is that in the most recent years (2009 to 2013) the S&P ratings had Gini-coefficients much closer to, and sometimes even higher than, those of the different specifications of the implied ratings; even so, the best implied models still had higher Gini-coefficients than the S&P ratings over this period. More data are needed to validate our findings.

One could argue that one of the model’s limitations is that a company needs traded CDS spreads to be rated, while CRA ratings do not have this limitation. We estimated the model on CDS spreads because these are the purest credit instruments. Although asset swap spreads can also be used, they may include compensation for optionality related to a bond’s covenants. However, if one needs a rating for a company without a CDS spread, one could simply derive it by inserting the asset swap spread of the bond into the implied model.

The essence of a regime-switching model is to estimate the means of the different regimes. Since these means are risk-neutral probabilities, financial market participants can use these to gain an understanding about credit premiums for each of the regimes. Moreover, when banks categorize clients according to a credit profile, they could validate the client’s risk premium with the implied models.
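
As a minimal sketch of how estimated regime means could be used once available, the snippet below assigns an issuer's (risk-neutral) default probability to the rating bucket with the nearest regime mean. This nearest-mean assignment is a simplification of the model's filtered regime probabilities, and the eight regime means and the input probability are hypothetical placeholders, not the article's estimates.

```python
# Hypothetical regime means (risk-neutral default probabilities), one per
# rating bucket, ordered from AAA to D; placeholders, not estimated values.
REGIME_MEANS = [0.0002, 0.0008, 0.003, 0.01, 0.04, 0.12, 0.30, 0.95]
LABELS = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC/C", "D"]

def implied_rating(pd_risk_neutral):
    """Assign the rating bucket whose regime mean is nearest to the
    issuer's market-implied default probability."""
    i = min(range(len(REGIME_MEANS)),
            key=lambda j: abs(REGIME_MEANS[j] - pd_risk_neutral))
    return LABELS[i]

print(implied_rating(0.012))  # nearest mean is 0.01 -> "BBB"
```

A bank validating a client's risk premium would run the comparison in the other direction: look up the regime mean for the client's internal bucket and check the client's market spread against it.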

## ENDNOTES

^{1}The leverage depends on the value of bonds outstanding divided by the market value of the stocks. The amount of debt outstanding (bank loans and bonds) is often only known after the company reports financial statements.

^{2}The credit spread of a bond can also be obtained from the yield to maturity by swapping out the interest rate via an interest rate swap. However, many bonds also embed options, so the resulting asset swap spread is not as pure as the CDS premium, since it reflects compensation for other risks.

^{3}Different document restructuring clauses exist depending on what credit event triggers the CDS: full (CR), modified (MR), modified-modified (MM), and no restructuring (XR).

^{4}See Munves et al. [2007].

^{5}See Reyngold, Kocagil, and Gupton [2007].

^{6}A regime-switching model is the name preferred by economists for a finite Markov switching model (see Frühwirth-Schnatter [2006, pg. 315]).

^{7}For example, two issuers may have the same rating but different probabilities of default.

^{8}Note that in a regime-switching model we need to estimate the transition matrix *Q* and the mean and standard deviation of each regime. However, from equation (1) we know that the mean of a regime is equal to the last column of *Q*.

^{9}In our empirical section we limit the value of the default state to 0.95 so that a company can never go 100% into a default state. This is similar to what CRAs do, in that their ratings up to C are based on opinions and the default state is only recorded as a fact on the first occurrence of a payment default by a company on any financial obligation, taking into account a possible grace period. See S&P’s definition of default in S&P [2012, pg. 65].

^{10}See, for example, Hull, Predescu, and White [2005] and Berndt et al. [2008].

^{11}The default state has no variance.

^{12}The choice of *N* = 8 is based on the historical transition matrix reported by Standard & Poor’s [2012]. The classifications are AAA, AA, A, BBB, BB, B, CCC/C, and D.

^{13}They use a function of *m*, the number of parameters to be estimated, and *s*, the state number.

^{14}Expression (16) is in fact the curvature specification from Diebold and Li [2006].

^{15}We tried different λs (0.01, 0.2, and 0.3) but found no major difference among the results.

^{16}We tried specifications with more than one parameter, but many of those functions were not well behaved or had high collinearity.

^{17}Creal, Gramacy, and Tsay [2014] used variables such as the Chicago Fed National Activity Index (CFNAI), the monthly Moody’s seasoned BAA Corporate Bond yield (BAA), the weekly VIX, and the 20-year constant maturity U.S. Treasury yield.

^{18}These six credit events are: (1) Bankruptcy, (2) Failure to pay, (3) Debt restructuring, (4) Obligation default, (5) Obligation acceleration, and (6) Repudiation/moratorium.

^{19}CR—complete restructuring (also known as full restructuring—FR). Any restructuring event qualifies as a credit event and any bond of maturity up to 30 years is deliverable. This is standard for emerging market (EM) and municipal single name (MCDX) trades. It was standard for investment-grade and high-yield trades but was replaced by MR in 2001 (definition from Markit [2008, pg. 28]).

^{20}MR—modified restructuring: Restructuring agreements count as a credit event, but the deliverable obligation against the contract is limited to those with a maturity of 30 months or less after the termination date of the CDS contract or the reference obligation that is restructured (regardless of maturity). Generally used for investment-grade trades in the United States. This doc-clause started in 2001 (definition from Markit [2008, pg. 28]).

^{21}MM—modified-modified restructuring: In 2003, market participants in Europe found the 30-month limit on deliverable bonds too restrictive, so MM was introduced with a maturity limit of 60 months for restructured obligations and 30 months for all other obligations. This is used mostly in Europe (definition from Markit [2008, pg. 28]).

^{22}XR—no restructuring (also known as NR): All restructuring events are excluded as trigger events. This is prevalent in the high-yield market (definition from Markit [2008, pg. 28]).

^{23}Recovery rates in the Markit Corporation database are obtained from market participants who provide quotes.

^{24}The 40% recovery rate actually has better forecasting power (Gini-coefficients) than the recovery rate from Markit.

^{25}Creal, Gramacy, and Tsay [2014] estimated the default probability with contracts under the XR clause, which results in a lower default probability.

^{26}They analyzed the impact of these assumptions on the results by varying *K* (to 12 and 14) and *d* (to 0.8 and 1), but found no major impact on their result.

^{27}This is of course only approximately true, since a month is not always equal to four weeks. For weekly to daily data the expression is exact.

^{28}As stated by JLT, the majority of the transitions to NR originate from issuers repaying outstanding debt or bringing it below a limit of $25 million, but some are caused by insufficient information being provided by the issuer, and subsequently defaults do occur.

^{29}Markit Reference Data provides verified reference data to participants in the credit derivatives market (www.markit.com/product/reference-data-cds).

^{30}The criteria used are (1) all relevant obligation documents (prospectus, final supplements) must be available, (2) suitable DTCC trading volume for an obligation, (3) the maturity of the bond is greater than three years but less than 30 years, (4) the outstanding amount of the bond is greater than 50% or more than 100 million in the issue’s currency, and (5) denominated currencies are deliverable according to the current ISDA physical settlement matrix (USD, EUR, GBP, CAD, JPY, CHF, AUD). We thank Hemant Garg of Markit Corporation for sending these rules in an email.

^{31}We found 1,135 tickers in the Markit RED file, but for 21 the identifier was XSNOREFOBLOO, the Markit-assigned ISIN indicating that no reference bond is available.

^{32}The S&P Long-Term Issue Credit Rating reflects the likelihood of payment in accordance with the terms of the obligation. Junior obligations are often rated lower than senior issues, reflecting their lower priority in bankruptcy. The S&P Long-Term Issuer Credit Rating reflects the overall likelihood of payment, focuses on the obligor’s capacity to pay, and does not apply to any specific financial obligation (S&P Ratings Definitions, June 22, 2012).

^{33}Unsolicited ratings that are free of charge are relatively rare in the U.S. corporate debt market. According to company guidelines, S&P considers issuing an unsolicited rating when debt exceeds USD 1 billion and there is significant interest. In 2014, there were only seven companies with such a rating, compared to three in 2013. Moody’s has issued few, if any, corporate ratings (*Wall Street Journal*, December 24, 2014).

^{34}AAA, AA, A, BBB, BB, B, CCC, CC, C, and D.

^{35}AAA, AA+, AA, AA-, A+, A, A-, BBB+, BBB, BBB-, BB+, BB, BB-, B+, B, B-, CCC+, CCC, CCC-, CC, C, and D.

^{36}The ticker EFHC has, for example, US873168AL29 and US873168AQ16 as reference obligations. Both bonds defaulted on November 16, 2009, and reemerged on November 17, 2009, as CCC bonds. However, the US873168AL29 bond was downgraded on December 21, 2010, to CC, while the other was only downgraded to CCC- on that same day. Then on May 2, 2011, the US873168AL29 bond defaulted while the other did not (it was downgraded to CC on April 4, 2011). So as of May 3, 2011, the day after the default of the reference obligation, we assign CC to the ticker EFHC (until that reference bond defaulted on December 6, 2012).

^{37}The ticker that defaulted four times was EFHC, and the ones that defaulted three times were PKS and WLH. The ones that defaulted twice are CCU, CHTR, CHTR-Holdings, ENERGFU, GM-ResCLLC, HAWKER-Acqui, LEA, STN, TXU, and TXU-Texas.

^{38}For yearly auction results see, for example, www.creditfixings.com, a joint venture between Creditex Group Inc. and Markit Group Limited.

^{39}We checked this on the credit fixings website, www.creditfixings.com.

^{40}We also checked for auctions held in 2014, but none were held in 2014 for credit events that took place in 2013.

^{41}The standard deviation is given by the square root of the inverse of the Hessian matrix.

^{42}The accuracy ratio (AR) is defined as the ratio of the area between the model’s CAP curve and the random model’s CAP curve to the area between the perfect model’s CAP curve and the random model’s CAP curve.

^{43}This is similar to what is done in Standard & Poor’s [2012, pg. 60, chart 26].

^{44}It is in effect 0.9%, as we divided 50 defaults by 5,537 companies.

^{45}We calculate the Gini-coefficients with the R package pROC (Robin et al. [2011]). This package has a function to calculate the receiver operating characteristic (ROC), and there is a direct relation between the Gini-coefficient and the ROC (see, for example, Engelmann, Hayden, and Tasche [2003]), as the Gini-coefficient equals 2 × (ROC − 0.5).

^{46}Although there were no defaults in 2007 and 2013, we included these years (so end of 2006 and end of 2012) in the aggregated results.

^{47}We performed the test with the R package pROC (Robin et al. [2011]). The test statistic is effectively *T* = (ROC_{1} − ROC_{2})^{2}/Var(ROC_{1} − ROC_{2}), where ROC_{i} denotes the estimated ROC area of model *i*, and is asymptotically χ^{2}-distributed with one degree of freedom.

^{48}Since the results of the S&P ratings were very poor for the ratings given at the end of 2006, we were interested in the impact of excluding this year from the analyses. The Gini-coefficient of the S&P ratings then came to 93.1, still much lower than the average Gini-coefficient of the implied models (95.6), but the p-values for the null hypothesis that the difference between the S&P ratings and the implied models equals zero were higher. The surprise to us in excluding this year was that the p-values for the difference between the models with exogenous indicator variables and the constant model were much lower.

- © 2017 Pageant Media Ltd