Risk Education
PRM Exam Preparation

A resource for the Professional Risk Manager (PRM) exam candidate. Sample PRM exam questions, Excel models, discussion forum and more for the risk professional.



Links to all tutorial articles (same as those on the Exam pages)

VaR disaggregation in R - Marginal and Component VaR

Written by Mukul Pareek
Created on Monday, 04 January 2016 02:32

This article is a practical, real-world example of how value-at-risk, marginal VaR, betas, undiversified VaR and component VaR are calculated.  Credit to Pawel Lachowicz for the idea for this article; he has done the same analysis in Python here: http://www.quantatrisk.com/2015/01/18/applied-portfolio-value-at-risk-decomposition-1-marginal-and-component-var/

 

The level of detail here may not be required for the PRM exam; for that, the previous tutorial (the one with the formulae and explanations) will do.  This article will be useful for practising risk analysts - we will extract real end-of-day prices, and perform our analysis on those in R.  To run the R code, you will need to install R on your machine.  The good thing about R is that it does not require administrative privileges, so if you can download it from CRAN, you can run it.  If you are looking for an introduction to R, you can look at this 20 minute video I posted a few years ago: https://www.youtube.com/user/riskprep

 

Read more...

Hits: 2089

VaR disaggregation - Marginal and Component VaR

Written by Mukul Pareek
Created on Monday, 04 January 2016 02:30

This article covers how Marginal and Component VaRs are calculated.  We follow up (in a separate article) with a real-life example of how VaR, marginal VaR, undiversified VaR and component VaR are calculated - based on actual price data pulled from Quandl, an open market data website.

 

Read more...

Hits: 6655

Understanding Kurtosis

Written by Mukul Pareek
Created on Sunday, 02 December 2012 01:46

 

The topic of kurtosis can cause some confusion.  Think fat and thin tails, and "peakedness" and leptokurtosis.  How is it that having a ‘pointier’ peak means having fat tails, and how is that different from a normal distribution that just has a smaller standard deviation? And how does a t-distribution manage to have fatter tails when its peak can in fact be lower than that of the normal distribution?

Read more...

Hits: 22144

Understanding Principal Component Analysis (PCA)

Written by Mukul Pareek
Created on Monday, 12 December 2011 03:07
Before we even start on Principal Component Analysis, make sure you have read the tutorial on Eigenvectors et al here.
 
This article attempts to provide an intuitive understanding of what PCA is, and what it can do.  By no means is this a mathematical treatise, nor am I a mathematician.  This is an attempt to share how I understand PCA, and how you might internalize it.
 
The problem with too many variables
The problem we are trying to solve with PCA is that when we are looking for relationships in data, there may sometimes be too many variables to do anything meaningful.  What PCA allows us to do is to replace a large number of variables with far fewer ‘artificial’ variables that effectively represent the same data.  These artificial variables are called principal components.  So you might have a hundred variables in the original data set, and you may be able to replace them with just two or three mathematically constructed artificial variables that explain the data just about as well as the original data set.  Now these artificial variables themselves are built mathematically (and I will explain that part in a bit), and are linear combinations of the underlying original variables.  These new artificial variables, or principal components, may or may not be capable of any intuitive human interpretation.  Sometimes they may just be combinations of the underlying variables in a way that makes no logical sense, and sometimes they do provide a better conceptual understanding of the underlying variables (as fortunately they do in the analysis of interest rates and for many other applications as well).
 
Let us take an example.  Imagine a stock analyst who is looking at the returns from a large number of stocks.  There are many variables the analyst is looking at for the companies being analyzed - in fact, dozens of them.  All of the data sits in a large spreadsheet, where each row contains a single observation (for different quarters, months, years or whatever way the data is organized).  The columns read something like the list below.  The data can be thought of as a T x n matrix, where T is the number of rows, each row being an observation at a point in time, and n is the number of data fields.
 
Company name
PE ratio (Price Earnings)
NTM earnings (Next 12 months consensus earnings)
Revenue
Debt
EPS (Earning per share)
EBITDA (Earnings before interest, tax, depreciation and amortization)
LQ Earnings (Last quarter's earnings)
Short Ratio
Beta
Cash flow
…etc up to column n.
 
If the analyst wants to analyze the relationship of the above variables to returns, it is not an easy task as the number of variables involved is not manageable.  Of course, many of these variables are related to each other, and are highly correlated.  For example, if revenues are high, EBITDA will be high.  If EBITDA is high, Net Income will be high too, and if Net Income is high, so will be the EPS etc.  Similar linkages and correlations can be seen with cash flows, LQ earnings etc.  In other words, this represents a highly correlated system.
 
So what we can do with this data is to reduce the number of variables by condensing some of the correlated variables together into one single representation (or an artificial variable) called a ‘principal component’.  How good is this principal component as a representation of the underlying data it represents?  That question is answered by calculating the extent of variation in the original data that is captured by the principal component.  (All these calculations, including how principal components are identified, are explained later in this article, but for the moment let us just go along to understand conceptually what PCA is.)  The number of principal components that can be identified for any dataset is equal to the number of variables in the dataset.  But if one had to use all the principal components it would not be very helpful, because the complexity of the data is not reduced at all - in fact it is amplified, because we are replacing natural variables with artificial ones that may not have a logical interpretation.  However, the advantage that principal component identification brings is that we can decide which principal components to use and which to discard.  Each principal component accounts for a part of the total variation that the original dataset had.  We pick the top 2 or 3 (or n) principal components so that we capture a satisfactory proportion of the variation in the original dataset.
 
At this point, let us look at what this ‘total variation’ means.  Think of the data set as a scatterplot.  If we had two variables, think about how they would look when plotted on a scatter plot.  If we had three variables, try to visualize a three dimensional plane and how the data points would look – like a cloud, clustering together a little bit (or not) depending upon how correlated the system is.  The ‘spread’ of this cloud is really the ‘variation’ contained in the data set.  This can be measured in the form of variance, with each of the n columns having a variance.  When all the principal components have been calculated, each of the principal components has a variance.  We arrange the principal components in descending order of the variance each of them explains, take the top few principal components, add up their variance, and compare it to the total variance to determine how much of the variance is accounted for.  If we have enough to meet our needs, we stop there; otherwise we can pick the next principal component in our analysis too.  In finance, PCA is often performed for interest rates, and generally the top three components account for nearly 99% of the variance, allowing us to use just those three instead of the underlying 50-100 variables that arise from the various maturities (1 day to 30 years, or more).
 
PCA in practice
PCA begins with the covariance (or correlation) matrix.  First, we calculate the covariance of all the original variables and create the covariance matrix.
For this covariance (or correlation) matrix, we now calculate the eigenvectors and eigenvalues. (This can be done using statistical packages, and also some Excel add-ins.)
Every eigenvector would be a column vector with as many elements as the number of variables in the original dataset.  Thus if we had an initial dataset of the size T x n (recall: rows are the observations, columns represent variables, therefore we have T observations of n variables), the covariance matrix would be of the size n x n, and each of the eigenvectors will be n x 1.
The eigenvalue for each of the eigenvectors represents the amount of variance that the given eigenvector accounts for.  We arrange the eigenvectors in decreasing order of their eigenvalues, and pick the top 2, 3 or as many eigenvectors as we are interested in, depending upon how much variance we want to capture in our model.  If we include all the eigenvectors, then we would have captured all the variance, but this would give us no advantage over our initial data.
In a simplistic way, that is about all that there is to PCA.  If you are reading the above for the first time, I would understand if it all sounds like gobbledygook.  But no worries, we are going to go through an example to illustrate exactly how all of the above is done.
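For those who want to see the mechanics in code first, here is a minimal sketch of the above steps in R, using a small made-up data frame (the variable names and the data are entirely hypothetical):

    # Hypothetical data: v1 and v2 are built to be correlated, v3 is not
    set.seed(1)
    base <- rnorm(100)
    df <- data.frame(v1 = base + rnorm(100, sd = 0.2),
                     v2 = base + rnorm(100, sd = 0.2),
                     v3 = rnorm(100))

    C <- cor(df)                # step 1: correlation (or cov(df) for covariance) matrix
    e <- eigen(C)               # step 2: eigenvalues and eigenvectors
    e$values / sum(e$values)    # proportion of total variance each component explains
    e$vectors[, 1]              # loadings (weights) of the first principal component

With the made-up data above, the first component should capture most of the variance, because v1 and v2 move together.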
 
 
 
Example:
Let us consider the following data that explains the above.  This is just a lot of data, typical stuff of the kind that you might encounter at work.  Let us assume that this is data relating to stocks, with the symbols appearing in column A, and various variables relating to the symbol on the right.  What we want to do is to perform PCA on this data and reduce the number of variables from 8 to something more manageable.  We believe this can be done because if we look at the variables, some are obviously quite closely related.  Higher revenues likely mean a higher EBITDA, and a higher EBITDA probably means a higher tev (total enterprise value).  Similarly, a higher price (col C) probably means a higher market cap too – though of course we can not be sure unless we analyze the data.  But it does appear that the data is related together in a way that we can represent all of this fairly successfully using just a few ‘artificial variables’, or principal components.  Let us see how we can do that.
 
 
 
 
 
Based on the above, we can calculate the correlation matrix or the covariance matrix – either manually, or in one easy step using Excel’s data analysis feature.
 
The problem of scale – (ie, when to use the correlation matrix and when the covariance matrix?)
 
Let us get this question out of the way first.  Consider two variables for which the units of measure differ significantly in terms of scale.  For example, if one of the variables is in dollars and another in millions of dollars, then the scale of the variables will affect the covariance.  You will find that the covariance will be significantly affected by the variable that has larger numerical quantities.  The variable with the smaller numbers – even though this may be the more important number – will be overwhelmed by the other larger numbers in what it contributes to the covariance.  One way to overcome this problem is to normalize (or standardize) the values by subtracting the mean and dividing by the standard deviation (ie, obtain the z-score, equal to (x – μ)/σ) and replacing all observations by such z-scores.
 
We can then use this normalized set of observations to perform further analysis, because now everything is expressed on a standard unitless scale in a way that the mean is zero and the standard deviation is 1 for both sets of observations.  For this set of normalized observations, you will find that the correlation and covariance are identical.  Why?  Think for a minute about the conceptual difference between correlation and covariance.  Covariance is in the units of both the variables.  In a case where observations have been normalized, they do not have any units at all.  So you end up with a unitless covariance, which is nothing but correlation.  Formulaically, correlation is covariance divided by the standard deviations of the two variables.  In the case where the observations have been normalized, the standard deviation of both variables is 1, and dividing covariance by 1 leaves us with the same number.  In other words, correlation is really nothing but covariance normalized to a standard scale.
 
Because covariance includes the units of both the variables, it is affected by the scale.  Basing PCA on a correlation matrix is identical to using standardized variables.  Therefore in situations where scale is important and varies a great deal between the variables, correlation matrices may be preferable.  In other cases where the units are the same and the scale does not vary widely, covariance matrices would do just fine.  If we run PCA on the same data once using a correlation matrix and another time using a covariance matrix, the results will not be identical.  Which one to use may ultimately be a question of judgment for the analyst.
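A quick way to convince yourself of the equivalence described above is to check, in R, that the covariance matrix of standardized (z-scored) data equals the correlation matrix of the raw data (the two-column data frame below is made up, with deliberately different scales):

    set.seed(2)
    df <- data.frame(x = rnorm(50),            # small numbers
                     y = rnorm(50) * 1e6)      # large numbers, on a very different scale
    all.equal(cov(scale(df)), cor(df))         # TRUE: standardizing makes cov equal cor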
 
In this case, we decide to proceed with the correlation matrix, though we could well have used the covariance matrix as all the variables are in dollars.  However, for the mechanics of calculating the principal components that I am trying to demonstrate, this does not matter – so we will proceed with the correlation matrix.
 
The correlation matrix appears as follows.  The top part is what the Excel output looks like (Data Analysis > Correlation.  If you don’t have Data Analysis available in your version of Excel, that is probably because you haven’t yet installed the data analysis add-in - search Google on how to do that).  The lower part is merely the same thing with the items above the upper diagonal filled in.
 
 
From this correlation matrix we calculate the eigenvalues and eigenvectors, and there will be n eigenvectors.  It is acceptable to put all the column vectors for the eigenvectors together in one big n x n matrix (which is how you will find it in some of the textbooks, and also below).
 
The eigenvectors multiplied by the underlying variables represent the principal components.  Thus if an eigenvector is, say, the column vector [0.25, 0.23, 0.22, 0.30, 0.40], and the 5 variables in our analysis are v1, v2, v3, v4 and v5, then the principal component represented by that eigenvector is = 0.25v1 + 0.23v2 + 0.22v3 + 0.30v4 + 0.40v5.  The other principal components are similarly calculated using the other eigenvectors.
The eigenvalues for each of the eigenvectors represent the proportion of variation captured by that eigenvector.  The total variation is given by the sum of all the eigenvalues.  So if the eigenvalue for a principal component is 2.5 and the total of all eigenvalues is 5, then this particular principal component captures 50% of the variation.
 
We decide how many principal components to keep based upon the amount of variation they account for.  For example, we may select all principal components that account for more than a certain threshold of the variation, or all principal components whose eigenvalue is greater than 1.
 
We can now represent the original variables as a function of the principal components – each original variable is equal to a linear combination of the principal components.
 
What is the ‘total variance’ that we talk about?
The total variance in the data set is nothing but the mathematical aggregation of the variance of each of the variables.  If the observations have been normalized, then the variance of each of the variables will be 1, and therefore the total variance will be 1 x n = n, where n is the number of variables.  Once principal components have been computed, there will be a total of n principal components for n variables.  The important thing to note is that the total of the variances of the principal components will be equal to the total variance of the observations.  This allows us to pick the more relevant principal components by picking the ones with the most variance and ignoring the ones with the smaller variances, and still be able to cover most of the variation in the data set.
 
Calculating eigenvalues and eigenvectors
There is no easy way I know of to calculate eigenvalues and eigenvectors in Excel.  For our hypothetical example, the eigenvectors and the eigenvalues are given below.  These were calculated in a different statistical package - R, which is free to use and can be downloaded from www.r-project.org.
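As a sketch of what that calculation looks like in R: the actual eight-variable dataset used in this example is not reproduced here, so the data frame below is a hypothetical stand-in that only borrows the same column names.

    set.seed(42)
    stocks <- as.data.frame(matrix(rnorm(200 * 8), ncol = 8))   # stand-in data only
    names(stocks) <- c("quantity", "entry_price", "profit_dollar", "market_cap",
                       "cash_and_marketable", "tev", "revenues", "ebitda")

    C <- cor(stocks)                  # the 8 x 8 correlation matrix
    e <- eigen(C)                     # eigenvalues and eigenvectors
    round(e$values, 4)                # eigenvalues, one per principal component
    sum(e$values)                     # = 8, the number of variables
    cumsum(e$values) / sum(e$values)  # cumulative proportion of variance explained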
 
 
 
 
Notice that the total of the eigenvalues is 8 – which is the same as the number of variables.  Because we effectively normalized the variables by using the correlation matrix, the total of our eigenvalues is 8 which is the sum of the individual normalized variances (=1) of each of the eight variables.  (If we had used the covariance matrix, the eigenvalues would have added to whatever the sum of the variances of each individual variable would have been.)
 
We also see that just the first 2 principal components account for nearly 75% of the variance.  If we are comfortable with the simplicity that 2 variables offer instead of 8 at a cost of losing 25% of the variation in the data, we will use 2 principal components.  If not, we can extend our model to include the third principal component which brings the total variance accounted for to nearly 88%.
 
 
Is Aw = λw?
Do our eigenvalues and eigenvectors satisfy Aw = λw, where A is a square matrix, w is an eigenvector and λ is the corresponding eigenvalue?  Let us test that.  In this case, A is our correlation matrix.  We find this relationship to be true for all the eigenvectors.
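Continuing the R sketch above (with C and e as computed there), the check takes only a couple of lines per eigenpair:

    w      <- e$vectors[, 1]                     # first eigenvector
    lambda <- e$values[1]                        # its eigenvalue
    all.equal(as.vector(C %*% w), lambda * w)    # TRUE - and likewise for the other eigenpairs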
 
 
 
Constructing principal components from eigenvectors
We derive the principal components from the eigenvectors as follows.
 
PC1 = -0.02070*quantity - 0.14822*entry_price + 0.04413*profit_dollar - 0.47031*market_cap - 0.37456*cash_and_marketable - 0.47874*tev - 0.46308*revenues - 0.41295*ebitda

PC2 = 0.60642*quantity - 0.63912*entry_price + 0.33582*profit_dollar + 0.00458*market_cap + 0.30794*cash_and_marketable + 0.01122*tev + 0.04846*revenues - 0.11699*ebitda

PC3 = -0.44257*quantity + 0.16687*entry_price + 0.81688*profit_dollar - 0.00946*market_cap + 0.23303*cash_and_marketable - 0.03034*tev + 0.08786*revenues - 0.21437*ebitda
 
And so on.
 
How do we use the principal components for further analysis?  For any observation, we set up a routine to calculate the principal component equivalent and use that instead of the original variables – whether for regression, or any other kind of analysis.  So PCA is merely a means to an end – it does not give you an ‘answer’ that you can use right away.
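As a sketch of that routine in R (continuing the hypothetical stand-in data and the eigen() output from earlier), the principal component scores for every observation are just the standardized data multiplied by the eigenvector matrix:

    Z      <- scale(stocks)          # standardize, since we worked off the correlation matrix
    scores <- Z %*% e$vectors        # one column of scores per principal component
    head(scores[, 1:3])              # PC1 to PC3 for the first few observations

    # The built-in routine gives the same scores in one step (possibly with flipped signs):
    head(prcomp(stocks, scale. = TRUE)$x[, 1:3])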
 
Interpreting principal components
We look at the coefficients assigned to each of the principal components, and try to see if there is a common thread between the factors (in this case, quantity, entry_price, profit_dollar etc are the ‘factors’).  The common thread could be an underlying common economic cause, or other explanation that makes them similar.  In our hypothetical example, we find that the first principal component is heavily “loaded” on the last 5 factors.  The second principal component is similarly “loaded” on the first three, and also a bit on the fifth one.  (You can see this by examining the eigenvectors for the respective principal component.)
 
 
In the case of interest rates, by looking at how the principal components are constructed, we find that the first 3 principal components are called the trend, the tilt and the curvature components.
 
That is all for PCA folks!  Hope the above made sense and is helpful.  Feel free to reach out to me by email if you think something above is not correct or can be improved, or if you just have a comment or a question.
 
 

PRMIA has been asking questions on PCA, but the way the subject is presented in the Handbook is not well suited to someone who has not studied it before in the classroom.  This article aims to provide an intuitive understanding of what PCA is so you can approach the material in the Handbook with confidence.

 

But before we even start on Principal Component Analysis, make sure you have read the tutorial on Eigenvectors et al here.  Otherwise much of this is not going to make any sense.

Read more...

Hits: 43677

Regression Analysis

Written by Mukul Pareek
Created on Sunday, 11 December 2011 01:43
Linear regression is an important concept in finance and practically all forms of research.  It is also used extensively in the application of data mining techniques.  This article provides an overview of linear regression, and more importantly, how to interpret the results provided by linear regression.  We will discuss regression in an intuitive sense, and also how to practically interpret the output of a regression analysis.  In particular, we will look at the different variables such as p-value, t-stat and other output provided by regression analysis in Excel.  We will also look at how regression is connected to beta and correlation.
Imagine you have data on a stock’s daily return and the market’s daily return in a spreadsheet, and you know instinctively that they are related.  How do you figure out how related they are?  And what can you do with the data in a practical sense?  The first thing to do is to create a scatter plot.  That provides a visual representation of the data.
What appears below is a scatter plot of Novartis’s returns plotted against the S&P 500’s returns (data downloaded from Yahoo finance).  Here is the spreadsheet with this data.
A regression model expresses a ‘dependent’ variable as a function of one or more ‘independent’ variables, generally in the form:
y = α + β_1 x_1 + β_2 x_2 + … + ϵ
What we also see below is the fitted regression line, ie the line that expresses the relationship between the dependent variable y (in this case the returns on the Novartis stock) and the independent variable x (the S&P 500 returns).
What we are going to do next is go deeper into how regression calculations work.  For this article, I am going to limit myself to one independent variable, but the concepts discussed apply equally to regressing on multiple independent variables.  Regression with a single dependent variable y whose value is dependent upon the independent variable x is expressed as
y = α + βx + ϵ
where α and β are constants, x is the independent variable and ϵ is the error term (more on the error term later).  Given a set of data points, it is fairly easy to calculate alpha and beta – while it can be done manually, it can also be done in Excel using the SLOPE (for calculating β) and INTERCEPT (for α) functions.
If done manually, beta is calculated as:
β = covariance of the two variables / variance of the independent variable
Once beta is known, alpha can be calculated as
α = mean of the dependent variable (ie y) - β * mean of the independent variable (ie x)
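For those who prefer R to Excel, here is a minimal sketch of the same calculation on made-up return series (the series and parameter values below are purely illustrative):

    set.seed(3)
    x <- rnorm(250, 0, 0.01)                       # 'market' daily returns
    y <- 0.0002 + 0.8 * x + rnorm(250, 0, 0.005)   # 'stock' daily returns

    beta  <- cov(x, y) / var(x)         # slope: covariance / variance of the independent variable
    alpha <- mean(y) - beta * mean(x)   # intercept
    c(alpha = alpha, beta = beta)

    coef(lm(y ~ x))                     # lm() returns the same intercept and slope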
Beta and correlation
At this point it is important to point out the relationship between beta and correlation.
Since β = cov(x, y)/var(x), and correlation(x, y) = cov(x, y)/(σ_x σ_y), it follows that β = correlation(x, y) × σ_y/σ_x.  In other words, beta is the correlation scaled by the ratio of the standard deviations of the two variables.
Predicted versus the observed value
Now let us go back to the initial equation:
y = α +βx+ ϵ
Now that we have α and β, it is probably possible to say that we can ‘predict’ y if we know the value of x.  The ‘predicted’ value of y is provided to us by the regression equation.  This is unlikely to be exactly equal to the actual observed value of y.  The difference between the two is explained by the error term - ϵ.  This is a random ‘error’ – error not in the sense of being a mistake – but in the sense that the value predicted by the regression equation is not equal to the actual observed value.  This error is ‘random’ and not biased, which means that the mean of ϵ across all data points is zero.  Some observations are farther away from the predicted value than others, but the sum of all the differences add up to zero.  Intuitively, the smaller the observed values of ϵ, the better is our regression model.  How do we measure how small the values of ϵ are?  One obvious way would be to add them up and divide by the number of observations to get an ‘average’ value per data point – but that would just be zero as just explained.  So what we do is the next best thing: take a sum of the squares of ϵ and divide by the number of observations. For a variable whose mean is zero, this is nothing but its variance.
This number is called the standard error of the regression line – and you may find it referred to as ‘standard error of the regression’, ‘standard error of the estimate’ or even ‘standard error of the line’, the last phrase being from the PRMIA handbook.
This error variable ϵ is considered normally distributed with a mean of zero, and a variance equal to σ^2.
The standard error can be used to calculate confidence intervals around an estimate provided by our regression model, because using this we can calculate the number of standard deviations either side of the predicted value and use the normal distribution to compute a confidence interval.  We may need to use a t-distribution if our sample size is small.
Interpreting the standard error of the regression
The standard error of the regression is a measure of how good our regression model is – or its ‘goodness of fit’.  The problem though is that the standard error is in units of the dependent variable, and on its own is difficult to interpret as being big or small.  The fact that it is expressed in the squares of the units makes it a bit more difficult to comprehend.
(RMS error: We can also then take a square root of this variance to get to the standard deviation equivalent, called the RMS error, RMS standing for Root Mean Square – which is exactly what we did, we squared the errors, took their mean, and then the square root of the resultant.  This takes care of the problem that the standard error is expressed in square units.)
Coming back to the standard error - what do we compare the standard error to in order to determine how good our regression is?  How big is big?  This takes us to the next step – understanding the sums of squares – TSS, RSS and ESS.
TSS, RSS and ESS (Total Sum of Squares, Residual Sum of Squares and Explained Sum of Squares)
Consider the diagram below.  Yi is the actual observed value of the dependent variable, y-hat is the value of the dependent variable according to the regression line, as predicted by our regression model.  What we want to get a feel for is the variability of the actual y around the regression line, ie, the volatility of ϵ.  This is given by the distance yi minus y-hat, represented in the figure below as RSS.  The figure below also shows TSS and ESS – spend a few minutes looking at what TSS, RSS and ESS represent.
Now ϵ = observed – expected value of y
Thus, ϵ = yi – y-hat.  The sum of ϵ is expected to be zero. So we look at the sum of squares:
The value of interest to us is = Σ (yi – y-hat)^2.  Since this value will change as the number of observations change, we divide by ‘n’ to get a ‘per observation’ number.  (Since this is a square, we take the root to get a more intuitive number, ie the RMS error explained a little while earlier.  Effectively, RMS gives us the standard deviation of the variation of the actual values of y when compared to the observed values.)
If s^2 is the standard error of the regression, then
s^2 = RSS/(n – 2)
(where n is the number of observations, and we subtract 2 from this to take away 2 degrees of freedom*.)
Now  TSS = Σ (y_i – y-bar)^2
RSS = Σ (y_i – y-hat)^2
ESS = Σ (y-hat – y-bar)^2
How good is the regression?
Intuitively, the regression line given by α + βx will be a more accurate prediction of y if the correlation between x and y is high.  We don’t need any math to say that if the correlation between the variables is low, then the quality of the regression model will be lower, because the regression model is merely trying to fit a straight line on the scatter plot in the best possible way.
Generally, R^2, called the coefficient of determination, is used to evaluate how good the ‘fit’ of the regression model is.  R^2 is calculated as ESS/TSS, ie the ratio of the explained variation to the total variation.
R^2 =  ESS/TSS
R^2 is also the same thing as the square of the correlation (stated without proof, but you can verify it in Excel).  Which means that our initial intuition that the quality of our regression model depends upon the correlation of the variables was correct.  (Note that in the ratio ESS/TSS, both the numerator and denominator are squares of some sort – which means this ratio tells us how much of the ‘variance’ is explained, not the standard deviation.  Variance is always in terms of the square of the units, which makes it slightly difficult to interpret intuitively, which is why we have standard deviation.)
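Here is a quick verification of these identities in R, using the made-up x and y series from the earlier sketch (regenerated below so the snippet stands on its own):

    set.seed(3)
    x <- rnorm(250, 0, 0.01)
    y <- 0.0002 + 0.8 * x + rnorm(250, 0, 0.005)

    fit   <- lm(y ~ x)
    y_hat <- fitted(fit)
    TSS   <- sum((y - mean(y))^2)
    RSS   <- sum((y - y_hat)^2)
    ESS   <- sum((y_hat - mean(y))^2)

    c(R2 = ESS / TSS, cor_squared = cor(x, y)^2)   # the two numbers are identical
    all.equal(TSS, ESS + RSS)                      # TSS decomposes into ESS + RSS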
How good are the coefficients?
Our regression model provides us values for α and β.  These, after all, are only estimates.  We can assess how good these estimates are, and how ‘significant’ they are.  As estimates, the calculated values represent point estimates that have a range of possibilities for the true value to be on either side of this estimate – depending upon their standard deviation.
We can calculate the standard deviation of both alpha and beta – but the formulae are pretty complex if calculated manually.  Excel does a great job of providing these standard deviations as part of its Data Analysis > Regression functionality, as we shall see in a moment.
Once the standard deviations, or the standard errors of the coefficients are known, we can determine confidence levels to determine the ranges within which these estimated values of the coefficients lie at a certain level of significance.
Intuitively, we know that alpha and beta are meaningless if their value is zero, ie, if beta is zero, it means the independent variable does not impact the dependent variable at all.  Therefore one test often performed is determining the likelihood that the value of these coefficients is zero.
This can be done fairly easily – consider this completely made up example.  Assume that the value of beta is 0.5, and the standard error of this coefficient is 0.3.  We want to know if at the 95% confidence level this value is different from zero.  Think of it this way: if the real value were zero, how likely is it that we ended up estimating it to be 0.5?  Well, if the real value were zero, and our estimate were distributed according to a normal distribution, then 95% of the time we would have estimated it to be in the range within which the normal distribution covers 95% of the area under the curve on either side of zero.  This area extends from -1.96 standard deviations to +1.96 standard deviations on either side of zero, ie from -0.59 (= 0.3 x 1.96) to +0.59.  Since the value we estimated, 0.5, falls within the range -0.59 to +0.59, we cannot rule out that the real value is indeed zero, and our estimate of 0.5 might have been just a statistical fluke.  (What we did just now was hypothesis testing in plain English.)
Determining the goodness of the regression - The ‘significance’ of R^2 & the F statistic
Now the question arises as to how ‘significant’ is any given value of R^2?  When we speak of ‘significance’ in statistics, what we are asking is how likely it is that the value in question could have arisen by chance.  It means that we believe that the variable or parameter in question has a distribution, and we want to determine if the given value falls within the confidence interval (95%, 99% etc) that we are comfortable with.
The significance of R^2 is tested using the F-distribution.  The F-distribution has two parameters – the degrees of freedom of the two sums of squares, ESS and RSS, that go into calculating the F statistic.  The F-distribution has a minimum of zero, and its density approaches zero in the right tail.  In order to test the significance of R^2, one needs to calculate the F statistic as follows:
F statistic = ESS / (RSS/(T-2) ), where T is the number of observations.  This F statistic can then be compared to the value of the F statistic at the desired level of confidence to determine its significance.
This is best explained with an example:
Imagine ESS = 20, TSS = 50, and T = 10 (all made up numbers).
In this case, R^2 = 0.4 (= 20/50)
Since ESS + RSS = TSS, RSS = 30 (= 50 – 20)
Therefore the F statistic = 20/(30/(10-2)) = 5.33
Assume we want to know if this F statistic is significant at 95%.  We find out what the F statistic should be at 95% - and compare that to the value of ‘5.33’ we just calculated.  If ‘5.33’ is greater than the value at 95%, we conclude that R^2 is significant at the 95% level of confidence (or is significant at 5%).  If ‘5.33’ is less than what the F value is at the 95% level of confidence, we conclude the opposite.
The value of F distribution at the desired level of confidence (= 1 – level of significance) can be calculated using the Excel function =FINV(x, 1, T – 2).  In this case, =FINV(0.05,1,8)= 5.318.  Since 5.33>5.318, we conclude that our R^2 is significant at 5%.
We then go one step further – we can determine at what level does this F statistic become ‘critical’ – and we can do this using the FDIST function in Excel.  In this case, =FDIST(5.33,1,8) =0.0498, which happens to be quite close to 5%.  The larger the value of the F statistic, the lower the value the FDIST function returns, which means a higher value of the F statistic is more desirable.
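The same two lookups can be done in R, in case you do not have Excel at hand (the numbers are the ones from the worked example above):

    qf(0.95, df1 = 1, df2 = 8)       # ~5.318, the critical value, same as =FINV(0.05, 1, 8)
    1 - pf(5.33, df1 = 1, df2 = 8)   # ~0.0498, the p-value, same as =FDIST(5.33, 1, 8)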
Understanding Excel’s Regression results
Consider a made up example of two variables as follows:
We then perform the regression analysis as follows:
Note 1: Coefficient of determination (E5)
The coefficient of determination is nothing but R^2, something we discussed in detail earlier (ie, the ESS/TSS), and equal to the square of the correlation.
Note 2: Adjusted R^2 (E6)
Adjusted R^2 is a more refined way of calculating the coefficient of determination.  It is possible to increase R^2 by including more explanatory variables in the regression, and while the value of R^2 may increase due to this, it may not make the model any superior because all we have achieved is a misleading overfitted model that may not provide any better predictions.
The adjusted R^2 takes into account the number of independent variables in the model, and the sample size, and provides a more accurate assessment of the reliability of the model.
Adjusted R^2 is calculated as 1 – (1 – R^2)*((n-1)/(n-p-1)), where n is the sample size and p the number of regressors in the model.
In this case, Adjusted R^2 = 1 – ((1 – E5)*((10 – 1)/(10 – 1 – 1))) = 0.4745
Note 3: Standard error (E7)
Standard error = SQRT(RSS/(T – 2)) where T is the sample size.  We reduce 2 from the sample size to account for the loss of two degrees of freedom, one for the regression estimate itself, and the second for the explanatory variable.
In this case, standard error = SQRT(56.1 / (10 – 2)) = 2.648
Note 4: F  (H12)
The F statistic is explained earlier in this article.  Calculated as ESS / (RSS/(T-2)); in this case that is 64/(56.1/8) = 9.13.
Note 5: Significance F
Significance F gives us the probability at which the F statistic becomes ‘critical’, ie below which the regression is no longer ‘significant’.  This is calculated (as explained in the text above) as =FDIST(F-statistic, 1, T-2), where T is the sample size.  In this case, =FDIST(9.126559714795,1,8) = 0.0165338014602297
Note 6: t Stat
The t Stat describes how many standard deviations away the calculated value of the coefficient is from zero.  Therefore it is nothing but the coefficient/std error.  In this case, these work out to 3.86667/1.38517=2.7914 and 0.6667/0.22067 = 3.02101 respectively.
Why is this important?  This is because if the coefficient for a variable is zero, then the variable doesn’t really affect the predicted value.  Though our regression may have returned a non-zero value for a variable, the difference of that value from zero may not be ‘significant’.  The t Stat helps us judge how far is the estimated value of the coefficient from zero – measured in terms of standard deviations.  Since the value of the coefficients follows the t distribution, we can check, at a given level of confidence (eg 95%, 99%), whether the estimated value of the coefficient is significant.  All of Excel’s regression calculations are made at the 95% level of confidence by default, though this can be changed using the initial dialog box when the regression is performed.
Note 7: p value
In the example above, the t stat is 2.79 for the intercept.  If the value of the intercept were to be depicted on a t distribution, how much of the area would lie beyond 2.79 standard deviations?  We can get this number using the formula =TDIST(2.79,8,2) = 0.0235.  That gives us the p value for the intercept.
Note 8: Lower and upper 95%
These values give the 95% confidence interval around each estimated coefficient (the intercept or the slope), ie the range within which we can be 95% confident that the true value of the coefficient lies.
If this range includes zero, we cannot conclude at the 95% level of confidence that the coefficient is different from zero.  These ranges therefore allow us to judge whether the values of the coefficients are significantly different from zero at the given level of confidence.
How is this calculated?  We first calculate the number of standard deviations either side of the estimate that the given confidence level allows, assuming a t distribution.  In this case, there are 8 degrees of freedom and therefore the number of SDs is TINV(0.05,8) = 2.306.  We multiply this by the standard error for the coefficient in question and add and subtract the result from the estimate.  For example, for the intercept, we get the upper and lower 95% as follows:
Upper 95% = 3.866667 + (TINV(0.05,8) * 1.38517) = 7.0608 (where 3.866667 is the estimated value of the coefficient per our model, and 1.38517 is its standard deviation)
In the same way,
Lower 95% = 3.866667 - (TINV(0.05,8) * 1.38517) = 0.67245
We do the same thing for the other coefficient, and get the upper and lower 95% limits
Lower 95% = 0.66667 - (TINV(0.05,8) * 0.220676) = 0.15779
Upper 95% = 0.66667 + (TINV(0.05,8) * 0.220676) = 1.174459
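If you want to reproduce the t-stat, p-value and confidence interval arithmetic outside Excel, a minimal R sketch using the intercept's numbers from the output above looks like this:

    est <- 3.866667      # estimated coefficient (the intercept)
    se  <- 1.38517       # its standard error
    dof <- 8             # degrees of freedom, T - 2

    t_stat <- est / se                              # ~2.79
    p_val  <- 2 * pt(-abs(t_stat), dof)             # ~0.0235, same as =TDIST(2.79, 8, 2)
    ci     <- est + c(-1, 1) * qt(0.975, dof) * se  # ~0.672 to ~7.061, same as the TINV calculation
    c(t_stat = t_stat, p_value = p_val, lower = ci[1], upper = ci[2])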
*A note about degrees of freedom:  A topic that has always confused me, but has never been material enough to warrant further investigation.  This is because as you subtract 1 or 2 from your sample size n, its impact vanishes rapidly as n goes up.  For most practical situations it doesn’t affect things much, so I am happy to accept it as explained in textbooks.


Read more...

Hits: 78602

Modeling interest rate changes

Written by Mukul Pareek
Created on Thursday, 28 July 2011 01:38
Modeling the behavior of short term interest rates
We have seen how we expect stock prices to behave – returns being normally distributed and consequently prices distributed lognormally.  If you recall, the process for the behavior of stock prices is the geometric Brownian motion dS = μS dt + σS dz, where S is the stock price, μ is the expected return, σ is the volatility and dz is a Wiener process.
The same process however cannot be applied to interest rates.  This is because while stock prices may follow a random walk, interest rates are generally considered mean reverting.  Mean reverting means they tend to come back to some long term average, and can’t increase or decrease indefinitely.
There are a number of ways that are used for modeling short term interest rates.  These have not been covered in the PRMIA handbook, but they find a reference in one of their study guides.  So just to be cautious, a bit of explanation for these is provided here in case there are questions in the exam relating to these concepts.
What should you know? Well, in all likelihood you will not be asked a question on this topic, but if you have 20 minutes, have a read and you will be slightly better prepared.
The models of short term interest rates help determine the shape of the yield curve, and the pricing of bond options.  There are two broad categories of models of the short rate – Equilibrium models, and No-arbitrage models.
Equilibrium models
The Vasicek model
In the Vasicek model, interest rates can be modeled using the following equation:
dr = a(b – r)dt + σ dz
where dr is the change in the rate, a is the ‘speed of reversion’ to the mean, b is the long term mean for the rate, σ is the volatility of the rate, and dz is a Wiener process.  (Recall that for our practical purposes a ‘Wiener process’ is nothing but a random drawing from a normal distribution).  Interest rate volatility is constant at σ.
With the Vasicek model, rates will get pulled to the long term mean ‘b’ because of the first term in the formula above.
With the equation dr = a(b – r)dt + σ dz it is possible (with some more mathematics, and with known values of a, b and r) to determine the price of a zero coupon bond at any time t in the future, and thereby construct a yield curve.  This yield curve will likely be different from the real one in the markets at the present time, and this divergence from reality is a big limitation of the Vasicek model.  One way to overcome this limitation is to choose a, b and r in such a way that we get a yield curve close to the current yield curve, but that is often not a practical solution as such ‘fitting’ still leaves large errors.
The Cox, Ingersoll, and Ross model
According to the Cox, Ingersoll and Ross model, the risk neutral process for the short rate r  is as follows:
dr = a(b – r) dt + σ √r dz
The only difference between the Vasicek model and this is that the change in the short rate is also determined by r, in the form of √ r being incorporated in the formula.  This has the effect of increasing the standard deviation of the rate as r goes up.
Both the Vasicek and the Cox, Ingersoll and Ross models are single factor models, dependent only upon the value of r as the single factor driving changes to short rates.
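To get a feel for the mean reversion and for the √r term, here is a minimal simulation sketch of the two equilibrium models in R (all parameter values are made up purely for illustration):

    set.seed(4)
    a <- 0.5; b <- 0.04; sigma <- 0.01      # speed of reversion, long term mean, volatility
    dt <- 1/252; n <- 2520                  # daily steps over roughly ten years
    r_vas <- r_cir <- numeric(n)
    r_vas[1] <- r_cir[1] <- 0.08            # both paths start well above the long term mean

    for (i in 2:n) {
      dz       <- rnorm(1, 0, sqrt(dt))     # the Wiener increment
      r_vas[i] <- r_vas[i-1] + a * (b - r_vas[i-1]) * dt + sigma * dz
      r_cir[i] <- r_cir[i-1] + a * (b - r_cir[i-1]) * dt +
                  sigma * sqrt(max(r_cir[i-1], 0)) * dz   # CIR: volatility scales with sqrt(r)
    }
    matplot(cbind(Vasicek = r_vas, CIR = r_cir), type = "l", ylab = "short rate")

Both simulated paths drift from 8% towards the 4% long term mean; the CIR path's day-to-day moves shrink as the rate falls, since its volatility is proportional to √r.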
No-arbitrage models overcome this limitation by providing the presently known yield curve as an input into the model.  The ‘drift’ of the interest rate is determined from the yield curve – implying that if the yield curve is sloping downwards, then the drift is negative, zero if the yield curve is flat, and positive if the yield curve is rising.  The drift has ‘time’ associated with it, ie the rise or fall of the rate per unit of time.  The model therefore ends up predicting the range of short term interest rates around the known yield curve.
No-arbitrage models
The yield curves predicted by the equilibrium models are generally different from those being observed at the current time in the markets.  It is possible to tweak the values of a and b to arrive at a close fit, but large errors often remain.  This problem is solved by using the current term structure as an input into the model, so the model ends up being exactly consistent with today’s term structure.
The Ho-Lee model
Under the Ho-Lee model,
dr = Θ (t) dt + σ dz
All variables have their usual meaning, but Θ(t) is a function of time selected to fit the initial term structure.  Θ(t) can be analytically calculated using a formula (probably not relevant for the exam); suffice it to say it is a function of time t.
The Hull White model
The Hull White model is an extension of the Vasicek model with the difference that it can be made to fit the current term structure.
Under the Hull White model:
dr = [Θ(t) – ar]dt + σ dz
which can equivalently be written as dr = a[Θ(t)/a – r]dt + σ dz
At time t, the rate shows mean reversion towards Θ(t)/a at the rate a.
The Black-Derman-Toy model
Under the Ho-Lee and Hull-White models, interest rates can become negative.  The BDT model allows only positive interest rates, and is as follows:
d ln(r) = [Θ(t) – (σ′(t)/σ(t)) ln(r)] dt + σ(t) dz
where Θ(t) is a function of time, σ(t) is the (time-dependent) volatility of the short rate, and σ′(t) is its derivative with respect to time.
The models of short rates are used to price fixed income derivatives and bonds.

Modeling the behavior of short term interest rates

 

There are a number of ways that are used for modeling short term interest rates. These have not been covered in the PRMIA handbook, but they find a reference in one of their study guides. So just to be cautious, a bit of explanation for these is provided here in case there are questions in the exam relating to these concepts.

 

What should you know? Well, in all likelihood you will not be asked a question on this topic, but if you have 20 minutes, have a read and you will be slightly better prepared.  Recognize the formulae for the 5 models mentioned here, and try to remember some of the differences between the no-arbitrage and equilibrium models.  Or you could entirely skip this as well.  I will be putting a few questions in on this subject, but will clearly mark them as unlikely.

Read more...

Hits: 7836

Volatility, returns and the behavior of stock prices

Written by Mukul Pareek
Created on Sunday, 15 May 2011 03:01

This article follows the earlier tutorial on stochastic processes.  It talks about a couple of related things, and tries to provide an intuitive understanding of some of the following concepts:

  1. Continuous returns and discrete returns models of stock prices,
  2. An understanding of how volatility affects the net realized returns obtained by an investor, and how the 'average' of discrete returns over a period of time can be misleading,
  3. How the true returns obtained by an investor are driven lower by the volatility of the discrete returns,
  4. The difference between log returns and discrete returns, and how the two are related, and
  5. Why the formula for the expected stock price in the future is S e^(mu t) and not S e^((mu + sigma^2/2) t)

 

Read more...

Hits: 15118

CreditRisk+, or the actuarial approach to measuring credit risk

Written by Mukul Pareek
Created on Thursday, 30 December 2010 01:35

This is the final of five articles - each explaining at a high level one of the five credit risk models in the PRMIA handbook.  This write-up deals with the actuarial, or ‘CreditRisk+’, approach.

Read more...

Hits: 9415

The KMV approach to measuring credit risk

Written by Mukul Pareek
Created on Wednesday, 29 December 2010 02:04

This is the fourth of five articles covering each of the main portfolio approaches to credit risk as explained in the handbook.  The idea is to provide a high level and concise explanation of each of the approaches so it may be easier to deal with the detail provided in the handbook.  I hope you find it useful.

Read more...

Hits: 29996

The structural approach to credit risk

Written by Mukul Pareek
Created on Wednesday, 29 December 2010 01:28

This is the third of five articles covering credit risk - this one addresses the 'structural approach'.

Read more...

Hits: 9547

Credit Portfolio View

Written by Mukul Pareek
Created on Monday, 27 December 2010 22:13

This is the second of five articles that discuss the various approaches to measuring credit risk in a portfolio.  This article covers CreditPortfolio View.  CreditPortfolio View is conceptually not too dissimilar from the Credit Metrics model described earlier, ie it relies upon a knowledge of the transition matrices between the different credit ratings.  The only difference is that the transition matrix itself has an adjustment applied to it for the business cycle.  But once this adjusted transition matrix has been obtained, the rest of the process works in the same way as for the Credit Metrics model.

Read more...

Hits: 11709

Quick primer on Black Scholes

Written by Mukul Pareek
Created on Monday, 27 December 2010 21:08

The conceptual idea behind Black Scholes is rather simple – but as the argument advances beyond the initial idea, things become more complex with differential equations, risk-neutrality and log returns stepping in.  For the PRMIA exam, you will not be asked for a derivation of Black Scholes, so it may suffice to know just a couple of things.  This brief write-up aims to summarize just those few things.

 

Read more...

Hits: 9128

Eigenvectors, eigenvalues and orthogonality

Written by Mukul Pareek
Created on Thursday, 09 December 2010 01:30

This is a quick write up on eigenvectors, eigenvalues, orthogonality and the like. These topics have not been very well covered in the handbook, but are important from an examination point of view.

Read more...

Hits: 39443

Understanding convexity: first and second derivatives of a price function

Written by Mukul Pareek
Created on Saturday, 20 November 2010 15:31

First and second derivatives are important in finance – in particular in measuring risk for fixed income and options.  In fixed income – the first and second derivatives are modified duration and convexity respectively, and for options, these are delta and gamma.  But what do these really mean – and what does one think about them when one sees a number?  The rest of this article attempts to provide an intuitive look at how price changes for a bond (or an option) are determined by the first and the second derivative, what they mean, and how they are to be interpreted.

Read more...

Hits: 37869

Credit Migration Framework

Written by Mukul Pareek
Created on Saturday, 02 October 2010 02:05

This is the first of five articles that provide a high level understanding of the various portfolio models of credit risk covered in the PRMIA syllabus.  Being the first one, this discusses the credit migration framework.  (I am still working on the others.)  This article is intended to provide a conceptual understanding of the approach and I have not provided numerical examples for the reason that I don’t want to duplicate what is already there in the Handbook.  Once you have read this, the scattered explanation in the Handbook will hopefully make more sense.

Read more...

Hits: 11064

Combining expected values and variances

Written by Mukul Pareek
Created on Sunday, 29 August 2010 00:44

When constructing portfolios we are often concerned with the return (ie the mean, or expected value), and the risk (ie the volatility, or standard deviation) of combining positions or portfolios.  We may also be faced with situations where we need to know the risk and return if position sizes were to be scaled up or down in a linear way.  This brief article deals with how means and variances for two different variables can be combined together, and how they react to being added to or multiplied by constants.

 

Read more...

Hits: 21409

Default Correlations

Written by Mukul Pareek
Created on Thursday, 01 July 2010 01:44

This is a brief article on default correlations – what it means, and how to interpret it.  To keep things simple, let us consider only two securities– A and B.  Let us look at how default correlations are calculated, and then try to think about how to intuitively interpret a given default correlation number between two securities. 

 

Read more...

Hits: 16459

Credit VaR - an intuitive understanding

Written by Mukul Pareek
Created on Friday, 26 March 2010 18:12

This brief article intends to clarify the differences between some concepts relating to credit VaR.  One thing to note about credit risk is that you need to watch out whether you are inferring VaR from a distribution of the value of the portfolio, or from a distribution of the losses in the portfolio.  One is a mirror image of the other, they give the same results, but they are not identical and you should intuitively understand the difference.

 

Read more...

Hits: 34909

Capital tiers under Basel II

Written by Mukul Pareek
Created on Tuesday, 23 February 2010 02:40

The constituents of capital under Basel II

Read more...

Hits: 29566

More about continuous compounding

Written by Mukul Pareek
Created on Saturday, 09 January 2010 01:19

It is important to understand continuously compounded rates.  These rates are rarely encountered in day-to-day life, but are relevant to a finance professional.  You will never see, for example, a bank advertise ‘continuously compounded rates’ for its deposits.  (In fact, it may even be against the law to do so as they may be required to disclose easier to understand APRs).

 

To understand continuously compounded rates, think about natural processes.  Think about how, for example, population grows.  It does not grow in discrete steps.  It just grows all the time. 

 

Read more...

Hits: 9242

Descriptive stats for the PRMIA exams

Written by Mukul Pareek
Created on Wednesday, 28 October 2009 20:28

This is a very brief article, perhaps unjust given what it covers.  I have tried to keep it very short, so as to be a practical reference to key statistical terms that are used throughout risk management.  This covers standard deviation, variance, covariance, correlation, regression and the famous 'square root of time' rule.  The PRMIA handbook has more stuff but this covers the key things you must know - almost by heart!

 

Read more...

Hits: 4594

VaR and heavy tails

Written by Mukul Pareek
Created on Friday, 23 October 2009 00:40

Value at risk is affected by tails and there is so much stuff in the PRMIA handbook about dealing with heavy tails.  This can be confusing as the handbook sort of presumes an understanding of how tails affect VaR - so here is a short tutorial to explain how heavy tails affect Value at Risk.

Read more...

Hits: 5192

Distributions in finance

Written by Mukul Pareek
Created on Wednesday, 21 October 2009 23:14

A lot of finance and risk management is about distributions.  For the PRMIA exam, you really need to understand the concepts underlying distributions, what different shapes mean, what the parameters are, what a cdf is (vs a pdf) and which to use when.  Of course, the most commonly used distribution assumption is that returns are normally distributed, so this article talks about the normal distribution and also other important distributions.  More importantly, I have provided spreadsheets that model each of the distributions so you can play around and see the behaviour of the distribution as you change the parameters underlying it.

Read more...

Hits: 16807

A refresher on logarithms

Written by Mukul Pareek
Created on Wednesday, 21 October 2009 22:01

A quick refresher on logarithms.  Just the basics, and not a whole lot more.  Explains what logarithm means, how it is related to e, how to add/subtract logs and convert bases.

Read more...

Hits: 4310

Sequences: Arithmetic and Geometric Progressions

Written by Mukul Pareek
Created on Wednesday, 21 October 2009 21:10

Quick refresher on arithmetic and geometric progressions - straightforward, and lists just the formulae.  Not a whole lot.

Read more...

Hits: 3213

Interest rates and continuous compounding

Written by Mukul Pareek
Created on Wednesday, 21 October 2009 20:53

If you are new to finance, or haven't actually done much math in a while, the differences between discrete, compounded and continuously compounded interest rates can be quite confusing.  You may go through many chapters in the handbook while still having a nagging doubt as to if you really get the interest rate part - sometimes they use (1+r)^n, at other times it is exp(rn), what's going on?  This brief article explains what continuously compounded interest rates are, how they work and how they are to be used. 

Read more...

Hits: 46006

Calculating forward exchange rates - covered interest parity

Written by Mukul Pareek
Created on Wednesday, 21 October 2009 20:48

An easy hit in the PRMIA exam is getting the question based on covered interest parity right.  It will come with a couple of exchange rates, interest rates and dates, and there would be one thing missing that you will be required to calculate.  This brief write up attempts to provide an intuitive understanding of how and why covered interest parity works.  There are a number of questions relating to this that I have included in the question pool, and this article addresses the key concepts with some examples.

Read more...

Hits: 132785

Modeling portfolio variance in Excel

Written by Mukul Pareek
Created on Wednesday, 21 October 2009 14:09

This article is about an Excel model for calculating portfolio variance.  When it comes to calculating portfolio variance with just two assets, life is simple.  But consider a situation when there are 10, 15, maybe hundreds of assets.  This brief article is a practical demonstration of how portfolio variance can be modeled in Excel - the underlying math, and an actual spreadsheet for your playing pleasure! Enjoy!

Read more...

Hits: 116872

Option strategies

Written by Mukul Pareek
Created on Tuesday, 20 October 2009 23:48

A brief discussion of option strategies relevant to the PRM exam

 

Read more...

Hits: 3438

Understanding option Greeks

Written by Mukul Pareek
Created on Tuesday, 20 October 2009 22:06

Option Greeks are tricky beasts with their complex formulae and intimidating names.  This article is not about the formulae at all. The idea is to discuss what each of the Greeks represent, and understand what drives each of them.  Often the PRMIA exam will ask a question about a Greek, and very likely that question will expect you to understand the relationship between the variable and the underlying asset, and how the Greeks can be used to measure and manage risks.

Read more...

Hits: 11092

Valuing an option

Written by Mukul Pareek
Created on Tuesday, 20 October 2009 21:50

This rather brief article covers some familiar theory - how are options valued - but without repeating all the text you can find in a textbook.  Of course, we all know about the Black Scholes model, and anyone can punch it into Excel or use any number of online calculators available for free.  What this article tries to do is to provide an intuitive understanding of what drives option values, and also provides an Excel model incorporating the Black Scholes that you could use to play around with.

Read more...

Hits: 3599

Introduction to vanilla options

Written by Mukul Pareek
Created on Tuesday, 20 October 2009 20:28

In this article we will discuss basic vanilla options, calls and puts, and understand payoff diagrams. We will also look at the put-call parity. The put-call parity is important to understand for the PRMIA exam, as a number of questions, such as those relating to the relationship between call and put values and the additive nature of option Greeks, are based upon the put-call parity.

 

Read more...

Hits: 6461

Risk adjusted performance measures

Written by Mukul Pareek
Created on Tuesday, 20 October 2009 13:18

Returns are the reward for taking risk: where there is no risk, there will be no profit either.  This article discusses the Sharpe ratio, Treynor ratio, Information Ratio, Jensen’s alpha and the Kappa indices, which are all measures to evaluate risk adjusted performance.

 

Read more...

Hits: 17339

Stochastic processes

Written by Mukul Pareek
Created on Thursday, 08 October 2009 18:07

Stochastic processes

In finance and risk, you will always be running into what are called 'stochastic processes'.  Well, that is just a more complex way of saying that a variable is random.  A variable is considered 'stochastic' when its value is uncertain.  In finance, security returns are usually considered stochastic.  In this brief article, we look at some key concepts relating to stochastic variables, including the geometric Brownian motion process (borrowed from particle physics) which is often used to model asset returns.

Read more...

Hits: 7742