Singfat Chu
National University of Singapore
Journal of Statistics Education v.4, n.3 (1996)
Copyright (c) 1996 by Singfat Chu, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.
Key Words: Extrapolation; Interpretation of intercept; Model-building; Transformations.
Data presented in a newspaper advertisement suggest the use of simple linear regression to relate the prices of diamond rings to the weights of their diamond stones. The intercept of the resulting regression line is negative and significantly different from zero. This finding raises questions about an assumed pricing mechanism and motivates consideration of remedial actions.
1 Linear regression is covered in many introductory statistics courses. The availability of statistical software makes it a popular topic with students. However, students often fail to question the suitability of their fitted model.
2 One example that I have used to illustrate this point concerns the relationship between the price of ladies' diamond rings and the caratage of their diamonds. The application has considerable everyday appeal to students who may be contemplating the purchase of such jewelry.
3 The source of the data is a full page advertisement placed in the Straits Times newspaper issue of February 29, 1992, by a Singapore-based retailer of diamond jewelry.
4 The advertisement contained pictures of diamond rings and listed their prices, diamond content, and gold purity. Only 20K ladies' rings, each mounted with a single diamond stone, were considered for this study. 20K rings are made with gold of 20 carat purity. (Pure gold is rated as 24K.)
5 There were 48 such rings of varying designs. The weights of the diamond stones ranged from 0.12 to 0.35 carats (a one carat diamond stone weighs 0.2 gram), and the rings were priced between $223 and $1086. The jewelry store adopted a fixed-price policy.
6 I distribute photocopies of the advertisement to pairs of students and ask them to use the data to uncover, interpret, and comment on a plausible mechanism for pricing the diamond rings.
7 In Singapore, the pricing of gold jewelry is simple. The price equals the current market value of the gold content (i.e., weight times the going rate per gram of gold) plus a craftsmanship fee.
8 However, the pricing of other jewelry like diamond rings is more complicated because they are not as standardized as gold jewelry. The price of diamond jewelry depends on the four C's: caratage, cut, colour, and clarity of the diamond stone. A good cut gives a diamond more sparkle. Colourless diamonds are the most prized. A flawless diamond has maximum clarity because the passage of light is unimpeded through the stone. Cut, colour, and clarity are subjective factors and are very hard for the layman to gauge.
9 Students find it reasonable to assume that they are dealing with "commercial grade" jewelry intended for the mass market. Such jewelry is similar in terms of design, gold weight, and the diamond qualities of cut, colour, and clarity. Therefore, the carat size of the diamond stones becomes the obvious factor to use in pricing the rings.
10 Examination of a scatter plot of the data suggests the viability of a simple linear regression (SLR) model; see Figure 1. The fitted regression model is Estimated price = -259.63 + 3721.02 (carats). The fit is excellent as indicated by an R² of 0.978. The t-statistics for the intercept and slope coefficients are -14.99 and 45.50, respectively.
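To make the fitted line concrete, the following minimal sketch evaluates it at the two ends of the observed carat range and at zero carats. The coefficients are those reported above; the function and constant names are illustrative only, and nothing here refits the model to the advertisement data.

```python
# Coefficients of the fitted line as reported in the text.
INTERCEPT = -259.63   # Singapore dollars
SLOPE = 3721.02       # Singapore dollars per carat


def slr_price(carats: float) -> float:
    """Estimated ring price from the reported simple linear regression."""
    return INTERCEPT + SLOPE * carats


def zero_price_carats() -> float:
    """Carat weight at which the fitted line crosses a price of zero."""
    return -INTERCEPT / SLOPE


if __name__ == "__main__":
    # 0.12 and 0.35 are the smallest and largest stones in the data;
    # 0.0 lies outside that range and is an extrapolation.
    for carats in (0.0, 0.12, 0.35):
        print(f"{carats:.2f} carats -> {slr_price(carats):9.2f}")
    print(f"line crosses zero near {zero_price_carats():.3f} carats")
```

The zero crossing falls well below the smallest stone in the data, so the negative fitted values arise only under extrapolation.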
11 Students invariably employ an SLR model. They also perform residual analysis, and, finding nothing unusual, they are satisfied with the regression fit.
12 In their reports, the students commonly point out
13 Many students point out that extrapolation of the regression line for bigger diamond stones would not be advisable as these stones are rarer and command a different price range. A few students mention that they find the negative intercept puzzling because it suggests that a zero-carat diamond ring has a negative economic value!
14 Upon returning the assignments, I follow up on the remarks about the negative intercept and ask the students whether this warrants remedial action. Our presumed pricing mechanism implies that the intercept represents the value of the gold content plus a craftsmanship fee. Thus, it should be non-negative. Three lines of thought have emerged during the discussions.
Thought 1: No remedial action is required. The model applies only to the restricted range of the data. Extrapolation to a zero-weight stone (i.e., to the intercept) is dangerous because the underlying pricing mechanism may be piecewise linear or even nonlinear. Thus the possibility that the true intercept is non-negative cannot be excluded.
Thought 2: Remedial action is required. One proposal is to force the regression line through the origin. Some students, however, do not buy the idea that the gold content and craftsmanship fee are freebies. A computer demonstration also reveals that the fit of the no-intercept line is quite poor.
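The effect of forcing the line through the origin can be sketched as follows. Because the advertisement data are not reproduced here, the example uses hypothetical points placed exactly on the reported line (not the actual ring prices) over an assumed grid of carat weights within the observed range. It illustrates only the mechanism; the poor fit mentioned above refers to the real data.

```python
def fit_with_intercept(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_through_origin(x, y):
    """Least squares for y = b*x (no intercept); returns b."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def sum_squared_errors(x, y, intercept, slope):
    """Sum of squared residuals for the line intercept + slope*x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# ASSUMPTION: an illustrative grid of carat weights within 0.12-0.35,
# with prices generated from the reported line. These are NOT the ad's data.
carats = [0.12, 0.15, 0.20, 0.25, 0.30, 0.35]
prices = [-259.63 + 3721.02 * c for c in carats]

a, b = fit_with_intercept(carats, prices)
b0 = fit_through_origin(carats, prices)

if __name__ == "__main__":
    print("with intercept:", round(a, 2), round(b, 2),
          "SSE =", round(sum_squared_errors(carats, prices, a, b), 2))
    print("through origin:", round(b0, 2),
          "SSE =", round(sum_squared_errors(carats, prices, 0.0, b0), 2))
```

When the unconstrained intercept is negative, the no-intercept slope is pulled below the unconstrained slope, and the residuals trend systematically from negative to positive across the carat range.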
Thought 3: Remedial action is required, and data transformations may be entertained. Suitable models that are intrinsically linear and have a non-negative intercept include $y = A x^B \epsilon$ and $y = \exp(A + Bx + \epsilon)$. For the latter model, ln(price) is regressed on diamond caratage using standard SLR technology. Our attempt at fitting that model turned out to be unsatisfactory; the residuals scattered along a dome pattern, cycling from negative to positive and back to negative as the carat size increased. This led us to consider the model $y = \exp(A + Bx + Cx^2 + \epsilon)$. This proved to be an adequate model; the residuals scattered randomly, and the adjusted R² was 0.97. The estimated fit was ln(price) = 3.89 + 14.86 (carats) - 17.54 (carats²). The anti-log of the intercept is $49. This is a reasonable figure for the value of the gold content and the craftsmanship fee. Figure 1 shows this model after back-transforming price from the log scale.
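The back-transformation of the log-quadratic fit can be sketched as below, using the coefficients reported above. The function names are illustrative, and the sketch simply exponentiates the fitted log-price without any bias adjustment; whether the original Figure 1 applied such an adjustment is not stated.

```python
import math

# Coefficients of ln(price) = A + B*carats + C*carats**2, as reported.
A = 3.89
B = 14.86
C = -17.54


def log_quadratic_price(carats: float) -> float:
    """Back-transformed price estimate; always positive by construction."""
    return math.exp(A + B * carats + C * carats ** 2)


def turning_point_carats() -> float:
    """Carat weight at which the fitted log-price stops increasing."""
    return -B / (2.0 * C)


if __name__ == "__main__":
    # 0.0 gives the anti-log of the intercept; 0.12 and 0.35 bound the data.
    for carats in (0.0, 0.12, 0.35):
        print(f"{carats:.2f} carats -> {log_quadratic_price(carats):8.2f}")
    print(f"fitted curve peaks near {turning_point_carats():.2f} carats")
```

The fitted curve turns downward beyond the largest stone in the data, so this model, like the linear one, should not be extrapolated to bigger diamonds.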
15 This paper illustrates model-building in linear regression. It is shown that a possibly counter-intuitive negative intercept may be avoided by using a multiplicative or exponential regression model. These regression models are intrinsically linear, and they are estimated using standard linear regression technology after a suitable transformation of the data.
16 The file diamond.dat.txt contains the raw data. The file diamond.txt is a documentation file containing a brief description of the dataset.
Columns
 6 -  8   Size of diamond in carats
16 - 19   Price of ring in Singapore dollars

Values are aligned and delimited by blanks. There are no missing values.
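A minimal sketch of reading the file by the column positions given above follows. It assumes the positions are 1-indexed and inclusive, and that each non-blank line holds one ring; the function names are illustrative.

```python
from typing import List, Tuple


def parse_record(line: str) -> Tuple[float, float]:
    """Extract (carats, price) from one fixed-width line.

    ASSUMPTION: columns 6-8 and 16-19 are 1-indexed and inclusive,
    which correspond to the Python slices [5:8] and [15:19].
    """
    carats = float(line[5:8])
    price = float(line[15:19])
    return carats, price


def load_diamonds(path: str = "diamond.dat.txt") -> List[Tuple[float, float]]:
    """Read every non-blank line of the data file into (carats, price) pairs."""
    records = []
    with open(path) as handle:
        for line in handle:
            if line.strip():
                records.append(parse_record(line))
    return records
```

Splitting each line on whitespace would also work given the blank delimiters, but slicing by position follows the documented layout directly.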
Singfat Chu
Department of Decision Sciences
National University of Singapore
10 Kent Ridge Crescent
Singapore 119260