Princeton University Physics Department PHY 209 main page 
Physics 209, Week 4: Hubble's Law and Least-squares Fitting of Data to a Model (Fall 2014)
The goals today are:
For Python, I suggest that you read or skim Chapter 9, Sections 1-2, and Chapter 27 (or Chapter 32 in the 4th edition) in Learning Python, 3rd Edition. They are "Tuples, Files and Everything Else" and "Exception Basics", respectively. Remember that we'll be reading files and testing for exceptions this week. (Link requires an institutional subscription. That is, you must either be on campus or set up a VPN to campus.)
{x}={0,1,2,3,4}
{y}={4,2,2,2,0}
σ or {δy}={1,1,1,1,1}
numpy.array: convert a list or tuple to a vector of data
numpy.sum: sum of a vector
numpy.average: mean of a vector
numpy.fabs: absolute value
numpy.mean: mean of a vector
numpy.std: standard deviation
pylab.plot: 2-d scatter or line plots
pylab.errorbar: 2-d scatter plots with error bars
pylab.subplot: multiple plots on one window
numpy.random.normal: array of random values with a normal (Gaussian) distribution
scipy.stats.chisqprob: chi-squared probability to exceed (the scipy module should be installed on the lab machines and feynman)
The goal is to read the data into numpy arrays named v, d, and derr. Notice that we have to build the data as lists and convert them to arrays later, because lists have an append method but arrays don't. (In Python lingo, lists are resizable; a numpy array has a fixed length once it is created.)
Try to add the missing pieces so that you end up with numpy arrays named v, d, and derr containing the data. If you are stuck, refer to the Python reading.

from numpy import *
datafile = open('hubble_low_z.dat', 'r')  # 2nd argument says open for reading
for line in datafile:                     # Iterating over a file goes line by line
    (a, b, c) = line.split()              # split() breaks text at whitespace
    # Somehow have to keep a record of a, b, c???
    # Arrays can't be built element by element.
    # But lists can...
datafile.close()                          # Close the data file
Now organize what you have into a function named read_hubble_data(filename). Have it return a "tuple" of the arrays. (That is, use return (z, d, derr) as the last line of the function. The parentheses are optional in that statement.) Tuples are like lists, but they are both more efficient and more annoying (because you can't modify them, you may only replace them with others; again, this is what Python calls an "immutable" object). Your function should accept an argument filename that tells it what data file to read. Save this function in a file (using emacs), to which we will soon add the other features of this project. By doing this as a function, you have encapsulated the things you've solved so far and made them easier to use as a component in a complete solution to the best-fit problem. You can check how Joe did it.
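If you want something to compare against, here is one possible version (a sketch, not necessarily the same as Joe's; it assumes the three whitespace-separated columns are redshift, distance, and distance error):

```python
# One possible completion of the file-reading exercise. Assumes three
# whitespace-separated columns per line: redshift, distance, distance error.
from numpy import array

def read_hubble_data(filename):
    """Read three whitespace-separated columns into three numpy arrays."""
    z_list, d_list, derr_list = [], [], []   # build as lists (lists have .append)
    datafile = open(filename, 'r')
    for line in datafile:                    # iterating over a file goes line by line
        (a, b, c) = line.split()             # split() breaks text at whitespace
        z_list.append(float(a))              # split() returns strings; convert
        d_list.append(float(b))
        derr_list.append(float(c))
    datafile.close()
    # Convert the finished lists to fixed-size numpy arrays, returned as a tuple
    return array(z_list), array(d_list), array(derr_list)
```

Because the function returns a tuple, `z, d, derr = read_hubble_data('hubble_low_z.dat')` unpacks all three arrays at once.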
Perhaps your program asks the user for the data file with file = raw_input('What file contains the data? '), or perhaps it uses a default filename. In any of these choices, use the exception-handling pair try: ... except:. Here's an unrelated example of the syntax:
list = [1, 3, 5, 7]
n = int(raw_input('What list item do you want? '))
# Now do something that might produce an error. Don't be afraid!
try:
    print 'list[', n, ']=', list[n]
except IndexError:
    print 'list has no element n=', n
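The same pattern can guard the file-opening step. Here is a sketch; the helper name open_data_file and the fall-back behavior are my own choices, not part of the assignment:

```python
# Applying try/except to the file-opening step. open_data_file is a
# hypothetical helper: it falls back to a default filename when the
# requested file cannot be opened.
def open_data_file(filename, default='hubble_low_z.dat'):
    try:
        return open(filename, 'r')
    except IOError:                 # raised when the file does not exist
        print('Could not open', filename, '- falling back to', default)
        return open(default, 'r')
```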
Plot the data using the pylab plotting package (grab sample_plot.py):

# Notice three new features this week
# 1. plotting error bars instead of just points or lines
# 2. plotting a function (any valid python or numpy expression depending on x)
# 3. drawing to a file.
# Joe, Oct 8, 2007
from pylab import *
errorbar(z, d, derr, ecolor='r', fmt='o')  # Error bars. Circles at points. No line
xlabel('Redshift z')                       # Axis titles and overall title
ylabel('Distance (Mpc)')
title('Supernova Hubble Diagram')
plot(z, distance(z), color='green')        # Overplot a theory curve
savefig('sample_hubble.png')

Does it look like real experimental data with uncertainties, but clustered near a line? Let's hope so. If so, you are in good shape. Notice how the uncertainties in distance increase for more distant galaxies. Good thing, then, that we are using line-fit results that permit different uncertainties on each point!
Before writing code, design your fitter. For example, will your function take three arrays x, y, dy as the data? Do you want the program to allow dy to be a single number (implying that all data have the same absolute error)? Today, this is just an exercise. But for bigger projects, the design phase is critical (so it's a good habit to build). There is no right answer, or rather there are many, many right answers (and only practice will help you learn which right answers are most efficient and helpful to you).
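One way such a design might look on paper is an interface-only sketch: the signature, the documented choices, and no body yet. The name linefit and the return convention here are my own choices, not prescribed by the assignment:

```python
# A design-phase sketch: interface first, implementation later.
def linefit(x, y, dy):
    """Fit y = a*x + b to data with errors dy.

    x, y : sequences of equal length
    dy   : sequence of per-point errors, or a single number meaning
           the same absolute error for every point
    Returns (a, b, da, db): slope, intercept, and their uncertainties.
    """
    raise NotImplementedError   # design phase: body comes after the design is settled
```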
All designed? Good. Now try writing the function(s) you have planned. You can jump right in and test them on real data; in my experience, physicists usually do this. But a wise and patient person will run the brand-new program on some well-understood data first, data whose best fit is totally unambiguous. That is, we need to do some tests. You should think about what tests would convince you that your fitter works. Just a few are appropriate for this kind of program. After you ponder this and write down notes on your ideas, you can look at my ideas for tests (and note on introducing classes).
Write some tests as separate functions test1(), test2(), and so forth. Have each test make a report of the sums over the data, the best-fit intercept and slope, and their errors. In fact, it would be quite useful to have a function fit_report() that prints such things in a standard format. Did you include a fit-reporter in your original design? If not, add one now, and run your tests. Did they work as expected?
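For comparison, here is a sketch of one possible fitter, fit-reporter, and test (the names and layout are my own choices). The formulas are the standard weighted least-squares results for a line y = a*x + b with per-point errors dy; on the prelab data set given above they yield slope a = -0.8 and intercept b = 3.6:

```python
# One possible fitter/reporter/test trio, using the standard weighted
# least-squares formulas for a straight line.
import numpy as np

def linefit(x, y, dy):
    """Fit y = a*x + b, weighting each point by 1/dy**2.
    Returns (a, b, da, db): slope, intercept, and their 1-sigma errors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dy = np.asarray(dy, float)
    if dy.ndim == 0:                       # allow a single number for dy
        dy = dy * np.ones_like(x)
    w = 1.0 / dy**2                        # weights
    S, Sx, Sy = w.sum(), (w*x).sum(), (w*y).sum()
    Sxx, Sxy = (w*x*x).sum(), (w*x*y).sum()
    Delta = S*Sxx - Sx**2
    a = (S*Sxy - Sx*Sy) / Delta            # slope
    b = (Sxx*Sy - Sx*Sxy) / Delta          # intercept
    da = np.sqrt(S / Delta)                # uncertainty on the slope
    db = np.sqrt(Sxx / Delta)              # uncertainty on the intercept
    return a, b, da, db

def fit_report(a, b, da, db, label='fit'):
    """Print (and return) the fit results in one standard format."""
    report = ('Results for %s:\n'
              '  slope     a = %g +- %g\n'
              '  intercept b = %g +- %g' % (label, a, da, b, db))
    print(report)
    return report

def test1():
    """Prelab data: the best fit should be a = -0.8, b = 3.6."""
    a, b, da, db = linefit([0, 1, 2, 3, 4], [4, 2, 2, 2, 0], 1.0)
    fit_report(a, b, da, db, label='prelab data')
    return a, b, da, db
```

Another easy test: data that lie exactly on a line, where the fitted slope and intercept must reproduce that line exactly.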
First fit the hubble_low_z.dat file. This contains 18 nearby supernovae (only out to redshift z ~ 0.1), where the Hubble relation is found to work well. Though the Hubble law should have zero intercept, we will fit a line

d(z) = a z + b

as a test of how well the law works. Just using your judgment, how good is the fit? (We'll save the chi-squared test for the next section.)

What is the Hubble constant? Note that the Hubble constant is usually reported as c/a, where c is the speed of light (299,800 km/s) and a is the slope you fit for. Why did we fit for the inverse of the Hubble constant? Because the distance data are far more uncertain than the velocity data! This makes fitting d(z) more straightforward than z(d). The generally accepted value for the Hubble constant is approximately 70 km per second per Mpc.

Is your intercept b consistent with zero, as expected? Use the formula for the uncertainty in b to check. What is your error on the Hubble constant? Use the formula for the uncertainty in a, and keep in mind that a and H_{0}=c/a have the same fractional uncertainty.
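The slope-to-Hubble-constant conversion can be sketched as follows. The slope value and its error here are hypothetical, chosen only to illustrate the arithmetic (a slope of c/70 should return H0 near 70):

```python
# Converting a fitted slope a (in Mpc, since d = a*z with z dimensionless)
# into a Hubble constant. The fit values below are hypothetical examples.
c = 299800.0                  # speed of light, km/s

def hubble_constant(a, da):
    """H0 = c/a, with the same fractional error as the slope."""
    H0 = c / a
    dH0 = H0 * (da / a)       # a and c/a share the same fractional uncertainty
    return H0, dH0

H0, dH0 = hubble_constant(4283.0, 150.0)   # hypothetical fit: a = 4283 +- 150 Mpc
```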
Make a plot of your raw data (as points with error bars) and your best-fit line. Use the pylab.savefig('filename.png') function to save your plot as a PNG. You can similarly save as a PostScript file if you give it a name like savefig('filename.ps'). In principle, pylab can do EPS, JPEG, PDF, PNG, PS, and SVG, though you might need to "switch your pylab backend" to do some of these.
Let's not fit the higher-z supernovae yet. The idea is that they are not going to fit well to a line, so we first need to study the chi-squared statistic for goodness-of-fit.
Compute the chi-squared for your fit to the low-z Hubble data. How large is it? There are (18 - 2) degrees of freedom: 18 data values, minus two degrees of freedom that were "spent" on fitting the line (since the line has two free parameters). So we expect chisq to be roughly 16 ± 6. (The chi-squared distribution has expected value r and variance 2r when there are r degrees of freedom.) Do you get a value less than around 20 to 25? That is, is this a good fit?
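A minimal chi-squared helper might look like this (my own sketch; the check value 1.6 is the prelab result quoted later in this section, obtained with the prelab data's best-fit line d = -0.8 z + 3.6):

```python
# Chi-squared for a straight-line model y = a*x + b with per-point errors.
import numpy as np

def chisq_line(x, y, dy, a, b):
    """Sum of squared, error-weighted residuals from the line y = a*x + b."""
    x, y, dy = np.asarray(x, float), np.asarray(y, float), np.asarray(dy, float)
    return np.sum(((y - (a*x + b)) / dy)**2)
```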
Now fit hubble.dat, the full set, which goes out to z ~ 0.8. How large is chi-squared? Compare this to the 60 - 2 degrees of freedom.
It is illustrative to plot the fit you got from the low-z data alone over the full set. Astronomers expected a deviation from the Hubble law, but not in this direction! It appears that the expansion of the universe was slower in the past; in other words, the expansion is accelerating.
For both data sets, you can compute the "probability to exceed" for the chi-squared you found (given the number of degrees of freedom). To be sure you are using it right, make sure that you find a probability of 65.94% for the chisq = 1.6, df = 3 case that corresponds to the prelab work. The scipy module has a function that computes the chi-squared probability to exceed. Use from scipy.stats import chisqprob; help(chisqprob).
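If you want an independent cross-check that needs nothing beyond the standard library, the chi-squared survival function has a closed form for exactly 3 degrees of freedom. (Note that in modern scipy releases chisqprob has been removed; scipy.stats.chi2.sf(chisq, df) plays the same role.) A sketch:

```python
# Closed-form chi-squared "probability to exceed" for exactly 3 degrees
# of freedom: P(chisq > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
# Useful only as a cross-check for the df = 3 prelab case.
import math

def chisq_prob_3dof(x):
    """Probability to exceed x for a chi-squared variable with 3 dof."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
```

Evaluating chisq_prob_3dof(1.6) reproduces the 65.94% quoted above for the prelab case.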
In previous years, when scipy wasn't available, Joe wrote a program that does the same thing; see chisqprob.py (colorized or plain). However, you shouldn't need to use this code.
You recognize a really bad fit by the fact that the observed data would be wildly improbable if the fit were right. Usually a probability to exceed greater than around 1% to 5% suggests a good fit.
Aside on the cosmology: We have made a few oversimplifications for these data. First, astronomers correct for a variation in the supernova intrinsic brightness of the order of 30%, based on the "light curve", that is, how long the supernova explosion lasts. For reasons not well understood, we know that brighter supernovae last longer (it has to do with variations in their nickel content, but this itself is not understood). Second, we have converted the raw data (apparent brightness) into an estimate of distance, assuming identical intrinsic brightness for all Type Ia supernovae. Finally, we've ignored the question of whether we should expect a straight line for d(z) or d(v), given that v = cz is only true for small z. The point here is that we can detect curvature in the relationship, using statistical methods.
Generate fake Hubble data N times and fit each data set. (N can probably be around 10,000 or more, depending on your computer and your patience.) To make the fake data, use the given list of independent variables (redshifts or velocities). At each redshift, use some standard line (e.g., the one with Hubble constant H_{0} = 70 km/s Mpc^{-1}) to make a theoretical value of the distance. To this idealized value, add a Gaussian (normal) deviate with zero mean and the appropriate standard deviation (i.e., matching the stated errors on the data). Check out help(numpy.random.normal).

Keep track of the slope and intercept from each fit (e.g., build a Python list?). Check whether the mean and standard deviation of the slope and intercept agree with the computed values. What does a plot of a versus b look like? Are the values anticorrelated? What are the means and standard deviations of the two? What did you expect? Make a histogram of their values (see pylab's hist function).
Also, for each fit, compute chi-squared. Make a histogram of its values. Does it have the expected mean and standard deviation? (They should be N and sqrt(2N) for N degrees of freedom; note that this N is the number of degrees of freedom, not the number of fake data sets.) Compute the probability to exceed, and histogram its values. What should that distribution look like?
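The Monte Carlo loop described above might be sketched as follows. The redshift list and error model here are hypothetical stand-ins for the real data columns, and the weighted-fit formulas are repeated so that the block is self-contained:

```python
# Sketch of the Monte Carlo experiment: generate fake data on a known
# line many times, fit each set, and collect the fitted parameters.
import numpy as np

def fit_slope_intercept(x, y, dy):
    """Weighted least-squares slope and intercept for y = a*x + b."""
    w = 1.0 / np.asarray(dy, float)**2
    x, y = np.asarray(x, float), np.asarray(y, float)
    S, Sx, Sy = w.sum(), (w*x).sum(), (w*y).sum()
    Sxx, Sxy = (w*x*x).sum(), (w*x*y).sum()
    Delta = S*Sxx - Sx**2
    return (S*Sxy - Sx*Sy) / Delta, (Sxx*Sy - Sx*Sxy) / Delta

def monte_carlo(z, derr, a_true, b_true, N, seed=0):
    """Fit N fake data sets; return arrays of fitted slopes and intercepts."""
    rng = np.random.RandomState(seed)          # fixed seed, so runs repeat
    slopes, intercepts = [], []                # build as lists, convert later
    d_ideal = a_true * z + b_true              # the "standard line"
    for i in range(N):
        d_fake = d_ideal + rng.normal(0.0, derr)   # Gaussian deviates, sigma = derr
        a, b = fit_slope_intercept(z, d_fake, derr)
        slopes.append(a)
        intercepts.append(b)
    return np.array(slopes), np.array(intercepts)

# Hypothetical stand-ins for the real redshift and error columns:
z = np.linspace(0.01, 0.1, 18)
derr = 5.0 + 40.0 * z                          # errors grow with distance
a_true = 299800.0 / 70.0                       # slope c/H0 in Mpc, for H0 = 70
slopes, intercepts = monte_carlo(z, derr, a_true, 0.0, 2000)
```

Comparing slopes.mean() and slopes.std() to a_true and to the analytic slope uncertainty checks the fitter; a scatter plot of slopes versus intercepts shows the anticorrelation.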
Contact instructors:
Cristiano Galbiati or Shawn Westerdale