Harvard University

Kennedy School of Government API-205

Graduate School of Education H-013

Empirical Analysis and Program Evaluation

Professor Richard J. Light

1999-2000

 

 

Goals of this course.

This course has two goals. The first is to present and extend ideas from basic statistics and data analysis to students who arrive with little or no background in these topics. The second is to apply these ideas to a broad variety of concrete examples from the field of program evaluation. Many evaluation examples will emphasize how to make decisions: for example, how to improve the management of a program, an education intervention, or a health policy. While there will be some theory and explanation of "where the statistical ideas come from," derivations are not the emphasis of this course. Rather, there will be an intense focus on solving concrete problems, and all three exams will focus on exactly that.

 

How This Course Differs from Several Other Basic Statistics Courses.

We will do three things differently in this course from the way some other basic statistics courses are organized. First, we will cover basic probability in less depth and instead spend far more time on concepts from program evaluation. This topic is rarely covered in basic courses; we will explore several evaluation questions seriously. Three concrete examples: How can we design an analysis of how well the Head Start program works? What research design is most effective for understanding how to improve the effectiveness of job training interventions? How can we design a policy analysis to shape regulations for what constitutes acceptable quality of daycare for young children? Since our time is finite, everyone taking this class should understand that they are accepting this tradeoff.

A second difference is that this course will de-emphasize computing and focus far more than usual on writing. While we will use computers, the entire class is designed for people who plan to be practitioners or managers, rather than professional statisticians, econometricians, or research specialists. So, to be specific, rather than asking each of you to enter data and then carry out elaborate multiple regression computations, I will present you with several such equations, together with plenty of background information about each, and ask you to write about them. A typical question on both homework and in-class exams will be, "Explain this finding in non-statistical, policy terms to the Governor." Any student who believes this is inappropriate training for their needs should choose a different statistics course. Several are available at the Kennedy School, the Graduate School of Education, and the Statistics Department, as well as in particular programs such as the Government Department and the Social Studies concentration in FAS. To repeat, our emphasis will be on words, writing, explanations, and critiques, rather than on heavy computing. We will compute chi-square statistics, regression analyses, confidence intervals, and t-tests, but these will be only part of the homework assignments and exam questions. The core will be writing, writing, and more writing. Be prepared.

A third difference is in how the computer will be used. Most modern statistics and empirical methods courses use computer exercises for day-to-day homework assignments. We will do it differently. I will stress statistical concepts, principles, and the interpretation of data in the main body of this course. Then, once we have covered the main concepts, we will begin the second half of the course with a serious exposure to computing and use it to solve several concrete, in-depth exercises. So instruction in computing will not run alongside the early class sessions; instead, it will be concentrated in the second half of the semester. Also, since students in this class will probably come from several different Schools and departments at Harvard, I will try to arrange for students to work with the computer facilities in their own School. This may or may not turn out to be possible, but I will try.

The reason for all of these details is that Harvard offers about eight superb basic statistics and empirical methods courses. Each has different emphases and different strengths. It would be a great shame if, halfway through the semester, a student in this course suddenly became upset that there is not more elaborate computer work, or more calculation, early in the semester, or, conversely, that this class asks them to write too much. Think carefully about which of Harvard's basic empirical methods courses will best meet your needs.

 

Substantive areas.

The techniques and methods we discuss are applicable to an enormous number of settings. But we will emphasize human service areas, especially education and health. Throughout the semester there will be readings and homework problems to supplement the textbook - they will emphasize education, health, job training, criminal justice, and effective human services delivery.

One special feature of this class is that we will emphasize questions such as when various techniques are most appropriate, and what assumptions, sometimes stated and sometimes unstated, underlie them. We will constantly ask and re-ask the question, "How can we use information to improve a policy, or a law, or a public sector social program, or our own work, in specific ways?" All three exams will emphasize this.

 

Textbook and additional readings.

The textbook everyone should buy is Introductory Statistics, Fifth Edition, by Neil A. Weiss. It is published by Addison-Wesley (1999) and is available at the Harvard Coop. We will go well beyond this book's coverage, and I will hand out additional problems and readings, but the text will form the core of our coursework. It is the one absolutely required text.

 

Class format, and Teaching Fellows.

We will meet in class twice every week, for an hour and a half each time, as a full class. I will present material in these sessions. There will be additional problem solving sessions, scheduled throughout the semester, led by several superb Teaching Fellows. These extra problem solving sessions are voluntary, and they will meet in addition to the two weekly classes. In particular, we will have a regular Friday review class.

 

Student obligations.

Each student will have several responsibilities. First, you are expected to come to each class session well prepared to contribute to class discussion. And please come on time. Second, there will be several homework assignments throughout the semester. Third, there will be three in-class, closed-book exams and occasional shorter quizzes during the semester. I will return your exams with quick turnaround, since each quiz and test is designed to help you learn. After all, solving problems is what statistics is all about.

 

 

Topics by class. Problem sets to accompany these classes will be distributed regularly throughout the semester.

 

Sept. 22 - Introductory class. Overview of the semester. Several introductory examples of how statistics and data analysis can be used to solve policy problems. Discussion of basic definitions we will use throughout the semester. Buy the textbook and start reading Chapter 4.

 

Sept. 27 - Read Chapter 4 in the textbook. Basic probability. Addition theorem, multiplication theorem, conditional probability, and using tables to describe probabilities. The concept of statistical independence, and why it is so important for practical applications.

 

Sept. 29 - Bayes' Theorem. What it is and why it is important. How Bayesian analysis can help to solve policy problems, especially those that involve screening large groups of people. We will do examples of screening for child abuse, screening the blood supply for HIV, and screening employees for drug use in the workplace. Finish reading Chapter 4. The first problem set will be distributed.
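As a rough illustration of the screening reasoning in this class, the short Python sketch below applies Bayes' Theorem to compute the probability that a person who tests positive actually has the condition being screened for. The prevalence, sensitivity, and specificity values are hypothetical, chosen only to show how a rare condition can make most positive results false alarms; they are not drawn from any of the screening examples named above.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """P(condition | positive test), computed with Bayes' Theorem."""
    true_pos = sensitivity * prevalence                # P(positive and has condition)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(positive and no condition)
    return true_pos / (true_pos + false_pos)

# Hypothetical screening program: 1% prevalence, 99% sensitivity, 95% specificity.
ppv = positive_predictive_value(0.01, 0.99, 0.95)
print(f"P(condition | positive) = {ppv:.3f}")  # P(condition | positive) = 0.167
```

Even with a very accurate test, under these made-up numbers only about one positive in six is a true case, because the many people without the condition generate most of the positives.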

October 4 - Descriptive measures. Read Chapter 3. We will discuss how to summarize large bodies of data efficiently.

 

October 6 - Introduction to probability distributions. Read Chapter 5. Discussion and presentation of the binomial distribution. Examples of how the binomial distribution can solve policy problems. We will discuss using this distribution to help determine staffing policies in a large organization. First Problem Set is Due.
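A minimal sketch, assuming absences are independent and every staff member has the same daily absence probability, of how the binomial distribution can inform a staffing decision: find the smallest roster that covers a required headcount with a chosen level of confidence. The function names and the numbers (15 people needed, 10% absence rate, 95% coverage goal) are hypothetical.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_enough_staff(n: int, needed: int, p_absent: float) -> float:
    """P(at least `needed` of `n` staff show up), assuming independent absences."""
    max_absent = n - needed
    if max_absent < 0:
        return 0.0
    return sum(binomial_pmf(k, n, p_absent) for k in range(max_absent + 1))

def min_staff(needed: int, p_absent: float, target: float) -> int:
    """Smallest roster size n with P(at least `needed` present) >= target."""
    n = needed
    while prob_enough_staff(n, needed, p_absent) < target:
        n += 1
    return n

# Hypothetical office: 15 people needed daily, 10% absence rate, 95% coverage goal.
print(min_staff(15, 0.10, 0.95))  # 19
```

Under these assumptions a roster of 18 covers the need on only about 90% of days, so the sketch recommends 19. The independence assumption deserves scrutiny, since real absences often cluster (for example, during flu season).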

October 13 - Read Chapter 6, which we begin in this class. We will discuss the normal (Gaussian) distribution, especially the standard normal distribution. We will go over how to compute areas under the normal curve, and various shortcuts for doing this.

 

October 18 - We will finish the normal distribution, especially how to use the normal curve as an approximation to the binomial distribution to solve practical problems. So finish reading Chapter 6. Then begin reading Chapter 7. We will begin a discussion of sampling that will continue for several classes. What is random sampling? What is the distribution of sample means and why is it so important? We will do several examples of how knowing the behavior of the distribution of sample means will help us to solve concrete policy and management problems.

 

October 20 - Review Class, with Practice Problems.

 

October 25 - FIRST IN-CLASS EXAM.

 

October 27 - Finish reading Chapter 7. We will discuss the Central Limit Theorem, one of the most important ideas in statistical analysis. We will also go over the distribution of sample proportions, and solve several problems using this sampling distribution.

 

November 1 - Read Chapter 8 in the textbook. Confidence intervals for a single mean. Confidence intervals for a single proportion. How to interpret confidence intervals, and common mistakes that can lead to unfortunate consequences. How can we make constructive decisions based on sample information and confidence intervals?

 

November 3 - Read Chapter 11 in the textbook. We will finish discussing the Chapter 8 material on confidence intervals, then introduce the "t" distribution and t-tests. Everyone should have finished reading Chapter 8.

 

November 8 - Skip ahead to Chapter 10 and begin reading it, omitting sections 10.4, 10.5, and 10.6. The first main idea will be comparing means from two independent samples. We will go over the sampling distribution of differences between means. Then we will discuss how to set up confidence intervals for differences between means. Finally, we will do several examples of how to make concrete policy decisions based on comparing two independent sample means.

 

November 10 - Continue with Chapter 10, again skipping sections 10.4, 10.5, and 10.6. In class we will discuss how to compare proportions from two independent samples. We will go over the sampling distribution of differences between proportions, with an emphasis on how it helps us set up confidence intervals. Again, we will apply these ideas to several concrete policy problems where a manager must make difficult choices between competing alternatives. Also, I will distribute two old "Second Exams" in class to help everyone prepare for our upcoming second exam.

 

November 15 - Special topics, building on material from Chapters 8, 10, and 11. We will answer the question: how large a sample do you need to take when trying to estimate a population mean? What does it depend upon? Also, how large a sample do you need when trying to estimate a population proportion? What does it depend upon?
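One common way to answer these questions is with the standard sample-size formulas for a confidence interval's margin of error E: n = (z * sigma / E)^2 for a mean and n = z^2 * p(1 - p) / E^2 for a proportion, rounded up. The Python sketch below assumes simple random sampling from a large population (no finite-population correction) and a known or guessed standard deviation sigma; the inputs in the example are hypothetical. It shows that the required size depends on the confidence level, the variability in the population, and the margin of error you are willing to tolerate.

```python
from math import ceil
from statistics import NormalDist

def z_value(confidence: float) -> float:
    """Two-sided critical value, about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def n_for_mean(sigma: float, margin: float, confidence: float = 0.95) -> int:
    """Sample size so a CI for a mean has half-width <= margin (sigma assumed known)."""
    return ceil((z_value(confidence) * sigma / margin) ** 2)

def n_for_proportion(margin: float, p_guess: float = 0.5, confidence: float = 0.95) -> int:
    """Sample size for a proportion; p_guess = 0.5 is the conservative (largest-n) choice."""
    z = z_value(confidence)
    return ceil(z**2 * p_guess * (1 - p_guess) / margin**2)

print(n_for_mean(sigma=15, margin=3))  # 97
print(n_for_proportion(margin=0.03))   # 1068
```

Because E enters both formulas squared, halving the margin of error roughly quadruples the required sample.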

 

November 17 - Read Chapter 9. We will discuss hypothesis testing. How does testing a null hypothesis against an alternative differ from setting up confidence intervals? How should you organize the null and alternative hypotheses to minimize errors? What are Type I and Type II errors? What trade-offs must we face when testing a hypothesis and deciding how much error we are willing to tolerate? Several concrete examples from education, health, and criminal justice policy will help here.

 

November 22 - SECOND IN-CLASS EXAM. CLOSED BOOK, BUT FORMULA SHEETS WILL BE DISTRIBUTED WITH THE EXAM. IT WILL BE A 90-MINUTE EXAM.

 

November 29 - Read section 13.4. Discussion of the chi-square distribution and of tests for independence between two discrete variables.
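A minimal sketch of the chi-square test for independence on a two-way table of counts: compute the count expected in each cell if the two variables were independent (row total times column total, divided by the grand total), then sum (observed - expected)^2 / expected over all cells. The table in the example is hypothetical (program participation versus later employment), invented only to show the arithmetic.

```python
def chi_square_independence(table: list[list[float]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows = participated / did not; columns = employed / not employed.
table = [[60, 40], [45, 55]]
stat, df = chi_square_independence(table)
print(f"chi-square = {stat:.2f}, df = {df}")  # chi-square = 4.51, df = 1
```

With 1 degree of freedom the 5% critical value is about 3.84, so with these made-up counts independence would be rejected at the 5% level. A significant result shows association, not that the program caused the difference.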

 

December 1 - Read sections 14.1 and 14.2. Introduction to simple linear regression analysis. What is the difference between regression and correlation? What assumptions underlie using regression for descriptive purposes? Why might you actually choose to do a regression analysis, and when is it inappropriate? Introduction to the "method of least squares."
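To make the "method of least squares" concrete, here is a small Python sketch that computes the least-squares intercept and slope, along with the correlation coefficient r, for a handful of hypothetical (x, y) pairs. It illustrates one difference between regression and correlation: the slope depends on which variable is treated as the outcome, while r is symmetric in x and y.

```python
from math import sqrt

def least_squares(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Return (intercept, slope, r) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx                  # minimizes the sum of squared vertical residuals
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    r = sxy / sqrt(sxx * syy)          # symmetric in x and y, unlike the slope
    return intercept, slope, r

hours = [1, 2, 3, 4, 5]  # hypothetical hours of tutoring
gain = [2, 4, 5, 4, 5]   # hypothetical test-score gains
b0, b1, r = least_squares(hours, gain)
print(f"intercept={b0:.2f}, slope={b1:.2f}, r={r:.3f}")  # intercept=2.20, slope=0.60, r=0.775
```

On Python 3.10 or later, statistics.linear_regression and statistics.correlation from the standard library should give the same values and make a useful cross-check.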

 

December 6 - Read sections 14.3, 14.4, and 14.5. Correlation analysis. How can we compute and interpret a coefficient of correlation? When we combine regression and correlation, how do we test the overall line, and individual coefficients, for statistical significance? What are the pitfalls and risks of generalizing from a regression line to a larger population? Introduction to multiple regression. Also, I will give out two old FINAL EXAMS for everyone to look over.

 

December 8 - Practice with multiple regression. Interpreting the results from multiple regressions. We will do a practice problem in detail, together.

 

December 13 - Continuation of multiple regression. Inferential methods. What are partial regression coefficients? What is multicollinearity? How can we tell when multicollinearity threatens a good policy inference? We will discuss tests of significance for partial and full correlation coefficients, so we can judge when different multiple regression models are most helpful for concrete policy problems. At the end of class, I will summarize the highlights of recent material to help everyone prepare for the final exam.

 

December 15 - THIRD AND FINAL IN-CLASS EXAM. CLOSED BOOK, BUT FORMULA SHEETS WILL BE DISTRIBUTED WITH THE EXAM. YOUR FINAL EXAMS AND FINAL COURSE GRADES CAN BE PICKED UP AFTER THE CHRISTMAS BREAK, ON JANUARY 25TH, AT MY KENNEDY SCHOOL OFFICE, ROOM 318 IN THE LITTAUER BUILDING.

 

GRADES: Each time I teach a statistics course some students want to know, sometimes to the third decimal place, how their final grades will be determined. A good approximation is that the first and second in-class exams are each worth 30 percent, and the final in-class exam is worth 40 percent. Homework problem sets will be used as "tie-breakers," in the sense that a student who is borderline between two grades will get the higher grade if he or she has systematically done well on the several problem sets, and otherwise will not get the higher grade.