Lars Vilhuber
There are many:
Big data is:
Representative big data is:
“Respondent load should always be considered when planning a statistical collection and there should be policies and practices in place to manage relationships with respondents. The aim should always be to keep reporting load to the minimum and to maintain the high quality of collections.”
Data collection in administrative data
What about using organic data?
Using (changes in) Facebook home location
Using tweeted information on “new house”
Travel tweets
… they all refer back to official statistics!
Allow researchers to explore the entire multi-dimensional distribution, including its extremes, for instance for rare events or measures of inequality or program impact.
Computing gap
Challenges in
Administrative data and organic data increase the challenge
“… the current Census Bureau survey and census methods are unsustainable. Changes must occur in the acquisition of data and construction of statistical information for the Census Bureau to succeed.”
Robert Groves, Director, Census Bureau, September 8, 2011
“Modern computational tools play the same role now that survey design and implementation did in the 1960s.”
John Abowd and Steve Fienberg, CNSTAT, May 8, 2015
“We would like to suggest […] implementing a variety of new models for facilitating the movements of researchers between academia and the Federal Statistical System”
Report of the NSF reverse site visit of the NCRN, April 2015