What is Microdata?

Microdata are the units of data from which aggregate statistics are compiled. They are data about individual people, households, or organisations, and consist of sets of records containing information on individual respondents or other entities, as opposed to the aggregated statistics appearing in a published report. Microdata represent observed or derived values of certain variables for certain objects. Microdata may also describe other characteristics of the Pacific Islands, such as geographical data. National microdata are usually available from censuses, surveys and administrative data. These data are most commonly collected by the national government or Pacific Island National Statistics Offices (NSOs), and access is provided by the NSO or the national archive. The data are collected at an individual, household, or institution level as appropriate (Desai and Cowell, 2006).
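
The difference between microdata and aggregate statistics can be sketched in a few lines of Python. This is only an illustration: the records, variable names and age bands below are hypothetical and are not taken from any dataset in the library.

```python
from collections import Counter

# Hypothetical microdata: one record per individual respondent.
records = [
    {"person_id": 1, "household_id": "H1", "age": 34},
    {"person_id": 2, "household_id": "H1", "age": 8},
    {"person_id": 3, "household_id": "H2", "age": 61},
    {"person_id": 4, "household_id": "H2", "age": 29},
]


def age_group(age):
    """Map an exact age to a broad age band (bands are an assumption)."""
    if age < 15:
        return "0-14"
    if age < 60:
        return "15-59"
    return "60+"


def aggregate_by_age_group(rows):
    """Compile an aggregate table (counts per age band) from microdata."""
    return dict(Counter(age_group(row["age"]) for row in rows))


# The aggregate table is what a published report would typically show.
summary = aggregate_by_age_group(records)
print(summary)
```

The individual records are the microdata; the counts per age band are the kind of aggregate statistic compiled from them.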

What is the Pacific Data Hub Microdata Library?

The Pacific Data Hub – Microdata Library is a central repository for Pacific Island statistical microdata, reports and documents. It is an online cataloguing and dissemination system for survey and census metadata and microdata. The service was established to facilitate access to microdata that provide information about people living in Pacific Island developing countries, their institutions, their environment, their communities and the operation of their economies. SPC provides safe access to microdata via its Pacific Data Hub – Microdata Library (microdata.pacificdata.org) to enable research and analysis that benefits Pacific Island people. Because microdata carry a risk that individual people, households or organisations could be recognised or identified, they must be managed carefully to protect against this risk.

Data Acquisition

Data acquisition describes the ways to collect and collate microdata and their associated metadata. Microdata and metadata are generated from various data collection activities such as household surveys, population censuses, and administrative recording systems. Many organisations in the Pacific not only capture their own microdata as part of their work, but also acquire microdata from other sources. Microdata can be generated by many official and non-official producers, for example Pacific Island National Statistics Offices (NSOs), line ministries, researchers, and the private sector. To better understand the steps involved in data acquisition for Pacific Island microdata please visit the following resources:

What is Data Curation and Preservation?

Data curation and preservation is the art of maintaining the value of data. A data curator does this by collecting data from many different sources and then aggregating and integrating them into an information source that is many times more valuable than its independent parts. During this process, data might be annotated, tagged, presented, and published for various purposes. The goal is to keep the data valuable so they can be reused in as many applications as possible. Through the curation process, data are organised, described, cleaned, enhanced, and preserved for public use, much like the work done on paintings or rare books to make the works accessible to the public now and in the future. With modern technology, it is increasingly easy to post and share data. Without curation, however, data can be difficult to find, use, and interpret. For more information about the process and principles of data curation and preservation visit the following links:

Cataloging

Cataloging is the process of creating metadata that represent information resources, such as datasets, and publishing those metadata in a searchable online catalog to make the data discoverable. A catalog provides information such as creator names, titles, and subject terms that describe resources, typically through the creation of bibliographic records. The records serve as proxies for the stored information resources. To "catalog" a dataset or information about a data collection involves several interrelated processes. To find a particular dataset, interested users must be properly informed about the existence and characteristics of the datasets available, yet many potential users have very little, if any, information about them. Good metadata must therefore be made available, preferably in the form of a searchable online catalog. To learn more about this visit the following links:
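
As a rough illustration of how catalog records act as searchable proxies for datasets, the sketch below runs a keyword search over a tiny in-memory catalog. The entries, field names and the `search_catalog` helper are hypothetical; this is not how the Pacific Data Hub – Microdata Library is implemented.

```python
# Hypothetical catalog records: metadata only, not the datasets themselves.
catalog = [
    {
        "title": "Example Household Income and Expenditure Survey 2019",
        "keywords": ["income", "expenditure", "household"],
        "abstract": "Sample survey of household spending and earnings.",
    },
    {
        "title": "Example Population and Housing Census 2020",
        "keywords": ["population", "housing", "census"],
        "abstract": "Full enumeration of persons and dwellings.",
    },
]


def search_catalog(entries, term):
    """Return titles of entries whose title, keywords or abstract mention term."""
    term = term.lower()
    matches = []
    for entry in entries:
        searchable = [entry["title"], entry["abstract"], *entry["keywords"]]
        if any(term in text.lower() for text in searchable):
            matches.append(entry["title"])
    return matches


print(search_catalog(catalog, "census"))
```

A user who knows nothing about the holdings can still find a dataset, provided its metadata record carries descriptive titles, keywords and abstracts.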

Data Discovery and re-use (citations)

Data citation refers to the practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to other scholarly resources. Citations are references that can be included at the study level and that point to published works, such as a journal article, working paper, or news article, that have used the data from a particular study. A citation gives credit to the data source and distributor and identifies data sources for validation. Citations are also a good way of showing the funders of surveys that the data are being used for policy and research purposes. Citations help researchers to manage and share data, and enabling data citation and linking data with publications increases the visibility and accessibility of both the data and the research itself. One of the features of the Pacific Data Hub – Microdata Library is a bibliography of publications that have cited the use of a dataset listed in the catalog. To learn more about the process of citing a dataset from the Pacific Data Hub – Microdata Library visit the following resources:

Metadata Documentation and data management

Data documentation is important because it helps researchers to find the information they need to fully exploit the analytic potential of the data. Names, abstracts, keywords and other important metadata elements make it easier for a researcher to locate specific datasets and variables. Documentation also helps the researcher to understand what the data are measuring: it explains how the data were created, what they mean, what their content and structure are, and any data manipulations that may have taken place. Documenting data should be considered best practice when creating, organising and managing data and is important for data preservation. Whenever data are used, sufficient contextual information is required to make sense of them.

Evaluation of Microdata and Metadata quality

Evaluating a micro-dataset is a crucial process following its acquisition. The data structure, completeness and correctness are checked: for example, that the structure, size and type, completeness, and correctness of the dataset agree with the description of the dataset content. The completeness of the metadata accompanying the microdata file is also checked. The following resources provide more detail on the procedures involved in evaluating microdata and metadata quality.
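
The kind of check described above can be sketched as follows, assuming the dataset description states the expected number of records and the name and type of each variable. The description format, the variable names and the `evaluate` function are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical description of the dataset content (e.g. taken from its metadata).
description = {
    "n_records": 3,
    "variables": {"hh_id": str, "age": int, "sex": str},
}

# Hypothetical microdata file, already loaded as a list of records.
data = [
    {"hh_id": "H1", "age": 34, "sex": "F"},
    {"hh_id": "H1", "age": None, "sex": "M"},  # incomplete: age is empty
    {"hh_id": "H2", "age": 61},                # incomplete: sex is absent
]


def evaluate(rows, desc):
    """Compare a dataset with its description and list any disagreements."""
    issues = []
    # Size: the number of records should agree with the description.
    if len(rows) != desc["n_records"]:
        issues.append(f"expected {desc['n_records']} records, found {len(rows)}")
    for i, row in enumerate(rows):
        for name, expected_type in desc["variables"].items():
            if name not in row:
                issues.append(f"record {i}: missing variable '{name}'")
            elif row[name] is None:
                issues.append(f"record {i}: empty value for '{name}'")
            elif not isinstance(row[name], expected_type):
                issues.append(f"record {i}: '{name}' is not {expected_type.__name__}")
    return issues


issues = evaluate(data, description)
for issue in issues:
    print(issue)
```

An empty list of issues would indicate that the file agrees with its description on these points; it would not by itself establish that the values are correct.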

Introduction to Statistical Disclosure Control (anonymisation) 

When disseminating microdata files, the data producer must safeguard the confidentiality of information about individual respondents. Processes aimed at protecting confidentiality are referred to as Statistical Disclosure Control (SDC). SDC techniques include the removal of direct identifiers (names, phone numbers, addresses, etc.) and indirect identifiers (detailed geographic information, names of organisations to which the respondent belongs, exact occupations, exact dates of events such as birth, death and marriage) from the data files. Statistical disclosure control, or anonymisation, methods have been developed so that statistical offices can anonymise and release microdata in a controlled way that protects the privacy and statistical confidentiality of individuals and other entities, leaving a low risk of individuals and households being identified within the data. Such methods make it possible to disseminate microdata to researchers in universities or in government, thus more fully exploiting its potential value for social research and policy analysis. The following resources provide more detail on the procedures involved in anonymisation.
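
A minimal sketch of these ideas is shown below: direct identifiers are dropped, two indirect identifiers are removed or coarsened, and the remaining combinations are counted to see how small the smallest group is. The records, the variable names and the group-size check are assumptions made for illustration; they do not describe the procedure used by SPC or any NSO, and real SDC work relies on dedicated tools and expert review.

```python
from collections import Counter

# Hypothetical raw records containing direct and indirect identifiers.
raw = [
    {"name": "Respondent A", "phone": "000-0001", "birth_date": "1988-03-14",
     "village": "Village 1", "island": "Island A", "income": 1200},
    {"name": "Respondent B", "phone": "000-0002", "birth_date": "1988-11-02",
     "village": "Village 2", "island": "Island A", "income": 900},
    {"name": "Respondent C", "phone": "000-0003", "birth_date": "1975-06-30",
     "village": "Village 3", "island": "Island B", "income": 1500},
]

DIRECT_IDENTIFIERS = {"name", "phone"}  # removed outright
DETAILED_GEOGRAPHY = {"village"}        # removed, broader "island" is kept


def anonymise(row):
    """Drop direct identifiers and coarsen indirect ones (illustrative only)."""
    dropped = DIRECT_IDENTIFIERS | DETAILED_GEOGRAPHY
    out = {k: v for k, v in row.items() if k not in dropped}
    # Replace the exact date of birth with the year only.
    out["birth_year"] = out.pop("birth_date")[:4]
    return out


def smallest_group(rows, keys):
    """Size of the rarest combination of the given indirect identifiers."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return min(counts.values())


released = [anonymise(row) for row in raw]
risk = smallest_group(released, ("birth_year", "island"))
print(released)
print("smallest group size:", risk)
```

In this toy example one combination of birth year and island still occurs only once, so that record would remain relatively easy to single out and further coarsening or suppression would be needed before release.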

Dissemination of Microdata Files

Data producers are faced with an ever-expanding demand for relevant and accurate information. Providing researchers with access to microdata has many potential benefits: it broadens the use of existing data and increases the return on data collection investments. For more detailed information on microdata dissemination see the following:

Protocols/procedures for licensed data access requests to the Pacific Data Hub – Microdata Library

Applicants must follow certain steps when applying for data held by the Pacific Data Hub – Microdata Library, including meeting the suggested eligibility criteria and selection criteria. License application requests will not be considered unless researchers satisfy the following eligibility criteria.

Suggested eligibility criteria

For a license application to be considered, the researcher(s) must satisfy the following criteria:

  • Researcher(s) must be affiliated with a credible research and/or teaching institution, such as an accredited university, a recognised research organisation or an NGO.
  • Researcher(s) must either have a proven track record of analysing large datasets or, in the case of students, have a supervisor or thesis adviser who can oversee their work.
  • Researcher(s) must accept and adhere to the SPC terms of use and any other specific conditions outlined in a signed Data License Agreement (between SPC and the data owner) and stipulated in the Data Access Agreement (DAA) between the researcher and SPC, or under a third-party license between the data owner and the researcher.
  • Researcher(s) must agree to provide the final output of their work as a report, paper or otherwise. All data sources must be cited in the produced work.
  • Researcher(s) must satisfy eligibility requirements as outlined above

Suggested selection criteria

The proposal must be of high quality and must fully develop the following points:

  • A good justification for, and the significance of, the chosen topic is provided.
  • The proposal outlines how the data will be used and, specifically, how the research will be compiled.
  • The research does not replicate analysis already conducted or ongoing.
  • Research questions are clearly stated and answerable based on the analysis to be conducted (this requires specifics; if more information is requested, it must be provided).
  • Methods are clearly explained, including how the analysis will be done and why it is appropriate for answering the study questions (if there is a disconnect between the study subject and the dataset, more clarity is needed on how the research can be achieved).
  • The approach or plan for disseminating the results is described.
  • The proposal explains how the analysis work will be shared between team members (the role of each research team member is clearly defined).