Canadian Federal Government's revenue sources ...

The Provincial and Territorial Registrars across Canada collect vital statistics data on live births, fetal deaths and deaths occurring in Canada, as well as some deaths of Canadian residents occurring in the United States. Surname and alternate surname fields are assigned a phonetic New York State Identification and Intelligence System (NYSIIS) code. The live birth and stillbirth data are stored in the Canadian Birth Data Base (CBDB), and mortality data are stored in the Canadian Mortality Data Base (CMDB). Income tax summary files are available from 1984 onward to help evaluate death searches and confirm whether an individual is alive. The Canadian Cancer Data Base (CCDB) is a historic file held at Statistics Canada. It contains cancer incidence data from 1969 onward reported by all Canadian provincial and territorial cancer registries. The CCDB was created for historical cancer incidence record linkage and epidemiological studies.

The mainframe software currently available comprises the Generalized Record Linkage System (GRLS) V1 and Match360 (both developed at Statistics Canada), and SAS programs. GRLS V4 has recently been developed. Some of the features of the system are:
• Runs using UNIX and ORACLE
• Based on Fellegi-Sunter linkage methodology
• Has graphical interface
• Allows multiple concurrent users
• Allows user-defined rules which are programmed in C
• Linked records can be grouped into “weak” and “strong” groups
• Allows refinements of weights and thresholds
• Bilingual
• On-line help is available
• Allows for sampling, and
• Has NYSIIS and Soundex rules built-in.
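
To make the Fellegi-Sunter and Soundex features listed above more concrete, here is a minimal Python sketch. It is not the GRLS implementation. The field names, the m and u probabilities, and the two thresholds are all illustrative assumptions, and the three outcome labels are a generic textbook classification, not the "weak" and "strong" groups of GRLS. Each field contributes log2(m/u) when the two records agree and log2((1-m)/(1-u)) when they disagree, and the summed weight is compared with an upper and a lower threshold.

```python
import math

# Standard American Soundex digit groups.
_SOUNDEX_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character American Soundex code of a surname."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        # H and W do not separate letters that share a code; vowels do.
        if ch not in "HW":
            prev = digit
    return (code + "000")[:4]


# Hypothetical (m, u) pairs per field:
#   m = P(fields agree | records refer to the same person)
#   u = P(fields agree | records refer to different people)
FIELDS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "sex": (0.98, 0.50),
}


def fields_agree(field, a, b):
    """Surnames agree when their Soundex codes match; other fields must be equal."""
    if field == "surname":
        return soundex(a) == soundex(b)
    return a == b


def total_weight(rec_a, rec_b):
    """Sum the Fellegi-Sunter agreement and disagreement weights over all fields."""
    weight = 0.0
    for field, (m, u) in FIELDS.items():
        if fields_agree(field, rec_a[field], rec_b[field]):
            weight += math.log2(m / u)
        else:
            weight += math.log2((1 - m) / (1 - u))
    return weight


def classify(weight, lower=0.0, upper=8.0):
    """Compare a total weight with the two (assumed) thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


if __name__ == "__main__":
    a = {"surname": "Robert", "birth_year": 1950, "sex": "M"}
    b = {"surname": "Rupert", "birth_year": 1950, "sex": "M"}
    print(classify(total_weight(a, b)))
```

Raising the upper threshold or lowering the lower one widens the "possible link" band that would go to manual review, which is the kind of adjustment the weight and threshold refinement feature refers to.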

In modern statistics the model is king

There are other ways to distinguish between statistical models, but I will not go into this here. The point to which I want to draw attention here is simply that in modern statistics the model is king. The computation, the model selection criteria, and so on are all secondary, mere details in the task of [...]

THE NATURE OF STATISTICS


It is pointless attempting a definition of a discipline as broad as statistics. All that such an attempt would achieve would be to attract disagreement. Instead I want to focus on a few of the properties of the discipline which stand in contrast to those of data mining. One such difference is related to the [...]
