The Swiss Cheese Algorithm

Author:: Serge Monkewitz

Contents

Definitions and Terms

D1 : Apparition

Let $A$ be the complete set of observed apparitions $A = \{ a_i |\: i \in \aleph^+,\: 1 \leq i \leq n \}$ . The total number of apparitions $|A|$ is equal to $n$ .

D2 : Position

Each apparition $a_i$ has a position on the sky $p_i$ , specified as a unit vector in $\Re^3$ (that is, $p_i \cdot p_i = 1$ ).

D3 : Distance

Define the distance between two positions $v_1, v_2$ to be

$d(v_1, v_2) = \arccos{(v_1 \cdot v_2)}$

D4 : Match set

Given $a_i \in A$ , define the radius-R match set for $a_i$ as follows :

$M_R(i) = \{ a_j |\: a_j \in A,\: j \neq i,\: d(p_i, p_j) \leq R \}$

D5 : Apparition (group) centroid

Each apparition $a_i \in A$ has a centroid $v_i$ which is defined as follows :

$v = p_i + \sum p_j \:,\; a_j \in M_\phi(i)$

$v_i = \frac{v}{\Vert v \Vert}$

where $\phi$ will be referred to as the density radius.

D6 : Apparition density

Each apparition $a_i \in A$ has an integer density $\rho_i$ :

$\rho_i = 2^{42}(|M_{\phi}(i)| + 1) + 2^{21}(|M_{0.66\phi}(i)| + 1) + |M_{0.33\phi}(i)| + 1$

D7 : Group

A group is a set of nearby apparitions likely to be obvservations of the same astronomical source. Define the group $G_s$ to be the set of apparitions within the group radius $\theta (\geq \phi)$ of the apparition centroid $v_s$ :

$G_s = \{ a_j |\: a_j \in A,\: d(v_s,p_j) \leq \theta \}$

Note that the group radius $\theta$ should be set so that if $d(p_i,p_j) > \theta$ for two apparitions $a_i$ and $a_j$ , then they are very unlikely to be observations of the same astronomical source.

D8 : Confused apparition

A confused apparition is an apparition that belongs to more than one group.

D9 : Confused group

A confused group is a group that contains at least one confused apparition.

D10 : Group count

Each apparition $a_i \in A$ has a group count $c_i$ which is equal to the number of groups $a_i$ has been assigned to. Note that:

$c_i > 1 \Leftrightarrow a_i$ is confused
$c_i = 1 \Leftrightarrow a_i$ is unconfused
$c_i = 0 \Leftrightarrow a_i$ does not belong to any groups

The swiss cheese algorithm guarantees $c_i \geq 1 \; \forall a_i \in A$ .

D11 : Group type

Each group $G_s$ has a group type $q_i$ which takes the following values :

: signifies that is unconfused, and was generated by a swiss cheese pass.
: signifies that is unconfused, and was generated by a cheese crumbling pass.
: signifies that is confused, and was generated by a swiss cheese pass.
: signifies that is confused, and was generated by a cheese crumbling pass.

The terms swiss cheese pass and cheese crumbling pass refer to specific steps in the group generation algorithm, and are described in detail below.

D12 : Seed set

The seed set is the set of apparitions which are to be considered as the starting points for generating groups.

D13 : Seed flag

Each apparition $a_i \in A$ has a seed flag $s_i$ which takes the following values :

: signifies that the apparition has been removed from the set of seeds by a cheese crumbling pass.
: signifies that the apparition is not in the set of seeds.
: signifies that the apparition is to be considered as the starting point for a group by swiss cheese passes.
: signifies that the apparition is to be considered as the starting point for a group by cheese crumbling passes.

D14 : Group counter

Each generated group has an integer group counter that uniquely identifies it within the set of all generated groups.

D15 : Apparition counter

Each apparition $a_i \in A$ has an integer apparition counter $k_i \in \aleph$ which uniquely identifies $a_i$ in $A$ .

D16 : Apparition comparison

The operators $=$ , $<$ , $>$ for apparitions are defined as follows :

$a_i = a_j \Leftrightarrow k_i = k_j$
$a_i > a_j \Leftrightarrow a_j < a_i$
$a_i < a_j \Leftrightarrow$ either $\rho_i < \rho_j$ or $\rho_i = \rho_j, \: k_i > k_j$

D17 : Apparition scan key

Each apparition optionally has a scan key which identifies the scan it was extracted from.

D18 : Scan count

Each group has a scan count, equal to the number of scans containing at least one apparition belonging to the group.

D19 : Apparition count

Each group has an apparition count, equal to the number of apparitions belonging to the group.

Algorithm Input and Output

The WAX application currently retrieves apparitions from a single database table, and stores results in up to 3 database tables. The DBMS hosting the input and output tables is required to support ESQL/C (WAX is currently capable of communicating only with the Informix RDBMS).

Input Apparition Table

WAX assumes that the apparition table contains columns corresponding to a 32-bit integer unique identifier (see D16), as well as J2000 right ascension and declination (handled internally as double precision floating point values). If coverage computation is desired, WAX can optionally retrieve scan keys which correspond to 32-bit integer unique identifiers for the scans apparitions were extracted from. If left unspecified, these columns are assumed to be named cntr, ra, dec, and scan_key. Additional retrieval columns can be specified at compile time - for more information concerning this feature, please see the Writing WAX Plug-Ins documentation page.

Group Information Table

This output table contains group attributes. The following columns will be present for each generated group :

gcntr : the group counter, a unique identifier for the group
napp : the group apparition count
gtype : the group type
sdet(optional) : the group scan count

The table will also contain all columns computed by the WAX Plug-In in use.

Group-apparition Link Table

This output table contains links between groups and apparitions. The following columns will be present for each distinct group-apparition pair :

gcntr : the group counter, a unique identifier for the group
cntr : the apparition counter, a unique identifier for the apparition
ngrp : the group count, the number of groups the apparition belongs to

If so specified (at compile time), the table will also contain all columns retrieved from the input table. This makes apparition information directly accessible from the link table (otherwise, a SQL JOIN with the original input table would be necessary to retrieve such information).

Singleton Table

Normally, single apparition groups (singletons) are stored in the group information table. However, if the end-user of the WAX application so desires, WAX can store singletons in a seperate table.

This output table will normally contain a single column named gcntr, a unique identifier for the single apparition group (see D14). This is the only column necessary since all singletons have an apparition count and scan count equal to 1, and a group type of 0. Note that when a singleton table is in use, singletons are not recorded in the link table since singletons always have group counters equal to their apparition counters.

Finally, there is a compile time option to include all columns retrieved from the input table in the singleton table (just as for the group-apparition link table). For more details, see the Usage Examples documentation section of the srcgen (Source Code Generator) helper application.

Grouped Apparition Table

This optional table contains information pertaining to grouped apparitions, calculated during the output pass of the swiss cheese algorithm, and allows (via the plug-in interface) a "best" group to be picked for a given apparition.

This output table will normally contain a single column named cntr, a unique identifier for the grouped apparition (see D15).

Basic Algorithm

The swiss cheese algorithm is conceptually simple, and consists of the following steps :

Set $s_i = 1, \: c_i = 0 \;\; \forall a_i \in A$
Compute $v_i, \: \rho_i \;\; \forall a_i \in A$
Using the comparison operators defined for apparitions in D16, renumber (sort) the apparitions $a_i \in A$ such that given any two distinct $i, j \in \aleph^+$ where $1 \leq i,j \leq n$ , then the condition holds if and only if .
Now, generate groups as follows :
1. Find the apparition with such that $a_s > a_i \;\; \forall a_i \in A$ where $i \neq s$ and .
2. Find and $\forall a_i \in G_s$ , increment and set .
3. If $\exists a_i \in A$ with , continue at step 1. Otherwise, the algorithm has completed.

In practice, the algorithm as described would be very hard to implement since the size of $A$ can be much greater than the physical memory available to a computer. It is therefore necessary to limit processing to small subsets of $A$ . Because the computation of groups and apparition centroids involves finding apparitions in the immediate vicinity of positions on the sky, it is natural to split $A$ into subsets corresponding to apparitions from distinct regions of the sky. A simple way of doing this is to partition the sky into a sequence of annuli (or bands) which each cover a range of declinations.

The Parallel Swiss Cheese Algorithm

Sky Subdivision: Dec Bands

Consider the set of subsets of $A$

, $B = \{ B_i | \: i \in \aleph, 0 \leq i < m \}$ where $B_i$

is a dec band containing all apparitions with positions falling into the declination range $[d_i,\; d_{i+1})$ . The bands are numbered such that $d_{i+1} > d_i$ . Furthermore, $d_0 = -90.0$

degrees and $d_m = 90.0 + \epsilon$ degrees (for a small positive constant $\epsilon$ ). Clearly, the union of the bands is equal to $A$

, and band $B_i$

is adjacent to bands $B_{i-1}$ and $B_{i+1}$ .

A particular band $B_i$ is fully processed when every group stemming from a centroid with declination falling into the range $[d_i - \theta,\; d_{i+1} + \theta)$ has been generated (and those with centroids in $[d_i,\; d_{i+1})$ have been stored). For each apparition $a$ belonging to such a group, all groups containing $a$ must have been generated so that the group count for $a$ is valid prior to storage.

Consider the group $G_s$ generated around the centroid $v_s$ . The seed apparition for the group, $a_s$ , is within distance $\phi$ of $v_s$ . Therefore, if every apparition in the declination range $[d_i - \theta - \phi,\; d_{i+1} + \theta + \phi)$ has been removed from the set of seeds, then all groups with centroids inside $[d_i - \theta,\; d_{i+1} + \theta)$ will have been generated.

Now consider an apparition $a_k \in A$ . All groups containing $a_k$ will have been generated when $s_k \leq 0$ and there exists no apparition $a_u \in A$ with $s_u > 0$ and $d(v_u,p_k) \leq \theta$ . Note that for such an apparition to exist, $d(p_u,p_k) \leq \theta+\phi$ .

Therefore, if $s_k \leq 0 \;\;\; \forall a_k \in A$ with $v_k$ in the declination range $[d_i - 3\theta,\; d_{i+1} + 3\theta)$ , then $B_i$ has been fully processed. Groups generated from apparitions in that declination range will include apparitions in the range $[d_i - 4\theta - \phi,\; d_{i+1} + 4\theta + \phi)$ , and, in order to compute densities and centroids for them, the declination range $[d_i - 4\theta - 2\phi,\; d_{i+1} + 4\theta + 2\phi)$ must be considered. For simplicity we consider only the maximum density radius, $\phi = \theta$ , arriving at the conclusion that in order to fully process the band $B_i$ , it is necessary to consider apparitions inside the declination range $[d_i - 6\theta,\; d_{i+1} + 6\theta)$ . This condition, looser than strictly necessary, is not sufficient. To see why, reconsider the Basic Algorithm. It is clearly possible for an apparition $a_j \in A$ not in the declination range $[d_i - 6\theta,\; d_{i+1} + 6\theta)$ to be denser than all apparitions in $[d_i - 6\theta,\; d_{i+1} + 6\theta)$ . Furthermore, generating $G_j$ could potentially remove apparitions inside the dec band from the set of seeds. Without considering $a_j$ , it would therefore be impossible to finish processing $B_i$ . In the general case, $A$ must be considered in its entirety to fully process $B_i$ .

If this situation is encountered, the parallel swiss cheese algorithm changes the optimal order (as defined by the Basic Algorithm) in which apparitions are considered for grouping.

Primary and Secondary Bands

Primary Band: A primary band is a band where is even.
Secondary Band: A secondary band is a band where is odd.

For band $B_i$ , the apparitions in the declination range $[d_i - m\theta,\; d_{i+1} + m\theta)$ where $m \in \aleph,\; m \geq 6$ are considered (currently, $m$ is set to $10$ by default). If the condition $d_{i+1} - d_i > 2m\theta$ is enforced, then the regions on the sky being considered by any two non-adjacent bands are disjoint. In particular, no two primary or secondary bands will process regions that overlap. Therefore all primary bands can be processed independently and in parallel. Secondary bands can also be processed independently of eachother and in parallel, but do depend on the results of adjacent primary bands.

By forcing primary bands to save any information which effects adjacent secondary bands, it is possible to guarantee full processing of secondary bands with a significantly tighter declination range.

A primary band $B_i$ will store a list of groups generated around centroids with declination greater than or equal to $d_{i+1} - \theta$ (the top group list for $B_i$ ), and a list of groups generated around centroids with declination less than $d_i + \theta$ (the bottom group list for $B_i$ ). The groups in these lists will be stored according to the order in which they were generated.

Furthermore, group counts for apparitions in the declination range $[d_{i+1},\; d_{i+1} + 2\theta)$ (the top apparition set for $B_i$ ) as well as group counts for apparitions in $[d_i - 2\theta,\; d_i)$ (the bottom apparition set for $B_i$ ) will be stored.

Before creating any new groups, a secondary band $B_{i+1}$ will read the top group list and apparition set generated by $B_i$ as well as the bottom group list and apparition set generated by $B_{i+2}$ and generate the indicated groups in the given order. Essentially, the secondary band will blindly copy the work of its neighbours. This behaviour gives each primary band the latitude to deviate from the group generation order imposed by the Basic Algorithm.

What follows is a detailed description of the various steps of the swiss cheese algorithm.

Step 1: Apparition Retrieval

Consider the band to be processed, $B_x$

. If

is a primary band, all apparitions in the declination range $[d_x - m\theta,\; d_{x+1} + m\theta)$ are retrieved. Otherwise all apparitions in $[d_x - 3\theta,\; d_{x+1} + 3\theta)$ are retrieved.

Define the removal flag $r_i$ for apparition $a_i$ as follows :

$r_i=0 \Leftrightarrow a_i$ doesn't have to be removed from the set of seeds to finish processing .
$r_i=1 \Leftrightarrow a_i$ must be removed from the set of seeds to finish processing .

Finally, define the total seed count $T$ to equal the sum of the removal flags of all retrieved apparitions.

Now for each retrieved apparition, set $r_i = 0,\; c_i = 0,\; s_i = 0,\; \rho_i = 2^{42}+2^{21}+1,\; v_i = p_i$ . Also set $T = 0$ .

Step 2: Density Pass

is a primary band the following steps are performed for each apparition $a_i$

in the declination range $[d_x - (m-1)\theta,\; d_{x+1} + (m-1)\theta)$ :

find $M_\phi(i)$
compute $\rho_i$ from $M_\phi(i)$
compute from $M_\phi(i)$
if is in $[d_x - (m-3)\theta,\; d_{x+1} + (m-3)\theta)$ , set , otherwise set .
if is in $[d_x - 3\theta,\; d_{x+1} + 3\theta)$ , set and increment .

Otherwise, for each apparition $a_i$ in $[d_x - 2\theta,\; d_{x+1} + 2\theta)$ :

find $M_\phi(i)$
compute $\rho_i$ from $M_\phi(i)$
compute from $M_\phi(i)$
if is in $[d_x,\; d_{x+1})$ , set , otherwise set .
if is in $[d_x,\; d_{x+1})$ , set and increment .

Step 3: Stitching Pass (Secondary Bands Only)

This pass simply reads the top apparition set and group list generated by band $B_{x-1}$ as well as the bottom apparition set and group list generated by band $B_{x+1}$ . Each apparition listed in the apparition sets has its group count set to the saved value, after which the saved groups are generated in the indicated order.

Step 4: Swiss Cheese Pass

Step 4.1

Examine the next as yet unconsidered apparition $a_s$

with

, find the match set $M_{\theta + \phi}(s)$ and continue to step 4.2. Note that the order in which apparitions are considered is unimportant (this means that apparitions can be considered in spatial order, allowing match sets to be found efficiently).

If there is no such $a_s$ , then the pass has completed. At this point, if the pass did not generate any new groups and $T > 0$ , the algorithm has reached an impasse and continues with a Cheese Crumbling Pass.

If $T = 0$ , then $B_x$ has been completely processed and the algorithm continues with the Output Pass.

Otherwise, another swiss cheese pass is performed.

Step 4.2

At this point the region around as must be examined, keeping in mind that groups should be generated based on a sequence of apparitions ordered by decreasing density.

Now, a group must be generated from $a_s$ if there is no other apparition $a_k$ with $s_k > 0$ (a seed) that is "denser" than $a_s$ ( $a_k > a_s$ ) which might generate a group containing $a_s$ .

Any group containing $a_s$ must have a centroid within distance $\theta$ of $a_s$ , that is to say, must be generated from an apparition within distance $\theta + \phi$ of $a_s$ .

So, if $d(v_k,p_s) > \theta \;\; \forall a_k \in M_{\theta + \phi}(s)$ with $s_k > 0$ and $a_k > a_s$ , then continue at step 4.3.

Otherwise continue at step 4.1.

Step 4.3

Scan the set of apparitions $M_{\theta + \phi}(s) \bigcup a_s$ to find $G_s$

. Now set

and subtract $r_s$

from

. Next, $\forall a_k \in G_s$ with $k \neq s, s_k > 0, a_k < a_s$ , set $s_k = 0$

and subtract $r_k$

from

. Finally, set the group type of $G_s$

and continue with step 4.4.

Step 4.4

is a primary band :

if has declination greater than or equal to $d_{x+1} - 2\theta$ , append to the top group list.
if has declination less than $d_x + 2\theta$ , append to the bottom group list.
Increment the group count for every apparition in

Otherwise :

Increment the group count for every apparition in that falls inside the declination range $[d_x,\; d_{x+1})$ .

Finally, if $v_s$ is inside $[d_x - \theta,\; d_{x+1} + \theta)$ :

Set the group counter for to
If requested, compute the group scan count by sorting the apparitions in on scan key and then counting the number of unique scan key values
Store a record of the generated group in memory

Continue at step 4.1.

Step 5: Cheese Crumbling Pass (Primary Bands Only)

This pass is identical to the swiss cheese pass, except for the following: only apparitions $a_i$

with

are considered as starting points for group generation. Furthermore, seed apparitions from which groups are generated have their seed flag values set to $-1$

(not

Step 6: Output Pass

is a primary band :

Save the top group list, bottom group list, top apparition set, and bottom apparition set to disk.

For each group $G_i$ stored by step 4.4 :

Scan the apparitions in . If any of them has a group count greater than , add to (marking the group confused). Also find the z-component of the seed apparition's centroid (this is the apparition with counter equal to the group counter for ).
Pass to a pluggable group parameter computation stage which computes group attributes (such as an average position, average magnitudes, etc...).
Lastly, if $z \in [d_x,\; d_{x+1})$ , insert and its associated apparitions and attributes into the various database output tables.

For each apparition $a_j$ in $[d_x,\; d_{x+1})$ , pass the list of groups containing $a_j$ to a pluggable apparition parameter computation stage which computes attributes for $a_j$ . Finally, insert $a_j$ and its associated attributes into the grouped apparition table.

Generated on Thu Oct 21 13:19:39 2004 for WAX Version 2.1 by