squeak-history

Description

Project for Squeak 5.2 (and above) to explore and learn about the history of Squeak and Squeak-related projects.

Details

Source
GitHub
Dialect
squeak (25% confidence)
License
MIT
Stars
6
Forks
2
Created
March 26, 2019
Updated
Aug. 2, 2025

README excerpt

# The Squeak History Project

Project for Squeak 5.3 (and above) to explore and learn about the history of Squeak and Squeak-related projects.

## How to Install

1. [Get a Squeak image](https://squeak.org/download) (5.3 or newer).
2. Load the Squeak History Project:

   ```smalltalk
   Metacello new
   	baseline: 'SqueakHistory';
   	repository: 'github://hpi-swa/squeak-history/packages';
   	load.
   ```

After the code was loaded, mailing list archives will be downloaded. See `BaselineOfSqueakHistory >> #loadData` for details.

## How to Use

The `SqhMailmanAggregator` can be used to enumerate all mail messages. Example queries can be found in the `queries` protocol. The message cache holds meta-data for each message in the image, that is, instances of `SqhMailWrapper`. The message body has to access the archive's file contents on disk, which is slower. Here is an example query, which requires disk access:

```smalltalk
countMessageLines
	| count |
	count := 0.
	self messagesCachedDo: [:wrapper |
		count := count + wrapper mailMessage bodyText lineCount].
	^ count
```

Note that you should also derive some rules for author-key normalization to further improve the overall quality of query results. Just run this:

```smalltalk
SqhMailmanAggregator new
	showProgress: true; "optional"
	deriveRulesForAuthorKeyNormalization. "ignore the warning"
```

For more information on normalization, see below.

## Notes on Normalization

There are rules to normalize different kinds of information: author names, timestamps, and mail addresses. The goal is to identify contributors and, eventually, relevant discussions. Hand-selected rules can be found in `SqhMailmanAggregator >> #rulesForAuthorKeyNormalization` and `#rulesForAuthorKeyClarification`. Here is an excerpt:

```Smalltalk
"rulesForAuthorKeyNormalization"
'alankay' -> 'alancurtiskay'.
'hhirzel' -> 'hanneshirzel'.

"rulesForAuthorKeyClarification"
'squeakdev@reider.net' -> ('squeakdev' -> 'alanreider').
'squeak@bike-nomad.com' -> ('squeak' -> 'nedkonz').
```

Then, there is a simple algorithm to derive more normalization rules using *e-mail addresses* as identifier. If two messages came from the same address, then the author's names can be used to identify the same contributor. There is also a check to avoid mapping cycles. See `SqhMailmanAggregator >> #deriveRulesForAuthorKeyNormalization` for details. We filter generic addresses such as:

```
notifications@github.com
noreply@github.com
no-reply@appveyor.com
builds@travis-ci.org
```

Note that there is also a (hand-crafted) list of generic author names (or keys) in `#genericAuthorKeys` including `github` or `travisci`.
← Back to results