# GitMiner
GitMiner is a Pharo Smalltalk library that helps developers analyze git repositories. With GitMiner, retrieving data about source code, diffs, developer identities, changed files, and commits in a Smalltalk environment is simpler than ever :)
Originally developed by [Stefano Campanella](https://github.com/StefanoStone) and [Carmen Armenti](https://carmenaarmenti.github.io) in the [REVEAL](https://reveal.si.usi.ch) group.
## Installation
> [!NOTE]
> For the project to work without issues, the cloc service must also be running on your machine.
> To set it up, see the [gitminer-cloc](https://github.com/USIREVEAL/gitminer-cloc) repository and follow its installation instructions.
If you only want to mine git repositories, load the most recent stable version, `v1.0.0`:
```smalltalk
Metacello new
    baseline: 'GitMiner';
    repository: 'github://USIREVEAL/gitminer:v1.0.0';
    load.
```
If you want to use the APIs to mine GitHub repositories, load the `github` branch instead:
```smalltalk
Metacello new
    baseline: 'GitMiner';
    repository: 'github://USIREVEAL/gitminer:github';
    load.
```
Once the cloc service is running, make sure the miner points to its URL. To do so, execute the following code before any mining operation:
```smalltalk
GMFlagStore uniqueInstance CLOCEndpoint: 'http://localhost:8080/'
```
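If you want to confirm that something is actually listening at that URL before mining, a quick probe from a Playground can help. The following is only a sketch: it uses Zinc's `ZnClient`, which ships with Pharo, and it assumes that the service answers plain HTTP requests at the endpoint URL. Which paths the cloc service exposes is not covered here, so any HTTP response (even an error status) is treated as "reachable".

```smalltalk
| endpoint reachable |
"Same URL as configured above; adjust it to your setup."
endpoint := 'http://localhost:8080/'.
GMFlagStore uniqueInstance CLOCEndpoint: endpoint.

"Assumption: the service replies to a plain GET at this URL.
 Any HTTP response counts as reachable; only connection-level
 failures (refused, timed out, ...) are reported as unreachable."
reachable := [
	ZnClient new
		url: endpoint;
		timeout: 5;
		get.
	true ]
		on: Error
		do: [ :error | false ].

reachable
	ifFalse: [ self inform: 'cloc service not reachable at ', endpoint ].
```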
## Usage
### Mining Repositories
To use GitMiner, you can start by creating a new `GMRepository` instance with the path to your local git repository:
```smalltalk
repo := GMRepository from: aGitRepoPath.
```
This creates a new repository instance and starts mining. Once the mining is done, you can access the mined data through the repository instance. You can then save the repository to a file for later use:
```smalltalk
repo serializeTo: aPath.
```
To update an already mined repository, pass the path to its serialized data as well:
```smalltalk
repo := GMRepository from: aGitRepoPath basedOn: aPathWithSerializedData.
```
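The calls above can be combined into a single "mine or update" step. The snippet below is a minimal sketch, not part of GitMiner itself: both paths are hypothetical placeholders, and it assumes that the GitMiner methods accept paths given as strings and that `serializeTo:` may be pointed at the file that was just read. Only `asFileReference` and `exists` come from Pharo's standard file API.

```smalltalk
| gitRepoPath serializedPath repo |
"Hypothetical placeholder paths; replace them with your own."
gitRepoPath := '/path/to/local/git/repo'.
serializedPath := '/path/to/serialized-data'.

"If serialized data already exists, update from it;
 otherwise mine the repository from scratch."
repo := serializedPath asFileReference exists
	ifTrue: [ GMRepository from: gitRepoPath basedOn: serializedPath ]
	ifFalse: [ GMRepository from: gitRepoPath ].

"Save the (re)mined data for the next run.
 Assumption: writing to the same path is acceptable."
repo serializeTo: serializedPath.
```

If the methods expect a `FileReference` rather than a string in your version, send `asFileReference` to the paths before passing them in.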
### Loading Repositories
To load a previously serialized repository, you can use the following code:
```smalltalk
repo := GMRepository deserializeFrom: aPathWithSerializedData.
```
### CLI
You can also mine repositories using the command-line interface. For more information, see the `GitMiner-CLI` package.
> [!NOTE]
> The serialization format used by GitMiner is proprietary: although it is based on JSON, it is not compatible with other tools.
## Publications
GitMiner was used to support the following scientific research papers:
- C. Armenti and M. Lanza (2025), _"Telling Software Evolution Stories With Sonification"_, International Conference on Program Comprehension (ICPC), pp. 398–402, IEEE. [doi: 10.1109/ICPC66645.2025.00050](https://doi.org/10.1109/ICPC66645.2025.00050)
- C. Armenti and M. Lanza (2024), _"Using Animations to Understand Commits"_, International Conference on Software Maintenance and Evolution (ICSME), pp. 660–665, IEEE. [doi: 10.1109/ICSME58944.2024.00069](https://doi.org/10.1109/ICSME58944.2024.00069)
- C. Armenti and M. Lanza (2024), _"Using Interactive Animations to Analyze Fine-grained Software Evolution"_, Working Conference on Software Visualization (VISSOFT), pp. 36–47, IEEE. [doi: 10.1109/VISSOFT64034.2024.00014](https://doi.org/10.1109/VISSOFT64034.2024.00014)
- S. Campanella and M. Lanza (2024), _"Hidden in the Code: Visualizing True Developer Identities"_, Working Conference on Software Visualization (VISSOFT), pp. 24–35, IEEE. [doi: 10.1109/VISSOFT64034.2024.00013](https://doi.org/10.1109/VISSOFT64034.2024.00013)