Galaxy Formation and Evolution via Phase-temporal Clustering with FuzzyCat \(\circ\) AstroLink

A pressing and continually evolving sub-field of astrophysics is the study of galaxy formation and evolution. This study seeks to understand how and why a galaxy and its substructure develop over time in the context of the surrounding environment and of the underlying cosmological model. To do this, astrophysicists and cosmologists look to both observational and simulation data. In observations, we may learn from a single snapshot in time of a very large number of galaxies that arise from the ground-truth cosmological model of our Universe. In simulations, by contrast, we may learn from many snapshots of a comparatively small number of galaxies that depend on a pre-specified cosmological model. By comparing these two data types, we can hope to constrain our cosmological models and our understanding of galaxy formation and evolution.

In the context of simulated data, a typical approach is to use a halo finder (+ merger tree) code to find a catalogue of haloes and their merger tree, which are then analysed in terms of their physical properties. However, these codes generally only track self-bound groups that satisfy a minimum overdensity threshold. If this threshold is too high then some haloes may be disregarded, and if it is too low then some haloes can be lost in the unbinding procedure. Furthermore, these codes will not capture unbound groups that have been (or are in the process of being) tidally disrupted, nor will they capture fleeting structure resulting from density waves and hydrodynamical effects. Excluding these kinds of structures from any subsequent analysis means that cosmological models are never constrained against the existence of these structures in simulations, even though they are observed to be present in our Universe.

The goal of this page is to serve as a tutorial and to highlight how the composition of AstroLink and FuzzyCat can be used as a powerful tool for studying galaxy formation and evolution. AstroLink is a general-purpose astrophysical clustering algorithm built for extracting meaningful hierarchical structure from point-cloud data defined over any feature space. Combined with FuzzyCat, it forms a pipeline that is able to find clusters that are robust in both phase space and time, without making any strong assumptions about the kinds of galactic substructures that are (or are not) physically relevant for study within the field.

Data: The NIHAO-UHD suite of cosmological hydrodynamical simulations

In this tutorial we use simulations from the NIHAO-UHD suite (Buck et al. 2020a), part of the Numerical Investigation of a Hundred Astronomical Objects (NIHAO) simulation suite (Wang et al. 2015). These galaxies were chosen as the most Milky Way (MW)-like in terms of mass, size and disk properties. Parts of the simulation suite have previously been used to study the build-up of the MW’s peanut-shaped bulge (Buck et al. 2019a, Buck et al. 2018), investigate the stellar bar properties (Hilmi et al. 2020), infer the MW’s dark halo spin (Obreja et al. 2022), study the dwarf galaxy inventory of MW-mass galaxies (Buck et al. 2019b), and investigate the age-metallicity relation of MW disk stars (Lu et al. 2022a), including the chemical bimodality of disk stars (Buck et al. 2020b), their abundances (Lu et al. 2022b) and the origin of very metal-poor stars inside the stellar disk (Sestito et al. 2021). Given their high resolution and complex hydrodynamical nature, the NIHAO-UHD simulations offer an excellent opportunity to apply our clustering pipeline.

Code: Running AstroLink and FuzzyCat on NIHAO-UHD stellar haloes

To perform phase-temporal clustering on NIHAO-UHD galaxies we need a Python script. First, we do the necessary imports:

import os
import gc

import numpy as np
import pynbody as pb
import matplotlib.pyplot as plt
import matplotlib.colors as col
import ffmpeg

from astrolink import AstroLink
from fuzzycat import FuzzyCat, FuzzyPlots

Then, the first real piece of code we will use is a method that reads a snapshot file with pynbody and returns the necessary information about its main halo:

def loadGalaxyAsArrays(snapshotFilePath, particleName, featureSpaceNames = ['pos', 'vel']):
    """Returns the main halo data, from the simulation file `snapshotFilePath`,
    for particle `particleName`, in the feature spaces specified by
    `featureSpaceNames`.
    """

    # Load the simulation snapshot
    simulation = pb.load(snapshotFilePath)

    # Take only the largest halo and make it face-on (stellar disk is in the x-y plane)
    mainHalo = simulation.halos()[1]
    pb.analysis.angmom.faceon(mainHalo)
    mainHalo.physical_units()

    # Centre data on the median of the dark matter halo
    darkMatter = np.column_stack([mainHalo.dm[feature] for feature in featureSpaceNames])
    centre = np.median(darkMatter, axis = 0)

    # Get particle data and IDs
    if particleName == 'dark':
        darkMatter -= centre
        darkMatterIDs = mainHalo.dm['iord']
        return darkMatter, darkMatterIDs, simulation
    if particleName == 'stars':
        stars = np.column_stack([mainHalo.stars[feature] for feature in featureSpaceNames])
        stars -= centre
        starsIDs = mainHalo.stars['iord']
        return stars, starsIDs, simulation
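    if particleName == 'gas':
        gas = np.column_stack([mainHalo.gas[feature] for feature in featureSpaceNames])
        gas -= centre
        gasIDs = mainHalo.gas['iord']
        return gas, gasIDs, simulation
    # Fail loudly rather than silently returning None for an unrecognised particle type
    raise ValueError(f"Unknown particleName '{particleName}', expected 'dark', 'stars', or 'gas'.")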

Although we only present results for stellar particles here, this method readily allows for phase-temporal clustering of the dark matter and/or gas particles too.
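
As a minimal sketch of the centring step inside loadGalaxyAsArrays, the following uses synthetic NumPy arrays in place of pynbody data. The helper name `centre_on_reference_median`, the random seed, and the offset values are assumptions made purely for illustration; only the idea of subtracting the per-feature median of the dark matter from every particle type comes from the function above.

```python
import numpy as np

def centre_on_reference_median(particles, reference):
    """Shift `particles` so that the per-feature median of `reference`
    sits at the origin (hypothetical helper, for illustration only)."""
    centre = np.median(reference, axis=0)
    return particles - centre, centre

# Synthetic stand-ins for 6D (position + velocity) particle data
rng = np.random.default_rng(0)
offset = np.array([10.0, -5.0, 2.0, 100.0, 0.0, -50.0])
dark_matter = rng.normal(size=(1001, 6)) + offset
stars = rng.normal(size=(200, 6)) + offset

# Both particle types are shifted by the same dark matter centre
stars_centred, centre = centre_on_reference_median(stars, dark_matter)
dark_centred, _ = centre_on_reference_median(dark_matter, dark_matter)

print(np.round(centre, 1))
print(np.allclose(np.median(dark_centred, axis=0), 0.0))
```

Using one shared centre for every particle type keeps stars, gas, and dark matter in the same frame, so clusters found in different particle types remain directly comparable.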

With this, we can now write a method that uses AstroLink to cluster the particles in each snapshot:

def findAndSaveClustersFromSnapshots(snapshotFilePaths, workingDirectoryPath, particleName, nSamples):
    """Uses AstroLink to find the clusters within each main halo specified by
    `snapshotFilePaths` and `particleName`. Then saves them in different formats
    into the directory specified by `workingDirectoryPath`. `nSamples` is used
    to format the cluster file names.
    """

    # The number of leading digits in the saved cluster file names
    sampleNumberFormat = np.log10(nSamples).astype(int) + 1

    # For tracking which particles have been clustered over all snapshots (for FuzzyCat memory efficiency)
    veryLargeN = 10**8 # Must be larger than the maximum iord value
    particleIDsBool = np.zeros(veryLargeN, dtype = np.bool_)

    # Track cluster file names
    clusterFileNames = []

    # Cycle through each snapshot, run AstroLink, and save the clusters
    for index, snapshotFilePath in enumerate(snapshotFilePaths):
        print(f"Loading {snapshotFilePath.split('/')[-1]}                                                         \t\t", end = '\r')
        # Load the galaxy
        particleArr, particleIDs, _ = loadGalaxyAsArrays(snapshotFilePath, particleName)

        print(f"Running AstroLink on the {particleName} particles of snapshot {snapshotFilePath.split('/')[-1]}   \t\t", end = '\r')
        # Run AstroLink and save the clusters in the snapshot
        c = AstroLink(particleArr)
        c.run()
        for clst, clst_id in zip(c.clusters[1:], c.ids[1:]):
            # Cluster file name
            clusterFileName = f"{index:0{sampleNumberFormat}}_{clst_id}.npy"
            clusterFileNames.append(clusterFileName)

            # Save the cluster with respect to the order of the data in the snapshot file
            cluster_raw = c.ordering[clst[0]:clst[1]]
            np.save(f"{workingDirectoryPath}Clusters_raw/{clusterFileName}", cluster_raw)

            # Save the cluster with respect to the particle IDs in the snapshot file
            cluster_iord = particleIDs[cluster_raw]
            np.save(f"{workingDirectoryPath}Clusters_iord/{clusterFileName}", cluster_iord)

            # Mark the particles that have been clustered
            particleIDsBool[cluster_iord] = 1

    # Save the IDs of the particles that have been clustered
    clusteredIDs = np.where(particleIDsBool)[0]
    np.save(f"{workingDirectoryPath}clusteredIDs.npy", clusteredIDs)

    # Translate the clusters (with respect to the particle IDs) into reduced arrays (with respect to the order of the IDs of clustered particles) for improved memory efficiency with FuzzyCat
    for clusterFileName in clusterFileNames:
        cluster_iord = np.load(workingDirectoryPath + 'Clusters_iord/' + clusterFileName)
        cluster_reduced = np.where(np.isin(clusteredIDs, cluster_iord, assume_unique = True))[0].astype(cluster_iord.dtype)
        np.save(workingDirectoryPath + 'Clusters/' + clusterFileName, cluster_reduced)

The core of this method is simple: it loads the particle data, applies AstroLink to that data, and iteratively saves the clusters as .npy files. However, the clusters saved directly from AstroLink (e.g. cluster_raw) are arrays of integers that index the rows of particleArr for a single snapshot, and particleArr may contain data points from different particles in different snapshots. So the additional code in this method translates the cluster_raw arrays into cluster_iord arrays (which contain the particle IDs of each particle in each cluster), and then finally into cluster_reduced arrays. This last translation isn’t technically necessary, but it lets FuzzyCat handle the clusters in a more memory-efficient way, since the particle IDs can be very large in value and we only care about a fraction of the total number of particles in each simulation: in this case, only those particles in the main halo that are ever assigned to a cluster.
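
The two translations can be sketched on a toy snapshot as follows. The arrays and the helper names `to_particle_ids` and `to_reduced_indices` are hypothetical; the indexing itself mirrors the `particleIDs[cluster_raw]` and `np.where(np.isin(...))[0]` expressions used in the method above.

```python
import numpy as np

def to_particle_ids(particle_ids, cluster_raw):
    """Row indices within one snapshot -> particle IDs (stable across snapshots)."""
    return particle_ids[cluster_raw]

def to_reduced_indices(clustered_ids, cluster_iord):
    """Particle IDs -> positions within the sorted array of all clustered IDs."""
    return np.where(np.isin(clustered_ids, cluster_iord))[0]

# Toy snapshot: row i of the particle array belongs to particle ID particle_ids[i]
particle_ids = np.array([40, 7, 93, 15, 62, 28])

# A cluster expressed as row indices into this snapshot's particle array
cluster_raw = np.array([4, 0, 2])
cluster_iord = to_particle_ids(particle_ids, cluster_raw)

# Sorted IDs of every particle that was clustered in any snapshot
clustered_ids = np.array([7, 40, 62, 93])
cluster_reduced = to_reduced_indices(clustered_ids, cluster_iord)

print(cluster_iord.tolist())     # [62, 40, 93]
print(cluster_reduced.tolist())  # [1, 2, 3]
```

Note that the reduced array comes out sorted, so the ordering of members within the cluster is not preserved; only membership is. The reduced indices run from 0 to the number of clustered particles minus one, regardless of how large the original particle IDs are.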

With a series of reduced cluster files in the ‘Clusters’ folder of the working directory, we can now run FuzzyCat and save its output.

def runFuzzyCatOnClustersFromSnapshots(workingDirectoryPath, nSamples, minStability):
    """Runs FuzzyCat on the clusters contained in `workingDirectoryPath` with
    parameters `nSamples` and `minStability`. The `nPoints` parameter is
    determined automatically from a file containing the IDs of clustered
    particles.
    """

    # Number of points clustered
    clusteredIDs = np.load(f"{workingDirectoryPath}clusteredIDs.npy")
    nPoints = clusteredIDs.size
    del clusteredIDs

    # Run FuzzyCat
    fc = FuzzyCat(nSamples, nPoints, workingDirectoryPath, minStability = minStability, checkpoint = True, verbose = 2)
    fc.run()

    # Plot the basic results
    FuzzyPlots.plotOrderedJaccardIndex(fc)
    FuzzyPlots.plotStabilities(fc)
    FuzzyPlots.plotMemberships(fc)

    # Save outputs
    np.save(f"{workingDirectoryPath}jaccardIndices.npy", fc.jaccardIndices)
    np.save(f"{workingDirectoryPath}ordering.npy", fc.ordering)
    np.save(f"{workingDirectoryPath}fuzzyClusters.npy", fc.fuzzyClusters)
    np.save(f"{workingDirectoryPath}stabilities.npy", fc.stabilities)
    np.save(f"{workingDirectoryPath}memberships.npy", fc.memberships)
    np.save(f"{workingDirectoryPath}memberships_flat.npy", fc.memberships_flat)
    np.save(f"{workingDirectoryPath}fuzzyHierarchy.npy", fc.fuzzyHierarchy)
    np.save(f"{workingDirectoryPath}groups.npy", fc.groups)
    np.save(f"{workingDirectoryPath}intraJaccardIndicesGroups.npy", fc.intraJaccardIndicesGroups)
    np.save(f"{workingDirectoryPath}interJaccardIndicesGroups.npy", fc.interJaccardIndicesGroups)
    np.save(f"{workingDirectoryPath}stabilitiesGroups.npy", fc.stabilitiesGroups)
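
Several of the saved outputs (jaccardIndices, intraJaccardIndicesGroups, interJaccardIndicesGroups) are named after the Jaccard index. As a reminder of what that quantity measures, here is a minimal sketch of the plain, unweighted Jaccard index between two clusters given as collections of reduced particle indices. It is not FuzzyCat's internal implementation, which may differ, and the function name and toy clusters are assumptions for illustration.

```python
def jaccard_index(cluster_a, cluster_b):
    """Plain Jaccard index: size of the intersection divided by the size of the union."""
    a, b = set(cluster_a), set(cluster_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty clusters
    return len(a & b) / len(a | b)

# The same structure seen in two consecutive snapshots: one member lost, one gained
snapshot_0_cluster = [1, 2, 3, 4, 5, 6]
snapshot_1_cluster = [2, 3, 4, 5, 6, 7]
unrelated_cluster = [20, 21, 22]

print(jaccard_index(snapshot_0_cluster, snapshot_1_cluster))  # 5 shared out of 7 in total
print(jaccard_index(snapshot_0_cluster, unrelated_cluster))   # 0.0
```

Clusters that persist between snapshots with mostly the same members score close to 1, which is why expressing every cluster in the same reduced particle index space matters.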

That’s all the methods we need to find a phase-temporal clustering of a simulated galaxy. However, among other things, we also want to be able to visualise our results. So we need a plotting function…

def paintLabelsOntoSnapshot(particleArr, clusters_raw, labels, saveFileNameStem, snapshotFileName, axisLimits, withDiskZoomIn = True):
    """Creates a two-panel plot of the clusters within a snapshot. The left
    panel is a 3D scatter plot and the right panel is a top-down view of the
    region around the disk of the galaxy.
    """

    # Colour the data according to the cluster
    colourList = [f"C{i}" for i in range(10) if i != 7]
    colours = np.zeros((particleArr.shape[0], 4))
    sizes = np.zeros(particleArr.shape[0])
    for cluster_raw, label in zip(clusters_raw, labels):
        colours[cluster_raw] = col.to_rgba(colourList[label%9], alpha = 1)
        sizes[cluster_raw] = 0.5

    # Create figure
    width = 16 if withDiskZoomIn else 8
    height = 8
    figAspectRatio = height/width
    fig = plt.figure(figsize = (width, height))
    fig.patch.set_facecolor('k')

    # Plot the 3D data
    ax = fig.add_axes((0, 0, figAspectRatio, 1), projection = '3d')
    ax.scatter(*particleArr[:, :3].T, facecolors = colours, edgecolors = 'w', s = sizes, lw = 0.05)
    # Adjust data limits
    ax.set_xlim(-axisLimits, axisLimits)
    ax.set_ylim(-axisLimits, axisLimits)
    ax.set_zlim(-axisLimits, axisLimits)
    # Remove axes
    ax.axis('off')
    ax.patch.set_facecolor('k')
    # Add cartesian coordinate axes of length 100 kpc for reference
    ax.quiver([0]*6, [0]*6, [0]*6, [1, -1, 0, 0, 0, 0], [0, 0, 1, -1, 0, 0], [0, 0, 0, 0, 1, -1],
              color = 'w', alpha = 1, length = 100, arrow_length_ratio = 0.1)
    ax.text(100, 0, 0, 'X', color = 'w')
    ax.text(0, 100, 0, 'Y', color = 'w')
    ax.text(0, 0, 100, 'Z', color = 'w')
    if withDiskZoomIn:
        # Add zoom-in box around disk
        prismColour, prismAlpha = col.to_rgba('w', alpha = 0.2), 0.05
        xyRange, zRange, onesArray = np.array([-25, 25]), np.array([-5, 5]), np.ones(4).reshape(2, 2)
        for i in range(2):
            # z-direction faces
            xx, yy = np.meshgrid(xyRange, xyRange)
            ax.plot_wireframe(xx, yy, zRange[i]*onesArray, color = prismColour)
            ax.plot_surface(xx, yy, zRange[i]*onesArray, color = prismColour, alpha = prismAlpha)
            # x-direction faces
            xy, zz = np.meshgrid(xyRange, zRange)
            ax.plot_wireframe(xyRange[i]*onesArray, xy, zz, color = prismColour)
            ax.plot_surface(xyRange[i]*onesArray, xy, zz, color = prismColour, alpha = prismAlpha)
            # y-direction faces
            ax.plot_wireframe(xy, xyRange[i]*onesArray, zz, color = prismColour)
            ax.plot_surface(xy, xyRange[i]*onesArray, zz, color = prismColour, alpha = prismAlpha)

        # Plot the 2D disk data
        axisCentre, axisHalfWidth = 0.5 + figAspectRatio/2, 0.9*(1 - figAspectRatio)
        axDisk = fig.add_axes((axisCentre - axisHalfWidth/2,
                            0.5*(1 - axisHalfWidth/figAspectRatio),
                            axisHalfWidth,
                            axisHalfWidth/figAspectRatio))
        inBoxBool = (particleArr[:, 0] > xyRange[0])*(particleArr[:, 0] < xyRange[1]) # particles in x limits
        inBoxBool *= (particleArr[:, 1] > xyRange[0])*(particleArr[:, 1] < xyRange[1]) # particles in y limits
        inBoxBool *= (particleArr[:, 2] > zRange[0])*(particleArr[:, 2] < zRange[1]) # particles in z limits
        axDisk.scatter(*particleArr[inBoxBool, :2].T, facecolors = colours[inBoxBool], edgecolors = 'w', s = 2*sizes[inBoxBool], lw = 0.05)
        # Adjust data limits
        axDisk.set_xlim(xyRange[0], xyRange[1])
        axDisk.set_ylim(xyRange[0], xyRange[1])
        # Style the panel: black background with white spines
        axDisk.patch.set_facecolor('k')
        for side in ['top', 'left', 'bottom', 'right']:
            axDisk.spines[side].set_color('w')

    # Adjust figure margins
    top, bottom, left, right = 1, 0, 0, 1
    fig.subplots_adjust(top = top, bottom = bottom, left = left, right = right)
    # Add snapshot number
    fig.add_subplot(111, frameon = False)
    plt.tick_params(labelcolor = 'none', top = False, bottom = False, left = False, right = False)
    plt.grid(False)
    plt.text(0, 1, snapshotFileName, ha = 'left', va = 'top', fontsize = 10, color = 'w', transform = plt.gca().transAxes)
    # Save figure
    plt.savefig(f"{saveFileNameStem}{snapshotFileName}.png", dpi = 200, bbox_inches = 'tight')
    fig.clf()
    plt.close()
    gc.collect()
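
One detail of the colouring in paintLabelsOntoSnapshot is worth spelling out: labels are mapped onto a palette of nine colours with a modulo, so colours repeat once there are more than nine labels. The sketch below reproduces just that mapping without matplotlib; the helper name `colour_for_label` is hypothetical.

```python
# Nine colours from matplotlib's default cycle; 'C7' (grey) is skipped
colour_list = [f"C{i}" for i in range(10) if i != 7]

def colour_for_label(label):
    """Map an integer cluster label onto the nine-colour palette."""
    return colour_list[label % len(colour_list)]

for label in [0, 7, 9]:
    print(label, colour_for_label(label))
```

Labels 0 and 9 share the colour 'C0', so two clusters with the same colour in a frame are not necessarily the same fuzzy cluster.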

… and a way to make a movie out of these plots for each snapshot so that we can watch our work.

def makeMovieOfFuzzyClustersOverTime(snapshotFilePaths, workingDirectoryPath, particleName, axisLimits, frameRate):
    """Makes a movie of the fuzzy clusters found by AstroLink and FuzzyCat as
    they evolve over time.
    """
    saveFileNameStem = f"{workingDirectoryPath}Cluster_plots/plotted_clusters_"

    clusterFileNames = np.load(workingDirectoryPath + 'clusterFileNames.npy')
    ordering = np.load(workingDirectoryPath + 'ordering.npy')
    fuzzyClusters = np.load(workingDirectoryPath + 'fuzzyClusters.npy')
    whichCluster = -np.ones(clusterFileNames.size, dtype = np.int32)
    for i, clst in enumerate(fuzzyClusters):
        whichCluster[ordering[clst[0]:clst[1]]] = i

    for index, snapshotFilePath in enumerate(snapshotFilePaths):
        print(f"Loading {snapshotFilePath.split('/')[-1]}                                                         \t\t", end = '\r')
        # Load the galaxy
        particleArr, _, _ = loadGalaxyAsArrays(snapshotFilePath, particleName)

        print(f"Loading clusters of {particleName} particles from snapshot {snapshotFilePath.split('/')[-1]}      \t\t", end = '\r')
        # Load AstroLink clusters (found in this snapshot) that belong to the fuzzy clusters from FuzzyCat
        clusters_raw, fuzzyLabels = [], []
        for clusterFileName, whichFuzzyClst in zip(clusterFileNames, whichCluster):
            clstSnapshot = int(clusterFileName.split('_')[0])
            if whichFuzzyClst != -1 and clstSnapshot == index:
                cluster_raw = np.load(workingDirectoryPath + 'Clusters_raw/' + clusterFileName)
                clusters_raw.append(cluster_raw)
                fuzzyLabels.append(whichFuzzyClst)

        # Make plot of clusters
        snapshotFileName = snapshotFilePath.split('/')[-1]
        print(f"Plotting {snapshotFileName} clusters                                               \t\t", end = '\r')
        paintLabelsOntoSnapshot(particleArr, clusters_raw, fuzzyLabels, saveFileNameStem, snapshotFileName, axisLimits)

    # Make movie
    (
        ffmpeg
        .input(f"{workingDirectoryPath}Cluster_plots/plotted_clusters_*.png", pattern_type = 'glob', framerate = frameRate)
        .output(f"{workingDirectoryPath}{workingDirectoryPath.split('/')[-2]}_movie.mp4")
        .run()
    )
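
The labelling at the start of makeMovieOfFuzzyClustersOverTime can be sketched with toy arrays. The file names, `ordering`, and `fuzzy_clusters` values below are hypothetical stand-ins for the loaded .npy files, and the helper `files_for_snapshot` is an assumption for illustration. As in the function above, each fuzzy cluster is treated as a start/end pair that slices the ordered list of cluster files.

```python
import numpy as np

# Hypothetical stand-ins for the arrays loaded from the working directory
cluster_file_names = np.array(['0_1.npy', '0_2.npy', '1_1.npy', '1_2.npy', '2_1.npy'])
ordering = np.array([0, 2, 4, 1, 3])          # a permutation of the cluster-file indices
fuzzy_clusters = np.array([[0, 2], [3, 5]])   # [start, end) slices into `ordering`

# Label each cluster file with its fuzzy cluster; -1 means "in no fuzzy cluster"
which_cluster = -np.ones(cluster_file_names.size, dtype=np.int32)
for label, (start, end) in enumerate(fuzzy_clusters):
    which_cluster[ordering[start:end]] = label

def files_for_snapshot(snapshot_index):
    """Return (file name, fuzzy label) pairs for labelled clusters found in one snapshot."""
    selected = []
    for name, label in zip(cluster_file_names, which_cluster):
        found_in = int(name.split('_')[0])    # snapshot index is the prefix before '_'
        if label != -1 and found_in == snapshot_index:
            selected.append((str(name), int(label)))
    return selected

print(which_cluster.tolist())
print(files_for_snapshot(1))
```

Here the last file is left with the label -1 and is therefore never plotted, while the two clusters found in snapshot 1 are drawn with the labels of the fuzzy clusters they belong to.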

Our simulation files have already had the AHF galaxy/(sub)halo finder + MergerTree code applied to them. So as a comparison, we also write a function to make the equivalent movies but with these clustering results instead.

def makeMovieOfAHFHaloesOverTime(snapshotFilePaths, workingDirectoryPath, mtreeIdxFilePaths, particleName, axisLimits, frameRate):
    """Makes a movie of AHF haloes as they evolve over time.
    """
    saveFileNameStem = f"{workingDirectoryPath}Halo_plots/plotted_ahf_haloes_"

    for fileIndex, (snapshotFilePath, mtreeIdxFilePath) in enumerate(zip(snapshotFilePaths, mtreeIdxFilePaths)):
        print(f"Loading {snapshotFilePath.split('/')[-1]}                                                         \t\t", end = '\r')
        # Load the galaxy
        particleArr, particleIDs, simulation = loadGalaxyAsArrays(snapshotFilePath, particleName)

        print(f"Loading AHF haloes of {particleName} particles from snapshot {snapshotFilePath.split('/')[-1]}    \t\t", end = '\r')
        # Load AHF haloes (found in this snapshot)
        clusters_raw, ahfLabels = [], []
        haloes = simulation.halos()
        mainHalo = haloes[1]
        subhaloIDList = mainHalo.properties['children']

        # Load merger tree info and track labels between snapshots
        with open(mtreeIdxFilePath, 'r') as mtreeFile:
            lines = mtreeFile.readlines()[1:]
        haloID_mainProgenitors = np.empty((len(lines), 2), dtype = np.int32)
        for i, line in enumerate(lines):
            for j, haloID in enumerate(line[:-1].split()):
                haloID_mainProgenitors[i, j] = int(haloID)
        labelTracker = list(range(100000)) # Large enough to account for the maximum 'halo_id' value that occurs
        if fileIndex > 0:
            for i, j in haloID_mainProgenitors:
                labelTracker[i] = oldLabelTracker[j]

        while subhaloIDList:
            # Get next subhalo object and extend subhaloes list with any new sub-subhaloes
            subhaloID = subhaloIDList.pop(0)
            subhalo = haloes[subhaloID]
            if subhalo.properties['numSubStruct']: subhaloIDList.extend(subhalo.properties['children'])
            # Create an array of indices (relative to mainHalo) for the particles in this subhalo
            if particleName == 'dark': subhaloParticleIDs = subhalo.dm['iord']
            if particleName == 'stars': subhaloParticleIDs = subhalo.stars['iord']
            if particleName == 'gas': subhaloParticleIDs = subhalo.gas['iord']
            cluster_raw = np.where(np.isin(particleIDs, subhaloParticleIDs))[0]
            clusters_raw.append(cluster_raw)
            # Append a unique subhalo label that accounts for the merger tree
            ahfLabels.append(labelTracker[subhaloID - 1])
        oldLabelTracker = labelTracker

        # Make plot of clusters
        snapshotFileName = snapshotFilePath.split('/')[-1]
        print(f"Plotting {snapshotFileName} AHF haloes                                                            \t\t", end = '\r')
        paintLabelsOntoSnapshot(particleArr, clusters_raw, ahfLabels, saveFileNameStem, snapshotFileName, axisLimits, withDiskZoomIn = False)

    # Make movie
    (
        ffmpeg
        .input(f"{saveFileNameStem}*.png", pattern_type = 'glob', framerate = frameRate)
        .output(f"{workingDirectoryPath}{workingDirectoryPath.split('/')[-2]}_ahf_movie.mp4")
        .run()
    )
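
The label tracking used above can be isolated into a small pure-Python sketch. It assumes, as the parsing code above does, that each merger tree line holds two integers: a halo index in the current snapshot followed by the index of its main progenitor in the previous snapshot. The function name `propagate_labels` and the toy lines are hypothetical.

```python
def propagate_labels(mtree_lines, previous_labels, n_labels):
    """Return the labels for the current snapshot, inheriting each halo's
    label from its main progenitor in the previous snapshot."""
    labels = list(range(n_labels))  # default: every halo index labels itself
    if previous_labels is not None:
        for line in mtree_lines:
            halo_id, progenitor_id = (int(value) for value in line.split())
            labels[halo_id] = previous_labels[progenitor_id]
    return labels

# First snapshot: there is no earlier snapshot to inherit from
labels_0 = propagate_labels([], None, n_labels=4)

# Second snapshot: haloes 1 and 2 have swapped places in the catalogue
labels_1 = propagate_labels(["0 0", "1 2", "2 1"], labels_0, n_labels=4)

# Third snapshot: they swap back, so each halo keeps its original label
labels_2 = propagate_labels(["0 0", "2 1", "1 2"], labels_1, n_labels=4)

print(labels_0, labels_1, labels_2)
```

Because each label is inherited along the main progenitor branch, a subhalo keeps the same colour in the movie even when its index in the halo catalogue changes between snapshots.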

Lastly, we need to set up our file paths, calculate some properties for the above methods, and run the pipeline:

if __name__ == '__main__':
    """Run phase-temporal clustering pipeline :)
    """

    # Choose a particle from ['dark', 'stars', 'gas'] to cluster
    particleName = 'stars'

    # Set up the working directory
    galaxyFolderName = '8.26e11_zoom_2_new_run'
    workingDirectoryPath = f"/PATH/TO/YOUR/WORKING/DIRECTORY/nihao_uhd_{galaxyFolderName}_{particleName}/"
    # Create the working directory and its subfolders (safe to re-run if some already exist)
    for subfolderName in ['Clusters_raw/', 'Clusters_iord/', 'Clusters/', 'Cluster_plots/', 'Halo_plots/']:
        os.makedirs(f"{workingDirectoryPath}{subfolderName}", exist_ok = True)

    # Get the simulation snapshot file paths
    simulationDirectoryPath = f"/PATH/TO/YOUR/SIMULATION/DIRECTORY/nihao_uhd/{galaxyFolderName}/"
    snapshotFilePrefix = '8.26e11.'
    snapshotNumberRange = range(1164, 2001)
    snapshotFilePaths = [f"{simulationDirectoryPath}{snapshotFilePrefix}{i:05}" for i in snapshotNumberRange]

    # Get merger tree info files for AHF comparison movie
    mtreeIdxFilePaths = [fileName for fileName in os.listdir(simulationDirectoryPath) if fileName.endswith('.AHF_mtree_idx') and int(fileName.split('.')[2]) in snapshotNumberRange]
    reorder = np.argsort([int(fileName.split('.')[2]) for fileName in mtreeIdxFilePaths])
    mtreeIdxFilePaths = [f"{simulationDirectoryPath}{mtreeIdxFilePaths[i]}" for i in reorder]

    # Info for the clustering pipeline
    nSamples = len(snapshotFilePaths)
    minLongevityOfFuzzyClusters = 230 # The minimum life-span of fuzzy clusters in megayears (Myr)
    ageOfTheUniverse = 13800 # Age of the Universe in megayears (Myr)
    # Calculate the minStability parameter so that fuzzy clusters live for at least `minLongevityOfFuzzyClusters` Myr
    minStability = minLongevityOfFuzzyClusters*(snapshotNumberRange.stop - 1)/(ageOfTheUniverse*snapshotNumberRange.step*nSamples)
    # Choose appropriate axis limits (in kpc) for the movie
    axisLimits = 150
    # Calculate movie frame rate so that 100 Myr pass every second
    frameRate = 100*(snapshotNumberRange.stop - 1)/(ageOfTheUniverse*snapshotNumberRange.step)

    # Do clustering over snapshots with AstroLink
    findAndSaveClustersFromSnapshots(snapshotFilePaths, workingDirectoryPath, particleName, nSamples)

    # Run FuzzyCat on AstroLink clusters
    runFuzzyCatOnClustersFromSnapshots(workingDirectoryPath, nSamples, minStability)

    # Make movie of stable clusters over time
    makeMovieOfFuzzyClustersOverTime(snapshotFilePaths, workingDirectoryPath, particleName, axisLimits, frameRate)

    # Make movie of AHF haloes for comparison
    makeMovieOfAHFHaloesOverTime(snapshotFilePaths, workingDirectoryPath, mtreeIdxFilePaths, particleName, axisLimits, frameRate)
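
The minStability and frameRate expressions above are easier to follow once the time between snapshots is made explicit. The sketch below reproduces the same arithmetic under the assumption the formulas themselves make: snapshot numbers increase linearly with cosmic time, and the last snapshot number (2000 here) corresponds to the present day. The helper name `snapshots_per_interval` is hypothetical.

```python
def snapshots_per_interval(interval_myr, last_snapshot_number, snapshot_step,
                           age_of_universe_myr=13800):
    """Number of saved snapshots spanned by `interval_myr`, assuming snapshot
    numbers grow linearly with time up to the present day."""
    myr_per_snapshot = age_of_universe_myr * snapshot_step / last_snapshot_number
    return interval_myr / myr_per_snapshot

snapshot_numbers = range(1164, 2001)
n_samples = len(snapshot_numbers)              # number of snapshots clustered
last_snapshot_number = snapshot_numbers.stop - 1

# Fraction of all snapshots that a fuzzy cluster must span to live for 230 Myr
min_stability = snapshots_per_interval(230, last_snapshot_number, snapshot_numbers.step) / n_samples

# Frames (snapshots) shown per second so that 100 Myr pass every second
frame_rate = snapshots_per_interval(100, last_snapshot_number, snapshot_numbers.step)

print(n_samples, round(min_stability, 4), round(frame_rate, 2))
```

With these values, one snapshot covers 6.9 Myr, so a 230 Myr lifetime corresponds to about 33 of the 837 snapshots (a stability of roughly 0.04), and the movie plays at about 14.5 frames per second.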

Results: Let’s visualise the clusters!

If we run the above pipeline on the stellar particles of each of our NIHAO-UHD galaxies, then we get the movies in the following subsections, which contain a great deal of information on the formation and evolution of the respective galaxies. Among the structures extracted by our approach are dwarf galaxies, infalling groups, stellar streams (and their progenitors), stellar shells, galactic bulges, and star-forming regions.

FuzzyCat \(\circ\) AstroLink: g2.79e12

FuzzyCat \(\circ\) AstroLink: g8.26e11

FuzzyCat \(\circ\) AstroLink: g1.12e12

FuzzyCat \(\circ\) AstroLink: g6.96e11

FuzzyCat \(\circ\) AstroLink: g7.08e11

FuzzyCat \(\circ\) AstroLink: g7.55e11

Results: A comparison to a more traditional halo finder

By comparison, traditional approaches are not able to find most of the structure we see with FuzzyCat \(\circ\) AstroLink. In fact, most are only capable of finding the subset of what our approach finds that is (or is mostly) self-bound, as can be seen in the corresponding results from AHF.

AHF: g2.79e12

AHF: g8.26e11

AHF: g1.12e12

AHF: g6.96e11

AHF: g7.08e11

AHF: g7.55e11

Conclusions and outlook

In this work, we have demonstrated the effectiveness of the FuzzyCat \(\circ\) AstroLink pipeline as a novel unsupervised machine learning approach, particularly as a tool for analysing simulated galaxies in the context of galaxy formation and evolution. By applying our pipeline to the NIHAO-UHD suite, we have shown that it can successfully identify a diverse range of astrophysical structures that traditional halo finder (+ merger tree) methods do not, capturing transient and tidally disrupted structures that are often overlooked in conventional analyses. As such, it provides the means to a more comprehensive understanding of galaxy formation and evolution.