Manage multi-modal data#

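# Initialize a LaminDB instance with local storage and the bionty schema (notebook shell command)
!lamin init --storage ./test-multimodal --schema bionty
import lamindb as ln
import lnschema_bionty as lb

lb.settings.species = "human"  # default species for bionty lookups
ln.settings.verbosity = 3  # set the logging verbosity
ln.track()  # track this notebook run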

MuData object#

Let’s use a MuData object:

mdata = ln.dev.datasets.mudata_papalexi21_subset()
mdata

First we register the file:

file = ln.File(
    "papalexi21_subset.h5mu", description="Sub-sampled MuData from Papalexi21"
)
file.save()

Register features#

Now let’s register the 3 feature sets this data contains:

  1. rna

  2. adt

  3. obs (metadata)
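
As a mental model for the three feature sets listed above, each one is attached to the file under a named slot. The following is a plain-Python sketch of that idea only: `ToyFile` and the feature names are hypothetical and are not part of lamindb.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFile:
    """Hypothetical stand-in for a registered file; not a lamindb class."""

    path: str
    # Maps a slot name ("rna", "adt", "obs") to a list of feature names.
    feature_sets: dict = field(default_factory=dict)

    def add_feature_set(self, features, slot):
        # Store one feature set per slot name.
        self.feature_sets[slot] = list(features)


toy_file = ToyFile("papalexi21_subset.h5mu")
# Made-up feature names, for illustration only.
toy_file.add_feature_set(["GENE_A", "GENE_B"], slot="rna")
toy_file.add_feature_set(["MARKER_A"], slot="adt")
toy_file.add_feature_set(["gene_target"], slot="obs")

print(sorted(toy_file.feature_sets))
```

The slot names mirror the list above: one per modality plus one for the obs metadata.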

modalities#

For the two modalities rna and adt, we use bionty tables as the reference: gene symbols (lb.Gene.symbol) for rna and cell marker names (lb.CellMarker.name) for adt:

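# Peek at the first few variable names of the rna modality
mdata["rna"].var_names[:5]

# Build a feature set from the rna variable names, using gene symbols as the reference
feature_set_rna = ln.FeatureSet.from_values(
    mdata["rna"].var_names, field=lb.Gene.symbol
)

# Inspect the variable names of the adt modality
mdata["adt"].var_names

# Build a feature set from the adt variable names, using cell marker names as the reference
feature_set_adt = ln.FeatureSet.from_values(
    mdata["adt"].var_names, field=lb.CellMarker.name
)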
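
Using a reference field roughly means that the variable names are compared against a known vocabulary. The sketch below illustrates that idea in plain Python. It is not how lamindb implements `from_values`; the helper `split_by_reference` and the toy symbols are hypothetical.

```python
def split_by_reference(values, reference):
    """Return (validated, unvalidated) values, preserving input order."""
    reference = set(reference)
    validated = [v for v in values if v in reference]
    unvalidated = [v for v in values if v not in reference]
    return validated, unvalidated


# Made-up reference vocabulary and variable names, for illustration only.
toy_gene_symbols = {"GENE_A", "GENE_B", "GENE_C"}
toy_var_names = ["GENE_A", "GENE_X", "GENE_C"]

validated, unvalidated = split_by_reference(toy_var_names, toy_gene_symbols)
print(validated)
print(unvalidated)
```

Names that do not appear in the reference end up in the second list, which is a convenient place to look for typos or outdated identifiers.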

Link them to the file, one slot per modality:

file.features.add_feature_set(feature_set_rna, slot="rna")
file.features.add_feature_set(feature_set_adt, slot="adt")

metadata#

The third feature set comes from obs, the observation-level metadata, which we take from the rna modality:

obs = mdata["rna"].obs

We’re only interested in a single metadata column, gene_target, as a feature; values from a few other columns are added as labels:

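# Register gene_target as a categorical feature
ln.Feature(name="gene_target", type="category").save()

# Build the obs feature set and link it to the file
feature_set_obs = ln.FeatureSet.from_df(obs, "metadata")
file.features.add_feature_set(feature_set_obs, slot="obs")

# Label the file with the genes listed in the gene_target column
gene_targets = lb.Gene.from_values(obs["gene_target"], "symbol")
file.features.add_labels(gene_targets)

# Add labels from further metadata columns
labels = []
for col in ["orig.ident", "perturbation", "replicate", "Phase", "guide_ID"]:
    labels += ln.Label.from_values(obs[col])
file.features.add_labels(labels)

# Inspect the linked feature sets and the file summary
file.features
file.describe()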
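
The loop over metadata columns above gathers label values column by column. A plain-Python sketch of that gathering step follows, assuming one label per distinct value, which the steps above do not spell out. The helper `unique_labels` and the toy table are hypothetical and do not use lamindb.

```python
def unique_labels(table, columns):
    """Collect the distinct values of the given columns in first-seen order."""
    seen = {}
    for col in columns:
        for value in table[col]:
            seen.setdefault(value, None)  # dicts preserve insertion order
    return list(seen)


# Made-up metadata table (column name -> values), for illustration only.
toy_obs = {
    "replicate": ["rep1", "rep2", "rep1"],
    "Phase": ["G1", "S", "G1"],
}

toy_labels = unique_labels(toy_obs, ["replicate", "Phase"])
print(toy_labels)
```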