Does anyone have a good setup/configuration for converting documents to Obsidian-flavored markdown with Pandoc? I’ve been fiddling with it for a few hours but can’t seem to get everything right:
- Obsidian markdown doesn’t support `^superscript^`. I can get Pandoc to use `sup` instead by allowing `raw_html`, but then…
- Image embeds don’t work. Pandoc wants to use `img` for some reason, and no matter what relative src I use the image just won’t show up.
I could fix all of this by running the files through a linter of some sort, but I feel like I’m missing something. Surely someone must have had these issues before me, right?
I got this mostly working, but it was not easy. Not only does Obsidian have a few peculiarities that make it less compatible with standard Markdown, but Word also does a few funny things.
Here’s the `config.yaml` I used for Pandoc:

    from: docx
    to: markdown-smart-simple_tables-multiline_tables-grid_tables+pipe_tables+yaml_metadata_block-superscript-subscript-bracketed_spans-native_spans-link_attributes-raw_html+rebase_relative_paths+four_space_rule
    extract-media: "./"
    wrap: preserve
    markdown-headings: atx
    tab-stop: 2
    shift-heading-level-by: 1
    standalone: true
    template: obsidian.md
    filters:
      - compact-list.lua
      - remove-single-characters.py
      - remove-extra-linebreaks.py
    metadata:
      tags: "tags/go/here"
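For anyone wanting to try a defaults file like this, here is a minimal sketch of calling Pandoc with it from Python. It assumes the file is saved as `config.yaml` in the working directory and that `pandoc` is on the PATH. The function names and the choice to write the `.md` file next to the input are my own, not part of the setup above. Pandoc reads a defaults file through its `--defaults` option.

```python
import subprocess
from pathlib import Path


def build_pandoc_command(docx_path, defaults="config.yaml"):
    """Return the argument list for converting one .docx with a defaults file."""
    source = Path(docx_path)
    # Assumption of this sketch: the output sits beside the input as .md.
    target = source.with_suffix(".md")
    return ["pandoc", "--defaults", defaults, str(source), "-o", str(target)]


def convert(docx_path, defaults="config.yaml"):
    """Run Pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pandoc_command(docx_path, defaults), check=True)
```

Calling `convert("report.docx")` would then be roughly equivalent to typing `pandoc --defaults config.yaml report.docx -o report.md` in a shell.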
The three filters:
- Removes extra linebreaks added between bulleted lists to make them more compact.
- Removes lines with only a single character in them. Usually that is an invisible character like `nbsp`, which made Pandoc’s linter not remove them automatically.
- Removes linebreaks enclosed in `Strong` tags. This is an artifact from Word where a line is bolded but has no content: technically the line break is bolded.
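The filter scripts themselves are not shown, so the following is only a guess at what the last one might look like: a JSON filter written with the standard library alone (no `panflute` or `pandocfilters`). It drops any `Strong` node whose children are nothing but breaks or spaces. The set of node types counted as "empty" and the function names are assumptions of this sketch.

```python
import json
import sys

# Inline node types treated as "no visible content" (an assumption of this sketch).
EMPTY_INLINES = {"LineBreak", "SoftBreak", "Space"}


def is_empty_strong(node):
    """True for a Strong node whose children are only breaks or spaces."""
    return (
        isinstance(node, dict)
        and node.get("t") == "Strong"
        and all(
            isinstance(child, dict) and child.get("t") in EMPTY_INLINES
            for child in node.get("c", [])
        )
    )


def strip_empty_strong(node):
    """Recursively rebuild the AST without empty Strong nodes."""
    if isinstance(node, list):
        return [strip_empty_strong(item) for item in node if not is_empty_strong(item)]
    if isinstance(node, dict):
        return {key: strip_empty_strong(value) for key, value in node.items()}
    return node


def main():
    # Pandoc sends the document AST as JSON on stdin and reads it back from stdout.
    document = json.load(sys.stdin)
    json.dump(strip_empty_strong(document), sys.stdout)
```

To use it as an actual filter, the script would need a call to `main()` at the bottom before being listed under `filters:`.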
I then ran the resulting file through a RegExp replacement to change the superscript carets into HTML `sup` tags.
Even after all this, I still have to go through with an Obsidian plugin to convert the standard Markdown links and embeds into `[[Wikilink]]` style, since Obsidian will only do one or the other throughout your whole vault.
Not sure about a specific plugin but couldn’t you sed your way out of it?
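The same kind of search-and-replace pass can be sketched in Python instead of sed. This is not the RegExp or the plugin used above, just a rough illustration under a few assumptions: superscripts appear as `^text^` with no spaces inside, link and image targets contain no spaces, and anything that looks like a URL (`scheme:`) is left alone. It does not skip code blocks, so it could rewrite examples inside them.

```python
import re

# ^text^ with no whitespace or carets inside (assumed form of the superscripts).
SUPERSCRIPT = re.compile(r"\^([^\s^]+)\^")
# Targets starting with "scheme:" are treated as URLs and never converted.
NOT_URL = r"(?![a-z][a-z0-9+.-]*:)"
# ![alt](target) image embeds; the alt text is discarded in this sketch.
EMBED = re.compile(r"!\[[^\]]*\]\(" + NOT_URL + r"([^)\s]+)\)")
# [label](target) links that are not preceded by "!".
LINK = re.compile(r"(?<!!)\[([^\]]+)\]\(" + NOT_URL + r"([^)\s]+)\)")


def carets_to_sup(text):
    """Rewrite ^text^ as <sup>text</sup>."""
    return SUPERSCRIPT.sub(r"<sup>\1</sup>", text)


def to_wikilinks(text):
    """Rewrite local Markdown embeds and links in Obsidian's [[...]] style."""
    text = EMBED.sub(r"![[\1]]", text)

    def link(match):
        label, target = match.group(1), match.group(2)
        # Wikilinks to notes normally omit the .md extension.
        target = re.sub(r"\.md$", "", target)
        return f"[[{target}]]" if label == target else f"[[{target}|{label}]]"

    return LINK.sub(link, text)
```

Running both functions over each converted file would cover the superscript and wikilink steps in one pass, though a vault with unusual file names would likely need tighter patterns.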
I’ve done something like this converting HTML to Obsidian Markdown. I interrogated GPT-3.5 about specifically what I needed to accomplish and went from there. If you can’t handle a formatting quirk in the same conversion process, you might run iterative passes to fix it after conversion. I’ve done similar with BBEdit and VS Code, basically to find and replace across a lot of documents.
Oh wait I think you want to expressly use pandoc, my bad