aboutsummaryrefslogtreecommitdiff
path: root/doc
diff options
context:
space:
mode:
Diffstat (limited to 'doc')
-rw-r--r--doc/custom-readers.md684
-rw-r--r--doc/custom-writers.md52
-rw-r--r--doc/faqs.md8
-rw-r--r--doc/filters.md18
-rw-r--r--doc/getting-started.md1
-rw-r--r--doc/jats.md36
-rw-r--r--doc/lua-filters.md743
-rw-r--r--doc/short-guide-to-pandocs-sources.md259
8 files changed, 1676 insertions, 125 deletions
diff --git a/doc/custom-readers.md b/doc/custom-readers.md
new file mode 100644
index 000000000..37b6d6a3e
--- /dev/null
+++ b/doc/custom-readers.md
@@ -0,0 +1,684 @@
+---
+author:
+- John MacFarlane
+date: 'November 18, 2021'
+title: Creating Custom Pandoc Readers in Lua
+---
+
+# Introduction
+
+If you need to parse a format not already handled by pandoc,
+you can create a custom reader using the [Lua] language.
+Pandoc has a built-in Lua interpreter, so you needn't
+install any additional software to do this.
+
+[Lua]: https://www.lua.org
+
+A custom reader is a Lua file that defines a function
+called `Reader`, which takes two arguments:
+
+- the raw input to be parsed, as a list of sources
+- optionally, a table of reader options, e.g.
+ `{ columns = 62, standalone = true }`.
+
+The `Reader` function should return a `Pandoc` AST.
+This can be created using functions in the [`pandoc` module],
+which is automatically in scope. (Indeed, all of the utility
+functions that are available for [Lua filters] are available
+in custom readers, too.)
+
+Each source item corresponds to a file or stream passed to pandoc
+containing its text and name. E.g., if a single file `input.txt`
+is passed to pandoc, then the list of sources will contain just a
+single element `s`, where `s.name == 'input.txt'` and `s.text`
+contains the file contents as a string.
+
+The sources list, as well as each of its elements, can be
+converted to a string via the Lua standard library function
+`tostring`.
+
+[Lua filters]: https://pandoc.org/lua-filters.html
+[`pandoc` module]: https://pandoc.org/lua-filters.html#module-pandoc
+
+A minimal example would be
+
+```lua
+function Reader(input)
+ return pandoc.Pandoc({ pandoc.CodeBlock(tostring(input)) })
+end
+```
+
+This just returns a document containing a big code block with all
+of the input. Or, to create a separate code block for each input
+file, one might write
+
+``` lua
+function Reader(input)
+ return pandoc.Pandoc(input:map(
+ function (s) return pandoc.CodeBlock(s.text) end))
+end
+```
+
+In a nontrivial reader, you'll want to parse the input.
+You can do this using standard Lua library functions
+(for example, the [patterns] library), or with the powerful
+and fast [lpeg] parsing library, which is automatically in scope.
+You can also use external Lua libraries (for example,
+an XML parser).
+
+A previous pandoc version passed a raw string instead of a list
+of sources to the Reader function. Reader functions that rely on
+this are obsolete, but still supported: Pandoc analyzes any
+script error, detecting when code assumed the old behavior. The
+code is rerun with raw string input in this case, thereby
+ensuring backwards compatibility.
+
+[patterns]: http://lua-users.org/wiki/PatternsTutorial
+[lpeg]: http://www.inf.puc-rio.br/~roberto/lpeg/
+
+# Example: plain text reader
+
+This is a simple example using [lpeg] to parse the input
+into space-separated strings and blankline-separated paragraphs.
+
+```lua
+-- A sample custom reader that just parses text into blankline-separated
+-- paragraphs with space-separated words.
+
+-- For better performance we put these functions in local variables:
+local P, S, R, Cf, Cc, Ct, V, Cs, Cg, Cb, B, C, Cmt =
+ lpeg.P, lpeg.S, lpeg.R, lpeg.Cf, lpeg.Cc, lpeg.Ct, lpeg.V,
+ lpeg.Cs, lpeg.Cg, lpeg.Cb, lpeg.B, lpeg.C, lpeg.Cmt
+
+local whitespacechar = S(" \t\r\n")
+local wordchar = (1 - whitespacechar)
+local spacechar = S(" \t")
+local newline = P"\r"^-1 * P"\n"
+local blanklines = newline * (spacechar^0 * newline)^1
+local endline = newline - blanklines
+
+-- Grammar
+G = P{ "Pandoc",
+ Pandoc = Ct(V"Block"^0) / pandoc.Pandoc;
+ Block = blanklines^0 * V"Para" ;
+ Para = Ct(V"Inline"^1) / pandoc.Para;
+ Inline = V"Str" + V"Space" + V"SoftBreak" ;
+ Str = wordchar^1 / pandoc.Str;
+ Space = spacechar^1 / pandoc.Space;
+ SoftBreak = endline / pandoc.SoftBreak;
+}
+
+function Reader(input)
+ return lpeg.match(G, tostring(input))
+end
+```
+
+Example of use:
+
+```
+% pandoc -f plain.lua -t native
+*Hello there*, this is plain text with no formatting
+except paragraph breaks.
+
+- Like this one.
+^D
+[ Para
+ [ Str "*Hello"
+ , Space
+ , Str "there*,"
+ , Space
+ , Str "this"
+ , Space
+ , Str "is"
+ , Space
+ , Str "plain"
+ , Space
+ , Str "text"
+ , Space
+ , Str "with"
+ , Space
+ , Str "no"
+ , Space
+ , Str "formatting"
+ , SoftBreak
+ , Str "except"
+ , Space
+ , Str "paragraph"
+ , Space
+ , Str "breaks."
+ ]
+, Para
+ [ Str "-"
+ , Space
+ , Str "Like"
+ , Space
+ , Str "this"
+ , Space
+ , Str "one."
+ ]
+]
+```
+
+# Example: a RIS bibliography reader
+
+This is a parser for [RIS bibliography] files. It can be used
+to convert them to CSL JSON or YAML, BibTeX, or BibLaTeX.
+
+[RIS bibliography]: https://en.wikipedia.org/wiki/RIS_(file_format)
+
+```lua
+-- A sample custom reader for RIS bibliography format
+-- https://en.wikipedia.org/wiki/RIS_(file_format)
+-- The references are converted to inline pandoc/CSL YAML
+-- references in the metadata.
+
+local inspect = require"inspect"
+
+local types =
+ { ABST = "article",
+ ADVS = "motion-picture",
+ AGGR = "dataset",
+ ANCIENT = "book",
+ ART = "graphic",
+ BILL = "bill",
+ BLOG = "post-weblog",
+ BOOK = "book",
+ CASE = "legal_case",
+ CHAP = "chapter",
+ CHART = "graphic",
+ CLSWK = "book",
+ COMP = "program",
+ CONF = "paper-conference",
+ CPAPER = "paper-conference",
+ CTLG = "catalog",
+ DATA = "dataset",
+ DBASE = "dataset",
+ DICT = "book",
+ EBOOK = "book",
+ ECHAP = "chapter",
+ EDBOOK = "book",
+ EJOUR = "article",
+ WEB = "webpage",
+ ENCYC = "entry-encyclopedia",
+ EQUA = "figure",
+ FIGURE = "figure",
+ GEN = "entry",
+ GOVDOC = "report",
+ GRANT = "report",
+ HEAR = "report",
+ ICOMM = "personal_communication",
+ INPR = "article-journal",
+ JFULL = "article-journal",
+ JOUR = "article-journal",
+ LEGAL = "legal_case",
+ MANSCPT = "manuscript",
+ MAP = "map",
+ MGZN = "article-magazine",
+ MPCT = "motion-picture",
+ MULTI = "webpage",
+ MUSIC = "musical_score",
+ NEWS = "article-newspaper",
+ PAMP = "pamphlet",
+ PAT = "patent",
+ PCOMM = "personal_communication",
+ RPRT = "report",
+ SER = "article",
+ SLIDE = "graphic",
+ SOUND = "musical_score",
+ STAND = "report",
+ STAT = "legislation",
+ THES = "thesis",
+ UNBILL = "bill",
+ UNPB = "unpublished",
+ VIDEO = "graphic"
+ }
+
+local function clean(refpairs)
+ local ref = {}
+ for i = 1, #refpairs do
+ local k,v = table.unpack(refpairs[i])
+ if k == "TY" then
+ ref["type"] = types[v]
+ elseif k == "VL" then
+ ref.volume = v
+ elseif k == "KW" then
+ ref.keyword = v
+ elseif k == "PB" then
+ ref.publisher = v
+ elseif k == "CY" or k == "PP" then
+ ref["publisher-place"] = v
+ elseif k == "SP" then
+ if ref.page then
+ ref.page = v .. ref.page
+ else
+ ref.page = v
+ end
+ elseif k == "EP" then
+ if ref.page then
+ ref.page = ref.page .. "-" .. v
+ else
+ ref.page = "-" .. v
+ end
+ elseif k == "AU" or k == "A1" or k == "A2" or k == "A3" then
+ if ref.author then
+ table.insert(ref.author, v)
+ else
+ ref.author = {v}
+ end
+ elseif k == "TI" or k == "T1" or k == "CT" or
+ (k == "BT" and ref.type == "book") then
+ ref.title = v
+ elseif k == "ET" then
+ ref.edition = v
+ elseif k == "NV" then
+ ref["number-of-volumes"] = v
+ elseif k == "AB" then
+ ref.abstract = v
+ elseif k == "ED" then
+ if ref.editor then
+ table.insert(ref.editor, v)
+ else
+ ref.editor = {v}
+ end
+ elseif k == "JO" or k == "JF" or k == "T2" or
+ (k == "BT" and ref.type ~= "book") then
+ ref["container-title"] = v
+ elseif k == "PY" or k == "Y1" then
+ ref.issued = v
+ elseif k == "IS" then
+ ref.issue = v
+ elseif k == "SN" then
+ ref.ISSN = v
+ elseif k == "L" then
+ ref.lang = v
+ elseif k == "UR" or k == "LK" then
+ ref.URL = v
+ end
+ end
+ return ref
+end
+
+function Reader(input, reader_options)
+ local refs = {}
+ local thisref = {}
+ local ids = {}
+ for line in string.gmatch(tostring(input), "[^\n]*") do
+ key, val = string.match(line, "([A-Z][A-Z0-9]) %- (.*)")
+ if key == "ER" then
+ -- clean up fields
+ local newref = clean(thisref)
+ -- ensure we have an id and if not, create a sensible one
+ if not newref.id then
+ newref.id = ""
+ for _,x in ipairs(newref.author) do
+ newref.id = newref.id .. string.match(pandoc.utils.stringify(x), "%a+")
+ end
+ if newref.issued then
+ newref.id = newref.id .. string.match(newref.issued, "%d+")
+ end
+ if ids[newref.id] then -- add disambiguator if needed
+ newref.id = newref.id .. "-" .. #ids
+ end
+ end
+ table.insert(ids, newref.id)
+ table.insert(refs, newref)
+ thisref = {}
+ elseif key then
+ table.insert(thisref, {key, val})
+ end
+ end
+ return pandoc.Pandoc({}, pandoc.Meta { references = refs } )
+end
+```
+
+Example of use:
+
+```
+% pandoc -f ris.lua -t bibtex
+TY - JOUR
+AU - Shannon, Claude E.
+PY - 1948
+DA - July
+TI - A Mathematical Theory of Communication
+T2 - Bell System Technical Journal
+SP - 379
+EP - 423
+VL - 27
+ER -
+TY - JOUR
+T1 - On computable numbers, with an application to the Entscheidungsproblem
+A1 - Turing, Alan Mathison
+JO - Proc. of London Mathematical Society
+VL - 47
+IS - 1
+SP - 230
+EP - 265
+Y1 - 1937
+ER -
+^D
+@article{Shannon1948,
+ author = {Shannon, Claude E.},
+ title = {A {Mathematical} {Theory} of {Communication}},
+ journal = {Bell System Technical Journal},
+ volume = {27},
+ pages = {379-423},
+ year = {1948}
+}
+@article{Turing1937,
+ author = {Turing, Alan Mathison},
+ title = {On Computable Numbers, with an Application to the
+ {Entscheidungsproblem}},
+ journal = {Proc. of London Mathematical Society},
+ volume = {47},
+ number = {1},
+ pages = {230-265},
+ year = {1937}
+}
+```
+
+# Example: a wiki Creole reader
+
+This is a parser for [Creole common wiki markup].
+It uses an [lpeg] grammar. Fun fact: this custom reader is faster than
+pandoc's built-in creole reader! This shows that high-performance
+readers can be designed in this way.
+
+[Creole common wiki markup]: http://www.wikicreole.org/wiki/CheatSheet
+
+
+```lua
+-- A sample custom reader for Creole 1.0 (common wiki markup)
+-- http://www.wikicreole.org/wiki/CheatSheet
+
+-- For better performance we put these functions in local variables:
+local P, S, R, Cf, Cc, Ct, V, Cs, Cg, Cb, B, C, Cmt =
+ lpeg.P, lpeg.S, lpeg.R, lpeg.Cf, lpeg.Cc, lpeg.Ct, lpeg.V,
+ lpeg.Cs, lpeg.Cg, lpeg.Cb, lpeg.B, lpeg.C, lpeg.Cmt
+
+local whitespacechar = S(" \t\r\n")
+local specialchar = S("/*~[]\\{}|")
+local wordchar = (1 - (whitespacechar + specialchar))
+local spacechar = S(" \t")
+local newline = P"\r"^-1 * P"\n"
+local blankline = spacechar^0 * newline
+local endline = newline * #-blankline
+local endequals = spacechar^0 * P"="^0 * spacechar^0 * newline
+local cellsep = spacechar^0 * P"|"
+
+local function trim(s)
+ return (s:gsub("^%s*(.-)%s*$", "%1"))
+end
+
+local function ListItem(lev, ch)
+ local start
+ if ch == nil then
+ start = S"*#"
+ else
+ start = P(ch)
+ end
+ local subitem = function(c)
+ if lev < 6 then
+ return ListItem(lev + 1, c)
+ else
+ return (1 - 1) -- fails
+ end
+ end
+ local parser = spacechar^0
+ * start^lev
+ * #(- start)
+ * spacechar^0
+ * Ct((V"Inline" - (newline * spacechar^0 * S"*#"))^0)
+ * newline
+ * (Ct(subitem("*")^1) / pandoc.BulletList
+ +
+ Ct(subitem("#")^1) / pandoc.OrderedList
+ +
+ Cc(nil))
+ / function (ils, sublist)
+ return { pandoc.Plain(ils), sublist }
+ end
+ return parser
+end
+
+-- Grammar
+G = P{ "Doc",
+ Doc = Ct(V"Block"^0)
+ / pandoc.Pandoc ;
+ Block = blankline^0
+ * ( V"Header"
+ + V"HorizontalRule"
+ + V"CodeBlock"
+ + V"List"
+ + V"Table"
+ + V"Para") ;
+ Para = Ct(V"Inline"^1)
+ * newline
+ / pandoc.Para ;
+ HorizontalRule = spacechar^0
+ * P"----"
+ * spacechar^0
+ * newline
+ / pandoc.HorizontalRule;
+ Header = (P("=")^1 / string.len)
+ * spacechar^1
+ * Ct((V"Inline" - endequals)^1)
+ * endequals
+ / pandoc.Header;
+ CodeBlock = P"{{{"
+ * blankline
+ * C((1 - (newline * P"}}}"))^0)
+ * newline
+ * P"}}}"
+ / pandoc.CodeBlock;
+ Placeholder = P"<<<"
+ * C(P(1) - P">>>")^0
+ * P">>>"
+ / function() return pandoc.Div({}) end;
+ List = V"BulletList"
+ + V"OrderedList" ;
+ BulletList = Ct(ListItem(1,'*')^1)
+ / pandoc.BulletList ;
+ OrderedList = Ct(ListItem(1,'#')^1)
+ / pandoc.OrderedList ;
+ Table = (V"TableHeader" + Cc{})
+ * Ct(V"TableRow"^1)
+ / function(headrow, bodyrows)
+ local numcolumns = #(bodyrows[1])
+ local aligns = {}
+ local widths = {}
+ for i = 1,numcolumns do
+ aligns[i] = pandoc.AlignDefault
+ widths[i] = 0
+ end
+ return pandoc.utils.from_simple_table(
+ pandoc.SimpleTable({}, aligns, widths, headrow, bodyrows))
+ end ;
+ TableHeader = Ct(V"HeaderCell"^1)
+ * cellsep^-1
+ * spacechar^0
+ * newline ;
+ TableRow = Ct(V"BodyCell"^1)
+ * cellsep^-1
+ * spacechar^0
+ * newline ;
+ HeaderCell = cellsep
+ * P"="
+ * spacechar^0
+ * Ct((V"Inline" - (newline + cellsep))^0)
+ / function(ils) return { pandoc.Plain(ils) } end ;
+ BodyCell = cellsep
+ * spacechar^0
+ * Ct((V"Inline" - (newline + cellsep))^0)
+ / function(ils) return { pandoc.Plain(ils) } end ;
+ Inline = V"Emph"
+ + V"Strong"
+ + V"LineBreak"
+ + V"Link"
+ + V"URL"
+ + V"Image"
+ + V"Str"
+ + V"Space"
+ + V"SoftBreak"
+ + V"Escaped"
+ + V"Placeholder"
+ + V"Code"
+ + V"Special" ;
+ Str = wordchar^1
+ / pandoc.Str;
+ Escaped = P"~"
+ * C(P(1))
+ / pandoc.Str ;
+ Special = specialchar
+ / pandoc.Str;
+ Space = spacechar^1
+ / pandoc.Space ;
+ SoftBreak = endline
+ * # -(V"HorizontalRule" + V"CodeBlock")
+ / pandoc.SoftBreak ;
+ LineBreak = P"\\\\"
+ / pandoc.LineBreak ;
+ Code = P"{{{"
+ * C((1 - P"}}}")^0)
+ * P"}}}"
+ / trim / pandoc.Code ;
+ Link = P"[["
+ * C((1 - (P"]]" + P"|"))^0)
+ * (P"|" * Ct((V"Inline" - P"]]")^1))^-1 * P"]]"
+ / function(url, desc)
+ local txt = desc or {pandoc.Str(url)}
+ return pandoc.Link(txt, url)
+ end ;
+ Image = P"{{"
+ * #-P"{"
+ * C((1 - (S"}"))^0)
+ * (P"|" * Ct((V"Inline" - P"}}")^1))^-1
+ * P"}}"
+ / function(url, desc)
+ local txt = desc or ""
+ return pandoc.Image(txt, url)
+ end ;
+ URL = P"http"
+ * P"s"^-1
+ * P":"
+ * (1 - (whitespacechar + (S",.?!:;\"'" * #whitespacechar)))^1
+ / function(url)
+ return pandoc.Link(pandoc.Str(url), url)
+ end ;
+ Emph = P"//"
+ * Ct((V"Inline" - P"//")^1)
+ * P"//"
+ / pandoc.Emph ;
+ Strong = P"**"
+ * Ct((V"Inline" -P"**")^1)
+ * P"**"
+ / pandoc.Strong ;
+}
+
+function Reader(input, reader_options)
+ return lpeg.match(G, tostring(input))
+end
+```
+
+Example of use:
+
+```
+% pandoc -f creole.lua -t markdown
+== Wiki Creole
+
+You can make things **bold** or //italic// or **//both//** or //**both**//.
+
+Character formatting extends across line breaks: **bold,
+this is still bold. This line deliberately does not end in star-star.
+
+Not bold. Character formatting does not cross paragraph boundaries.
+
+You can use [[internal links]] or [[http://www.wikicreole.org|external links]],
+give the link a [[internal links|different]] name.
+^D
+## Wiki Creole
+
+You can make things **bold** or *italic* or ***both*** or ***both***.
+
+Character formatting extends across line breaks: \*\*bold, this is still
+bold. This line deliberately does not end in star-star.
+
+Not bold. Character formatting does not cross paragraph boundaries.
+
+You can use [internal links](internal links) or [external
+links](http://www.wikicreole.org), give the link a
+[different](internal links) name.
+```
+
+# Example: parsing JSON from an API
+
+This custom reader consumes the JSON output of
+<https://www.reddit.com/r/haskell.json> and produces
+a document containing the current top articles on the
+Haskell subreddit.
+
+It assumes that the `luajson` library is available. (It can be
+installed using `luarocks install luajson`---but be sure you are
+installing it for Lua 5.3, which is the version packaged with
+pandoc.)
+
+
+```lua
+-- consumes the output of https://www.reddit.com/r/haskell.json
+
+local json = require'json' -- luajson must be available
+
+local function read_inlines(raw)
+ local doc = pandoc.read(raw, "commonmark")
+ return pandoc.utils.blocks_to_inlines(doc.blocks)
+end
+
+local function read_blocks(raw)
+ local doc = pandoc.read(raw, "commonmark")
+ return doc.blocks
+end
+
+function Reader(input)
+
+ local parsed = json.decode(tostring(input))
+ local blocks = {}
+
+ for _,entry in ipairs(parsed.data.children) do
+ local d = entry.data
+ table.insert(blocks, pandoc.Header(2,
+ pandoc.Link(read_inlines(d.title), d.url)))
+ for _,block in ipairs(read_blocks(d.selftext)) do
+ table.insert(blocks, block)
+ end
+ end
+
+ return pandoc.Pandoc(blocks)
+
+end
+```
+
+Similar code can be used to consume JSON output from other APIs.
+
+Note that the content of the text fields is markdown, so we
+convert it using `pandoc.read()`.
+
+
+# Example: syntax-highlighted code files
+
+This is a reader that puts the content of each input file into a
+code block, sets the file's extension as the block's class to
+enable code highlighting, and places the filename as a header
+above each code block.
+
+``` lua
+function to_code_block (source)
+ local _, lang = pandoc.path.split_extension(source.name)
+ return pandoc.Div{
+ pandoc.Header(1, source.name == '' and '<stdin>' or source.name),
+ pandoc.CodeBlock(source.text, {class=lang}),
+ }
+end
+
+function Reader (input, opts)
+ return pandoc.Pandoc(input:map(to_code_block))
+end
+```
diff --git a/doc/custom-writers.md b/doc/custom-writers.md
new file mode 100644
index 000000000..6df603288
--- /dev/null
+++ b/doc/custom-writers.md
@@ -0,0 +1,52 @@
+---
+author:
+- John MacFarlane
+date: 'November 21, 2021'
+title: Creating Custom Pandoc Writers in Lua
+---
+
+# Introduction
+
+If you need to render a format not already handled by pandoc,
+or you want to change how pandoc renders a format,
+you can create a custom writer using the [Lua] language.
+Pandoc has a built-in Lua interpreter, so you needn't
+install any additional software to do this.
+
+[Lua]: https://www.lua.org
+
+A custom writer is a Lua file that defines functions for
+rendering each element of a pandoc AST.
+
+For example,
+
+``` lua
+function Para(s)
+ return "<paragraph>" .. s .. "</paragraph>"
+end
+```
+
+The best way to go about creating a custom writer is to modify
+the example that comes with pandoc. To get the example, you
+can do
+
+```
+pandoc --print-default-data-file sample.lua > sample.lua
+```
+
+# A custom HTML writer
+
+`sample.lua` is a full-features HTML writer, with explanatory
+comments. To use it, just use the path to the custom writer as
+the writer name:
+
+```
+pandoc -t sample.lua myfile.md
+```
+
+`sample.lua` defines all the functions needed by any custom
+writer, so you can design your own custom writer by modifying
+the functions in `sample.lua` according to your needs.
+
+``` {.lua include="sample.lua"}
+```
diff --git a/doc/faqs.md b/doc/faqs.md
index 2f3368837..317153d5f 100644
--- a/doc/faqs.md
+++ b/doc/faqs.md
@@ -124,5 +124,13 @@ topography as TeX, particularly when it comes to math, but they
may be fine for many purposes.)
+## Converting to PDF on an M1 Mac, I get a "Cannot allocate memory" error.
+
+We are not sure why this happens, but we have found that
+fully specifying the `pdflatex` path avoids the error. For
+example,
+
+ pandoc -o my.pdf --pdf-engine=/Library/TeX/texbin/pdflatex
+
:::
diff --git a/doc/filters.md b/doc/filters.md
index 004a83b7d..c921fc007 100644
--- a/doc/filters.md
+++ b/doc/filters.md
@@ -460,36 +460,36 @@ filter.
Object fields:
- `readerAbbreviations`
+ `abbreviations`
: set of known abbreviations (array of strings).
- `readerColumns`
+ `columns`
: number of columns in terminal; an integer.
- `readerDefaultImageExtension`
+ default-image-extension`
: default extension for images; a string.
- `readerExtensions`
+ `extensions`
: integer representation of the syntax extensions bit
field.
- `readerIndentedCodeClasses`
+ `indented-code-classes`
: default classes for indented code blocks; array of
strings.
- `readerStandalone`
+ `standalone`
: whether the input was a standalone document with header;
either `true` or `false`.
- `readerStripComments`
+ `strip-comments`
: HTML comments are stripped instead of parsed as raw HTML;
either `true` or `false`.
- `readerTabStop`
+ `tab-stop`
: width (i.e. equivalent number of spaces) of tab stops;
integer.
- `readerTrackChanges`
+ `track-changes`
: track changes setting for docx; one of
`"accept-changes"`, `"reject-changes"`, and
`"all-changes"`.
diff --git a/doc/getting-started.md b/doc/getting-started.md
index a8f7315fd..b73923daa 100644
--- a/doc/getting-started.md
+++ b/doc/getting-started.md
@@ -212,6 +212,7 @@ Now save your file as `test1.md` in the directory
Note: If you use plain text a lot, you'll want a better editor than
`Notepad` or `TextEdit`. You might want to look at
+[Visual Studio Code](https://code.visualstudio.com/) or
[Sublime Text](https://www.sublimetext.com/) or (if you're willing
to put in some time learning an unfamiliar interface)
[Vim](https://www.vim.org) or [Emacs](https://www.gnu.org/software/emacs).
diff --git a/doc/jats.md b/doc/jats.md
index c099e0a04..9b8351840 100644
--- a/doc/jats.md
+++ b/doc/jats.md
@@ -45,17 +45,28 @@ Metadata Values
element.
`affiliation`
- : list of affiliation identifiers; marks the organizations
- with which an author is affiliated. Each identifier in this
- list must also occur as the `id` of an affiliation listed in
- the top-level `affiliation` list.
+ : either full affiliation entries as described in field
+ `affiliation`, or a list of affiliation identifiers.
+
+ The identifiers link to the organizations with which an
+ author is affiliated. Each identifier in this list must
+ also occur as the `id` of an affiliation listed in the
+ top-level `affiliation` list.
+
+ If the top-level `affiliation` field is set, then this
+ entry assumed to be a list of identifiers, and a list of
+ full entries if that field is unset.
+
+ Full entries must be given if the articleauthoring tag
+ set it used, as affiliation links are not allowed in that
+ schema.
`equal-contrib`
: boolean attribute used to mark authors who contributed
equally to the work. The
- [`equal-contrib`][attr:equal-contrib] attribute is added
- to the author's [`<contrib>`] element if this is set to a
- truthy value.
+ [`equal-contrib`][attr:equal-contrib] attribute, set to
+ `yes`, is added to the author's [`<contrib>`] element if
+ this is set to a truthy value.
`cor-id`
: identifier linking to the contributor's correspondence
@@ -336,12 +347,21 @@ Metadata Values
: Additional notes concerning the whole article. Added to the
article's frontmatter via the [`<notes>`][elem:notes] element.
+`subtitle`
+: Subordinate part of the document title. Added to the
+ document's front matter as a
+ [`<subtitle>`][elem:article-title] element.
+
`tags`
: list of keywords. Items are used as contents of the
[`<kwd>`][elem:kwd] element; the elements are grouped in a
[`<kwd-group>`][elem:kwd-group] with the
[`kwd-group-type`][attr:kwd-group-type] value `author`.
+`title`
+: The article title. Added to the document's front matter via the
+ [`<article-title>`][elem:article-title] element.
+
Required Metadata
-----------------
@@ -378,6 +398,7 @@ Required metadata values:
[elem:abstract]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/abstract.html
[elem:article-id]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/article-id.html
[elem:article-meta]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/article-meta.html
+[elem:article-title]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/article-title.html
[elem:copyright-holder]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/copyright-holder.html
[elem:copyright-statement]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/copyright-statement.html
[elem:copyright-year]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/copyright-year.html
@@ -400,6 +421,7 @@ Required metadata values:
[elem:string-name]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/string-name.html
[elem:subj-group]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/subj-group.html
[elem:subject]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/subject.html
+[elem:subtitle]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/subtitle.html
[elem:surname]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/surname.html
[elem:xref]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/xref.html
diff --git a/doc/lua-filters.md b/doc/lua-filters.md
index e2a65fe1e..eba4dcfe9 100644
--- a/doc/lua-filters.md
+++ b/doc/lua-filters.md
@@ -109,7 +109,8 @@ element filtering function. In other words, filter entries will
be called for each corresponding element in the document,
getting the respective element as input.
-The return of a filter function must be one of the following:
+The return value of a filter function must be one of the
+following:
- nil: this means that the object should remain unchanged.
- a pandoc object: this must be of the same type as the input
@@ -173,7 +174,26 @@ This functionality has been added in pandoc 2.9.2.
[Inlines filter example]: #remove-spaces-before-citations
-## Execution Order
+## Traversal order
+
+The traversal order of filters can be selected by setting the key
+`traverse` to either `'topdown'` or `'typewise'`; the default is
+`'typewise'`.
+
+Example:
+
+``` lua
+local filter = {
+ traverse = 'topdown',
+ -- ... filter functions ...
+}
+return {filter}
+```
+
+Support for this was added in pandoc 2.16.3; previous versions
+ignore the `traverse` setting.
+
+### Typewise traversal
Element filter functions within a filter set are called in a
fixed order, skipping any which are not present:
@@ -203,6 +223,31 @@ All functions in set (1) are thus run before those in (2),
causing the filter function for *Meta* to be run before the
filtering of *Str* elements is started.
+### Topdown traversal
+
+It is sometimes more natural to traverse the document tree
+depth-first from the root towards the leaves, and all in a single
+run.
+
+For example, a block list `[Plain [Str "a"], Para [Str
+"b"]]`{.haskell} will try the following filter functions, in
+order: `Blocks`, `Plain`, `Inlines`, `Str`, `Para`, `Inlines`,
+`Str`.
+
+Topdown traversals can be cut short by returning `false` as a
+second value from the filter function. No child-element of
+the returned element is processed in that case.
+
+For example, to exclude the contents of a footnote from being
+processed, one might write
+
+``` lua
+traverse = 'topdown'
+function Note (n)
+ return n, false
+end
+```
+
## Global variables
Pandoc passes additional data to Lua filters by setting global
@@ -245,6 +290,38 @@ variables.
variable is of type [CommonState] and
is read-only.
+`pandoc`
+: The *pandoc* module, described in the next section, is
+ available through the global `pandoc`. The other modules
+ described herein are loaded as subfields under their
+ respective name.
+
+`lpeg`
+: This variable holds the `lpeg` module, a package based on
+ Parsing Expression Grammars (PEG). It provides excellent
+ parsing utilities and is documented on the official [LPeg
+ homepage]. Pandoc uses a built-int version of the library,
+ unless it has been configured by the package maintainer to
+ rely on a system-wide installation.
+
+ Note that the result of `require 'lpeg'` is not necessarily
+ equal to this value; the `require` mechanism prefers the
+ system's lpeg library over the built-in version.
+
+`re`
+: Contains the LPeg.re module, which is built on top of LPeg
+ and offers an implementation of a [regex engine]. Pandoc
+ uses a built-in version of the library, unless it has been
+ configured by the package maintainer to rely on a system-wide
+ installation.
+
+ Note that the result of `require 're` is not necessarily
+ equal to this value; the `require` mechanism prefers the
+ system's lpeg library over the built-in version.
+
+[LPeg homepage]: http://www.inf.puc-rio.br/~roberto/lpeg/
+[regex engine]: http://www.inf.puc-rio.br/~roberto/lpeg/re.html
+
# Pandoc Module
The `pandoc` Lua module is loaded into the filter's Lua
@@ -491,8 +568,9 @@ will output:
This is the filter we use when converting `MANUAL.txt` to man
pages. It converts level-1 headers to uppercase (using
-`walk_block` to transform inline elements inside headers),
-removes footnotes, and replaces links with regular text.
+[`walk`](#type-block:walk) to transform inline elements inside
+headers), removes footnotes, and replaces links with regular
+text.
``` lua
-- we use preloaded text to get a UTF-8 aware 'upper' function
@@ -500,10 +578,11 @@ local text = require('text')
function Header(el)
if el.level == 1 then
- return pandoc.walk_block(el, {
+ return el:walk {
Str = function(el)
return pandoc.Str(text.upper(el.text))
- end })
+ end
+ }
end
end
@@ -579,7 +658,7 @@ wordcount = {
function Pandoc(el)
-- skip metadata, just count body:
- pandoc.walk_block(pandoc.Div(el.blocks), wordcount)
+ el.blocks:walk(wordcount)
print(words .. " words in body")
os.exit(0)
end
@@ -695,12 +774,12 @@ end
function RawBlock(el)
if starts_with('\\begin{tikzpicture}', el.text) then
local filetype = extension_for[FORMAT] or 'svg'
- local fname = system.get_working_directory() .. '/' ..
- pandoc.sha1(el.text) .. '.' .. filetype
+ local fbasename = pandoc.sha1(el.text) .. '.' .. filetype
+ local fname = system.get_working_directory() .. '/' .. fbasename
if not file_exists(fname) then
tikz2image(el.text, filetype, fname)
end
- return pandoc.Para({pandoc.Image({}, fname)})
+ return pandoc.Para({pandoc.Image({}, fbasename)})
else
return el
end
@@ -752,8 +831,8 @@ Usage:
Pandoc document
Values of this type can be created with the
-[`pandoc.Pandoc`](#pandoc.pandoc) constructor. Object equality is
-determined via [`pandoc.utils.equals`].
+[`pandoc.Pandoc`](#pandoc.pandoc) constructor. Pandoc values are
+equal in Lua if and only if they are equal in Haskell.
`blocks`
: document content ([List] of [Blocks])
@@ -761,83 +840,113 @@ determined via [`pandoc.utils.equals`].
`meta`
: document meta information ([Meta] object)
-## Meta {#type-meta}
-Meta information on a document; string-indexed collection of
-[MetaValues].
+### walk {#type-pandoc:walk}
-Values of this type can be created with the
-[`pandoc.Meta`](#pandoc.meta) constructor. Object equality is
-determined via [`pandoc.utils.equals`].
+`walk(self, lua_filter)`
-## MetaValue {#type-metavalue}
+Applies a Lua filter to the Pandoc element. Just as for
+full-document filters, the order in which elements are traversed
+can be controlled by setting the `traverse` field of the filter;
+see the section on [traversal order][Traversal order].
-Document meta information items.
+Parameters:
-Object equality is determined via [`pandoc.utils.equals`].
+`self`
+: the element ([Pandoc](#type-pandoc))
-### MetaBlocks {#type-metablocks}
+`lua_filter`
+: map of filter functions (table)
-A list of blocks usable as meta value ([List] of [Blocks]).
+Result:
-Fields:
+- filtered document ([Pandoc][])
-`tag`, `t`
-: the literal `MetaBlocks` (string)
+Usage:
-### MetaBool {#type-metabool}
+ -- returns `pandoc.Pandoc{pandoc.Para{pandoc.Str 'Bye'}}`
+ return pandoc.Pandoc{pandoc.Para('Hi')}:walk {
+ Str = function (_) return 'Bye' end,
+ }
-Alias for Lua boolean, i.e. the values `true` and `false`.
-### MetaInlines {#type-metainlines}
+## Meta {#type-meta}
-List of inlines used in metadata ([List] of [Inlines])
+Meta information on a document; string-indexed collection of
+[MetaValues].
Values of this type can be created with the
-[`pandoc.MetaInlines`](#pandoc.metainlines) constructor.
+[`pandoc.Meta`](#pandoc.meta) constructor. Meta values are equal
+in Lua if and only if they are equal in Haskell.
-Fields:
-
-`tag`, `t`
-: the literal `MetaInlines` (string)
+## MetaValue {#type-metavalue}
-### MetaList {#type-metalist}
+Document meta information items. This is not a separate type, but
+describes a set of types that can be used in places were a
+MetaValue is expected. The types correspond to the following
+Haskell type constructors:
+
+- boolean → MetaBool
+- string or number → MetaString
+- Inlines → MetaInlines
+- Blocks → MetaBlocks
+- List/integer indexed table → MetaList
+- string-indexed table → MetaMap
+
+The corresponding constructors
+[`pandoc.MetaBool`](#pandoc.metabool),
+[`pandoc.MetaString`](#pandoc.metastring),
+[`pandoc.MetaInlines`](#pandoc.metainlines),
+[`pandoc.MetaBlocks`](#pandoc.metablocks),
+[`pandoc.MetaList`](#pandoc.metalist), and
+[`pandoc.MetaMap`](#pandoc.metamap)
+can be used to ensure that a value is treated in the intended
+way. E.g., an empty table is normally treated as a `MetaMap`, but
+can be made into an empty `MetaList` by calling
+`pandoc.MetaList{}`. However, the same can be accomplished by
+using the generic functions like `pandoc.List`, `pandoc.Inlines`,
+or `pandoc.Blocks`.
-A list of other metadata values ([List] of [MetaValues]).
+## Block {#type-block}
-Values of this type can be created with the
-[`pandoc.MetaList`](#pandoc.metalist) constructor.
+Block values are equal in Lua if and only if they are equal in
+Haskell.
-Fields:
+### Common methods
-`tag`, `t`
-: the literal `MetaList` (string)
+#### walk {#type-block:walk}
-All methods available for [List]s can be used on this type as
-well.
+`walk(self, lua_filter)`
-### MetaMap {#type-metamap}
+Applies a Lua filter to the block element. Just as for
+full-document filters, the order in which elements are traversed
+can be controlled by setting the `traverse` field of the filter;
+see the section on [traversal order][Traversal order].
-A string-indexed map of meta-values. (table).
+Note that the filter is applied to the subtree, but not to the
+`self` block element. The rationale is that otherwise the element
+could be deleted by the filter, or replaced with multiple block
+elements, which might lead to possibly unexpected results.
-Values of this type can be created with the
-[`pandoc.MetaMap`](#pandoc.metamap) constructor.
+Parameters:
-Fields:
+`self`
+: the element ([Block](#type-block))
-`tag`, `t`
-: the literal `MetaMap` (string)
+`lua_filter`
+: map of filter functions (table)
-*Note*: The fields will be shadowed if the map contains a field
-with the same name as those listed.
+Result:
-### MetaString {#type-metastring}
+- filtered block ([Block][])
-Plain Lua string value (string).
+Usage:
-## Block {#type-block}
+ -- returns `pandoc.Para{pandoc.Str 'Bye'}`
+ return pandoc.Para('Hi'):walk {
+ Str = function (_) return 'Bye' end,
+ }
-Object equality is determined via [`pandoc.utils.equals`].
### BlockQuote {#type-blockquote}
@@ -1133,9 +1242,94 @@ left-aligned, right-aligned, and centered, respectively. The
default alignment is `AlignDefault` (often equivalent to
centered).
+## Blocks {#type-blocks}
+
+List of [Block] elements, with the same methods as a generic
+[List](#type-list). It is usually not necessary to create values
+of this type in user scripts, as pandoc can convert other types
+into Blocks wherever a value of this type is expected:
+
+- a list of [Block] (or Block-like) values is used directly;
+- a single [Inlines] value is wrapped into a [Plain] element;
+- string values are turned into an [Inlines] value by splitting
+ the string into words (see [Inlines](#type-inlines)), and
+ then wrapping the result into a Plain singleton.
+
+### Methods
+
+Lists of type `Blocks` share all methods available in generic
+lists, see the [`pandoc.List` module](#module-pandoc.list).
+
+Additionally, the following methods are available on Blocks
+values:
+
+#### walk {#type-blocks:walk}
+
+`walk(self, lua_filter)`
+
+Applies a Lua filter to the Blocks list. Just as for
+full-document filters, the order in which elements are traversed
+can be controlled by setting the `traverse` field of the filter;
+see the section on [traversal order][Traversal order].
+
+Parameters:
+
+`self`
+: the list ([Blocks](#type-blocks))
+
+`lua_filter`
+: map of filter functions (table)
+
+Result:
+
+- filtered list ([Blocks](#type-blocks))
+
+Usage:
+
+ -- returns `pandoc.Blocks{pandoc.Para('Salve!')}`
+ return pandoc.Blocks{pandoc.Plain('Salve!)}:walk {
+ Plain = function (p) return pandoc.Para(p.content) end,
+ }
+
## Inline {#type-inline}
-Object equality is determined via [`pandoc.utils.equals`].
+Inline values are equal in Lua if and only if they are equal in
+Haskell.
+
+### Common methods
+
+#### walk {#type-inline:walk}
+
+`walk(self, lua_filter)`
+
+Applies a Lua filter to the Inline element. Just as for
+full-document filters, the order in which elements are traversed
+can be controlled by setting the `traverse` field of the filter;
+see the section on [traversal order][Traversal order].
+
+Note that the filter is applied to the subtree, but not to the
+`self` inline element. The rationale is that otherwise the
+element could be deleted by the filter, or replaced with multiple
+inline elements, which might lead to possibly unexpected results.
+
+Parameters:
+
+`self`
+: the element ([Inline](#type-inline))
+
+`lua_filter`
+: map of filter functions (table)
+
+Result:
+
+- filtered inline element ([Inline][])
+
+Usage:
+
+ -- returns `pandoc.SmallCaps('SPQR)`
+ return pandoc.SmallCaps('spqr'):walk {
+ Str = function (s) return string.upper(s.text) end,
+ }
### Cite {#type-cite}
@@ -1206,9 +1400,6 @@ Values of this type can be created with the
Fields:
-`attr`
-: attributes ([Attr])
-
`caption`
: text used to describe the image ([List] of [Inlines])
@@ -1216,7 +1407,10 @@ Fields:
: path to the image file (string)
`title`
-: brief image description
+: brief image description (string)
+
+`attr`
+: attributes ([Attr])
`identifier`
: alias for `attr.identifier` (string)
@@ -1503,6 +1697,57 @@ Fields:
`tag`, `t`
: the literal `Underline` (string)
+## Inlines {#type-inlines}
+
+List of [Inline] elements, with the same methods as a generic
+[List](#type-list). It is usually not necessary to create values
+of this type in user scripts, as pandoc can convert other types
+into Blocks wherever a value of this type is expected:
+
+- lists of [Inline] (or Inline-like) values are used directly;
+- single [Inline] values are converted into a list containing
+ just that element;
+- String values are split into words, converting line breaks
+ into [SoftBreak](#type-softbreak) elements, and other
+ whitespace characters into [Spaces](#type-space).
+
+### Methods
+
+Lists of type `Inlines` share all methods available in generic
+lists, see the [`pandoc.List` module](#module-pandoc.list).
+
+Additionally, the following methods are available on *Inlines*
+values:
+
+#### walk {#type-inlines:walk}
+
+`walk(self, lua_filter)`
+
+Applies a Lua filter to the Inlines list. Just as for
+full-document filters, the order in which elements are handled
+are are Inline → Inlines → Block → Blocks. The filter is applied
+to all list items *and* to the list itself.
+
+Parameters:
+
+`self`
+: the list ([Inlines](#type-inlines))
+
+`lua_filter`
+: map of filter functions (table)
+
+Result:
+
+- filtered list ([Inlines](#type-inlines))
+
+Usage:
+
+ -- returns `pandoc.Inlines{pandoc.SmallCaps('SPQR')}`
+ return pandoc.Inlines{pandoc.Emph('spqr')}:walk {
+ Str = function (s) return string.upper(s.text) end,
+ Emph = function (e) return pandoc.SmallCaps(e.content) end,
+ }
+
## Element components
@@ -1522,7 +1767,8 @@ This also works when using the `attr` setter:
local span = pandoc.Span 'text'
span.attr = {id = 'text', class = 'a b', other_attribute = '1'}
-Object equality is determined via [`pandoc.utils.equals`].
+Attr values are equal in Lua if and only if they are equal in
+Haskell.
Fields:
@@ -1540,6 +1786,9 @@ Fields:
List of key/value pairs. Values can be accessed by using keys as
indices to the list table.
+Attributes values are equal in Lua if and only if they are equal
+in Haskell.
+
### Caption {#type-caption}
The caption of a table, with an optional short caption.
@@ -1568,12 +1817,21 @@ Fields:
: cell contents (list of [Blocks]).
`col_span`
-: number of columns occupied by the cell; the height of the cell
- (integer).
+: number of columns spanned by the cell; the width of the cell
+ in columns (integer).
`row_span`
-: number of rows occupied by the cell; the height of the cell
- (integer).
+: number of rows spanned by the cell; the height of the cell in
+ rows (integer).
+
+`identifier`
+: alias for `attr.identifier` (string)
+
+`classes`
+: alias for `attr.classes` ([List] of strings)
+
+`attributes`
+: alias for `attr.attributes` ([Attributes])
### Citation {#type-citation}
@@ -1582,7 +1840,8 @@ Single citation entry
Values of this type can be created with the
[`pandoc.Citation`](#pandoc.citation) constructor.
-Object equality is determined via [`pandoc.utils.equals`].
+Citation values are equal in Lua if and only if they are equal in
+Haskell.
Fields:
@@ -1610,7 +1869,8 @@ Fields:
Column alignment and width specification for a single table
column.
-This is a pair with the following components:
+This is a pair, i.e., a plain table, with the following
+components:
1. cell alignment ([Alignment]).
2. table column width, as a fraction of the total table width
@@ -1623,8 +1883,6 @@ List attributes
Values of this type can be created with the
[`pandoc.ListAttributes`](#pandoc.listattributes) constructor.
-Object equality is determined via [`pandoc.utils.equals`].
-
Fields:
`start`
@@ -1643,10 +1901,13 @@ Fields:
A table row.
-Tuple fields:
+Fields:
-1. row attributes
-2. row cells (list of [Cells])
+`attr`
+: element attributes ([Attr][])
+
+`cells`
+: list of table cells ([List][] of [Cell][]s)
### TableBody {#type-tablebody}
@@ -1656,35 +1917,59 @@ number of row header columns.
Fields:
`attr`
-: table body attributes ([Attr])
+: table body attributes ([Attr][])
`body`
-: table body rows (list of [Rows])
+: table body rows ([List][] of [Row][]s)
`head`
-: intermediate head (list of [Rows])
+: intermediate head ([List][] of [Row][]s)
`row_head_columns`
: number of columns taken up by the row head of each row of a
- [TableBody]. The row body takes up the remaining columns.
+ [TableBody][]. The row body takes up the remaining columns.
### TableFoot {#type-tablefoot}
The foot of a table.
-This is a pair with the following components:
+Fields:
-1. attributes
-2. foot rows ([Rows])
+`attr`
+: element attributes ([Attr][])
+
+`rows`
+: list of rows ([List][] of [Row][]s)
+
+`identifier`
+: alias for `attr.identifier` (string)
+
+`classes`
+: alias for `attr.classes` ([List][] of strings)
+
+`attributes`
+: alias for `attr.attributes` ([Attributes][])
### TableHead {#type-tablehead}
The head of a table.
-This is a pair with the following components:
+Fields:
+
+`attr`
+: element attributes ([Attr][])
+
+`rows`
+: list of rows ([List][] of [Row][]s)
+
+`identifier`
+: alias for `attr.identifier` (string)
+
+`classes`
+: alias for `attr.classes` ([List][] of strings)
-1. attributes
-2. head rows ([Rows])
+`attributes`
+: alias for `attr.attributes` ([Attributes][])
## ReaderOptions {#type-readeroptions}
@@ -1836,7 +2121,7 @@ Values of this type can be created with the
`must_be_at_least(actual, expected [, error_message])`
Raise an error message if the actual version is older than the
-expected version; does nothing if actual is equal to or newer
+expected version; does nothing if `actual` is equal to or newer
than the expected version.
Parameters:
@@ -1867,6 +2152,7 @@ Usage:
[Block]: #type-block
[Blocks]: #type-block
[Caption]: #type-caption
+[Cell]: #type-cell
[Cells]: #type-cell
[Citation]: #type-citation
[Citations]: #type-citation
@@ -1884,6 +2170,8 @@ Usage:
[LogMessage]: #type-logmessage
[Pandoc]: #type-pandoc
[Para]: #type-para
+[Plain]: #type-plain
+[Row]: #type-row
[Rows]: #type-row
[SimpleTable]: #type-simpletable
[Table]: #type-table
@@ -1891,7 +2179,6 @@ Usage:
[TableFoot]: #type-tablefoot
[TableHead]: #type-tablehead
[Version]: #type-version
-[`pandoc.utils.equals`]: #pandoc.utils.equals
# Module text
@@ -1978,71 +2265,83 @@ format, and functions to filter and modify a subtree.
[`MetaBlocks (blocks)`]{#pandoc.metablocks}
-: Meta blocks
+: Creates a value to be used as a MetaBlocks value in meta
+ data; creates a copy of the input list via `pandoc.Blocks`,
+ discarding all non-list keys.
Parameters:
`blocks`:
: blocks
- Returns: [MetaBlocks] object
+ Returns: [Blocks](#type-blocks)
[`MetaInlines (inlines)`]{#pandoc.metainlines}
-: Meta inlines
+: Creates a value to be used as a MetaInlines value in meta
+ data; creates a copy of the input list via `pandoc.Inlines`,
+ discarding all non-list keys.
Parameters:
`inlines`:
: inlines
- Returns: [MetaInlines] object
+ Returns: [Inlines](#types-inlines)
[`MetaList (meta_values)`]{#pandoc.metalist}
-: Meta list
+: Creates a value to be used as a MetaList in meta data;
+ creates a copy of the input list via `pandoc.List`,
+ discarding all non-list keys.
Parameters:
`meta_values`:
: list of meta values
- Returns: [MetaList] object
+ Returns: [List]
[`MetaMap (key_value_map)`]{#pandoc.metamap}
-: Meta map
+: Creates a value to be used as a MetaMap in meta data; creates
+ a copy of the input table, keeping only pairs with string
+ keys and discards all other keys.
Parameters:
`key_value_map`:
: a string-indexed map of meta values
- Returns: [MetaMap] object
+ Returns: table
[`MetaString (str)`]{#pandoc.metastring}
-: Creates string to be used in meta data.
+: Creates a value to be used as a MetaString in meta data; this
+ is the identity function for boolean values and exists only
+ for completeness.
Parameters:
`str`:
: string value
- Returns: [MetaString] object
+ Returns: string
[`MetaBool (bool)`]{#pandoc.metabool}
-: Creates boolean to be used in meta data.
+: Creates a value to be used as MetaBool in meta data; this is
+ the identity function for boolean values and exists only for
+ completeness.
Parameters:
`bool`:
: boolean value
- Returns: [MetaBool] object
+ Returns: boolean
-## Blocks
+## Block
[`BlockQuote (content)`]{#pandoc.blockquote}
@@ -2222,6 +2521,20 @@ format, and functions to filter and modify a subtree.
Returns: [Table](#type-table) object
+## Blocks
+
+[`Blocks (block_like_elements)`]{#pandoc.blocks}
+
+: Creates a [Blocks](#type-blocks) list.
+
+ Parameters:
+
+ `block_like_elements`:
+ : List where each element can be treated as a [Block]
+ value, or a single such value.
+
+ Returns: [Blocks] list
+
## Inline
[`Cite (content, citations)`]{#pandoc.cite}
@@ -2511,6 +2824,27 @@ format, and functions to filter and modify a subtree.
Returns: [Underline](#type-underline) object
+## Inlines
+
+[`Inlines (inline_like_elements)`]{#pandoc.inlines}
+
+: Converts its argument into an [Inlines](#type-inlines) list:
+
+ - copies a list of [Inline] elements into a fresh list; any
+ string `s` within the list is treated as `pandoc.Str(s)`;
+ - turns a single [Inline] into a singleton list;
+ - splits a string into `Str`-wrapped words, treating
+ interword spaces as `Space`s or `SoftBreak`s.
+
+ Parameters:
+
+ `inline_like_elements`:
+ : List where each element can be treated as an [Inline]
+ values, or just a single such value.
+
+ Returns: [Inlines] list
+
+
## Element components
[`Attr ([identifier[, classes[, attributes]]])`]{#pandoc.attr}
@@ -2530,6 +2864,33 @@ format, and functions to filter and modify a subtree.
Returns: [Attr](#type-attr) object
+[`Cell (blocks[, align[, rowspan[, colspan[, attr]]]])`]{#pandoc.attr}
+
+: Create a new table cell.
+
+ Parameters:
+
+ `blocks`:
+ : cell contents (list of [Blocks])
+
+ `align`:
+ : text alignment; defaults to `AlignDefault` (Alignment)
+
+ `rowspan`:
+ : number of rows occupied by the cell; defaults to `1`
+ (integer)
+
+ `colspan`:
+ : number of columns spanned by the cell; defaults to `1`
+ (integer)
+
+ `attr`:
+ : cell attributes ([Attr](#type-attr))
+
+ Returns:
+
+ - [Cell](#type-cell) object
+
[`Citation (id, mode[, prefix[, suffix[, note_num[, hash]]]])`]{#pandoc.citation}
: Creates a single citation.
@@ -2573,6 +2934,42 @@ format, and functions to filter and modify a subtree.
Returns: [ListAttributes](#type-listattributes) object
+[`Row ([cells[, attr]])`]{#pandoc.row}
+
+: Creates a table row.
+
+ Parameters:
+
+ `cells`:
+ : list of table cells in this row
+
+ `attr`:
+ : row attributes
+
+[`TableFoot ([rows[, attr]])`]{#pandoc.tablefoot}
+
+: Creates a table foot.
+
+ Parameters:
+
+ `rows`:
+ : list of table rows
+
+ `attr`:
+ : table foot attributes
+
+[`TableHead ([rows[, attr]])`]{#pandoc.tablehead}
+
+: Creates a table head.
+
+ Parameters:
+
+ `rows`:
+ : list of table rows
+
+ `attr`:
+ : table head attributes
+
## Legacy types
[`SimpleTable (caption, aligns, widths, headers, rows)`]{#pandoc.simpletable}
@@ -2732,7 +3129,33 @@ format, and functions to filter and modify a subtree.
[`sha1`]{#pandoc.sha1}
: Alias for [`pandoc.utils.sha1`](#pandoc.utils.sha1)
- (DEPRECATED).
+ (DEPRECATED, use `pandoc.utils.sha1` instead).
+
+## Other constructors
+
+[`ReaderOptions (opts)`]{#pandoc.readeroptions}
+
+: Creates a new [ReaderOptions] value.
+
+ Parameters
+
+ `opts`:
+ : Either a table with a subset of the properties of a
+ [ReaderOptions] object, or another ReaderOptions object.
+ Uses the defaults specified in the manual for all
+ properties that are not explicitly specified. Throws an
+ error if a table contains properties which are not present
+ in a ReaderOptions object. ([ReaderOptions]|table)
+
+ Returns: new [ReaderOptions] object
+
+ Usage:
+
+ -- copy of the reader options that were defined on the command line.
+ local cli_opts = pandoc.ReaderOptions(PANDOC_READER_OPTIONS)
+
+ -- default reader options, but columns set to 66.
+ local short_colums_opts = pandoc.ReaderOptions {columns = 66}
## Helper functions
@@ -2805,17 +3228,23 @@ Returns: the transformed inline element
### read {#pandoc.read}
-`read (markup[, format])`
+`read (markup[, format[, reader_options]])`
Parse the given string into a Pandoc document.
Parameters:
`markup`:
-: the markup to be parsed
+: the markup to be parsed (string)
`format`:
-: format specification, defaults to `"markdown"`.
+: format specification, defaults to `"markdown"` (string)
+
+`reader_options`:
+: options passed to the reader; may be a ReaderOptions object or
+ a table with a subset of the keys and values of a
+ ReaderOptions object; defaults to the default values
+ documented in the manual. ([ReaderOptions]|table)
Returns: pandoc document
@@ -2828,6 +3257,8 @@ Usage:
-- The inline element in that block is an `Emph`
assert(block.content[1].t == "Emph")
+[ReaderOptions]: #type-readeroptions
+
# Module pandoc.utils
This module exposes internal pandoc functions and utility
@@ -2882,12 +3313,13 @@ Test equality of AST elements. Elements in Lua are considered
equal if and only if the objects obtained by unmarshaling are
equal.
+**This function is deprecated.** Use the normal Lua `==` equality
+operator instead.
+
Parameters:
`element1`, `element2`:
-: Objects to be compared. Acceptable input types are [Pandoc],
- [Meta], [MetaValue], [Block], [Inline], [Attr],
- [ListAttributes], and [Citation].
+: Objects to be compared (any type)
Returns:
@@ -2927,9 +3359,21 @@ non-null, `Header` levels will be reorganized so
that there are no gaps, and so that the base level
is the level specified.
+Parameters:
+
+`number_sections`
+: whether section divs should get an additional `number`
+ attribute containing the section number. (boolean)
+
+`base_level`
+: shift top-level headings to this level. (integer|nil)
+
+`blocks`
+: list of blocks to process ([Blocks](#type-blocks))
+
Returns:
-- List of [Blocks](#type-block).
+- [Blocks](#type-blocks).
Usage:
@@ -2939,6 +3383,38 @@ Usage:
}
local newblocks = pandoc.utils.make_sections(true, 1, blocks)
+### references {#pandoc.references}
+
+`references (doc)`
+
+Get references defined inline in the metadata and via an external
+bibliography. Only references that are actually cited in the
+document (either with a genuine citation or with `nocite`) are
+returned. URL variables are converted to links.
+
+The structure used represent reference values corresponds to that
+used in CSL JSON; the return value can be use as `references`
+metadata, which is one of the values used by pandoc and citeproc
+when generating bibliographies.
+
+Parameters:
+
+`doc`:
+: document ([Pandoc](#type-pandoc))
+
+Returns:
+
+- list of references. (table)
+
+Usage:
+
+ -- Include all cited references in document
+ function Pandoc (doc)
+ doc.meta.references = pandoc.utils.references(doc)
+ doc.meta.bibliography = nil
+ return doc
+ end
+
### run\_json\_filter {#pandoc.utils.run_json_filter}
`run_json_filter (doc, filter[, args])`
@@ -3050,6 +3526,39 @@ Usage:
-- create normal table block again
table = pandoc.utils.from_simple_table(simple)
+### type {#pandoc.utils.type}
+
+`type (value)`
+
+Pandoc-friendly version of Lua's default `type` function,
+returning the type of a value. This function works with all types
+listed in section [Lua type reference][], except if noted
+otherwise.
+
+The function works by checking the metafield `__name`. If the
+argument has a string-valued metafield `__name`, then it returns
+that string. Otherwise it behaves just like the normal `type`
+function.
+
+Parameters:
+
+`value`
+: any Lua value
+
+Returns:
+
+- type of the given value (string)
+
+Usage:
+
+ -- Prints one of 'string', 'boolean', 'Inlines', 'Blocks',
+ -- 'table', and 'nil', corresponding to the Haskell constructors
+ -- MetaString, MetaBool, MetaInlines, MetaBlocks, MetaMap,
+ -- and an unset value, respectively.
+ function Meta (meta)
+ print('type of metavalue `author`:', pandoc.utils.type(meta.author))
+ end
+
# Module pandoc.mediabag
The `pandoc.mediabag` module allows accessing pandoc's media
@@ -3230,6 +3739,22 @@ methods and convenience functions.
Returns: a new list containing all elements from list1 and
list2
+[`pandoc.List:__eq (a, b)`]{#pandoc.list:__concat}
+
+: Compares two lists for equality. The lists are taken as equal
+ if and only if they are of the same type (i.e., have the same
+ non-nil metatable), have the same length, and if all elements
+ are equal.
+
+ Parameters:
+
+ `a`, `b`:
+ : any Lua object
+
+ Returns:
+
+ - `true` if the two lists are equal, `false` otherwise.
+
## Methods
[`pandoc.List:clone ()`]{#pandoc.list:clone}
@@ -3442,7 +3967,7 @@ filepath
Returns:
-- `true` iff `filepath` is an absolute path, `false` otherwise.
+- `true` if `filepath` is an absolute path, `false` otherwise.
(boolean)
### is_relative (filepath) {#pandoc.path.is_relative}
@@ -3456,7 +3981,7 @@ filepath
Returns:
-- `true` iff `filepath` is a relative path, `false` otherwise.
+- `true` if `filepath` is a relative path, `false` otherwise.
(boolean)
### join (filepaths) {#pandoc.path.join}
diff --git a/doc/short-guide-to-pandocs-sources.md b/doc/short-guide-to-pandocs-sources.md
new file mode 100644
index 000000000..722063e03
--- /dev/null
+++ b/doc/short-guide-to-pandocs-sources.md
@@ -0,0 +1,259 @@
+---
+title: Short guide to pandoc's sources
+subtitle: Laying a path for code wanderers
+author: Albert Krewinkel
+date: 2021-06-07
+---
+
+Pandoc, the universal document converter, can serve as a nice intro
+into functional programming with Haskell. For many contributors,
+including the author of this guide, pandoc was their first real
+exposure to this language. Despite its impressive size of more than
+60.000 lines of Haskell code (excluding the test suite), pandoc is
+still very approachable due to its modular architecture. It can
+serve as an interesting subject for learning.
+
+This guide exists to navigate the large amount of sources, to
+lay-out a path that can be followed for learning, and to explain the
+underlying concepts.
+
+A basic understanding of Haskell and of pandoc's functionality is
+assumed.
+
+# Getting the code
+
+Pandoc has a publicly accessible git repository on GitHub:
+<https://github.com/jgm/pandoc>. To get a local copy of the source:
+
+ git clone https://github.com/jgm/pandoc
+
+The source for the main pandoc program is `app/pandoc.hs`. The
+source for the pandoc library is in `src/`, the source for the tests
+is in `test/`, and the source for the benchmarks is in `benchmark/`.
+
+Core type definitions are in the separate [*pandoc-types* repo].
+Get it with
+
+ git clone https://github.com/jgm/pandoc-types
+
+The organization of library and test sources is identical to the
+main repo.
+
+[*pandoc-types* repo]: https://github.com/jgm/pandoc-types
+
+# Document representation
+
+The way documents are represented in pandoc is part of its success.
+Every document is read into one central data structure, the
+so-called *abstract syntax tree* (AST).
+
+The AST is defined in module `Text.Pandoc.Definition` in package
+[*pandoc-types*].
+
+It is not necessary to understand the AST in detail, just check-out
+the following points:
+
+ * The [`Pandoc`][def-Pandoc] type serves as the central structure.
+
+ * A document has metadata and a list of "block" elements.
+
+ * There are various types of [blocks][def-Block]; some contain raw
+ text, others contain "Inline" elements.
+
+ * [Inlines][def-Inline] are "running text", with many different
+ types. The most important contstructors are `Str` (a word),
+ `Space` (a space char), `Emph` (emphasized text), and `Strong`
+ (strongly emphasized text). It's worth checking their
+ definitions.
+
+ * Element attributes are captured as [`Attr`][def-Attr], which is a
+ triple of the element identifier, its classes, and the key-value
+ pairs.^[For plans to change this see [jgm/pandoc-types#88].]
+
+[*pandoc-types*]: https://hackage.haskell.org/package/pandoc-types
+[jgm/pandoc-types#88]: https://github.com/jgm/pandoc-types/issues/88
+[def-Pandoc]: https://hackage.haskell.org/package/pandoc-types/docs/src/Text.Pandoc.Definition.html#Pandoc
+[def-Block]: https://hackage.haskell.org/package/pandoc-types/docs/src/Text.Pandoc.Definition.html#Block
+[def-Inline]: https://hackage.haskell.org/package/pandoc-types/docs/src/Text.Pandoc.Definition.html#Inline
+[def-Attr]: https://hackage.haskell.org/package/pandoc-types/docs/src/Text.Pandoc.Definition.html#Attr
+
+# Basic architecture
+
+Take a look at pandoc's source files. The code is below the `src`
+directory, in the `Text.Pandoc` module. The basic flow is:
+
+ 1. Document is parsed into the internal representation by a
+ *reader*;
+
+ 2. the document AST is modified (optional);
+
+ 3. then the internal respresentation is converted into the target
+ format by a *writer*.
+
+The [*readers*] can be found in `Text.Pandoc.Readers`, while the
+[*writers*] are submodules of `Text.Pandoc.Writers`. The document
+modification step is powerful and used in different ways, e.g., in
+[*filters*].
+
+These parts are the "muscles" of pandoc, which do the heavy lifting.
+Everything else can be thought of as the bones and fibers to which
+these parts are attached and which make them usable.
+
+# Writers
+
+Writers are usually simpler than readers and therefore easier to
+grasp.
+
+Broadly speaking, there are three kind of writers:
+
+ 1. Text writers: these are used for lightweight markup languages
+ and generate plain text output. Examples: Markdown, Org,
+ reStructuredText.
+ 2. XML writers, which convert the AST into structured XML.
+ Examples: HTML, JATS.
+ 3. Binary writers, which are like XML writers, but combine the
+ output with other data and zip it into a single file. Examples:
+ docx, epub.
+
+ Most writers follow a common pattern and have three main functions:
+ docTo*Format*, blockTo*Format* and inlineTo*Format*. Each converts
+ the `Pandoc`, `Block`, and `Inline` elements, respectively. The
+ *XWiki* and *TEI* writers are comparatively simple and suitable
+ samples when taking a first look.
+
+ Most writers are self-contained in that most of the conversion code
+ is within a single module. However, newer writers often use a
+ different setup: those are built around modules from an external
+ package. The details of how to serialize the document are not in
+ the writer module itself, but in an external module. The writer
+ only has to convert pandoc's AST into the document representation
+ used by the module. Good examples: commonmark, jira.
+
+## DocLayout
+
+All writers build on the `doclayout` package. It can be thought of
+as a pretty printer with extra features suitable for lightweight
+markup languages. E.g., multiple blank lines are collapsed into a
+single blank line, unless multiple blank lines are specifically
+requested. This simplifies the code significantly.
+
+See the repo at https://github.com/jgm/doclayout, and the [hackage
+documentation](https://hackage.haskell.org/package/doclayout)
+
+# Readers
+
+The same distinction that applies to writers also applies to
+readers. Readers for XML formats use XML parsing libraries, while
+plain text formats are parsed with [parsec].
+
+## Builders
+
+The plain type constructors from the [`Text.Pandoc.Definition`]
+module can be difficult to use, which is why the module
+[`Text.Pandoc.Builder`] exists. It offers functions to conveniently
+build and combine AST elements.
+
+The most interesting and important types in `Builder` are
+[`Blocks`][def-Blocks] and [`Inlines`][def-Inlines]. All type
+constructors use simple lists for sequences of AST elements.
+Building lists can be awkward and often comes with bad performance
+characteristics, esp. when appending. The `Blocks` and `Inlines`
+types are better suited for these operations and are therefore used
+extensively in builder functions.
+
+The builder functions are named with the convention that the suffix
+`With` is added if the first argument is an `Attr`; there is usually
+another function without that suffix, creating an element with no
+attributes.
+
+[def-Blocks]: https://hackage.haskell.org/package/pandoc-types/docs/src/Text.Pandoc.Builder.html#Blocks
+[def-Inlines]: https://hackage.haskell.org/package/pandoc-types/docs/src/Text.Pandoc.Builder.html#Inlines
+[parsec]: https://hackage.haskell.org/package/parsec
+
+# PandocMonad
+
+Looking at the readers and writers, one will notice that they all
+operate within the `PandocMonad` type class. This class gives access
+to options, file operations, and other shared information. The
+typeclass has two main implementations: one operates in IO, so on
+the "real world", while the other provides a pure functional
+interface, suitable to "mock" an environment for testing.
+
+# Document modifications
+
+One of the big advantages of a central document structure is that it
+allows document modifications via a unified interface. This section
+describes the multiple ways in which the document can be altered.
+
+## Walkable
+
+Document traversal happens through the `Walkable` class in module
+`Text.Pandoc.Walk` ([*pandoc-types* package]).
+
+## Transformations
+
+Transformations are simple modifications controllable through
+command-line options.
+
+## Filters
+
+Filters allow to use Lua or any external language to perform
+document transformations.
+
+
+[`Text.Pandoc.Builder`]: https://hackage.haskell.org/package/pandoc-types/docs/Text-Pandoc-Builder.html
+[`Text.Pandoc.Definition`]: https://hackage.haskell.org/package/pandoc-types/docs/Text-Pandoc-Definition.html
+
+# Module overview
+
+The library is structured as follows:
+
+ - `Text.Pandoc` is a top-level module that exports what is needed
+ by most users of the library. Any patches that add new readers
+ or writers will need to make changes here, too.
+ - `Text.Pandoc.Definition` (in `pandoc-types`) defines the types
+ used for representing a pandoc document.
+ - `Text.Pandoc.Builder` (in `pandoc-types`) provides functions for
+ building pandoc documents programmatically.
+ - `Text.Pandoc.Generics` (in `pandoc-types`) provides functions allowing
+ you to promote functions that operate on parts of pandoc documents
+ to functions that operate on whole pandoc documents, walking the
+ tree automatically.
+ - `Text.Pandoc.Readers.*` are the readers, and `Text.Pandoc.Writers.*`
+ are the writers.
+ - `Text.Pandoc.Citeproc.*` contain the code for citation handling,
+ including an interface to the [citeproc] library.
+ - `Text.Pandoc.Data` is used to embed data files when the `embed_data_files`
+ cabal flag is used.
+ - `Text.Pandoc.Emoji` is a thin wrapper around [emojis].
+ - `Text.Pandoc.Highlighting` contains the interface to the
+ skylighting library, which is used for code syntax highlighting.
+ - `Text.Pandoc.ImageSize` is a utility module containing functions for
+ calculating image sizes from the contents of image files.
+ - `Text.Pandoc.MIME` contains functions for associating MIME types
+ with extensions.
+ - `Text.Pandoc.Lua.*` implement Lua filters.
+ - `Text.Pandoc.Options` defines reader and writer options.
+ - `Text.Pandoc.PDF` contains functions for producing PDFs.
+ - `Text.Pandoc.Parsing` contains parsing functions used in multiple readers.
+ the needs of pandoc.
+ - `Text.Pandoc.SelfContained` contains functions for making an HTML
+ file "self-contained," by importing remotely linked images, CSS,
+ and JavaScript and turning them into `data:` URLs.
+ - `Text.Pandoc.Shared` is a grab-bag of shared utility functions.
+ - `Text.Pandoc.Writers.Shared` contains utilities used in writers only.
+ - `Text.Pandoc.Slides` contains functions for splitting a markdown document
+ into slides, using the conventions described in the MANUAL.
+ - `Text.Pandoc.Templates` defines pandoc's templating system.
+ - `Text.Pandoc.UTF8` contains functions for converting text to and from
+ UTF8 bytestrings (strict and lazy).
+ - `Text.Pandoc.Asciify` contains functions to derive ascii versions of
+ identifiers that use accented characters.
+ - `Text.Pandoc.UUID` contains functions for generating UUIDs.
+ - `Text.Pandoc.XML` contains functions for formatting XML.
+
+
+<!--
+# Templating
+## DocTemplates
+-->