There are many markdown parsers in Python. Misaka was my favorite one. However, misaka is deprecated now, and its successor, hoedown, still has issues to solve. That's why I say "was". But I still love it.
Here is a list of the markdown parsers for Python that I know of:
- Misaka: A python binding for Sundown. (CPython required)
- Hoedown: A python binding for Hoedown, successor of Misaka.
- Discount: A python binding for Discount. (CPython required)
- cMarkdown: Markdown for Python, accelerated by C. (CPython required)
- Markdown: A pure markdown parser, the very first implementation.
- Markdown2: Another pure markdown parser.
I've also just released another pure Python markdown parser, called mistune.
Misaka was my favorite markdown parser. It is a python binding of Sundown, which means that it has all the features that Sundown provides.
It is super fast! In fact, it comes out on top in my benchmarks. Since it is a binding of a C library, it is no wonder that it is this fast. If speed is what you want, you should try misaka, as well as other bindings of C libraries.
But misaka is about more than speed. It is the custom renderer feature that won my heart. I am so fond of it that I implemented the custom renderer feature in my own markdown parser, mistune.
A quick and very useful sample is code highlighting.
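To make the idea concrete, here is a toy sketch of the custom renderer pattern using only the standard library. It is not misaka's or mistune's actual API: the class names, the `block_code` hook signature, and the tiny `render` function are assumptions made up for illustration, and the "parser" only understands `~~~` fenced code blocks and plain paragraphs.

```python
import html
import re


class Renderer:
    """Base renderer: turns parsed pieces into HTML strings."""

    def block_code(self, code, lang=None):
        return "<pre><code>%s</code></pre>\n" % html.escape(code)

    def paragraph(self, text):
        return "<p>%s</p>\n" % html.escape(text.strip())


class HighlightRenderer(Renderer):
    """Custom renderer: only the code-block hook is overridden."""

    def block_code(self, code, lang=None):
        if not lang:
            return super().block_code(code)
        # A real renderer would hand `code` to a syntax highlighter here.
        return '<pre class="highlight"><code class="language-%s">%s</code></pre>\n' % (
            html.escape(lang), html.escape(code))


# Hypothetical mini-grammar: a tilde fence with an optional language name.
FENCE = re.compile(r"~~~(\w*)\n(.*?)~~~\n?", re.S)


def _paragraphs(chunk, renderer):
    """Render each blank-line-separated chunk as a paragraph."""
    return [renderer.paragraph(p) for p in re.split(r"\n\s*\n", chunk) if p.strip()]


def render(text, renderer):
    """Toy parser: handles fenced code blocks and plain paragraphs only."""
    out, pos = [], 0
    for match in FENCE.finditer(text):
        out += _paragraphs(text[pos:match.start()], renderer)
        out.append(renderer.block_code(match.group(2), match.group(1) or None))
        pos = match.end()
    out += _paragraphs(text[pos:], renderer)
    return "".join(out)
```

The parser never changes; swapping `Renderer()` for `HighlightRenderer()` is enough to change how code blocks come out, which is the whole appeal of the feature.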
However, it is a binding of a C library. It requires CPython, so if you prefer PyPy, you have no access to it. Some app engines also restrict compiling C libraries, and you can't use misaka in that case. And even if you are using CPython, it is still difficult to install on Windows.
Visual Studio's support for C is not optimal and most VS compilers are missing stdint.h, which is needed to compile Misaka.
If you are on Windows, may God help you. I don't give a shit about it.
The footnote feature is missing in Misaka. Maybe many of you don't need such a thing; in that case, misaka has no real drawbacks. It is stable, efficient, and has many GFM features.
The only trouble is that Sundown is deprecated.[1]
Because the Sundown library is deprecated, here comes hoedown[2], a fork of the original Sundown. It has a Python binding that is also called hoedown.
Since Hoedown is the successor of Sundown, and python-hoedown is the successor of Misaka, python-hoedown has all the features that misaka has. But python-hoedown is more than that.
- It is PyPy compatible.
- It has a footnote feature.
It looks promising, and even misaka's author recommends it. I've tried it, but failed with one issue, a mysterious error that I can't do anything about:
This issue is not fixed yet. Once it is, hoedown would be a good choice for non-App Engine users.
Update (Jun 23, 2014): you can use Hoep as the Python binding.
cMarkdown & Discount
cMarkdown is much like Misaka, except that it is based on upskirt[3] rather than sundown. The history is very interesting: sundown is a fork of upskirt, and hoedown is a fork of sundown. Now sundown is deprecated and upskirt is gone. The new markdown parser that vmg promised is still not available.
cMarkdown has all the disadvantages of Misaka, and it is a little slower than Misaka. This means you really should use misaka instead of cMarkdown.
Discount is a joke to me. I can't even install it successfully, so there is not much to say. But I do know that Discount is slower than Sundown.
Markdown & Markdown2
Python-Markdown is the very first markdown parser in pure Python. It is good, except for the documentation. However, I miss misaka's renderer feature, which Python-Markdown does not have.
Python-Markdown is not as slow as I expected. I had assumed it would be slow, since Python-Markdown2 describes itself as:
A fast and complete implementation of Markdown in Python.
But that is not true. Python-Markdown2 is much slower than Python-Markdown. I have no idea why it calls itself fast. Every feature that Markdown2 has, the older one has too.
The benchmark shows that Python-Markdown2 is almost twice as slow as Python-Markdown. No wonder it is 2.
Mistune is a new (just released) markdown parser. It is the fastest of all the pure Python implementations: almost 4 times faster[4] than Python-Markdown in a pure Python environment, and almost 5 times faster with Cython's help.
I didn't expect it to be so fast when I wrote it. I knew it would be fast, but I didn't know that it would be 4 or even 5 times faster.
I never thought of creating a markdown parser of my own. But it has been months since I reported the issue to Hoedown, and the issue is still there, not solved a bit. Because it is a C binding, I am not able to help; the only thing I can do is wait.
I don't use Python-Markdown or Python-Markdown2, because they have no renderer feature, and they are slow.
If you are looking for a fast, full-featured[5], pure Python implementation, Mistune is a good choice. It also has a renderer feature just like Misaka. You can always influence the rendering results with custom renderers.
I ran a benchmark on my MacBook Air; you can view the results, or run the benchmark script yourself: bench.py
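For readers who want to time parsers on their own machine, a minimal harness along these lines may be enough. This is not the bench.py mentioned above, just a sketch built on the standard library's `timeit`; the function names `benchmark` and `report` are made up here, and the two callables in the demo are stand-ins rather than real markdown parsers.

```python
import timeit


def benchmark(parsers, text, number=20, repeat=3):
    """Return {name: best seconds per call} for each parser callable."""
    results = {}
    for name, parse in parsers.items():
        timings = timeit.repeat(lambda: parse(text), number=number, repeat=repeat)
        results[name] = min(timings) / number
    return results


def report(results):
    """Format results fastest first, with a ratio against the fastest."""
    fastest = max(min(results.values()), 1e-12)  # guard against a zero timing
    lines = []
    for name, seconds in sorted(results.items(), key=lambda item: item[1]):
        lines.append("%-12s %10.6fs  x%.1f" % (name, seconds, seconds / fastest))
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in callables; swap in real parser functions to compare them.
    parsers = {
        "upper": str.upper,
        "split-join": lambda s: " ".join(s.split()),
    }
    sample = "# Title\n\nSome *markdown* text.\n" * 200
    print(report(benchmark(parsers, sample)))
```

Taking the minimum over several repeats is a deliberate choice: it reduces the influence of whatever else the machine happens to be doing during a run.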
Mistune can be compiled with Cython if you already have Cython installed.
The magic happens in the setup.py script. I'd like to introduce this part another time.
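Until then, here is a hedged sketch of one common way a setup.py can make Cython optional. It is not mistune's actual setup.py; the helper name `optional_ext_modules` and the default source file name are assumptions for illustration. The idea is to fall back to a plain pure Python install whenever Cython (or the source file) is missing.

```python
from pathlib import Path


def optional_ext_modules(source="mistune.py"):
    """Return compiled extension modules if possible, else an empty list."""
    try:
        # Cython is an optional, third-party build dependency.
        from Cython.Build import cythonize
    except ImportError:
        return []  # no Cython: install as pure Python
    if not Path(source).exists():
        return []  # nothing to compile
    return cythonize(source)


# In setup.py this would be passed along roughly as:
#     setup(name="...", py_modules=["..."], ext_modules=optional_ext_modules())
```

Because the fallback is an empty list, the same setup.py works on PyPy or on machines without a C compiler toolchain configured for Cython.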
mistune is used by many great projects such as IPython, Rodeo and crossbar.
*This post and all posts in markdown format on this site are rendered with mistune.*
As part of managing the PB Python newsletter, I wanted to develop a simple way to write emails once using plain text and turn them into responsive HTML emails for the newsletter. In addition, I needed to maintain a static archive page on the blog that links to the content of each newsletter. This article shows how to use python tools to transform a markdown file into a responsive HTML email suitable for a newsletter as well as a standalone page integrated into a pelican blog.
I am a firm believer in having access to all of the content I create in a simple text format. That is part of the reason why I use pelican for the blog and write all content in restructured text. I also believe in hosting the blog using static HTML so it is fast for readers and simple to distribute. Since I spend a lot of time creating content, I want to make sure I can easily transform it into another format if needed. Plain text files are the best format for my needs.
As I wrote in my previous post, Mailchimp was getting cost prohibitive. In addition, I did not like playing around with formatting emails. I want to focus on content and turning it into a clean and responsive email - not working with an online email editor. I also want the newsletter archives available for people to view and search in a more integrated way with the blog.
One thing that Mailchimp does well is that it provides an archive of emails and the ability for the owner to download them in raw text. However, once you cancel your account, those archives will go away. It’s also not very search engine friendly so it’s hard to reference back to it and expose the content to others not subscribed to the newsletter.
With all that in mind, here is the high level process I had in mind:
Before I go through the python scripts, here’s some background on developing responsive HTML-based emails. Unfortunately, building a template that works well in all email clients is not easy. I naively assumed that the tips and tricks that work for a web site would work in an HTML email. Unfortunately that is not the case. The best information I could find is that you need to use HTML tables to format messages so they will look acceptable in all the email clients. Yuck. I feel like I’m back in Geocities.
This is one of the benefits that email vendors like Mailchimp provide. They will go through all the hard work of figuring out how to make templates that look good everywhere. For some this makes complete sense. For my simple needs, it was overkill. Your mileage may vary.
Along the way, I found several resources that I leveraged for portions of my final solution. Here they are for reference:
- Building responsive email templates - Really useful templates that served as the basis for the final template.
- Free Responsive Simple HTML Template - Another good set of simple templates.
- Send email written in Markdown - A python repo that had a lot of good concepts for building the markdown email.
Besides having to use HTML tables, I learned that it is recommended that all the CSS be inlined in the email. In other words, the email needs to have all the styling included directly in the tags.
Once again this is very old school web and would be really painful if not for tools that will do the inlining for you. I used the excellent premailer library to take an embedded CSS stylesheet and inline it with the rest of the HTML.
You can find a full HTML template and all the code on github but here is a simple summary for reference. Please use the github version since this one is severely simplified and likely won’t work as is:
This is a jinja template and you will notice that there is a place for title. The next step in the process is to render a markdown text file into HTML and place that HTML snippet into a template.
Now that we know how we want the HTML to look, let’s create a markdown file. The only twist with this solution is that I want to create one markdown file that can be rendered in pelican and used for the HTML email.
Here is what a simple markdown file (sample_doc.md) that will work with pelican looks like:
The required input file uses standard markdown. The one tricky aspect is that the top 5 lines contain meta-data that pelican needs to make sure the correct url and templates are used when creating the output. Our final script will need to remove them so that it does not get rendered into the newsletter email. If you are not trying to incorporate into your blog, you can remove these lines.
If you are interested in incorporating this in your pelican blog, here is how my content is structured:
All of the newsletter markdown files are stored in the newsletter directory and the blog posts are stored in the articles directory.
The final configuration I had to make in the pelicanconf.py file was to make sure the paths were set up correctly:
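As a rough illustration, the path settings for the layout described above might look something like the fragment below. `PATH`, `ARTICLE_PATHS` and `PAGE_PATHS` are standard Pelican settings, but the specific values are assumptions based on the directory names mentioned here, including the guess that newsletters are treated as pages rather than articles.

```python
# pelicanconf.py (fragment) - values are assumptions, adjust to your layout
PATH = "content"                      # root directory of all source content
ARTICLE_PATHS = ["articles"]          # blog posts live in content/articles
PAGE_PATHS = ["newsletter", "pages"]  # newsletters rendered as standalone pages
```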
Now the blog is properly configured to render one of the newsletters.
Now that we have the HTML template and the markdown document, we need a short python script to pull it all together. I will be using the following libraries so make sure they are all installed:
- python-markdown2 - Turn raw markdown into HTML
- jinja2 - Template engine to generate HTML
- premailer - Inline CSS
- BeautifulSoup - Clean up the HTML. This is optional, but I am showing how to use it in case you choose to.
Additionally, make sure you are using python3 so you have access to
In order to keep the article compact, I am only including the key components. Please look at the github repo for an approach that is a proper python standalone program that can take arguments from the command line.
The first step, import everything:
Set up the input files and output HTML file:
Please refer to the pathlib article if you are not familiar with how or why to use it.
Now that the files are established, we need to read in the markdown file and parse out the header meta-data:
Using readlines to read the file ensures that each line in the file is stored in a list. This approach works for our small file but could be problematic if you had a massive file that you did not want to read into memory at once. For an email newsletter you should be ok with this approach.
Here is what all_content[0:6] looks like:
We can clean up the title line for insertion into the template:
Which renders a title
PB Python - Newsletter Number 6
The final parsing step is to get the body into a single list without the header:
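The parsing steps above can be sketched roughly as follows. This is a simplified stand-in, not the code from the github repo: the function name `split_newsletter`, the `SAMPLE` text and its meta-data values (other than the title) are invented for illustration, and the header is assumed to be exactly the first 5 lines in Pelican's `Key: value` style.

```python
from pathlib import Path
import tempfile

HEADER_LINES = 5  # assumed number of Pelican meta-data lines at the top

# Hypothetical sample file; only the title comes from the article.
SAMPLE = """Title: PB Python - Newsletter Number 6
Date: 2020-01-01
Slug: newsletter-6
URL: newsletter/number-6.html
save_as: newsletter/number-6.html

# Welcome
Body text.
"""


def split_newsletter(path, header_lines=HEADER_LINES):
    """Return (title, body_lines) from a Pelican-style markdown file."""
    with open(path, encoding="utf-8") as handle:
        all_content = handle.readlines()  # one list entry per line
    header, body = all_content[:header_lines], all_content[header_lines:]
    title = ""
    for line in header:
        key, _, value = line.partition(":")
        if key.strip().lower() == "title":
            title = value.strip()
    return title, body


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample_path = Path(tmp) / "sample_doc.md"
        sample_path.write_text(SAMPLE, encoding="utf-8")
        title, body = split_newsletter(sample_path)
    print(title)
    print("".join(body))
```

Looking the title up by key rather than by position means the meta-data lines can appear in any order, as long as the header length stays the same.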
Convert the raw markdown into a simple HTML string:
Now that the HTML is ready, we need to insert it into our jinja template:
At this point, raw_html has a fully formed HTML version of the newsletter. We need to use premailer’s transform to get the CSS inlined. I am also using BeautifulSoup to do some cleaning up and formatting of the HTML. This is purely aesthetic but I think it’s simple enough to do so I am including it:
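Putting the conversion, templating, inlining and clean-up steps together, a compact sketch could look like this. It is not the github version: the function name `build_email`, the tiny `TEMPLATE` string and the `email_content` variable name are assumptions for illustration. It relies on the third-party packages listed earlier (markdown2, jinja2, premailer, BeautifulSoup), which are imported inside the function so the module can still be imported where they are not installed.

```python
# Deliberately tiny stand-in for the real responsive template.
TEMPLATE = """<html><head><title>{{ title }}</title>
<style>p { color: #333333; }</style></head>
<body>{{ email_content }}</body></html>"""


def build_email(markdown_text, template_source, title):
    """Markdown -> HTML -> jinja template -> inlined CSS -> tidy HTML."""
    # Third-party packages, imported lazily (see note above).
    import markdown2
    from bs4 import BeautifulSoup
    from jinja2 import Template
    from premailer import transform

    body_html = markdown2.markdown(markdown_text)
    raw_html = Template(template_source).render(
        title=title, email_content=body_html)
    inlined = transform(raw_html)  # copies <style> rules into style="..." attributes
    # Optional cosmetic step: re-indent the HTML so it is easier to read.
    return BeautifulSoup(inlined, "html.parser").prettify()
```

A call such as `build_email("Hello *world*", TEMPLATE, "PB Python - Newsletter Number 6")` would return the final HTML as a string, ready to be written to the output file.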
The final step is to make sure that the unsubscribe link does not get mangled. Depending on your email provider, you may not need to do this:
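How the link gets mangled depends on the tools and the provider. One plausible case is that a merge tag placed in an href gets URL-encoded during HTML processing. The sketch below handles only that case; the helper name `restore_placeholder` and the `{{UnsubscribeURL}}` tag are hypothetical, so substitute whatever placeholder your email provider expects.

```python
from urllib.parse import quote


def restore_placeholder(html_text, placeholder):
    """Undo URL-encoding of a merge tag such as an unsubscribe link."""
    # quote(..., safe="") gives the percent-encoded form, e.g. "{" -> "%7B".
    return html_text.replace(quote(placeholder, safe=""), placeholder)


if __name__ == "__main__":
    mangled = '<a href="%7B%7BUnsubscribeURL%7D%7D">Unsubscribe</a>'
    print(restore_placeholder(mangled, "{{UnsubscribeURL}}"))
```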
Here is an example of the final email file:
You should be able to copy and paste the raw HTML into your email marketing campaign and be good to go. In addition, this file will render properly in pelican. See this page for some past examples.
Markdown is a simple text format that can be parsed and turned into HTML using various python tools. In this case, the markdown file can be combined with a responsive HTML email template to simplify the process of generating content for newsletters. The added bonus is that the content can be included in a static blog so that it is searchable and easily available to your readers.
This solution is not limited to just building emails. Now that newer versions of pandas will include a native to_markdown method, this general approach could be extended to other uses. Using these principles you can build fairly robust reports and documents using markdown then incorporate the dataframe output into the final results. If there is interest in an example, let me know in the comments.