

If your code accepts URIs as input..

12 September 2016 21:50

Many websites accept a remote location as input and read from it. For example, a site might extract all the text from a webpage, or show you the HTTP headers a given server sends back in response to a request.

If you run such a site you must make sure you validate the scheme of the URI you're given - and remember to do that again for any HTTP redirects you're sent.
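As a minimal sketch of that check, assuming Python and its standard `urllib` modules: the names `ALLOWED_SCHEMES`, `is_allowed`, `CheckedRedirectHandler` and `fetch` are hypothetical, chosen for illustration, and the allow-list assumes the service only ever needs HTTP(S).

```python
import urllib.request
from urllib.parse import urlsplit

# Assumption: this service only ever needs to fetch over HTTP(S).
ALLOWED_SCHEMES = {"http", "https"}


def is_allowed(uri: str) -> bool:
    """Return True only for URIs with an allowed scheme and a host part."""
    try:
        parts = urlsplit(uri.strip())
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. a broken IPv6 literal.
        return False
    # urlsplit lowercases the scheme, so "FILE:" and "file:" compare equal.
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.hostname)


class CheckedRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Re-run the scheme check on every redirect target."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if not is_allowed(newurl):
            raise ValueError(f"refusing redirect to {newurl!r}")
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def fetch(uri: str, timeout: float = 10.0) -> bytes:
    """Fetch uri, rejecting disallowed schemes up front and on redirects."""
    if not is_allowed(uri):
        raise ValueError(f"refusing to fetch {uri!r}")
    opener = urllib.request.build_opener(CheckedRedirectHandler)
    with opener.open(uri, timeout=timeout) as response:
        return response.read()
```

With this in place `fetch("file:///etc/passwd")` raises `ValueError` before any file is opened. An allow-list is used rather than a block-list because the set of schemes a fetching library understands can be larger than you expect.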

Really the issue here is a confusion between URL & URI: a URI can name far more than a remote web page, including a local file via the file: scheme.

The only time I ever communicated with Aaron Swartz was unfortunately after his death, because I didn't make the connection. I randomly stumbled upon the html2text software he put together, which had an online demo containing a form for entering a location. I tried the obvious input:


The software was vulnerable, read the file, and showed it to me.
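How the demo actually fetched its input isn't stated here, so the following is only an assumed illustration of how this class of bug tends to arise: a fetcher hands the user's input straight to a general-purpose URL opener, which also understands `file:`. The function name `naive_fetch` is hypothetical, and a temporary file stands in for something sensitive like /etc/passwd.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def naive_fetch(location: str) -> bytes:
    """Fetch whatever the user supplied, with no scheme check (unsafe)."""
    with urllib.request.urlopen(location) as response:
        return response.read()


# Stand-in for a sensitive server-side file such as /etc/passwd.
fd, name = tempfile.mkstemp()
with os.fdopen(fd, "w") as handle:
    handle.write("secret: hunter2\n")

# Build a file:// URI for the temp file, as an attacker would type it in.
file_uri = Path(name).as_uri()

# The opener happily reads the local file and returns its contents.
leaked = naive_fetch(file_uri)
os.remove(name)
```

Here `leaked` ends up holding the contents of the local file, even though the function was presumably only ever meant to fetch web pages.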

The site gives errors on all inputs now, so it cannot be used to demonstrate the problem, but on Friday I saw another site on Hacker News with the very same input-issue, and it reminded me that there's a very real class of security problems here.

The site in question was http://fuckyeahmarkdown.com/, which allows you to enter a URL to convert to markdown - I found this via the Hacker News submission.

The following link shows the contents of /etc/hosts, and demonstrates the problem:


The output looked like this:

.. localhost broadcasthost
::1 localhost
fe80::1%lo0 localhost stage files brettt..

In the actual output of '/etc/passwd' all newlines had been stripped. (Which I now recognize as being an artifact of the markdown processing.)

UPDATE: The problem is fixed now.



Comments on this entry

roflcopter at 11:04 on 12 September 2016

wow awsom lulz :)

Willem Mali at 11:12 on 12 September 2016

Nice one, I hadn't thought about this attack vector before :)

On the stripping of newlines: that's a Markdown thing. You either make a paragraph with 2 newlines, or add two spaces at the end of the line for an explicit line break; /etc/hosts doesn't have these spaces nor does it have double newlines, so there it is. The # at the start of the hosts file is interpreted as the H1 opening tag, so everything looks huge and bold.

Same goes for e.g. /etc/passwd.

Janne Koschinski at 11:42 on 12 September 2016

I found the same issue today, by random chance.

Even more severe is that you can access the config for the Apache2 – and likely (I haven’t tried) also the SSL private key.

This is quite an issue :/

arjun at 12:30 on 12 September 2016

Does it also affect forms which have a basic REGEX to check the input? Wouldn't REGEX take care of this issue?

Steve Kemp at 12:37 on 12 September 2016

I figured I'd not be the only person to try, Janne!

Willem: Of course, when you say it like that it's obvious. I'm a little familiar with markdown myself, but didn't make the obvious leap.

Arjun: Depends what you're doing with the URI. If you're going to fetch it, or process it, then you could be at risk. Depends really.

Jraut at 12:40 on 12 September 2016

Another reason why web-exposed services should not be run as root.

John Brayton at 12:41 on 12 September 2016

Good article. But I think you should remove references to sites that are vulnerable.

Kazinator at 13:11 on 12 September 2016

If I type file:///etc/passwd into a browser-based utility for fetching and displaying a URI, I expect that to work, just like I expect that to work if I enter that into its address bar.

Lance Nanek at 15:19 on 12 September 2016

Re Kazinator, there may be valid uses for entering a file instead of a web site, but the user would usually intend the file on their computer. It doesn't make any sense for the user to specify a file on the web server as discussed in the post. The user has no way to have saved documents, etc. to the web server and wouldn't know where they are stored even if they did.