There are many online sites that accept reading input from remote locations. For example a site might try to extract all the text from a webpage, or show you the HTTP-headers a given server sends back in response to a request.
If you run such a site you must make sure you validate the schema you're given - also remembering to do that if you're sent any HTTP-redirects.
Really the issue here is a confusion between
The only time I ever communicated with Aaron Swartz was unfortunately after his death, because I didn't make the connection. I randomly stumbled upon the
html2text software he put together, which had an online demo containing a form for entering a location. I tried the obvious input:
The software was vulnerable, read the file, and showed it to me.
The site gives errors on all inputs now, so it cannot be used to demonstrate the problem, but on Friday I saw another site on Hacker News with the very same input-issue, and it reminded me that there's a very real class of security problems here.
The site in question was http://fuckyeahmarkdown.com/ and allows you to enter a URL to convert to markdown - I found this via the hacker news submission.
The following link shows the contents of
/etc/hosts, and demonstrates
The output looked like this:
.. 127.0.0.1 localhost 255.255.255.255 broadcasthost ::1 localhost fe80::1%lo0 localhost 127.0.0.1 stage 127.0.0.1 files 127.0.0.1 brettt.. ..
In the actual output of '/etc/passwd' all newlines had been stripped. (Which I now recognize as being an artifact of the markdown processing.)
UPDATE: The problem is fixed now.