PATCH: /usr/bin/ftp: Remove fragment/anchor identifier before making request

Eric P. Mangold
Hi,

I was trying to use the 'ftp' program to retrieve the following URL:

https://pypi.python.org/packages/source/p/pip/pip-1.0.2.tar.gz#md5=47ec6ff3f6d962696fe08d4c8264ad49

This fails because ftp treats the fragment (everything after the '#') as
part of the path and sends it to the server, which returns 404. Links of
this type are quite common these days, so it would be nice if 'ftp'
handled them.

This is my first patch to OpenBSD, so please let me know if there is
anything else I can do to get this feature implemented. The patch is
very simple and, I think, correct: '#' is an "unsafe" character that
must always be percent-encoded (as %23) if it is to appear literally in
a URL; otherwise it marks the start of a fragment identifier.

Regards,
Eric P. Mangold


Index: fetch.c
===================================================================
RCS file: /cvs/src/usr.bin/ftp/fetch.c,v
retrieving revision 1.103
diff -u -r1.103 fetch.c
--- fetch.c     25 Aug 2010 20:32:37 -0000      1.103
+++ fetch.c     29 Oct 2011 03:34:02 -0000
@@ -205,6 +205,11 @@
        if (newline == NULL)
                errx(1, "Can't allocate memory to parse URL");
        if (strncasecmp(newline, HTTP_URL, sizeof(HTTP_URL) - 1) == 0) {
+                /* Remove any trailing fragment identifier from the HTTP URL.
+                 * Fragments (HTTP anchors) are identified by a hash char ('#'),
+                 * as per RFC 3986. */
+                newline = strsep(&newline, "#");
+
                host = newline + sizeof(HTTP_URL) - 1;
 #ifndef SMALL
                scheme = HTTP_URL;
@@ -221,6 +226,11 @@
 #ifndef SMALL
                scheme = FILE_URL;
        } else if (strncasecmp(newline, HTTPS_URL, sizeof(HTTPS_URL) - 1) == 0) {
+                /* Remove any trailing fragment identifier from the HTTPS URL.
+                 * Fragments (HTTP anchors) are identified by a hash char ('#'),
+                 * as per RFC 3986. */
+                newline = strsep(&newline, "#");
+
                host = newline + sizeof(HTTPS_URL) - 1;
                ishttpsurl = 1;
                scheme = HTTPS_URL;