Problem with boot block / softraid on current

Andreas Bartelt-2
One of my amd64 machines no longer boots after updating to current
(the attached dmesg was obtained by booting a build of current from today
but with a boot block from December 22nd). Interestingly, the same disk
(with a boot block from today's build) still boots fine on another
amd64 machine (an old X61s ThinkPad).

The disk image uses a softraid(4) encrypted root partition. On
the affected machine, the system reboots immediately after the password
has been entered.

I was able to narrow the problem down to the boot block installed with
installboot(8): current still boots on the affected machine with a boot
block from my previous build from December 22nd.

Best regards
Andreas

dmesg.txt (15K)

Re: Problem with boot block / softraid on current

Stefan Sperling-5
On Tue, Jan 03, 2017 at 07:37:10AM +0100, Andreas Bartelt wrote:

> One of my amd64 machines no longer boots after updating to current
> (the attached dmesg was obtained by booting a build of current from today
> but with a boot block from December 22nd). Interestingly, the same disk
> (with a boot block from today's build) still boots fine on another
> amd64 machine (an old X61s ThinkPad).
>
> The disk image uses a softraid(4) encrypted root partition. On
> the affected machine, the system reboots immediately after the password
> has been entered.
>
> I was able to narrow the problem down to the boot block installed with
> installboot(8): current still boots on the affected machine with a boot
> block from my previous build from December 22nd.

I ran into the same problem on my ThinkPad X130e and tracked it down to
this part of the changes committed in softraid_amd64.c r1.3:

@@ -433,7 +435,7 @@
  const char openbsd_uuid_code[] = GPT_UUID_OPENBSD;
  struct gpt_partition gp;
  static struct uuid *openbsd_uuid = NULL, openbsd_uuid_space;
- static u_char buf[DEV_BSIZE];
+ static u_char buf[4096];


So the problem seems to be that boot uses too much stack memory.

The diff below fixes the problem for me. As far as I can tell this
should work on both 512 and 4k disks. But I cannot test with a 4k disk.

Index: softraid_amd64.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/stand/libsa/softraid_amd64.c,v
retrieving revision 1.3
diff -u -p -r1.3 softraid_amd64.c
--- softraid_amd64.c 24 Dec 2016 22:49:38 -0000 1.3
+++ softraid_amd64.c 4 Jan 2017 18:14:16 -0000
@@ -435,7 +435,7 @@ findopenbsd_gpt(struct sr_boot_volume *b
  const char openbsd_uuid_code[] = GPT_UUID_OPENBSD;
  struct gpt_partition gp;
  static struct uuid *openbsd_uuid = NULL, openbsd_uuid_space;
- static u_char buf[4096];
+ u_char *buf;
 
  /* Prepare OpenBSD UUID */
  if (openbsd_uuid == NULL) {
@@ -456,6 +456,8 @@ findopenbsd_gpt(struct sr_boot_volume *b
  *err = "disk sector > 4096 bytes\n";
  return (-1);
  }
+ buf = alloc(bv->sbv_secsize);
+ bzero(buf, bv->sbv_secsize);
 
  /* LBA1: GPT Header */
  lba = 1;
@@ -466,17 +468,20 @@ findopenbsd_gpt(struct sr_boot_volume *b
  /* Check signature */
  if (letoh64(gh.gh_sig) != GPTSIGNATURE) {
  *err = "bad GPT signature\n";
+ free(buf, bv->sbv_secsize);
  return (-1);
  }
 
  if (letoh32(gh.gh_rev) != GPTREVISION) {
  *err = "bad GPT revision\n";
+ free(buf, bv->sbv_secsize);
  return (-1);
  }
 
  ghsize = letoh32(gh.gh_size);
  if (ghsize < GPTMINHDRSIZE || ghsize > sizeof(struct gpt_header)) {
  *err = "bad GPT header size\n";
+ free(buf, bv->sbv_secsize);
  return (-1);
  }
 
@@ -487,6 +492,7 @@ findopenbsd_gpt(struct sr_boot_volume *b
  gh.gh_csum = orig_csum;
  if (letoh32(orig_csum) != new_csum) {
  *err = "bad GPT header checksum\n";
+ free(buf, bv->sbv_secsize);
  return (-1);
  }
 
@@ -514,6 +520,9 @@ findopenbsd_gpt(struct sr_boot_volume *b
  found = 1;
  }
  }
+
+ free(buf, bv->sbv_secsize);
+
  if (new_csum != letoh32(gh.gh_part_csum)) {
  *err = "bad GPT entries checksum\n";
  return (-1);
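For readers following the context lines of the diff: findopenbsd_gpt() sanity-checks the GPT header it reads from LBA 1 (signature, revision, header size, and a header CRC32 computed with the checksum field temporarily cleared). What follows is a standalone sketch of those checks in plain C, not the libsa code. It assumes the standard UEFI header layout (signature "EFI PART" at byte 0, revision at 8, header size at 12, header CRC32 at 16, all little-endian); `gpt_hdr_check()`, `crc32_le()` and `get_le32()` are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GPT_MINHDRSIZE	92		/* minimum header size per UEFI */
#define GPT_REVISION	0x00010000U	/* revision 1.0 */

/* Read a little-endian 32-bit value byte by byte (host-order independent). */
static uint32_t
get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320), the variant GPT uses. */
static uint32_t
crc32_le(const unsigned char *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFU;

	while (len-- > 0) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc & 1U) ? (crc >> 1) ^ 0xEDB88320U : crc >> 1;
	}
	return ~crc;
}

/*
 * Hypothetical check of a GPT header read from LBA 1 into sec
 * (secsize bytes). Returns 0 if valid, else -1 with *err set.
 */
static int
gpt_hdr_check(const unsigned char *sec, size_t secsize, const char **err)
{
	unsigned char hdr[512];	/* scratch copy; larger headers are rejected */
	uint32_t ghsize, orig_csum;

	if (secsize < GPT_MINHDRSIZE) {
		*err = "sector too small\n";
		return -1;
	}
	if (memcmp(sec, "EFI PART", 8) != 0) {
		*err = "bad GPT signature\n";
		return -1;
	}
	if (get_le32(sec + 8) != GPT_REVISION) {
		*err = "bad GPT revision\n";
		return -1;
	}
	ghsize = get_le32(sec + 12);
	if (ghsize < GPT_MINHDRSIZE || ghsize > secsize || ghsize > sizeof(hdr)) {
		*err = "bad GPT header size\n";
		return -1;
	}
	/* The stored CRC covers ghsize bytes with the CRC field itself zeroed. */
	orig_csum = get_le32(sec + 16);
	memcpy(hdr, sec, ghsize);
	memset(hdr + 16, 0, 4);
	if (crc32_le(hdr, ghsize) != orig_csum) {
		*err = "bad GPT header checksum\n";
		return -1;
	}
	return 0;
}
```

Copying the header into a scratch buffer before zeroing the CRC field keeps the caller's sector intact, which is the same effect the real code gets by restoring gh_csum afterwards.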

Re: Problem with boot block / softraid on current

Stefan Sperling-5
On Wed, Jan 04, 2017 at 07:16:54PM +0100, Stefan Sperling wrote:
> So the problem seems to be that boot uses too much stack memory.
>
> The diff below fixes the problem for me. As far as I can tell this
> should work on both 512 and 4k disks. But I cannot test with a 4k disk.

Alternatively, this diff also works for me and avoids alloc() / free().
I prefer the alloc() / free() version, though.

Index: softraid_amd64.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/stand/libsa/softraid_amd64.c,v
retrieving revision 1.3
diff -u -p -r1.3 softraid_amd64.c
--- softraid_amd64.c 24 Dec 2016 22:49:38 -0000 1.3
+++ softraid_amd64.c 4 Jan 2017 18:17:43 -0000
@@ -33,7 +33,8 @@
 #include "softraid_amd64.h"
 
 static int gpt_chk_mbr(struct dos_partition *, u_int64_t);
-static uint64_t findopenbsd_gpt(struct sr_boot_volume *, const char **);
+static uint64_t findopenbsd_gpt(struct sr_boot_volume *, u_char *,
+    const char **);
 
 void
 srprobe_meta_opt_load(struct sr_metadata *sm, struct sr_meta_opt_head *som)
@@ -424,7 +425,7 @@ gpt_chk_mbr(struct dos_partition *dp, u_
 }
 
 static uint64_t
-findopenbsd_gpt(struct sr_boot_volume *bv, const char **err)
+findopenbsd_gpt(struct sr_boot_volume *bv, u_char * buf, const char **err)
 {
  struct gpt_header gh;
  int i, part, found;
@@ -435,7 +436,6 @@ findopenbsd_gpt(struct sr_boot_volume *b
  const char openbsd_uuid_code[] = GPT_UUID_OPENBSD;
  struct gpt_partition gp;
  static struct uuid *openbsd_uuid = NULL, openbsd_uuid_space;
- static u_char buf[4096];
 
  /* Prepare OpenBSD UUID */
  if (openbsd_uuid == NULL) {
@@ -531,7 +531,7 @@ sr_getdisklabel(struct sr_boot_volume *b
  struct dos_mbr mbr;
  const char *err = NULL;
  u_int start = 0;
- char buf[DEV_BSIZE];
+ char buf[4096];
  int i;
 
  /* Check for MBR to determine partition offset. */
@@ -539,7 +539,7 @@ sr_getdisklabel(struct sr_boot_volume *b
  sr_strategy(bv, F_READ, DOSBBSECTOR, sizeof(mbr), &mbr, NULL);
  if (gpt_chk_mbr(mbr.dmbr_parts, bv->sbv_size /
     (bv->sbv_secsize / DEV_BSIZE)) == 0) {
- start = findopenbsd_gpt(bv, &err);
+ start = findopenbsd_gpt(bv, buf, &err);
  if (start == (u_int)-1) {
  if (err != NULL)
  return (err);
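In this variant the buffer becomes an automatic array in the caller, sr_getdisklabel(). Note the difference in storage: the `static u_char buf[4096]` from r1.3 lives in static storage (usually .bss) for the whole run of boot, whereas `char buf[4096]` in sr_getdisklabel() occupies the caller's stack frame only while that function runs. Below is a minimal sketch of this caller-owns-the-buffer pattern; `read_sector()`, `find_part()` and `get_label()` are hypothetical names, and unlike the diff it also passes the buffer length so the callee can check that one sector fits.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SECSIZE 4096	/* upper bound, as in the boot code's check */

/* Hypothetical stand-in for sr_strategy(): fill buf with a fake sector. */
static int
read_sector(unsigned long lba, unsigned char *buf, size_t secsize)
{
	memset(buf, (int)(lba & 0xff), secsize);
	return 0;
}

/* The callee owns no buffer; it borrows the caller's scratch space. */
static int
find_part(unsigned char *buf, size_t buflen, size_t secsize)
{
	if (secsize > buflen)
		return -1;		/* one sector must fit */
	if (read_sector(1, buf, secsize) != 0)
		return -1;
	return buf[0];			/* placeholder for a real lookup */
}

/* The caller owns the buffer; it lives only for this call's duration. */
static int
get_label(size_t secsize)
{
	unsigned char buf[MAX_SECSIZE];

	return find_part(buf, sizeof(buf), secsize);
}
```

The trade-off is that the full 4096 bytes are reserved on the caller's stack even for 512-byte sectors, while the alloc()/free() version requests exactly sbv_secsize bytes.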

Re: Problem with boot block / softraid on current

Stefan Sperling-5
In reply to this post by Stefan Sperling-5
On Wed, Jan 04, 2017 at 07:16:55PM +0100, Stefan Sperling wrote:
> The diff below fixes the problem for me. As far as I can tell this
> should work on both 512 and 4k disks. But I cannot test with a 4k disk.
 
And let's check for alloc() failure. Suggested by Theo.

Index: softraid_amd64.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/stand/libsa/softraid_amd64.c,v
retrieving revision 1.3
diff -u -p -r1.3 softraid_amd64.c
--- softraid_amd64.c 24 Dec 2016 22:49:38 -0000 1.3
+++ softraid_amd64.c 4 Jan 2017 18:59:45 -0000
@@ -435,7 +435,7 @@ findopenbsd_gpt(struct sr_boot_volume *b
  const char openbsd_uuid_code[] = GPT_UUID_OPENBSD;
  struct gpt_partition gp;
  static struct uuid *openbsd_uuid = NULL, openbsd_uuid_space;
- static u_char buf[4096];
+ u_char *buf;
 
  /* Prepare OpenBSD UUID */
  if (openbsd_uuid == NULL) {
@@ -456,6 +456,12 @@ findopenbsd_gpt(struct sr_boot_volume *b
  *err = "disk sector > 4096 bytes\n";
  return (-1);
  }
+ buf = alloc(bv->sbv_secsize);
+ if (buf == NULL) {
+ *err = "out of memory\n";
+ return (-1);
+ }
+ bzero(buf, bv->sbv_secsize);
 
  /* LBA1: GPT Header */
  lba = 1;
@@ -466,17 +472,20 @@ findopenbsd_gpt(struct sr_boot_volume *b
  /* Check signature */
  if (letoh64(gh.gh_sig) != GPTSIGNATURE) {
  *err = "bad GPT signature\n";
+ free(buf, bv->sbv_secsize);
  return (-1);
  }
 
  if (letoh32(gh.gh_rev) != GPTREVISION) {
  *err = "bad GPT revision\n";
+ free(buf, bv->sbv_secsize);
  return (-1);
  }
 
  ghsize = letoh32(gh.gh_size);
  if (ghsize < GPTMINHDRSIZE || ghsize > sizeof(struct gpt_header)) {
  *err = "bad GPT header size\n";
+ free(buf, bv->sbv_secsize);
  return (-1);
  }
 
@@ -487,6 +496,7 @@ findopenbsd_gpt(struct sr_boot_volume *b
  gh.gh_csum = orig_csum;
  if (letoh32(orig_csum) != new_csum) {
  *err = "bad GPT header checksum\n";
+ free(buf, bv->sbv_secsize);
  return (-1);
  }
 
@@ -514,6 +524,9 @@ findopenbsd_gpt(struct sr_boot_volume *b
  found = 1;
  }
  }
+
+ free(buf, bv->sbv_secsize);
+
  if (new_csum != letoh32(gh.gh_part_csum)) {
  *err = "bad GPT entries checksum\n";
  return (-1);
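Once the buffer is allocated, this version has to call free() before each early return. A common C alternative, shown here only as a sketch, routes every exit through a single cleanup label so that no error path can leak the buffer. `malloc()`/`free()` stand in for libsa's `alloc()`/`free(ptr, size)`, and `check_header()`, `scan_entries()` and `find_gpt_part()` are hypothetical placeholders for the real GPT logic.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical placeholder for the GPT header checks. */
static int
check_header(const unsigned char *buf, size_t len)
{
	return (len >= 8 && memcmp(buf, "EFI PART", 8) == 0) ? 0 : -1;
}

/* Hypothetical placeholder for the partition entry scan. */
static int
scan_entries(const unsigned char *buf, size_t len)
{
	(void)buf;
	return len >= 512 ? 0 : -1;
}

/* Returns 0 on success, -1 with *err set; buf is freed on every path. */
static long
find_gpt_part(const unsigned char *sector, size_t secsize, const char **err)
{
	unsigned char *buf;
	long ret = -1;

	if (secsize > 4096) {
		*err = "disk sector > 4096 bytes\n";
		return -1;		/* nothing allocated yet */
	}
	buf = malloc(secsize);
	if (buf == NULL) {		/* the check Theo suggested */
		*err = "out of memory\n";
		return -1;
	}
	memcpy(buf, sector, secsize);	/* stands in for reading LBA 1 */

	if (check_header(buf, secsize) != 0) {
		*err = "bad GPT signature\n";
		goto done;
	}
	if (scan_entries(buf, secsize) != 0) {
		*err = "bad GPT entries\n";
		goto done;
	}
	ret = 0;			/* a real version would return the start LBA */
done:
	free(buf);			/* single release point */
	return ret;
}
```

Which style reads better is largely a matter of taste; the diff's explicit free() per branch keeps the change small and close to the existing code.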

Re: Problem with boot block / softraid on current

Mark Kettenis
> Date: Wed, 4 Jan 2017 20:03:49 +0100
> From: Stefan Sperling <[hidden email]>
>
> On Wed, Jan 04, 2017 at 07:16:55PM +0100, Stefan Sperling wrote:
> > The diff below fixes the problem for me. As far as I can tell this
> > should work on both 512 and 4k disks. But I cannot test with a 4k disk.
>  
> And let's check for alloc() failure. Suggested by Theo.

Makes sense to me.

Re: Problem with boot block / softraid on current

kwesterback
In reply to this post by Stefan Sperling-5
On 01/04, Stefan Sperling wrote:

> On Tue, Jan 03, 2017 at 07:37:10AM +0100, Andreas Bartelt wrote:
> > One of my amd64 machines does not boot anymore after updating to current
> > (attached dmesg was obtained after booting a build of current from today but
> > with a boot block from December, 22nd). Interestingly, the same disk (with a
> > boot block from today's build) still boots fine with another amd64 machine
> > (an old x61s thinkpad).
> >
> > The disk image makes use of a softraid(4) encrypted root partition. On the
> > affected machine, it triggers a reboot directly after the password has been
> > entered.
> >
> > I could narrow the problem down to the use of installboot(8), i.e., booting
> > current still works on the affected machine with a boot block based on my
> > previous build from December, 22nd.
>
> I ran into the same problem on my ThinkPad X130e and tracked it down to
> this part of the changes committed in softraid_amd64.c r1.3:
>
> @@ -433,7 +435,7 @@
>   const char openbsd_uuid_code[] = GPT_UUID_OPENBSD;
>   struct gpt_partition gp;
>   static struct uuid *openbsd_uuid = NULL, openbsd_uuid_space;
> - static u_char buf[DEV_BSIZE];
> + static u_char buf[4096];
>
>
> So the problem seems to be that boot uses too much stack memory.
>
> The diff below fixes the problem for me. As far as I can tell this
> should work on both 512 and 4k disks. But I cannot test with a 4k disk.
>

Reads good to me. I also can't test on a 4K disk. :-(

ok krw@

.... Ken

Re: Problem with boot block / softraid on current

YASUOKA Masahiko-3
In reply to this post by Stefan Sperling-5
ok yasuoka

Thanks.

On Wed, 4 Jan 2017 20:03:49 +0100
Stefan Sperling <[hidden email]> wrote:

> On Wed, Jan 04, 2017 at 07:16:55PM +0100, Stefan Sperling wrote:
>> The diff below fixes the problem for me. As far as I can tell this
>> should work on both 512 and 4k disks. But I cannot test with a 4k disk.
>  
> And let's check for alloc() failure. Suggested by Theo.

Re: Problem with boot block / softraid on current

Andreas Bartelt-2
In reply to this post by Stefan Sperling-5
On 01/04/17 20:03, Stefan Sperling wrote:
> On Wed, Jan 04, 2017 at 07:16:55PM +0100, Stefan Sperling wrote:
>> The diff below fixes the problem for me. As far as I can tell this
>> should work on both 512 and 4k disks. But I cannot test with a 4k disk.
>
> And let's check for alloc() failure. Suggested by Theo.
>

Your patch also works for me. Thanks a lot!