Debugging pxeboot on WRAP


Debugging pxeboot on WRAP

Rolf Sommerhalder
pxeboot from OpenBSD 3.8 (but also from 3.5, 3.6, and 3.7) fails to PXE
boot WRAP appliances with BIOS 1.08, which supports PXE using Etherboot
(see www.pcengines.ch):


PC Engines WRAP.1C/1D/1E v1.08
640 KB Base Memory
130048 KB Extended Memory

01F0 - no drive found !
ROM segment 0xe000 length 0x8000 reloc 0x00020000
Etherboot 5.3.12 (GPL) http://etherboot.org
Drivers: NATSEMI   Images: NBI PXE   Exports: PXE
Relocating _text from: [00089370,0009b230) to [07eee140,07f00000)
Boot from (N)etwork (D)isk or (Q)uit? N

Probing pci nic...
[dp83815]
natsemi_probe: MAC addr 00:0D:B9:01:A0:A4 at ioaddr 0X1000
natsemi_probe: Vendor:0X100B Device:0X0020
dp83815: Transceiver default autoneg. enabled, advertise 100 full duplex.
dp83815: Transceiver status 7869 advertising 05E1
dp83815: Setting half-duplex based on negotiated link capability.
Searching for server (DHCP)...
Me: 10.0.0.20, Server: 10.0.0.3, Gateway 10.0.0.1
Loading 10.0.0.3:pxeboot (PXE)done
probing: pc0 com0 pci pxe![2.1]  <--- the cursor stays here


Searching the Web, also for Soekris boards (which are similar to the
WRAP), suggests that the "A20 gate hack" may be the culprit for this halt.

I therefore tried to patch
 /sys/arch/i386/stand/libsa/gateA20.c
so that it leaves the A20 gate alone, even though it seems to be
already patched as outlined in
 http://blog.gmane.org/gmane.os.netbsd.devel.embedded/month=20050601
and in NetBSD3
 http://releng.netbsd.org/cgi-bin/req-3.cgi?show=504
 http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/i386/stand/lib/gatea20.c

Then I rebuilt pxeboot:
 cd /sys/arch/i386/stand
 make
 scp /usr/src/sys/arch/i386/stand/pxeboot/pxeboot root@tftpserver:/tftpboot/

Still, the boot process halts on the 'probing' line, right after 'pxe![2.1]'.

My /tftpboot/bsd should be OK, as the same kernel file boots fine from a
CompactFlash card.

My /tftpboot/etc/boot.conf is:
 set tty com0
 stty com0 38400
 boot tftp:/bsd


Do you have any suggestions on how I could debug or prevent this freeze,
for example by using debug compile flags in the Makefile?

Thanks for any suggestions,
Rolf


Re: Debugging pxeboot on WRAP

J.C. Roberts-2
On Mon, 26 Dec 2005 10:13:50 +0100, Rolf Sommerhalder
<[hidden email]> wrote:

>01F0 - no drive found !
>
<snip>
>My /tftpboot/bsd should be ok as the same kernel file boot ok from a
>CompactFlash card.

Should we assume you have removed the CompactFlash device?

Have you tried bsd.rd ?

jcr


Re: Debugging pxeboot on WRAP

Rolf Sommerhalder
On 12/26/05, J.C. Roberts <[hidden email]> wrote:
> >01F0 - no drive found !
> >
> <snip>
> >My /tftpboot/bsd should be ok as the same kernel file boot ok from a
> >CompactFlash card.
>
> Should we assume you have removed the CompactFlash device?

Yes, the CF card is removed, as someone trying PXE on a Soekris
experienced problems when such a drive was present. But although the
"01F0 - no drive found !" error disappears after inserting a bootable
CF card, the pxeboot process still freezes.


> Have you tried bsd.rd ?

No, not yet. I will try that now, even though I expect that it might not
work, as it was probably built for a GENERIC machine and not for a WRAP,
which uses a National SC1100 'Geode'.

Rolf


Re: Debugging pxeboot on WRAP

Rolf Sommerhalder
In reply to this post by J.C. Roberts-2
On 12/26/05, J.C. Roberts <[hidden email]> wrote:
> Have you tried bsd.rd ?

Just tried it, but pxeboot does not continue to boot either.

tcpdump on the TFTP server reveals that the WRAP's PXE client actually
requests and loads the pxeboot file, but never gets as far as
requesting the kernel file bsd or bsd.rd.
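
What tcpdump shows here are TFTP read requests (RRQ), whose layout per
RFC 1350 is a 2-byte opcode (1 for RRQ) followed by a NUL-terminated
file name and a NUL-terminated mode string. As a rough sketch of how to
tell which file a captured request names, assuming the raw TFTP payload
is already in a buffer (the function name tftp_rrq_filename is made up
for this illustration):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TFTP_OP_RRQ 1   /* read request opcode, RFC 1350 */

/*
 * If pkt holds a TFTP read request, return a pointer to the
 * NUL-terminated file name inside the packet; otherwise return NULL.
 * Hypothetical helper, not part of tcpdump or pxeboot.
 */
static const char *tftp_rrq_filename(const uint8_t *pkt, size_t len)
{
    uint16_t opcode;
    const uint8_t *name, *end;

    if (len < 4)
        return NULL;
    opcode = (uint16_t)((pkt[0] << 8) | pkt[1]);  /* network byte order */
    if (opcode != TFTP_OP_RRQ)
        return NULL;
    name = pkt + 2;
    end = memchr(name, '\0', len - 2);            /* file name terminator */
    if (end == NULL || end == name)
        return NULL;
    return (const char *)name;
}
```

With a capture like the one described, one would expect to see a request
for "pxeboot" and then none for "bsd" or "bsd.rd".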

Now trying to understand the various debug switches in the Makefile,
hoping that they will reveal more output before the WRAP freezes.

Rolf


Re: Debugging pxeboot on WRAP

Tom Cosgrove-2
In reply to this post by Rolf Sommerhalder
>>> Rolf Sommerhalder 26-Dec-05 09:13 >>>
>
> pxeboot from OpenBSD3.8 (but also from 3.5, 3.6. and 3.7) fails to PXE
> boot WRAP appliances with BIOS 1.08 which supports PXE using etherboot
> (see www.pcengines.ch):
:

> Probing pci nic...
> [dp83815]
> natsemi_probe: MAC addr 00:0D:B9:01:A0:A4 at ioaddr 0X1000
> natsemi_probe: Vendor:0X100B Device:0X0020
> dp83815: Transceiver default autoneg. enabled, advertise 100 full duplex.
> dp83815: Transceiver status 7869 advertising 05E1
> dp83815: Setting half-duplex based on negotiated link capability.
> Searching for server (DHCP)...
> Me: 10.0.0.20, Server: 10.0.0.3, Gateway 10.0.0.1
> Loading 10.0.0.3:pxeboot (PXE)done
> probing: pc0 com0 pci pxe![2.1]  <--- the cursor stays here
:
> Searching the Web also for Soekris (which is similar to WRAP) hints
> that the "A20 gate hack" may be the culprit for this halt.

This is very unlikely to have anything to do with it, as I used the
Soekris while implementing pxeboot on OpenBSD (based on the NetBSD code).

> Still, the boot process halts in the 'probing' line, right after 'pxe![2.1]'

The most likely reason is that pxeboot's calls into the PXE stack on
the WRAP are failing (to return).

> Do you have any suggestion how I could debug or prevent this freeze,
> for example by using debug compile flags in the Makefile, etc.?

Get me a WRAP board, babysit the kids for a few evenings, and wait
patiently :)

If you want to look at it yourself, make sure you understand i386
assembler, the PXE specification, and protected-to-real-and-back mode
switching.

You could find out if pxe_call works at all on the WRAP in its current
implementation by putting a printf() after it, and seeing if there's
any output.  Look in pxe.c:pxe_init().

Tom


Re: Debugging pxeboot on WRAP

Rolf Sommerhalder
In reply to this post by Rolf Sommerhalder
After inserting some printf() debug statements into
 /sys/arch/i386/stand/libsa/pxe.c
I found that the call to the assembler subroutine
 pxe_call(PXENV_GET_CACHED_INFO);
never returns.
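
The printf() bracketing used here can be sketched in hosted C roughly as
follows. This is only an illustration of the technique, not the actual
change to pxe.c: traced_call and stub_call_ok are hypothetical names, and
the stub stands in for pxe_call(), which cannot be run outside the boot
loader. In a hosted program the output must be flushed before the call,
otherwise the "entering" marker could be lost in a buffer when the call
hangs.

```c
#include <stdio.h>

/* Hypothetical stand-in type for a firmware call such as pxe_call(). */
typedef int (*firmware_call_fn)(void);

/*
 * Print a marker before and after the call, flushing each time so the
 * "entering" marker is visible even if fn() never returns.
 * Returns whatever fn() returned.
 */
static int traced_call(FILE *out, const char *name, firmware_call_fn fn)
{
    int rc;

    fprintf(out, "[%s] entering\n", name);
    fflush(out);
    rc = fn();
    fprintf(out, "[%s] returned %d\n", name, rc);
    fflush(out);
    return rc;
}

/* Stub that returns at once; a hanging call never prints "returned". */
static int stub_call_ok(void)
{
    return 0;
}
```

Calling traced_call(stdout, "pxe_call", stub_call_ok) prints both
markers; seeing only the first one is the symptom described above.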

It looks like either there is something wrong with that call, or with
the PXE code from Etherboot.

Rolf


Re: Debugging pxeboot on WRAP

Rolf Sommerhalder
The posting
 http://www.monkey.org/openbsd/archive2/bugs/200503/msg00001.html
is interesting, as it points out that there has already been a problem
with pxe_call.

> single-stepping back into pxeboot.  Five instructions later, I hit the
> lockup point at 4012:403c.  The instruction causing the problem is:
>
>   addrsize opsize lgdt [ds:0x45e80]
>
> which is the line marked "Load the GDT" in the following code from
> pxe_call.S in the OpenBSD source:
>
>   /*
>    * real_to_prot()
>    *
>    * Switch the processor back into protected mode.
>    */
>  .globl real_to_prot
>   real_to_prot:
>  .code16
>
>  xorw %ax, %ax
>  movw %ax, %ds /* Load %ds so we can get at Gdtr */
>  data32 addr32 lgdt Gdtr /* Load the GDT */
>           ...
>
> Note the address [ds:0x45e80] that this resolves to in the pxeboot binary.
> In particular, note that the offset contains five hexadecimal digits.
> We're allegedly in real-mode at this point.  We can't access more than 64k
> in each segment, yet this instruction is trying to access data at an
> offset of approximately 279k.  The CPU doesn't like this.


I am not sure whether this issue has been fixed in OpenBSD yet. The corresponding code in OpenBSD 3.8 is
/sys/arch/i386/stand/libsa/pxe_call.S
...
/*
 * real_to_prot()
 *
 * Switch the processor back into protected mode.
 */
        .globl  real_to_prot
real_to_prot:
        .code16

        movw    $LINKADDR >> 4, %ax     /* We're linked to LINKADDR/16:0000 */
        movw    %ax, %ds
        data32 addr32 lgdt (Gdtr - LINKADDR)    /* Reload the GDT */

        movl    %cr0, %eax              /* Enable protected mode */
        orl     $CR0_PE, %eax
        movl    %eax, %cr0

        data32 ljmp     $S32TEXT, $r2p32 /* Reload %cs, flush pipeline */
r2p32:
        .code32
        /* Reload 32-bit %ds, %ss, %es */
        movl    $S32DATA, %eax
        mov     %ax, %ds
        mov     %ax, %ss
        mov     %ax, %es
...
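
The arithmetic behind the quoted diagnosis, and behind the
LINKADDR-relative addressing in the 3.8 code, can be checked with a
small C sketch. The Gdtr address 0x45e80 is taken from the quoted
report; the LINKADDR value 0x40120 is an assumption, inferred from the
4012:403c lockup address in that report, and the helper names are made
up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed load address of pxeboot (inferred from segment 0x4012). */
#define LINKADDR   0x40120u
/* Address of Gdtr as quoted in the bug report. */
#define GDTR_ADDR  0x45e80u

/* Linear address reached by a real-mode segment:offset pair. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* With the usual 64 KB real-mode limit, an offset must fit in 16 bits. */
static bool fits_real_mode_offset(uint32_t offset)
{
    return offset <= 0xFFFFu;
}
```

Under these assumptions, fits_real_mode_offset(GDTR_ADDR) is false,
which is the situation the report describes with %ds set to zero. With
%ds loaded with LINKADDR >> 4 (0x4012), the offset GDTR_ADDR - LINKADDR
is 0x5d60, which fits in 16 bits, and
real_mode_linear(0x4012, 0x5d60) gives back 0x45e80.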

Ah yes, according to CVS log
 http://www.openbsd.org/cgi-bin/cvsweb/src/sys/arch/i386/stand/libsa/pxe_call.S
that real/protected mode problem should have been patched since v1.2.

But did the current v1.3 break it again?

Rolf


Re: Debugging pxeboot on WRAP

Stuart Henderson
In reply to this post by Rolf Sommerhalder
> No, not yet. Will try that now, even though I expect that it might not
> work as it was probably built for GENERIC machine and not for a WRAP
> that uses an National SC1100 'Geode' ?

Doesn't really help you, but GENERIC runs just fine on Geode machines...


Re: Debugging pxeboot on WRAP

Rolf Sommerhalder
In reply to this post by Rolf Sommerhalder
> Ah yes, according to CVS log
>  http://www.openbsd.org/cgi-bin/cvsweb/src/sys/arch/i386/stand/libsa/pxe_call.S
> that real/protected mode problem should be patched since v1.2.
>
> But did current v1.3 eventually break it again?

Replacing pxe_call.S v1.3 with v1.2 unfortunately does not solve the
problem of pxe_call() never returning.

Rolf


Re: Debugging pxeboot on WRAP

Rolf Sommerhalder
In reply to this post by Tom Cosgrove-2
On 12/26/05, Tom Cosgrove <[hidden email]> wrote:

> You could find out if pxe_call works at all on the WRAP in its current
> implementation by putting a printf() after it, and seeing if there's
> any output.  Look in pxe.c:pxe_init().

Thanks, I did that, and pxe_call() definitely never returns. It is also
not specific to pxe_call(PXENV_GET_CACHED_INFO), because I also called
pxeinfo(), which does a pxe_call(PXENV_UNDI_GET_NIC_TYPE) that never
returns either!

As you say, it looks like I might have to refresh my rusty gdb skills
now, and figure out what 'bochs' does (as I am totally inexperienced
with babysitting ;-)

Rolf


Re: Debugging pxeboot on WRAP

Tom Cosgrove-2
In reply to this post by Rolf Sommerhalder
>>> Rolf Sommerhalder 26-Dec-05 11:45 >>>
>
> The posting
>  http://www.monkey.org/openbsd/archive2/bugs/200503/msg00001.html
> is interesting, as it points out that there has already been a problem
> with pxe_call.

Why is that posting interesting?  That bug was fixed.  I said that the
problem would be pxe_call in my last email to misc@.

As I said in my last email, if you want to look at it yourself, make
sure you understand i386 assembler, the PXE specification, and protected-
to-real-and-back mode switching.

Thanks

Tom


   Date: Sat, 12 Mar 2005 14:52:02 -0700 (MST)
   From: Tom Cosgrove <[hidden email]>
   To: [hidden email]
   Subject: CVS: cvs.openbsd.org: src

   CVSROOT:        /cvs
   Module name:    src
   Changes by:     [hidden email]     2005/03/12 14:52:02

   Modified files:
           sys/arch/i386/stand/libsa: pxe_call.S
           sys/arch/i386/stand/pxeboot: conf.c

   Log message:
   On return from real mode, reload the GDT using a 16-bit pointer
   rather than a 32-bit value.  Found by Tim Fletcher <tim (at)
   parrswood (dot) manchester (dot) sch (dot) uk> using Etherboot;
   thanks to Tim and the Etherboot developers who narrowed this down.

   Also bump the pxeboot version to 1.01.

   ok weingart@, "go ahead" deraadt@


Re: Debugging pxeboot on WRAP

Rolf Sommerhalder
In reply to this post by Rolf Sommerhalder
Another OpenBSD on WRAP user wrote to me saying that pxeboot works.
Also, I found http://www.ultradesic.com/?section=43 which describes
PXE booting OpenBSD on the Soekris platform, which is very similar to
the WRAP.

Both encouraged me to dig deeper:
a) pxeboot finds both labels '!PXE' and 'PXENV' in the BIOS code;
b) the checksums of both of those BIOS sections are OK, i.e. the PXE code
in the BIOS appears to be intact;
c) forcing pxeboot to use the legacy PXENV API (instead of the !PXE v2.1
API) makes pxe_call() return OK. (I forced this by commenting out the
line  " bang = 1; "   in /sys/arch/i386/stand/libsa/pxe.c)
However, the result fields of those calls appear to be filled with
zeros: calls to pxeinfo() return IP addresses and netmask as 0.0.0.0
instead of the DHCP client and server addresses.
d) Upgrading the WRAP's BIOS from 1.08 to 1.10 did not make any difference.
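
The signature and checksum checks in a) and b) can be sketched in C as
below. This is an illustration only, not the code in pxe.c: the helper
names are hypothetical, the 16-byte scan stride is an assumption, and
the position of the one-byte length field (passed in as len_off) is
left to the caller because it differs between the two structures. The
one rule relied on is that all bytes of a valid structure, including its
checksum byte, add up to zero modulo 256.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 8-bit sum of all bytes; a valid structure sums to zero. */
static uint8_t byte_sum(const uint8_t *p, size_t len)
{
    uint8_t sum = 0;

    while (len-- > 0)
        sum = (uint8_t)(sum + *p++);
    return sum;
}

/*
 * Scan mem on 16-byte boundaries (assumed alignment) for the signature
 * sig.  len_off is the offset of the one-byte structure length field.
 * Returns the offset of the first structure whose bytes sum to zero,
 * or -1 if none is found.
 */
static long find_pxe_struct(const uint8_t *mem, size_t mem_len,
                            const char *sig, size_t len_off)
{
    size_t sig_len = strlen(sig);
    size_t off;

    for (off = 0; off + len_off < mem_len; off += 16) {
        size_t slen;

        if (off + sig_len > mem_len ||
            memcmp(mem + off, sig, sig_len) != 0)
            continue;
        slen = mem[off + len_off];
        if (slen <= len_off || off + slen > mem_len)
            continue;                 /* implausible length field */
        if (byte_sum(mem + off, slen) == 0)
            return (long)off;         /* signature and checksum OK */
    }
    return -1;
}
```

A structure that passes this check is only known to be undamaged; as
finding c) shows, the code behind it can still misbehave.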

Finding c) in particular led me to also question my DHCP server
configuration, which currently is:

host wrap {
 hardware ethernet 00:0d:b9:01:a0:a4;
 option host-name "wrapobsd";
 fixed-address 10.0.0.20;
 next-server 10.0.0.3;
 option root-path "10.0.0.3:/tftpboot";
 filename "/pxeboot";
}

Just to cross-check the PXE capability of the WRAP's BIOS, I also tried to
load pxegrub from GRUB as the second-stage boot loader, instead of pxeboot
from OpenBSD.  So far, pxegrub gets loaded, but I do not get any GRUB
prompt yet (something in the serial console port parameters might still
be wrong in my GRUB configuration).

Any suggestions warmly welcome,
Rolf


Re: Debugging pxeboot on WRAP

Rolf Sommerhalder
Good news - my WRAPs now pxeboot OpenBSD as expected!  The culprit was
not pxeboot, but the etherboot PXE code 5.3.12 in BIOS 1.08 and 1.10,
as supplied by PCengines.

After building an etherboot 5.4.1 binary on rom-o-matic.org, merging
it into the BIOS and flashing the WRAPs, network boot of OpenBSD now
works very nicely :-)

I'll document that in a bit more detail in order to wrap up this
thread, and to inform PCengines that OpenBSD can easily network boot.

Thanks to all who replied, notably to Marc, who prompted me to try
replacing the etherboot code in the BIOS,
Rolf


Re: Debugging pxeboot on WRAP

J.C. Roberts-2
On Tue, 27 Dec 2005 13:29:17 +0100, Rolf Sommerhalder
<[hidden email]> wrote:

>Good news - my WRAPs now pxeboot OpenBSD as expected!  The culprit was
>not pxeboot, but the etherboot PXE code 5.3.12 in BIOS 1.08 and 1.10,
>as supplied by PCengines.
>

It seems you were lucky, but if you have to dig into the code yourself
and debug things, the list of required knowledge posted by Tom Cosgrove
is important.

Though I haven't tested the code presented, the article linked below
covers part of the required knowledge that Tom mentioned, namely, the
Protected Mode to Real Mode switching.

http://www.sudleyplace.com/pmtorm.html

kind regards,
jcr