bge(4) transmit performance improvement

Brad Smith-14
I'm interested in finding out whether anyone on this list has a lab setup
with bge gear where the following diff could be benchmarked, ideally in a router
or firewall-like configuration. I'd like to see whether this diff makes any noticeable
difference in transmit performance, or whether it turns out to be a micro-optimization
with very little difference, if any at all.

Please try it out anyway and let me know how it goes. If you do test the diff,
please send me a dmesg too.


Correct a performance bug from Bill Paul's original FreeBSD bge(4) driver:

Each call to the FreeBSD bge_start() routine reads the transmit producer
index from the chip mailbox register BGE_MBX_TX_HOST_PROD0_LO.
The local copy of that value is then updated by bge_encap() as
bge_encap() encapsulates packets in the Tx ring. If bge_encap()
succeeds in encapsulating one or more packets, bge_start() tells the
chip to start sending the newly encapsulated packets by writing the new
value back to the chip mailbox register.

However, comparison with the Linux drivers (Broadcom-supplied and
open-source tg3.c) and with the OpenSolaris driver confirms that
register BGE_MBX_TX_HOST_PROD0_LO is write-only to software.
Thus, we can just keep a copy in the softc, and eliminate the
(expensive) PCI register read on each call to bge_start().

From jonathan NetBSD


Index: if_bge.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_bge.c,v
retrieving revision 1.92
diff -u -p -r1.92 if_bge.c
--- if_bge.c 14 Nov 2005 13:11:40 -0000 1.92
+++ if_bge.c 18 Nov 2005 18:29:02 -0000
@@ -1058,10 +1058,14 @@ bge_init_tx_ring(struct bge_softc *sc)
 
  sc->bge_txcnt = 0;
  sc->bge_tx_saved_considx = 0;
- CSR_WRITE_4(sc, BGE_MBX_TX_HOST_PROD0_LO, 0);
+
+ /* Initialize transmit producer index for host-memory send ring. */
+ sc->bge_tx_prodidx = 0;
+ CSR_WRITE_4(sc, BGE_MBX_TX_HOST_PROD0_LO, sc->bge_tx_prodidx);
  if (sc->bge_quirks & BGE_QUIRK_PRODUCER_BUG)
- CSR_WRITE_4(sc, BGE_MBX_TX_HOST_PROD0_LO, 0);
+ CSR_WRITE_4(sc, BGE_MBX_TX_HOST_PROD0_LO, sc->bge_tx_prodidx);
 
+ /* NIC-memory send ring not used; initialize to zero. */
  CSR_WRITE_4(sc, BGE_MBX_TX_NIC_PROD0_LO, 0);
  if (sc->bge_quirks & BGE_QUIRK_PRODUCER_BUG)
  CSR_WRITE_4(sc, BGE_MBX_TX_NIC_PROD0_LO, 0);
@@ -2805,7 +2809,7 @@ bge_start(struct ifnet *ifp)
 {
  struct bge_softc *sc;
  struct mbuf *m_head = NULL;
- u_int32_t prodidx = 0;
+ u_int32_t prodidx;
  int pkts = 0;
 
  sc = ifp->if_softc;
@@ -2813,7 +2817,7 @@ bge_start(struct ifnet *ifp)
  if (!sc->bge_link && ifp->if_snd.ifq_len < 10)
  return;
 
- prodidx = CSR_READ_4(sc, BGE_MBX_TX_HOST_PROD0_LO);
+ prodidx = sc->bge_tx_prodidx;
 
  while(sc->bge_cdata.bge_tx_chain[prodidx] == NULL) {
  IFQ_POLL(&ifp->if_snd, m_head);
@@ -2869,6 +2873,8 @@ bge_start(struct ifnet *ifp)
  CSR_WRITE_4(sc, BGE_MBX_TX_HOST_PROD0_LO, prodidx);
  if (sc->bge_quirks & BGE_QUIRK_PRODUCER_BUG)
  CSR_WRITE_4(sc, BGE_MBX_TX_HOST_PROD0_LO, prodidx);
+
+ sc->bge_tx_prodidx = prodidx;
 
  /*
  * Set a timeout in case the chip goes out to lunch.
Index: if_bgereg.h
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_bgereg.h,v
retrieving revision 1.30
diff -u -p -r1.30 if_bgereg.h
--- if_bgereg.h 9 Oct 2005 23:41:55 -0000 1.30
+++ if_bgereg.h 18 Nov 2005 18:29:03 -0000
@@ -2328,6 +2328,7 @@ struct bge_softc {
  u_int16_t bge_rx_saved_considx;
  u_int16_t bge_ev_saved_considx;
  u_int16_t bge_return_ring_cnt;
+ u_int32_t bge_tx_prodidx;
  u_int16_t bge_std; /* current std ring head */
  u_int16_t bge_jumbo; /* current jumo ring head */
  SLIST_HEAD(__bge_jfreehead, bge_jpool_entry) bge_jfree_listhead;