
How to free up the NPU for statistics gathering in the era of guaranteed bandwidth
2010-03-03 16:42:19

Published online: July 2006
Source: Electronic Engineering Special Edition
Author: Jeremy Bicknell
Product Manager
Flow Control Management Division
IDT Corporation
Abstract: With the explosive growth of the statistics and computation workload, network equipment designers will face a computing capacity crisis. Addressing it with the traditional architecture based on external memory will create a performance bottleneck. In high-performance network environments, designers are finding that a dedicated coprocessor optimized for statistics functions is a simple and convenient way to meet overall network performance requirements.

In the not-too-distant future, as the demand for processing and storing statistics grows explosively, network equipment designers will face a computing crisis. Traditional architectures that rely on external memory buffers to solve this problem will eventually create a performance bottleneck. In high-performance network environments, many designers have found that a dedicated, off-the-shelf coprocessor optimized for statistics functions offers a simple and convenient design option for meeting overall performance requirements.

Over the past several years, service providers have rolled out a wide range of differentiated services to create new revenue streams and to satisfy the growing demands of network applications and users.
These services range from VoIP to VPNs and often require service providers to meet stricter performance requirements than in the past. To support these new applications, service providers have quickly moved to service level agreements (SLAs) that define the contractual relationship between the parties, specifying the traffic classes provided, the total amount of data carried for each traffic class, and the guaranteed level of network performance.

Figure 1: IDT statistics engine

This trend has a considerable impact on network equipment design. To support a growing number of service classes and to verify agreements, service providers must now measure packets and keep expanding the statistics they gather on network performance and usage. In IP networks, statistics are typically tracked for TCP, UDP, ICMP, IPSec, IPv4, IPv6, and the Ethernet connections of every networked PC or Apple computer; they are easily displayed by entering "netstat -s" at a command prompt. Although the system resources a computer spends on this kind of statistics collection are negligible, the overhead for network equipment that aggregates large numbers of users is very different (see Table 1).

Table 1: Simplified statistics collection scenario

As line rates increase from OC-48 (2.5Gbps) to OC-192 and 10Gbps aggregation speeds, oversubscription techniques and network usage are growing rapidly, and the size of the task has begun to exceed the capacity of the core packet processor. Besides meeting the requirement for the number of counters needed for traffic and traffic parameter calculations, designers must also consider how the traffic type affects the aggregate data rate and the counter update rate.
Using the same assumptions, the counter update rate can be calculated (see Table 1). Given the rapid transition to 10Gbps data rates and the accompanying shift from simple file downloads to session-based data streams, network equipment developers need a new way to perform statistics operations. The size and scope of statistics functions have long limited the ability of the line card's core packet processor to process packets and maintain network throughput. The challenge for network equipment designers is to find a new, more efficient way to track the growing volume of data without affecting packet forwarding. By offloading this task from the main packet processor, designers can free up processor cycles for deep packet classification, supporting the finer-grained packet processing and longer key lookups that next-generation network applications demand.

Design options

Traditionally, network equipment designers have chosen to perform statistics operations in software. The task is usually handled by a general-purpose CPU or the NPU core packet processor, backed by external SRAM. As long as data rates stay relatively low, this approach works well. But as network line rates rise, the limitations of the traditional load/store architecture become much more pronounced. In this topology, the core packet processor must fetch data from off-chip memory, perform the appropriate arithmetic operation (an increment, a decrement, or an addition to a counter), and then write the result back to external memory. This complex process consumes packet processor cycles and strains the bandwidth of the entire memory bus between the CPU and external memory.
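To give a rough sense of the load that software counters place on the memory bus, the following Python sketch estimates worst-case packet rates and the resulting memory accesses. It is a back-of-the-envelope calculation under stated assumptions, not a reproduction of Table 1: it assumes Ethernet framing with 20 bytes of per-frame wire overhead, four counters touched per packet, and one read plus one write-back per counter. The function names are illustrative only.

```python
ETH_MIN_FRAME = 64       # bytes, minimum Ethernet frame
ETH_WIRE_OVERHEAD = 20   # bytes: 8 preamble/SFD + 12 inter-frame gap (assumed framing)


def packets_per_second(line_rate_bps, frame_bytes, overhead_bytes=ETH_WIRE_OVERHEAD):
    """Worst-case packet rate for back-to-back frames of a single size."""
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return line_rate_bps / bits_on_wire


def memory_ops_per_second(pps, counters_per_packet=4, ops_per_update=2):
    """Memory accesses needed when every counter update is a software
    read-modify-write (one read plus one write-back, by assumption)."""
    return pps * counters_per_packet * ops_per_update


if __name__ == "__main__":
    for label, rate in (("2.5Gbps", 2.5e9), ("10Gbps", 10e9)):
        pps = packets_per_second(rate, ETH_MIN_FRAME)
        ops = memory_ops_per_second(pps)
        print(f"{label}: {pps / 1e6:.2f} Mpps, {ops / 1e6:.1f} M memory ops/s")
```

Under these assumptions the packet rate, and with it the counter traffic, scales linearly with line rate, which is why an approach that is comfortable at 1Gbps can saturate the memory bus at 10Gbps.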
As line rates increase, the number of statistics operations and the memory bus utilization may exceed what the context or data processor core can handle, stalling the processor and degrading line card performance. Network equipment designers have tried to solve this by offloading all or part of the statistics task from the core packet processor. For example, some designers move the statistics function into dedicated logic in an FPGA or integrate it into an ASIC. Both solutions, however, bring significant drawbacks. FPGAs cannot provide the on-chip memory density that high-speed statistics operations need at today's line rates. Designers must therefore also back the FPGA with external SRAM, and they still face the read/modify/write latency problems associated with traditional addressing and SRAM configurations. A dedicated ASIC can deliver high performance and large amounts of on-chip memory, but because the average ASIC NRE cost exceeds one million US dollars, designing, verifying, and validating an ASIC solely for statistics computation is prohibitively expensive.

The question facing network equipment designers has become: how can this problem be solved cost-effectively? How can the core packet processor, whether NPU, ASIC, or FPGA, be "freed" to concentrate on the packet classification functions it was originally designed for? Ideally, the solution would be built around a low-cost, off-the-shelf coprocessor that is optimized specifically for this function and eliminates the bottlenecks described above.
The solution should also offer high performance and an industry-standard interface, to simplify line card design and to support the growing array of NPUs and currently popular special-purpose packet processors. Finally, any solution should be highly software-configurable to meet the needs of different applications.

Session border controller (SBC)

One way to illustrate how a statistics engine helps offload statistics operations is to look at it in an SBC implementation. As VoIP deployments grow, SBCs play an important role in these networks, controlling real-time sessions at the signaling, call control, and packet layers as data crosses the boundaries between networks and network segments. These devices are usually placed between a trusted private network (such as a privately operated corporate LAN) and an untrusted public network (such as the Internet), or between the networks of two service providers. They give signaling traffic access to the VoIP signaling in the network core, and by controlling the admission of media packets into and out of the network they support differentiated services for different media streams, such as billing and quality of service. SBCs that protect the network border play a very important role in firewall traversal between networks, and they help implement regulatory mandates such as lawful intercept of packetized voice. One of the main challenges in SBC development is to relieve, simply and effectively, the enormous pressure that rapidly climbing VoIP traffic places on the interconnects between carrier networks.
Most equipment today is designed to support 1Gbps line rates, but many networks are being upgraded to support 10Gbps Ethernet line rates. As line rates rise, the processing overhead of the statistics operations associated with billing, load balancing, firewall protection, and other services will grow exponentially. One SBC design filters packets at rates of up to 5Gbps with 3 microseconds of latency and can support up to 32,000 concurrent sessions. Mounted on a compact 1U board, the controller supports a variety of security and address-conservation features, including firewall pinholes that are created only for authorized media flows, and network access control with Layer 3 and Layer 5 topology hiding through double network address and port translation. For SLA performance, the controller supports session admission control based on the real-time bandwidth available on ingress and egress links at session setup or handover. Layer 2 and Layer 3 quality-of-service marking of signaling and media packets can optimize traffic segmentation and prioritization in the network and prevent quality-of-service theft. The SBC also provides per-session quality-of-service statistics for SLA reporting, problem alerting, fault isolation, and session admission control.

As line rates increase, statistics operations will "steal" a growing percentage of processing cycles from the core SBC packet processor. Designers can solve this problem, and extend the life cycle of current designs, by offloading statistics operations to a statistics engine coprocessor. The statistics engine connects to the core packet processor through the industry-standard Network Processing Forum (NPF) LA-1 interface (QDR-II compatible).
This standards-based bus reduces development time and simplifies line card design through seamless connection to a variety of NPUs, FPGAs, and ASICs. The LA-1 standard currently specifies a speed of 167MHz, but the statistics engine interface can support clock speeds above 200MHz. Like other chips that support the standard QDR-II interface, the statistics engine uses independent ports for read and write data access. The data buses are unidirectional, which allows high bus speeds, and a single DDR address bus carries both read and write addresses in a way that is optimized for signal integrity. Read addresses are received on the first half of the clock cycle, and write addresses on the second half. The byte write signals, which control which data on the data-in bus is written, are received on both halves of the clock cycle, while the read and write enables are received on the first half. Echo clock outputs can be used to clock the output data. The HSTL external interface supports higher speeds than the traditional TTL interface used by SRAMs. The statistics processor can sustain full bandwidth on its input and output ports simultaneously. All data is transferred as two-word bursts with burst-level addressing. The difference is that the address lines are not used to support flat address mapping; instead, they serve as control inputs (OPCODES) to the statistics engine, and pointers direct the arithmetic operation to the counters located inside the statistics engine.

As an alternative to designing a circuit board from scratch, the line card hardware can be an existing design or a module-based, off-the-shelf board such as an ATCA board (see Figure 2).
Figure 2: Line card hardware can be an existing design or a module-based, off-the-shelf board, including an ATCA carrier card

Much of the statistics engine's ability to execute statistics operations quickly can be attributed to its "fire and forget" mode. This feature lets the device automatically perform read-modify-write operations on up to four counters with a single command and sustain updates at QDR-II speed. The key to this performance gain is its ability to transfer 32 bits of data and the address, together with a 4-bit operation code for the ALU, over the 36-bit bus at the same time. The operation code can be an increment, an addition with an offset, or any other instruction in the instruction set. For example, in a typical billing application, a set of 4-bit OPCODES might include:

1. Set Register
2. INC/SUM (operands: +1 / 32-bit input)
3. SUM/SUM (operands: 16-bit input / 16-bit input)
4. SUM/SUM default (operands: 32-bit input / 32-bit default)
5. DEC/SUB (operands: -1 / 32-bit input)
6. SUB/SUB default (operands: 32-bit input / 32-bit default)
7. NOP/SUB (operands: 0 / 32-bit input)
8. SUB/NOP (operands: 32-bit input / 0)

The statistics operation starts when the device receives a write command accompanied by a set statistics enable (STEN) bit, the appropriate statistics OPCODE, and the data. In a conventional packet processor/SRAM configuration, the processor must first read the data from SRAM, complete the operation, and then write the data back to SRAM in a traditional read-modify-write cycle, which takes four QDR-II operations. With "fire and forget", the processor sends a single command to the statistics engine, and the update of all four counters completes in one cycle.
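The contrast between the two update paths can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of the actual device: the class and method names are hypothetical, counters are modeled as 32-bit values that wrap on overflow, the real OPCODE encoding and STEN bit are not modeled, and the conventional path is charged one bus read and one bus write per counter. The transaction counts it produces are those of the model, not measured QDR-II figures.

```python
MASK32 = 0xFFFFFFFF  # counters modeled as 32-bit and wrapping (assumption)


class SramCounters:
    """Conventional path: the processor reads, modifies and writes each counter."""

    def __init__(self, size):
        self.mem = [0] * size
        self.bus_ops = 0  # transactions crossing the processor/memory interface

    def read(self, addr):
        self.bus_ops += 1
        return self.mem[addr]

    def write(self, addr, value):
        self.bus_ops += 1
        self.mem[addr] = value & MASK32

    def add(self, addr, delta):
        # Software read-modify-write: two bus transactions per counter.
        self.write(addr, self.read(addr) + delta)


class StatsEngine:
    """Coprocessor path: one command updates a group of counters internally."""

    def __init__(self, size):
        self.mem = [0] * size
        self.bus_ops = 0

    def fire_and_forget(self, base, deltas):
        if len(deltas) > 4:
            raise ValueError("one command updates at most four counters")
        self.bus_ops += 1  # a single write command; nothing is read back
        for offset, delta in enumerate(deltas):
            # The read-modify-write happens inside the engine, off the bus.
            addr = base + offset
            self.mem[addr] = (self.mem[addr] + delta) & MASK32


if __name__ == "__main__":
    deltas = (1, 64, 1, 64)  # e.g. packet and byte counts for two counter pairs
    sram, engine = SramCounters(4), StatsEngine(4)
    for addr, delta in enumerate(deltas):
        sram.add(addr, delta)
    engine.fire_and_forget(0, deltas)
    print("conventional bus ops:", sram.bus_ops)
    print("fire-and-forget bus ops:", engine.bus_ops)
```

Both paths leave the counters in the same state; the difference is where the read-modify-write is carried out and how many transactions the packet processor has to issue.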
By compressing all of these operations into a single command, the "fire and forget" mode frees up QDR-II bandwidth and significantly improves SBC performance. Figure 3 shows the processor cycles consumed by the counter update example discussed above once the updates are offloaded to the statistics engine (expressed as a function of line rate).

Figure 3: Percentage of NPU utilization spent on counter updates when 50% of received packets are minimum length and each flow has four counters

In this simple example, 50% of the received packets are minimum-length 64-byte Ethernet packets and 50% are 1,518 bytes long. The NPU uses counters to track all packets and all bytes received. The received bytes are first divided by the NPU into groups of 256 bytes, and a counter update SUM operation adds this value to the current value of a byte counter in memory. When the NPU has received a complete packet, it increments the associated packet counter. Used with the statistics engine, this technique improves QDR-II bandwidth by 87% and reduces the line card packet processor cycles by 90%.

In SBC applications, adding a dedicated statistics engine makes the data path processor more efficient at statistics collection than other architectures do. The efficiency gain frees up processing capacity that can be redeployed to provide additional network features (for example, higher throughput or more sessions). These additional features in turn deliver richer services to more users and so increase the service provider's revenue.
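For readers who want to trace the Figure 3 counting scheme by hand, the short Python sketch below walks a list of packet lengths through it. It is an illustration under assumptions rather than the author's implementation: it assumes that a trailing partial group of fewer than 256 bytes is reported with its own SUM, that the packet counter increment is issued as a separate command, and the function name is hypothetical. It counts update commands only and does not attempt to reproduce the bandwidth or cycle percentages quoted above.

```python
CHUNK_BYTES = 256  # bytes are reported to the byte counter in groups of 256


def simulate(packet_lengths, chunk=CHUNK_BYTES):
    """Return (byte_counter, packet_counter, update_commands) after
    processing the given packet lengths with the chunked SUM/INC scheme."""
    byte_counter = 0
    packet_counter = 0
    commands = 0
    for length in packet_lengths:
        remaining = length
        while remaining > 0:
            step = min(chunk, remaining)
            byte_counter += step   # SUM: add this group's size to the byte counter
            commands += 1
            remaining -= step
        packet_counter += 1        # INC: one per completed packet
        commands += 1
    return byte_counter, packet_counter, commands


if __name__ == "__main__":
    # One minimum-length and one maximum-length packet, as in the 50/50 mix.
    print(simulate([64, 1518]))
```

In this model a 64-byte packet costs two update commands and a 1,518-byte packet costs seven, so the short packets dominate the update rate even though they carry few of the bytes.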

 


