304-487 Computer Architecture Laboratory

Project Abstract Examples


Here are some examples of preliminary project abstracts submitted for approval after the initial research and project formulation stages. They indicate the required format for the preliminary abstract, as well as the scope and diversity of the topics addressed in the term projects.


Fall 1996 Winter 1997
Fall 1997 Winter 1998
Fall 1998 Winter 1999
Fall 1999 Winter 2000
Fall 2000 Winter 2001
Fall 2001 Winter 2002
Fall 2002 Winter 2003
 

Fall 1996


Title: An Integer Multiplier For Large N Using Convolution This project implements the multiplication of two integers using the convolution algorithm (Leighton 1992). The multiplication works in a serial fashion and takes 4N - 1 cycles to complete for an N x N bit multiplication, using a convolutional approach to calculating and storing individual bits of the product. This project has been chosen in particular to relieve future classes of worrying about multiplication of large numbers in other algorithms by using minimum chip area for the multiplier (e.g. the RSA algorithm requires a 512 x 512 bit multiplier). The multiplier is optimized for area, and the stress will be on beating the LPM multipliers provided in the Altera library as far as size and scalability are concerned. During the past few days, we have had the opportunity to go through many different types of algorithms for multiplication. We chose this one because of its simplicity as far as scaling is concerned and the fact that a lot of students are currently facing space problems when using multipliers from the Altera library. Another area of stress would be the scalability of the multiplier, and that is where our research will be most concentrated and will result in this being a unique project. Although the actual implementation should be a comparatively easy task, the design strategies will have to go in depth to explore many options in an effort to achieve desirable results. References: 1. F. Thomson Leighton, Introduction To Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann. 2. Hamacher, Vranesic, Zaky, Computer Organization, Computer Science Series. 3. Hayes, Computer Architecture And Organization, McGraw Hill. 4. Chaum, Rivest, Sherman, Advances In Cryptology, Plenum. 5. Cormen, Leiserson, Rivest, Introduction to Algorithms, McGraw Hill. 6. Patterson, Hennessy, Computer Organization And Design, The Hardware/Software Interface, Morgan Kaufmann.
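To make the column-sum idea concrete, here is a small software sketch (Python, with illustrative helper names; the actual project is a serial VHDL design): each product bit k is the convolution sum of operand-bit pairs a_i·b_(k-i) plus the carry from the previous column.

```python
def convolve_multiply(a_bits, b_bits):
    """Multiply two N-bit integers, given as LSB-first bit lists of
    equal length, via convolution: column k sums a_i * b_j over all
    i + j == k, then keeps the low bit and propagates the carry."""
    n = len(a_bits)
    result = []
    carry = 0
    for k in range(2 * n - 1):
        # Convolution term for column k (only valid index pairs).
        s = carry + sum(a_bits[i] * b_bits[k - i]
                        for i in range(max(0, k - n + 1), min(k + 1, n)))
        result.append(s & 1)   # low bit of the column sum is product bit k
        carry = s >> 1         # the rest carries into higher columns
    while carry:
        result.append(carry & 1)
        carry >>= 1
    return result              # LSB-first bits of the product
```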
Title: Pulse-Mode Digital Multilayer Neural Network In this project, a pulse-mode digital multilayer neural network (DMNN) based on stochastic computing techniques is implemented with simple logic gates as basic computing elements. The pulse-mode signal representation and the use of simple logic gates for neural operations lead to a massively parallel yet compact and flexible network architecture, well suited for VLSI implementation. Moreover, algebraic neural operations are replaced by stochastic processes using pseudorandom pulse sequences. The synaptic weights and neuron states are represented as probabilities and estimated as average pulse occurrence rates in the corresponding pulse sequences. Finally, a DMNN feedforward architecture for recognizing patterns (digits) is presented and its performance is evaluated. The architecture and all digital sub-components in the DMNN are modeled and simulated in VHDL. References: 1. Laurene Fausett, 'Fundamentals of Neural Networks', Prentice-Hall, 1994. 2. David E. Van den Bout, T. K. Miller, 'A Stochastic Architecture for Neural Nets', Proceedings of IJCNN-88, 1988. 3. Young-Chul Kim, Michael A. Shanblatt, 'Architecture and Statistical Model of a Pulse-Mode Digital Multilayer Neural Network', IEEE Transactions on Neural Networks, 1995. 4. Dennis J. Walker, M. A. Sivilotti, 'A Digital Neural Network Architecture for VLSI', Proceedings of IJCNN-92, 1992. 5. Young-Chul Kim, Michael A. Shanblatt, 'An Implementable Digital Multilayer Neural Network', Proceedings of IJCNN-91, 1991.
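The stochastic-computing principle the abstract relies on can be sketched in a few lines of Python (an illustration only, not the VHDL model): encode two probabilities as pseudorandom pulse streams, AND them together, and the average pulse occurrence rate of the result estimates their product — which is why a single logic gate can stand in for a multiplier.

```python
import random

def pulse_stream(p, length, rng):
    """Encode probability p as a pseudorandom pulse sequence."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def stochastic_multiply(p_a, p_b, length=10000, seed=0):
    """Estimate p_a * p_b by ANDing two pulse streams bitwise and
    averaging the pulse occurrence rate of the result."""
    rng = random.Random(seed)
    a = pulse_stream(p_a, length, rng)
    b = pulse_stream(p_b, length, rng)
    anded = [x & y for x, y in zip(a, b)]
    return sum(anded) / length
```

The estimate's accuracy grows with the stream length, which is the precision/latency trade-off inherent to pulse-mode networks.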
Title: Design of the RSA (Rivest, Shamir, and Adleman) public-key cryptosystem using Altera's VHDL Cryptography is a security measure used to render data unintelligible to unauthorized parties. In this day and age, vast amounts of digital data are transmitted within complex communications networks, and thus cryptography becomes a necessity. For our project, we will undertake the design of the RSA public-key cryptosystem using VHDL. We want to be able to encrypt/decrypt a message sent between two communicating parties, using both a public key and a secret key. Due to time constraints and the level of complexity of the RSA algorithm, we will not be incorporating the "digital signature" aspect of the RSA cryptosystem into our design. References: 1. T. H. Cormen, C. E. Leiserson and R. L. Rivest, "Introduction to Algorithms", McGraw-Hill Book Company, 1990. 2. C. H. Meyer and S. M. Matyas, "Cryptography: A New Dimension in Computer Data Security", John Wiley & Sons Inc., 1982. 3. A. G. Konheim, "Cryptography: A Primer", John Wiley & Sons Inc., 1981.
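For readers unfamiliar with RSA, the encrypt/decrypt round trip can be sketched with deliberately tiny primes (a textbook toy; real keys use moduli of hundreds of bits):

```python
def rsa_demo():
    """Toy RSA round trip with textbook-sized primes."""
    p, q = 61, 53
    n = p * q                  # public modulus (3233)
    phi = (p - 1) * (q - 1)    # Euler's totient of n
    e = 17                     # public exponent, coprime with phi
    d = pow(e, -1, phi)        # secret exponent: d*e = 1 (mod phi)
    m = 42                     # plaintext block, 0 <= m < n
    c = pow(m, e, n)           # encrypt with the public key (e, n)
    return pow(c, d, n)        # decrypt with the secret key (d, n) -> m
```

The hardware challenge is exactly the modular exponentiation above, performed on operands far too wide for ordinary multipliers.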
Title: 8x8 Multicast Knockout Switch In this project we take the general design of a Knockout Switch with the following parameters: # of inputs = 8 # of outputs = 8 cell length = 41 bits = 32 bits data + 9 bits header These parameters may be changed if we see fit to do so. Our implementation will have multicast capabilities (a cell at one input may go to many outputs) and possibly a priority scheme. The probabilities of cell loss due to the internal switching as well as the output buffering will be discussed. Our synthesized FPGA circuit will then be simulated in order to determine the various performance parameters (max clock frequency, max packet rate, etc.) References: 1. Y. Yeh, M. Hluchyj, and A. Acampora, "The Knockout Switch: A Simple Modular Architecture for High-Performance Packet Switching", IEEE J. Selected Areas Commun., vol. SAC-5, no.8, pp.1274-1283, October 1987. 2. M. Prycker, "Asynchronous Transfer Mode", Prentice Hall, 1995. 3. T. Robertazzi, "Performance Evaluation of High Speed Switching Fabrics and Networks", IEEE Press, 1993. 4. G. Biran, "Introduction to ATM Switching", ftp.technion.ac.il/teach/topnet/cnpp94/gbiran/atm_switch.html, October 1996.

Fall 1997


Title: Implementation of Database Operations in Computer Hardware One of the most popular applications of computers is storage and retrieval of large information databases. Currently most database operations are implemented using software. This project implements most of the common database operations directly in hardware and attempts to improve efficiency in terms of speed compared to software-based approaches. The four primary operations implemented by most database programs are storage, retrieval, sorting and searching through data. The 'run length encoding' compression algorithm is used as the primary data compression tool in order to conserve storage space. A separate decompression unit restores the original data during the retrieval process. In order to sort data, the quicksort algorithm is implemented in hardware. This is followed by implementing a binary search operation on the sorted data. In conclusion, this paper evaluates design objectives and discusses the relative advantages and disadvantages of a hardware-based approach for database operations. References: 1. Gilbert Held, 'Data Compression: Techniques and Applications', John Wiley & Sons Ltd., 1991. 2. Mulford, J. B., Ridall, R. K., 'Data compression techniques for economic processing of large commercial files', ACM Symposium of Information Retrieval and Storage. 3. Gonnet, G. H., Baeza-Yates, 'Handbook of Algorithms and Data Structures', Addison-Wesley. 4. Silvester, P. P., 'Data Structures for Engineering Software', Computational Mechanics Publications, 1993.
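The run-length encoding step mentioned above is simple enough to sketch in Python (illustrative only; the project implements it in hardware):

```python
def rle_encode(data):
    """Run-length encode a byte sequence into (count, value) pairs."""
    out = []
    for b in data:
        if out and out[-1][1] == b:
            out[-1] = (out[-1][0] + 1, b)   # extend the current run
        else:
            out.append((1, b))              # start a new run
    return out

def rle_decode(pairs):
    """Restore the original byte sequence from (count, value) pairs."""
    return bytes(b for count, b in pairs for _ in range(count))
```

Note that RLE only saves space on data with long runs of repeated values; on random data it can expand the input, which is one trade-off a hardware compression unit must live with.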
Title: A Simple ATM Switch using the Batcher-Banyan Network Topology Asynchronous Transfer Mode (ATM) is a network standard that allows a wide range of applications (voice, multimedia, computer data) to be carried on a single network. This capability has generated much interest in ATM and made it the standard for the Broadband ISDN network. The success of this new standard depends on the availability of switches that can handle the high speeds required by new applications. This project will attempt to design a simple ATM switch using the Batcher-Banyan network topology. The switch operates only at the ATM layer (multiplexing cells onto a single stream, translation of VPIs) assuming that there is a physical layer interface underneath that supports the UTOPIA interface. The switch will only handle simple point-to-point connections, ignoring broadcast and multicast issues; thus the design shall use the Batcher-Banyan network topology as a switching fabric. A simple 2x2 switching element with limited support for output contention will be designed. References: 1. M. Boisseau, M. Demange, J-M. Munier, 'ATM Technology an Introduction', Thomson Computer Press, 1996. 2. M. Prycker, 'Asynchronous Transfer Mode Solution for Broadband ISDN', Prentice Hall, 1995. 3. W.J. Goralski, 'Introduction to ATM Networking', McGraw Hill, 1995.
Title: Polygon transformer and clipper The purpose of this project is to implement graphic clipping routines and polygon transformation routines. As it is well known that a hardware implementation is faster than its software counterpart, this design will attempt to increase the speed of drawing polygons so that graphics-intensive applications will benefit from this improvement in speed. The proposed design takes as input a polygon and performs the following operations on it: rotation, translation, scaling. The design will also support window clipping and viewport mapping. For these, one pair of coordinates will specify the diagonal of the viewing window, while another pair will specify the diagonal of the viewport. Once the window and the viewport are specified, the transformer will perform the operations mentioned above. Once the polygon has been transformed, the clipper will clip the part of the polygon outside the viewing window and will then map the clipped polygon into the viewport window. Hence, the final output coordinates of the polygon will be the result of the three transformation operations, a clipping operation to fit the polygon into the clipping window, and finally a mapping operation to map the polygon into the viewport window. The final coordinates sent as output will be relative to the viewport's local coordinate system. References: 1. Hearn, Donald - Computer graphics, C version (2nd Ed.), Upper Saddle River, N.J.: Prentice Hall, 1997. 2. Mortenson, Michael E. - Geometric modeling, 2nd ed., New York: Wiley, 1997. 3. Computer graphics: principles and practice, 2nd ed. in C, Reading, Mass.: Addison-Wesley, 1996.
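The three transformation operations can be sketched in floating-point Python (the hardware version would presumably use fixed-point arithmetic; function and parameter names here are illustrative):

```python
import math

def transform_polygon(points, angle=0.0, dx=0.0, dy=0.0, sx=1.0, sy=1.0):
    """Scale, rotate, then translate polygon vertices.

    Each vertex (x, y) is mapped through the composed 2-D transform;
    the order of composition matters and is fixed here for clarity."""
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y in points:
        x, y = x * sx, y * sy                    # scaling
        x, y = x * c - y * s, x * s + y * c      # rotation about the origin
        out.append((x + dx, y + dy))             # translation
    return out
```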
Title: A hardware implementation for the DES algorithm. Networking and secure distributed systems are major research areas at the Digital Equipment Corporation's System Research Center (SRC). The Data Encryption Standard (DES) was issued by the National Bureau of Standards (NBS) in 1977 and was adopted by the U.S. government. The DES is essentially an algorithm that uses permutations, rotations and substitutions in order to encipher 64-bit data blocks using a 56-bit secret key. Plaintext can easily be recovered from the cipher once the secret key is available. Cryptographic modules may implement this standard in software, firmware or hardware, or any combination thereof. The work discussed here is a hardware implementation of the DES on FPGA chips, using Altera's products. The VHDL language will be used to describe the design while the MAXPLUSII CAD software will be used to synthesize it. However, due to time constraints, the complexity of the algorithm has been scaled down by reducing the size of data blocks to 16 bits. This means that the key's length and the number of iterations needed shall be modified accordingly. References: 1. Federal Information Processing Standards Publication, "Publication 46-2", December 1993. 2. Jennifer Seberry and Josef Pieprzyk, "Cryptography: An Introduction to Computer Security.", Prentice-Hall, 1989. 3. Man Young Rhee, "Cryptography and Secure Data Communications.", McGraw-Hill, 1994. 4. R. Rivest, A. Shamir, and L. M. Adleman, "Cryptographic Communications System and Method", US Patent 4,405,829, 1983. 5. "The RSA Frequently Asked Questions document", RSA Data Security Inc., 1995. 6. Mark Gibbs, "Guide To Networking", second edition, SAMS 1995.
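DES's round structure is a Feistel network, which can be sketched generically in Python (this is the skeleton only; real DES adds the initial/final permutations, S-box substitutions, and a key schedule, and `f` below is a caller-supplied stand-in for the round function):

```python
def feistel_encrypt(block, round_keys, f):
    """Generic Feistel structure: split the block into halves, and each
    round mixes the right half into the left via round function f."""
    left, right = block
    for k in round_keys:
        left, right = right, left ^ f(right, k)
    return left, right

def feistel_decrypt(block, round_keys, f):
    """Decryption runs the same rounds with the keys in reverse order;
    f never needs to be invertible, only repeatable."""
    left, right = block
    for k in reversed(round_keys):
        right, left = left, right ^ f(left, k)
    return left, right
```

This symmetry — identical datapath for encryption and decryption — is one reason Feistel ciphers like DES map so well onto compact hardware.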

Winter 2000


Title: Electrical Storm Locator The intended project is to build an electrical storm locator controller, which will use electromagnetic detectors and a microphone to measure the distance, intensity and direction of an electrical storm system. Using loop antennae to detect the presence of a bolt of lightning, the source's direction can be determined. Once a bolt of lightning has been detected, the delay between its detection and the perception of its associated thunder clap can be used to calculate the distance between the location of the lightning and the detector. The intensity of each bolt of lightning can be measured by comparing the delay time to the intensity of the magnetic flux, since the magnetic flux dissipates in strength as it propagates away from its source. Also, the intensity of the electrical system can be calculated by counting the number of bolts of lightning occurring within a given space of time. Attached is one file, abstract.dpf, a diagram which could not be included in the ASCII file.
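The distance calculation described above amounts to multiplying the detection-to-thunder delay by the speed of sound, since the electromagnetic pulse arrives essentially instantaneously (a sketch assuming roughly 343 m/s in air):

```python
def storm_distance_m(delay_s, speed_of_sound=343.0):
    """Estimate the distance (metres) to a lightning strike from the
    delay (seconds) between electromagnetic detection and the thunder
    clap; the radio-frequency pulse's travel time is negligible."""
    return delay_s * speed_of_sound
```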
Title: VHDL Analysis of the Effects of Associativity on Cache Performance The goal of this project is to illustrate the effects of associativity on cache performance, via miss rate and access time measurements. This goal will be accomplished by implementing two independent data caches, one direct-mapped and the other four-way associative. Aside from their degree of associativity, both caches will have exactly the same policies. The writing policy will be write-through with no-write allocation, and the replacement strategy will be LRU (least recently used). The caches will be fed "virtual" signals in order to simulate the CPU's behavior. The measurements will be taken from simulation results and registered performance analysis. References: 1. Jim Handy, The Cache Memory Book, Academic Press. 2. Patterson, Hennessy, Computer Organization And Design, The Hardware/Software Interface, Morgan Kaufmann. 3. Patterson, Hennessy, Computer Architecture, A Quantitative Approach, Morgan Kaufmann. 4. David Loshin, Efficient Memory Programming, McGraw Hill.
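A software model of the miss-rate comparison might look like the following trace-driven Python sketch (counting misses only; the write-through/no-write-allocate policy does not change the counting here, and block size and set counts are illustrative):

```python
from collections import OrderedDict

def simulate_cache(addresses, num_sets, ways, block=16):
    """Count misses for a set-associative cache with LRU replacement.

    Each set is an OrderedDict of resident tags, oldest first, so the
    front entry is always the least recently used line."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        tag = addr // block // num_sets
        s = sets[(addr // block) % num_sets]
        if tag in s:
            s.move_to_end(tag)         # hit: refresh LRU position
        else:
            misses += 1
            if len(s) == ways:
                s.popitem(last=False)  # evict the least recently used line
            s[tag] = True
    return misses
```

On a trace that ping-pongs between two blocks mapping to the same set, the direct-mapped cache thrashes while the four-way cache holds both lines, which is exactly the effect the project sets out to measure.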
Title: A Simple Low Cost VHDL implementation of Data Encryption Standard (DES) Abstract: The Data Encryption Standard (DES) was selected by the National Bureau of Standards (NBS) in 1977 as the standard for modern data computing encryption. It is a mathematical algorithm commonly used for the cryptographic protection of computer data. A '64 bit key' is used to encrypt 64 bits of binary-coded data. The whole idea of the encryption rests on this '64 bit key', since each unique 64-bit key generates a unique 64-bit block of cipher text from 64 bits of plain text. Since only 56 bits of the 64-bit key are used in the encryption process, there are over 70,000,000,000,000,000 possible combinations to search to find a specific key to decrypt the data. This paper describes the VHDL implementation of the Data Encryption Standard on the Altera FLEX10K series FPGA. The overview of the design process includes an initial permutation module, a product transformation module, and an inverse initial permutation module. When the 64 input data bits (plain text) are passed through the modules described, 64 output data bits (cipher text) are formed. References: 1. Denning, Dorothy E., "Cryptography and Data Security", Addison-Wesley Publishing Company, Inc., 1982. 2. Harry Katzan Jr., "The Standard Data Encryption Algorithm", Petrocelli Books, Inc., 1977. 3. Jan C. A. Van Der Lubbe, "Basic Methods of Cryptography", Cambridge University Press, 1994. 4. Ingrid Verbauwhede, Frank Hoornaert, Joos Vandewalle, "Security and Performance Optimization of a New DES Data Encryption Chip", IEEE Journal of Solid-State Circuits, Vol. 23, No. 3, June 1988.
Title: H100 Timeslot Interchange This project implements the interchanging of constant bit rate (CBR) data bytes on an H100 TDM bus. The switching of timeslot/streams (TSSTs) is programmable by a Motorola CPU interface module within the timeslot interchange (TSI), used to access an internal memory which keeps the configuration/routing of each TSST. Each TSST can either be routed to another TSST or simply left untouched. For testing purposes, any TSST exiting the TSI can be programmed to force a data byte onto the H100 bus. The outgoing TSST forced to this value does not have to be programmed to route the data of an incoming TSST. The TSI will act as an H100 slave on the bus, not driving any clocks. The TSI will be capable of switching from the master clock to the slave clock if an error is detected on the master clock. This functionality is called for by the H100 specification. The module will use the following H100 bus signals for operation: CT_C8_A, CT_C8_B, CT_FRAME_A, CT_FRAME_B, and the data bus CT_D[31::0]. The remaining clock signals described in the H100 specification are used for backwards compatibility and are not necessary for the operation of the TSI. The architecture and all sub-modules in the TSI are modeled and simulated in VHDL. An attempt to compile the TSI into an Altera device, of any size, will be made. However, given the scope of this project, this last goal could be difficult to meet. References: 1. Enterprise Computer Telephony Forum, H.100 Revision 1.0, Hardware Compatibility Specification: CT Bus
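As a rough software analogue of the switching behaviour (a hypothetical simplification; the real TSI works on serial TDM streams and a CPU-programmed connection memory), one frame of timeslot interchange could be modelled as:

```python
def timeslot_interchange(frame_in, routing, forced=None):
    """Model one frame of a TSI: each outgoing timeslot either copies a
    programmed incoming timeslot, carries a forced test byte, or repeats
    the incoming byte unchanged. `routing` and `forced` stand in for the
    internal configuration memory described in the abstract."""
    forced = forced or {}
    frame_out = list(frame_in)            # untouched slots pass through
    for out_slot, in_slot in routing.items():
        frame_out[out_slot] = frame_in[in_slot]
    for out_slot, byte in forced.items():
        frame_out[out_slot] = byte        # test pattern overrides routing
    return frame_out
```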
Title: Carrier Sense Multiple Access with Collision Detection Abstract: The purpose of our project is to design and test a media access control device based on Carrier Sense Multiple Access with Collision Detection, or CSMA/CD. The latter is an improvement of the CSMA protocol which permits halting transmission as soon as a collision is detected during the communication of data between stations. This protocol with collision detection is the most popular MAC protocol to date. Ethernet is an example of a CSMA/CD network. The Ethernet system includes a mechanism that detects when a collision occurs (collision detect). The system also includes "listen before talk," in which stations listen for activity (carrier sense) before transmitting, and supports access to a shared channel by multiple stations (multiple access). Putting all these components together, one can see why the Ethernet channel access protocol is called Carrier Sense Multiple Access with Collision Detect (CSMA/CD). Moreover, Ethernet is a baseband system that uses a Manchester encoding of high and low voltages to place bits on a wire pair. In our design, the data to be transmitted will be encoded using Manchester encoding: each bit time contains a transition in the middle; a transition from low to high represents a 0 bit and a transition from high to low represents a 1 bit. In conclusion, we will design and test the CSMA/CD protocol and we will discuss its advantages and disadvantages in the interconnection networks field. The architecture and all sub-components in the CSMA/CD protocol will be modeled and simulated in VHDL using the Altera Max Plus II software. References: 1. Ethernet and CSMA/CD: This article describes how and why CSMA/CD works on an Ethernet network. Updated on Aug 5, 1998 - www.pcwebopedia.com/TERM/C/Carrier_Sense_Multiple_Access_Collision_Detection.html. 2. Charles Spurgeon's Ethernet Web Site - www.ots.utexas.edu/ethernet/. 3. David A. Patterson, John L. Hennessy, Computer Architecture: A Quantitative Approach, 2nd Edition - Chapter 7, Interconnection Networks. 4. "Carrier Sense Multiple Access with Collision Detection (CSMA/CD)," IEEE Std 802.3-1990 Edition (ISO/DIS 8802-3), IEEE, New York (1990). 5. Ethernet Applet - www.seas.upenn.edu/~ross/applets/daniel_brushteyn/enet.html.
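Following the encoding convention stated in the abstract (low-to-high for a 0 bit, high-to-low for a 1 bit), a Manchester encoder/decoder pair can be sketched as:

```python
def manchester_encode(bits):
    """Encode each bit as a pair of half-bit levels with a guaranteed
    mid-bit transition: 0 -> (low, high), 1 -> (high, low), per the
    convention stated in the abstract."""
    out = []
    for b in bits:
        out += [1, 0] if b else [0, 1]
    return out

def manchester_decode(halves):
    """Recover bits from half-bit pairs: a pair starting high is a 1."""
    return [1 if halves[i] == 1 else 0 for i in range(0, len(halves), 2)]
```

The guaranteed mid-bit transition is what lets the receiver recover the sender's clock from the waveform itself, at the cost of doubling the signalling rate.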
Title: Design of a PCI bus interface for FPGAs Abstract: The Peripheral Component Interconnect (PCI) was developed in 1993 by Intel and is now overseen by a Special Interest Group. Its main purpose is to provide high bandwidth to devices like monitors, video cards, sound cards, USB interfaces, etc. The PCI bus allows large data transactions directly with the system memory and peripheral devices. The purpose of this project is to implement a PCI bus interface in VHDL for generic add-in cards, with the inclusion of the JTAG pins. The design of the PCI interface has been implemented in previous years, but we aim to offer an on-board testing mechanism for FPGAs. This design would include a master, a target and controllers for each. The basic cycle simulations would include master read/write, target read/write, system error and parity check. References: 1) PCI Local Bus Specification, revision 2.1, PCI Special Interest Group
Title: Error Detection Design using Residue Arithmetic Method for Multiplication High speed processing architectures are widely used in many computing applications. As VLSI circuitry gets smaller, the number of multiply operations per second increases rapidly and the error check and detection process becomes more and more important. The error detection circuitry is there to ensure that each bit is multiplied correctly. In this project, speed and hardware will be considered as the two main factors to optimise. To achieve maximum speed with minimum hardware, residue arithmetic codes will be implemented to verify the result of multiplication through the error detection unit. Two types of error checking will be implemented in VHDL: top-level and bit-level. In top-level checking, error detection is performed at the end, where the residues are compared with the final result. Bit-level error detection, in contrast, is performed for every serial bit shifted into the multiplier, so as to detect an error immediately. The operating frequency and the resource usage will also be investigated in this report. Low cost and high speed are essential for this error detection unit to be practical and economical. References: 1. Thomas J. Brosnan and Noel R. Strader II, "Modular Error Detection for Bit-Serial Multiplication," IEEE Trans. Comput., Vol. 37, pp. 1043-1052, Sep. 1988. 2. T. R. N. Rao, "Error Coding for Arithmetic Processors," New York Academic, 1974.
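The top-level residue check reduces to one comparison: the product of the operand residues must match the residue of the product. The sketch below uses modulus 3, which catches any single-bit error because no power of two is divisible by 3 (errors that change the result by a multiple of the modulus would slip through, which is why the modulus is chosen against the expected fault patterns):

```python
def residue_check(a, b, product, modulus=3):
    """Verify a claimed multiplication result with residue arithmetic:
    (a mod m) * (b mod m) mod m must equal product mod m. A mismatch
    flags a fault in the multiplier hardware."""
    return (a % modulus) * (b % modulus) % modulus == product % modulus
```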
Title: A Hardware implementation of Carrier Sense Multiple Access with Collision Detection (CSMA/CD) The purpose of this project is to design a network card based on the CSMA/CD protocol. A network card/adapter is a media access control device for a computer or terminal; it is a very essential component of a computer network. Sharing a common transmission medium between multiple stations is one of the advantages of CSMA/CD. Each terminal will listen to the medium, then transmit the data packet in serial form when the medium is quiet. Although this method is simple and straightforward, the one problem is transmission collision. It occurs when two stations send out data packets at the same time. The stations involved will 'back off' for a period of time before re-transmission. Therefore, collision detection is very important during transmission. Since we are implementing a media control device, we assume the header information, data and some control signals are provided by other devices. At the other end, the receiver will pass the data to upper-level components. Due to the time constraint, the IEEE Std 802.3 will be reduced in complexity. The basic format of the data packet includes a synchronization preamble, destination address, source address and data body. The 'type' field and 'CRC' field will be removed. Cyclic Redundancy Check (CRC) is very useful in error detection, but it would significantly increase the complexity of this project. Moreover, the data is encoded using Manchester encoding, which will allow synchronization between sender and receiver. References: 1. Larry L. Peterson and Bruce S. Davie, 'Computer Networks: A Systems Approach', 2nd ed., Morgan Kaufmann, 2000. 2. Gilbert Held, 'Ethernet Networks: Designs, Implementation, Operation and Management', John Wiley & Sons Canada, 1996.
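The 'back off' step mentioned above is, in IEEE 802.3, truncated binary exponential backoff: after the n-th collision a station waits a random number of slot times drawn from [0, 2^k - 1] with k capped at 10. A sketch:

```python
import random

def backoff_slots(attempt, rng=random):
    """Pick a random wait (in slot times) after `attempt` consecutive
    collisions, using truncated binary exponential backoff: uniform in
    [0, 2^k - 1] with k = min(attempt, 10), as in IEEE 802.3."""
    k = min(attempt, 10)
    return rng.randrange(2 ** k)
```

Doubling the contention window on each collision is what lets the shared channel stabilise as load grows; after 16 failed attempts the standard has the station give up and report an error.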
Title: FAST FPGA IMPLEMENTATION OF THE ATM LAYER CELL TRANSPORT FUNCTIONS Abstract: Asynchronous transfer mode (ATM) is a popular and extremely important switching technique. Its small cell size and scalability allow for the transmission of isochronous and time-insensitive data from a wide range of services (such as text, speech, or video) over the same network. FPGAs are often used for the development of rapid prototypes of hardware implementing ATM. However, the FPGA rapid prototypes are often much slower than the ASIC designs that they are used to validate. The aim of this project is to implement the ATM protocol's cell transport functions using FPGAs while employing a deep-pipeline design methodology that would permit the FPGA circuit to function at high operating frequencies while minimizing the resulting latency. References: [1] D.E. McDysan, D.L. Spohn, ATM: Theory and Application, McGraw-Hill Inc., 1994. [2] M.P. Clark, ATM Networks: Principles and Use, John Wiley and Sons, Ltd., 1996. (Additional references may be added during the course of this project.)
Title: A VHDL Implementation of Differential Pulse Code Modulation (DPCM) Algorithm. In our present society, the telecommunication field has changed our lives considerably. Where would we be without the Internet, networking, cellular phones, etc.? This project consists of implementing the Differential Pulse Code Modulation (DPCM) algorithm from telecommunication theory. On a side note, PCM (Pulse Code Modulation) is a sampling technique for digitizing analog signals, especially audio signals (speech coding). The advantage of DPCM over PCM is that it predicts the next sample based on the last few decoded samples. If the predictions are effective, then the error signal will be minimized, allowing a more accurate representation of the analog signal (speech) in digital format. The error signal between the predicted samples and the actual samples will be smaller than the actual samples themselves. Therefore, we are able to quantize the error signal using fewer bits than the original signal. This is the basis of the Differential Pulse Code Modulation (DPCM) algorithm: to quantize the difference between the original and predicted signals. DPCM is an algorithm that uses quantizers, predictors, DPCM encoders and DPCM decoders in order to convert an analog signal into a digital signal. The analog signal is sampled and the difference between the actual sample value and the predicted value is encoded to form a new digital value. References: 1. Simon Haykin, "Communication Systems", John Wiley & Sons Inc., 1994. 2. J. G. Proakis and D. G. Manolakis, "Digital Signal Processing", Prentice-Hall, 1996. 3. Majid Rabbani and Paul W. Jones, "Digital Image Compression Techniques", www.rasip.fer.hr/research/compress/algorithms/index.html
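A minimal DPCM codec with a previous-sample predictor and a uniform quantizer might look like this (step size and predictor are illustrative choices; note the encoder predicts from its own *reconstructed* samples so that it stays in step with the decoder):

```python
def dpcm_encode(samples, step=4):
    """DPCM-encode samples: quantize the difference between each sample
    and the previous reconstructed sample (previous-sample predictor)."""
    codes, predicted = [], 0
    for s in samples:
        diff = s - predicted
        q = round(diff / step)      # coarse quantizer for the residual
        codes.append(q)
        predicted += q * step       # track the decoder's reconstruction
    return codes

def dpcm_decode(codes, step=4):
    """Rebuild an approximation of the original samples by accumulating
    the dequantized differences."""
    out, predicted = [], 0
    for q in codes:
        predicted += q * step
        out.append(predicted)
    return out
```

Because the encoder mirrors the decoder's state, the reconstruction error per sample stays bounded by half the quantizer step rather than accumulating.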

Winter 2001


Title: Substitution Ciphers - new and old Abstract: Data security and cryptography are critical fields of research in today's information technology world. As the world becomes increasingly interconnected, the need for privacy and access control is even more emphasised. To illustrate the concepts of cryptography, a cipher can be described as a method of transforming a message in order to conceal its meaning from unauthorized parties. As a major objective of this project, we will be concentrating on the implementation and comparison of the most popular substitution ciphers. Some of the ciphers we will consider are Caesar's cipher, the K-shift cipher, and monoalphabetic substitution. These ciphers are used frequently and, although easy to implement, are quite useful in understanding the underlying concepts of cryptography. As a final objective of this project, we will devise and implement a new substitution cipher. The cipher will be implemented and extensively tested using VHDL, and we will endeavour to improve on the standards in use today. References: - Meyer, Carl H., "Cryptography: a new dimension in computer data security: a guide for the design and implementation of secure systems", Wiley, 1982. - A. G. Konheim, "Cryptography: A Primer", John Wiley & Sons Inc., 1981. - Denning, Dorothy Elizabeth Robling, "Cryptography and data security", Addison-Wesley, 1982. - Cryptography: www.cs.wpi.edu/~cs513/f99cew/week12-crypt/week12-crypt.html
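A K-shift substitution cipher (Caesar's cipher being the K=3 case) is a few lines of Python, which gives a feel for why these ciphers are easy to implement yet weak, since there are only 25 distinct shifts to try:

```python
def shift_encrypt(text, k):
    """Encrypt letters with a K-shift substitution (k=3 gives the
    classical Caesar cipher); non-letters pass through unchanged."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + k) % 26 + base))
        else:
            out.append(ch)
    return ''.join(out)

def shift_decrypt(text, k):
    """Decryption is simply a shift by -k."""
    return shift_encrypt(text, -k)
```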
Title: Develop a high speed FPGA based fuzzy logic inference engine Abstract: Fuzzy logic systems are becoming much more widespread because of their ability to simply model complex systems, where the relationship between the inputs and the outputs is not well defined. These systems can be implemented using software and a general-purpose computer or a fuzzy processor. However, these implementations are general and do not always offer the best solution for real-time, embedded systems, and other applications. This project will attempt to implement a fuzzy logic inference engine. The design will utilize a pipelined approach and will allow several identical engines to be operated in parallel to increase performance. Since practical systems use external memory modules to hold the rule parameters and membership values, this design will take into account the need for sharing these resources. References: 1. Aranguren, G.; Barron, M.; Arroyabe, J.L.; Garcia-Carreira, G. "A pipe-line fuzzy controller in FPGA." Fuzzy Systems, 1997., Proceedings of the Sixth IEEE International Conference on, Volume: 2, 1997, Page(s): 635-640 vol.2. 2. Bin Qiu; Pak L Woon. "The hardware implementation of a generic fuzzy rule processor." Signal Processing Proceedings, 1998. ICSP '98. 1998 Fourth International Conference on, Volume: 2, 1998, Page(s): 1343-1346 vol.2. 3. Hung, D.; Zajac, W. "Implementing a fuzzy inference engine using FPGA." ASIC Conference and Exhibit, 1993. Proceedings., Sixth Annual IEEE International, Sept. 1993, Page(s): 349-352. 4. Masmoudi, N.; Hachicha, M.; Kamoun, L. "Hardware design of programmable fuzzy controller on FPGA." Fuzzy Systems Conference Proceedings, 1999. FUZZ-IEEE '99. 1999 IEEE International, Volume: 3, 1999, Page(s): 1675-1679 vol.3. 5. Parris, C.P.; Haggard, R.L. "An architecture for a high speed fuzzy logic inference engine in FPGAs." System Theory, 1997., Proceedings of the Twenty-Ninth Southeastern Symposium on, 1997, Page(s): 179-182. 6. Ungering, A.P.; Goser, K. "Architecture of a 64-bit fuzzy inference processor." Fuzzy Systems, 1994. IEEE World Congress on Computational Intelligence., Proceedings of the Third IEEE Conference on, 1994, Page(s): 1776-1780 vol.3.
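A toy inference engine with triangular membership functions and weighted-average defuzzification (the rule set and numeric ranges below are invented for illustration) shows the flavour of the computation such hardware would pipeline:

```python
def triangular(x, a, b, c):
    """Triangular membership function rising from a, peaking at b,
    falling to zero at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_fan_speed(temp):
    """Tiny two-rule inference: IF temp is cool THEN fan is slow;
    IF temp is hot THEN fan is fast. Defuzzify by weighted average
    of singleton consequents."""
    cool = triangular(temp, 0, 10, 25)      # rule 1 firing strength
    hot = triangular(temp, 15, 30, 40)      # rule 2 firing strength
    slow_speed, fast_speed = 20.0, 90.0     # singleton consequents
    total = cool + hot
    if total == 0:
        return 0.0
    return (cool * slow_speed + hot * fast_speed) / total
```

Every step here is a small multiply, compare, or divide on stored membership parameters, which is precisely what makes the computation amenable to a fixed pipeline with shared parameter memories.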
Title: A VHDL Implementation of a User-Monitored Diabetes Insulin Pump Abstract: As diabetes becomes a more pronounced and widespread ailment in today's society, the use of more accurate, sophisticated medical technology related to insulin injection and dosage becomes a necessity. This advanced technology, among others, takes the form of automated insulin injectors: battery-operated pumps whose functioning is regulated not only by the user, but also by a programmable computer chip. The chip regulating an insulin pump is responsible for managing two insulin dosage rates, namely the basal rate, which injects minute quantities of insulin on a regular frequency, and the bolus rate, the user-controlled insulin dosage. In our case this chip, the intelligence behind the pump apparatus, sends instructional signals to the reservoir and three pumps, while accurately keeping track of time through a real-time clock. The pumps are placed at different locations on the patient's body, in order to ensure adequate distribution. The chip's inputs are as follows: initial inputting of a user's physical parameters (such as weight and age); setting of the basal rate of insulin dosage (done once and updated infrequently); setting of the bolus rate, a dosage request undertaken after meals or when needed, from the following two dosage options: 1. one-time: the user specifies the exact insulin dosage to be injected; 2. preset: the user chooses the dosage from a list of predefined quantities; and inputs allowing the user to define presets. The outputs are the quantities of insulin to be injected by each pump at a given instant. The VHDL implementation of this chip comprises a structural architecture of the real-time clock, as well as behavioural code describing the chip's control module, with structural code linking the two. References: 1. Davidson, Mayer B., "Diabetes mellitus: diagnosis and treatment", Saunders, 1998. 2. Watkins, Peter J., Diabetes and its management, Blackwell Science, 1996. 3. Canadian Diabetes Association, www.diabetes.ca/ 4. MiniMed, Inc., www.minimed.com/
Title: An FPGA Implementation of an MP3 Decoder Abstract: MPEG Layer-3, otherwise known as MP3, has generated a phenomenal interest among Internet users, or at least among those who want to download highly compressed digital audio files at near-CD quality. MP3 is a standard for transmitting and recording compressed audio. The MP3 algorithm achieves compression by exploiting the perceptual limitations of the human ear. The standard defines the decoding process and also the syntax of the coded bitstream. The purpose of this project is to design a high performance FPGA implementation of an MP3 audio decoder using VHDL. Such a device would have widespread use in upcoming technologies such as portable MP3 audio players and handheld computers. References: 1. S. Hacker, "MP3: The Definitive Guide", 1st Edition, O'Reilly, 2000. 2. P. J. Ashenden, "The Designer's Guide to VHDL", Morgan Kaufmann, 1996. 3. M. Robertson, "Top Ten Things Everyone Should Know About MP3", MP3.com Inc., www.mp3.com/news/070.html?lang=eng, 1998. 4. "MPEG Audio Layer-3", Fraunhofer Institut, www.iis.fhg.de/amm/techinf/layer3/index.html, 1998.
Title: Dynamic Instruction Scheduling Of A DLX Machine Using Tomasulo's Algorithm In past years, microprocessor speed has doubled every 18 months. For designers, one way to achieve this performance gain is to improve the processor's architecture. Tomasulo's algorithm can be found in most modern processors; it allows rescheduling of instructions in order to minimize hazards and stalls during execution. There are two main goals in this project: the first is to achieve correctness of the out-of-order execution, whereas the second is to optimize the performance of the CPU by reducing hazards and stalls. The unit will contain a hazard detection module to handle structural hazards, control hazards, and WAW, WAR and RAW data hazards. A specialized register file and register control module will be used to implement register renaming, which eliminates name dependences. Reservation stations will store dispatched instructions with their tag and data or data pointer. They will be issued to an execution unit once they are ready. Since the implementation of execution units (adders, multipliers, memory, etc.) is out of the scope of this project, test vectors for the common data bus will be used to simulate execution of the instructions. References: 1. Hennessy & Patterson, "Computer Architecture: A Quantitative Approach," 2nd edition, Morgan Kaufmann, 1996. 2. Daniel Kroening, "Design and Evaluation of a RISC Processor with a Tomasulo Scheduler", www-wjp.cs.uni-sb.de/~kroening/tomasulo/diplom/
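The hazard taxonomy the detection module must handle can be made concrete with a small sketch. The following Python function (the tuple format and register names are our own illustration, not the project's instruction encoding) classifies the dependence between two instructions in program order; Tomasulo's register renaming removes the WAR and WAW cases, leaving only true RAW dependences for the reservation stations to enforce.

```python
def classify_hazard(first, second):
    """Classify data hazards between two instructions in program order.
    Each instruction is a tuple (dest_reg, [src_regs])."""
    d1, s1 = first
    d2, s2 = second
    hazards = set()
    if d1 is not None and d1 in s2:
        hazards.add("RAW")   # true dependence: second reads what first writes
    if d2 is not None and d2 in s1:
        hazards.add("WAR")   # anti-dependence: removed by register renaming
    if d1 is not None and d1 == d2:
        hazards.add("WAW")   # output dependence: removed by register renaming
    return hazards
```

For example, `classify_hazard(("r1", ["r2"]), ("r4", ["r1"]))` reports a RAW hazard, the only kind that must still stall issue in a Tomasulo machine.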
Title: The Blowfish Encryption Algorithm "Blowfish" is a symmetric block cipher designed in 1993 by Bruce Schneier as a fast and free alternative to existing encryption algorithms such as DES or IDEA. Unlike DES, whose 56-bit key length is vulnerable to brute-force attacks, the Blowfish key length is variable from 32 bits to 448 bits (and as such can be exported per U.S. export laws). Blowfish is unpatented and license-free for all, and the source code is available in numerous programming languages. As with DES, Blowfish is a symmetric cipher: the same secret key, shared in advance between sender and receiver, is used both to encrypt and to decrypt the data stream. Blowfish has many improvements over DES that make it faster and easier to code. In this project we will implement the Blowfish algorithm in VHDL. Due to time constraints and limited space in the FPGA, small keys will have to be used. References: 1. B. Schneier, "Fast Software Encryption", Cambridge Security Workshop Proceedings (December 1993), Springer-Verlag, 1994, pp. 191-204 2. B. Schneier, "The Blowfish Encryption Algorithm -- One Year Later", Dr. Dobb's Journal, September 1995 3. B. Schneier, "Applied Cryptography, Second Edition", John Wiley & Sons, 1996
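Blowfish's overall shape is a 16-round Feistel network. The Python sketch below shows that structure with a toy round function standing in for Blowfish's key-dependent S-box F function (the real subkey and S-box generation from the digits of pi is omitted), to illustrate the property a hardware design can exploit: decryption is the same circuit run with the subkeys in reverse order.

```python
MASK32 = 0xFFFFFFFF

def toy_f(x):
    # Stand-in for Blowfish's F function (the real F uses four
    # key-dependent S-boxes); this toy version just mixes bits.
    x = (x * 0x9E3779B1 + 0x7F4A7C15) & MASK32
    return ((x << 7) | (x >> 25)) & MASK32

def feistel_encrypt(block, subkeys):
    """Blowfish-style 16-round Feistel network over a 64-bit block:
    xL ^= P[i]; xR ^= F(xL); swap. The output undoes the final swap
    and whitens with subkeys[16] and subkeys[17]."""
    xL, xR = block >> 32, block & MASK32
    for i in range(16):
        xL ^= subkeys[i]
        xR ^= toy_f(xL)
        xL, xR = xR, xL
    xL, xR = xR, xL          # undo the extra swap after the last round
    xR ^= subkeys[16]
    xL ^= subkeys[17]
    return (xL << 32) | xR

def feistel_decrypt(block, subkeys):
    # A Feistel network decrypts by running the same rounds with the
    # 18 subkeys reversed -- why Blowfish needs no separate decrypt datapath.
    return feistel_encrypt(block, subkeys[17::-1])
```
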
Title: Fully Pipelined RISC Microcontroller CPU Core in VHDL Abstract: The PIC12C508 is a popular microcontroller that is fully programmable so that it can adapt to a specific application. It is small enough to be fitted into an FPGA and is hence very cost effective. It can effectively be used as part of a system on chip. Our goal is to implement the functionality of the PIC12C508 microcontroller and to meet the suggested operating speed of 4 MHz. This microcontroller was originally targeted at an ASIC, but we will take up the challenge of obtaining the same performance using a cost-effective FPGA. The PIC12C508 is an 8-bit, fully static, EEPROM/EPROM/ROM-based CMOS microcontroller. It has a RISC architecture using 33 single-word, single-cycle instructions. This family of microcontrollers delivers higher performance than the rest of its competitors in the same price range. Additionally, it reduces client development time due to the simplicity of its instruction set. This microcontroller allows the customization of applications ranging from personal care appliances to low-power remote transmitter-receivers. Its main characteristics are low cost, low power, high performance, ease of use and I/O flexibility. References: "Microchip Technology Inc.: PIC12C508 Device" www.microchip.com/10/lit/pline/picmicro/families/12c5xx/devices/12c508/index.htm "PIC12C5XX, 8-Pin, 8-Bit CMOS Microcontrollers" PIC12C508 Datasheet 40139e.PDF Microchip Technology 3/4/1999
Title: A circuit that interprets the results of ECG data to determine patient risk of arrhythmia. Abstract: The electrocardiogram (ECG) is a non-invasive, routine test used by cardiologists to record the electrical activity of the heart. Abnormalities in this electrical signature can indicate a variety of heart problems. In a normal ECG, 12 leads are attached to a patient's chest and back and the electric signal of the patient's heart is recorded as a continuous wave (see Fig1.pdf(3)). A physician visually interprets the features of the wave. This qualitative interpretation can miss important features. Our project will be to develop a digital circuit that not only monitors and records the heart rate but also determines, via QT dispersion, whether or not there is a risk of arrhythmia. The concept of the determination is that a heart waveform is divided into named regions (see Fig2.pdf(3)). The distance between the Q point and the T point is an indication of the speed of ventricular recovery after each heartbeat. This distance is measured independently by each of the 12 leads attached to the patient. The QT dispersion is the difference between the maximum QT interval and the minimum QT interval. It indicates the level of homogeneity of ventricular recovery time. QT dispersion is a proven indicator of those at risk of sudden death due to arrhythmia. (Fig3.pdf(3) shows the ECG of a patient with a long QT interval.) References: 1. Zaret, Barry L., Marvin Moser, Laurence S. Cohen, eds. "Yale Heart Book", New York: Hearst Books, 1992. 2. Pye M., A.C. Quinn, S.M. Cobbe. "QT interval dispersion: a non-invasive marker of susceptibility to arrhythmia in patients with sustained ventricular arrhythmias." British Heart Journal 1994; 71:511-514. 3. 12-lead ECG library (online): homepages.enterprise.net/djenkins/ecghome.html 4. ECG Learning Center (University of Utah School of Medicine, Salt Lake City): www-medlib.med.utah.edu/kw/ecg/
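The QT dispersion computation described above reduces to a small amount of arithmetic, sketched here in Python. The risk threshold is an illustrative parameter of our own, not a clinical value taken from the references; the project would calibrate it against reference [2].

```python
def qt_dispersion(qt_ms):
    """QT dispersion: the difference between the longest and shortest
    QT interval measured across the 12 ECG leads (in milliseconds)."""
    if len(qt_ms) != 12:
        raise ValueError("expected one QT interval per lead (12 leads)")
    return max(qt_ms) - min(qt_ms)

def at_risk(qt_ms, threshold_ms=65):
    # threshold_ms is illustrative only; a real device would use a
    # clinically validated cut-off.
    return qt_dispersion(qt_ms) > threshold_ms
```
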
Title: Blowfish Encryption Algorithm Abstract: This article discusses the design and implementation of the Blowfish encryption algorithm. Blowfish is a symmetric block cipher with the capability of taking a variable-length key from 32 bits to 448 bits. Unpatented and license-free, Blowfish was designed by Bruce Schneier in 1993 as a fast, free alternative to existing encryption algorithms, making it an ideal encryption system for both domestic and exportable use. Index Terms: AES, NIST, DES, block cipher, cryptography, RSA References: www.counterpane.com/bfsverlag.html www.rsasecurity.com/# csrc.nist.gov/encryption/aes/rijndael/
*NEURO/FUZZY COMPUTING*
Title: Motion Detection Using a VLSI Neural Architecture Abstract: This project implements a motion detector using a neural network strategy. The goal is to build a multilayer perceptron - one input layer, one hidden layer, and one output layer - and predict the motion of objects. The neural net algorithm will use the technique of updating the weights. Neurons will be built which will be provided some information about the environment. Based on the information received, the neural net will display a result. We will respond back to the network, telling it whether or not the answer was correct. Depending on the result produced, an error will be calculated, which will simply be the difference between the target output and the output produced by the neural net. The error will then be used to update the weights between the input, hidden and output layers. Also, to keep things simple, the neural net will be fully connected and feed-forward. The output will be examined on a pixel-by-pixel basis. The input will contain about 32 neurons to interpret a 256 x 256 pixel image. At every clock cycle we interpret 16 bits, arranged as a 4 x 4 matrix. The final solution will be a 64 x 64 pixel image. Each neuron will contain the following: - a ROM memory, where net weights are stored; - a sum device and a storage register for the pre-synaptic activation storage; - a circuit for the activation function coding. Testing will be done using simple objects. There will be a training set to train the neural net, and there will be a testing set to test the network. All simulations will be done in VHDL. References: [1] www.csai.unipa.it/people/alumni/condemi/vhdlweb/neuralchip2/AACHEN_95_new.html and 18 other refs!
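The weight-update rule described above - error equals target output minus produced output, used to adjust the weights - is the delta rule for a linear neuron. A minimal Python sketch (the learning rate, inputs and target are illustrative, not the project's values) shows how repeated presentations drive the error toward zero:

```python
def train_step(weights, inputs, target, lr=0.1):
    """One delta-rule update for a single linear neuron: the error
    (target minus produced output) adjusts each weight in proportion
    to the input it multiplies."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    return [w + lr * error * x for w, x in zip(weights, inputs)], error

# Repeated presentations of one training pair drive the error toward zero.
weights = [0.0, 0.0]
for _ in range(100):
    weights, error = train_step(weights, [1.0, 0.5], 2.0)
```
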
Title: Elliptic Curve Scalar Multiplier for Use in Cryptosystems Abstract: Elliptic curve cryptosystems are emerging as an alternative to RSA encryption and other public-key cryptosystems because of their small key sizes. This is critical to some applications, including smartcards and handheld devices. In such a cryptosystem, the basic building block for secret key exchange, authentication and certification is the elliptic curve scalar multiplier. The set of points which satisfy an elliptic curve equation has some useful properties. In particular, the addition of two points on the curve gives another point on the curve. This is not addition in the usual sense; it is a set of calculations involving the coordinates of the points, carried out in the finite field over which the curve is defined. An addition of two elliptic curve points consists of a series of underlying field additions, squarings, multiplications and inversions. Elliptic curve scalar multiplication involves a series of additions and subtractions of that kind. Our goal is to write a VHDL description of a scalar multiplier based on the paper in reference [1]. The elliptic curve used in this paper is defined as y^2 + xy = x^3 + ax^2 + b. References [1] Elliptic Curve Scalar Multiplier Design Using FPGAs, Lijun Gao, Sarvesh Shrivastava and Gerald E. Sobelman, Proceedings, First International Workshop on Cryptographic Hardware and Embedded Systems, Lecture Notes in Computer Science, Vol. 1717, Springer, 1999.
[2] Elliptic Curve Public Key Cryptosystems - An Introduction, Erik De Win and Bart Preneel, ftp://ftp.esat.kuleuven.ac.be/pub/cosic/dewin/coursetext.ps.gz [3] Efficient Algorithms for Elliptic Curve Cryptosystems, Jorge Guajardo, www.ece.wpi.edu/Research/crypt/theses/documents/ms_guajardo.pdf [4] A Practical Implementation of Elliptic Curve Cryptosystems over GF(p) on a 16-bit Microcomputer, Toshio Hasegawa, Junko Nakajima and Mitsuru Matsui, www.security.ece.orst.edu/documents/Elliptic/main.pdf
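To illustrate the point arithmetic, here is a Python sketch of elliptic curve addition and double-and-add scalar multiplication over a small prime field GF(p) with a curve of the form y^2 = x^3 + ax + b. This is for illustration only: the design in reference [1] works over a binary field with the curve y^2 + xy = x^3 + ax^2 + b, where the underlying field operations (and hence the addition formulas) differ.

```python
def ec_add(P, Q, a, p):
    """Add two points on y^2 = x^3 + a*x + b over GF(p).
    None represents the point at infinity (the group identity)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                   # P + (-P) = infinity
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord slope
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def scalar_mult(k, P, a, p):
    """Double-and-add: compute k*P with one addition per set bit of k."""
    result, addend = None, P
    while k:
        if k & 1:
            result = ec_add(result, addend, a, p)
        addend = ec_add(addend, addend, a, p)
        k >>= 1
    return result
```

For example, on y^2 = x^3 + 2x + 2 over GF(17), the point P = (5, 1) doubles to (6, 3).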
Title: Obstacle-Avoiding Navigational Control Unit Abstract: This project implements a wheelchair navigational control unit in VHDL to provide motion in the direction opposed to any obstacles. Obstacle avoidance algorithms, namely the vector field histogram (VFH) and the vector force field (VFF) methods, will be employed to allow the chair to move around any barricades. A virtual square grid of a specific area set by the range of the ultrasonic sensors is established around the wheelchair. The grid is subdivided into cells. As the chair moves in the direction of an obstacle, the cells are incremented with certainty values indicating the probability of an obstacle in that cell. The unit processes the cell values and computes a direction vector that not only is close to the user's intended direction but also avoids all nearby obstacles. The navigational control unit is an interesting design project because it has direct practical applications to aid persons with mental and physical disabilities. The challenge of this project will be to implement the complex mathematical computations of the VFH and VFF algorithms and coordinate the different elements in the navigational unit to provide real-time directional changes. The testing of the unit is performed using a matrix representing an environment scenario with obstacles. The success of the project is proven by the unit's ability to provide a suitable direction vector that avoids any obstacles in the path of motion. Index Terms: Obstacle Avoidance, Vector Force Field, Vector Field Histogram, Certainty Value. References: [1] D. Bell, S. Levine, Y. Koren, L. Jaros, R. Simpson, and J. Borenstein, "The NavChair Assistive Wheelchair Navigation System" in IEEE Trans. Rehab. Eng. Vol. 4, pp. 443-451, Dec. 1999. [2] Y. Koren and J. Borenstein, "The Vector Field Histogram - Fast Obstacle Avoidance for Mobile Robots" in IEEE Journal of Robotics and Automation, Vol. 7, No. 3, pp. 278-288, June 1999.
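The force-summation idea behind the VFF method can be sketched in a few lines of Python. The grid representation and force constants below are our own simplification for illustration, not the NavChair parameters: each occupied cell pushes the robot away with a force proportional to its certainty value and inversely proportional to squared distance, while the target pulls with a constant unit force.

```python
def vff_direction(grid, robot, target):
    """VFF sketch: sum a unit attractive force toward the target with
    repulsive forces (certainty / d^2) from every occupied grid cell.
    grid maps (x, y) cell coordinates to certainty values.
    Returns the resulting (unnormalized) direction vector."""
    rx, ry = robot
    tx, ty = target
    fx, fy = float(tx - rx), float(ty - ry)
    mag = (fx * fx + fy * fy) ** 0.5 or 1.0
    fx, fy = fx / mag, fy / mag              # unit attraction toward target
    for (cx, cy), certainty in grid.items():
        dx, dy = rx - cx, ry - cy            # vector away from the obstacle
        d2 = dx * dx + dy * dy
        if d2 == 0:
            continue
        d = d2 ** 0.5
        fx += certainty * dx / (d2 * d)      # repulsion ~ certainty / d^2
        fy += certainty * dy / (d2 * d)
    return fx, fy
```

With no obstacles the vector points straight at the target; a high-certainty cell between robot and target bends or reverses it.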
Title: Token Ring Network Card The purpose of this project is to design a media access control device based on the Token Ring network protocol. The Token Ring media access method allows multiple stations (nodes) to share a common bus transmission medium (i.e., a network). The nodes are connected one to another to form a ring, and a specific pattern of data - a token - is used to decide which node can use the ring at any one time. The token is a unique series of bits or frames that continuously circulates on the ring, passing from one station to another. In order to transmit data from one node to another, the sender must first seize the token. Therefore only one node may transmit at any one time. The medium can be an actual ring or a hub-based system. Tokens and data will be transmitted in packets on the medium to be read by other stations. Both the token and the data will require Manchester encoding, which allows for synchronized communication. This project will take an existing design and optimize its speed, minimize the hardware requirements, and modify the design so that it supports variable-length data packets. References: 1) Page created by Prof. Bernd-Peter Paris - thalia.spec.gmu.edu/~pparis/classes/notes_101/node78.html 2) Other LAN Technologies - home.ubalt.edu/abento/650/otherlan/sld001.htm 3) Cisco Systems Inc. - www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/tokenrng.htm
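The Manchester encoding mentioned above guarantees a transition in the middle of every bit cell, which is what lets a receiver recover the transmitter's clock from the data itself. A Python sketch of one common convention is below (note that the IEEE 802.5 token ring standard actually specifies differential Manchester; the plain variant is shown for simplicity):

```python
def manchester_encode(bits):
    """Encode each bit as a pair of half-bit symbols: a 1 becomes a
    low-to-high transition (0, 1) and a 0 a high-to-low transition (1, 0),
    so every bit cell contains a mid-bit edge for clock recovery."""
    return [half for b in bits for half in ((0, 1) if b else (1, 0))]

def manchester_decode(symbols):
    """Invert the encoding; a pair with no transition is invalid."""
    table = {(0, 1): 1, (1, 0): 0}
    return [table[(symbols[i], symbols[i + 1])]
            for i in range(0, len(symbols), 2)]
```
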
Title: ATM - PHY Interface UTOPIA Level 2, Version 1.0 compliant 8-bit Interface, Multi-PHY, Single ATM Abstract: This project will implement one variation of the UTOPIA Level 2 protocol. The UTOPIA protocol defines the interface between an ATM network and the physical layer. The interface cores that we will design will support an ATM (or link) interface speed of 25 MHz. Since the speed of the PHY layer is PHY dependent and can vary, the data received on the PHY layer will have to go through a rate adaptation FIFO so that it is ready for the ATM layer. The ATM layer will be represented by a "master" core that has transmit and receive capabilities. The PHY layer will be represented by a "slave" core that will also have transmit and receive capabilities. Up to 31 slaves may be connected to one master. The master controls the data bus in both directions. The cores will be optimized and tested for Altera Flex10K devices. Cores of this type are used in industry inside phone switching networks. The protocol is also becoming a popular data transmission standard for data transfer between chips, boards, and networks. References: The ATM Forum, UTOPIA Level 2, Version 1.0 Specification ftp://ftp.atmforum.com/pub/approved-specs/af-phy-0039.000.pdf The ATM Forum, UTOPIA Level 1, Version 2.01 Specification ftp://ftp.atmforum.com/pub/approved-specs/af-phy-0017.000.pdf
Title: A VHDL Implementation of Triple DES Encryption Abstract: In the early 1970s, due to a significant rise in computer activity, it was decided that a strong cryptographic algorithm was needed to protect non-classified information. The algorithm needed to be cheap, widely available and very secure. These requirements were satisfied in 1977 with the release of DES, the Data Encryption Standard. DES operates on 64-bit data, including a 56-bit key with 8 parity bits, input data and an initial vector. This algorithm has been in use now for over 25 years, but with a key size of only 56 bits it is believed to be vulnerable to attack. A minor variation of this standard, and a step above it, is Triple DES, which enjoys much wider use since it is not so easy to break. Like DES, data is encrypted and decrypted in 64-bit chunks. It takes a 192-bit key (24 characters) as input, however, and breaks it into three keys. In fact, Triple DES is simply three rounds of DES. This increases security, but at the cost of increased run time. Each round uses a different permutation of the plaintext word. The data is encrypted with the first key, decrypted with the second key and finally encrypted again with the third key. Presently, Triple DES is an excellent and reliable choice for the security needs of highly sensitive information. The purpose of this project is to design and implement a hardware version of the Triple DES algorithm for the Altera Flex10K series FPGA. This will include a detailed description of the key features mentioned above, which make use of both permutation and substitution methods. The various modules will be developed using the VHDL language. References: 1) Aho, Hopcroft & Ullman, "The Design and Analysis of Computer Algorithms", Addison-Wesley, 1974. 2) Harry Katzan Jr., "The Standard Data Encryption Algorithm", Petrocelli Books, Inc., 1977. 3) Jan C. A. Van Der Lubbe, "Basic Methods of Cryptography", Cambridge University Press, 1994. 4) Schneier, B., "Applied Cryptography", John Wiley & Sons, 1996.
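The encrypt-decrypt-encrypt (EDE) composition described above can be sketched structurally in Python, with a trivial XOR "cipher" standing in for the real DES block function (illustration of the composition only; XOR has none of DES's permutation/substitution strength):

```python
def toy_encrypt(block, key):
    # Stand-in for one DES encryption; real DES would apply 16 rounds
    # of permutation and substitution here.
    return block ^ key

def toy_decrypt(block, key):
    return block ^ key

def triple_des_encrypt(block, k1, k2, k3):
    """EDE: encrypt with k1, decrypt with k2, encrypt with k3.
    With k1 == k2 == k3 this degenerates to a single encryption,
    which is why EDE (rather than EEE) gives backward compatibility
    with single DES."""
    return toy_encrypt(toy_decrypt(toy_encrypt(block, k1), k2), k3)

def triple_des_decrypt(block, k1, k2, k3):
    # Undo the three stages in reverse order with the keys swapped.
    return toy_decrypt(toy_encrypt(toy_decrypt(block, k3), k2), k1)
```
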
Title: A VHDL Implementation of the Tiny Encryption Algorithm (TEA). Abstract: Security nowadays has become a very important issue. When we evoke the term security in the computer world, we think immediately about viruses and ways to protect ourselves from them, the passwords we need for so many things, and so forth. But we forget that in our modern world we also need to protect our data from unscrupulous eyes. For that reason encryption exists, so that we can "hide" some of our data. In our project, we will implement TEA, the Tiny Encryption Algorithm developed by David Wheeler and Roger Needham at the Computer Laboratory of Cambridge University. The Tiny Encryption Algorithm is one of the simplest, fastest and most efficient cryptographic algorithms in existence because it uses a round function built from simple operations: addition, exclusive-or and shifting. It encrypts 64 data bits at a time using a 128-bit key. Also, the algorithm is optimised for machines with 32-bit words. For our project, we will implement TEA in VHDL so that we will be able to encrypt and decrypt a message. Included is a pdf file describing the C code of TEA and a simple diagram describing our VHDL design. References: 1. David A. G. Gillies, "The Tiny Encryption Algorithm (TEA)", vader.brad.ac.uk/tea/tea.shtml 2. Computer Security Group, Computer Laboratory, University of Cambridge, "Cryptographic Algorithms", www.cl.cam.ac.uk/Research/Security/studies/st-alg.html 3. David A. G. Gillies, "TEA Source Code", vader.brad.ac.uk/tea/source.shtml
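TEA is small enough to state in full. Here is a Python rendering of the published algorithm (32 rounds; the 64-bit block as two 32-bit words, the 128-bit key as four), with masking standing in for C's natural 32-bit overflow:

```python
def tea_encrypt(v, key):
    """Encrypt one 64-bit block (v0, v1) with a 128-bit key (k0..k3)."""
    v0, v1 = v
    k0, k1, k2, k3 = key
    delta, total, mask = 0x9E3779B9, 0, 0xFFFFFFFF
    for _ in range(32):
        total = (total + delta) & mask
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & mask
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & mask
    return v0, v1

def tea_decrypt(v, key):
    """Run the 32 rounds in reverse, starting from total = 32 * delta."""
    v0, v1 = v
    k0, k1, k2, k3 = key
    delta, mask = 0x9E3779B9, 0xFFFFFFFF
    total = (delta * 32) & mask          # 0xC6EF3720
    for _ in range(32):
        v1 = (v1 - (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & mask
        v0 = (v0 - (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & mask
        total = (total - delta) & mask
    return v0, v1
```

The magic constant delta is floor(2^32 / golden ratio); it simply ensures the round keys differ from round to round.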
Title: Distributed Station Network Router - Using a Simplified IEEE Std. 802.5 Packet Protocol and Topology Abstract: IBM originally developed the Token Ring network in the 1970s. It remains a primary local area network (LAN) technology, second only to Ethernet/IEEE 802.3 in LAN popularity. In this project, we will implement a network router following the Token Ring protocol, in the form of a multistation access unit (MSAU). This unit will perform the scheduling and routing of packets through the use of carrier detection techniques, packet error detection, as well as address polling and address recognition. The simplified data packet will consist of both a frame and a control field in addition to the usual data and address fields. A starting and ending delimiter will be appended to the data packet in order to facilitate error detection. A priority scheme will be implemented using a priority field and reservation bit, supplemental to the actual circulation of an available or reserved token. Other differences between the original 802.5 protocol and our simplified design will only extend to the form of the token and data packets. Field sizes will be determined in the preliminary phases of the project. References: Alberto Leon-Garcia, Indra Widjaja, "Communication Networks: Fundamental Concepts and Key Architectures", McGraw-Hill, 1999. Dimitri Bertsekas, Robert Gallager, "Data Networks" (2nd edition), Prentice Hall, 1992, ISBN 0-13-200916-1. Andrew S. Tanenbaum, "Computer Networks" (3rd edition), Prentice Hall, 1996, ISBN 0-13-349945-6.
Title: Hardware Implementation of Convolutional Encoder and Viterbi Decoder Abstract: A primary objective of any digital communication system is to transmit information at the highest possible rate. Just as importantly, the information should be received with a minimum of distortion and errors. At the transmitter, data is encoded to protect it against channel noise and other forms of distortion. The encoder adds redundant bits to the original stream of data. The additional redundancy allows the encoded data to be decoded at the receiver by providing clues to decipher the code word. Essentially, the encoder maps the data stream to a code word and the decoder must map the code word back into the original message. The goal of this project is to implement a convolutional encoder (k = 2, n = 5) and the Viterbi decoder, which are found in second-generation as well as third-generation wireless communication systems. References: 1. Simon Haykin, "Communication Systems", John Wiley & Sons Inc., 1994 2. G. David Forney, "The Viterbi Algorithm", Proceedings of the IEEE, March 1973
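A convolutional encoder is little more than a shift register with parity taps, which is why it maps so naturally to hardware. The Python sketch below uses the common rate-1/2, constraint-length-3 generators 7 and 5 (octal) purely for illustration; the project's (k = 2, n = 5) code would use its own parameters. The Viterbi decoder, not shown, searches the resulting trellis for the most likely transmitted sequence.

```python
def conv_encode(bits, g1=0b111, g2=0b101, K=3):
    """Rate-1/2 convolutional encoder: for each input bit, shift it into
    a K-bit register and emit two parity bits, one per generator tap set."""
    state = 0
    out = []
    for b in bits:
        state = ((state << 1) | b) & ((1 << K) - 1)   # shift in the new bit
        out.append(bin(state & g1).count("1") & 1)    # parity over g1 taps
        out.append(bin(state & g2).count("1") & 1)    # parity over g2 taps
    return out
```

For the input 1 0 1 1, the encoder emits the pairs 11 10 00 01.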
Title: Digital Security Controller with Video Display Our project implements a digital security controller which can be used in many environments, such as a door, a safety deposit box or a locker. To improve the reliability of the controller, a simple encryption method is used. All data such as user IDs and passwords are encrypted and can be decrypted internally. A storage mechanism is built in which can store up to 10 users' data. A simple display unit is also implemented which, upon successful login by the administrator, allows connection to a monitor or other simple display device. The purpose of the display unit is to display necessary information such as the current user ID and password. More features are shown below: Criteria: - Administrator Controllability - Password Encryption - Time Stamp - Storage of up to 10 users' info - Alarm Equipment - Timeout - Limited Trials - Display Ability References: 1.) VHDL code for digital lock example: venus.ece.ndsu.nodak.edu/~joeljorg/ee423/notes/ 2.) Programmable Devices: www.circellar.com/pastissues/TOCMar2k.htm 3.) Synchronous FSM Design Using VHDL: www.ee.calpoly.edu/courseware/ee359/cp359x03.html 4.) Digital Combination Lock: www.redbrick.dcu.ie/academic/CLD/chapter8/chapter08.doc5.html 5.) The real work: lesbos.esat.kuleuven.ac.be/courses/HJ03/exercises/lock/node10.html 6.) Digital Lock FSM example: lesbos.esat.kuleuven.ac.be/courses/HJ03/exercises/lock/node10.html 7.) "Rapid Prototyping of Digital Systems (A Tutorial Approach)" by James O. Hamblen and Michael D. Furman 8.) "VHDL for Logic Synthesis" by Andrew Rushton

Winter 2002


Title: VHDL Implementation of RSA Cryptosystem Using Montgomery Algorithm Abstract: Due to the large amount of sensitive data that is transmitted across the Internet, cryptography has undoubtedly become one of the most important sectors of research and development. The RSA (Rivest, Shamir, and Adleman) public key encryption algorithm remains one of the most popular such schemes, due to its strength and applicability to both messages and digital signatures. However, the nature of the algorithm requires some extensive computation, meaning it can benefit greatly from an efficient implementation geared for minimized latency. In addition, a faster implementation would make it suitable for new applications, such as real-time systems. In this paper, we will investigate the RSA cryptosystem by implementing it in VHDL, intended for the Altera Flex 10K series FPGA. The Montgomery algorithm for modular multiplication is well suited to the RSA algorithm, and using this as part of our implementation, we can increase the system's speed significantly. Since this algorithm is asymmetric (decryption is not simply the reverse of the encryption), we will implement a circuit capable of both encryption and decryption. References: [1] M. Shand, J. Vuillemin, "Fast Implementations of RSA Cryptography", Proc. IEEE 11th Symposium on Computer Arithmetic, 1993. [2] J.H. Gwo, C.L. Wang, H.C. Hu, "Design and Implementation of an RSA Public-Key Cryptosystem", Proceedings of the 1999 IEEE International Symposium on Circuits and Systems, Volume 1, 1999. [3] P.A. Wang, W.C. Tsai, C.B. Shung, "New VLSI architectures of RSA public-key cryptosystem", Proceedings of the 1997 IEEE International Symposium on Circuits and Systems, Volume 3, 1997. [4] A. G. Konheim, "Cryptography: A Primer", John Wiley & Sons Inc., 1981.
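The appeal of Montgomery multiplication is that it replaces the trial division of ordinary modular reduction with shifts and masks, which suit hardware well. Below is a Python sketch of the reduction step (REDC) and a square-and-multiply exponentiation built on it; the word width is an illustrative parameter (a 512-bit RSA design would use a correspondingly wide R), and a hardware implementation would pipeline these operations rather than loop.

```python
def montgomery_setup(n, width):
    """Precompute constants for Montgomery arithmetic modulo odd n,
    with R = 2**width > n. n_prime satisfies n * n_prime = -1 (mod R)."""
    R = 1 << width
    n_prime = (-pow(n, -1, R)) % R
    return R, n_prime

def mont_mul(a, b, n, R, n_prime, width):
    """Montgomery product a*b*R^-1 mod n -- no division by n, only
    masking by R-1 and shifting by the word width."""
    t = a * b
    m = ((t & (R - 1)) * n_prime) & (R - 1)   # m = t * n' mod R
    u = (t + m * n) >> width                  # t + m*n is divisible by R
    return u - n if u >= n else u

def mont_pow(base, exp, n, width=16):
    """Left-to-right square-and-multiply entirely in Montgomery form.
    Requires n odd and n < 2**width."""
    R, n_prime = montgomery_setup(n, width)
    x = (base * R) % n                        # convert base into Montgomery form
    acc = R % n                               # Montgomery representation of 1
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, R, n_prime, width)
        if bit == "1":
            acc = mont_mul(acc, x, n, R, n_prime, width)
    return mont_mul(acc, 1, n, R, n_prime, width)   # convert back
```

The one-time conversion into and out of Montgomery form is amortized over the hundreds of multiplications an RSA exponentiation performs.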
Title: Hardware Implementation of the Svoboda Algorithm (Boolean Analyzer). Abstract: The Svoboda method is one of the important methods used for logic minimization in electrical engineering. Software exists that finds min-terms using the Svoboda algorithm for minimization. The time taken by the software to find the min-terms increases exponentially with the number of input variables. For example, an 8-variable min-term takes between 15-20 minutes to be processed using the software; a 16-variable input will take at least 1 hour to produce the output. Hardware, on the other hand, takes much less time to process the min-terms. In this project we will be implementing the Boolean Analyzer in hardware using VHDL. The circuit will take an n-bit input, in our case n = 8, store it in memory to be processed through the formal divisor and the formal multiplier, and then store the final result back in the memory. This implementation has been done before. In the old design, the circuits were not synchronized together and the design required a lot of hardware to implement. Our task now is to improve the old implementation by introducing a better design that will control and synchronize the circuit; the goal is also to reduce hardware consumption. References: 1. Bhasker, J., Upper Saddle River, N.J.: PTR Prentice Hall, 1999. 2. Stephen D. Brown, Zvonko G. Vranesic, "Fundamentals of Digital Logic with VHDL Design", McGraw-Hill, 2000.
Title: ATM Switch Design Abstract: Asynchronous Transfer Mode (ATM) has emerged as a promising technology for broadband communication, including data, audio and video transfer. In this project, an ATM switch is implemented using an Altera FPGA. The chip will decipher addressing information, check the priorities of incoming packets, and route packets to their corresponding destinations. Each packet is also checked for its communication type, which determines the order in which each packet will be re-transmitted. The goal is to implement a working ATM switch model that can handle packet routing and is scalable for use over many ports. This will be tested by using a host module which will send both addressing information and data packets to the switch. The working switch should determine the destination address and route the packet to the appropriate port. The host will then be used to ensure that the packet was routed to the correct output port. References: [1] A. Leon-Garcia, Communication Networks. New York, NY: McGraw-Hill Companies, Inc., 2000. [2] L. Peterson, Computer Networks, a Systems Approach. San Francisco, CA: Morgan Kaufmann, 2000. [3] A. Tanenbaum, Computer Networks. Upper Saddle River, NJ: Prentice Hall, 1996.
Title: A Real-Time Implementation of an ATM Abstract: Automatic teller machines have become an important part of the consumer's everyday life. The mere concept of an automatic teller machine raises many different design issues, including data-basing, security, reliability, encryption, and user friendliness. Our project will attempt to implement a functional automatic teller machine. To simplify the design, our machine will implement the following functions: withdraw, deposit, account balance. We will concentrate our design on processing the information inputted by the user: verifying whether the PIN number is a valid one and that it matches the card inserted, determining the tasks to be completed, verifying whether the task can be completed (enough funds in the client's account), re-calculating the new balance, and returning the money (if required). The user interface will be simplified so as not to take away from the focus of our design (as mentioned above). We will use a simplified SSL (secure socket layer) type encryption system to provide secure information, and the overall network will consist of three parts: the ATM machine, the host computer, and the bank computer. The host computer will be responsible for routing the client file to the right bank computer. References: 1. Bowen, J. (1998). How ATMs Work. How Stuff Works. http://www.howstuffworks.com/atm1.htm (2002, February 14). 2. (2000). Secure Socket Layer. Netscape. home.netscape.com/security/techbriefs/ssl.html (2002, February 16). 3. Dion, B., Dissoubray, S. (2000). Modeling and Implementing Critical Real Time Systems with Esterel Studio. Esterel Technologies. www.esterel-technologies.com/esterel/article1.pdf (2002, February 15). 4. Tyson, J. (1998). How Encryption Works. How Stuff Works. http://www.esterel-technologies.com/esterel/article1.pdf (2002, February 17).
Title: HARDWARE IMPLEMENTATION OF THE BRESENHAM LINE AND CIRCLE ALGORITHMS The performance of most graphics systems can be traced to issues related to the drawing of primitives (lines, circles, etc.) and polygon filling routines. Today's high-quality video systems place big performance demands on their graphics packages. In order to increase rendering speeds, more and more processing power is being added to the graphical processor unit through the implementation of software routines in hardware. The Bresenham line and circle drawing algorithms are amongst the most widely used routines in graphics packages because of their simplicity and efficiency. One of their main features is the fact that they rely solely on integer arithmetic, avoiding the use of time- and resource-consuming floating point operations. In this project, we will explore the speed difference between hardware and software implementations by comparing the performance of the corresponding C and VHDL routines. REFERENCES 1. Macvicar & Singh, "Accelerating DTP with Reconfigurable Computing Engines", http://www.xilinx.com/labs/satnam/macvicar-singh-fpl98-final.pdf 2. Hearn & Baker, "Computer Graphics: C Version", 2nd Edition, Prentice-Hall, 1994. 3. Wright, W.E., "Parallelization of Bresenham's line and circle algorithms", IEEE Computer Graphics and Applications, Volume 10, Issue 5, Sept. 1990, Page(s): 60-67
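For reference, the integer-only line algorithm being accelerated can be stated compactly in Python (this is the standard error-accumulator formulation, generalized to all octants; a VHDL version would map the comparisons and adds to combinational logic around a pixel counter):

```python
def bresenham_line(x0, y0, x1, y1):
    """Rasterize a line with integer arithmetic only: an error term
    tracks the distance from the ideal line and decides when to step
    in x and/or y. Returns the list of pixels from (x0,y0) to (x1,y1)."""
    pixels = []
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx - dy
    while True:
        pixels.append((x0, y0))
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x0 += sx
        if e2 < dx:
            err += dx
            y0 += sy
    return pixels
```

The circle algorithm follows the same idea, stepping around one octant and mirroring the result eight ways.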
Title: The Solitaire Encryption Algorithm. Abstract: The Solitaire Encryption Algorithm was developed by Bruce Schneier. It provides a high level of cryptographic strength since it relies upon the secrecy of a key, rather than the secrecy of the algorithm itself. The key for a message is the initial arrangement of a deck of cards. A keystream, a stream of numbers between 1 and 26, is generated using one of the 54! possible permutations of the deck. This algorithm was initially described in Neal Stephenson's Cryptonomicon so that it could be used by secret field agents who had access only to a deck of cards. The algorithm assigns the values 1-52 to the cards of a deck, with the two Jokers as 53-54. Letters of the alphabet are numbered 1-26. Using the arrangement of the cards, a specific multi-step process of rearranging cards and performing cuts on the deck generates the keystream. Adding a keystream number to a letter number, modulo 26, enciphers the letter. Solitaire is reversible only if the key used to encrypt the message is known. The goal of this project is to implement Solitaire using VHDL in order to be able to successfully encrypt short messages. References: [1] B. Schneier, "Applied Cryptography, Second Edition", John Wiley & Sons, 1996. [2] D. Kahn, "The Codebreakers: The Comprehensive History of Secret Communication from Ancient Times to the Internet", Scribner, 1996. [3] D. R. Stinson, "Cryptography: Theory and Practice", CRC Press, 1995. [4] N. Stephenson, "Cryptonomicon", Avon Books, 1999.
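The enciphering step described above (letter number plus keystream number, modulo 26) can be modeled in isolation. The Python sketch below deliberately omits the deck-based keystream generation and takes the keystream as a given list; the sample keystream values in the test are Schneier's published values for an unkeyed deck:

```python
def encipher(plaintext, keystream):
    """Add keystream numbers to letter numbers (A=1..Z=26), modulo 26."""
    out = []
    for ch, k in zip(plaintext, keystream):
        n = (ord(ch) - ord('A') + 1 + k) % 26
        out.append(chr((n if n else 26) - 1 + ord('A')))  # 0 wraps to 26 = 'Z'
    return ''.join(out)

def decipher(ciphertext, keystream):
    """Inverse operation: subtract the keystream numbers, modulo 26."""
    out = []
    for ch, k in zip(ciphertext, keystream):
        n = (ord(ch) - ord('A') + 1 - k) % 26
        out.append(chr((n if n else 26) - 1 + ord('A')))
    return ''.join(out)
```

The hard part of the project, generating the keystream itself from the deck state, is exactly what the VHDL FSM must implement.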
Title: Implementation of the Geometry Stage of the Graphics Pipeline Abstract: The basic architecture of the graphics rendering pipeline is composed of three main stages. The application stage is executed entirely in software, consisting mainly of animation, collision detection and other higher-level operations. The geometry stage, which can be implemented in hardware, is responsible for geometric transformations, lighting, projections and clipping: "[T]his stage computes what is to be drawn, how it should be drawn, and where it should be drawn."[2] The final stage is the rasterizer, responsible for the correct rendering of images into pixels on the screen. This project will focus on the implementation of the second stage of this graphics pipeline. The geometry stage will be broken down into three sub-stages, namely transformations, clipping, and viewport projections. In real-time graphics, the simulation of real lighting conditions is not allotted very much time and is thus implemented in the rasterizer section using light maps. Further, the models used are composed of triangles, since these are the geometric primitives employed by most graphics hardware. The primary goal will be to establish a hardware architecture that maximizes throughput by heavily pipelining the design. The output of this stage will consist of properly transformed coordinates, ready to be passed on to the final stage. References: [1] Blinn, Jim, A Trip Down The Graphics Pipeline, Morgan Kaufmann, California, 1996. [2] Haines, Eric, Moller, Tomas, Real-Time Rendering, A K Peters, Massachusetts, 1999. [3] Hearn, Donald, Baker, M. Pauline, Computer Graphics, C Version (2nd edition), Prentice-Hall, 1994.
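As a rough software model of the transform and viewport sub-stages (clipping and lighting omitted; the row-major matrix layout and 640x480 screen are illustrative assumptions, not the project's specification), one might write:

```python
def transform(m, v):
    """Multiply a 4x4 row-major matrix by a homogeneous column vector [x, y, z, w]."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def viewport(v, width, height):
    """Perspective divide, then map normalized device coords [-1, 1] to pixels."""
    x, y, z, w = v
    ndc_x, ndc_y = x / w, y / w
    # y is flipped because screen coordinates grow downward
    return ((ndc_x + 1) * width / 2, (1 - ndc_y) * height / 2)
```

In the pipelined hardware, each 4x4 multiply would occupy its own stage, so a new vertex can enter every clock cycle.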
Title: The Wallace Tree Multiplier Abstract: Integrated circuit design continuously evolves to improve speed of execution, area on chip, power dissipation and other factors. One operation that is common to a large number of applications and that can be implemented in a number of different ways is multiplication. This project examines one of the best algorithms for multiplication, the Wallace tree multiplier. The natural way of performing multiplication is by generating partial sums, shifting them, and adding them up. A Wallace tree is a tree of full adder layers that takes in bits of the same weight (aligned bits) and adds them up. Each layer reduces 3 input bits into 2 output bits. After as many stages as necessary, every tree is left with a carry out and sum bit. Those are combined to yield the result. In this project, we design a Wallace tree multiplier in VHDL. The inputs will be two unsigned 32-bit integers. We will report the latency of several multiplications and will analyse them to confirm the efficiency of the algorithm. References: 1. H. Schmit, "The Wallace Tree", http://www-2.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15828-s98/lectures/012 6/sld002.htm, 1998. 2. S. Warde, "Unsigned Multiplication", http://www.unf.edu/~swarde/Execution_Units/Unsigned_Multiplication/unsigned_ multiplication.html, 1999. 3. D. Wilde, "Tree and Array Multipliers", http://www.ee.byu.edu/ee/class/ee621/Lectures/L11.PDF, 2002.
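The 3-to-2 reduction described above can be modeled bit-for-bit in software. The following Python sketch (our own illustration, not a hardware netlist) builds the partial-product columns, reduces them layer by layer with full adders, and finishes with a carry-propagate addition:

```python
def full_adder(a, b, c):
    """3:2 compressor: three bits in, (sum, carry) out."""
    s = a ^ b ^ c
    cout = (a & b) | (a & c) | (b & c)
    return s, cout

def wallace_multiply(a, b, n=8):
    # cols[w] lists the 1-valued partial-product bits of weight 2**w
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            if (a >> i) & (b >> j) & 1:
                cols[i + j].append(1)
    # Each layer reduces every group of 3 same-weight bits to a sum bit
    # (same weight) plus a carry bit (next weight up)
    while any(len(c) > 2 for c in cols):
        nxt = [[] for _ in range(2 * n)]
        for w, c in enumerate(cols):
            while len(c) >= 3:
                s, cout = full_adder(c.pop(), c.pop(), c.pop())
                nxt[w].append(s)
                nxt[w + 1].append(cout)
            nxt[w].extend(c)
        cols = nxt
    # Final carry-propagate addition of the two remaining rows
    result, carry = 0, 0
    for w, c in enumerate(cols):
        total = sum(c) + carry
        result |= (total & 1) << w
        carry = total >> 1
    return result
```

The number of `while` iterations corresponds to the number of adder layers in the tree, which grows only logarithmically with n.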
Title: RNA Sequence to Amino Acid Chain Translation Unit Abstract: Bioinformatics is an exciting field in which multiple disciplines combine to form a bridge between the traditional sciences and emerging computer technology. Proteins, large organic molecules, are directly involved in the chemical processes essential for life [1]. Proteins are composed of amino acid subunits, which RNA encodes. This project will implement in hardware an efficient algorithm for mapping an RNA sequence of length n to all possible amino acid chains, based on the known base-triplet codons [2]. As the RNA to amino acid mapping is only one portion of the genomic process [3], this project can be integrated into a larger design. The serial input of an RNA sequence will be parsed to determine all possible amino acid sequences. The output will include some termination marker (as yet undetermined) to denote distinct amino acid sequences. The sequence will be stored in memory and will be processed by an FSM. Pointers to the identified sub-sequences will be stored in the output stage of the design; when the sequence has been fully processed, all identified sequences will be output serially, in order of discovery. Because Altera MaxPlus II will not be able to meet the requirements for simulation of our design, the project will be implemented using ModelSim or Altera Quartus. The results will be verified using existing web-based genetic software tools. Footnotes: [2] Amino Acid Codons. http://www.bmrb.wisc.edu/referenc/codons.html [3] DNA is transcribed to RNA; RNA is translated to amino acid sequences, which are assembled into proteins. References: 1. Dalton, Mark. Codon Table. http://www.cbc.umn.edu/~mwd/cell_www/chapter2/codon_table.html 2. Protein. Encyclopedia Britannica. http://www.britannica.com/eb/article?eu=119721&tocid=0&query=protein 3. Translate Tool. ExPASy Molecular Biology Server. http://ca.expasy.org/tools/dna.html 4. Canadian Bioinformatics Resource. http://www.cbr.nrc.ca/newdocs/home/home.html?cbrlang=eng+navbar=/newdocs/home/
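The codon-mapping core of such a unit can be sketched in a few lines of software. The Python below uses only a small illustrative subset of the standard codon table and translates a single reading frame, whereas the project must consider all frames:

```python
# Minimal subset of the standard codon table (illustrative, not complete)
CODON_TABLE = {
    'AUG': 'Met', 'UUU': 'Phe', 'UUC': 'Phe', 'GGC': 'Gly',
    'GCU': 'Ala', 'UGG': 'Trp', 'UAA': 'STOP', 'UAG': 'STOP', 'UGA': 'STOP',
}

def translate(rna, frame=0):
    """Translate one reading frame until a stop codon or the end of sequence."""
    chain = []
    for i in range(frame, len(rna) - 2, 3):
        aa = CODON_TABLE.get(rna[i:i + 3], '?')   # '?' = codon not in this subset
        if aa == 'STOP':
            break
        chain.append(aa)
    return chain
```

The hardware FSM would perform the same lookup with a ROM indexed by the three-base codon, emitting a termination marker where the software version breaks out of the loop.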
Title: A Simple FPGA Implementation of an Elliptic Curve Cryptosystem. Abstract: Cryptography is a necessary feature of modern computing, as more and more transactions are carried out every day which require high-end security over a possibly transparent medium such as the Internet. Contemporary cryptography is based on the use of public key cryptosystems, which greatly simplify the problem of establishing secure communication between two users, without the need to exchange secret private keys. One recently emerging public key cryptosystem is based on Elliptic Curve Cryptography (ECC). ECC is particularly well suited for hardware implementations on FPGAs due to its smaller key size as opposed to other cryptosystems such as RSA (Rivest, Shamir, Adleman). In fact, it has been said to offer more security per bit than any other known public key cryptosystem. Our project will tackle the design of an ECC-based cryptosystem in VHDL that can be implemented in regular Altera (Flex10K) FPGAs. References: 1. A.J. Menezes, "Elliptic Curve Public Key Cryptosystems", Kluwer Academic Publishers, 1993. 2. I. Blake, G. Seroussi, and N. Smart, "Elliptic Curves in Cryptography", Cambridge University Press, 1999. 3. M. Rosner, Elliptic Curve Cryptosystems on Reconfigurable Hardware, Master's Thesis, Worcester Polytechnic Institute, May 1998. 4. K. H. Leung, K. W. Ma, W. K. Wong, and P. H. W. Leong, "FPGA Implementation of a Microcoded Elliptic Curve Cryptographic Processor", Proceedings of Field-Programmable Custom Computing Machines (FCCM'00), pp. 68-76, 2000.
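The group operation underlying ECC is point addition on a curve y² = x³ + ax + b over a prime field. The Python sketch below uses affine coordinates and a tiny illustrative toy curve (a = 2, b = 2, p = 17), far smaller than anything cryptographically useful; `None` stands for the point at infinity:

```python
def inv_mod(a, p):
    """Modular inverse via Fermat's little theorem (p prime)."""
    return pow(a, p - 2, p)

def ec_add(P, Q, a, p):
    """Add two affine points on y^2 = x^3 + a*x + b over F_p."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                  # P + (-P) = infinity
    if P == Q:
        lam = (3 * x1 * x1 + a) * inv_mod(2 * y1, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * inv_mod((x2 - x1) % p, p) % p    # chord slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)
```

Scalar multiplication (repeated `ec_add` via double-and-add) is the operation the FPGA datapath must accelerate; the field inversions are what make hardware design of ECC interesting.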
Title: New AES (Rijndael) FPGA Implementation Abstract: This project will design and implement the AES (Rijndael) encryption algorithm. Recently chosen as the National Institute of Standards and Technology's (NIST) official Advanced Encryption Standard (AES), the Rijndael algorithm is now widely used by the US Government and other organizations for data security. Rijndael is a symmetric block cipher with variable block length and key length, ranging up to 256 bits each. AES is license-free, with implementations in several languages widely available. The goal of the project will be to produce a functional implementation of Rijndael in VHDL. The focus of the development will be correctness, thus our results will be compared to results generated by NIST's official C implementation. The implementation will be designed for use with relatively small blocks and keys as a proof of concept, given the time constraints of the project. References: J. Daemen and V. Rijmen, "AES Proposal: Rijndael," http://csrc.nist.gov/encryption/aes/rijndael/Rijndael.pdf NIST AES Homepage http://csrc.nist.gov/encryption/aes/rijndael/ Rijndael Homepage http://www.esat.kuleuven.ac.be/~rijmen/rijndael/ http://rijndael.com/
Title: Tomasulo's Algorithm for Dynamic Instruction Scheduling Abstract: Modern processors use pipelining to increase throughput. Nevertheless, pipelining can introduce hazards and stalls during the processor's execution. In order to achieve better performance, Tomasulo's algorithm can be used. This algorithm allows the system to reschedule instructions dynamically. This project will simulate the implementation of Tomasulo's algorithm in a DLX machine. The main goals are to obtain correct outputs and to achieve better CPU performance. The latter requires that hazards and stalls be reduced. Also, instruction execution must continue in the presence of hazards and stalls. In order to handle structural, control, WAW, WAR and RAW hazards, a hazard detection unit will be included in the system architecture. Dispatched instructions will be stored, along with their tag and data or data pointer, in reservation stations. Once they are ready, they will be issued to an execution unit. To eliminate name dependencies, a specialized register file and register control unit will be used to implement register renaming. Finally, LPM modules will be used to implement multiplication, addition, shifting, division and other operations. References: 1. Hennessy & Patterson, "Computer Architecture: A Quantitative Approach," 2nd edition, Morgan Kaufmann, 1996.
Title: Cryptography and the RC2 algorithm Abstract: Over recent years, cryptography has played an increasingly important role both in research and in actual implementations, as a way to provide a secure method of exchanging information over communication channels. In fact, the Advanced Encryption Standard (AES), which will become effective May 26th, 2002, has sparked great interest in the cryptographic community. Cryptography, in its most simple form, can be viewed as a method of manipulating data in order to conceal its meaning. The process of ciphering and deciphering may take on many forms, and for the purpose of this project, the RC2 algorithm, developed in 1987 by cryptography pioneer Ron Rivest, will be implemented in hardware. RC2 is in widespread use in a number of commercial software packages, including Lotus Notes, Microsoft Windows, Internet Explorer and Netscape Communications' Navigator and Communicator. The RC2 algorithm is a secret-key block cipher with input and output block sizes of 64 bits each and with a variable key size, from one byte up to 128 bytes. RC2 encryption outperforms that of DES and is much easier to implement. In this project, the RC2 encryption/decryption algorithm will be implemented in VHDL and simulated on a FLEX 10K device. Given the flexibility of the algorithm to use small keys and the restricted memory of the FPGA, an eight-byte key will be used. References: 1. RSA Security Inc., "FAQ" http://www.rsasecurity.com/rsalabs/faq/3-6-2.html 2. S. Vaudenay, "Fast software encryption: 5th international workshop, FSE 98, Paris, France, March 23-25, 1998: proceedings", Springer, 1998. 3. R. Housley, "Triple-DES and RC2 Key Wrapping" http://www.networksorcery.com/enp/rfc/rfc3217.txt 4. R. Rivest, "A Description of the RC2 Encryption Algorithm" http://www.sevillaonline.com/ActiveX/RFC2268.txt
Title: Optical Character Recognition Using Neural Networks Optical character recognition (OCR) refers to a process whereby printed documents are transformed into ASCII files for the purpose of compact storage, editing, fast retrieval, and other file manipulations through the use of a computer. The objective of this project is to produce a VHDL implementation that will enable us to recognize pre-determined letters in dot-matrix form. The system shall accept as input a single character in dot-matrix form. The presence of basic line patterns in the image will be determined and passed on to a feed-forward, fully connected neural network, which will attempt to recognize the letter. The main tool that will be used to accomplish this task is the backpropagation neural network. A training set will be used to train the network; the generation of this training set will be accomplished with the help of MATLAB. An attempt will also be made to recognize imperfectly drawn, noisy or deformed letters with a fair degree of accuracy. References: 1. Donald R. Tveter, The Pattern Recognition Basis of Artificial Intelligence, IEEE Computer Society, 1998. 2. Donald R. Tveter, The Backprop Algorithm, http://www.cim.mcgill.ca/~jer/courses/ai/readings/backprop.pdf 3. Carl G. Looney, Pattern Recognition Using Neural Networks, Oxford University Press, 1997. 4. Christopher Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995. 5. Jason Tiscione, Optical Character Recognition Using Neural Networks in Java, http://www.geocities.com/SiliconValley/2548/ochre.html 6. The Mathworks, Inc., Neural Network Toolbox, http://www.mathworks.com/access/helpdesk/help/pdf_doc/nnet/nnet.pdf 7. Warren S. Sarle, Neural Network FAQ, ftp://ftp.sas.com/pub/neural/FAQ.html
Title: FPGA Implementation of a Character Recognition Device Character input recognition is the natural way to interface between humans and the ever-expanding computer universe. Current character input technologies are slow, inconvenient and error-prone. Several designs of character recognition hardware have been implemented, characterized by their differing algorithms, reliability, hardware efficiency and processing speed. In this project we will implement a character recognition unit. Some techniques for recognizing characters include Fuzzy Logic and Hidden Markov Models. We, however, chose not to use these methodologies because they require input buffering and are thus not real-time input methodologies. Since real-time recognition is more of a requirement in today's computing world, we decided to use the Tangent Distance method. In this project, we will undertake the design of a character recognition unit based on an FPGA system. A character is to be written on a 16 by 16 bit matrix input. This character will be transformed into a number of slopes, fed sequentially to a recognizer (ROM). Though character recognition has numerous applications, our main purpose behind this design is to facilitate character input into modern PDAs. Thus, users can simply write out word documents and short memos without the need for a keyboard. To test our design we will generate test vectors from handwritten replicas and feed these vectors to our unit. Above all, we will focus mainly on the accuracy and reliability of our system. References: 1) Patrice Y. Simard, Yann A. Le Cun, "Transformation Invariance in Pattern Recognition - Tangent Distance and Tangent Propagation". http://research.microsoft.com/~patrice/PDF/tricks.pdf 2) A. Kosmala, D. Willett, G. Rigoll, "A new hybrid approach to large vocabulary cursive handwriting recognition".
IEEE Online Library, catalog number 98EX170, volume 2, August 1998. 3) IEE Colloquium on 'Handwriting and Pen-Based Input' (Digest No. 1994/065), 28 pages, Accession Number 4682435. 4) Hurst, W.; Jie Yang; Waibel, A., 'Error repair in human handwriting: an intelligent user interface for automatic online handwriting recognition', IEEE Catalog Number 98EX174.
Title: Scan Conversion: A hardware implementation of a 2-D line-drawing unit. Abstract: Line drawing is at the core of all computer graphics applications. It is the building block of computer graphics, from which polygons and more complex shapes are made. Scan conversion is the conversion of a geometric figure (line, circle, ellipse, font, etc.) into the set of pixels that best represents it on a screen. It is typically hardware accelerated on all modern graphics subsystems, and constitutes a very important step in the graphics pipeline. For this project, we will synthesize a low-latency line-drawing unit on an FPGA using VHDL. The unit will accept four 16-bit inputs, indicating the unsigned coordinates of the line's beginning and end (x1, y1, x2, y2). The unit will generate one pair of coordinates per clock cycle to describe the line, until the line has been completely "drawn". Having such a large range of coordinates allows this unit to be used on screens of very high resolutions. Many high-end graphics subsystems also work internally at a resolution 4 times larger than the screen resolution, then average out the color of 4 internal pixels to yield one screen pixel; this is used to anti-alias the image. Our unit will therefore be able to handle this situation. References: 1. Hearn, Baker, "Computer Graphics C Version", Second Edition, Prentice-Hall, 1994. 2. McAllister, David K., "Line Drawing Algorithms", http://www.cs.unc.edu/~davemc/Class/136/Lecture9/Lines.html, September 1998. 3. Foley, James D. and Van Dam, Andries, "Computer Graphics: Principles and Practice", Second Edition (C version), Addison-Wesley, 1996.
Title: The Blowfish Algorithm "The cryptographic community needs to provide the world with a new encryption standard. DES, the workhorse encryption algorithm for the past fifteen years, is nearing the end of its useful life. Its 56-bit key size is vulnerable to a brute-force attack, and recent advances in differential cryptanalysis and linear cryptanalysis indicate that DES is vulnerable to other attacks as well." (Bruce Schneier, "Fast Software Encryption") Security is of pivotal importance in a world where more and more services are being offered online. Since the early days of computers and the Internet, there have been hackers who use whatever methods are required to decipher sensitive data, for benevolent or more serious purposes. It is for this reason that we have selected the Blowfish algorithm as our project design: to better understand ciphering. This algorithm is a variable-key-length block cipher based on a Feistel network. It is fast, compact and simple, according to its creator, Bruce Schneier. Another reason for choosing this algorithm is that, according to many sources, by using a non-invertible function as part of its algorithm, Blowfish "is perhaps one of the most secure algorithms available. There are no known attacks against Blowfish." (according to www.kremlinencrypt.com) Our project will implement this algorithm structurally in VHDL, with design decisions (speed vs. hardware area; optimizing for latency or throughput) to be decided as we proceed. The algorithm belongs to the public domain.
A diagram depicting this algorithm has also been attached, and a copy of the C code for this algorithm, as written by Schneier himself, is available at: http://www.counterpane.com/blowfish-download.html References: Counterpane Labs: The Blowfish Algorithm http://www.counterpane.com/blowfish.html Description of a New Variable-Length Key, 64-Bit Block Cipher (Blowfish) http://www.counterpane.com/bfsverlag.html Concepts of Cryptography: An Introduction http://www.kremlinencrypt.com/crypto/concepts.html Concepts of Cryptography: Algorithms http://www.kremlinencrypt.com/crypto/algorithms.html B. Schneier, "Applied Cryptography, Second Edition", John Wiley & Sons, 1996. B. Schneier, "The Twofish Encryption Algorithm", John Wiley & Sons, 1999.
Title: Pipelined FPGA implementation of the 2-D Discrete Cosine Transform Abstract: The 2-D Discrete Cosine Transform (DCT) is at the core of JPEG, MPEG and HDTV multimedia compression. It is similar to the DFT in that it provides a mapping from the spatial domain to the frequency domain. This project aims to efficiently implement the 8x8 DCT with maximal throughput. The design is based on the row-column decomposition method, and it consists of three optimally pipelined stages. In the first stage, the 1-D DCT of each input row is computed. The second stage provides a matrix transposition buffer. And the final stage generates the 1-D DCT of the resulting columns. The system's inputs and outputs are defined as follows: Inputs: clk - Clock; row_in[] - Input row: 8 8-bit unsigned integers. Outputs: col_out[] - Output column: 8 11-bit signed DCT coefficients. When the pipeline is full, the circuit will accept a new row and output a new column at every clock cycle. References: [1] Agostini, L.V.; Silva, I.S.; Bampi, S., "Pipelined fast 2-D-DCT architecture for JPEG image compression", 14th Symposium on Integrated Circuits and Systems Design, 2001, pp. 226-231. [2] Helal, H.I.; Salama, A.E.; Mashali, S., "Efficient realization of FPGA-based two dimensional discrete cosine transform", Twelfth Annual IEEE International ASIC/SOC Conference, 1999, pp. 186-190. [3] Dick, C., "Minimum Multiplicative Complexity Implementation of the 2-D DCT using Xilinx FPGAs", Proc. of SPIE's Photonics East '98, Configurable Computing: Technology and Applications, Boston, MA, USA, pp. 190-201, November 1-6, 1998.
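The row-column decomposition described above can be prototyped in software before being pipelined in hardware. This Python sketch uses floating-point arithmetic with orthonormal DCT-II scaling (which may differ from the fixed-point scaling chosen for the hardware): a 1-D DCT over the rows, a transposition, then a 1-D DCT over the resulting columns:

```python
import math

def dct_1d(x):
    """N-point DCT-II with orthonormal scaling."""
    N = len(x)
    out = []
    for k in range(N):
        c = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(c * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                           for n in range(N)))
    return out

def dct_2d(block):
    """Row-column decomposition: 1-D DCT of rows, transpose, 1-D DCT of columns."""
    rows = [dct_1d(r) for r in block]
    cols = list(zip(*rows))          # the transposition buffer stage
    return [dct_1d(list(c)) for c in cols]
```

In the hardware version, each of the three stages processes a different block simultaneously, which is what yields one row in / one column out per clock.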
Title: Associative Memory Array Processor for Character Recognition Abstract: Parallel processing can play a key role in speeding up image processing computations. An "Associative Processing System" comprises parallel 'content-addressable' memory cells. In this project, we design and implement an Associative Memory Array Processor (AMAP) that can be used for character recognition. We use the "bit-parallel organization" form of associative memory. Our system will comprise an input unit, an output unit, an AMAP and a central control unit. As inputs, we will consider 8x8 bit representations of characters. The AMAP will comprise n slices (one for each character), each having 8x8 identical processing elements. The operation of each processing element can be divided into two processes: the "learning process" and the "recognition process". During the learning process, different typefaces of the same character are fed into the system. In the recognition process, a decision is taken as to whether the given input pattern matches any of the AMAP patterns, and if so, a decoder is used to output the equivalent ASCII code. The AMAP uses special comparison logic to access stored data in parallel according to its contents, and it employs an algorithm and architecture that are simple compared to neural networks. Please note that though we would like to allow for recognizing the characters A-Z and 0-9, considering the time required for the learning process and the amount of memory required, we will implement our design such that it allows only for the recognition of the characters 0-9. References: 1. Nabeel El-Nadi, Gamal M. Aly, Zaki T. Fayed and H. M. Faheem, "AMAP for typecasting," IEEE Potentials, Volume: 20, Issue: 2, April-May 2001, Page(s): 28-30 2. L. Chisvin and R.J. Duckworth, "Content-addressable and associative memory: alternatives to the ubiquitous RAM," IEEE Computer, Volume: 22, Issue: 7, July 1989
Title: Implementing a 2-D Median Filter in VHDL The two-dimensional (2-D) median filter is a non-linear image enhancement technique that effectively removes salt & pepper noise (shot/impulse noise) from grayscale intensity images. This technique replaces each pixel with the median of the gray levels in a neighbourhood of that pixel. The pixel block used to determine each median is 8x8. The filter is designed to operate on images with 8 bits per pixel. The objective of the project is to write a VHDL description of the median filter implemented on Altera's 10K10 chip. The 2-D filter is built using a 1-D filter (constructed with a compare/swap algorithm), memory units and a transposition step to create the 2-D application. Testing and verification of the design will be done on several 400 x 400 images using Matlab. References: Palaniswamy, Krishna J., Rizkalla, Maher E., Sinha, Akhouri C., El-Sharkawy, Mohamed, and Slama, Paul, "VHDL Implementation of a 2-D Median Filter", IEEE, 1999. Fisher, Bob, Perkins, Simon, Walker, Ashley, and Wolfart, Erik, "Median Filter", University of Edinburgh, UK, 1994. Available at http://www.cee.hw.ac.uk/hipr/html/median.html MATLAB Image Processing Toolbox, The MathWorks Inc., 1997.
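The median-replacement operation itself is easy to model in software. The sketch below uses a 3x3 neighbourhood with replicated borders (illustrative choices for brevity; the project itself specifies 8x8 blocks):

```python
def median_filter(img, k=3):
    """Replace each pixel by the median of its k x k neighbourhood.
    Borders are handled by clamping indices (replicate-border policy)."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = []
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)   # clamp to image bounds
                    xx = min(max(x + dx, 0), w - 1)
                    window.append(img[yy][xx])
            window.sort()                             # compare/swap network in HW
            out[y][x] = window[len(window) // 2]
    return out
```

The `sort` call corresponds to the hardware compare/swap network; in the 1-D-filter-plus-transpose structure of the project, the full sort is decomposed into row and column passes.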
Title: A Hardware Implementation of the Private-Key Cryptosystem RC6 Abstract: Created for RSA Data Security (now RSA Security), RC6 is a private-key cryptosystem that was created as an evolutionary step after the RC5 algorithm, which is still known as a very effective encryption system. It implements a block cipher algorithm similar to that of RC5 and has been considered as a candidate basis for the Advanced Encryption Standard (AES) and a replacement for the aging DES system. It is a fairly simple cipher consisting of generating a key schedule, then using the key to perform encryption and decryption functions. RC6 excels in certain areas such as security and performance. However, due to the algorithms involved, it has been considered somewhat cumbersome to implement in hardware, though it is possible. The goal of this project is to implement this system in hardware by simulating it in VHDL, and to analyze its performance relative to software in order to determine the performance difference and the applicability of RC6 as a hardware-implemented cryptosystem. References: 1. RSA Laboratories. "RC6 Block Cipher." http://www.rsasecurity.com/rsalabs/rc6/ 2. Abdul Ragab, Nabil Ismael, and Osama Allah. "Enhancements and Implementation of RC6 Block Cipher for Data Security." http://www.ieee.org 3. Scott Contini, Ronald L. Rivest, M.J.B. Robshaw, and Yiqun Lisa Yin. "The Security of the RC6 Block Cipher." http://www.rsasecurity.com/rsalabs/rc6/security.pdf
Title: An FPGA Implementation of the Cyclic Redundancy Check (CRC) Algorithm Abstract: Transmission errors are rare on the digital segments of wired communication links but they are common on the analog segments - the so-called "last mile". In addition, wireless communication, with its typically high error rates, is becoming commonplace. It is important to detect transmission errors so that remedial action may be taken. High-speed detection is necessary to accommodate increasing data transmission speeds, hence the need for a hardware implementation. In this paper, we present an FPGA implementation of the Cyclic Redundancy Check (CRC) algorithm. Also known as the polynomial code, this error detection method is in widespread use due to its effectiveness and efficient use of error detection overhead bits. Bit strings are treated as representations of polynomials: the bit string is the coefficient list and a bit's position within the string indicates the power. The CRC algorithm is based on binary long division and uses a fixed generator polynomial mutually agreed upon by the transmitter and receiver. Our design, implemented in VHDL using Altera Max+Plus II software, has two modes of operation: Calculate (for the transmitter) and Check (for the receiver). In Calculate mode, the circuit takes two inputs: a message (arbitrary data) and a generator polynomial, and uses them to calculate the CRC. In Check mode, the inputs are the message, the CRC calculated previously and the generator polynomial; the circuit checks the CRC and signals the result via an Errors/No Errors output. References: 1. Tanenbaum, Andrew S., "Computer Networks", Third Edition, Prentice Hall PTR, 1996. 2. Leon-Garcia, Alberto and Widjaja, Indra, "Communication Networks - Fundamental Concepts and Key Architectures", McGraw-Hill Higher Education, 2000. 3. Brown, Steven and Vranesic, Zvonko, "Fundamentals of Digital Logic with VHDL Design", McGraw-Hill Higher Education, 2000.
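The binary long division at the heart of CRC can be sketched directly. The Python below models both modes described above, Calculate and Check, with bits held as lists; the test vector used afterward is the classic textbook example (message 1101011011, generator 10011, CRC 1110):

```python
def crc_remainder(message_bits, generator_bits):
    """Calculate mode: append r zero bits, divide by the generator (XOR long
    division), and return the r-bit remainder, where r = degree of generator."""
    g = list(generator_bits)
    r = len(g) - 1
    work = list(message_bits) + [0] * r
    for i in range(len(message_bits)):
        if work[i]:                       # leading bit set: subtract (XOR) generator
            for j in range(len(g)):
                work[i + j] ^= g[j]
    return work[-r:]

def crc_check(message_bits, crc_bits, generator_bits):
    """Check mode: divide (message ++ crc); all-zero remainder means no errors."""
    g = list(generator_bits)
    work = list(message_bits) + list(crc_bits)
    for i in range(len(message_bits)):
        if work[i]:
            for j in range(len(g)):
                work[i + j] ^= g[j]
    return not any(work[-(len(g) - 1):])
```

In hardware this division collapses to a linear feedback shift register clocked once per input bit, which is why CRC circuits keep up with line rates so easily.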
Title: Artificial Neural Network for Pattern Recognition Abstract: An artificial neural network is a system modeled on the human brain. A neural network attempts to simulate layers of simple processing elements called neurons. Each neuron is linked to other neurons with varying coefficients of connectivity (weights) that represent the strengths of these connections. Learning is accomplished by adjusting these strengths to cause the overall network to produce the appropriate output. The most successful applications of neural network systems are in pattern categorization and pattern recognition. Such systems can be used to identify an entity such as an illness or a chemical compound. Another application of neural networks is character recognition and handwriting recognition. This application is very useful in banking, where it is crucial to correctly read and recognize handwriting. In this project we will attempt to implement a neural network that recognizes characters and numbers, using simple logic gates. The computation performed in a neural network can require a considerable amount of hardware. For this reason we will use stochastic pulse sequences and probabilities to represent the inputs to the system and the weights of the different connections. References: 1. Pao Yoh-Han, "Adaptive Pattern Recognition and Neural Networks", Reading, Mass.: Addison-Wesley, 1989. 2. Haykin, Simon, "Neural Networks: A Comprehensive Foundation", Upper Saddle River, N.J., 1999. 3. Sriram, Ram D., "Intelligent Systems for Engineering: A Knowledge-Based Approach", London, New York, 1997. 4. Jeffrey A. Dickson, Robert D. McLeod, Howard C. Card, "Stochastic Arithmetic Implementations of Neural Networks", CCVLSI, 1992.
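The stochastic-arithmetic idea mentioned above, where multiplying two probabilities reduces to ANDing two independent pulse streams, can be illustrated in a few lines (a statistical sketch with assumed stream lengths, not a hardware model):

```python
import random

def to_stream(p, n, rng):
    """Encode a probability p as an n-bit Bernoulli pulse stream."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def stochastic_multiply(p1, p2, n=100_000, seed=0):
    """Estimate p1 * p2 by ANDing two independent pulse streams: a single
    AND gate per connection replaces a full hardware multiplier."""
    rng = random.Random(seed)
    s1 = to_stream(p1, n, rng)
    s2 = to_stream(p2, n, rng)
    product = [a & b for a, b in zip(s1, s2)]
    return sum(product) / n
```

The price of the tiny gate count is statistical noise that shrinks only as the stream length grows, which is the central trade-off of stochastic neural hardware.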
Names: Nguyen Minh Tu, Vichetr Lim. Title: Token Ring The purpose of this project is to implement a Token Ring. Token Ring was originally developed by IBM; in this project we will implement a simplified version. The idea is that multiple stations share a common ring. Whichever station possesses the token has the right to transmit. If a station receiving the token has nothing to send, it passes the token to the next station. Otherwise, the station seizes the token and sends its information. While the packet (information) is being sent, no token is on the network, and thus no collisions occur. A new token is released once the information has been successfully delivered to its destination. Each station has a limited amount of time to hold the token. This project will be implemented from scratch. Therefore, it will focus on correctness, not on speed or hardware optimization. References: Alberto Leon-Garcia, Indra Widjaja, "Communication Networks: Fundamental Concepts and Key Architectures" http://www.rad.com/networks/1997/nettut/token_ring.html http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/tokenrng.htm http://www.webopedia.com/TERM/T/token_ring_network.html
Title: VHDL Implementation of the Go-Back-N ARQ Protocol Automatic Repeat Request (ARQ) protocols provide reliable transfer of information over a connection-oriented network or a data link. These protocols are essential when transmitting over channels that are prone to errors. ARQ techniques provide synchronization and timing recovery, coordinating the actions of two or more separate machines with different state information. These capabilities are very important for applications that involve voice, audio and video data. As a result, ARQ forms the basis for peer-to-peer protocols that provide for the reliable transfer of information. In this project, we will implement Go-Back-N ARQ using VHDL. Go-Back-N ARQ is an example of a sliding-window protocol, where the transmitter maintains a window of available packet sequence numbers that slides forward as packets are sent. The transmitter sends a number of frames into the channel, optimistic that the frames will be received correctly and not require retransmission. If things turn out as expected, an ACK for each frame will arrive in due time while the transmitter is still busy sending frames into the channel. Under this scheme, when the system is done with the first frame, the handling of the subsequent frames is already well underway. In the case when a frame undergoes transmission errors, the receiver ignores this frame and all subsequent frames. Eventually, when the transmitter reaches the maximum allowable number of outstanding frames, it is forced to "go back N" frames and begins retransmitting all frames from the lost frame onwards. Go-Back-N ARQ ensures more efficient channel utilization than the Stop-and-Wait protocol, wherein the transmitter always waits for an ACK before sending a subsequent frame. Therefore, Go-Back-N ARQ provides flow control and error-free transmission, achieving better performance at the expense of complexity. References: 1.
Leon-Garcia, Alberto and Indra Widjaja, "Communication networks: Fundamental Concepts and Key Architectures", McGraw-Hill Companies, Inc., 2000. 2. Larry L. Peterson and Bruce S. Davie, "Computer Networks: A System Approach", 2nd ed, Morgan Kaufmann, 2000.
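As a rough model of the sliding-window behaviour described in this abstract, the Python sketch below (illustrative only; the frame count, window size and loss pattern are invented, and ACK timing is idealized) shows how a single lost frame forces the sender to go back and retransmit from the point of loss.

```python
def go_back_n(n_frames, window, lost):
    """Simulate a Go-Back-N sender over a channel that loses each frame
    in `lost` exactly once; returns the order of transmitted frames."""
    sent, base = [], 0
    lost = set(lost)
    while base < n_frames:
        outstanding = range(base, min(base + window, n_frames))
        for seq in outstanding:          # send a full window of frames
            sent.append(seq)
        for seq in outstanding:          # first error forces "go back N"
            if seq in lost:
                lost.discard(seq)        # the retransmission will succeed
                base = seq               # slide back to the lost frame
                break
        else:
            base = min(base + window, n_frames)  # all ACKed, slide forward
    return sent

sent = go_back_n(4, 2, [1])
```

With four frames, a window of two, and frame 1 lost once, the sender transmits 0, 1, then goes back and retransmits 1 before proceeding.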
Title: Implementation of a Neural Network Based Path Planning Algorithm using VHDL Abstract: Path planning is one of the important problems in motion control of a mobile robot. Given a starting position and a destination, the problem consists of finding a continuous path for the robot from the starting position to the goal position while avoiding obstacles along the way. Software implementations are common in the artificial intelligence field; however, the complexity of the problem grows exponentially as the distance between the source and destination increases. The software therefore becomes ineffective for a large number of nodes, hence the proposal of hardware to perform the computations faster. The VHDL implementation will search the graph for a sequence of nodes that connects the start position to the goal position while avoiding any obstacles. The graph will be represented by an 8x8 matrix of '1's and '0's, a '1' indicating an obstacle and a '0' indicating free space. The input will be a given starting coordinate and a destination coordinate. The output will be the sequence of coordinates of the optimum path from the source to the destination avoiding all obstacles. References: Chan, R.H.T.; Tam, P.K.S.; Kwok, D.P.; Cheung, P.W.M., "Hardware implementation of a neural network based path planning algorithm by using the VHDL", Proceedings of the IECON '93 International Conference on Industrial Electronics, Control, and Instrumentation, 1993. Stuart Russell and Peter Norvig, "Artificial Intelligence: A Modern Approach", Prentice Hall, 1995.
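Setting the neural-network formulation aside, the underlying search over the obstacle grid can be illustrated with a plain breadth-first search, which also finds a shortest obstacle-free path. In this Python sketch the grid cells are 0/1 integers standing in for the abstract's '0'/'1' matrix entries; the example grid is invented.

```python
from collections import deque

def shortest_path(grid, start, goal):
    """BFS over an obstacle grid (1 = obstacle, 0 = free); returns the
    list of coordinates on a shortest path, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}                     # also serves as visited set
    q = deque([start])
    while q:
        r, c = q.popleft()
        if (r, c) == goal:                   # walk predecessors back
            path, node = [], goal
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = (r, c)
                q.append((nr, nc))
    return None

grid = [[0, 1, 0], [0, 1, 0], [0, 0, 0]]     # a wall down the middle column
path = shortest_path(grid, (0, 0), (0, 2))
```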
Title: A VHDL Implementation of the International Data Encryption Algorithm (IDEA) Abstract: The International Data Encryption Algorithm (IDEA) is a state-of-the-art data encryption algorithm that permits effective protection of transmitted and stored data against unauthorized access by third parties. The algorithm was developed by Xuejia Lai and James Massey in 1992 and is currently used in many commercial software programs, including the popular e-mail encryption and decryption program Pretty Good Privacy (PGP). IDEA is a symmetric block cipher algorithm that operates on a 64-bit block and uses a 128-bit key. This is twice the length of the key used in the Data Encryption Standard (DES), the first official U.S. government cipher intended for commercial use, and still the most popular cryptosystem in use today; IDEA is therefore much more secure. In fact, while researchers have succeeded in breaking DES-encoded messages by brute force, no such attempts on breaking the IDEA algorithm have succeeded. In this project, we will implement IDEA in VHDL. Our goal is to implement both encryption and decryption, since the algorithm itself is structured so that the two processes are almost identical. We anticipate that this project will allow us to demonstrate a wide range of skills in VHDL programming, as the algorithm requires the use of operations from three different algebraic groups and uses eight encryption rounds followed by an output transformation to produce the ciphertext. References: 1. Mitchell Shnier, Dictionary of PC Hardware and Data Communications Terms, O'Reilly & Associates, November 1995. 2. A. G. Konheim, "Cryptography: A Primer", John Wiley & Sons Inc., 1981. 3. C. H. Meyer and S. M. Matyas, "Cryptography: A New Dimension in Computer Data Security", John Wiley & Sons Inc., 1982. 4. T. H. Cormen, C. E. Leiserson and R. L. Rivest, "Introduction to Algorithms", McGraw-Hill Book Company, 1990. 5. 
Media Crypt, Highest Security With Top Performance IDEA International Data Encryption Algorithm, Product Sheet/IDEA, October 2001. http://www.mediacrypt.com/engl/Downloads/IDEAOctFinal.pdf 6. Media Crypt, IDEA Technical Description, http://www.mediacrypt.com/engl/Content/technical_description.htm 7. John J. G. Savard, IDEA International Data Encryption Algorithm, 1998. http://home.ecn.ab.ca/~jsavard/crypto/co0404.htm
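Of the three algebraic groups IDEA mixes (XOR, addition modulo 2^16, and multiplication modulo 2^16 + 1), the least familiar is the multiplication, in which the all-zero 16-bit word is interpreted as 2^16. A Python sketch of just that primitive (not the full cipher):

```python
def idea_mul(a, b):
    """IDEA multiplication modulo 2**16 + 1; the 16-bit value 0 stands
    for 2**16, and a result of 2**16 is encoded back as 0."""
    a = a or 0x10000                  # interpret 0 as 2**16
    b = b or 0x10000
    # 0x10001 is prime, so the product mod 0x10001 is never 0 here;
    # the final mask maps the value 0x10000 back to the encoding 0.
    return (a * b) % 0x10001 % 0x10000
```

In hardware this operation is the costly one; the classic trick is to reduce modulo 2^16 + 1 with a subtract-and-correct circuit rather than a true division.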
Title: A Hardware Implementation of a Fast Phong Shader Today's computer graphics applications require fast, efficient and realistic image generation of graphic objects. These graphic objects are usually polygon mesh approximations of curved surface objects. Currently there are two popular methods for rendering (drawing) these polygons: Gouraud Shading and Phong Shading. Phong Shading renders more realistic images than Gouraud Shading. However, Phong Shading requires many more calculations than Gouraud Shading and is consequently slower. We will be implementing Fast Phong Shading, a sped-up modification of Phong Shading that approximates the calculations using Taylor series expansions. Hence, our implementation will produce quick and realistic images. A polygon surface is rendered using Phong Shading by carrying out the following steps: 1. Determine the average unit normal vector at each polygon vertex. 2. Linearly interpolate the vertex normals over the surface of the polygon. 3. Calculate the projected pixel intensity for all the surface points. References: Donald Hearn & M. Pauline Baker, "Computer Graphics", 2nd edition. Foley & van Dam, "Computer Graphics", 2nd edition.
Title: LZW Compression Abstract: Lempel-Ziv-Welch, or LZW compression is a lossless, 'dictionary based' all purpose data compression technique that is most often used in PDF, TIFF and GIF file compression. Dictionary based algorithms scan their input for sequences of data that occur more than once. These sequences are then stored in a table and references to the table are used to represent repetitive data within the compressed file. Compression occurs when a single code is output instead of a string of characters. LZW compression is fast and works best for files containing lots of repetitive data. In this project we will design an FPGA implementation of LZW compression and decompression. The circuit will be described using VHDL and synthesized with the Altera MaxPlus II software. References: - Welch, Terry A. "A Technique for High Performance Data Compression". IEEE Computer, Vol. 17, No. 6, 1984, pp. 8-19 http://www.cs.cf.ac.uk/Dave/Multimedia/node214.html - Nelson, Mark. "LZW Data Compression". Dr. Dobb's Journal. October, 1989 http://dogma.net/markn/articles/lzw/lzw.htm - Leurs, L. "LZW Compression". http://users.belgacom.net/prepresspanic/techno/compressionlzw.htm - Blackstock, Steve. "LZW and GIF explained". http://www.la.unm.edu/~mbrettin/algorithms/lzexplained.html
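The dictionary-building loop described above is compact enough to sketch directly. Below is a minimal Python version of the LZW compressor (illustrative; the FPGA design would realize the string table in on-chip memory and emit fixed-width codes):

```python
def lzw_compress(data):
    """Textbook LZW: extend the current string while it is still in the
    table; on a miss, emit the string's code and add the new entry."""
    table = {bytes([i]): i for i in range(256)}   # single-byte entries
    current, out = b"", []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate                   # keep growing the match
        else:
            out.append(table[current])            # emit code for the match
            table[candidate] = len(table)         # new dictionary entry
            current = bytes([byte])
    if current:
        out.append(table[current])
    return out

codes = lzw_compress(b"ABABAB")
```

On `b"ABABAB"` the repeated "AB" pattern is captured after one occurrence, so the six input bytes compress to four output codes.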
Title: Digital Signal Processing - IIR/FIR Filter Bank The tentative goal of the project is to implement a complete filter bank consisting of an assortment of finite impulse response and infinite impulse response systems. For each type of impulse response system, a low-pass, band-pass and high-pass filter will be implemented using various windowing techniques (e.g. Bartlett, Blackman, Hamming, Kaiser, etc.). This will provide implementations of differing precision, complexity and speed that will allow a user to find a suitable filter for the application requirements. The filter bank may require user input for the windowing parameters, or may have these dimensions preset for best performance and precision. Some applications may not require the sharp cutoff spectrum that is offered by more complex filter designs, which may involve longer processing latencies. In this case, a simpler implementation with higher throughput may be desired. The filter bank will also have adjustable parameters to allow for maximum customization (e.g. cutoff frequency, bandwidth, band location, etc.). Since most DSP applications are continuous, real-time processes, the filter bank will be optimized as much as possible for maximum throughput. If time permits, several extra features can be implemented. Extra Features: Narrowband scaling to allow for signal equalization capabilities, anti-aliasing filtering, simple A/D & D/A converters, sampling rate adjustment capabilities, etc. References: 1. Alan V. Oppenheim, Alan S. Willsky, "Signals and Systems", 2nd ed., Prentice Hall. 2. Alan V. Oppenheim, Ronald W. Schafer, "Discrete-Time Signal Processing", 2nd ed., Prentice Hall. 3. "TMS320C6000 Code Composer Studio Tutorial", Literature Number: SPRU301C, copyright 2000, Texas Instruments Inc.
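As a point of reference for the windowing techniques mentioned, a windowed-sinc low-pass FIR design using the Hamming window can be sketched as follows (Python; the tap count and cutoff are illustrative, and `cutoff` is expressed as a fraction of the Nyquist rate):

```python
import math

def hamming_lowpass(num_taps, cutoff):
    """Windowed-sinc low-pass FIR design: the ideal sinc response is
    truncated to num_taps coefficients and shaped by a Hamming window."""
    m = num_taps - 1
    taps = []
    for n in range(num_taps):
        k = n - m / 2                       # distance from the center tap
        ideal = cutoff if k == 0 else math.sin(math.pi * cutoff * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)
        taps.append(ideal * window)
    return taps

def fir_filter(taps, samples):
    """Direct-form FIR convolution (same length as the input)."""
    return [sum(t * samples[i - j] for j, t in enumerate(taps) if i - j >= 0)
            for i in range(len(samples))]

taps = hamming_lowpass(21, 0.5)             # 21-tap half-band example
impulse_response = fir_filter(taps, [1.0] + [0.0] * 20)
```

The symmetric tap values give the linear phase that makes FIR filters attractive for hardware; an IIR section trades that away for a much shorter delay line.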
Title: Floating Point Manipulations Abstract: Floating point computations have always been of interest in research in today's information technology world. Many great engineering and scientific advances of recent decades would not have been possible without the floating-point capabilities of digital computers. The complexity of handling the various cases of floating point has baffled even the most experienced mathematicians. When representing floating point numbers the result may be inexact, and the question arises as to how much error is acceptable. Our interest in floating point developed considerably this semester because of its extensive use in both the Microprocessor Systems and Computer Architecture labs. Since we have already worked with it extensively and appreciate the issues and cases that can arise in floating point manipulations, we came to the conclusion that this would be an interesting as well as a challenging project. As part of our project, we will be implementing the addition, subtraction, multiplication and division of two IEEE standard 32-bit floating point numbers. One of the major objectives of this project will be to consider all the special cases involved in floating point manipulations, which include representing zero, infinity, not-a-number (NaN), overflow, normalized numbers and denormalized numbers. We will deal with the cases where one of the two numbers is denormalized and the other is normalized, as well as both numbers being either normalized or denormalized. The final deliverable will be a well documented and organized working model of 32-bit floating point manipulations. The model will be implemented and tested in VHDL, and we hope to gain a better understanding of floating point numbers and the hidden mystery behind their operation. 
References: - Patterson, David A., and Hennessy, John L., "Computer Organization & Design: The Hardware/Software Interface" - Microprocessor Systems 304-426 (Author: Zeljko Zilic) http://www.ece.mcgill.ca/~ee426/ - IEEE-754 Floating Point Conversion (Author: Kevin J. Brewer) http://babbage.cs.qc.edu/courses/cs341/IEEE-754.html - Katz, Randy H., "Contemporary Logic Design", Benjamin/Cummings, Redwood City, 1994 - Single Precision Floating Point Unit (Author: Mani Sudha Yalamanachi, sudhadimpy@yahoo.com) http://www.ee.pdx.edu/~mperkows/CLASS_VHDL/VHDL_CLASS_2001/Floating/projreport.html
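The special cases enumerated above follow directly from the exponent and fraction fields of the IEEE 754 single-precision format: an all-zero exponent signals zero or a denormalized number, an all-ones exponent signals infinity or NaN, and everything in between is normalized. A Python sketch of the field decomposition and case classification:

```python
import struct

def classify_float32(bits):
    """Split a 32-bit IEEE 754 pattern into (sign, exponent, fraction)
    and name the case: zero, denormalized, normalized, infinity, NaN."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF          # 8-bit biased exponent
    fraction = bits & 0x7FFFFF              # 23-bit fraction
    if exponent == 0:
        kind = "zero" if fraction == 0 else "denormalized"
    elif exponent == 0xFF:
        kind = "infinity" if fraction == 0 else "NaN"
    else:
        kind = "normalized"                 # implicit leading 1 applies
    return sign, exponent, fraction, kind

bits_of_one = struct.unpack(">I", struct.pack(">f", 1.0))[0]
fields = classify_float32(bits_of_one)
```

A hardware unit performs exactly this unpacking on each operand before deciding which alignment and normalization path to take.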
Back to top

Winter 2003


Title: Simple and Symmetric: the RC4 Encryption Algorithm Abstract: Since the advent of the Internet and of network transactions, [...] protect sensitive information. The objective is to find a simple, fast, reliable and easy to implement solution which is strong enough for security needs. The RC4 encryption algorithm proves to be an ideal choice for those wishing to ally strength, simplicity and efficiency. RC4 is a symmetrical algorithm, which means the same method, or key, is used for encryption and decryption; thus, the key does not need to be transmitted with the message. Furthermore, RC4 is a strong algorithm: each key can only be used once, increasing the level of protection without making the algorithm more complex, while only a small set of keys are considered weak. Finally, RC4 is ten times faster than the DES algorithm. These advantages make the RC4 algorithm interesting to study and implement, and this is what we propose to do here. In this project, we will develop an RC4 encryptor and decryptor on an Altera FPGA chip, using VHDL. We intend to illustrate the inner workings of the RC4 mechanism, by streaming in unencrypted bits, encrypting them using a key and showing the encrypted result at the output. This output will then be decrypted using the same key using a distinct module. We will build scalable n bit models which minimize latency; these modules can be combined to form larger modules, which would then maximize throughput. [1] M. Shaffer, "RC4 Encryption Using ASP & VBScript," January 2000, [2] C. Grogans, J. Bethea , I. Hamdan , North Carolina Agricultural and Technical State University, "RC4 Encryption Algorithm," March 2000, [3] A. Roos "Weak Keys in RC4," Posted on newsgroup sci.crypt, 22 Sep 1995. [4] R.J. Jenkins Jr., "ISAAC and RC4," 1996
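RC4's simplicity is visible in how little code it takes: a key-scheduling pass (KSA) over a 256-entry state array, then a keystream generator (PRGA) whose output is XORed with the data, so encryption and decryption are the same operation. A Python sketch (the key and message below are invented; the project itself is an FPGA/VHDL design):

```python
def rc4(key, data):
    """RC4: key scheduling (KSA) then keystream generation (PRGA);
    applying the function twice with the same key recovers the input."""
    s = list(range(256))
    j = 0
    for i in range(256):                          # KSA
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out, i, j = bytearray(), 0, 0
    for byte in data:                             # PRGA
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256]) # XOR with keystream byte
    return bytes(out)

ciphertext = rc4(b"Key", b"Plaintext")
```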
Title: Hardware implementation of the Data Encryption Standard algorithm Cryptography is the process of maintaining and communicating in secret writings or ciphers, which is essential nowadays for organizations to maintain privacy and data security. Several cryptography algorithms exist, with the more advanced ones requiring the use of a "key" to control the encryption and decryption processes. These algorithms can be grouped into Public (asymmetric), or Secret (symmetric) key algorithms. Asymmetric algorithms make use of different keys for encryption and decryption as opposed to symmetric algorithms which use the same key for both processes. The Data Encryption Standard (DES) symmetric algorithm is suitable for larger messages because it is a fast block-encryption cipher; it encrypts entire data blocks at a single time. The proposed project implementation will allow for 64-bit data block encryption using a 56-bit randomly generated key. This means that the data blocks, which are operated on as bits, can be transformed into 2^64 possible combinations. A key used for a specific application can only generate unique encryption of the data; hence unauthorized recipients cannot recover the ciphertext without the key. This project design involves the implementation of the DES algorithm using VHDL. The design will be tested and simulated on the Altera MaxPLUS II software. References: [1] D. Stinson, "Cryptography: Theory and Practice", CRC Press, 1996. [2] R. A. Mollin, "An Introduction to Cryptography", Chapman & Hall/CRC, 2001. [3] D. E. Denning, "Cryptography and Data Security", Addison-Wesley, 1982. [4] J. O. Grabbe, "The DES Algorithm Illustrated"
Title: Design of the RSA Public-Key Cryptosystem Cryptography, simply defined, is the process of combining some input data, called plaintext, with a user-specified password, called a key, to generate an encrypted output, called ciphertext, in such a way that, given the ciphertext, no one can recover the original plaintext without the encryption password in a reasonable amount of time. This project concentrates on the design of an encryption/decryption system based on the RSA algorithm (named after Ron Rivest, Adi Shamir and Len Adleman, who invented it in 1977) using VHDL. The goal is to be able to encrypt and decrypt a message being sent between two communicating parties, using both a public and a private key. The basic algorithm is illustrated in the attached file. References: Delfs, Hans, "Introduction to Cryptography: Principles and Applications", Springer, 2002. Stinson, Douglas R., "Cryptography: Theory and Practice", Chapman & Hall, 2002. Graham, Ian G., "A Brief Introduction to Cryptography".
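Since the attached illustration is not reproduced here, the arithmetic of the basic algorithm can be sketched with deliberately tiny numbers (Python; the primes and message below are a standard textbook-style example, whereas real keys use primes hundreds of digits long):

```python
def rsa_demo():
    """Toy RSA round trip with tiny illustrative primes."""
    p, q = 61, 53
    n = p * q                  # public modulus (3233)
    phi = (p - 1) * (q - 1)    # Euler totient (3120)
    e = 17                     # public exponent, coprime to phi
    d = pow(e, -1, phi)        # private exponent: e*d = 1 (mod phi)
    m = 65                     # message, must satisfy m < n
    c = pow(m, e, n)           # encryption: c = m^e mod n
    return pow(c, d, n)        # decryption: m = c^d mod n

recovered = rsa_demo()
```

The hardware problem is exactly these modular exponentiations, performed on 512-bit or larger operands instead of 12-bit ones.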
Title: Data Encryption Standard (DES) Algorithm The algorithm will be designed to encrypt and decrypt blocks of data consisting of 64 bits under control of a 64-bit key. Decrypting will be accomplished by using the same key as for encrypting, with the exception that it will be accessed in the reverse order. A block to be encrypted will first be subjected to an initial permutation (IP). Subsequently, it will iterate 16 times through a complex key-dependent computation and finally it will be subjected to an inverse permutation (IP-1). The IP and IP-1 units will alter the order of their inputs. The key-dependent computation will be defined in terms of a function f, called the cipher function, and a function KS, called the key schedule. The input block will be split into two blocks L and R, where L corresponds to the left half of the input block and R to the right half. At each of the 16 iterations, a new K, L and R will be computed according to the following formulas: Kn = KS(n, KEY), Ln = Rn-1 and Rn = Ln-1 (+) f(Rn-1, Kn), where n represents the iteration and (+) represents bit-by-bit addition modulo 2. References: [1] B. Schneier, "Applied Cryptography", 2nd Edition, Wiley, 1996. [2] J. Gilmore, "Cracking DES: Secrets of Encryption Research, Wiretap Politics & Chip Design", May 1998. [3] R. K. Nichols, "ICSA Guide to Cryptography", McGraw-Hill, 1999. [4] Federal Information Processing Standards Publications, "Announcing the Standard for Data Encryption Standard", http://www.itl.nist.gov/fipspubs/fip46-2.htm. [5] Tropical Software, "DES Encryption", http://www.tropsoft.com/strongenc/des.htm.
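The recurrence Ln = Rn-1, Rn = Ln-1 (+) f(Rn-1, Kn) is a generic Feistel network, and its key property, that decryption is the same circuit run with the key schedule reversed, can be checked with a toy cipher function. In this Python sketch the round keys and `toy_f` are invented stand-ins for DES's key schedule and expansion/S-box/permutation logic:

```python
def feistel(block, round_keys, f):
    """Generic Feistel network: Ln = Rn-1, Rn = Ln-1 XOR f(Rn-1, Kn).
    The final swap means re-running with reversed keys inverts it."""
    left, right = block
    for k in round_keys:
        left, right = right, left ^ f(right, k)
    return right, left                    # final half-swap, as in DES

def toy_f(r, k):
    # invented stand-in for DES's cipher function f
    return (r * 0x9E3779B1 ^ k) & 0xFFFFFFFF

keys = [0xA5A5, 0x0F0F, 0x3C3C]           # illustrative 3-round schedule
ct = feistel((0x1234, 0x5678), keys, toy_f)
pt = feistel(ct, keys[::-1], toy_f)       # same network, keys reversed
```

Note that f itself need not be invertible; the XOR structure guarantees reversibility, which is why the same hardware serves both directions.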
Title: Implementation of RC6 Block Cipher Cryptography System In the ever-evolving world of data security, technological methods are constantly being improved upon to better protect information. Encryption and decryption ciphers are powerful tools in the field of data protection. While many different algorithms exist, we have chosen to investigate and realize the RC6 algorithm. Designed by Ron Rivest in collaboration with associates from RSA laboratories, the RC6 block cipher is an evolutionary improvement of RC5. It was designed to meet the requirements of the Advanced Encryption Standard (AES). In order to be AES compliant, a block cipher must utilize 128-bit input/output blocks. The RC6 cipher employs four 32-bit registers for operations that have the advantages of doing two rotations per round and using more bits of data to determine rotation amounts in each round. Also, it includes integer multiplication as an additional primitive operation, a feature not found in RC5. Overall, RC6 maximizes the diffusion achieved per round, allowing for greater security, fewer rounds and increased throughput. Therefore, we will implement the RC6 cryptosystem algorithm in VHDL intended for use on the Altera Flex 10k series FPGA. References: 1. RSA Laboratories | RC6 Block Cipher http://www.rsasecurity.com/rsalabs/rc6/ 2. Ronald L. Rivest, M.J.B. Robshaw, R. Sidney, Y.L. Yin, "The RC6 --- Block Cipher" ftp://ftp.rsasecurity.com/pub/rsalabs/rc6/rc6v11.pdf 3. "The RC6 Block Cipher: A simple fast secure AES proposal" http://csrc.nist.gov/encryption/aes/round1/conf1/rc6-slides.pdf
Title: A Hardware Implementation of the International Data Encryption Algorithm (IDEA) Data encryption has been an increasingly important concept in the development of secure communications. There is a vast array of different algorithms that exist to encrypt and decrypt data. IDEA (International Data Encryption Algorithm) was developed at ETH (the Swiss Federal Institute of Technology) and patented by the Swiss firm Ascom. Compared to the DES (Data Encryption Standard) algorithm, IDEA has been shown to be faster and more secure. For IDEA, the same algorithm is used for encryption and decryption. It is a block cipher algorithm that operates on 64-bit plaintext blocks, using a 128-bit key. The 64-bit input block is divided into four 16-bit blocks, which become the input blocks to the first round of the algorithm. There are eight rounds in total, and in each round the four blocks are XORed, added, and multiplied with subkeys derived from the original key. Between each round, the second and third subblocks are swapped. The algorithm design will be coded in VHDL and implemented on an Altera Flex10K FPGA. In order to implement the algorithm within time and hardware constraints, the block size will be 16 bits rather than 64 bits. References: [1] Crypto Systems Incorporated (2002). [Online]. The IDEA Encryption Algorithm. Available: http://www.finecrypt.net/idea.html. February 17, 2003 (accessed). [2] Frame Technology (1994). [Online]. International Data Encryption Algorithm (IDEA). Available: http://www.cs.nps.navy.mil/curricula/tracks/security. February 17, 2003 (accessed). [3] A. J. Menezes. Handbook of Applied Cryptography. Boca Raton: CRC Press, 1997. [4] J. G. Savard (2000). [Online]. IDEA (International Data Encryption Algorithm). Available: http://home.ecn.ab.ca/~jsavard/crypto/co040302.htm. February 17, 2003 (accessed). [5] B. Schneier. Applied Cryptography: Protocols, Algorithms, and Source Code in C. New York: Wiley, 1996. [6] TechTarget (2003). [Online]. 
International Data Encryption Algorithm. Available: http://searchsecurity.techtarget.com. February 17, 2003 (accessed).
Title: RSA (Rivest, Shamir, and Adleman) Public-Key Cryptosystem RSA is a popular algorithm used for encrypting and decrypting data. The algorithm was developed by three mathematicians: Rivest, Shamir, and Adleman. The difficulty of cracking RSA encryption is the reason why RSA is implemented in many communication and security signatures today. There is an increased need for secure transmission of data (online banking, intranet communication, digital cellular phones), and a reliable means to encrypt data is required. Developed in 1977, the system uses a private and a public key. Every user has a pair of keys: a public key, which is provided to all partners, and a private key. A message is encrypted with a public key and can only be decrypted with the private key, and vice versa. The security of RSA is based on the difficulty of factoring large numbers, as opposed to the ease of multiplying two large numbers. The same hardware can be used for encryption as well as decryption. For our project, we will implement an RSA encryption/decryption system using VHDL. References: 1. T. Cormen, C. Leiserson, R. Rivest, Introduction to Algorithms, McGraw Hill Publishers, 1991. 2. Hennessy, J.L., Patterson, D.A., Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, Inc., 1990. 3. T. ElGamal, A public key cryptosystem and a signature scheme based on discrete logarithms, IEEE Transactions on Information Theory, 31(4):469-472, July 1985. 4. R. L. Rivest, A. Shamir, and L. Adleman, A method for obtaining digital signatures and public-key cryptosystems, Commun. ACM, vol. 21, pp. 120-126, 1978. 5. Cetin Kaya Koc, RSA Hardware Implementation, RSA Laboratories, Version 1.0, 1995.
Title: FPGA Implementation of the RSA Cryptography Algorithm Cryptography is a strong security measure used to protect information and maintain privacy and confidentiality. As we move into an information society, cryptography has become one of the main tools for access control, electronic payments, corporate security on the internet and various types of networks. Many cryptography algorithms are currently available, but the RSA algorithm has become a standard for industrial-strength encryption, especially for data sent over the internet. The RSA algorithm was named after Ron Rivest, Adi Shamir and Len Adleman, who invented it in 1977. The basic ingredient of the RSA cryptosystem is its computational complexity. Our project presents the mathematical analysis of the RSA algorithm, where a public key and a private key are used to encrypt and decrypt messages. The analyzed RSA algorithm will also be designed in VHDL and synthesized for implementation on an Altera FPGA chip using MAX+PLUS II. References: [1] Denning, Dorothy E., "Cryptography and Data Security", Addison-Wesley Publishing Company, Inc., 1982. [2] Jennifer Seberry and Josef Pieprzyk, "Cryptography: An Introduction to Computer Security", Prentice-Hall, 1989. [3] C. H. Meyer and S. M. Matyas, "Cryptography: A New Dimension in Computer Data Security", John Wiley & Sons Inc., 1982. [4] http://www.ssh.com/support/cryptography [5] http://oregonstate.edu/dept/honors/makmur/ [6] http://www.di-mgt.com.au/rsa_alg.html [7] http://www.rsasecurity.com/
Title: FPGA Implementation of the RSA Encryption Algorithm With today's rapidly evolving communications market, the need for secure transmission of data is of critical importance. The RSA encryption algorithm (Rivest, Shamir, Adleman), invented in 1977, is used in most public-key cryptography systems. Its security is based on the difficulty of factoring large numbers. In this project, a VHDL implementation of the RSA cryptographic system is presented based on the modular multiplication proposed by P. L. Montgomery. RSA is an asymmetric cryptosystem that uses one key (the public key) to encrypt a message and a different key (the private key) to decrypt it; therefore, both encryption and decryption circuits will be implemented. In order to optimize the speed of the circuits, the RSA encryption algorithm employed should be able to carry out modular exponentiation efficiently. For this purpose, Montgomery multiplication will be used to perform the modular multiplication and squaring during the exponentiation process. References: [1] A. J. Menezes, P. van Oorschot, and S. Vanstone, Handbook of Applied Cryptography, CRC Press, 1996. [2] Bruce Schneier, Applied Cryptography, Second Edition, John Wiley & Sons, 1996. [3] M. Shand and J. Vuillemin, Fast Implementations of RSA Cryptography, Proc. 11th IEEE Symposium on Computer Arithmetic, Canada, July 1993. [4] Richard Holowczak, RSA Demo Applet, http://cisnet.baruch.cuny.edu/holowczak/classes/9444/rsademo/#overview [5] RSA Laboratories, "PKCS #1 v2.1: RSA Encryption Standard", http://www.rsasecurity.com/rsalabs/pkcs/pkcs-1/index.html
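The Montgomery reduction step (REDC) at the heart of such a multiplier replaces division by the modulus with shifts and masks: it computes a*b*R^-1 mod n, where R is a power of two larger than n. A Python sketch with small illustrative parameters (a hardware version applies the same recurrence one bit or one word at a time):

```python
def montgomery_setup(n, r_bits):
    """Precompute n' with n*n' = -1 (mod R), R = 2**r_bits > n (n odd)."""
    r = 1 << r_bits
    n_prime = pow(-n, -1, r)
    return r, n_prime

def montgomery_mul(a, b, n, r_bits, n_prime):
    """REDC: returns a*b*R^-1 mod n using only multiplies, masks, shifts."""
    r_mask = (1 << r_bits) - 1
    t = a * b
    m = (t * n_prime) & r_mask      # m = t*n' mod R
    u = (t + m * n) >> r_bits       # exact: t + m*n is divisible by R
    return u - n if u >= n else u   # single conditional subtraction

r_bits = 8                          # R = 256 > n; values are illustrative
_, n_prime = montgomery_prime = montgomery_setup(97, r_bits)
u = montgomery_mul(5, 6, 97, r_bits, n_prime)
```

During exponentiation, operands are kept in "Montgomery form" (x*R mod n) so the stray R^-1 factor cancels on every multiply; only an initial and final conversion are needed.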
Title: Anti-Aliasing for Computer Generated Images Anti-aliasing is a very important part of generating high quality 3D images. There are various approaches to performing this task. The most common is supersampling, in which the image is generated at 2 or 4 times its resolution and the intensity of each pixel that is actually drawn is averaged over the corresponding samples. The main problem with this approach is that it is computationally very expensive. The goal of our project is to implement an anti-aliasing algorithm that is much less computationally expensive. Our approach will not use supersampling, at least not on the whole scene. By detecting where the edges (jaggies) occur, we will calculate a weighting ratio to smooth out the jaggies. This approach is simple, but much less expensive than the supersampling approach mentioned above, thus making it an ideal candidate for real-time graphics processing. References: 1. Tsuyoshi Isshiki & Hiroaki Kunieda, "Efficient Anti-Aliasing Algorithm for Computer Generated Images" 2. Yuan-Hau Yeh & Chen-Yi Lee, "A New Anti-Aliasing Algorithm for Computer Graphics Images" 3. Foley, van Dam, Feiner, Hughes, "Computer Graphics: Principles and Practice", 2nd edition in C, Addison-Wesley, July 1997
Title: Hardware Implementation of the 2D Graphics Pipeline for Polygon Rendering The performance of most graphics systems is largely dependent on the speed at which they can render primitives, such as circles and polygons. To be completely rendered, primitives have to go through the 2D graphics pipeline, which consists of three conceptual stages: the application program, which feeds commands to the CPU, the geometry subsystem, and the rasterization subsystem. Although all three parts may be implemented in software, the amount of calculation involved in rendering a primitive is very significant, and since the number of primitives in one image is typically very large, it is usually beneficial to accelerate parts of the pipeline in hardware. Thus, this project will focus on the hardware implementation of the geometry and rasterization subsystems of the graphics pipeline for polygon rendering. More specifically, we will consider triangle primitives because they are the most often used polygons in graphics. The geometry stage will perform per-vertex operations such as geometric transformations (rotations, translations) and clipping. The latter will be done using the Sutherland-Hodgman clipping algorithm, which is easily implementable in hardware. The rasterization component, on the other hand, will work on individual pixels and its main function will be scan conversion. Since the polygons considered here are formed of three lines, they will be rasterized with Bresenham's line algorithm. Finally, the main goal of this project will be to optimize throughput, i.e. to render as many polygons per second as possible. This objective will be achieved by further pipelining each component of the graphics system, which is itself already pipelined. [1] D. Hearn and M.P. Baker, "Computer Graphics, C Version", New Jersey: Prentice Hall, 1997. [2] J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, "Computer Graphics: Principles and Practice in C", second edition, Addison-Wesley, 1996. 
[3] E. Angel and D. Morrison, "Speeding Up Bresenham's Algorithm", IEEE Computer Graphics and Applications, vol. 11, no. 6, pp. 16-17, [4] M. Olano and T. Greer, "Triangle Scan Conversion using 2D Homogeneous Coordinates", presented at the 1997 SIGGRAPH / Eurographics Workshop on Graphics Hardware, [5] F. Pfenning, "Clipping and Scan Conversion", class notes for 15-462 Computer Graphics I, School of Computer Science, Carnegie Mellon University in Pittsburgh PA, March 2002
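Bresenham's algorithm, used above for scan-converting the triangle edges, is attractive in hardware precisely because it needs only integer additions and comparisons per pixel. A Python sketch of the all-octant error-accumulation form:

```python
def bresenham(x0, y0, x1, y1):
    """Bresenham's line algorithm using integer arithmetic only;
    handles all octants via the sign variables sx and sy."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy                     # running decision variable
    while True:
        points.append((x0, y0))
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:                  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:                  # step in y
            err += dx
            y0 += sy
    return points

pts = bresenham(0, 0, 3, 3)
```

A hardware version maps the error update to one adder and two comparators, producing one pixel per clock.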
Title: Single-Color Texture-Mapping Unit with MIP-mapping and Bilinear Interpolation Texture mapping is a fundamental technique for inexpensively rendering high quality images into a frame buffer by mapping an image, or texture, onto a polygon. Because mismatched dimensions of textures and polygons may cause "sparkling", "moiré" or "blocky" artifacts to appear, a texture mapping unit uses a "Texture Magnification Filter" when the texture is considerably smaller than the polygon and a "Texture Minification Filter" when the texture is considerably larger than the polygon. This paper describes a texture mapping unit that uses bilinear interpolation for texture magnification and the "Nearest MipMap Nearest" MIP-mapping algorithm for texture minification. This 2-D unit supports a frame buffer resolution of 256x256 pixels, a texture resolution up to 256x256 texels, and a single 8-bit color channel. It can be cascaded with other chips to generate a 3-color system, and can be arranged in parallel with other chips to support a larger frame buffer resolution. This design attempts to maximize throughput at the expense of resources; a given texture is immediately filtered and its corresponding MIP-maps are generated in order to optimize its mapping time into polygons. References: [1] D. Hearn and M. P. Baker, "Computer Graphics, C Version", 2nd edition, Prentice-Hall, 1994. [2] P. Heckbert, "Fundamentals of Texture Mapping and Image Warping", 1989, http://www-users.itlabs.umn.edu/classes/Fall-2001/csci5980/notes/Texture_Mapping1.ppt#1. [3] M. Woo, J. Neider, and T. Davis, "OpenGL Programming Guide", 2nd edition, Addison-Wesley, 1998. [4] P. Haeberli and M. Segal, Silicon Graphics Inc. (SGI), "Texture Mapping as a Fundamental Drawing Primitive", 1993, http://www.sgi.com/grafica/texmap/.
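The bilinear magnification filter mentioned above weights the four texels nearest the sample point by the fractional parts of its coordinates. A Python sketch for a single-channel texture (the 2x2 example texture is invented; a hardware unit computes the same four multiply-accumulates per sample):

```python
def bilinear_sample(texture, u, v):
    """Bilinear texture magnification: blend the four nearest texels by
    the fractional parts of the (u, v) sample coordinates."""
    h, w = len(texture), len(texture[0])
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)   # clamp at edges
    fx, fy = u - x0, v - y0                           # fractional weights
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

tex = [[0, 100], [100, 200]]
center = bilinear_sample(tex, 0.5, 0.5)
```

Sampling at the texel centers returns the texels themselves; sampling midway returns the average, which is what removes the "blocky" magnification artifact.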
Title: 2D Antialiased Line-Drawing Unit A fundamental operation on any raster graphics system is line drawing. Due to the inherent limitations in the resolution of the display device, lines tend to have a jagged or stair-step appearance. This distortion of information due to low-frequency sampling is known as aliasing. It attracts unwanted attention in both static and dynamic graphics. In the case of a moving image, aliased lines flicker and have crawling edges. Anti-aliasing seeks to compensate for this undersampling by varying the intensity of the border pixels to reduce the jagged effect. We are implementing a line-drawing unit that uses the Bresenham algorithm to draw the line, and Wu's algorithm for anti-aliasing. The inputs to the unit will be the endpoints of the line, and the output will be the coordinates of the pixels as well as their intensity. The unit will be described in VHDL and synthesized on an FPGA. References: 1. Hearn, Donald, "Computer Graphics, C Version", 2nd ed., Upper Saddle River, N.J.: Prentice Hall, 1997. 2. Thaddeus, Jude, "Wu's Anti-aliasing Line Drawing", http://www.sanctuary.org/~azimuth/coding/aaline.html 3. Elias, Hugo, "Wu Anti-aliased Lines", http://freespace.virgin.net/hugo.elias/graphics/x_wuline.htm
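The intensity-splitting idea behind Wu's algorithm can be shown for the easy case of a gentle slope: each column lights the two pixels straddling the true line, with intensities that sum to one. A Python sketch restricted to 0 <= slope <= 1 with x0 < x1 (a full implementation handles all octants and treats the endpoints separately):

```python
def wu_line_gentle(x0, y0, x1, y1):
    """Wu-style intensity splitting for a line with 0 <= slope <= 1:
    per column, the fractional part of the true y coordinate divides
    unit intensity between the two pixels straddling the line."""
    gradient = (y1 - y0) / (x1 - x0)
    pixels = []
    y = float(y0)
    for x in range(x0, x1 + 1):
        frac = y - int(y)                        # distance past lower pixel
        pixels.append((x, int(y), 1 - frac))     # lower pixel intensity
        pixels.append((x, int(y) + 1, frac))     # upper pixel intensity
        y += gradient
    return pixels

px = wu_line_gentle(0, 0, 2, 1)
```

Compared with Bresenham, the extra cost per column is one fractional accumulator and two intensity outputs, which is why the two algorithms combine naturally in one unit.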
Title: A VHDL Implementation of the Multiscale Retinex with Color Restoration Algorithm Subtle differences exist between the direct human observation of scenes and that of recorded color images. One of the big differences is that the human visual system is able to distinguish details and vivid colors in shadows and in scenes that contain illuminant shifts. A number of algorithms have been proposed to bridge this gap, and the Multiscale Retinex with Color Restoration (MSRCR) has shown itself to be one of the best algorithms in this field. This project will attempt to implement the MSRCR in hardware, which would allow photographers to get automatic image enhancement directly from their digital cameras. The algorithm can be summarized as R_i(x,y) = C_i(x,y) SUM_{n=1..N} G_n [ log I_i(x,y) - log( F_n(x,y) * I_i(x,y) ) ], where I_i(x,y) denotes the image distribution in the i-th color spectral band, * the convolution operation, F_n(x,y) the surround function, N the number of scales, G_n the weight associated with each scale, and C_i(x,y) is given by C_i(x,y) = log[ alpha I_i(x,y) / SUM_{s=1..S} I_s(x,y) ], where alpha is the color restoration constant and S=3 for images in the RGB color scheme. There are two main goals in this project: the first is to achieve a good approximation of this algorithm in hardware, whereas the second is to examine the performance of our hardware approach and compare it to software-based approaches. Simulations of an FPGA programmed using VHDL will be presented and compared to a Matlab counterpart. References: 1. Jobson, D.J.; Rahman, Z.; Woodell, G.A.; "Properties and Performance of a Center/Surround Retinex", IEEE Transactions on Image Processing. 2. Jobson, D.J.; Rahman, Z.; Woodell, G.A.; "A Multiscale Retinex for Bridging the Gap Between Color Images and the Human Observation of Scenes", IEEE Transactions on Image Processing. 3. Rahman, Z.; Jobson, D.J.; Woodell, G.A.; "Multi-scale Retinex for Color Image Enhancement", IEEE International Conference on Image Processing, 1996. 4. Marina Balabanov and Yaron Zalika; "Image Enhancement Method Based on Multiscale Retinex". 5. A. K. Jain, Fundamentals of Digital Image Processing.
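One scale of the MSR sum can be prototyped in a few lines of software. In this Python sketch (ours, not the submitted design) a crude box filter stands in for the Gaussian surround F_n, and +1 offsets guard the logarithms, so it only illustrates the log(I) - log(F * I) datapath structure:

```python
import math

def box_surround(img, r):
    """Crude box-filter surround; a stand-in for the Gaussian F_n(x,y)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, n = 0.0, 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx]
                        n += 1
            out[y][x] = acc / n
    return out

def single_scale_retinex(img, r=1):
    """R(x,y) = log I(x,y) - log (F * I)(x,y): one term of the MSR sum."""
    sur = box_surround(img, r)
    return [[math.log(img[y][x] + 1.0) - math.log(sur[y][x] + 1.0)
             for x in range(len(img[0]))] for y in range(len(img))]
```

A uniformly lit image produces zero response, which is the retinex's illumination-discounting property; only local contrast survives the subtraction.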
Title: Maximal-Throughput Implementation of the Sobel Edge Detection Algorithm Image processing and recognition techniques have garnered a remarkable amount of interest over the past few years. Due to the complexity of such analysis, feature detection and extraction is the first step in most image segmentation techniques. One method of accomplishing this extraction is through edge detection. The primary goal of this process is to develop sets of closed pixel contours surrounding regions of interest in the image. The Sobel edge algorithm uses two convolution kernels to detect intensity differences along the x and y axes. The result of the operation can be represented as a 2D spatial gradient, which allows for the emphasis of high-spatial-frequency areas corresponding to the edges of an image. As the processing of a video stream has substantial bandwidth requirements, a hardware implementation of the Sobel algorithm is quite useful in embedded and intelligent camera systems. This project will strive for a maximal-throughput implementation, to allow for maximum frame rates and resolutions; thus, the processing can occur in real time without dropping any frames. References: 1. R.C. Gonzalez and R.E. Woods, Digital Image Processing, 2nd Edition, Addison-Wesley, 1992. 2. Russ, John C., The Image Processing Handbook, CRC Press, Boca Raton, Florida, 1995. 3. James Matthews, "An Introduction to Edge Detection: The Sobel Edge Detector", 2002. 4. R. Fisher, S. Perkins, A. Walker and E. Wolfart, "Sobel Edge Detector", 2000.
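The two convolution kernels and the gradient combination can be sketched in software. This Python reference model (not the hardware design) uses the |Gx| + |Gy| approximation common in hardware, which avoids the square root of the true magnitude:

```python
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal-difference kernel
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical-difference kernel

def sobel(img):
    """Return the approximate gradient magnitude |Gx| + |Gy| for interior pixels."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(GX[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(GY[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)   # cheap magnitude, no square root
    return out

# A vertical step edge: the interior columns respond strongly, gy stays zero.
img = [[0, 0, 9, 9]] * 4
```

In a streaming FPGA implementation the 3x3 window is fed by two line buffers, so one output pixel is produced per clock once the pipeline fills.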
Title: 2D Discrete Cosine Transform (DCT) for JPEG Image Compression The Discrete Cosine Transform (DCT) is currently the most widely used transform in image compression. Although there are other steps required in image compression, the majority of the processing required to encode or decode an image is done by the calculations performed in taking the DCT. The Discrete Cosine Transform is similar to the Discrete Fourier Transform; however, it is much simpler because it avoids the use of complex numbers. In order to implement the DCT, the mathematical equation must be coded in VHDL. The 2D equation is too convoluted for direct VHDL coding, so a simplification is preferred. It can be observed that the 2D DCT is separable into two one-dimensional DCTs: one along the rows and one along the columns. Thus, in order to switch between rows and columns, a transposing buffer will be required. This buffer consists of an 8x8 memory array for transposing matrices of partial products (see figure 1 in PDF attachment). References: 1. D.S. Taubman and M.W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards, and Practice, Boston: Kluwer Academic Publishers, 2002. 2. P. Symes, Video Compression Demystified, New York: McGraw-Hill, 2001. 3. "Pipelined Fast 2-D DCT Architecture for JPEG Image Compression" (September 2001) [Online], 2003. 4. "The Cameron Project: High-Level Programming of Image Processing Applications on Reconfigurable Computing Machines" (October 1998). 5. R. C. Gonzalez, R. E. Woods, Digital Image Processing, Upper Saddle River, N.J.: Prentice Hall, 2002.
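The row/column separation described above can be verified with a short software model. This Python sketch (a reference model, not the VHDL) applies the 1-D DCT-II to the rows, transposes (mirroring the transposing buffer), then processes the columns:

```python
import math

def dct_1d(v):
    """1-D DCT-II: the row or column pass of the separable 2-D DCT."""
    N = len(v)
    out = []
    for k in range(N):
        c = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(c * sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                           for n in range(N)))
    return out

def dct_2d(block):
    """Rows first, transpose, then columns: the role of the transposing buffer."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

For a constant 8x8 block all the energy lands in the DC coefficient, which is a convenient sanity check for the hardware's partial-product path as well.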
Title: Design and Implementation of an EREC Algorithm Most compression algorithms used today comprise some form of variable-length coding (VLC). The basic idea behind VLC is to represent frequent data symbols by short bursts of binary digits, while data symbols that are less often encountered are represented by longer bursts of bits, resulting in a smaller average number of bits needed to represent all the symbols from the source. One of the main drawbacks of VLCs is that they are highly sensitive to error propagation. In order to be able to reconstruct a sequence of symbols from a sequence of bits, the receiver requires all bits to be transmitted correctly. The aim of this project is to design and implement an algorithm for adapting VLC schemes to give increased resilience to random and burst errors while maintaining suitable compression performance. Error resiliency will be achieved through the application of an Error-Resilient Entropy Code (EREC) scheme that consists of multiplexing variable-length codes into fixed-length data blocks. The latter blocks are composed of slots, wherein each slot corresponds to the beginning of a VLC, thereby allowing the decoder to automatically resynchronize at the beginning of each slot. The combination of error resiliency and data compression makes the scheme appropriate for data streams requiring high transfer rates. The scheme will be tested under different noise conditions to assess the trade-off between its compression performance and error-resilience characteristics, in the context of an image compression application. References: 1. Redmill, D.W.; Kingsbury, N.G.: "The EREC: an error-resilient technique for coding variable-length blocks of data", IEEE Transactions on Image Processing, Volume 5, Issue 4, April 1996, pp. 565-574. 2. Takishima, Y.; Wada, M.; Murakami, H.: "Variable length codes", IEEE Transactions on Communications, Volume 43, Issue 2, Part 3, Feb.-April 1995, pp. 158-162.
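The slot-packing idea can be illustrated with a heavily simplified software sketch. This is our own toy model: the offset search sequence below is a naive placeholder for the one specified by Redmill and Kingsbury, and the decoder side is omitted entirely:

```python
def erec_pack(codes, slot_len):
    """Pack variable-length bit strings into fixed-length slots (toy EREC).
    Each code starts in its own slot; overflow bits migrate to slots with
    spare room, searched with an offset sequence the decoder also knows."""
    n = len(codes)
    slots = [code[:slot_len] for code in codes]
    left = [code[slot_len:] for code in codes]          # overflow bits
    for off in range(1, n):                              # placeholder offset search
        for i in range(n):
            j = (i + off) % n
            space = slot_len - len(slots[j])
            if space > 0 and left[i]:
                slots[j] += left[i][:space]
                left[i] = left[i][space:]
    return slots

slots = erec_pack(["1010", "11", "101010"], 4)
```

Because every slot boundary is a known bit position, a channel error corrupts at most the codes routed through the damaged slot instead of desynchronizing the entire stream.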
Title: Character Recognition through Neural Networks This project seeks to apply neural networks to recognizing handwritten characters in the English language. A multi-layer neural network equipped with a back-propagation learning algorithm is used to enable fast learning. This allows specific types of handwriting to be recognized faster over time (learning through the training of the neurons), thus allowing the program to be user-specific. The input layer of the MLP (Multi-Layer Perceptron) is a set of image pixels; an X by Y pixel image is mapped onto X by Y neurons. During the training period, the neurons are recursively fed characters to build their fault tolerance (variance between two different A's). The training is done with no pre-programming or bias. Using windowing (breaking up the characters into four/six/eight equal pieces) expedites the recognition process. For example, if a set of pixels comprising the top left-hand corner of the character matches the data in the trained neurons, the code need not do similar computations for the rest of the character. The aim of using neural networks coupled with VHDL is to automate and accelerate the process of character recognition. References: 1. E. W. Brown, "Character Recognition by Feature Point Extraction", unpublished paper written at Northeastern University, 1992. 2. D. S. Levine, Introduction to Neural & Cognitive Modeling, 1991, Lawrence Erlbaum Associates, Hillsdale, NJ. 3. Patrick K. Simpson, Artificial Neural Systems, 1990, Pergamon Press, New York, NY. 4. John Wester, Introduction to Neural Networks, http://www.nd.com/neurosolutions/products/ns/whatisNN.html
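The forward pass and one back-propagation update can be sketched in a few lines. This Python model is our illustration only: the two-pixel "image", initial weights, and learning rate are made-up values, far smaller than the X-by-Y input layer of the actual project:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w_hid, w_out, x):
    """One hidden layer; inputs are the flattened pixel values."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hid]
    return h, sigmoid(sum(w * hi for w, hi in zip(w_out, h)))

def train_step(w_hid, w_out, x, target, lr=0.5):
    """Single back-propagation update (squared-error loss)."""
    h, y = forward(w_hid, w_out, x)
    d_out = (y - target) * y * (1 - y)            # output-layer delta
    for i, hi in enumerate(h):
        d_hid = d_out * w_out[i] * hi * (1 - hi)  # hidden delta (pre-update w_out)
        w_out[i] -= lr * d_out * hi
        for j in range(len(x)):
            w_hid[i][j] -= lr * d_hid * x[j]

w_hid = [[0.3, -0.2], [0.1, 0.4]]
w_out = [0.5, -0.3]
x, target = [1.0, 0.0], 1.0
before = forward(w_hid, w_out, x)[1]
train_step(w_hid, w_out, x, target)
after = forward(w_hid, w_out, x)[1]
```

Each gradient step pulls the output toward the target label, which is the mechanism by which the network adapts to one user's handwriting.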
Title: Parallel String Matching with Broadcasting Scheme String matching is at the core of many modern computationally intensive activities such as DNA identification, internet search and database operation. Generally, string matching is performed at the software level, but critical or large-volume applications require a dedicated special-purpose solution. On a single-CPU architecture the processor must load and then compare the pattern string piece by piece with the reference, and the task is even more complicated when some mismatches are tolerated. While the software approach is limited by its serial nature, the main benefit of a hardware implementation resides in its parallelism. In fact, many comparisons can be performed simultaneously on one or many reference strings, speeding up the process by as much as the degree of parallelism (the number of streams used). In this paper, we present the VHDL design of a parallel string matching unit synthesized for the Altera family of FPGAs and addressing the exact-matching and k-mismatches problems. Our approach is based on the method developed by Park and George [1], in which a broadcasting scheme with multiple streams is presented. This parallel scheme has two major advantages: (1) the number of streams can be chosen to gain the maximum benefit of the parallelism, and (2) the scheme can work on reference strings of unknown size. In addition, the broadcasting scheme solves the exact-matching and k-mismatches problems with a time complexity of O(n/d), where n is the length of the reference string and d represents the number of different input streams used. This algorithm achieves high performance by exploiting explicit parallelism. We then develop a parameterized VHDL entity in which the degree of parallelism is controlled explicitly by a variable number of input (and output) streams. References: 1. Jin H. Park and K. M. George, "Parallel String Matching Algorithms Based on Dataflow," Proceedings of the 32nd Hawaii International Conference on System Sciences, pp. 1-10, 1999. 2. Hsi-C. Lee and Fikret Ercal, "RMESH Algorithms for String Matching," Proceedings of the Third International Symposium on Parallel Architectures, Algorithms, and Networks, pp. 223-226, 1997. 3. Dan Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge: Cambridge University Press, 534 p., 1997.
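The k-mismatches problem itself is easy to state in software; the serial Python sketch below (ours, with made-up DNA-style strings) defines the result the parallel comparators must reproduce, with each hardware stream handling a subset of the alignments:

```python
def k_mismatch_positions(reference, pattern, k):
    """Report every alignment where pattern matches reference with at most
    k mismatches: the serial specification of the broadcast comparators."""
    hits = []
    for s in range(len(reference) - len(pattern) + 1):
        mism = sum(1 for a, b in zip(reference[s:], pattern) if a != b)
        if mism <= k:
            hits.append(s)
    return hits

print(k_mismatch_positions("ACGTACGT", "ACGA", 1))
```

The hardware gains its O(n/d) time by evaluating d of these alignment windows per cycle instead of one.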
Title: Implementation of a Modified Version of MIPS A RISC (Reduced Instruction Set Computer) is a processor whose design is based on the rapid execution of a sequence of simple instructions rather than a large variety of complex ones. The central idea is that by speeding up the most common, simple instructions, one can afford to pay a penalty in the unusual case of a complex instruction and make a large net gain in performance. There are several computer architectures based on this concept. Our project is going to be based on the MIPS (Microprocessor without Interlocked Pipeline Stages) architecture. Our goal is to implement a subset of MIPS using VHDL that deals with integer operations only (as opposed to floating-point ones). This compact version will contain the most basic instructions of MIPS such as: load, store, add, subtract, branch, and set less than. Figure 1 shows the implementation of such an instruction set. In this pipelined architecture, each instruction will take about 4 or 5 clock cycles to complete. References: [1] J. L. Hennessy & D. A. Patterson, "Appendix A - Pipelining: Basic and Intermediate Concepts," in Computer Architecture: A Quantitative Approach, 3rd Edition. San Francisco: Morgan Kaufmann, 2003. [2] G. Kane & J. Heinrich, MIPS RISC Architecture, 2nd Edition. Upper Saddle River, NJ: Prentice Hall, 1992. [3] D. Sweetman, See MIPS Run. San Francisco: Morgan Kaufmann, 1999. [4] A. Hui & J. Zappulla, School of Computer Science, Monash University, Clayton, Australia. "MIPS Architecture Animation," February 2003, http://www.csse.monash.edu.au/packages/mips/.
Title: VHDL Implementation of Floating-Point Division Floating-point number representations allow us to use real numbers on a computer. Floating-point numbers consist of a sign, exponent, mantissa, and base. Many applications use floating-point division, hence there is a need to develop a floating-point divider. The circuit is modeled using the Very High Speed Integrated Circuits Hardware Description Language (VHDL). In the IEEE single-precision format, a number has an 8-bit exponent, a 23-bit mantissa, and a 1-bit sign, for a total of 32 bits. Most computers support the single (32-bit) precision format, which can express numbers from about -3.4E38 to 3.4E38. The basic algorithm we are following to divide two floating-point numbers is: - Divide the dividend mantissa by the divisor mantissa - Subtract the exponent of the divisor from that of the dividend - Compute the sign of the result based on the signs of the two operands - Normalize the result The divide-by-zero case will be handled separately, following the IEEE standard representation; other exceptions like NaN will also be handled as specified in the IEEE standard. References: 1. David Goldberg, "Computer Arithmetic" (Appendix H), Xerox Palo Alto Research Center, 2003.
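The four steps above can be traced in software by operating directly on the IEEE-754 bit fields. This Python sketch is an illustration of the datapath, not the VHDL design itself; it covers normal numbers and divide-by-zero only, and truncates instead of implementing IEEE round-to-nearest:

```python
import struct

def fp32_divide(a, b):
    """Divide two floats by manipulating single-precision fields directly
    (normals only; zero divisor -> signed infinity; truncation rounding)."""
    def fields(x):
        bits = struct.unpack(">I", struct.pack(">f", x))[0]
        return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    sa, ea, ma = fields(a)
    sb, eb, mb = fields(b)
    sign = sa ^ sb                       # step 3: sign of the result
    if eb == 0 and mb == 0:              # divide-by-zero exception
        return float("-inf") if sign else float("inf")
    # Steps 1-2: divide mantissas (with hidden bit), subtract biased exponents.
    mant = ((1 << 23) | ma) << 23
    quot = mant // ((1 << 23) | mb)
    exp = ea - eb + 127
    # Step 4: normalize; the raw quotient lies in [0.5, 2).
    if quot < (1 << 23):
        quot <<= 1
        exp -= 1
    bits = (sign << 31) | (exp << 23) | (quot & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(fp32_divide(6.0, 2.0))  # 3.0
```

The hidden-bit restoration and the [0.5, 2) normalization range are the parts most easily overlooked when moving this into hardware.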
Title: Verification, Synthesis and Implementation of CORDIC VHDL Code for an FPGA Device Microprocessors have traditionally used a software approach for implementing algorithms used in digital signal processing (DSP). This approach was primarily used because of the difficulty of mapping these algorithms into hardware. In addition, the software solution provided a cost-effective and flexible method but suffered in terms of speed. With the arrival of reconfigurable logic computers, a high-speed hardware solution is possible for implementing these algorithms. The Coordinate Rotation Digital Computer (CORDIC) is a hardware-efficient algorithm that can be mapped into FPGAs, and it hence provides an attractive alternative to the traditional software approach. The CORDIC algorithm has found many applications, such as high-end calculators, radar, robotics and DSP transforms. In this project, our main objective is to realize a hardware implementation of the most popular trigonometric and transcendental functions for which the CORDIC algorithm is known, viz. sine, cosine, arctangent and square root. The CORDIC algorithm uses a simple and efficient approach based on shift-addition techniques together with vector rotations. These techniques are implemented in VHDL and mapped into FPGAs. The performance of the design is then compared to a software alternative. References: 1) J. Abruzzo, "Applicability of CORDIC Algorithm to Arithmetic Processing", IEEE, 1985. 2) R. Andraka, "A Survey of CORDIC Algorithms for FPGA Based Computers", ACM, Monterey, 1998. 3) Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Designs, Oxford University Press, 2000. 4) P. Jarvis, "Implementing CORDIC Algorithms", Dr. Dobb's Journal, Oct. 1990. 5) J. Volder, "The CORDIC Computing Technique", IRE Trans. Computers, 1959. 6) http://www.andraka.com/files/crdcsrvy.pdf 7) http://www.ee.byu.edu/ee/class/ee621/Lectures/L22.PDF
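The rotation-mode iteration for sine and cosine can be sketched as follows. Floats stand in here for the fixed-point datapath, so this Python model only illustrates the shift-and-add structure the VHDL implements:

```python
import math

def cordic_sin_cos(theta, iterations=24):
    """Rotation-mode CORDIC: each step rotates by +/- arctan(2^-i), so the
    multiplications by 2^-i become right shifts in fixed-point hardware."""
    # Precomputed arctan table and gain K = prod cos(arctan 2^-i), stored as ROM.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    K = 1.0
    for i in range(iterations):
        K *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = K, 0.0, theta              # pre-scaling by K absorbs the gain
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0      # rotate toward z = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x                          # (sin, cos)

s, c = cordic_sin_cos(math.pi / 6)
```

The convergence range is roughly |theta| <= 1.74 rad; a quadrant-reduction stage in front of the core extends it to the full circle.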
Title: Emulation of Quantum Circuit Elements and Grover's Search Algorithm Using FPGA Technology This project involves construction of some key elements of quantum circuits, namely Hadamard gates, controlled-NOT (CNOT) gates and conditional phase-shift gates. Since the behavior of these quantum gates does not conform fully to conventional circuit behavior, we will emulate the quantum behavior using traditional subsystems synthesized on an FPGA. In order to measure the usefulness of our emulated quantum gates, we will then emulate Grover's search algorithm using our gates. The field of quantum computation and cryptography is at an elemental stage where scientists and engineers are working hard to understand complicated quantum procedures and useful implementations of exciting quantum phenomena. The difficulty currently faced in simulating such experiments in software is that software simulations cannot successfully capture the level of parallelism involved in quantum computations. One solution to this problem is to emulate these quantum systems using non-Von Neumann hardware technologies (such as FPGAs), which allow us to perform these tasks in hardware, where the parallelism of quantum systems can be more effectively emulated. References: 1. V. V. Shende, A. K. Prasad, I. L. Markov and J. P. Hayes, "Reversible Logic Circuit Synthesis", in Proc. ACM/IEEE Intl. Conf. Comp.-Aided Design, pp. 353-360, November 2002. 2. G. F. Viamontes, M. Rajagopalan, I. L. Markov and J. P. Hayes, "Gate-level Simulation of Quantum Circuits", Proc. Asia and South-Pacific Design Automation Conf., pp. 295-301, January 2003. 3. Michael Nielsen and Isaac Chuang, Quantum Computation and Quantum Information.
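The behaviour the emulated gates must reproduce can be checked against a state-vector model of Grover's search. This Python sketch (our reference model, not the FPGA emulation) uses the standard oracle phase-flip and inversion-about-the-mean diffusion step:

```python
import math

def grover(n_qubits, marked):
    """State-vector simulation of Grover's search for one marked index."""
    N = 2 ** n_qubits
    amp = [1.0 / math.sqrt(N)] * N            # Hadamards: uniform superposition
    for _ in range(int(round(math.pi / 4 * math.sqrt(N)))):
        amp[marked] = -amp[marked]            # oracle: phase-flip the target
        mean = sum(amp) / N                   # diffusion: inversion about mean
        amp = [2 * mean - a for a in amp]
    return [a * a for a in amp]               # measurement probabilities

probs = grover(3, marked=5)
```

After roughly (pi/4) * sqrt(N) iterations the marked state's probability dominates; the exponentially wide amplitude vector is exactly the structure that motivates a parallel hardware emulation.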
Title: Built-In Self-Test for an ALU The increased functionality of today's microelectronic circuits, along with faster, denser, and smaller packaging, is making it increasingly difficult to access, test, and verify chip and system functionality using traditional external equipment and instruments. Often, the circuits being tested are more complex than the testing units themselves, and so a complementary testing paradigm is needed. Built-In Self-Test (BIST) is one such paradigm. BIST attempts to move as much of the tester functionality as possible onto the silicon. Embedding the tester functionality into the semiconductor product reduces the burden on, and complexity of, external test equipment, and increases speed since on-chip access is faster. Our BIST project involves the design and implementation of a Built-In Self-Test for an Arithmetic and Logic Unit. The overall design consists of three major parts: the ALU, the BIST logic (key components: linear feedback shift registers / multiple-input signature registers), and a register-based micro-controller. The BIST system will test the correct working of the ALU unit in an efficient and time-conserving manner. We will be creating our own ALU in order to ease the simulation of faults. The output signatures of a standard non-erroneous ALU will be compared to those of the ALU under test in order to gauge the correctness of its outputs. References: 1. Avra, L.J.; McCluskey, E.J., "Synthesizing for Scan Dependence in Built-In Self-Testable Designs". Visited Feb. 20, 2003. 2. Kiefer, G.; Wunderlich, H.-J., "Using BIST Control for Pattern Generation". Visited Feb. 20, 2003. 3. Mohamed, Abdil Rashid, "Built-In Self-Test (BIST)", http://www.ida.liu.se/~zebpe/teaching/test/lec12.pdf. Visited Feb. 20, 2003.
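The pattern-generator half of the BIST logic is a linear feedback shift register, whose behaviour is easy to model in software. This Python sketch (ours, with an assumed 4-bit example polynomial) shows the property that makes LFSRs useful: a maximal-length feedback polynomial cycles through every nonzero state:

```python
def lfsr_sequence(seed, taps, steps):
    """Fibonacci LFSR: XOR the tap bits, shift the result in as the new LSB."""
    state, width = seed, max(taps) + 1
    out = []
    for _ in range(steps):
        out.append(state)
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = ((state << 1) | fb) & ((1 << width) - 1)
    return out

# 4-bit maximal-length LFSR (x^4 + x^3 + 1): visits all 15 nonzero states.
seq = lfsr_sequence(0b0001, taps=(3, 2), steps=15)
```

The companion MISR compacts the ALU's responses with the same shift-and-XOR structure, so one golden signature comparison replaces per-vector checking.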
Title: General-Purpose Processor There is a broad range of engineering concepts that could be implemented in almost all electronic gadgets, home appliances and cars, but that in most cases are avoided due to the cost and complexity of the hardware. Thus we're planning to develop a simple processor that is fully synthesizable, with the simplest instruction set possible and the least cost to develop it. This processor will have a control unit, an arithmetic logic unit (ALU), a register unit, and a memory. The instruction set will be composed of load, store, add, subtract, shift, AND, OR, XOR, jump and halt. Because this processor is not intended for real-time systems, we won't worry about its frequency; the challenge is to make it reliable and as small as possible. An added advantage is that it consumes so little silicon that there will be plenty of room to add other peripherals. References: [1] Patterson, David A., and Hennessy, John L., Computer Organization & Design - The Hardware/Software Interface, Second Edition, Morgan Kaufmann, San Francisco, 1998, ISBN 1-55860-428-6. [2] Andrew S. Tanenbaum, Structured Computer Organization, Third Edition, Prentice-Hall, Inc., 1990, ISBN 0-13-852875-1. [3] http://ciips.ee.uwa.edu.au/~morris/CA406/CA_ToC.html [4] http://www.vautomation.com/Press/cd9707.html
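The fetch-decode-execute behaviour of such a minimal machine can be sketched in software. This Python model is an illustration of the intended instruction set, not the VHDL design; the accumulator organization and the tiny test program are our own assumptions:

```python
def run(program, memory):
    """Tiny accumulator machine covering a subset of the listed instructions.
    Each instruction is an (opcode, operand) pair; halt stops the loop."""
    acc, pc = 0, 0
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "load":    acc = memory[arg]
        elif op == "store": memory[arg] = acc
        elif op == "add":   acc += memory[arg]
        elif op == "sub":   acc -= memory[arg]
        elif op == "and":   acc &= memory[arg]
        elif op == "jump":  pc = arg
        elif op == "halt":  return memory

mem = {0: 7, 1: 5, 2: 0}
run([("load", 0), ("add", 1), ("store", 2), ("halt", 0)], mem)
print(mem[2])  # 12
```

In the hardware version the if/elif chain becomes the control unit's decoder, and the accumulator and pc become registers clocked once per fetch-execute cycle.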
Title: A VHDL Implementation of the Single-Cycle MIPS Datapath RISC processors are simple machines that can execute simple instructions at a high frequency. One of these processors is MIPS. This project will implement a VHDL model of the single-cycle MIPS datapath, which can serve as a learning tool for beginning computer and electrical engineering students. The processor supports all basic operations, including arithmetic, logical and branching instructions, which form the basis for all modern assembly code. The aim of our project is to create a simple VHDL implementation of the single-cycle MIPS processor datapath that can be independently used to simulate all MIPS core (non-pseudo) instructions. We intend to implement the datapath by writing our own components in VHDL and using Altera LPMs to simulate data and instruction memory. Testing will be done by writing our own machine code, which will subsequently be loaded into the instruction memory before the processor is started; we expect to find the results in the data memory. Our design will focus on simplicity and reliability to best suit our target audience. References: 1. John L. Hennessy, David A. Patterson, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann Publishers, 1998. 2. John L. Hennessy, David A. Patterson, Computer Architecture: A Quantitative Approach (3rd Edition), Morgan Kaufmann, 2002. 3. Charles Price, MIPS IV Instruction Set, 1995, http://www-2.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15740-f97/public/doc/mips-isa.pdf
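The first stage of any such datapath is splitting the 32-bit instruction word into its fields, which is simple enough to model exactly. This Python sketch shows the standard MIPS field boundaries that the VHDL decoder wires implement:

```python
def decode(instr):
    """Split a 32-bit MIPS word into the fields the single-cycle datapath uses."""
    return {
        "opcode": (instr >> 26) & 0x3F,
        "rs":     (instr >> 21) & 0x1F,
        "rt":     (instr >> 16) & 0x1F,
        "rd":     (instr >> 11) & 0x1F,
        "shamt":  (instr >> 6) & 0x1F,
        "funct":  instr & 0x3F,
        "imm":    instr & 0xFFFF,   # overlaps rd/shamt/funct for I-type words
    }

# add $t2, $t0, $t1  ->  000000 01000 01001 01010 00000 100000
f = decode(0b00000001000010010101000000100000)
```

In hardware this costs nothing but routing: each field is a fixed slice of the instruction bus feeding the register file, ALU control, and sign-extender.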
Title: A 4-Way Set-Associative Cache Management Unit Cache is the name generally given to the first level of the memory hierarchy encountered once the address of a block of data leaves the CPU, and it plays a very important role in computer performance. In order to fully understand the functionality of a cache, one must examine the following four questions: 1. Where can a block be placed in the cache? 2. How is a block found if it is in the cache? 3. Which block should be replaced on a cache miss? 4. What happens on a write? In our project, we will be implementing a 4-way set-associative cache management unit using VHDL, and the four questions above will guide our design. Our design will take block addresses (from the CPU) as inputs and produce control signals to the cache and to the lower level of the memory hierarchy as outputs. This lower level of the memory hierarchy is the random-access memory, from which the cache receives its blocks of data. In general, these control signals will determine how the CPU, cache, and RAM interface with each other. Design methods for performance optimization will also be one of our main objectives. For instance, we will be using the write-back scheme with no-write-allocate as our write policy, since we believe this will lead to better performance. References: 1. Patterson, Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann. 2. Patterson, Hennessy, Computer Architecture: A Quantitative Approach, Morgan Kaufmann. 3. Hayes, Computer Architecture and Organization, McGraw-Hill. 4. Denis Howe, "The Free On-line Dictionary of Computing", http://foldoc.doc.ic.ac.uk/.
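Questions 1-3 above can be answered concretely with a small software model. In this Python sketch (our illustration, with assumed set count and LRU replacement; the write policy is not modeled), the index answers "where", the tag compare answers "how is it found", and LRU answers "which block is replaced":

```python
class SetAssociativeCache:
    """4-way set-associative cache model: tag/index split plus LRU replacement."""
    def __init__(self, n_sets=16, ways=4):
        self.n_sets, self.ways = n_sets, ways
        self.sets = [[] for _ in range(n_sets)]   # each set: tags, LRU-first order

    def access(self, block_addr):
        index, tag = block_addr % self.n_sets, block_addr // self.n_sets
        s = self.sets[index]
        if tag in s:                 # hit: refresh this tag's LRU position
            s.remove(tag)
            s.append(tag)
            return True
        if len(s) == self.ways:      # miss in a full set: evict the LRU block
            s.pop(0)
        s.append(tag)
        return False

c = SetAssociativeCache()
hits = [c.access(a) for a in (0, 16, 0, 32, 48, 64, 16)]
```

In the VHDL unit the four tag comparisons of a set happen in parallel, and the LRU ordering becomes a few state bits per set rather than a list.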
Title: Electric Snowmobile DC Brushless Motor PWM Controller The McGill Electric Snowmobile Team requires an electrically efficient controller for its DC brushless motor. The current controller has many features that are not necessary and lacks features that would be beneficial. Under the recommendation of Professor Ooi, we have chosen to design a motor controller that can adjust voltage levels depending on the throttle, with multiple supporting features tuned for direct use with the electric snowmobile. The design has to be energy efficient since the electric snowmobile needs to run in harsh conditions. Our current battery packs are regular lead-acid ones, which do not perform well in the cold. We therefore need to keep the design small for minimal power consumption. Furthermore, since the team is fairly new, money is limited, so smaller FPGAs would reduce the cost. A detailed description of PWM is given in the attached PDF file (pages 4-6). Since the snowmobile has a DC brushless motor, it requires a 3-phase AC signal to run. This AC signal needs to vary in amplitude to increase or decrease the torque output of the motor. However, the batteries supply a "constant" DC voltage. This is where the controller comes in: it converts the constant DC voltage to a varying DC voltage depending on the throttle by switching two transistors (complements of each other) on and off for specific periods of time (PWM). We will also implement the following features for the snowmobile specifically: - Synchronous switching - Regenerative braking - Reverse mode - Reverse and brake light control - Cruise control with speed resume - Extreme over/under-voltage protection - Dead-time tuning to support different transistor technologies, as well as 7- and 8-bit duty-cycle resolution References: 1. Adel S. Sedra, Kenneth C. Smith, Microelectronic Circuits, Fourth Edition, Oxford University Press, 1998. 2. Bus Compatible Digital PWM Controller, IXDP 610, http://www.ixys.com/91600.pdf 3. New Generation Motors, "NGM-EVC-200 Series Controller Operating Manual", Version 1.10D.
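The core counter-compare PWM scheme can be sketched in a few lines. This Python model is our illustration of the 8-bit duty-cycle case only; the complementary low-side output and the dead-time insertion described above are deliberately omitted:

```python
def pwm_wave(duty, period=256, cycles=1):
    """Counter-compare PWM: the output is high while a free-running counter
    is below the duty threshold (8-bit resolution by default)."""
    out = []
    for _ in range(cycles):
        for count in range(period):
            out.append(1 if count < duty else 0)
    return out

wave = pwm_wave(duty=64)         # 25% duty cycle
print(sum(wave) / len(wave))     # 0.25
```

In the FPGA, the throttle reading loads the duty threshold, and the motor's inductance averages the switched waveform into the effective varying DC voltage.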
Title: A Missile Interception Model Missile interception is a defensive tactic which consists of launching a defensive blocking missile (the pursuer) in response to an enemy ballistic missile attack (the target). The pursuer's goal is to disable the target missile through a frontal collision in orbit; if the target manages to re-enter the atmosphere, nothing more can be done to stop it. The tricky part in intercepting a missile is that the target is pre-programmed to change its trajectory during its flight, so a real pursuer is equipped with sensors that help detect any movement. Our project will focus on a comparison between the hardware-based approach (VHDL) and the software-based approach (using C and/or Matlab). Each simulation will consist of a target launch followed by a pursuer launch. The progression of both entities will be traced and recorded. The final outcome will indicate whether the pursuer was able to make contact and intercept the target, or whether it missed. References: 1. Zarchan, P. (1997), Tactical and Strategic Missile Guidance, American Institute of Aeronautics and Astronautics. 2. Ben-Asher, J. and Yaesh, I. (1998), Advances in Missile Guidance Theory, American Institute of Aeronautics and Astronautics. 3. Welch, G. and Bishop, G. (2002), "An Introduction to the Kalman Filter", Department of Computer Science, University of North Carolina.
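The structure of one such simulation run can be sketched with a deliberately simple guidance law. This 2-D Python model is our illustration only: it uses pure pursuit (steer straight at the target's current position) with made-up positions and speeds, whereas the actual project draws on the proportional-navigation and Kalman-filtering literature cited below:

```python
import math

def intercept(target_pos, target_vel, pursuer_pos, pursuer_speed,
              dt=0.01, steps=10000, hit_radius=1.0):
    """Pure-pursuit sketch: no lead prediction or sensor filtering."""
    tx, ty = target_pos
    px, py = pursuer_pos
    for _ in range(steps):
        dx, dy = tx - px, ty - py
        dist = math.hypot(dx, dy)
        if dist < hit_radius:
            return True                          # contact: interception
        px += pursuer_speed * dx / dist * dt     # steer at current target position
        py += pursuer_speed * dy / dist * dt
        tx += target_vel[0] * dt                 # target flies its own trajectory
        ty += target_vel[1] * dt
    return False                                 # time expired: miss

print(intercept((100, 100), (5, 0), (0, 0), pursuer_speed=50))
```

Both the VHDL and C/Matlab versions would iterate this same trace-and-record loop, which is what makes the two approaches directly comparable.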
Title: Testing and Implementation of a Convolutional Encoder and Viterbi Decoder in VHDL With the advent of wireless communication, an increasing amount of data is being channeled through transmission media. The occurrence of unwanted noise and interference is therefore a serious issue that must be dealt with, since errors are being introduced into the transmission. One way to solve this problem is to use an error-correcting code scheme such as the convolutional encoder and Viterbi decoder. The Viterbi decoder logically explores in parallel every possible data sequence; it compares each data component with the different possible paths and picks the closest match. For the implementation of the encoder/decoder in VHDL, the specification will be a 1-bit input, 2-bit output (rate 1/2) and a memory order of 3. The secondary goal is to create a test bench to measure the ability of the convolutional encoder and Viterbi decoder to fix errors in the signal. It is necessary to test this coding scheme extensively so that its performance can be compared to simulations done in C++ and MATLAB. Finally, the noise and interference will be simulated on the FPGA chip. References: [1] Wang, Y. et al., DSP Applications for Real Time Equalization, 2002. Note: paper written for ECSE-494C. [2] Garr, David A., Iterative Decoding for the GSM System, 1998. [3] Huang, Fu-Hua, "Chapter 3: Turbo Code Encoder", http://scholar.lib.vt.edu/theses/available/etd-71897-15815/unrestricted/chap3.pdf [4] Sklar, Bernard, Digital Communications: Fundamentals and Applications, Second Edition, New Jersey: Prentice Hall, 2001.
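The encoder/decoder pair and the error-injection test can be sketched in software. This Python model is our own reference implementation using the smaller memory-2 code with the classic (7, 5) octal generators for brevity (the project's memory-order-3 code simply widens the shift register); it shows a single channel error being corrected:

```python
G = (0b111, 0b101)   # rate-1/2 generator polynomials (7, 5 octal), memory 2

def conv_encode(bits):
    """Shift each input bit into the register; emit one parity bit per generator."""
    state, out = 0, []
    for b in bits:
        state = ((state << 1) | b) & 0b111
        for g in G:
            out.append(bin(state & g).count("1") & 1)
    return out

def viterbi_decode(received, n_bits):
    """Hard-decision Viterbi: keep the minimum-Hamming-distance path per state."""
    metric = {0: (0, [])}                       # assume the encoder starts at 0
    for t in range(n_bits):
        new = {}
        for state, (m, path) in metric.items():
            for b in (0, 1):
                window = (state << 1) | b       # 3-bit window seen by the taps
                expect = [bin(window & g).count("1") & 1 for g in G]
                dist = sum(e != r for e, r in zip(expect, received[2 * t:2 * t + 2]))
                nxt = window & 0b11
                if nxt not in new or m + dist < new[nxt][0]:
                    new[nxt] = (m + dist, path + [b])
        metric = new
    return min(metric.values())[1]

msg = [1, 0, 1, 1, 0]
coded = conv_encode(msg)
coded[3] ^= 1                                   # inject a single channel error
decoded = viterbi_decode(coded, len(msg))
```

The per-state minimum in the inner loop is the add-compare-select unit of the hardware decoder; one such unit per trellis state gives the parallelism the VHDL design exploits.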
Title: VHDL Implementation of Selective Repeat Error Recovery In today's rapid advancement and intensive usage of communications, the accuracy and efficiency of data transmission across networking channels is very significant. In order to achieve reliable data transfer between two end systems, an Automatic Repeat Request (ARQ) protocol must be used. Stop-and-Wait, Go-Back-N, and Selective Repeat are the three ARQ protocols in use today, the latter being the most efficient and complex one. These schemes make use of flow control, sequence numbers, acknowledgements, and timers to ensure that data is delivered from a sending process to a receiving process correctly and in order. Even though the Selective Repeat procedure is one of the most complex ARQ protocols, its renowned high channel efficiency motivates us to implement it in VHDL. In this project, a simplified version of Selective Repeat error recovery will be developed. Nevertheless, the simplifications will not alter the error recovery method used by this ARQ protocol, a method that consists of retransmitting the lost packets only. The SR error recovery operation will be observed through the simulation of a sender-receiver data transaction, and its performance will be evaluated by its protocol throughput and its rate of data delivery. References: [1] J. F. Kurose, K. W. Ross, Computer Networking: A Top-Down Approach Featuring the Internet, Addison-Wesley, 2nd edition, 2003. [2] Srinidhi Varadarajan, "Reliable Transmission: A State Machine", http://courses.cs.vt.edu/~cs5516/spring02/reliable_tx_1.pdf [3] Don Towsley, "Chapter 3: Transport Layer", http://www.cs.umd.edu/~shankar/417-F01/Slides/chapter3a/sld033.htm [4] http://www.erg.abdn.ac.uk/users/gorry/course/arq-pages/sr.html
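The defining property of Selective Repeat, retransmitting only the lost packets, can be shown with a compact software sketch. This Python model is our own simplification: timers are abstracted into an immediate loss set, and the receiver's out-of-order buffering is folded into the delivered set:

```python
def selective_repeat(packets, window, lost_first_try):
    """Selective Repeat over a toy lossy channel: only packets whose ACK never
    arrives are retransmitted (the first transmission of each sequence number
    in `lost_first_try` is dropped)."""
    transmissions, delivered = [], set()
    base = 0
    while base < len(packets):
        # Send everything inside the window that has not been ACKed yet.
        for seq in range(base, min(base + window, len(packets))):
            if seq not in delivered:
                transmissions.append(seq)
                if seq in lost_first_try and transmissions.count(seq) == 1:
                    continue                  # packet lost: no ACK this round
                delivered.add(seq)            # receiver buffers/ACKs out of order
        while base in delivered:              # slide window past ACKed packets
            base += 1
    return transmissions, delivered

tx, ok = selective_repeat(list(range(6)), window=3, lost_first_try={2})
```

Packets 3-5 are never resent despite the loss of packet 2, which is exactly where Selective Repeat gains its channel efficiency over Go-Back-N.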