Fault-Tolerance and Parallel Processing Technologies for On-Board Embedded Systems

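Title	Fault-Tolerance and Parallel Processing Technologies for On-Board Embedded Systems (星载嵌入式系统容错技术及并行处理技术)
Author	Yuan Liu (袁柳)
Degree	Doctor of Engineering
Defense date	2016-05-27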
Degree-granting institution	Graduate University of Chinese Academy of Sciences
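Place of award	Beijing
Supervisors	Yang Yiping (杨一平); Jia Pingui (贾品贵)
Keywords	embedded system; NAND Flash; fault tolerance; error-correcting code; multiprocessor; parallel task scheduling
Major	Computer Application Technology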
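Abstract (translated from Chinese)	As aerospace remote-sensing technology develops, the volume of data collected on board keeps growing, so the processing and storage capacity of on-board embedded systems must increase. For storage, large-capacity memory is needed to store the collected data reliably; for processing, a multiprocessor parallel architecture is needed to provide sufficient computing power. This thesis studies two key technologies of embedded systems: fault tolerance for mass storage and parallel processing on multiprocessors.

Mass storage is a critical component of an on-board embedded system, because subsequent processing is possible only if the collected data are stored reliably. NAND Flash is one of the most widely used mass-storage devices. Because of bit-flip errors, NAND Flash needs efficient, highly portable fault-tolerance measures to guarantee reliable storage. The main contributions in this area are:
1) After a comparative analysis of existing fault-tolerance methods, a BCH decoding method that can be implemented quickly in software is proposed. Syndromes are computed with a remainder comparison method, the error-location polynomial with a simplified PGZ (Peterson-Gorenstein-Zierler) method, and the roots of the error-location polynomial with the Zinoviev method. Compared with existing methods, this lowers complexity, gives a software implementation encoding and decoding performance comparable to a hardware implementation, and offers greater flexibility.
2) A read-error correction method based on the Error locality property is proposed. Error locality means that roughly 60% to 90% of read errors stay at the same positions between two successive writes. When data are read, the method first compares against and corrects the error positions with the Error locality property recorded in an Error Map, which greatly reduces the number of bit errors that the error-correcting code must correct.
3) A dynamic ECC method for MLC NAND Flash is proposed. It consists of three modules: an ECC selection module estimates the BER (bit error rate) from a reliability lookup table and selects a suitable ECC; an error correction module uses the method of 2) to pre-correct errors that exhibit Error locality; and a dynamic ECC codec module provides ECCs of different correction strengths. The method has lower storage redundancy and time overhead; on an ADSP-TS201 (600 MHz clock) the average read throughput reaches 485.4 Mbps, which meets most real-time data-processing requirements. It is also flexible and highly portable, and greatly reduces the impact of error-control coding on NAND Flash read/write efficiency.

The data held in mass storage must be processed quickly and in real time on board. To this end, current on-board embedded systems all use multiprocessor parallel architectures, so exploiting parallelism is crucial to the processing capacity of the embedded system, and efficient parallel task scheduling algorithms are needed to raise processor utilization. The main contributions in this area are:
1) A general system architecture for the parallel task scheduling problem is given, consisting of four parts: processor model, task model, task scheduling strategy and evaluation criteria. After analyzing the application environment of the research group's embedded systems, the two most commonly used parallel task models, the DAG (directed acyclic graph) task model and the Fork-Join task model, are established, and scheduling algorithms are studied for each.
2) For the DAG task model, whose tasks are strongly interdependent, a new task scheduling algorithm is proposed. Predecessor task nodes are divided into three classes. While guaranteeing the earliest start time, the algorithm first checks whether the processor hosting the critical predecessor satisfies the conditions for direct insertion, insertion into a communication gap, or duplication of the critical predecessor. It then allocates the free non-critical predecessors, considering in turn whether a non-critical predecessor can be skipped, inserted directly into an existing processor, or placed on a new processor. Because non-critical predecessor tasks can also be merged, the method reduces processor consumption, shortens the schedule length and increases processor utilization.
3) For the Fork-Join task model, which is highly regular, the TSFJ_SC scheduling algorithm for communication-constrained conditions is proposed. For serial communication, it strictly limits the number of processors and completes the schedule with the prescribed minimum number of processors. A busy-window concept models channel occupancy, and, without delaying the earliest start time, allocation rules are applied in order of priority: insertion into the master processor, occupation of communication gaps, insertion into a slave processor, and insertion into a new empty processor. The method avoids communication conflicts and performs very well in number of processors, schedule length and processor utilization.

Through this research, the fault-tolerance techniques for mass storage ensure that on-board embedded systems store the collected remote-sensing data reliably and efficiently, and the multiprocessor parallel task scheduling algorithms give the embedded system good parallel performance, so that the collected data can be processed quickly and in real time. Breakthroughs in these two key technologies are of significant research value for the wider adoption of on-board embedded systems.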
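The BCH decoding pipeline named in storage contribution 1) (syndromes, a simplified PGZ step, then root finding) can be illustrated on a deliberately small code. The sketch below is a minimal Python example under assumptions that are not taken from the thesis: a binary BCH(15, 7) code correcting t = 2 errors over GF(2^4), built from the primitive polynomial x^4 + x + 1. Syndromes are obtained from remainders modulo the minimal polynomials of α and α^3, which may differ in detail from the thesis's remainder comparison method, and the roots of the error-location polynomial are found by an exhaustive Chien-style search instead of the Zinoviev method, which is not reproduced here. All function names are hypothetical.

```python
# Minimal binary BCH(15, 7) codec correcting t = 2 errors over GF(2^4).
# Polynomials are Python ints: bit i holds the coefficient of x^i.
N = 15                 # code length 2^4 - 1
PRIM = 0b10011         # x^4 + x + 1 (assumed primitive polynomial)
M1 = PRIM              # minimal polynomial of alpha
M3 = 0b11111           # x^4 + x^3 + x^2 + x + 1, minimal polynomial of alpha^3
G = 0b111010001        # generator g(x) = M1 * M3 = x^8 + x^7 + x^6 + x^4 + 1

# Antilog (EXP) and log (LOG) tables; EXP is doubled so gf_mul needs no modulo.
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
_v = 1
for _i in range(N):
    EXP[_i] = EXP[_i + N] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0b10000:
        _v ^= PRIM


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):  # b must be non-zero
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % N]


def gf_pow(a, k):
    return 0 if a == 0 else EXP[(LOG[a] * k) % N]


def poly_mod(a, g):
    """Remainder of a(x) / g(x) over GF(2)."""
    dg = g.bit_length() - 1
    while a.bit_length() - 1 >= dg:
        a ^= g << (a.bit_length() - 1 - dg)
    return a


def eval_poly(p, beta):
    """Evaluate the binary polynomial p(x) at the field element beta."""
    acc, i = 0, 0
    while p:
        if p & 1:
            acc ^= gf_pow(beta, i)
        p >>= 1
        i += 1
    return acc


def encode(msg):
    """Systematic encoding of a 7-bit message into a 15-bit codeword."""
    shifted = msg << (N - 7)
    return shifted | poly_mod(shifted, G)


def decode(r):
    """Return (codeword, message); raise ValueError if detected as uncorrectable.

    Patterns with more than t = 2 errors may also be miscorrected silently.
    """
    # Syndromes from remainders: r(alpha^j) == (r mod m_j)(alpha^j) since m_j(alpha^j) == 0.
    s1 = eval_poly(poly_mod(r, M1), EXP[1])
    s3 = eval_poly(poly_mod(r, M3), EXP[3])
    if s1 == 0 and s3 == 0:
        return r, r >> (N - 7)
    if s1 == 0:
        raise ValueError("uncorrectable error pattern")
    # Simplified PGZ for t = 2: sigma(x) = 1 + s1*x + sigma2*x^2.
    sigma2 = gf_div(s3 ^ gf_pow(s1, 3), s1)
    degree = 2 if sigma2 else 1
    # Exhaustive (Chien-style) root search: an error at bit j gives the root alpha^(-j).
    errors = []
    for j in range(N):
        x = EXP[(N - j) % N]
        if (1 ^ gf_mul(s1, x) ^ gf_mul(sigma2, gf_mul(x, x))) == 0:
            errors.append(j)
    if len(errors) != degree:
        raise ValueError("uncorrectable error pattern")
    for j in errors:
        r ^= 1 << j
    return r, r >> (N - 7)


if __name__ == "__main__":
    cw = encode(0b1011001)
    print(decode(cw ^ (1 << 3) ^ (1 << 12)) == (cw, 0b1011001))
```

For t = 2 the PGZ step collapses to closed form, σ1 = S1 and σ2 = (S3 + S1^3)/S1, because in a binary code S2 = S1^2; this is why only t syndromes (here S1 and S3) have to be computed, which matches the English abstract's remark that the number of syndrome elements equals the error correction capability.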
English abstract	With the development of sensor technology, the quantity of remote-sensing data keeps increasing, and so do the data-storage and data-processing demands on on-board embedded systems. The large amounts of data must be stored safely in mass storage, and for data processing a multiprocessor parallel architecture is needed to improve processing capacity. This thesis focuses on fault-tolerance technologies for mass storage and parallel task scheduling technologies for multiprocessors.

Mass storage is one of the most important components of an on-board embedded system, and it can be fully used only if data storage is reliable. NAND Flash memories are widely used as mass storage, but they suffer from bit-flip errors, so efficient and portable fault-tolerance technologies must be designed to enhance their reliability. The contributions in this area are as follows:
1) After comparing existing error correction codes (ECC), a more efficient BCH decoding method is introduced. The number of syndrome elements to be computed equals the error correction capability. The simplified Peterson-Gorenstein-Zierler (PGZ) method is applied to calculate the error location polynomial, and its roots are found with the Zinoviev method. The proposed method clearly reduces BCH coding time, making it possible to replace hardware implementations with software ones.
2) An error merging method is proposed that exploits sustained errors with the "Error locality" property: the locations of 60% to 90% of all errors remain the same. By comparing against the error location information in the error map, the number of errors to be corrected by the ECC is reduced.
3) Based on observations of NAND Flash error patterns, an error-pattern-guided ECC system is proposed to improve coding performance for MLC NAND Flash memories. The system is composed of three modules: the ECC selection module assigns an appropriate ECC to each memory page; the error merging module applies the error merging method to reduce errors in read mode; and the adaptive ECC codec module provides various levels of protection based on the ECC selection result. These methods achieve low coding latency, low redundancy and high flexibility. The average data throughput on an ADSP-TS201 (600 MHz) is 485.4 Mbps, which makes the software implementation suitable for real-time demands.

The large amounts of data in mass storage need to be processed in real time, and the growing data scale puts more pressure on real-time processing. To increase processing capability, multiprocessor parallel architectures are applied in embedded systems. Making full use of parallelism is the main factor in the performance of a parallel system, so an efficient task scheduling method is needed to assign tasks to the processors. The contributions in this area are as follows:
1) A task scheduling structure is provided, composed of a processor model, a task model, task scheduling methods and evaluation criteria, which offers a common solution framework for parallel computing. After analyzing the application environment of the embedded system, the most commonly used task models, the Directed Acyclic Graph (DAG) model and the Fork-Join model, are built, and scheduling methods are developed for each.
2) A new scheduling algorithm is proposed for DAG task models. Predecessors are divided into three types. While preserving the earliest start time of each task, the algorithm first applies the strategies of insertion into the processor holding the critical predecessors, interval insertion and task duplication. The free non-critical predecessors are then scheduled with strategies such as skipping, slave processor insertion and task duplication. Because non-critical predecessors can also be clustered, both the schedule length and the number of processors are reduced, yielding more efficient schedules.
3) A TSFJ_SC (Task Scheduling of Fork-join tasks with Serial Communication) method is introduced to schedule fork-join tasks on multiprocessors without communication conflicts. A busy-window is introduced to avoid communication conflicts, and strategies such as P0 insertion, interval insertion and task allocation are used to shorten the schedule length. TSFJ_SC strictly limits the number of processors and greatly improves the efficiency of each processor.

Through the above research, the fault-tolerance technologies for mass storage give the embedded system high reliability, high storage capacity and fast read/write throughput, and the task scheduling methods for multiprocessors give it good parallel performance, so that its processing capability satisfies real-time demands. These two key technologies have great impact on the application of on-board embedded systems.
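The DAG scheduling contribution rests on two ideas stated in the abstracts: never delay a task's earliest start time, and prefer an existing processor over opening a new one. The Python sketch below is not the thesis's algorithm (it has no predecessor classification, gap insertion or task duplication); it is a minimal greedy list scheduler under stated assumptions: tasks are taken in topological order with ties broken by name, communication cost is zero between tasks on the same processor, and an existing processor wins whenever it gives the same earliest start time as a new one. The function name `schedule` and its dictionary input format are hypothetical.

```python
from graphlib import TopologicalSorter


def schedule(cost, comm):
    """Greedy earliest-start-time list scheduling of a DAG (illustrative only).

    cost: {task: execution time}; comm: {(pred, succ): communication time}.
    Returns (processor of each task, start time of each task, schedule length).
    """
    preds = {t: [] for t in cost}
    for u, v in comm:
        preds[v].append(u)

    # Topological order, level by level, ties broken by task name.
    ts = TopologicalSorter({t: preds[t] for t in cost})
    ts.prepare()
    order = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        order.extend(ready)
        ts.done(*ready)

    proc_free = []  # proc_free[p]: time at which processor p becomes idle
    where, start, finish = {}, {}, {}
    for t in order:
        best = None
        # Candidates: every existing processor, then one new empty processor.
        for p in range(len(proc_free) + 1):
            ready_time = 0
            for u in preds[t]:
                delay = 0 if where[u] == p else comm[(u, t)]
                ready_time = max(ready_time, finish[u] + delay)
            idle = proc_free[p] if p < len(proc_free) else 0
            est = max(ready_time, idle)
            # Strict '<' keeps the lower-numbered (existing) processor on ties.
            if best is None or est < best[0]:
                best = (est, p)
        est, p = best
        if p == len(proc_free):
            proc_free.append(0)
        where[t], start[t], finish[t] = p, est, est + cost[t]
        proc_free[p] = finish[t]
    return where, start, max(finish.values())


if __name__ == "__main__":
    cost = {"A": 2, "B": 3, "C": 3, "D": 2}
    comm = {("A", "B"): 4, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}
    print(schedule(cost, comm))
```

On the four-task example, B stays on A's processor to avoid the cost-4 message, C opens a second processor because it can start there at time 3 instead of 5, and D joins C, giving a schedule length of 8 on two processors. Appending only at the end of each processor's queue keeps the sketch short; the gap insertion and duplication strategies described above would refine exactly the `est` computation in the inner loop.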
Language	Chinese
Subject	Engineering
Content type	Degree thesis
Source URL	http://ir.ia.ac.cn/handle/173211/11452
Collection	Graduates: Doctoral Dissertations
Affiliation	Institute of Automation, Chinese Academy of Sciences
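Recommended citation (GB/T 7714)	袁柳. 星载嵌入式系统容错技术及并行处理技术[D]. 北京: 中国科学院研究生院, 2016. (Yuan Liu. Fault-Tolerance and Parallel Processing Technologies for On-Board Embedded Systems[D]. Beijing: Graduate University of Chinese Academy of Sciences, 2016.)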