dsp - CMSIS DSP Library from CMSIS 2.0. See http://www.onarm.com/cmsis/ for full details

Dependents: K22F_DSP_Matrix_least_square BNO055-ELEC3810 1BNO055 ECE4180Project--Slave2 ... more

src/Cortex-M4-M3/FilteringFunctions/arm_correlate_f32.c@0:1014af42efd9, 2011-03-10 (annotated)

Committer:: simon
Date:: Thu Mar 10 15:07:50 2011 +0000
Revision:: 0:1014af42efd9

User	Revision	Line number	New contents of line
simon	0:1014af42efd9	1	/* ----------------------------------------------------------------------------
simon	0:1014af42efd9	2	* Copyright (C) 2010 ARM Limited. All rights reserved.
simon	0:1014af42efd9	3	*
simon	0:1014af42efd9	4	* $Date: 29. November 2010
simon	0:1014af42efd9	5	* $Revision: V1.0.3
simon	0:1014af42efd9	6	*
simon	0:1014af42efd9	7	* Project: CMSIS DSP Library
simon	0:1014af42efd9	8	* Title: arm_correlate_f32.c
simon	0:1014af42efd9	9	*
simon	0:1014af42efd9	10	* Description: Correlation for floating-point sequences.
simon	0:1014af42efd9	11	*
simon	0:1014af42efd9	12	* Target Processor: Cortex-M4/Cortex-M3
simon	0:1014af42efd9	13	*
simon	0:1014af42efd9	14	* Version 1.0.3 2010/11/29
simon	0:1014af42efd9	15	* Re-organized the CMSIS folders and updated documentation.
simon	0:1014af42efd9	16	*
simon	0:1014af42efd9	17	* Version 1.0.2 2010/11/11
simon	0:1014af42efd9	18	* Documentation updated.
simon	0:1014af42efd9	19	*
simon	0:1014af42efd9	20	* Version 1.0.1 2010/10/05
simon	0:1014af42efd9	21	* Production release and review comments incorporated.
simon	0:1014af42efd9	22	*
simon	0:1014af42efd9	23	* Version 1.0.0 2010/09/20
simon	0:1014af42efd9	24	* Production release and review comments incorporated
simon	0:1014af42efd9	25	*
simon	0:1014af42efd9	26	* Version 0.0.7 2010/06/10
simon	0:1014af42efd9	27	* Misra-C changes done
simon	0:1014af42efd9	28	*
simon	0:1014af42efd9	29	* -------------------------------------------------------------------------- */
simon	0:1014af42efd9	30
simon	0:1014af42efd9	31	#include "arm_math.h"
simon	0:1014af42efd9	32
simon	0:1014af42efd9	33	/**
simon	0:1014af42efd9	34	* @ingroup groupFilters
simon	0:1014af42efd9	35	*/
simon	0:1014af42efd9	36
simon	0:1014af42efd9	37	/**
simon	0:1014af42efd9	38	* @defgroup Corr Correlation
simon	0:1014af42efd9	39	*
simon	0:1014af42efd9	40	* Correlation is a mathematical operation that is similar to convolution.
simon	0:1014af42efd9	41	* As with convolution, correlation uses two signals to produce a third signal.
simon	0:1014af42efd9	42	* The underlying algorithms in correlation and convolution are identical except that one of the inputs is flipped in convolution.
simon	0:1014af42efd9	43	* Correlation is commonly used to measure the similarity between two signals.
simon	0:1014af42efd9	44	* It has applications in pattern recognition, cryptanalysis, and searching.
simon	0:1014af42efd9	45	* The CMSIS library provides correlation functions for Q7, Q15, Q31 and floating-point data types.
simon	0:1014af42efd9	46	* Fast versions of the Q15 and Q31 functions are also provided.
simon	0:1014af42efd9	47	*
simon	0:1014af42efd9	48	* \par Algorithm
simon	0:1014af42efd9	49	* Let <code>a[n]</code> and <code>b[n]</code> be sequences of length <code>srcALen</code> and <code>srcBLen</code> samples respectively.
simon	0:1014af42efd9	50	* The convolution of the two signals is denoted by
simon	0:1014af42efd9	51	* <pre>
simon	0:1014af42efd9	52	* c[n] = a[n] * b[n]
simon	0:1014af42efd9	53	* </pre>
simon	0:1014af42efd9	54	* In correlation, one of the signals is flipped in time
simon	0:1014af42efd9	55	* <pre>
simon	0:1014af42efd9	56	* c[n] = a[n] * b[-n]
simon	0:1014af42efd9	57	* </pre>
simon	0:1014af42efd9	58	*
simon	0:1014af42efd9	59	* \par
simon	0:1014af42efd9	60	* and this is mathematically defined as
simon	0:1014af42efd9	61	* \image html CorrelateEquation.gif
simon	0:1014af42efd9	62	* \par
simon	0:1014af42efd9	63	* The <code>pSrcA</code> points to the first input vector of length <code>srcALen</code> and <code>pSrcB</code> points to the second input vector of length <code>srcBLen</code>.
simon	0:1014af42efd9	64	* The result <code>c[n]</code> is of length <code>2 * max(srcALen, srcBLen) - 1</code> and is defined over the interval <code>n=0, 1, 2, ..., (2 * max(srcALen, srcBLen) - 2)</code>.
simon	0:1014af42efd9	65	* The output result is written to <code>pDst</code> and the calling function must allocate <code>2 * max(srcALen, srcBLen) - 1</code> words for the result.
simon	0:1014af42efd9	66	*
simon	0:1014af42efd9	67	* <b>Fixed-Point Behavior</b>
simon	0:1014af42efd9	68	* \par
simon	0:1014af42efd9	69	* Correlation requires summing up a large number of intermediate products.
simon	0:1014af42efd9	70	* As such, the Q7, Q15, and Q31 functions run a risk of overflow and saturation.
simon	0:1014af42efd9	71	* Refer to the function specific documentation below for further details of the particular algorithm used.
simon	0:1014af42efd9	72	*/
simon	0:1014af42efd9	73
simon	0:1014af42efd9	74	/**
simon	0:1014af42efd9	75	* @addtogroup Corr
simon	0:1014af42efd9	76	* @{
simon	0:1014af42efd9	77	*/
simon	0:1014af42efd9	78	/**
simon	0:1014af42efd9	79	* @brief Correlation of floating-point sequences
simon	0:1014af42efd9	80	* @param[in] *pSrcA points to the first input sequence.
simon	0:1014af42efd9	81	* @param[in] srcALen length of the first input sequence.
simon	0:1014af42efd9	82	* @param[in] *pSrcB points to the second input sequence.
simon	0:1014af42efd9	83	* @param[in] srcBLen length of the second input sequence.
simon	0:1014af42efd9	84	* @param[out] *pDst points to the location where the output result is written. Length 2 * max(srcALen, srcBLen) - 1.
simon	0:1014af42efd9	85	* @return none.
simon	0:1014af42efd9	86	*/
simon	0:1014af42efd9	87
simon	0:1014af42efd9	88	void arm_correlate_f32(
simon	0:1014af42efd9	89	float32_t * pSrcA,
simon	0:1014af42efd9	90	uint32_t srcALen,
simon	0:1014af42efd9	91	float32_t * pSrcB,
simon	0:1014af42efd9	92	uint32_t srcBLen,
simon	0:1014af42efd9	93	float32_t * pDst)
simon	0:1014af42efd9	94	{
simon	0:1014af42efd9	95	float32_t *pIn1; /* inputA pointer */
simon	0:1014af42efd9	96	float32_t *pIn2; /* inputB pointer */
simon	0:1014af42efd9	97	float32_t *pOut = pDst; /* output pointer */
simon	0:1014af42efd9	98	float32_t *px; /* Intermediate inputA pointer */
simon	0:1014af42efd9	99	float32_t *py; /* Intermediate inputB pointer */
simon	0:1014af42efd9	100	float32_t *pSrc1; /* Intermediate pointers */
simon	0:1014af42efd9	101	float32_t sum, acc0, acc1, acc2, acc3; /* Accumulators */
simon	0:1014af42efd9	102	float32_t x0, x1, x2, x3, c0; /* temporary variables for holding input and coefficient values */
simon	0:1014af42efd9	103	uint32_t j, k = 0u, count, blkCnt, outBlockSize, blockSize1, blockSize2, blockSize3; /* loop counters */
simon	0:1014af42efd9	104	int32_t inc = 1; /* Destination address modifier */
simon	0:1014af42efd9	105
simon	0:1014af42efd9	106
simon	0:1014af42efd9	107	/* The algorithm implementation is based on the lengths of the inputs. */
simon	0:1014af42efd9	108	/* srcB is always made to slide across srcA. */
simon	0:1014af42efd9	109	/* So srcBLen is always considered as shorter or equal to srcALen */
simon	0:1014af42efd9	110	/* But CORR(x, y) is reverse of CORR(y, x) */
simon	0:1014af42efd9	111	/* So, when srcBLen > srcALen, output pointer is made to point to the end of the output buffer */
simon	0:1014af42efd9	112	/* and the destination pointer modifier, inc is set to -1 */
simon	0:1014af42efd9	113	/* If srcALen > srcBLen, srcB would have to be zero padded to make the two inputs the same length */
simon	0:1014af42efd9	114	/* But to improve performance,
simon	0:1014af42efd9	115	* zeroes are written to the output instead of zero padding either of the inputs */
simon	0:1014af42efd9	116	/* If srcALen > srcBLen,
simon	0:1014af42efd9	117	* (srcALen - srcBLen) zeroes have to be included at the start of the output buffer */
simon	0:1014af42efd9	118	/* If srcALen < srcBLen,
simon	0:1014af42efd9	119	* (srcBLen - srcALen) zeroes have to be included at the end of the output buffer */
simon	0:1014af42efd9	120	if(srcALen >= srcBLen)
simon	0:1014af42efd9	121	{
simon	0:1014af42efd9	122	/* Initialization of inputA pointer */
simon	0:1014af42efd9	123	pIn1 = pSrcA;
simon	0:1014af42efd9	124
simon	0:1014af42efd9	125	/* Initialization of inputB pointer */
simon	0:1014af42efd9	126	pIn2 = pSrcB;
simon	0:1014af42efd9	127
simon	0:1014af42efd9	128	/* Number of output samples is calculated */
simon	0:1014af42efd9	129	outBlockSize = (2u * srcALen) - 1u;
simon	0:1014af42efd9	130
simon	0:1014af42efd9	131	/* When srcALen > srcBLen, zero padding has to be done to srcB
simon	0:1014af42efd9	132	* to make their lengths equal.
simon	0:1014af42efd9	133	* Instead, (outBlockSize - (srcALen + srcBLen - 1))
simon	0:1014af42efd9	134	* number of output samples are made zero */
simon	0:1014af42efd9	135	j = outBlockSize - (srcALen + (srcBLen - 1u));
simon	0:1014af42efd9	136
simon	0:1014af42efd9	137	while(j > 0u)
simon	0:1014af42efd9	138	{
simon	0:1014af42efd9	139	/* Zero is stored in the destination buffer */
simon	0:1014af42efd9	140	*pOut++ = 0.0f;
simon	0:1014af42efd9	141
simon	0:1014af42efd9	142	/* Decrement the loop counter */
simon	0:1014af42efd9	143	j--;
simon	0:1014af42efd9	144	}
simon	0:1014af42efd9	145
simon	0:1014af42efd9	146	}
simon	0:1014af42efd9	147	else
simon	0:1014af42efd9	148	{
simon	0:1014af42efd9	149	/* Initialization of inputA pointer */
simon	0:1014af42efd9	150	pIn1 = pSrcB;
simon	0:1014af42efd9	151
simon	0:1014af42efd9	152	/* Initialization of inputB pointer */
simon	0:1014af42efd9	153	pIn2 = pSrcA;
simon	0:1014af42efd9	154
simon	0:1014af42efd9	155	/* srcBLen is always considered as shorter or equal to srcALen */
simon	0:1014af42efd9	156	j = srcBLen;
simon	0:1014af42efd9	157	srcBLen = srcALen;
simon	0:1014af42efd9	158	srcALen = j;
simon	0:1014af42efd9	159
simon	0:1014af42efd9	160	/* CORR(x, y) = Reverse order(CORR(y, x)) */
simon	0:1014af42efd9	161	/* Hence set the destination pointer to point to the last output sample */
simon	0:1014af42efd9	162	pOut = pDst + ((srcALen + srcBLen) - 2u);
simon	0:1014af42efd9	163
simon	0:1014af42efd9	164	/* Destination address modifier is set to -1 */
simon	0:1014af42efd9	165	inc = -1;
simon	0:1014af42efd9	166
simon	0:1014af42efd9	167	}
simon	0:1014af42efd9	168
simon	0:1014af42efd9	169	/* The function is internally
simon	0:1014af42efd9	170	* divided into three parts according to the number of multiplications that
simon	0:1014af42efd9	171	* take place between inputA samples and inputB samples. In the first part of the
simon	0:1014af42efd9	172	* algorithm, the number of multiplications increases by one for every iteration.
simon	0:1014af42efd9	173	* In the second part of the algorithm, srcBLen multiplications are done per iteration.
simon	0:1014af42efd9	174	* In the third part of the algorithm, the number of multiplications decreases by one
simon	0:1014af42efd9	175	* for every iteration. */
simon	0:1014af42efd9	176	/* The algorithm is implemented in three stages.
simon	0:1014af42efd9	177	* The loop counters of each stage are initialized here. */
simon	0:1014af42efd9	178	blockSize1 = srcBLen - 1u;
simon	0:1014af42efd9	179	blockSize2 = srcALen - (srcBLen - 1u);
simon	0:1014af42efd9	180	blockSize3 = blockSize1;
simon	0:1014af42efd9	181
simon	0:1014af42efd9	182	/* --------------------------
simon	0:1014af42efd9	183	* Initializations of stage1
simon	0:1014af42efd9	184	* -------------------------*/
simon	0:1014af42efd9	185
simon	0:1014af42efd9	186	/* sum = x[0] * y[srcBLen - 1]
simon	0:1014af42efd9	187	* sum = x[0] * y[srcBLen - 2] + x[1] * y[srcBLen - 1]
simon	0:1014af42efd9	188	* ....
simon	0:1014af42efd9	189	* sum = x[0] * y[1] + x[1] * y[2] +...+ x[srcBLen - 2] * y[srcBLen - 1]
simon	0:1014af42efd9	190	*/
simon	0:1014af42efd9	191
simon	0:1014af42efd9	192	/* In this stage the MAC operations are increased by 1 for every iteration.
simon	0:1014af42efd9	193	The count variable holds the number of MAC operations performed */
simon	0:1014af42efd9	194	count = 1u;
simon	0:1014af42efd9	195
simon	0:1014af42efd9	196	/* Working pointer of inputA */
simon	0:1014af42efd9	197	px = pIn1;
simon	0:1014af42efd9	198
simon	0:1014af42efd9	199	/* Working pointer of inputB */
simon	0:1014af42efd9	200	pSrc1 = pIn2 + (srcBLen - 1u);
simon	0:1014af42efd9	201	py = pSrc1;
simon	0:1014af42efd9	202
simon	0:1014af42efd9	203	/* ------------------------
simon	0:1014af42efd9	204	* Stage1 process
simon	0:1014af42efd9	205	* ----------------------*/
simon	0:1014af42efd9	206
simon	0:1014af42efd9	207	/* The first stage starts here */
simon	0:1014af42efd9	208	while(blockSize1 > 0u)
simon	0:1014af42efd9	209	{
simon	0:1014af42efd9	210	/* Accumulator is made zero for every iteration */
simon	0:1014af42efd9	211	sum = 0.0f;
simon	0:1014af42efd9	212
simon	0:1014af42efd9	213	/* Apply loop unrolling and compute 4 MACs simultaneously. */
simon	0:1014af42efd9	214	k = count >> 2u;
simon	0:1014af42efd9	215
simon	0:1014af42efd9	216	/* First part of the processing with loop unrolling. Compute 4 MACs at a time.
simon	0:1014af42efd9	217	** a second loop below computes MACs for the remaining 1 to 3 samples. */
simon	0:1014af42efd9	218	while(k > 0u)
simon	0:1014af42efd9	219	{
simon	0:1014af42efd9	220	/* x[0] * y[srcBLen - 4] */
simon	0:1014af42efd9	221	sum += *px++ * *py++;
simon	0:1014af42efd9	222	/* x[1] * y[srcBLen - 3] */
simon	0:1014af42efd9	223	sum += *px++ * *py++;
simon	0:1014af42efd9	224	/* x[2] * y[srcBLen - 2] */
simon	0:1014af42efd9	225	sum += *px++ * *py++;
simon	0:1014af42efd9	226	/* x[3] * y[srcBLen - 1] */
simon	0:1014af42efd9	227	sum += *px++ * *py++;
simon	0:1014af42efd9	228
simon	0:1014af42efd9	229	/* Decrement the loop counter */
simon	0:1014af42efd9	230	k--;
simon	0:1014af42efd9	231	}
simon	0:1014af42efd9	232
simon	0:1014af42efd9	233	/* If the count is not a multiple of 4, compute any remaining MACs here.
simon	0:1014af42efd9	234	** No loop unrolling is used. */
simon	0:1014af42efd9	235	k = count % 0x4u;
simon	0:1014af42efd9	236
simon	0:1014af42efd9	237	while(k > 0u)
simon	0:1014af42efd9	238	{
simon	0:1014af42efd9	239	/* Perform the multiply-accumulate */
simon	0:1014af42efd9	240	/* x[0] * y[srcBLen - 1] */
simon	0:1014af42efd9	241	sum += *px++ * *py++;
simon	0:1014af42efd9	242
simon	0:1014af42efd9	243	/* Decrement the loop counter */
simon	0:1014af42efd9	244	k--;
simon	0:1014af42efd9	245	}
simon	0:1014af42efd9	246
simon	0:1014af42efd9	247	/* Store the result in the accumulator in the destination buffer. */
simon	0:1014af42efd9	248	*pOut = sum;
simon	0:1014af42efd9	249	/* Destination pointer is updated according to the address modifier, inc */
simon	0:1014af42efd9	250	pOut += inc;
simon	0:1014af42efd9	251
simon	0:1014af42efd9	252	/* Update the inputA and inputB pointers for next MAC calculation */
simon	0:1014af42efd9	253	py = pSrc1 - count;
simon	0:1014af42efd9	254	px = pIn1;
simon	0:1014af42efd9	255
simon	0:1014af42efd9	256	/* Increment the MAC count */
simon	0:1014af42efd9	257	count++;
simon	0:1014af42efd9	258
simon	0:1014af42efd9	259	/* Decrement the loop counter */
simon	0:1014af42efd9	260	blockSize1--;
simon	0:1014af42efd9	261	}
simon	0:1014af42efd9	262
simon	0:1014af42efd9	263	/* --------------------------
simon	0:1014af42efd9	264	* Initializations of stage2
simon	0:1014af42efd9	265	* ------------------------*/
simon	0:1014af42efd9	266
simon	0:1014af42efd9	267	/* sum = x[0] * y[0] + x[1] * y[1] +...+ x[srcBLen-1] * y[srcBLen-1]
simon	0:1014af42efd9	268	* sum = x[1] * y[0] + x[2] * y[1] +...+ x[srcBLen] * y[srcBLen-1]
simon	0:1014af42efd9	269	* ....
simon	0:1014af42efd9	270	* sum = x[srcALen-srcBLen] * y[0] + x[srcALen-srcBLen+1] * y[1] +...+ x[srcALen-1] * y[srcBLen-1]
simon	0:1014af42efd9	271	*/
simon	0:1014af42efd9	272
simon	0:1014af42efd9	273	/* Working pointer of inputA */
simon	0:1014af42efd9	274	px = pIn1;
simon	0:1014af42efd9	275
simon	0:1014af42efd9	276	/* Working pointer of inputB */
simon	0:1014af42efd9	277	py = pIn2;
simon	0:1014af42efd9	278
simon	0:1014af42efd9	279	/* count is the index by which the pointer pIn1 is to be incremented */
simon	0:1014af42efd9	280	count = 1u;
simon	0:1014af42efd9	281
simon	0:1014af42efd9	282	/* -------------------
simon	0:1014af42efd9	283	* Stage2 process
simon	0:1014af42efd9	284	* ------------------*/
simon	0:1014af42efd9	285
simon	0:1014af42efd9	286	/* Every output sample in stage 2 needs srcBLen MACs.
simon	0:1014af42efd9	287	* The blockSize2 loop is unrolled by 4 only when srcBLen is
simon	0:1014af42efd9	288	* greater than or equal to 4, so that the inner srcBLen loop can be unrolled as well */
simon	0:1014af42efd9	289	if(srcBLen >= 4u)
simon	0:1014af42efd9	290	{
simon	0:1014af42efd9	291	/* Loop unroll over blockSize2, by 4 */
simon	0:1014af42efd9	292	blkCnt = blockSize2 >> 2u;
simon	0:1014af42efd9	293
simon	0:1014af42efd9	294	while(blkCnt > 0u)
simon	0:1014af42efd9	295	{
simon	0:1014af42efd9	296	/* Set all accumulators to zero */
simon	0:1014af42efd9	297	acc0 = 0.0f;
simon	0:1014af42efd9	298	acc1 = 0.0f;
simon	0:1014af42efd9	299	acc2 = 0.0f;
simon	0:1014af42efd9	300	acc3 = 0.0f;
simon	0:1014af42efd9	301
simon	0:1014af42efd9	302	/* read x[0], x[1], x[2] samples */
simon	0:1014af42efd9	303	x0 = *(px++);
simon	0:1014af42efd9	304	x1 = *(px++);
simon	0:1014af42efd9	305	x2 = *(px++);
simon	0:1014af42efd9	306
simon	0:1014af42efd9	307	/* Apply loop unrolling and compute 4 MACs simultaneously. */
simon	0:1014af42efd9	308	k = srcBLen >> 2u;
simon	0:1014af42efd9	309
simon	0:1014af42efd9	310	/* First part of the processing with loop unrolling. Compute 4 MACs at a time.
simon	0:1014af42efd9	311	** a second loop below computes MACs for the remaining 1 to 3 samples. */
simon	0:1014af42efd9	312	do
simon	0:1014af42efd9	313	{
simon	0:1014af42efd9	314	/* Read y[0] sample */
simon	0:1014af42efd9	315	c0 = *(py++);
simon	0:1014af42efd9	316
simon	0:1014af42efd9	317	/* Read x[3] sample */
simon	0:1014af42efd9	318	x3 = *(px++);
simon	0:1014af42efd9	319
simon	0:1014af42efd9	320	/* Perform the multiply-accumulate */
simon	0:1014af42efd9	321	/* acc0 += x[0] * y[0] */
simon	0:1014af42efd9	322	acc0 += x0 * c0;
simon	0:1014af42efd9	323	/* acc1 += x[1] * y[0] */
simon	0:1014af42efd9	324	acc1 += x1 * c0;
simon	0:1014af42efd9	325	/* acc2 += x[2] * y[0] */
simon	0:1014af42efd9	326	acc2 += x2 * c0;
simon	0:1014af42efd9	327	/* acc3 += x[3] * y[0] */
simon	0:1014af42efd9	328	acc3 += x3 * c0;
simon	0:1014af42efd9	329
simon	0:1014af42efd9	330	/* Read y[1] sample */
simon	0:1014af42efd9	331	c0 = *(py++);
simon	0:1014af42efd9	332
simon	0:1014af42efd9	333	/* Read x[4] sample */
simon	0:1014af42efd9	334	x0 = *(px++);
simon	0:1014af42efd9	335
simon	0:1014af42efd9	336	/* Perform the multiply-accumulate */
simon	0:1014af42efd9	337	/* acc0 += x[1] * y[1] */
simon	0:1014af42efd9	338	acc0 += x1 * c0;
simon	0:1014af42efd9	339	/* acc1 += x[2] * y[1] */
simon	0:1014af42efd9	340	acc1 += x2 * c0;
simon	0:1014af42efd9	341	/* acc2 += x[3] * y[1] */
simon	0:1014af42efd9	342	acc2 += x3 * c0;
simon	0:1014af42efd9	343	/* acc3 += x[4] * y[1] */
simon	0:1014af42efd9	344	acc3 += x0 * c0;
simon	0:1014af42efd9	345
simon	0:1014af42efd9	346	/* Read y[2] sample */
simon	0:1014af42efd9	347	c0 = *(py++);
simon	0:1014af42efd9	348
simon	0:1014af42efd9	349	/* Read x[5] sample */
simon	0:1014af42efd9	350	x1 = *(px++);
simon	0:1014af42efd9	351
simon	0:1014af42efd9	352	/* Perform the multiply-accumulates */
simon	0:1014af42efd9	353	/* acc0 += x[2] * y[2] */
simon	0:1014af42efd9	354	acc0 += x2 * c0;
simon	0:1014af42efd9	355	/* acc1 += x[3] * y[2] */
simon	0:1014af42efd9	356	acc1 += x3 * c0;
simon	0:1014af42efd9	357	/* acc2 += x[4] * y[2] */
simon	0:1014af42efd9	358	acc2 += x0 * c0;
simon	0:1014af42efd9	359	/* acc3 += x[5] * y[2] */
simon	0:1014af42efd9	360	acc3 += x1 * c0;
simon	0:1014af42efd9	361
simon	0:1014af42efd9	362	/* Read y[3] sample */
simon	0:1014af42efd9	363	c0 = *(py++);
simon	0:1014af42efd9	364
simon	0:1014af42efd9	365	/* Read x[6] sample */
simon	0:1014af42efd9	366	x2 = *(px++);
simon	0:1014af42efd9	367
simon	0:1014af42efd9	368	/* Perform the multiply-accumulates */
simon	0:1014af42efd9	369	/* acc0 += x[3] * y[3] */
simon	0:1014af42efd9	370	acc0 += x3 * c0;
simon	0:1014af42efd9	371	/* acc1 += x[4] * y[3] */
simon	0:1014af42efd9	372	acc1 += x0 * c0;
simon	0:1014af42efd9	373	/* acc2 += x[5] * y[3] */
simon	0:1014af42efd9	374	acc2 += x1 * c0;
simon	0:1014af42efd9	375	/* acc3 += x[6] * y[3] */
simon	0:1014af42efd9	376	acc3 += x2 * c0;
simon	0:1014af42efd9	377
simon	0:1014af42efd9	378
simon	0:1014af42efd9	379	} while(--k);
simon	0:1014af42efd9	380
simon	0:1014af42efd9	381	/* If the srcBLen is not a multiple of 4, compute any remaining MACs here.
simon	0:1014af42efd9	382	** No loop unrolling is used. */
simon	0:1014af42efd9	383	k = srcBLen % 0x4u;
simon	0:1014af42efd9	384
simon	0:1014af42efd9	385	while(k > 0u)
simon	0:1014af42efd9	386	{
simon	0:1014af42efd9	387	/* Read y[4] sample */
simon	0:1014af42efd9	388	c0 = *(py++);
simon	0:1014af42efd9	389
simon	0:1014af42efd9	390	/* Read x[7] sample */
simon	0:1014af42efd9	391	x3 = *(px++);
simon	0:1014af42efd9	392
simon	0:1014af42efd9	393	/* Perform the multiply-accumulates */
simon	0:1014af42efd9	394	/* acc0 += x[4] * y[4] */
simon	0:1014af42efd9	395	acc0 += x0 * c0;
simon	0:1014af42efd9	396	/* acc1 += x[5] * y[4] */
simon	0:1014af42efd9	397	acc1 += x1 * c0;
simon	0:1014af42efd9	398	/* acc2 += x[6] * y[4] */
simon	0:1014af42efd9	399	acc2 += x2 * c0;
simon	0:1014af42efd9	400	/* acc3 += x[7] * y[4] */
simon	0:1014af42efd9	401	acc3 += x3 * c0;
simon	0:1014af42efd9	402
simon	0:1014af42efd9	403	/* Reuse the present samples for the next MAC */
simon	0:1014af42efd9	404	x0 = x1;
simon	0:1014af42efd9	405	x1 = x2;
simon	0:1014af42efd9	406	x2 = x3;
simon	0:1014af42efd9	407
simon	0:1014af42efd9	408	/* Decrement the loop counter */
simon	0:1014af42efd9	409	k--;
simon	0:1014af42efd9	410	}
simon	0:1014af42efd9	411
simon	0:1014af42efd9	412	/* Store the result in the accumulator in the destination buffer. */
simon	0:1014af42efd9	413	*pOut = acc0;
simon	0:1014af42efd9	414	/* Destination pointer is updated according to the address modifier, inc */
simon	0:1014af42efd9	415	pOut += inc;
simon	0:1014af42efd9	416
simon	0:1014af42efd9	417	*pOut = acc1;
simon	0:1014af42efd9	418	pOut += inc;
simon	0:1014af42efd9	419
simon	0:1014af42efd9	420	*pOut = acc2;
simon	0:1014af42efd9	421	pOut += inc;
simon	0:1014af42efd9	422
simon	0:1014af42efd9	423	*pOut = acc3;
simon	0:1014af42efd9	424	pOut += inc;
simon	0:1014af42efd9	425
simon	0:1014af42efd9	426	/* Update the inputA and inputB pointers for next MAC calculation */
simon	0:1014af42efd9	427	px = pIn1 + (count * 4u);
simon	0:1014af42efd9	428	py = pIn2;
simon	0:1014af42efd9	429
simon	0:1014af42efd9	430	/* Increment the pointer pIn1 index, count by 1 */
simon	0:1014af42efd9	431	count++;
simon	0:1014af42efd9	432
simon	0:1014af42efd9	433	/* Decrement the loop counter */
simon	0:1014af42efd9	434	blkCnt--;
simon	0:1014af42efd9	435	}
simon	0:1014af42efd9	436
simon	0:1014af42efd9	437	/* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.
simon	0:1014af42efd9	438	** No loop unrolling is used. */
simon	0:1014af42efd9	439	blkCnt = blockSize2 % 0x4u;
simon	0:1014af42efd9	440
simon	0:1014af42efd9	441	while(blkCnt > 0u)
simon	0:1014af42efd9	442	{
simon	0:1014af42efd9	443	/* Accumulator is made zero for every iteration */
simon	0:1014af42efd9	444	sum = 0.0f;
simon	0:1014af42efd9	445
simon	0:1014af42efd9	446	/* Apply loop unrolling and compute 4 MACs simultaneously. */
simon	0:1014af42efd9	447	k = srcBLen >> 2u;
simon	0:1014af42efd9	448
simon	0:1014af42efd9	449	/* First part of the processing with loop unrolling. Compute 4 MACs at a time.
simon	0:1014af42efd9	450	** a second loop below computes MACs for the remaining 1 to 3 samples. */
simon	0:1014af42efd9	451	while(k > 0u)
simon	0:1014af42efd9	452	{
simon	0:1014af42efd9	453	/* Perform the multiply-accumulates */
simon	0:1014af42efd9	454	sum += *px++ * *py++;
simon	0:1014af42efd9	455	sum += *px++ * *py++;
simon	0:1014af42efd9	456	sum += *px++ * *py++;
simon	0:1014af42efd9	457	sum += *px++ * *py++;
simon	0:1014af42efd9	458
simon	0:1014af42efd9	459	/* Decrement the loop counter */
simon	0:1014af42efd9	460	k--;
simon	0:1014af42efd9	461	}
simon	0:1014af42efd9	462
simon	0:1014af42efd9	463	/* If the srcBLen is not a multiple of 4, compute any remaining MACs here.
simon	0:1014af42efd9	464	** No loop unrolling is used. */
simon	0:1014af42efd9	465	k = srcBLen % 0x4u;
simon	0:1014af42efd9	466
simon	0:1014af42efd9	467	while(k > 0u)
simon	0:1014af42efd9	468	{
simon	0:1014af42efd9	469	/* Perform the multiply-accumulate */
simon	0:1014af42efd9	470	sum += *px++ * *py++;
simon	0:1014af42efd9	471
simon	0:1014af42efd9	472	/* Decrement the loop counter */
simon	0:1014af42efd9	473	k--;
simon	0:1014af42efd9	474	}
simon	0:1014af42efd9	475
simon	0:1014af42efd9	476	/* Store the result in the accumulator in the destination buffer. */
simon	0:1014af42efd9	477	*pOut = sum;
simon	0:1014af42efd9	478	/* Destination pointer is updated according to the address modifier, inc */
simon	0:1014af42efd9	479	pOut += inc;
simon	0:1014af42efd9	480
simon	0:1014af42efd9	481	/* Update the inputA and inputB pointers for next MAC calculation */
simon	0:1014af42efd9	482	px = pIn1 + count;
simon	0:1014af42efd9	483	py = pIn2;
simon	0:1014af42efd9	484
simon	0:1014af42efd9	485	/* Increment the pointer pIn1 index, count by 1 */
simon	0:1014af42efd9	486	count++;
simon	0:1014af42efd9	487
simon	0:1014af42efd9	488	/* Decrement the loop counter */
simon	0:1014af42efd9	489	blkCnt--;
simon	0:1014af42efd9	490	}
simon	0:1014af42efd9	491	}
simon	0:1014af42efd9	492	else
simon	0:1014af42efd9	493	{
simon	0:1014af42efd9	494	/* If srcBLen is less than 4,
simon	0:1014af42efd9	495	* the blockSize2 loop is not unrolled by 4 */
simon	0:1014af42efd9	496	blkCnt = blockSize2;
simon	0:1014af42efd9	497
simon	0:1014af42efd9	498	while(blkCnt > 0u)
simon	0:1014af42efd9	499	{
simon	0:1014af42efd9	500	/* Accumulator is made zero for every iteration */
simon	0:1014af42efd9	501	sum = 0.0f;
simon	0:1014af42efd9	502
simon	0:1014af42efd9	503	/* Loop over srcBLen */
simon	0:1014af42efd9	504	k = srcBLen;
simon	0:1014af42efd9	505
simon	0:1014af42efd9	506	while(k > 0u)
simon	0:1014af42efd9	507	{
simon	0:1014af42efd9	508	/* Perform the multiply-accumulate */
simon	0:1014af42efd9	509	sum += *px++ * *py++;
simon	0:1014af42efd9	510
simon	0:1014af42efd9	511	/* Decrement the loop counter */
simon	0:1014af42efd9	512	k--;
simon	0:1014af42efd9	513	}
simon	0:1014af42efd9	514
simon	0:1014af42efd9	515	/* Store the result in the accumulator in the destination buffer. */
simon	0:1014af42efd9	516	*pOut = sum;
simon	0:1014af42efd9	517	/* Destination pointer is updated according to the address modifier, inc */
simon	0:1014af42efd9	518	pOut += inc;
simon	0:1014af42efd9	519
simon	0:1014af42efd9	520	/* Update the inputA and inputB pointers for next MAC calculation */
simon	0:1014af42efd9	521	px = pIn1 + count;
simon	0:1014af42efd9	522	py = pIn2;
simon	0:1014af42efd9	523
simon	0:1014af42efd9	524	/* Increment the pointer pIn1 index, count by 1 */
simon	0:1014af42efd9	525	count++;
simon	0:1014af42efd9	526
simon	0:1014af42efd9	527	/* Decrement the loop counter */
simon	0:1014af42efd9	528	blkCnt--;
simon	0:1014af42efd9	529	}
simon	0:1014af42efd9	530	}
simon	0:1014af42efd9	531
simon	0:1014af42efd9	532	/* --------------------------
simon	0:1014af42efd9	533	* Initializations of stage3
simon	0:1014af42efd9	534	* -------------------------*/
simon	0:1014af42efd9	535
simon	0:1014af42efd9	536	/* sum = x[srcALen-srcBLen+1] * y[0] + x[srcALen-srcBLen+2] * y[1] +...+ x[srcALen-1] * y[srcBLen-2]
simon	0:1014af42efd9	537	* sum = x[srcALen-srcBLen+2] * y[0] + x[srcALen-srcBLen+3] * y[1] +...+ x[srcALen-1] * y[srcBLen-3]
simon	0:1014af42efd9	538	* ....
simon	0:1014af42efd9	539	* sum = x[srcALen-2] * y[0] + x[srcALen-1] * y[1]
simon	0:1014af42efd9	540	* sum = x[srcALen-1] * y[0]
simon	0:1014af42efd9	541	*/
simon	0:1014af42efd9	542
simon	0:1014af42efd9	543	/* In this stage the MAC operations are decreased by 1 for every iteration.
simon	0:1014af42efd9	544	The count variable holds the number of MAC operations performed */
simon	0:1014af42efd9	545	count = srcBLen - 1u;
simon	0:1014af42efd9	546
simon	0:1014af42efd9	547	/* Working pointer of inputA */
simon	0:1014af42efd9	548	pSrc1 = pIn1 + (srcALen - (srcBLen - 1u));
simon	0:1014af42efd9	549	px = pSrc1;
simon	0:1014af42efd9	550
simon	0:1014af42efd9	551	/* Working pointer of inputB */
simon	0:1014af42efd9	552	py = pIn2;
simon	0:1014af42efd9	553
simon	0:1014af42efd9	554	/* -------------------
simon	0:1014af42efd9	555	* Stage3 process
simon	0:1014af42efd9	556	* ------------------*/
simon	0:1014af42efd9	557
simon	0:1014af42efd9	558	while(blockSize3 > 0u)
simon	0:1014af42efd9	559	{
simon	0:1014af42efd9	560	/* Accumulator is made zero for every iteration */
simon	0:1014af42efd9	561	sum = 0.0f;
simon	0:1014af42efd9	562
simon	0:1014af42efd9	563	/* Apply loop unrolling and compute 4 MACs simultaneously. */
simon	0:1014af42efd9	564	k = count >> 2u;
simon	0:1014af42efd9	565
simon	0:1014af42efd9	566	/* First part of the processing with loop unrolling. Compute 4 MACs at a time.
simon	0:1014af42efd9	567	** a second loop below computes MACs for the remaining 1 to 3 samples. */
simon	0:1014af42efd9	568	while(k > 0u)
simon	0:1014af42efd9	569	{
simon	0:1014af42efd9	570	/* Perform the multiply-accumulates */
simon	0:1014af42efd9	571	/* sum += x[srcALen - srcBLen + 1] * y[0] */
simon	0:1014af42efd9	572	sum += *px++ * *py++;
simon	0:1014af42efd9	573	/* sum += x[srcALen - srcBLen + 2] * y[1] */
simon	0:1014af42efd9	574	sum += *px++ * *py++;
simon	0:1014af42efd9	575	/* sum += x[srcALen - srcBLen + 3] * y[2] */
simon	0:1014af42efd9	576	sum += *px++ * *py++;
simon	0:1014af42efd9	577	/* sum += x[srcALen - srcBLen + 4] * y[3] */
simon	0:1014af42efd9	578	sum += *px++ * *py++;
simon	0:1014af42efd9	579
simon	0:1014af42efd9	580	/* Decrement the loop counter */
simon	0:1014af42efd9	581	k--;
simon	0:1014af42efd9	582	}
simon	0:1014af42efd9	583
simon	0:1014af42efd9	584	/* If the count is not a multiple of 4, compute any remaining MACs here.
simon	0:1014af42efd9	585	** No loop unrolling is used. */
simon	0:1014af42efd9	586	k = count % 0x4u;
simon	0:1014af42efd9	587
simon	0:1014af42efd9	588	while(k > 0u)
simon	0:1014af42efd9	589	{
simon	0:1014af42efd9	590	/* Perform the multiply-accumulates */
simon	0:1014af42efd9	591	sum += *px++ * *py++;
simon	0:1014af42efd9	592
simon	0:1014af42efd9	593	/* Decrement the loop counter */
simon	0:1014af42efd9	594	k--;
simon	0:1014af42efd9	595	}
simon	0:1014af42efd9	596
simon	0:1014af42efd9	597	/* Store the result in the accumulator in the destination buffer. */
simon	0:1014af42efd9	598	*pOut = sum;
simon	0:1014af42efd9	599	/* Destination pointer is updated according to the address modifier, inc */
simon	0:1014af42efd9	600	pOut += inc;
simon	0:1014af42efd9	601
simon	0:1014af42efd9	602	/* Update the inputA and inputB pointers for next MAC calculation */
simon	0:1014af42efd9	603	px = ++pSrc1;
simon	0:1014af42efd9	604	py = pIn2;
simon	0:1014af42efd9	605
simon	0:1014af42efd9	606	/* Decrement the MAC count */
simon	0:1014af42efd9	607	count--;
simon	0:1014af42efd9	608
simon	0:1014af42efd9	609	/* Decrement the loop counter */
simon	0:1014af42efd9	610	blockSize3--;
simon	0:1014af42efd9	611	}
simon	0:1014af42efd9	612
simon	0:1014af42efd9	613	}
simon	0:1014af42efd9	614
simon	0:1014af42efd9	615	/**
simon	0:1014af42efd9	616	* @} end of Corr group
simon	0:1014af42efd9	617	*/
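
The output layout described in the listing can be cross-checked against a direct, unoptimized correlation. The following is a minimal sketch, not CMSIS code: the name ref_correlate_f32 is hypothetical, plain float stands in for float32_t so that arm_math.h is not needed, and the index formula in the comment is derived from the three stages above rather than taken from the library documentation. Unlike the listing, which as far as it shows stores explicit zeros only in the srcALen >= srcBLen branch, this sketch writes all 2 * max(aLen, bLen) - 1 entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper, not part of CMSIS.
 * Direct correlation with the output layout used by the listing:
 *   dst[n] = sum over j of a[j + n - (maxLen - 1)] * b[j]
 * where maxLen = max(aLen, bLen) and n = 0 .. 2 * maxLen - 2.
 * Products whose index into a falls outside 0 .. aLen - 1 are skipped. */
void ref_correlate_f32(const float *a, size_t aLen,
                       const float *b, size_t bLen,
                       float *dst)
{
    assert(a != NULL && b != NULL && dst != NULL);
    assert(aLen > 0 && bLen > 0);

    size_t maxLen = (aLen > bLen) ? aLen : bLen;
    size_t outLen = 2 * maxLen - 1;

    for (size_t n = 0; n < outLen; n++) {
        /* Signed lag of this output sample: -(maxLen - 1) .. +(maxLen - 1) */
        ptrdiff_t lag = (ptrdiff_t)n - (ptrdiff_t)(maxLen - 1);
        float sum = 0.0f;

        for (size_t j = 0; j < bLen; j++) {
            ptrdiff_t i = (ptrdiff_t)j + lag;   /* index into a */
            if (i >= 0 && (size_t)i < aLen) {
                sum += a[i] * b[j];
            }
        }

        /* Stays 0.0f when the two sequences do not overlap at this lag */
        dst[n] = sum;
    }
}
```

With a = {1, 2, 3} and b = {1, 1} the sketch produces {0, 1, 3, 5, 3}; swapping the inputs produces {3, 5, 3, 1, 0}, the same values in reverse order, which matches the CORR(x, y) = Reverse order(CORR(y, x)) comment in the listing.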

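The three-stage split used in the listing (blockSize1, blockSize2, blockSize3) follows from how many products contribute to each computed output sample. The sketch below is an illustration only: macs_for_output is a hypothetical helper that does not exist in CMSIS, and it assumes aLen >= bLen >= 1, which is the situation the listing creates by swapping the inputs when srcBLen > srcALen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of CMSIS.
 * Returns the number of multiply-accumulates needed for the k-th computed
 * output sample, k = 0 .. aLen + bLen - 2, assuming aLen >= bLen >= 1. */
size_t macs_for_output(size_t k, size_t aLen, size_t bLen)
{
    assert(bLen >= 1 && aLen >= bLen);
    assert(k < aLen + bLen - 1);

    size_t blockSize1 = bLen - 1;           /* stage 1: overlap grows     */
    size_t blockSize2 = aLen - (bLen - 1);  /* stage 2: full overlap      */

    if (k < blockSize1) {
        return k + 1;                       /* 1, 2, ..., bLen - 1        */
    }
    if (k < blockSize1 + blockSize2) {
        return bLen;                        /* bLen MACs for every sample */
    }
    return (aLen + bLen - 1) - k;           /* stage 3: bLen - 1, ..., 1  */
}
```

For aLen = 5 and bLen = 3 the per-sample counts are 1, 2, 3, 3, 3, 2, 1, which sum to 15 = aLen * bLen: every pair of input samples is multiplied exactly once across the three stages.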