Android Notes

Notes on Android-related things I can never keep in my head.

Flashing new images with fastboot

  1. Reboot into the bootloader

    adb reboot-bootloader
    
  2. Check that the device is detected

    fastboot devices
    
  3. Flash your new images

    fastboot flash boot boot.img
    fastboot flash system system.img
    
  4. Erase userdata and cache

    fastboot erase userdata
    fastboot erase cache
    
  5. Reboot the phone

    fastboot reboot
    

HTC phone notes

Updating HBOOT on the HTC One X

  • 1. Download the required firmware

    The CID used in Taiwan is HTC__621, so download the firmware from here;
    once the download finishes, rename the file to firmware.zip

  • 2. Relock the phone

    fastboot oem lock
    
  • 3. Update the firmware

    # Enter the black HTC RUU screen
    fastboot oem rebootRUU
    # Flash your hboot.zip file
    fastboot flash zip zipname.zip
    
  • 4. Reboot back into the bootloader

    fastboot reboot-bootloader
    

Unlocking an HTC phone

After obtaining Unlock_code.bin, boot into fastboot and run

fastboot flash unlocktoken Unlock_code.bin
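
Unlock_code.bin comes from the HTCDev unlock service. A sketch of the usual flow
(assuming the standard htcdev.com procedure): dump the identifier token in
fastboot, submit it on the site, and the token file is mailed back.

fastboot oem get_identifier_token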

Goldfish kernel versions and the Android releases they match

  • 2.6.27

    Anything before Gingerbread. (<= Android 2.2)

  • 2.6.29

    For Gingerbread (2.3) until JellyBean MR2 (4.2.x)

  • 3.4

    For KitKat (4.4.x)

  • 3.10

    For L, which isn't completed or named yet (> 4.4.x)
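
To fetch the matching kernel source, clone the goldfish kernel repository and
check out the branch for the version listed above; a sketch, assuming the usual
android-goldfish-<version> branch naming:

git clone https://android.googlesource.com/kernel/goldfish
cd goldfish
git checkout -b android-goldfish-3.4 origin/android-goldfish-3.4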

Differences between commonly used source trees

AOSP

CyanogenMod

Code Aurora

Code Aurora mainly provides source for devices built around Qualcomm SoCs. The
purpose of each Code Aurora branch can be seen here; for example,
ics_strawberry_rb2 is the branch for the Qualcomm msm8625. (The msm8225 and
msm8625 differ in that the former only supports UMTS while the latter adds CDMA
support; see here for details.)
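
To fetch one of those branches, the usual approach is to repo init against the
Code Aurora manifest; a sketch, assuming the manifest URL Code Aurora used at
the time:

repo init -u git://codeaurora.org/platform/manifest.git -b ics_strawberry_rb2
repo sync -j4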

In the Code Aurora tree, the msm8625 reuses the msm7627a configuration, like so:

include device/qcom/msm7627a/msm7627a.mk

PRODUCT_NAME := msm8625
PRODUCT_DEVICE := msm8625

Overriding entries in an Android repo manifest

When using the repo command, a manifest cannot contain two entries for the same
project. To override one, use remove-project to drop the original definition and
then declare the replacement project:

<remove-project path="external/tinyalsa" name="platform/external/tinyalsa"/>
<project path="external/tinyalsa" name="platform/external/tinyalsa" groups="pdk" />
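
A sketch of where such an override lives, assuming a repo version that reads
.repo/local_manifests/ (older versions use a single .repo/local_manifest.xml):

mkdir -p .repo/local_manifests
# wrap the remove-project/project pair above in a <manifest> element
vi .repo/local_manifests/override.xml
repo sync external/tinyalsa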

ref: http://xda-university.com/as-a-developer/repo-tips-tricks

Launching the Android AVD Manager / SDK Manager

The following command launches the Android AVD Manager

android avd

The following command launches the Android SDK Manager

android sdk

Unpacking and repacking system.img

To modify an Android system.img you first need two tools, simg2img and
make_ext4fs. They can be obtained here, or found under host/linux-x86/bin
after building Android.

  • Convert system.img from a sparse image into a regular ext4 image

    If your device configuration does not contain the setting below, or the
    system.img cannot be mounted directly with the mount command, it is most
    likely a sparse image.

    # we don't support sparse image.
    TARGET_USERIMAGES_SPARSE_EXT_DISABLED := true
    

    To mount a sparse image, first convert it into a regular ext4 filesystem
    image with the simg2img tool, like this:

    simg2img system.img system.img.raw
    
  • Mount system.img

    The image can then be mounted with an ordinary mount command; the following
    mounts it onto a system directory:

    sudo mount -t ext4 -o loop system.img.raw system
    
  • Repack the system directory into a system.img file

    sudo make_ext4fs -s -l 512M -a system system_new.img system
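
    Because of the -s flag, make_ext4fs emits a sparse image again, so to
    inspect the repacked file convert it back first; a quick sanity-check
    sketch:

    mkdir -p mnt
    simg2img system_new.img system_new.raw
    sudo mount -t ext4 -o loop system_new.raw mnt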
    

If you want to build the tools mentioned above yourself, follow these steps:

Building simg2img

  • Download the source code

    git clone https://android.googlesource.com/platform/system/core
    
  • Check out the git branch

    cd core
    git checkout origin/jb-mr2.0.0-release
    
  • Build simg2img

    gcc -o simg2img -static\
        -Iinclude simg2img.c sparse_crc32.c backed_block.c \
        output_file.c sparse.c sparse_err.c sparse_read.c -lz
    

Building make_ext4fs

  • Download the source code

    git clone https://android.googlesource.com/platform/system/extras
    

    Note: when building this, make sure the extras and core directories sit
    side by side.

  • Check out the git branch

    cd extras
    git checkout origin/jb-mr2.0.0-release
    
  • Build make_ext4fs

    gcc -o make_ext4fs -static \
        -Icore/libsparse/include extras/ext4_utils/make_ext4fs_main.c extras/ext4_utils/make_ext4fs.c \
        extras/ext4_utils/ext4fixup.c extras/ext4_utils/ext4_utils.c extras/ext4_utils/allocate.c \
        extras/ext4_utils/contents.c extras/ext4_utils/extent.c extras/ext4_utils/indirect.c \
        extras/ext4_utils/uuid.c extras/ext4_utils/sha1.c extras/ext4_utils/wipe.c core/libsparse/backed_block.c \
        core/libsparse/output_file.c core/libsparse/sparse.c core/libsparse/sparse_crc32.c core/libsparse/sparse_err.c \
        core/libsparse/sparse_read.c -lz
    

Converting uramdisk.img to ramdisk.img.gz

dd if=uramdisk.img of=ramdisk.img.gz skip=64 bs=1
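
The skip=64 drops the 64-byte U-Boot uImage header. To then look inside the
ramdisk, a sketch assuming it is the usual gzipped cpio archive:

mkdir ramdisk && cd ramdisk
gunzip -c ../ramdisk.img.gz | cpio -i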

Scaling the emulator window with adb

adb emu window scale 0.6

Taking a screenshot from the command line

Android ships a screencap command that can be used to take screenshots.

screencap -p /data/screenshot.png
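
When working from a host machine, the same command plus copying the file back
looks like this:

adb shell screencap -p /data/screenshot.png
adb pull /data/screenshot.png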

Remounting the /system directory read-write

*Root access is required*

mount -o rw,remount /system

repo commands

Generate a repo manifest that pins every project to its current revision

repo manifest -r -o default.xml
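
To build from that pinned manifest later, one way (a sketch) is to drop it into
the manifest directory and re-init against it:

cp default.xml .repo/manifests/pinned.xml
repo init -m pinned.xml
repo sync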

Patch collection

Assorted Android patches, collected here to make building and modifying Android easier.
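
These are git format-patch files, so one way to apply them (a sketch; change
into whichever project directory the patch touches, e.g. build/ for the first
one below) is:

cd build
git am /path/to/build-add-TARGET_SPECIFIC_HEADER_PATH-for-device-own.patch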

Android build system

build-add-TARGET_SPECIFIC_HEADER_PATH-for-device-own.patch

After applying this patch you can point BoardConfig.mk at an extra header
directory, configured like this:

TARGET_SPECIFIC_HEADER_PATH += device/htc/endeavoru/include

From 9595e56c15c29469bcbae68e969b1ae1062131de Mon Sep 17 00:00:00 2001
From: River Zhou <riverzhou2000@gmail.com>
Date: Mon, 8 Oct 2012 05:52:13 +0800
Subject: [PATCH 1/1] build : add TARGET_SPECIFIC_HEADER_PATH for device own
 headers

Change-Id: I27ba9d0d69182f87b61f994c347bfd160f5fdea2
---
 core/binary.mk | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/core/binary.mk b/core/binary.mk
index 791623b..f3c030e 100644
--- a/core/binary.mk
+++ b/core/binary.mk
@@ -553,6 +553,11 @@ all_objects := \
     $(proto_generated_objects) \
     $(addprefix $(TOPDIR)$(LOCAL_PATH)/,$(LOCAL_PREBUILT_OBJ_FILES))

+## Allow a device's own headers to take precedence over global ones
+ifneq ($(TARGET_SPECIFIC_HEADER_PATH),)
+LOCAL_C_INCLUDES := $(TOPDIR)$(TARGET_SPECIFIC_HEADER_PATH) $(LOCAL_C_INCLUDES)
+endif
+
 LOCAL_C_INCLUDES += $(TOPDIR)$(LOCAL_PATH) $(intermediates)

 ifndef LOCAL_SDK_VERSION
--
1.7.9.5

frameworks

frameworks-base-services-fix-screenshot-rotation.patch

From d8224a14092600d35dc8e7b372a5f6064a16fb9d Mon Sep 17 00:00:00 2001
From: River Zhou <riverzhou2000@gmail.com>
Date: Sat, 24 Nov 2012 21:03:33 +0800
Subject: [PATCH 1/1] frameworks base services fix screenshot rotation

Change-Id: I4c1a0dfef82a4d996874e7f971cec8805a77974e
---
 .../systemui/screenshot/GlobalScreenshot.java | 5 +++-
 .../android/server/wm/ScreenRotationAnimation.java | 27 +++++++++++++++-----
 .../android/server/wm/WindowManagerService.java | 2 ++
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/packages/SystemUI/src/com/android/systemui/screenshot/GlobalScreenshot.java b/packages/SystemUI/src/com/android/systemui/screenshot/GlobalScreenshot.java
index f25ac0d..1a860d9 100644
--- a/packages/SystemUI/src/com/android/systemui/screenshot/GlobalScreenshot.java
+++ b/packages/SystemUI/src/com/android/systemui/screenshot/GlobalScreenshot.java
@@ -381,7 +381,10 @@ class GlobalScreenshot {
         // only in the natural orientation of the device :!)
         mDisplay.getRealMetrics(mDisplayMetrics);
         float[] dims = {mDisplayMetrics.widthPixels, mDisplayMetrics.heightPixels};
- float degrees = getDegreesForRotation(mDisplay.getRotation());
+ int rot = mDisplay.getRotation();
+ // Allow for abnormal hardware orientation
+ rot = (rot + (android.os.SystemProperties.getInt("ro.sf.hwrotation",0) / 90 )) % 4;
+ float degrees = getDegreesForRotation(rot);
         boolean requiresRotation = (degrees > 0);
         if (requiresRotation) {
             // Get the dimensions of the device in its native orientation
diff --git a/services/java/com/android/server/wm/ScreenRotationAnimation.java b/services/java/com/android/server/wm/ScreenRotationAnimation.java
index 8d2e2e8..661cff8 100644
--- a/services/java/com/android/server/wm/ScreenRotationAnimation.java
+++ b/services/java/com/android/server/wm/ScreenRotationAnimation.java
@@ -50,6 +50,7 @@ class ScreenRotationAnimation {
     int mWidth, mHeight;
     int mExitAnimId, mEnterAnimId;

+ int mSnapshotRotation;
     int mOriginalRotation;
     int mOriginalWidth, mOriginalHeight;
     int mCurRotation;
@@ -196,14 +197,26 @@ class ScreenRotationAnimation {
         mExitAnimId = exitAnim;
         mEnterAnimId = enterAnim;

- // Screenshot does NOT include rotation!
- if (originalRotation == Surface.ROTATION_90
+ // Allow for abnormal hardware orientation
+ mSnapshotRotation = (4 - android.os.SystemProperties.getInt("ro.sf.hwrotation",0) / 90) % 4;
+ if (mSnapshotRotation == Surface.ROTATION_0 || mSnapshotRotation == Surface.ROTATION_180) {
+ if (originalRotation == Surface.ROTATION_90
                 || originalRotation == Surface.ROTATION_270) {
- mWidth = originalHeight;
- mHeight = originalWidth;
+ mWidth = originalHeight;
+ mHeight = originalWidth;
+ } else {
+ mWidth = originalWidth;
+ mHeight = originalHeight;
+ }
         } else {
- mWidth = originalWidth;
- mHeight = originalHeight;
+ if (originalRotation == Surface.ROTATION_90
+ || originalRotation == Surface.ROTATION_270) {
+ mWidth = originalWidth;
+ mHeight = originalHeight;
+ } else {
+ mWidth = originalHeight;
+ mHeight = originalWidth;
+ }
         }

         mOriginalRotation = originalRotation;
@@ -313,7 +326,7 @@ class ScreenRotationAnimation {
         // Compute the transformation matrix that must be applied
         // to the snapshot to make it stay in the same original position
         // with the current screen rotation.
- int delta = deltaRotation(rotation, Surface.ROTATION_0);
+ int delta = deltaRotation(rotation, mSnapshotRotation);
         createRotationMatrix(delta, mWidth, mHeight, mSnapshotInitialMatrix);

         if (DEBUG_STATE) Slog.v(TAG, "**** ROTATION: " + delta);
diff --git a/services/java/com/android/server/wm/WindowManagerService.java b/services/java/com/android/server/wm/WindowManagerService.java
index 51edb44..f2f3f8a 100755
--- a/services/java/com/android/server/wm/WindowManagerService.java
+++ b/services/java/com/android/server/wm/WindowManagerService.java
@@ -5831,6 +5831,8 @@ public class WindowManagerService extends IWindowManager.Stub

             // The screenshot API does not apply the current screen rotation.
             rot = getDefaultDisplayContentLocked().getDisplay().getRotation();
+ // Allow for abnormal hardware orientation
+ rot = (rot + (android.os.SystemProperties.getInt("ro.sf.hwrotation",0) / 90 )) % 4;
             int fw = frame.width();
             int fh = frame.height();

bionic

0001-bionic-libc-linaro-ARMv7-optimized-string-handling-r.patch

From 46585a1e55de15b666e0855b986cf12484ee8e27 Mon Sep 17 00:00:00 2001
From: River Zhou <riverzhou2000@gmail.com>
Date: Sat, 6 Oct 2012 14:05:12 +0800
Subject: [PATCH 1/4] bionic: libc: linaro: ARMv7 optimized string handling
 routines

---
 libc/Android.mk | 30 ++-
 libc/arch-arm/bionic/armv7/bzero.S | 102 ++++++++++
 libc/arch-arm/bionic/armv7/memchr.S | 150 ++++++++++++++
 libc/arch-arm/bionic/armv7/memcpy.S | 374 +++++++++++++++++++++++++++++++++++
 libc/arch-arm/bionic/armv7/memset.S | 119 +++++++++++
 libc/arch-arm/bionic/armv7/strchr.S | 77 ++++++++
 libc/arch-arm/bionic/armv7/strcpy.c | 179 +++++++++++++++++
 libc/arch-arm/bionic/armv7/strlen.S | 113 +++++++++++
 8 files changed, 1138 insertions(+), 6 deletions(-)
 create mode 100644 libc/arch-arm/bionic/armv7/bzero.S
 create mode 100644 libc/arch-arm/bionic/armv7/memchr.S
 create mode 100644 libc/arch-arm/bionic/armv7/memcpy.S
 create mode 100644 libc/arch-arm/bionic/armv7/memset.S
 create mode 100644 libc/arch-arm/bionic/armv7/strchr.S
 create mode 100644 libc/arch-arm/bionic/armv7/strcpy.c
 create mode 100644 libc/arch-arm/bionic/armv7/strlen.S

diff --git a/libc/Android.mk b/libc/Android.mk
index 7d31340..1ffc3ca 100644
--- a/libc/Android.mk
+++ b/libc/Android.mk
@@ -178,14 +178,12 @@ libc_common_src_files := \
  stdlib/wchar.c \
  string/index.c \
  string/memccpy.c \
- string/memchr.c \
  string/memmem.c \
  string/memrchr.c \
  string/memswap.c \
  string/strcasecmp.c \
  string/strcasestr.c \
  string/strcat.c \
- string/strchr.c \
  string/strcoll.c \
  string/strcspn.c \
  string/strdup.c \
@@ -362,12 +360,8 @@ libc_common_src_files += \
  arch-arm/bionic/tgkill.S \
  arch-arm/bionic/memcmp.S \
  arch-arm/bionic/memcmp16.S \
- arch-arm/bionic/memcpy.S \
- arch-arm/bionic/memset.S \
  arch-arm/bionic/setjmp.S \
  arch-arm/bionic/sigsetjmp.S \
- arch-arm/bionic/strlen.c.arm \
- arch-arm/bionic/strcpy.S \
  arch-arm/bionic/strcmp.S \
  arch-arm/bionic/syscall.S \
  string/memmove.c.arm \
@@ -394,8 +388,32 @@ libc_arch_static_src_files := \

 libc_arch_dynamic_src_files := \
  arch-arm/bionic/exidx_dynamic.c
+
+ifeq ($(ARCH_ARM_HAVE_ARMV7A),true)
+libc_common_src_files += \
+ arch-arm/bionic/armv7/memchr.S \
+ arch-arm/bionic/armv7/memcpy.S \
+ arch-arm/bionic/armv7/memset.S \
+ arch-arm/bionic/armv7/bzero.S \
+ arch-arm/bionic/armv7/strchr.S \
+ arch-arm/bionic/armv7/strcpy.c \
+ arch-arm/bionic/armv7/strlen.S
+else
+libc_common_src_files += \
+ string/memchr.c \
+ arch-arm/bionic/memcpy.S \
+ arch-arm/bionic/memset.S \
+ string/strchr.c \
+ arch-arm/bionic/strcpy.S \
+ arch-arm/bionic/strlen.c.arm
+endif
+
 else # !arm

+libc_common_src_files += \
+ string/memchr.c \
+ string/strchr.c
+
 ifeq ($(TARGET_ARCH),x86)
 libc_common_src_files += \
  arch-x86/bionic/__get_sp.S \
diff --git a/libc/arch-arm/bionic/armv7/bzero.S b/libc/arch-arm/bionic/armv7/bzero.S
new file mode 100644
index 0000000..5230ba5
--- /dev/null
+++ b/libc/arch-arm/bionic/armv7/bzero.S
@@ -0,0 +1,102 @@
+/* Copyright (c) 2010-2011, Linaro Limited
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ * Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+
+ * Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+
+ * Neither the name of Linaro Limited nor the names of its
+ contributors may be used to endorse or promote products derived
+ from this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ memset Written by Dave Gilbert <david.gilbert@linaro.org>
+ Adapted to bzero and Bionic by Bernhard Rosenkraenzer <Bernhard.Rosenkranzer@linaro.org>
+
+ This memset routine is optimised on a Cortex-A9 and should work on
+ all ARMv7 processors. */
+
+#include <machine/asm.h>
+
+ .syntax unified
+ .arch armv7-a
+ .text
+ .thumb
+
+@ ---------------------------------------------------------------------------
+ .thumb_func
+ .p2align 4,,15
+ENTRY(bzero)
+ @ r0 = address
+ @ r1 = count
+ @ Doesn't return anything
+
+ cbz r1, 10f @ Exit if 0 length
+ mov r2, #0
+
+ tst r0, #7
+ beq 2f @ Already aligned
+
+ @ Ok, so we're misaligned here
+1:
+ strb r2, [r0], #1
+ subs r1,r1,#1
+ tst r0, #7
+ cbz r1, 10f @ Exit if we hit the end
+ bne 1b @ go round again if still misaligned
+
+2:
+ @ OK, so we're aligned
+ push {r4,r5,r6,r7}
+ bics r4, r1, #15 @ if less than 16 bytes then need to finish it off
+ beq 5f
+
+3:
+ mov r5,r2
+ mov r6,r2
+ mov r7,r2
+
+4:
+ subs r4,r4,#16
+ stmia r0!,{r2,r5,r6,r7}
+ bne 4b
+ and r1,r1,#15
+
+ @ At this point we're still aligned and we have upto align-1 bytes left to right
+ @ we can avoid some of the byte-at-a time now by testing for some big chunks
+ tst r1,#8
+ itt ne
+ subne r1,r1,#8
+ stmiane r0!,{r2,r5}
+
+5:
+ pop {r4,r5,r6,r7}
+ cbz r1, 10f
+
+ @ Got to do any last < alignment bytes
+6:
+ subs r1,r1,#1
+ strb r2,[r0],#1
+ bne 6b
+
+10:
+ bx lr @ goodbye
+END(bzero)
diff --git a/libc/arch-arm/bionic/armv7/memchr.S b/libc/arch-arm/bionic/armv7/memchr.S
new file mode 100644
index 0000000..02e59b9
--- /dev/null
+++ b/libc/arch-arm/bionic/armv7/memchr.S
@@ -0,0 +1,150 @@
+/* Copyright (c) 2010-2011, Linaro Limited
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ * Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+
+ * Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+
+ * Neither the name of Linaro Limited nor the names of its
+ contributors may be used to endorse or promote products derived
+ from this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ Written by Dave Gilbert <david.gilbert@linaro.org>
+ Adapted to Bionic by Bernhard Rosenkraenzer <Bernhard.Rosenkranzer@linaro.org>
+
+ This memchr routine is optimised on a Cortex-A9 and should work on
+ all ARMv7 processors. It has a fast past for short sizes, and has
+ an optimised path for large data sets; the worst case is finding the
+ match early in a large data set. */
+
+#include <machine/asm.h>
+
+@ 2011-02-07 david.gilbert@linaro.org
+@ Extracted from local git a5b438d861
+@ 2011-07-14 david.gilbert@linaro.org
+@ Import endianness fix from local git ea786f1b
+@ 2011-12-07 david.gilbert@linaro.org
+@ Removed unneeded cbz from align loop
+
+ .syntax unified
+ .arch armv7-a
+
+@ this lets us check a flag in a 00/ff byte easily in either endianness
+#ifdef __ARMEB__
+#define CHARTSTMASK(c) 1<<(31-(c*8))
+#else
+#define CHARTSTMASK(c) 1<<(c*8)
+#endif
+
+@ ---------------------------------------------------------------------------
+ .thumb_func
+ .p2align 4,,15
+ENTRY(memchr)
+ @ r0 = start of memory to scan
+ @ r1 = character to look for
+ @ r2 = length
+ @ returns r0 = pointer to character or NULL if not found
+ and r1,r1,#0xff @ Don't think we can trust the caller to actually pass a char
+
+ cmp r2,#16 @ If it's short don't bother with anything clever
+ blt 20f
+
+ tst r0, #7 @ If it's already aligned skip the next bit
+ beq 10f
+
+ @ Work up to an aligned point
+5:
+ ldrb r3, [r0],#1
+ subs r2, r2, #1
+ cmp r3, r1
+ beq 50f @ If it matches exit found
+ tst r0, #7
+ bne 5b @ If not aligned yet then do next byte
+
+10:
+ @ At this point, we are aligned, we know we have at least 8 bytes to work with
+ push {r4,r5,r6,r7}
+ orr r1, r1, r1, lsl #8 @ expand the match word across to all bytes
+ orr r1, r1, r1, lsl #16
+ bic r4, r2, #7 @ Number of double words to work with
+ mvns r7, #0 @ all F's
+ movs r3, #0
+
+15:
+ ldmia r0!,{r5,r6}
+ subs r4, r4, #8
+ eor r5,r5, r1 @ Get it so that r5,r6 have 00's where the bytes match the target
+ eor r6,r6, r1
+ uadd8 r5, r5, r7 @ Parallel add 0xff - sets the GE bits for anything that wasn't 0
+ sel r5, r3, r7 @ bytes are 00 for none-00 bytes, or ff for 00 bytes - NOTE INVERSION
+ uadd8 r6, r6, r7 @ Parallel add 0xff - sets the GE bits for anything that wasn't 0
+ sel r6, r5, r7 @ chained....bytes are 00 for none-00 bytes, or ff for 00 bytes - NOTE INVERSION
+ cbnz r6, 60f
+ bne 15b @ (Flags from the subs above) If not run out of bytes then go around again
+
+ pop {r4,r5,r6,r7}
+ and r1,r1,#0xff @ Get r1 back to a single character from the expansion above
+ and r2,r2,#7 @ Leave the count remaining as the number after the double words have been done
+
+20:
+ cbz r2, 40f @ 0 length or hit the end already then not found
+
+21: @ Post aligned section, or just a short call
+ ldrb r3,[r0],#1
+ subs r2,r2,#1
+ eor r3,r3,r1 @ r3 = 0 if match - doesn't break flags from sub
+ cbz r3, 50f
+ bne 21b @ on r2 flags
+
+40:
+ movs r0,#0 @ not found
+ bx lr
+
+50:
+ subs r0,r0,#1 @ found
+ bx lr
+
+60: @ We're here because the fast path found a hit - now we have to track down exactly which word it was
+ @ r0 points to the start of the double word after the one that was tested
+ @ r5 has the 00/ff pattern for the first word, r6 has the chained value
+ cmp r5, #0
+ itte eq
+ moveq r5, r6 @ the end is in the 2nd word
+ subeq r0,r0,#3 @ Points to 2nd byte of 2nd word
+ subne r0,r0,#7 @ or 2nd byte of 1st word
+
+ @ r0 currently points to the 3rd byte of the word containing the hit
+ tst r5, # CHARTSTMASK(0) @ 1st character
+ bne 61f
+ adds r0,r0,#1
+ tst r5, # CHARTSTMASK(1) @ 2nd character
+ ittt eq
+ addeq r0,r0,#1
+ tsteq r5, # (3<<15) @ 2nd & 3rd character
+ @ If not the 3rd must be the last one
+ addeq r0,r0,#1
+
+61:
+ pop {r4,r5,r6,r7}
+ subs r0,r0,#1
+ bx lr
+END(memchr)
diff --git a/libc/arch-arm/bionic/armv7/memcpy.S b/libc/arch-arm/bionic/armv7/memcpy.S
new file mode 100644
index 0000000..f906c78
--- /dev/null
+++ b/libc/arch-arm/bionic/armv7/memcpy.S
@@ -0,0 +1,374 @@
+/* Copyright (c) 2010-2011, Linaro Limited
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ * Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+
+ * Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+
+ * Neither the name of Linaro Limited nor the names of its
+ contributors may be used to endorse or promote products derived
+ from this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ Written by Dave Gilbert <david.gilbert@linaro.org>
+ Adapted to Bionic by Bernhard Rosenkraenzer <Bernhard.Rosenkranzer@linaro.org>
+
+ This memcpy routine is optimised on a Cortex-A9 and should work on
+ all ARMv7 processors. */
+
+#include <machine/asm.h>
+
+@ 2011-09-01 david.gilbert@linaro.org
+@ Extracted from local git 2f11b436
+
+ .syntax unified
+ .arch armv7-a
+
+@ this lets us check a flag in a 00/ff byte easily in either endianness
+#ifdef __ARMEB__
+#define CHARTSTMASK(c) 1<<(31-(c*8))
+#else
+#define CHARTSTMASK(c) 1<<(c*8)
+#endif
+ .thumb
+
+#if defined(__ARM_NEON__)
+@ ---------------------------------------------------------------------------
+ .thumb_func
+ .p2align 4,,15
+ENTRY(memcpy)
+ @ r0 = dest
+ @ r1 = source
+ @ r2 = count
+ @ returns dest in r0
+ @ Overlaps of source/dest not allowed according to spec
+ @ Note this routine relies on v7 misaligned loads/stores
+ pld [r1]
+ mov r12, r0 @ stash original r0
+ cmp r2,#32
+ blt 10f @ take the small copy case separately
+
+ @ test for either source or destination being misaligned
+ @ (We only rely on word align)
+ tst r0,#3
+ it eq
+ tsteq r1,#3
+ bne 30f @ misaligned case
+
+4:
+ @ at this point we are word (or better) aligned and have at least
+ @ 32 bytes to play with
+
+ @ If it's a huge copy, try Neon
+ cmp r2, #128*1024
+ bge 35f @ Sharing general non-aligned case here, aligned could be faster
+
+ push {r3,r4,r5,r6,r7,r8,r10,r11}
+5:
+ ldmia r1!,{r3,r4,r5,r6,r7,r8,r10,r11}
+ sub r2,r2,#32
+ pld [r1,#96]
+ cmp r2,#32
+ stmia r0!,{r3,r4,r5,r6,r7,r8,r10,r11}
+ bge 5b
+
+ pop {r3,r4,r5,r6,r7,r8,r10,r11}
+ @ We are now down to less than 32 bytes
+ cbz r2,15f @ quick exit for the case where we copied a multiple of 32
+
+10: @ small copies (not necessarily aligned - note might be slightly more than 32bytes)
+ cmp r2,#4
+ blt 12f
+11:
+ sub r2,r2,#4
+ cmp r2,#4
+ ldr r3, [r1],#4
+ str r3, [r0],#4
+ bge 11b
+12:
+ tst r2,#2
+ itt ne
+ ldrhne r3, [r1],#2
+ strhne r3, [r0],#2
+
+ tst r2,#1
+ itt ne
+ ldrbne r3, [r1],#1
+ strbne r3, [r0],#1
+
+15: @ exit
+ mov r0,r12 @ restore r0
+ bx lr
+
+ .align 2
+ .p2align 4,,15
+30: @ non-aligned - at least 32 bytes to play with
+ @ Test for co-misalignment
+ eor r3, r0, r1
+ tst r3,#3
+ beq 50f
+
+ @ Use Neon for misaligned
+35:
+ vld1.8 {d0,d1,d2,d3}, [r1]!
+ sub r2,r2,#32
+ cmp r2,#32
+ pld [r1,#96]
+ vst1.8 {d0,d1,d2,d3}, [r0]!
+ bge 35b
+ b 10b @ TODO: Probably a bad idea to switch to ARM at this point
+
+ .align 2
+ .p2align 4,,15
+50: @ Co-misaligned
+ @ At this point we've got at least 32 bytes
+51:
+ ldrb r3,[r1],#1
+ sub r2,r2,#1
+ strb r3,[r0],#1
+ tst r0,#7
+ bne 51b
+
+ cmp r2,#32
+ blt 10b
+ b 4b
+END(memcpy)
+#else /* __ARM_NEON__ */
+
+ .thumb
+
+@ ---------------------------------------------------------------------------
+ .thumb_func
+ .p2align 4,,15
+ENTRY(memcpy)
+ @ r0 = dest
+ @ r1 = source
+ @ r2 = count
+ @ returns dest in r0
+ @ Overlaps of source/dest not allowed according to spec
+ @ Note this routine relies on v7 misaligned loads/stores
+ pld [r1]
+ mov r12, r0 @ stash original r0
+ cmp r2,#32
+ blt 10f @ take the small copy case separately
+
+ @ test for either source or destination being misaligned
+ @ (We only rely on word align)
+ @ TODO: Test for co-misalignment
+ tst r0,#3
+ it eq
+ tsteq r1,#3
+ bne 30f @ misaligned case
+
+4:
+ @ at this point we are word (or better) aligned and have at least
+ @ 32 bytes to play with
+ push {r3,r4,r5,r6,r7,r8,r10,r11}
+5:
+ ldmia r1!,{r3,r4,r5,r6,r7,r8,r10,r11}
+ pld [r1,#96]
+ sub r2,r2,#32
+ cmp r2,#32
+ stmia r0!,{r3,r4,r5,r6,r7,r8,r10,r11}
+ bge 5b
+
+ pop {r3,r4,r5,r6,r7,r8,r10,r11}
+ @ We are now down to less than 32 bytes
+ cbz r2,15f @ quick exit for the case where we copied a multiple of 32
+
+10: @ small copies (not necessarily aligned - note might be slightly more than 32bytes)
+ cmp r2,#4
+ blt 12f
+11:
+ sub r2,r2,#4
+ cmp r2,#4
+ ldr r3, [r1],#4
+ str r3, [r0],#4
+ bge 11b
+12:
+ tst r2,#2
+ itt ne
+ ldrhne r3, [r1],#2
+ strhne r3, [r0],#2
+
+ tst r2,#1
+ itt ne
+ ldrbne r3, [r1],#1
+ strbne r3, [r0],#1
+
+15: @ exit
+ mov r0,r12 @ restore r0
+ bx lr
+
+30: @ non-aligned - at least 32 bytes to play with
+ @ On v7 we're allowed to do ldr's and str's from arbitrary alignments
+ @ but not ldrd/strd or ldm/stm
+ @ Note Neon is often a better choice misaligned using vld1
+
+ @ copy a byte at a time until the point where we have an aligned destination
+ @ we know we have enough bytes to go to know we won't run out in this phase
+ tst r0,#7
+ beq 35f
+
+31:
+ ldrb r3,[r1],#1
+ sub r2,r2,#1
+ strb r3,[r0],#1
+ tst r0,#7
+ bne 31b
+
+ cmp r2,#32 @ Lets get back to knowing we have 32 bytes to play with
+ blt 11b
+
+ @ Now the store address is aligned
+35:
+ push {r3,r4,r5,r6,r7,r8,r10,r11,r12,r14}
+ and r6,r1,#3 @ how misaligned we are
+ cmp r6,#2
+ cbz r6, 100f @ Go there if we're actually aligned
+ bge 120f @ And here if it's aligned on 2 or 3 byte
+ @ Note might be worth splitting to bgt and a separate beq
+ @ if the branches are well separated
+
+ @ At this point dest is aligned, source is 1 byte forward
+110:
+ ldr r3,[r1] @ Misaligned load - but it gives the first 4 bytes to store
+ sub r2,r2,#3 @ Number of bytes left in whole words we can load
+ add r1,r1,#3 @ To aligned load address
+ bic r3,r3,#0xff000000
+
+112:
+ ldmia r1!,{r5,r6,r7,r8}
+ sub r2,r2,#32
+ cmp r2,#32
+ pld [r1,#96]
+
+ orr r3,r3,r5,lsl#24
+ mov r4,r5,lsr#8
+ mov r5,r6,lsr#8
+ orr r4,r4,r6,lsl#24
+ mov r6,r7,lsr#8
+ ldmia r1!,{r10,r11,r12,r14}
+ orr r5,r5,r7,lsl#24
+ mov r7,r8,lsr#8
+ orr r6,r6,r8,lsl#24
+ mov r8,r10,lsr#8
+ orr r7,r7,r10,lsl#24
+ mov r10,r11,lsr#8
+ orr r8,r8,r11,lsl#24
+ orr r10,r10,r12,lsl#24
+ mov r11,r12,lsr#8
+ orr r11,r11,r14,lsl#24
+ stmia r0!,{r3,r4,r5,r6,r7,r8,r10,r11}
+ mov r3,r14,lsr#8
+
+ bge 112b
+
+ @ Deal with the stragglers
+ add r2,r2,#3
+ sub r1,r1,#3
+ pop {r3,r4,r5,r6,r7,r8,r10,r11,r12,r14}
+ b 10b
+
+100: @ Dest and source aligned - must have been originally co-misaligned
+ @ Fallback to main aligned case if still big enough
+ pop {r3,r4,r5,r6,r7,r8,r10,r11,r12,r14}
+ b 4b @ Big copies (32 bytes or more)
+
+120: @ Dest is aligned, source is align+2 or 3
+ bgt 130f @ Now split off for 3 byte offset
+
+ ldrh r3,[r1]
+ sub r2,r2,#2 @ Number of bytes left in whole words we can load
+ add r1,r1,#2 @ To aligned load address
+
+122:
+ ldmia r1!,{r5,r6,r7,r8}
+ sub r2,r2,#32
+ cmp r2,#32
+ pld [r1,#96]
+
+ orr r3,r3,r5,lsl#16
+ mov r4,r5,lsr#16
+ mov r5,r6,lsr#16
+ orr r4,r4,r6,lsl#16
+ mov r6,r7,lsr#16
+ ldmia r1!,{r10,r11,r12,r14}
+ orr r5,r5,r7,lsl#16
+ orr r6,r6,r8,lsl#16
+ mov r7,r8,lsr#16
+ orr r7,r7,r10,lsl#16
+ mov r8,r10,lsr#16
+ orr r8,r8,r11,lsl#16
+ mov r10,r11,lsr#16
+ orr r10,r10,r12,lsl#16
+ mov r11,r12,lsr#16
+ orr r11,r11,r14,lsl#16
+ stmia r0!,{r3,r4,r5,r6,r7,r8,r10,r11}
+ mov r3,r14,lsr#16
+
+ bge 122b
+
+ @ Deal with the stragglers
+ add r2,r2,#2
+ sub r1,r1,#2
+ pop {r3,r4,r5,r6,r7,r8,r10,r11,r12,r14}
+ b 10b
+
+130: @ Dest is aligned, source is align+3
+ ldrb r3,[r1]
+ sub r2,r2,#1 @ Number of bytes left in whole words we can load
+ add r1,r1,#1 @ To aligned load address
+
+132:
+ ldmia r1!,{r5,r6,r7,r8}
+ sub r2,r2,#32
+ cmp r2,#32
+ pld [r1,#96]
+
+ orr r3,r3,r5,lsl#8
+ mov r4,r5,lsr#24
+ mov r5,r6,lsr#24
+ orr r4,r4,r6,lsl#8
+ mov r6,r7,lsr#24
+ ldmia r1!,{r10,r11,r12,r14}
+ orr r5,r5,r7,lsl#8
+ mov r7,r8,lsr#24
+ orr r6,r6,r8,lsl#8
+ mov r8,r10,lsr#24
+ orr r7,r7,r10,lsl#8
+ orr r8,r8,r11,lsl#8
+ mov r10,r11,lsr#24
+ orr r10,r10,r12,lsl#8
+ mov r11,r12,lsr#24
+ orr r11,r11,r14,lsl#8
+ stmia r0!,{r3,r4,r5,r6,r7,r8,r10,r11}
+ mov r3,r14,lsr#24
+
+ bge 132b
+
+ @ Deal with the stragglers
+ add r2,r2,#1
+ sub r1,r1,#1
+ pop {r3,r4,r5,r6,r7,r8,r10,r11,r12,r14}
+ b 10b
+END(memcpy)
+#endif
diff --git a/libc/arch-arm/bionic/armv7/memset.S b/libc/arch-arm/bionic/armv7/memset.S
new file mode 100644
index 0000000..d1bcbe8
--- /dev/null
+++ b/libc/arch-arm/bionic/armv7/memset.S
@@ -0,0 +1,119 @@
+/* Copyright (c) 2010-2011, Linaro Limited
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ * Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+
+ * Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+
+ * Neither the name of Linaro Limited nor the names of its
+ contributors may be used to endorse or promote products derived
+ from this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ Written by Dave Gilbert <david.gilbert@linaro.org>
+ Adapted to Bionic by Bernhard Rosenkraenzer <Bernhard.Rosenkranzer@linaro.org>
+
+ This memset routine is optimised on a Cortex-A9 and should work on
+ all ARMv7 processors. */
+
+#include <machine/asm.h>
+
+ .syntax unified
+ .arch armv7-a
+
+@ 2011-08-30 david.gilbert@linaro.org
+@ Extracted from local git 2f11b436
+
+@ this lets us check a flag in a 00/ff byte easily in either endianness
+#ifdef __ARMEB__
+#define CHARTSTMASK(c) 1<<(31-(c*8))
+#else
+#define CHARTSTMASK(c) 1<<(c*8)
+#endif
+ .text
+ .thumb
+
+@ ---------------------------------------------------------------------------
+ .thumb_func
+ .p2align 4,,15
+ENTRY(memset)
+ @ r0 = address
+ @ r1 = character
+ @ r2 = count
+ @ returns original address in r0
+
+ mov r3, r0 @ Leave r0 alone
+ cbz r2, 10f @ Exit if 0 length
+
+ tst r0, #7
+ beq 2f @ Already aligned
+
+ @ Ok, so we're misaligned here
+1:
+ strb r1, [r3], #1
+ subs r2,r2,#1
+ tst r3, #7
+ cbz r2, 10f @ Exit if we hit the end
+ bne 1b @ go round again if still misaligned
+
+2:
+ @ OK, so we're aligned
+ push {r4,r5,r6,r7}
+ bics r4, r2, #15 @ if less than 16 bytes then need to finish it off
+ beq 5f
+
+3:
+ @ POSIX says that ch is cast to an unsigned char. A uxtb is one
+ @ byte and takes two cycles, where an AND is four bytes but one
+ @ cycle.
+ and r1, #0xFF
+ orr r1, r1, r1, lsl#8 @ Same character into all bytes
+ orr r1, r1, r1, lsl#16
+ mov r5,r1
+ mov r6,r1
+ mov r7,r1
+
+4:
+ subs r4,r4,#16
+ stmia r3!,{r1,r5,r6,r7}
+ bne 4b
+ and r2,r2,#15
+
+ @ At this point we're still aligned and we have upto align-1 bytes left to right
+ @ we can avoid some of the byte-at-a time now by testing for some big chunks
+ tst r2,#8
+ itt ne
+ subne r2,r2,#8
+ stmiane r3!,{r1,r5}
+
+5:
+ pop {r4,r5,r6,r7}
+ cbz r2, 10f
+
+ @ Got to do any last < alignment bytes
+6:
+ subs r2,r2,#1
+ strb r1,[r3],#1
+ bne 6b
+
+10:
+ bx lr @ goodbye
+END(memset)
diff --git a/libc/arch-arm/bionic/armv7/strchr.S b/libc/arch-arm/bionic/armv7/strchr.S
new file mode 100644
index 0000000..370aac9
--- /dev/null
+++ b/libc/arch-arm/bionic/armv7/strchr.S
@@ -0,0 +1,77 @@
+/* Copyright (c) 2010-2011, Linaro Limited
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ * Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+
+ * Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+
+ * Neither the name of Linaro Limited nor the names of its
+ contributors may be used to endorse or promote products derived
+ from this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ Written by Dave Gilbert <david.gilbert@linaro.org>
+ Adapted to Bionic by Bernhard Rosenkraenzer <Bernhard.Rosenkranzer@linaro.org>
+
+ A very simple strchr routine, from benchmarks on A9 it's a bit faster than
+ the current version in eglibc (2.12.1-0ubuntu14 package)
+ I don't think doing a word at a time version is worth it since a lot
+ of strchr cases are very short anyway */
+
+#include <machine/asm.h>
+
+@ 2011-02-07 david.gilbert@linaro.org
+@ Extracted from local git a5b438d861
+
+ .syntax unified
+ .arch armv7-a
+
+ .text
+ .thumb
+
+@ ---------------------------------------------------------------------------
+
+ .thumb_func
+ .p2align 4,,15
+ENTRY(strchr)
+ @ r0 = start of string
+ @ r1 = character to match
+ @ returns NULL for no match, or a pointer to the match
+ and r1,r1, #255
+
+1:
+ ldrb r2,[r0],#1
+ cmp r2,r1
+ cbz r2,10f
+ bne 1b
+
+ @ We're here if it matched
+5:
+ subs r0,r0,#1
+ bx lr
+
+10:
+ @ We're here if we ran off the end
+ cmp r1, #0 @ Corner case - you're allowed to search for the nil and get a pointer to it
+ beq 5b @ A bit messy, if it's common we should branch at the start to a special loop
+ mov r0,#0
+ bx lr
+END(strchr)
diff --git a/libc/arch-arm/bionic/armv7/strcpy.c b/libc/arch-arm/bionic/armv7/strcpy.c
new file mode 100644
index 0000000..ce74de6
--- /dev/null
+++ b/libc/arch-arm/bionic/armv7/strcpy.c
@@ -0,0 +1,179 @@
+/*
+ * Copyright (c) 2008 ARM Ltd
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. The name of the company may not be used to endorse or promote
+ * products derived from this software without specific prior written
+ * permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY ARM LTD ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL ARM LTD BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
+ * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+
+#ifdef __thumb2__
+#define magic1(REG) "#0x01010101"
+#define magic2(REG) "#0x80808080"
+#else
+#define magic1(REG) #REG
+#define magic2(REG) #REG ", lsl #7"
+#endif
+
+#pragma GCC diagnostic push
+/* gcc fails to see the fat that the assembly code
+ * takes care of a return value, causing a
+ * "control reaches end of non-void function"
+ * warning (and, of course, error when building
+ * with -Werror).
+ * Let's disable that warning just for this
+ * function, where we know it's bogus. */
+#pragma GCC diagnostic ignored "-Wreturn-type"
+
+char* __attribute__((naked))
+strcpy (char* dst, const char* src)
+{
+ asm (
+#if !(defined(__OPTIMIZE_SIZE__) || defined (PREFER_SIZE_OVER_SPEED) || \
+ (defined (__thumb__) && !defined (__thumb2__)))
+ "pld [r1, #0]\n\t"
+ "eor r2, r0, r1\n\t"
+ "mov ip, r0\n\t"
+ "tst r2, #3\n\t"
+ "bne 4f\n\t"
+ "tst r1, #3\n\t"
+ "bne 3f\n"
+ "5:\n\t"
+#ifndef __thumb2__
+ "str r5, [sp, #-4]!\n\t"
+ "mov r5, #0x01\n\t"
+ "orr r5, r5, r5, lsl #8\n\t"
+ "orr r5, r5, r5, lsl #16\n\t"
+#endif
+
+ "str r4, [sp, #-4]!\n\t"
+ "tst r1, #4\n\t"
+ "ldr r3, [r1], #4\n\t"
+ "beq 2f\n\t"
+ "sub r2, r3, "magic1(r5)"\n\t"
+ "bics r2, r2, r3\n\t"
+ "tst r2, "magic2(r5)"\n\t"
+ "itt eq\n\t"
+ "streq r3, [ip], #4\n\t"
+ "ldreq r3, [r1], #4\n"
+ "bne 1f\n\t"
+ /* Inner loop. We now know that r1 is 64-bit aligned, so we
+ can safely fetch up to two words. This allows us to avoid
+ load stalls. */
+ ".p2align 2\n"
+ "2:\n\t"
+ "pld [r1, #8]\n\t"
+ "ldr r4, [r1], #4\n\t"
+ "sub r2, r3, "magic1(r5)"\n\t"
+ "bics r2, r2, r3\n\t"
+ "tst r2, "magic2(r5)"\n\t"
+ "sub r2, r4, "magic1(r5)"\n\t"
+ "bne 1f\n\t"
+ "str r3, [ip], #4\n\t"
+ "bics r2, r2, r4\n\t"
+ "tst r2, "magic2(r5)"\n\t"
+ "itt eq\n\t"
+ "ldreq r3, [r1], #4\n\t"
+ "streq r4, [ip], #4\n\t"
+ "beq 2b\n\t"
+ "mov r3, r4\n"
+ "1:\n\t"
+#ifdef __ARMEB__
+ "rors r3, r3, #24\n\t"
+#endif
+ "strb r3, [ip], #1\n\t"
+ "tst r3, #0xff\n\t"
+#ifdef __ARMEL__
+ "ror r3, r3, #8\n\t"
+#endif
+ "bne 1b\n\t"
+ "ldr r4, [sp], #4\n\t"
+#ifndef __thumb2__
+ "ldr r5, [sp], #4\n\t"
+#endif
+ "BX LR\n"
+
+ /* Strings have the same offset from word alignment, but it's
+ not zero. */
+ "3:\n\t"
+ "tst r1, #1\n\t"
+ "beq 1f\n\t"
+ "ldrb r2, [r1], #1\n\t"
+ "strb r2, [ip], #1\n\t"
+ "cmp r2, #0\n\t"
+ "it eq\n"
+ "BXEQ LR\n"
+ "1:\n\t"
+ "tst r1, #2\n\t"
+ "beq 5b\n\t"
+ "ldrh r2, [r1], #2\n\t"
+#ifdef __ARMEB__
+ "tst r2, #0xff00\n\t"
+ "iteet ne\n\t"
+ "strneh r2, [ip], #2\n\t"
+ "lsreq r2, r2, #8\n\t"
+ "streqb r2, [ip]\n\t"
+ "tstne r2, #0xff\n\t"
+#else
+ "tst r2, #0xff\n\t"
+ "itet ne\n\t"
+ "strneh r2, [ip], #2\n\t"
+ "streqb r2, [ip]\n\t"
+ "tstne r2, #0xff00\n\t"
+#endif
+ "bne 5b\n\t"
+ "BX LR\n"
+
+ /* src and dst do not have a common word-alignement. Fall back to
+ byte copying. */
+ "4:\n\t"
+ "ldrb r2, [r1], #1\n\t"
+ "strb r2, [ip], #1\n\t"
+ "cmp r2, #0\n\t"
+ "bne 4b\n\t"
+ "BX LR"
+
+#elif !defined (__thumb__) || defined (__thumb2__)
+ "mov r3, r0\n\t"
+ "1:\n\t"
+ "ldrb r2, [r1], #1\n\t"
+ "strb r2, [r3], #1\n\t"
+ "cmp r2, #0\n\t"
+ "bne 1b\n\t"
+ "BX LR"
+#else
+ "mov r3, r0\n\t"
+ "1:\n\t"
+ "ldrb r2, [r1]\n\t"
+ "add r1, r1, #1\n\t"
+ "strb r2, [r3]\n\t"
+ "add r3, r3, #1\n\t"
+ "cmp r2, #0\n\t"
+ "bne 1b\n\t"
+ "BX LR"
+#endif
+ );
+}
+
+#pragma GCC diagnostic pop
diff --git a/libc/arch-arm/bionic/armv7/strlen.S b/libc/arch-arm/bionic/armv7/strlen.S
new file mode 100644
index 0000000..5b76826
--- /dev/null
+++ b/libc/arch-arm/bionic/armv7/strlen.S
@@ -0,0 +1,113 @@
+/* Copyright (c) 2010-2011, Linaro Limited
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ * Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+
+ * Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+
+ * Neither the name of Linaro Limited nor the names of its
+ contributors may be used to endorse or promote products derived
+ from this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ Written by Dave Gilbert <david.gilbert@linaro.org>
+ Adapted to Bionic by Bernhard Rosenkraenzer <bernhard.rosenkranzer@linaro.org>
+
+ This strlen routine is optimised on a Cortex-A9 and should work on
+ all ARMv7 processors. This routine is reasonably fast for short
+ strings, but is probably slower than a simple implementation if all
+ your strings are very short */
+
+@ 2011-02-08 david.gilbert@linaro.org
+@ Extracted from local git 6848613a
+
+
+@ this lets us check a flag in a 00/ff byte easily in either endianness
+
+#include <machine/asm.h>
+
+#ifdef __ARMEB__
+#define CHARTSTMASK(c) 1<<(31-(c*8))
+#else
+#define CHARTSTMASK(c) 1<<(c*8)
+#endif
+
+@-----------------------------------------------------------------------------------------------------------------------------
+ .syntax unified
+ .arch armv7-a
+
+ .thumb_func
+ .p2align 4,,15
+ENTRY(strlen)
+ @ r0 = string
+ @ returns count of bytes in string not including terminator
+ mov r1, r0
+ push { r4,r6 }
+ mvns r6, #0 @ all F
+ movs r4, #0
+ tst r0, #7
+ beq 2f
+
+1:
+ ldrb r2, [r1], #1
+ tst r1, #7 @ Hit alignment yet?
+ cbz r2, 10f @ Exit if we found the 0
+ bne 1b
+
+ @ So we're now aligned
+2:
+ ldmia r1!,{r2,r3}
+ uadd8 r2, r2, r6 @ Parallel add 0xff - sets the GE bits for anything that wasn't 0
+ sel r2, r4, r6 @ bytes are 00 for none-00 bytes, or ff for 00 bytes - NOTE INVERSION
+ uadd8 r3, r3, r6 @ Parallel add 0xff - sets the GE bits for anything that wasn't 0
+ sel r3, r2, r6 @ bytes are 00 for none-00 bytes, or ff for 00 bytes - NOTE INVERSION
+ cmp r3, #0
+ beq 2b
+
+strlenendtmp:
+ @ One (or more) of the bytes we loaded was 0 - but which one?
+ @ r2 has the mask corresponding to the first loaded word
+ @ r3 has a combined mask of the two words - but if r2 was all-non 0
+ @ then it's just the 2nd words
+ cmp r2, #0
+ itte eq
+ moveq r2, r3 @ the end is in the 2nd word
+ subeq r1,r1,#3
+ subne r1,r1,#7
+
+ @ r1 currently points to the 2nd byte of the word containing the 0
+ tst r2, # CHARTSTMASK(0) @ 1st character
+ bne 10f
+ adds r1,r1,#1
+ tst r2, # CHARTSTMASK(1) @ 2nd character
+ ittt eq
+ addeq r1,r1,#1
+ tsteq r2, # (3<<15) @ 2nd & 3rd character
+ @ If not the 3rd must be the last one
+ addeq r1,r1,#1
+
+10:
+ @ r0 is still at the beginning, r1 is pointing 1 byte after the terminator
+ sub r0, r1, r0
+ subs r0, r0, #1
+ pop { r4, r6 }
+ bx lr
+END(strlen)

0002-bionic-libc-Aurora-NEON-optimized-memmove-bcopy.patch

From f45ed5a0bb47b66f5eab2026dd76a29aa1b32d18 Mon Sep 17 00:00:00 2001
From: River Zhou <riverzhou2000@gmail.com>
Date: Sat, 6 Oct 2012 14:12:12 +0800
Subject: [PATCH 2/4] bionic: libc: Aurora: NEON optimized memmove bcopy

---
 libc/Android.mk | 13 ++-
 libc/arch-arm/bionic/memmove.S | 197 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 207 insertions(+), 3 deletions(-)
 create mode 100644 libc/arch-arm/bionic/memmove.S

diff --git a/libc/Android.mk b/libc/Android.mk
index 1ffc3ca..105f788 100644
--- a/libc/Android.mk
+++ b/libc/Android.mk
@@ -267,7 +267,6 @@ libc_common_src_files := \
  bionic/libc_init_common.c \
  bionic/logd_write.c \
  bionic/md5.c \
- bionic/memmove_words.c \
  bionic/pututline.c \
  bionic/realpath.c \
  bionic/sched_getaffinity.c \
@@ -364,11 +363,19 @@ libc_common_src_files += \
  arch-arm/bionic/sigsetjmp.S \
  arch-arm/bionic/strcmp.S \
  arch-arm/bionic/syscall.S \
- string/memmove.c.arm \
- string/bcopy.c \
  string/strncmp.c \
  unistd/socketcalls.c

+ifeq ($(TARGET_USE_OMAP4_BIONIC_OPTIMIZATION),true)
+libc_common_src_files += \
+ arch-arm/bionic/memmove.S
+else
+libc_common_src_files += \
+ string/bcopy.c \
+ string/memmove.c.arm \
+ bionic/memmove_words.c
+endif
+
 # These files need to be arm so that gdbserver
 # can set breakpoints in them without messing
 # up any thumb code.
diff --git a/libc/arch-arm/bionic/memmove.S b/libc/arch-arm/bionic/memmove.S
new file mode 100644
index 0000000..e6c17c7
--- /dev/null
+++ b/libc/arch-arm/bionic/memmove.S
@@ -0,0 +1,197 @@
+/***************************************************************************
+ Copyright (c) 2009-2012 Code Aurora Forum. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions are met:
+ * Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+ * Neither the name of Code Aurora nor the names of its contributors may
+ be used to endorse or promote products derived from this software
+ without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ POSSIBILITY OF SUCH DAMAGE.
+ ***************************************************************************/
+
+/***************************************************************************
+ * Neon memmove: Attempts to do a memmove with Neon registers if possible,
+ * Inputs:
+ * dest: The destination buffer
+ * src: The source buffer
+ * n: The size of the buffer to transfer
+ * Outputs:
+ *
+ ***************************************************************************/
+
+#include <machine/cpu-features.h>
+
+#ifndef PLDOFFS
+#define PLDOFFS (10)
+#endif
+#ifndef PLDTHRESH
+#define PLDTHRESH (PLDOFFS)
+#endif
+#if (PLDOFFS < 5)
+#error Routine does not support offsets less than 5
+#endif
+#if (PLDTHRESH < PLDOFFS)
+#error PLD threshold must be greater than or equal to the PLD offset
+#endif
+#ifndef PLDSIZE
+#define PLDSIZE (32)
+#endif
+#define NOP_OPCODE (0xe320f000)
+
+ .code 32
+ .align 5
+ .global memmove
+ .type memmove, %function
+
+ .global _memmove_words
+ .type _memmove_words, %function
+
+ .global bcopy
+ .type bcopy, %function
+
+bcopy:
+ mov r12, r0
+ mov r0, r1
+ mov r1, r12
+ .balignl 64, NOP_OPCODE, 4*2
+memmove:
+_memmove_words:
+.Lneon_memmove_cmf:
+ subs r12, r0, r1
+ bxeq lr
+ cmphi r2, r12
+ bls memcpy /* Use memcpy for non-overlapping areas */
+
+ push {r0}
+
+.Lneon_back_to_front_copy:
+ add r0, r0, r2
+ add r1, r1, r2
+ cmp r2, #4
+ bgt .Lneon_b2f_gt4
+ cmp r2, #0
+.Lneon_b2f_smallcopy_loop:
+ beq .Lneon_memmove_done
+ ldrb r12, [r1, #-1]!
+ subs r2, r2, #1
+ strb r12, [r0, #-1]!
+ b .Lneon_b2f_smallcopy_loop
+.Lneon_b2f_gt4:
+ sub r3, r0, r1
+ cmp r2, r3
+ movle r12, r2
+ movgt r12, r3
+ cmp r12, #64
+ bge .Lneon_b2f_copy_64
+ cmp r12, #32
+ bge .Lneon_b2f_copy_32
+ cmp r12, #8
+ bge .Lneon_b2f_copy_8
+ cmp r12, #4
+ bge .Lneon_b2f_copy_4
+ b .Lneon_b2f_copy_1
+.Lneon_b2f_copy_64:
+ sub r1, r1, #64 /* Predecrement */
+ sub r0, r0, #64
+ movs r12, r2, lsr #6
+ cmp r12, #PLDTHRESH
+ ble .Lneon_b2f_copy_64_loop_nopld
+ sub r12, #PLDOFFS
+ pld [r1, #-(PLDOFFS-5)*PLDSIZE]
+ pld [r1, #-(PLDOFFS-4)*PLDSIZE]
+ pld [r1, #-(PLDOFFS-3)*PLDSIZE]
+ pld [r1, #-(PLDOFFS-2)*PLDSIZE]
+ pld [r1, #-(PLDOFFS-1)*PLDSIZE]
+ .balignl 64, NOP_OPCODE, 4*2
+.Lneon_b2f_copy_64_loop_outer:
+ pld [r1, #-(PLDOFFS)*PLDSIZE]
+ vld1.32 {q0, q1}, [r1]!
+ vld1.32 {q2, q3}, [r1]
+ subs r12, r12, #1
+ vst1.32 {q0, q1}, [r0]!
+ sub r1, r1, #96 /* Post-fixup and predecrement */
+ vst1.32 {q2, q3}, [r0]
+ sub r0, r0, #96
+ bne .Lneon_b2f_copy_64_loop_outer
+ mov r12, #PLDOFFS
+ .balignl 64, NOP_OPCODE, 4*2
+.Lneon_b2f_copy_64_loop_nopld:
+ vld1.32 {q8, q9}, [r1]!
+ vld1.32 {q10, q11}, [r1]
+ subs r12, r12, #1
+ vst1.32 {q8, q9}, [r0]!
+ sub r1, r1, #96 /* Post-fixup and predecrement */
+ vst1.32 {q10, q11}, [r0]
+ sub r0, r0, #96
+ bne .Lneon_b2f_copy_64_loop_nopld
+ ands r2, r2, #0x3f
+ beq .Lneon_memmove_done
+ add r1, r1, #64 /* Post-fixup */
+ add r0, r0, #64
+ cmp r2, #32
+ blt .Lneon_b2f_copy_finish
+.Lneon_b2f_copy_32:
+ mov r12, r2, lsr #5
+.Lneon_b2f_copy_32_loop:
+ sub r1, r1, #32 /* Predecrement */
+ sub r0, r0, #32
+ vld1.32 {q0,q1}, [r1]
+ subs r12, r12, #1
+ vst1.32 {q0,q1}, [r0]
+ bne .Lneon_b2f_copy_32_loop
+ ands r2, r2, #0x1f
+ beq .Lneon_memmove_done
+.Lneon_b2f_copy_finish:
+.Lneon_b2f_copy_8:
+ movs r12, r2, lsr #0x3
+ beq .Lneon_b2f_copy_4
+ .balignl 64, NOP_OPCODE, 4*2
+.Lneon_b2f_copy_8_loop:
+ sub r1, r1, #8 /* Predecrement */
+ sub r0, r0, #8
+ vld1.32 {d0}, [r1]
+ subs r12, r12, #1
+ vst1.32 {d0}, [r0]
+ bne .Lneon_b2f_copy_8_loop
+ ands r2, r2, #0x7
+ beq .Lneon_memmove_done
+.Lneon_b2f_copy_4:
+ movs r12, r2, lsr #0x2
+ beq .Lneon_b2f_copy_1
+.Lneon_b2f_copy_4_loop:
+ ldr r3, [r1, #-4]!
+ subs r12, r12, #1
+ str r3, [r0, #-4]!
+ bne .Lneon_b2f_copy_4_loop
+ ands r2, r2, #0x3
+.Lneon_b2f_copy_1:
+ cmp r2, #0
+ beq .Lneon_memmove_done
+ .balignl 64, NOP_OPCODE, 4*2
+.Lneon_b2f_copy_1_loop:
+ ldrb r12, [r1, #-1]!
+ subs r2, r2, #1
+ strb r12, [r0, #-1]!
+ bne .Lneon_b2f_copy_1_loop
+
+.Lneon_memmove_done:
+ pop {r0}
+ bx lr
+
+ .end

0003-bionic-libm-linaro-proper-sincos-implementation.patch

From 5709b6c1152c4f1e98cfbccedca1068147279a08 Mon Sep 17 00:00:00 2001
From: River Zhou <riverzhou2000@gmail.com>
Date: Sat, 6 Oct 2012 14:12:59 +0800
Subject: [PATCH 3/4] bionic: libm: linaro: proper sincos implementation

---
 libm/sincos.c | 144 +++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 131 insertions(+), 13 deletions(-)

diff --git a/libm/sincos.c b/libm/sincos.c
index e9f6dcc..603a00b 100644
--- a/libm/sincos.c
+++ b/libm/sincos.c
@@ -26,29 +26,147 @@
  */
 #define _GNU_SOURCE 1
 #include <math.h>
+#define INLINE_KERNEL_COSDF
+#define INLINE_KERNEL_SINDF
+#include "src/math_private.h"
+#include "src/k_cosf.c"
+#include "src/k_sinf.c"

-// Disable sincos optimization for all functions in this file,
-// otherwise gcc would generate infinite calls.
-// Refer to gcc PR46926.
-// -fno-builtin-sin or -fno-builtin-cos can disable sincos optimization,
-// but these two options do not work inside optimize pragma in-file.
-// Thus we just enforce -O0 when compiling this file.
-#pragma GCC optimize ("O0")
+/* Small multiples of pi/2 rounded to double precision. */
+static const double
+s1pio2 = 1*M_PI_2, /* 0x3FF921FB, 0x54442D18 */
+s2pio2 = 2*M_PI_2, /* 0x400921FB, 0x54442D18 */
+s3pio2 = 3*M_PI_2, /* 0x4012D97C, 0x7F3321D2 */
+s4pio2 = 4*M_PI_2; /* 0x401921FB, 0x54442D18 */

+/* For implementation details, see src/s_sin.c, src/s_cos.c */
 void sincos(double x, double *psin, double *pcos)
 {
- *psin = sin(x);
- *pcos = cos(x);
+ double y[2], z=0.0;
+ int32_t n, ix;
+
+ /* High word of x. */
+ GET_HIGH_WORD(ix, x);
+
+ /* |x| ~< pi/4 */
+ ix &= 0x7fffffff;
+ if(ix <= 0x3fe921fb) {
+ if(ix < 0x3e400000) { /* \x\ < 2**-27 */
+ if((int)x==0) { /* generate inexact */
+ *psin = x;
+ *pcos = 1.0;
+ return;
+ }
+ }
+ *psin = __kernel_sin(x, z, 0);
+ *pcos = __kernel_cos(x, z);
+ return;
+ } else if(ix>=0x7ff00000) { /* sin(Inf or NaN) and cos(Inf or NaN) is NaN */
+ *psin = *pcos = x-x;
+ return;
+ } else {
+ n = __ieee754_rem_pio2(x, y);
+ switch(n&3) {
+ case 0:
+ *psin = __kernel_sin(y[0],y[1],1);
+ *pcos = __kernel_cos(y[0],y[1]);
+ return;
+ case 1:
+ *psin = __kernel_cos(y[0],y[1]);
+ *pcos = -__kernel_sin(y[0],y[1],1);
+ return;
+ case 2:
+ *psin = -__kernel_sin(y[0],y[1],1);
+ *pcos = -__kernel_cos(y[0],y[1]);
+ return;
+ default:
+ *psin = -__kernel_cos(y[0],y[1]);
+ *pcos = __kernel_sin(y[0],y[1],1);
+ return;
+ }
+ }
 }

+/* For implementation details, see src/s_sinf.c, src/s_cosf.c */
 void sincosf(float x, float *psin, float *pcos)
 {
- *psin = sinf(x);
- *pcos = cosf(x);
+ float y[2];
+ int32_t n, hx, ix;
+
+ GET_FLOAT_WORD(hx, x);
+ ix = hx & 0x7fffffff;
+
+ if(ix <= 0x3f490fda) { /* |x| ~<= pi/4 */
+ if(ix < 0x39800000) { /* |x| < 2**-12 */
+ if(((int)x)==0) { /* x with inexact if x != 0 */
+ *psin = x;
+ *pcos = 1.0;
+ return;
+ }
+ }
+ *psin = __kernel_sindf(x);
+ *pcos = __kernel_cosdf(x);
+ return;
+ } else if(ix <= 0x407b53d1) { /* |x| ~<= 5*pi/4 */
+ if(ix <= 0x4016cbe3) { /* |x| ~<= 3pi/4 */
+ if(hx>0) {
+ *psin = __kernel_cosdf(x - s1pio2);
+ *pcos = __kernel_sindf(s1pio2 - x);
+ return;
+ } else {
+ *psin = -__kernel_cosdf(x + s1pio2);
+ *pcos = __kernel_sindf(x + s1pio2);
+ return;
+ }
+ } else {
+ *psin = __kernel_sindf((hx > 0 ? s2pio2 : -s2pio2) - x);
+ *pcos = -__kernel_cosdf(x + (hx > 0 ? -s2pio2 : s2pio2));
+ return;
+ }
+ } else if(ix <= 0x40e231d5) { /* |x| ~<= 9*pi/4 */
+ if(ix <= 0x40afeddf) { /* |x| ~<= 7*pi/4 */
+ if(hx>0) {
+ *psin = -__kernel_cosdf(x - s3pio2);
+ *pcos = __kernel_sindf(x - s3pio2);
+ return;
+ } else {
+ *psin = __kernel_cosdf(x + s3pio2);
+ *pcos = __kernel_sindf(-s3pio2 - x);
+ return;
+ }
+ } else {
+ *psin = __kernel_sindf(x + (hx > 0 ? -s4pio2 : s4pio2));
+ *pcos = __kernel_cosdf(x + (hx > 0 ? -s4pio2 : s4pio2));
+ return;
+ }
+ } else if(ix>=0x7f800000) { /* sin and cos (Inf or NaN) is NaN */
+ *psin = *pcos = x-x;
+ return;
+ } else {
+ n = __ieee754_rem_pio2f(x,y);
+ switch(n&3) {
+ case 0:
+ *psin = __kernel_sindf((double)y[0]+y[1]);
+ *pcos = __kernel_cosdf((double)y[0]+y[1]);
+ return;
+ case 1:
+ *psin = __kernel_cosdf((double)y[0]+y[1]);
+ *pcos = __kernel_sindf(-(double)y[0]-y[1]);
+ return;
+ case 2:
+ *psin = __kernel_sindf(-(double)y[0]-y[1]);
+ *pcos = -__kernel_cosdf((double)y[0]+y[1]);
+ return;
+ default:
+ *psin = -__kernel_cosdf((double)y[0]+y[1]);
+ *pcos = __kernel_sindf((double)y[0]+y[1]);
+ return;
+ }
+ }
 }

 void sincosl(long double x, long double *psin, long double *pcos)
 {
- *psin = sin(x);
- *pcos = cos(x);
+ *psin = sin(x);
+ *pcos = cos(x);
 }
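
With the 0003 patch applied, `sincos()` and `sincosf()` do a single argument reduction and evaluate both kernels, instead of the old `-O0` wrappers that called `sin()` and `cos()` separately. A quick standalone sanity check (a hypothetical test program, not part of the patch; it assumes a libc whose `math.h` exposes the GNU `sincos` extensions, as bionic's does):

    /* sincos_check.c -- build with: gcc -O2 sincos_check.c -o sincos_check -lm */
    #define _GNU_SOURCE 1
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* cover the small-angle, medium and large-argument reduction paths */
        const double xs[] = { 1e-9, 0.5, 0.785398, 3.0, 100.0, 1e9 };

        for (unsigned i = 0; i < sizeof(xs) / sizeof(xs[0]); i++) {
            double s, c;
            sincos(xs[i], &s, &c);
            /* results should agree with sin()/cos() to within a few ULP */
            printf("x=%-10g sincos=(%.17g, %.17g) ref=(%.17g, %.17g)\n",
                   xs[i], s, c, sin(xs[i]), cos(xs[i]));
        }

        float sf, cf;
        sincosf(0.785398f, &sf, &cf);
        printf("sincosf(0.785398) = (%.9g, %.9g)\n", (double)sf, (double)cf);
        return 0;
    }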

0004-bionic-libm-Aurora-NEON-optimized-e_pow.patch

From c4d87d5c8168b7259ab0711668e5e574833009dc Mon Sep 17 00:00:00 2001
From: River Zhou <riverzhou2000@gmail.com>
Date: Sat, 6 Oct 2012 14:19:33 +0800
Subject: [PATCH 4/4] bionic: libm: Aurora: NEON optimized e_pow

---
 libm/Android.mk | 11 ++
 libm/arm/e_pow.S | 430 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 libm/src/e_pow.c | 8 +
 3 files changed, 449 insertions(+)
 create mode 100644 libm/arm/e_pow.S

diff --git a/libm/Android.mk b/libm/Android.mk
index 57e4d4c..acb5a67 100644
--- a/libm/Android.mk
+++ b/libm/Android.mk
@@ -152,6 +152,11 @@ libm_common_src_files:= \
  src/s_isnan.c \
  src/s_modf.c

+ifeq ($(TARGET_USE_OMAP4_BIONIC_OPTIMIZATION),true)
+ libm_common_src_files += \
+ arm/e_pow.S
+ libm_common_cflags += -DOMAP4_NEON_OPTIMIZATION
+endif

 ifeq ($(TARGET_ARCH),arm)
   libm_common_src_files += \
@@ -183,6 +188,9 @@ endif

 include $(CLEAR_VARS)

+LOCAL_CFLAGS := \
+ $(libm_common_cflags)
+
 LOCAL_SRC_FILES := \
     $(libm_common_src_files)

@@ -200,6 +208,9 @@ include $(BUILD_STATIC_LIBRARY)

 include $(CLEAR_VARS)

+LOCAL_CFLAGS := \
+ $(libm_common_cflags)
+
 LOCAL_SRC_FILES := \
     $(libm_common_src_files)

diff --git a/libm/arm/e_pow.S b/libm/arm/e_pow.S
new file mode 100644
index 0000000..dd7cd1e
--- /dev/null
+++ b/libm/arm/e_pow.S
@@ -0,0 +1,430 @@
+@ Copyright (c) 2012, Code Aurora Forum. All rights reserved.
+@
+@ Redistribution and use in source and binary forms, with or without
+@ modification, are permitted provided that the following conditions are
+@ met:
+@ * Redistributions of source code must retain the above copyright
+@ notice, this list of conditions and the following disclaimer.
+@ * Redistributions in binary form must reproduce the above
+@ copyright notice, this list of conditions and the following
+@ disclaimer in the documentation and/or other materials provided
+@ with the distribution.
+@ * Neither the name of Code Aurora Forum, Inc. nor the names of its
+@ contributors may be used to endorse or promote products derived
+@ from this software without specific prior written permission.
+@
+@ THIS SOFTWARE IS PROVIDED "AS IS" AND ANY EXPRESS OR IMPLIED
+@ WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+@ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT
+@ ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
+@ BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+@ CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+@ SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+@ BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+@ WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+@ OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+@ IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+#include <machine/cpu-features.h>
+#include <machine/asm.h>
+
+@ Values which exist for the program lifetime:
+#define HIGH_WORD_MASK d31
+#define EXPONENT_MASK d30
+#define int_1 d29
+#define double_1 d28
+@ sign and 2^int_n fixup:
+#define expadjustment d7
+#define literals r10
+@ Values which exist within both polynomial implementations:
+#define int_n d2
+#define int_n_low s4
+#define int_n_high s5
+#define double_n d3
+#define k1 d27
+#define k2 d26
+#define k3 d25
+#define k4 d24
+@ Values which cross the boundaries between polynomial implementations:
+#define ss d16
+#define ss2 d17
+#define ss4 d18
+#define Result d0
+#define Return_hw r1
+#define Return_lw r0
+#define ylg2x d0
+@ Intermediate values only needed sometimes:
+@ initial (sorted in approximate order of availability for overwriting):
+#define x_hw r1
+#define x_lw r0
+#define y_hw r3
+#define y_lw r2
+#define x d0
+#define bp d4
+#define y d1
+@ log series:
+#define u d19
+#define v d20
+#define lg2coeff d21
+#define bpa d5
+#define bpb d3
+#define lg2const d6
+#define xmantissa r8
+#define twoto1o5 r4
+#define twoto3o5 r5
+#define ix r6
+#define iEXP_MASK r7
+@ exp input setup:
+#define twoto1o8mask d3
+#define twoto1o4mask d4
+#define twoto1o2mask d1
+#define ylg2x_round_offset d16
+#define ylg2x_temp d17
+#define yn_temp d18
+#define yn_round_offset d19
+#define ln2 d5
+@ Careful, overwriting HIGH_WORD_MASK, reset it if you need it again ...
+#define rounded_exponent d31
+@ exp series:
+#define k5 d23
+#define k6 d22
+#define k7 d21
+#define k8 d20
+#define ss3 d19
+@ overwrite double_1 (we're done with it by now)
+#define k0 d28
+#define twoto1o4 d6
+
+@instructions that gas doesn't like to encode correctly:
+#define vmov_f64 fconstd
+#define vmov_f32 fconsts
+#define vmovne_f64 fconstdne
+
+ENTRY(pow_neon)
+ push {r4, r5, r6, r7, r8, r9, r10, lr}
+
+ @ pre-staged bp values
+ vldr bpa, .LbpA
+ vldr bpb, .LbpB
+ @ load two fifths into constant term in case we need it due to offsets
+ vldr lg2const, .Ltwofifths
+
+ @ bp is initially 1.0, may adjust later based on x value
+ vmov_f64 bp, #0x70
+
+ @ extract the mantissa from x for scaled value comparisons
+ lsl xmantissa, x_hw, #12
+
+ @ twoto1o5 = 2^(1/5) (input bracketing)
+ movw twoto1o5, #0x186c
+ movt twoto1o5, #0x2611
+ @ twoto3o5 = 2^(3/5) (input bracketing)
+ movw twoto3o5, #0x003b
+ movt twoto3o5, #0x8406
+
+ @ finish extracting xmantissa
+ orr xmantissa, xmantissa, x_lw, lsr #20
+
+ @ begin preparing a mask for normalization
+ vmov.i64 HIGH_WORD_MASK, #0xffffffff00000000
+
+ @ double_1 = (double) 1.0
+ vmov_f64 double_1, #0x70
+
+ cmp xmantissa, twoto1o5
+
+ vshl.i64 EXPONENT_MASK, HIGH_WORD_MASK, #20
+ vshr.u64 int_1, HIGH_WORD_MASK, #63
+
+ adr literals, .LliteralTable
+
+ bhi .Lxgt2to1over5
+ @ zero out lg2 constant term if don't offset our input
+ vsub.f64 lg2const, lg2const, lg2const
+ b .Lxle2to1over5
+
+.Lxgt2to1over5:
+ @ if normalized x > 2^(1/5), bp = 1 + (2^(2/5)-1) = 2^(2/5)
+ vadd.f64 bp, bp, bpa
+
+.Lxle2to1over5:
+ @ will need ln2 for various things
+ vldr ln2, .Lln2
+
+ cmp xmantissa, twoto3o5
+@@@@ X Value Normalization @@@@
+
+ @ ss = abs(x) 2^(-1024)
+ vbic.i64 ss, x, EXPONENT_MASK
+
+ @ N = (floor(log2(x)) + 0x3ff) * 2^52
+ vand.i64 int_n, x, EXPONENT_MASK
+
+ bls .Lxle2to3over5
+ @ if normalized x > 2^(3/5), bp = 2^(2/5) + (2^(4/5) - 2^(2/5) = 2^(4/5)
+ vadd.f64 bp, bp, bpb
+ vadd.f64 lg2const, lg2const, lg2const
+
+.Lxle2to3over5:
+
+ @ load log2 polynomial series constants
+ vldm literals!, {k4, k3, k2, k1}
+
+ @ s = abs(x) 2^(-floor(log2(x))) (normalize abs(x) to around 1)
+ vorr.i64 ss, ss, double_1
+
+@@@@ 3/2 (Log(bp(1+s)/(1-s))) input computation (s = (x-bp)/(x+bp)) @@@@
+
+ vsub.f64 u, ss, bp
+ vadd.f64 v, ss, bp
+
+ @ s = (x-1)/(x+1)
+ vdiv.f64 ss, u, v
+
+ @ load 2/(3log2) into lg2coeff
+ vldr lg2coeff, .Ltwooverthreeln2
+
+ @ N = floor(log2(x)) * 2^52
+ vsub.i64 int_n, int_n, double_1
+
+@@@@ 3/2 (Log(bp(1+s)/(1-s))) polynomial series @@@@
+
+ @ ss2 = ((x-dp)/(x+dp))^2
+ vmul.f64 ss2, ss, ss
+ @ ylg2x = 3.0
+ vmov_f64 ylg2x, #8
+ vmul.f64 ss4, ss2, ss2
+
+ @ todo: useful later for two-way clamp
+ vmul.f64 lg2coeff, lg2coeff, y
+
+ @ N = floor(log2(x))
+ vshr.s64 int_n, int_n, #52
+
+ @ k3 = ss^2 * L4 + L3
+ vmla.f64 k3, ss2, k4
+
+ @ k1 = ss^2 * L2 + L1
+ vmla.f64 k1, ss2, k2
+
+ @ scale ss by 2/(3 ln 2)
+ vmul.f64 lg2coeff, ss, lg2coeff
+
+ @ ylg2x = 3.0 + s^2
+ vadd.f64 ylg2x, ylg2x, ss2
+
+ vcvt.f64.s32 double_n, int_n_low
+
+ @ k1 = s^4 (s^2 L4 + L3) + s^2 L2 + L1
+ vmla.f64 k1, ss4, k3
+
+ @ add in constant term
+ vadd.f64 double_n, lg2const
+
+ @ ylg2x = 3.0 + s^2 + s^4 (s^4 (s^2 L4 + L3) + s^2 L2 + L1)
+ vmla.f64 ylg2x, ss4, k1
+
+ @ ylg2x = y 2 s / (3 ln(2)) (3.0 + s^2 + s^4 (s^4(s^2 L4 + L3) + s^2 L2 + L1)
+ vmul.f64 ylg2x, lg2coeff, ylg2x
+
+@@@@ Compute input to Exp(s) (s = y(n + log2(x)) - (floor(8 yn + 1)/8 + floor(8 ylog2(x) + 1)/8) @@@@@
+
+ @ mask to extract bit 1 (2^-2 from our fixed-point representation)
+ vshl.u64 twoto1o4mask, int_1, #1
+
+ @ double_n = y * n
+ vmul.f64 double_n, double_n, y
+
+ @ Load 2^(1/4) for later computations
+ vldr twoto1o4, .Ltwoto1o4
+
+ @ either add or subtract one based on the sign of double_n and ylg2x
+ vshr.s64 ylg2x_round_offset, ylg2x, #62
+ vshr.s64 yn_round_offset, double_n, #62
+
+ @ move unmodified y*lg2x into temp space
+ vmov ylg2x_temp, ylg2x
+ @ compute floor(8 y * n + 1)/8
+ @ and floor(8 y (log2(x)) + 1)/8
+ vcvt.s32.f64 ylg2x, ylg2x, #3
+ @ move unmodified y*n into temp space
+ vmov yn_temp, double_n
+ vcvt.s32.f64 double_n, double_n, #3
+
+ @ load exp polynomial series constants
+ vldm literals!, {k8, k7, k6, k5, k4, k3, k2, k1}
+
+ @ mask to extract bit 2 (2^-1 from our fixed-point representation)
+ vshl.u64 twoto1o2mask, int_1, #2
+
+ @ make rounding offsets either 1 or -1 instead of 0 or -2
+ vorr.u64 ylg2x_round_offset, ylg2x_round_offset, int_1
+ vorr.u64 yn_round_offset, yn_round_offset, int_1
+
+ @ round up to the nearest 1/8th
+ vadd.s32 ylg2x, ylg2x, ylg2x_round_offset
+ vadd.s32 double_n, double_n, yn_round_offset
+
+ @ clear out round-up bit for y log2(x)
+ vbic.s32 ylg2x, ylg2x, int_1
+ @ clear out round-up bit for yn
+ vbic.s32 double_n, double_n, int_1
+ @ add together the (fixed precision) rounded parts
+ vadd.s64 rounded_exponent, double_n, ylg2x
+ @ turn int_n into a double with value 2^int_n
+ vshl.i64 int_n, rounded_exponent, #49
+ @ compute masks for 2^(1/4) and 2^(1/2) fixups for fractional part of fixed-precision rounded values:
+ vand.u64 twoto1o4mask, twoto1o4mask, rounded_exponent
+ vand.u64 twoto1o2mask, twoto1o2mask, rounded_exponent
+
+ @ convert back into floating point, double_n now holds (double) floor(8 y * n + 1)/8
+ @ ylg2x now holds (double) floor(8 y * log2(x) + 1)/8
+ vcvt.f64.s32 ylg2x, ylg2x, #3
+ vcvt.f64.s32 double_n, double_n, #3
+
+ @ put the 2 bit (0.5) through the roof of twoto1o2mask (make it 0x0 or 0xffffffffffffffff)
+ vqshl.u64 twoto1o2mask, twoto1o2mask, #62
+ @ put the 1 bit (0.25) through the roof of twoto1o4mask (make it 0x0 or 0xffffffffffffffff)
+ vqshl.u64 twoto1o4mask, twoto1o4mask, #63
+
+ @ center y*log2(x) fractional part between -0.125 and 0.125 by subtracting (double) floor(8 y * log2(x) + 1)/8
+ vsub.f64 ylg2x_temp, ylg2x_temp, ylg2x
+ @ center y*n fractional part between -0.125 and 0.125 by subtracting (double) floor(8 y * n + 1)/8
+ vsub.f64 yn_temp, yn_temp, double_n
+
+ @ Add fractional parts of yn and y log2(x) together
+ vadd.f64 ss, ylg2x_temp, yn_temp
+
+ @ Result = 1.0 (offset for exp(s) series)
+ vmov_f64 Result, #0x70
+
+ @ multiply fractional part of y * log2(x) by ln(2)
+ vmul.f64 ss, ln2, ss
+
+@@@@ 10th order polynomial series for Exp(s) @@@@
+
+ @ ss2 = (ss)^2
+ vmul.f64 ss2, ss, ss
+
+ @ twoto1o2mask = twoto1o2mask & twoto1o4
+ vand.u64 twoto1o2mask, twoto1o2mask, twoto1o4
+ @ twoto1o2mask = twoto1o2mask & twoto1o4
+ vand.u64 twoto1o4mask, twoto1o4mask, twoto1o4
+
+ @ Result = 1.0 + ss
+ vadd.f64 Result, Result, ss
+
+ @ k7 = ss k8 + k7
+ vmla.f64 k7, ss, k8
+
+ @ ss4 = (ss*ss) * (ss*ss)
+ vmul.f64 ss4, ss2, ss2
+
+ @ twoto1o2mask = twoto1o2mask | (double) 1.0 - results in either 1.0 or 2^(1/4) in twoto1o2mask
+ vorr.u64 twoto1o2mask, twoto1o2mask, double_1
+ @ twoto1o2mask = twoto1o4mask | (double) 1.0 - results in either 1.0 or 2^(1/4) in twoto1o4mask
+ vorr.u64 twoto1o4mask, twoto1o4mask, double_1
+
+ @ TODO: should setup sign here, expadjustment = 1.0
+ vmov_f64 expadjustment, #0x70
+
+ @ ss3 = (ss*ss) * ss
+ vmul.f64 ss3, ss2, ss
+
+ @ k0 = 1/2 (first non-unity coefficient)
+ vmov_f64 k0, #0x60
+
+ @ Mask out non-exponent bits to make sure we have just 2^int_n
+ vand.i64 int_n, int_n, EXPONENT_MASK
+
+ @ square twoto1o2mask to get 1.0 or 2^(1/2)
+ vmul.f64 twoto1o2mask, twoto1o2mask, twoto1o2mask
+ @ multiply twoto2o4mask into the exponent output adjustment value
+ vmul.f64 expadjustment, expadjustment, twoto1o4mask
+
+ @ k5 = ss k6 + k5
+ vmla.f64 k5, ss, k6
+
+ @ k3 = ss k4 + k3
+ vmla.f64 k3, ss, k4
+
+ @ k1 = ss k2 + k1
+ vmla.f64 k1, ss, k2
+
+ @ multiply twoto1o2mask into exponent output adjustment value
+ vmul.f64 expadjustment, expadjustment, twoto1o2mask
+
+ @ k5 = ss^2 ( ss k8 + k7 ) + ss k6 + k5
+ vmla.f64 k5, ss2, k7
+
+ @ k1 = ss^2 ( ss k4 + k3 ) + ss k2 + k1
+ vmla.f64 k1, ss2, k3
+
+ @ Result = 1.0 + ss + 1/2 ss^2
+ vmla.f64 Result, ss2, k0
+
+ @ Adjust int_n so that it's a double precision value that can be multiplied by Result
+ vadd.i64 expadjustment, int_n, expadjustment
+
+ @ k1 = ss^4 ( ss^2 ( ss k8 + k7 ) + ss k6 + k5 ) + ss^2 ( ss k4 + k3 ) + ss k2 + k1
+ vmla.f64 k1, ss4, k5
+
+ @ Result = 1.0 + ss + 1/2 ss^2 + ss^3 ( ss^4 ( ss^2 ( ss k8 + k7 ) + ss k6 + k5 ) + ss^2 ( ss k4 + k3 ) + ss k2 + k1 )
+ vmla.f64 Result, ss3, k1
+
+ @ multiply by adjustment (sign*(rounding ? sqrt(2) : 1) * 2^int_n)
+ vmul.f64 Result, expadjustment, Result
+
+.LleavePow:
+.LleavePowDirect:
+ @ leave directly returning whatever is in Return_lw and Return_hw
+ pop {r4, r5, r6, r7, r8, r9, r10, pc}
+
+.align 6
+.LliteralTable:
+@ Least-squares tuned constants for 11th order (log2((1+s)/(1-s)):
+.LL4: @ ~3/11
+ .long 0x53a79915, 0x3fd1b108
+.LL3: @ ~1/3
+ .long 0x9ca0567a, 0x3fd554fa
+.LL2: @ ~3/7
+ .long 0x1408e660, 0x3fdb6db7
+.LL1: @ ~3/5
+ .long 0x332D4313, 0x3fe33333
+
+@ Least-squares tuned constants for 10th order exp(s):
+.LE10: @ ~1/3628800
+ .long 0x25c7ba0a, 0x3e92819b
+.LE9: @ ~1/362880
+ .long 0x9499b49c, 0x3ec72294
+.LE8: @ ~1/40320
+ .long 0xabb79d95, 0x3efa019f
+.LE7: @ ~1/5040
+ .long 0x8723aeaa, 0x3f2a019f
+.LE6: @ ~1/720
+ .long 0x16c76a94, 0x3f56c16c
+.LE5: @ ~1/120
+ .long 0x11185da8, 0x3f811111
+.LE4: @ ~1/24
+ .long 0x5555551c, 0x3fa55555
+.LE3: @ ~1/6
+ .long 0x555554db, 0x3fc55555
+
+