
  • Remove ^M characters and more with repl.bash

    Hey folks, this one's quick but good.

    First off, respect the character encoding of a file. I don't know how many devs out there break this rule, but if you're like me (and Joel on Software), you'll agree it matters.

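    If your file has picked up Windows code page 1252 (aka Windows Latin-1) habits, like Microsoft-style curly quotes or Windows line endings, you'll see a variety of random characters. ^M is the carriage return at the end of a Windows CRLF line, and sequences like ?~@~Y, ?~@~\ or ?~@~] are the UTF-8 bytes of curly quotes (e2 80 99, e2 80 9c and e2 80 9d) as shown by an editor that isn't reading the file as UTF-8.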
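    To see what those characters really are, look at the bytes. A small sketch, assuming bash plus the standard od, tr and grep tools; hex_bytes and count_crlf are hypothetical helper names for illustration:

```bash
#!/bin/bash
# Sketch only: hex_bytes and count_crlf are hypothetical helpers.

# Print stdin as one compact hex string, e.g. "e28099".
hex_bytes() {
  od -An -tx1 | tr -d ' \n'
}

# Print how many lines in file $1 end in CR LF (each one shows up as ^M).
# Note: grep -c prints 0 and exits 1 when nothing matches.
count_crlf() {
  grep -c $'\r$' "$1"
}

# The curly apostrophe U+2019 in UTF-8 is the three bytes e2 80 99 (?~@~Y).
printf '\xe2\x80\x99' | hex_bytes; echo    # prints e28099

# Curly double quotes U+201C and U+201D: e2 80 9c and e2 80 9d (?~@~\ and ?~@~]).
printf '\xe2\x80\x9c\xe2\x80\x9d' | hex_bytes; echo    # prints e2809ce2809d
```

    On a real file, `hex_bytes < file` dumps every byte, and `count_crlf file` tells you whether the ^M step has any work to do.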

    So I wrote a script that replaces these characters with plain ASCII equivalents and converts Windows (CRLF) line endings to Unix (LF) ones. It edits the file in place. Here it is:

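    #!/bin/bash
    #
    # By: barce[a t]codebelay.com
    # -------------------
    # this script replaces microsoft special chars with plain ol' ascii
    # and converts CRLF (^M) line endings to LF. The file is edited in place.
    #
    # usage: ./repl.bash filename
    #

    # bail out early if no filename was given
    if [ -z "$1" ]; then
      echo "usage: $0 filename" >&2
      exit 1
    fi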

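    # convert Windows CRLF line endings to Unix LF (this removes the ^M)
    perl -pi -e 's/\x{0D}\x{0A}/\x{0A}/g' "$1"

    # replace the curly apostrophe (U+2019) with a plain single-quote
    # ?~@~Y
    # e2 80 99
    perl -pi -e 's/\x{E2}\x{80}\x{99}/\x{27}/g' "$1"

    # replace the bullet (U+2022) with an asterisk
    # ?~@?
    # e2 80 a2
    perl -pi -e 's/\x{E2}\x{80}\x{A2}/\x{2A}/g' "$1"

    # replace curly double quotes (U+201C, U+201D) with plain quotes
    # start: ?~@~\
    # close: ?~@~]
    # e2 80 9c, e2 80 9d
    perl -pi -e 's/\x{E2}\x{80}\x{9C}/\x{22}/g' "$1"
    perl -pi -e 's/\x{E2}\x{80}\x{9D}/\x{22}/g' "$1"

    # replace the en dash (U+2013) with a plain hyphen
    perl -pi -e 's/\x{E2}\x{80}\x{93}/\x{2D}/g' "$1"

    # replace the ellipsis character (U+2026) with three periods
    perl -pi -e 's/\x{E2}\x{80}\x{A6}/\x{2E}\x{2E}\x{2E}/g' "$1"

    # finally, replace stray two-byte fragments (no leading e2 byte) with
    # single-quotes. These have to run after the three-byte rules above:
    # run earlier, the 80 9c / 80 9d rules would eat the tail of every
    # curly double quote before the double-quote rules could match it.
    perl -pi -e 's/\x{80}\x{99}/\x{27}/g' "$1"
    perl -pi -e 's/\x{80}\x{9c}/\x{27}/g' "$1"
    perl -pi -e 's/\x{80}\x{9d}/\x{27}/g' "$1"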