SQL Server Function: TitleCase
T-SQL TitleCase Function: Beyond the Obvious
Periodically, you may need to convert text into mixed-case in SQL Server. I needed it because our legacy database stores all our customer names in upper case. For example, one of our customers might look like the following:
- Legacy sample customer: “ACME CLIENT, LLC”
- Desired customer name: “Acme Client, LLC”
Anyone can write a simplistic title-case function pretty quickly. I enhanced mine to handle the following special cases:
- State initials ID/WA/OR/WY/UT/NV/MT are capitalized
Select dbo.TitleCase(‘This is id and it is better than mT or Wy’)
This Is ID And It Is Better Than MT OR WY
- Leaves most 3-letter abbreviations capitalized
Select dbo.TitleCase(‘RYO CORRECTIONAL FACILITY’)
RYO Correctional Facility
- Removes repeated spaces
Select dbo.TitleCase(‘Had double spaces’)
Had Double Spaces
- Preserves special characters
Select dbo.TitleCase(‘Has special character ‘ + NChar(8631) + ‘ embedded’)
Has Special Character ↷ Embedded
- Runs reasonably fast (but would run faster if we built a table to hold my list of 3-letter English words)
In order to preserve 3-letter abbreviations, I had to get a list of English words. When my function encounters a 3-letter word, it checks to see if exists in my list. If so, convert to mixed case; otherwise leave all upper case.
Here’s the script
-- Author: ActualRandy
-- Create date: 1/06/2016
-- Description: Sends a string to Title Case while preserving abbreviations, also cleans-up repeated spaces
/*
Usage:
SELECT dbo.TitleCase('This has "quotes" in it. AND IS MIXED upper AND LOWER CASE.')
Select dbo.TitleCase('THIS IS WA')
Select dbo.TitleCase('Has special character ' + NChar(8631) + ' embedded')
*/
ALTER FUNCTION [dbo].[TitleCase] (@InputString NVARCHAR(4000) )
RETURNS NVARCHAR(4000)
AS
BEGIN
DECLARE @Index INT --Position within the input string
DECLARE @Char NCHAR(1) --The character we are examining
Declare @Word NChar(3) --3-letter word that might need to be upper case
DECLARE @OutputString NVARCHAR(4000) = '' --What we return
Declare @IsStartOfWord TinyInt = 1 --When 0, we are in the middle of a word
Declare @StateAbbrv NChar(2) --2-Letter string that might be a state
Declare @LastChar NChar(1) = '' --Previous character we examined
If @InputString Is Not Null
Begin
Set @Index = 1
While @Index <= Len(@InputString)
Begin
Set @Char = Substring(@InputString, @Index, 1)
--If we found repeated spaces...
If Not (@Char = ' ' And @LastChar = ' ')
Begin
If @Char = ''''
Begin
Set @OutputString = @OutputString + @char
End
Else If @Char in (' ', ';', ':', '!', '?', ',', '.', '_', '-', '/', '&', '(' )
Begin
Set @IsStartOfWord = 1
Set @OutputString = @OutputString + @char
End
Else
Begin
If @IsStartOfWord = 1
Begin
--Are we 2 letters from the end?
If (@Index = Len(@InputString) - 1)
Or
--Are we more than 2 letters from end of InputString, but just 2 letters from end of the current word?
(
@Index < Len(@InputString) - 1
And
SUBSTRING(@InputString, @Index + 2, 1) in (' ', ';', ':', '!', '?', ',', '.', '_', '-', '/', '&', '(' )
)
Begin
Set @StateAbbrv = Substring(@InputString, @Index, 2)
If @StateAbbrv in ('ID', 'OR', 'WA', 'MT', 'NV','UT', 'WY')
Begin
Set @OutputString = @OutputString + Upper(@StateAbbrv)
Set @Index = @Index + 1
Set @IsStartOfWord = 1
End
Else
Begin
--We have a 2-letter word, but it is not a state abbrv
Set @OutputString = @OutputString + Upper(@char)
Set @IsStartOfWord = 0
End
End
--Are we looking at a 3-letter word?, i.e. are we within
-- the string and the 4th character hence is not a letter
Else If (@Index <= Len(@InputString) - 2)
And
SUBSTRING(@InputString, @Index + 3, 1) in (' ', ';', ':', '!', '?', ',', '.', '_', '-', '/', '&', '(' )
Begin
Set @Word = SUBSTRING(@InputString, @Index, 3)
--If it is a 3-letter word in English
--For brevity, I shrunk-down my list of words, which is actually quite huge
If @Word In ('aah', 'ace','act','add','adj','ado','zoo')
Begin
--Set it to normal casing because it is an english word:
--uppercase on letter 1 and lowercase for the last 2:
Set @OutputString = @OutputString + Upper(@Char) + Lower(Substring(@InputString, @Index + 1, 2))
Set @IsStartOfWord = 0
Set @Index = @Index + 2
End
Else
--Probably some initials or abbreviation; i.e. a 3-letter string, but not an English word
Begin
Set @OutputString = @OutputString + Upper(@Word)
Set @Index = @Index + 2
End
End
Else
Begin
--We are at the beginning of a word, and it is not state abbrv or 3-letter word
Set @OutputString = @OutputString + Upper(@char)
Set @IsStartOfWord = 0
End
End
Else
--Not beginning of a word, so send it to lower case
Begin
Set @OutputString = @OutputString + Lower(@Char)
End
End
End
Set @LastChar = @Char
Set @Index = @Index + 1
End
End
RETURN @OutputString
END
Comments
The method makes a single pass through your string, so it is reasonably efficient in that regard. Of course, the slowest part of the function is determining whether a 3-letter word is an abbreviation or an English word. If speed is crucial, you can save all the words in a table and put an index on it. As I said above, it runs reasonably fast. When I tested it, it converted roughly 20,000 names in one second.
Usage
Because it is a function, you can use it in any query you use for moving data around. It works great when you have data from older sources that is all upper case. Naturally, my function is not perfect. One enhancement you could make would be to insert handling for 4-letter words, since I have seen some company names with 4-letter abbreviations.