Here, we discuss the process of converting unsymmetrical data from a PDF file into a well-organized format in a spreadsheet or Excel. When copying and pasting data into Google Sheets or Excel, we often expect it to align with the desired format. However, in many cases, the data appears disorganized, as illustrated in the accompanying image.

Organizing Unsymmetrical PDF Data in Google Sheets
To resolve this, we apply a combination of Google Sheets formulas to clean, format, and align the data correctly. These formulas help transform the raw, pasted content into a structured format that meets our requirements.
For instance, splitting data using a custom delimiter, such as a space, may result in unsymmetrical output. The values can become misaligned, failing to correspond to the appropriate headers in each row. This inconsistency often requires additional manual adjustments to structure the data properly. Below, we explore formula-based methods to address such issues effectively.

Here’s a practice sheet to help you explore and understand the process better. Click the link below to access it. To edit, make a copy and save it for your use.
https://docs.google.com/spreadsheets/d/1XQuuIK7zoMmlKbd7wGxv4KKZ5RAkQ7eefTt9lbB-FJA/edit?gid=0#gid=0
To convert PDF file data into a well-formatted table in Google Sheets without losing formatting, especially for table data, here’s a structured approach combining built-in formulas, advanced tools, and manual adjustments:
How to Convert Unsymmetrical PDF Data to Google Sheets
When you paste copied data from a PDF into Google Sheets (e.g., column Q), you may encounter unsymmetrical results where the data appears unformatted or crammed into single cells like below.
Q1:Q4 | Q |
---|---|
1 | 👇 | Paste PDF Data here | 👇 |
2 | 1 2021110848G Kajol Gobar ASSISTANT 70000 38700 108700 51861 56839 300.00 0 0 56539 |
3 | 3 2021311149S Alkesh Dindral SENIOR ASSISTANT 68000 37700 105700 50412 55288 300.00 0 0 54988 |
4 | 2 2021110866L Raj Rani Devi Supdt Gd-II 83600 60548 144148 54060 90088 300.00 0 0 89788 |
Step-by-Step Guide Using Google Sheets Formulas
The following formulas combination can be used to manipulate, clean, and structure data transferred from a PDF to a spreadsheet, ensuring the final result is well-organized and meets your desired format:
- IF(logical_expression, value_if_true, value_if_false)
- LEN(text)
- FIND(search_for, text_to_search, [starting_at])
- TRIM(text)
- SPLIT(text, delimiter)
- RIGHT(text, [num_chars])
- QUERY(data, query, [headers])
- LOWER(text)
- IFERROR(value, [value_if_error])
- TEXTJOIN(delimiter, ignore_empty, text1, [text2], …)
- SUBSTITUTE(text_to_search, search_for, replace_with, [occurrence_number])
- REGEXREPLACE(text, regular_expression, replacement)
- REGEXEXTRACT(text, regular_expression)
Step 1: REGEXEXTRACT, REGEXREPLACE and Combination of IF, LEN, SPLIT and LOWER Function
Syntax:
=REGEXEXTRACT(text, regular_expression)
=REGEXREPLACE(text, regular_expression, replacement)
=LEN(text)
=SPLIT(text, delimiter)=lower(text)
Formula
=REGEXEXTRACT(Q3,"[[:alpha:]]+")
=IFERROR( REGEXEXTRACT(Q3,"[[:alpha:]]+"))
=REGEXREPLACE(Q3,"[^a-zA-Z]", "")
=IFERROR( IF(LEN(T3)=1,REGEXREPLACE(Q3,"[^a-zA-Z]", ""),""))
=IFERROR( IF( LEN(T3)=1, SPLIT( lower(Q3),"qwertyuiopasdfghjklzxcvbnm`-=[]\;',./!@#$%^&*()"),""))

To extract only letter sequences from a text string, the pattern "[[:alpha:]]+"
captures those sequences, while the pattern [^a-zA-Z]
matches non-alphabetic characters. The LOWER
function converts all letters in the text to lowercase. The LEN
function counts the total characters in a text string, including spaces. The SPLIT
function divides the text into parts based on a chosen delimiter, such as a comma or space.
Step 2: IF, RIGHT, LEN, FIND & SUBSTITUTE, (IF, RIGHT & LEN) and (TRIM &TEXTJOIN)
=IF(V3="","",( RIGHT(V3,LEN(V3)-FIND("*",SUBSTITUTE(V3," ","*",LEN(V3)-LEN(SUBSTITUTE(V3," ",""))))))&""&T3)
=IF(U3="","", RIGHT(U3, LEN(U3)-1))
=TRIM( TEXTJOIN(" ",1,W3:AH3))

- The
IF
formula checks a condition and returns one value if the condition is true and another if it’s false. FIND
locates the position of a substring within a text string, returning the starting character’s position. It’s case-sensitive and often used to extract or manipulate parts of a string.- The
RIGHT
function extracts a specified number of characters from the end of a text string. It’s helpful for isolating portions of a string, such as the last few digits of a code or ID. SUBSTITUTE
replaces occurrences of a specific substring within a text string with another substring. It’s useful for modifying text, such as correcting or updating specific parts of a string.TRIM
removes all leading, trailing, and excess spaces from a text string, leaving only single spaces between words.
Step 3: TEXTJOIN
TEXTJOIN
combines multiple text strings into one, using a specified delimiter to separate them. Here cells in the range AM3:AO3
are simplifies the process of combining text, especially when working with ranges that may contain empty cells.
=TEXTJOIN(" ",1,AM3:AO3)

The formula would output "2021110848G KajolGobarASSISTANT 70000 38700 108700 51861 56839 300 00 0 0 56539"
in the cell where the formula is placed.
Q1:Q12 | Q | AQ |
---|---|---|
3 | 1 2021110848G Kajol Gobar ASSISTANT 70000 38700 108700 51861 56839 300.00 0 0 56539 | 2021110848G KajolGobarASSISTANT 70000 38700 108700 51861 56839 300 00 0 0 56539 |
Step 4: IFERROR and SPLIT
The formula uses ArrayFormula
to split text from the range AQ3:AQ
into columns wherever there’s a space, handling multiple rows at once. IFERROR
ensures errors from empty or invalid cells display as blank. Sufficient output space is required to avoid truncation or overwriting.
=ArrayFormula(IFERROR( SPLIT('Convert PDF to Sheet'!AQ3:AQ," ",0,0)))

The formula is valid and effectively splits text into separate parts, provided the output range is adequately prepared.
Step 5: QUERY
The final output process involves converting data from the PDF into Google Sheets and utilizing the QUERY
function for advanced operations. This function simplifies filtering, sorting, and summarizing the extracted data, organizing it into a clean and structured format for analysis.
Syntax:
=
QUERY
(data, query, [headers])
Formula:
=QUERY({'Converted Data Sheet'!E3:Q},"Select * where Col3 is not null",0)
A2:M3 | A | B | C | D | E | F | G | H | IJ | K | L | M | N |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | Sr.no. | emp-id | Name and Rank | Basic Pay | Alwn. | Gross | Deduct | Sub Net | NG Rec | – | O. Gpf | O. Gis | e- Salary |
3 | 1 | 2021110848G | KajolGobarASSISTANT | 70000 | 38700 | 108700 | 51861 | 56839 | 300 | 0 | 0 | 0 | 56539 |
For simple PDFs, data can often be copied directly into Google Sheets without losing formatting. However, complex PDFs with intricate layouts may not translate well through copy-pasting, resulting in altered or lost formatting.
In such cases, the methods and functions in Google Sheets can help with structured data, while specialized online tools are recommended for handling complex tables and ensuring precise extraction.
If you have any questions, need assistance, or encounter challenges, feel free to share your concerns in the comments. The CountLen team is dedicated to providing quick and effective solutions. If you notice any inaccuracies or misleading information, please provide feedback—we’re here to support you!